The C memory management API is a bit like Jasper Beardly, a minor character on the animated television program The Simpsons; in a few episodes, he appears as a strict substitute teacher who imposes corporal punishment — a “paddlin”’ — for all infractions.
Here are some of things a program can do that deserve a paddling:
- If you access (read or write) any chunk that has not been allocated, that’s a paddling.
- If you free an allocated chunk and then access it, that’s a paddling.
- If you try to free a chunk that has not been allocated, that’s a paddling.
- If you free the same chunk more than once, that’s a paddling.
- If you call
reallocwith a chunk that was not allocated, or was allocated and then freed, that’s a paddling.
It might not sound difficult to follow these rules, but in a large program a chunk of memory might be allocated in one part of the program, used in several other parts, and freed in yet another part. So changes in one part of the program can require changes in many other parts.
Also, there might be many aliases, or references to the same allocated chunk, in different parts of the program. The chunk should not be freed until all references to the chunk are no longer in use. Getting this right often requires careful analysis across all parts of the program, which is difficult and contrary to fundamental principles of good software engineering.
Ideally, every function that allocates memory should include, as part of the documented interface, information about how that memory is supposed to be freed. Mature libraries often do this well, but in the real world, software engineering practice often falls short of this ideal.
To make matters worse, memory errors can be difficult to find because the symptoms are unpredictable. For example:
- If you read a value from an unallocated chunk, the system might detect the error, trigger a runtime error called a “segmentation fault”, and stop the program. Or, the program might read unallocated memory without detecting the error; in that case, the value it gets is whatever happened to be stored at the accessed location, which is unpredictable, and might be different each time the program runs.
- If you write a value to an unallocated chunk, and don’t get a segmentation fault, things are even worse. After you write a value to an invalid location, a long time might pass before it is read and causes problems. At that point it will be very difficult to find the source of the problem.
And things can be even worse than that! One of the most common problems with C-style memory management is that the data structures used to implement
free (which we will see soon) are often stored along with the allocated chunks. So if you accidentally write past the end of a dynamically-allocated chunk, you are likely to mangle these data structures. The system usually won’t detect the problem until later, when you call
free, and those functions fail in some inscrutable way.
One conclusion you should draw from this is that safe memory management requires design and discipline. If you write a library or module that allocates memory, you should also provide an interface to free it, and memory management should be part of the API design from the beginning.
If you use a library that allocates memory, you should be disciplined in your use of the API. For example, if the library provides functions to allocate and deallocate storage, you should use those functions and not, for example, call
free on a chunk you did not
malloc. And you should avoid keeping multiple references to the same chunk in different parts of your program.
Often there is a trade-off between safe memory management and performance. For example, the most common source of memory errors is writing beyond the bounds of an array. The obvious remedy for this problem is bounds checking; that is, every access to the array should check whether the index is out of bounds. High-level libraries that provide array-like structures usually perform bounds checking. But C arrays and most low-level libraries do not.