



(7 ratings)
Many people claim that they know C or C++, and they even program in those
languages. At the same time, when they stumble on a bug, they often don't
truly understand its nature, and find it hard to fix (sometimes unable to
fix it at all). Now, C and C++ coding requires a lot of usage of dynamic
memory management, by allocating memory (malloc() or
new), using it via pointers and references, and deallocating it
(free() or delete).
Many bugs reveal themselves in the general form of 'memory corruption', or as programmers often say 'this structure contains garbage! how can this be?!?'. understanding the underlying memory management mechanisms supported by the operating system and by the programming language's runtime environment, will help you understand such bugs, fix them faster and better.
As an added bonus, understanding how memory management works, could help you write more efficient programs. This is especially important when writing programs that manipulate a lot of data, or that require every last bit of CPU power.
Unix Memory Management
Every process running on a Unix system gets memory management services from the operating system. This is true for most other operating systems as well. These services are given regardless of the language in which the program was written. Such services would include virtual memory management, paging services, memory protection and the like.
Machine Architectures - Memory, Cache, Registers
Basically, computers have two main locations for storing data - physical memory and registers. Memory is usually large, and is made of a linear set of cells, also called 'bytes' or 'words', depending on their size. Each memory cell is accessed using an address, and the memory does not have to be consecutive. On various architectures, certain parts of the memory are used to access devices (this is called memory-mapped I/O). other parts of memory might not even be mapped into any physical memory at all. In the old days, a programmer had to take care of this info in their program. Modern operating systems, however, hide this information from the user, as we'll see later.
Cache is a small chunk of memory, stored either directly in the CPU (level 1 cache), or on the motherboard (level 2 cache). Its purpose is to store a copy of 'recently used' parts of the main memory, in a location that can be accessed much faster, either because its in the CPU, or because its made of a faster (and more expensive) type of memory. Usually, the cache is hidden from our our programs by the hardware - a programmer usually need only worry about the cache when doing very low-level memory managed code inside the operating system's kernel.
Registers are storage cells stored directly inside the CPU with very fast access to its internals (the arithmetic-logic unit, for example). They can be accessed much faster than memory cells, and are often used to store data that is needed for a short calculation, such as contents of local variables in a function, or intermediate results of arithmetic calculations. the keyword 'register', when used when defining a local variable in C, can serve as a hint to the compiler to assign that variable to a register, rather than to a memory cell. Note that with current compilers and their optimizers, its often better to let the compiler decide which variables are better kept in registers.
Virtual Memory
In order to ease the programmer's life, as well as to allow a program to access more memory than might be physically installed in the machine, most modern operating systems support a virtual memory system. Virtual memory works by the operating system (in tight cooperation with the CPU) mapping virtual addresses used by a process, into physical addresses. When a process allocates a new block of memory, the operating system assigns this block to some section (called 'page') of physical memory, using a virtual memory translation table. Each access to a memory location by the process is translated to access to the matching physical memory page. In order to avoid slowing down the program, this translation is done directly by the CPU.
Memory Protection
As we have seen, the operating system uses a translation table to map virtual memory to physical memory. Taking this a further step, the system can use a different translation table for each process, thereby giving each process its own address space. This means that the same virtual memory address used by two different processes, will be mapped into two different physical memory addresses. This means that one process cannot access the contents of the memory used by the other process, and thus one process corrupting its memory won't interfere with the contents of the memory of any other process in the system. This feature is known as "memory protection", and is used now by most operating systems (including all Unix variants).
Run-time Management Of Virtual Memory
Some of the virtual memory sections might be mapped to no physical memory page. When a process tries to access a memory cell in such a section, the CPU identifies a page fault, and invokes an operating system routine that needs to handle this fault. Most operating systems use this feature to store part of the virtual memory sections on disk, rather than in RAM, thereby allowing the program to use an address space larger than the physical RAM of the machine. When a page fault occurs, the operating system loads the contents of the faulted virtual memory section from the disk, copies it into a free physical memory page, and updates the virtual memory translation table so the CPU will know to map the virtual memory section to the new physical RAM page.
During the search for a free page, it might be that no free page is found. In such case, the operating system takes the contents of a busy physical memory page, copies it to the hard disk, and uses this memory page to load the data of the desired virtual memory section. The virtual memory section that was previously mapped to this physical memory page, is now marked as paged out.
As you can see from this, the algorithm used to decide which memory page should be taken when there are no free pages, has to be efficient. With a bad algorithm, a process that accesses two virtual memory sections alternately, might cause the system to keep paging these sections in and out. Since access to the disk is much much slower than access to RAM, this will slow down the system tremendously. A common algorithm used to find candidates for paging out is called LRU - Least Recently Used. In this algorithm, the memory page that was used least recently, is the page that will have its contents paged out. This algorithm works well due to the locality principle - if a process accessed a given memory page, it is likely to access this same page or one near it on the next instruction.
SIGSEGV This! SIGBUS That!
We saw that some virtual memory sections aren't mapped to physical memory. While some of them were simply paged out, others were never allocated by the process. When a process runs, its virtual memory table is small. As it allocates more memory pages, the table grows. However, if the process tries to access a virtual memory address of a section it hasn't allocated yet, the operating system has no where to bring this page from. The designers of the Unix system decided that this situation indicates a program bug, and thus instead of making an automatic allocation of a memory page in such a case, they chose to send a signal to the process. This signal is a SEGV signal (or SIGSEGV), and its default signal handler prints out a "Segmentation violation - core dumped" message, and dumps the memory image of the process into a file named 'core' in the process's current directory.
Another way to cause a 'segmentation violation' is trying to access an illegal
location of virtual memory. Because many invalid pointer problems occur
with very low pointer values, the operating system does not allow a process
to allocate a memory page for the virtual memory section beginning with the
virtual address '0'. This is what causes programs to receive a SEGV signal
when trying to dereference a NULL pointer (NULL
on _most_ machine architectures is defined as '0').
What about a BUS (or SIGBUS) signal? this signal is sent to a program that tries to access a non-aligned pointer. For instance, on many machine architectures, access to 'long' (4 byte) numbers must be done using a memory address that divides by 4. Trying to access such an entity using an address that does not abide by this rule will cause the CPU to emit a trap. The operating system's kernel catches this trap, and then sends a BUS signal to the program. The default signal handler for this signal emits a "Bus error - core dumped" message, and dumps the memory contents to a 'core' file, much like the handler for the SEGV signal does.
Load On Demand
As we have seen, the efficiency of virtual memory is based on the locality principle. The same principle can be used when loading programs into memory. There is no point in loading all the executable file during application startup. Large parts of the code are likely not to be ever executed during the process's lifetime. Furthermore, loading all of the code at once will mean it will take more time until the process can start running.
Thus, what Unix systems do, is just allocate the page table entries, and mark the pages as "load on demand". When the process tries to access a virtual memory cell in a page that wasn't loaded yet, a page fault will occur, causing the operating system to load the contents of this specific page into memory. The code part of shared libraries is treated in a similar manner.
This feature, together with some properties of file management could explain the following phenomena: if you have a process executing program file 'foo', and you then compile a new version of 'foo' and copy it on top of the original 'foo' file - the process is likely to crash very quickly. The reason for this is that the process is likely to access a virtual memory cell of a page whose contents were not loaded yet. This will cause the system to load the page from the new binary file - which is most likely different from the original binary file. Thus, the program will be executing a random machine instruction, instead of the machine instruction it was supposed to perform, most likely resulting a process crash.
In order to avoid the above scenario, there are several options. One is killing all processes currently executing the program's binary, before updating it. A better method is to rename the old binary file to a new name (e.g. foo.old) and then copy the new binary to 'foo'. This will work, because the name of a file is relevant only when a process tries to open it. After the file was opened, the process accesses it via its disk location (I-node number) - which is different for the old and the new files. Actually, you may even erase the old binary file without renaming it - the file will be removed from the directory, but not from the disk. It will be removed from disk only when all processes keeping it open (e.g. processes executing this binary program file) will close it (or exit). Note that this is NOT true if the binary file is accessed via NFS, only if it is accessed on a local file system. NFS file accesses always use the file name, whether for open operations or for read operations.
Read-Only Memory
Some of the virtual memory space used by a process does not ever need to change during the process's life. This fact can be used to optimize handling of such memory pages by the paging code of the operating system. For example, executable memory pages are read-only pages, and are marked so by the system. Trying to write into them would fail, and generate a CPU trap which will be caught by the operating system, that will send a SEGV signal to the process.
When the system needs to page out read-only code pages of this sort, it doesn't even have to page them out to disk - it already has a copy of them inside the binary executable (or the shared library) they were taken from. Thus, paging these pages out just requires marking them as free.
The process can mark other memory pages as read-only too (see the man
page for the mmap() system call, for example). This could be
useful to make sure no part of the code tries to modify the contents of
that memory - useful for trapping down some bugs.
Shared Memory
Another feature supported by Unix systems is shared memory. Using this feature, virtual memory sections of different processes are marked as shared, and thus point to the same physical memory pages. When one process changes the contents of this memory, the other processes immediatly see the new contents. Of course this means that access to this shared memory should be synchronized among the processes sharing it, using some locking mechanism.
Writable memory pages can be shared by processes using various system APIs,
such as the System-V shared memory API (see the man page of
shmget() and shmat() for details) or using the
mmap() system call. Note that a page may be
shared between more than 2 processes, and some of these processes might only
require read-only access to these shared memory pages.
Read-only shared memory is easier to manage. One place where sharing read-only memory pages comes in very handy is sharing of executable code coming from program binary files and from shared libraries. The system automatically takes care of that - if a process tries to execute a program that is already loaded to memory due to being executed by another process, the memory pages holding the binary code will be mapped to the virtual address space of the new process, and thus overall memory usage of the system would not increase (other than the memory needed to hold the translation table for the new process - this table is not shared between processes). Note that dynamic data won't be shared by processes in this manner, since it is stored to writable memory pages. This is, ofcourse, the expected behaviour - you wouldn't want one process changing a variable's contents, affecting another process running the same program, would you?.
Memory Management For Related Processes
A special case of shared memory handling is handled by the system when
a process creates a new child process using the fork() system
call. This call creates a new process that contains an almost identical copy
of the parent process's memory. The operating system makes a copy of the
virtual memory translation table for the child, but current Unix
implementations do not make a copy of the pages themselves.
Instead, they mark all the pages (now shared by parent and child
processes) as "copy-on-write". This means that as long as both processes
only read from a given page, they access the same physical page. However,
when one of these processes tries to write into such a shared memory page,
the system notes the "copy-on-write" flag, and then creates a new copy of the
page, and inserts that copy into the translation table of the writing
process. Then, the write is actually performed on the new copy of the page.
This mechanism allows the fork() operation to work faster
than if all of the contents of the virtual memory of the parent process would
have been copied. One may ask why making this copy is useful in the first
place. For the executable code pages - this is obvious, as the child process
starts to run the same code that the parent process used. Regarding the
data pages, this allows the parent process to pass some data to the child
process that it would require during its run, such as file descriptors used for
communications between the parent process and child process. This means that
a call to fork() will be most efficient if done before the parent
process allocated large ammounts of memory, i.e. as close to the parent
process's own initialization.
Note that some operating systems support a version of the system call, named
vfork() that is not supposed to even make a copy of the
translation table. This system call is not very portable - see the
relevant man page for more info.
Memory Managers
Memory managers are software libraries that perform memory allocation and de-allocation on behalf of the program. Every programming language comes with a runtime environment that includes at least one memory manager. Since the programming languages we deal with tend to be general purpose, the memory manager is also general purpose - it does not assume anything about the pattern of allocations, sizes of memory chunks allocated and the likes. Therefor, it is possible to find an allocation pattern that will cause a given memory allocation to perform poorly. Thus, having an idea regarding how a memory manager works could allow us to write programs that make more efficient use of memory, in terms of speed and of overall memory usage.
Memory Alignment
One property of all memory managers is that they return memory chunks which are aligned best for the architecture of the machine on which they are run. Alignment means that the memory chunks returned to the user begin on an address that divides by the size of a word for the CPU they are running on. This size is often of 4 bytes (for 32 bit architectures) or 8 bytes (for 64 bit architectures). Further, the size of returned chunks is also kept to a multiple of the word size. This ensures that the returned memory blocks can be used to store the largest data type the machine supports (e.g. a long integer).
The price of this mechanism is that often memory is wasted. For example, suppose that i have a machine with a word size of 4 bytes. Further, suppose that i want to store a string whose length is 5 bytes (including the terminating null character, if we're refering to a C language string). The memory manager will give me a memory chunk whose size is 8 bytes (2 words). I use 5 bytes, and 3 are wasted. Thus, general-purpose memory managers cause a lot of memory waste for programs that allocate many small memory chunks. We will discuss methods of overcoming this problem when we discuss specialized memory managers. We will also see how this affects issues with pointers and memory smearing (memory contents corrupting).
A Memory Manager's Algorithm
Various algorithms have been devised over the years for implementing efficient memory managers for general-purpose programming. We will present here a simple algorithm that is easy to follow, so you'll get a feel of how a memory manager works.
Linked List Of Memory Chunks
Our memory manager keeps a linked list of free memory chunks. Inside each
chunk, the first few bytes are used for storing the chunk's size (including
the space used by the memory manager to keep the chunk's size), and
the address of the next chunk. This means that the size of a chunk must
be at least large enough to store one number and one address (e.g. 8 bytes,
on a 32-bit architecture). Here is how a single memory chunk in the list looks:
And here is a linked list of memory chunks:
The memory manager is normally implemented as a library, which means it is
linked into our process's executable binary. This allows the memory manager
to work in the address space of our process, so it can indeed allocate memory
that our process will be able to access. Also, because it works in virtual
memory, the memory manager sees a large and consecutive address space.
However, the linked list does not have to be consecutive, for reasons we'll
soon see.
Allocating A Chunk
When the user asks to allocate a memory chunk of a given size, the memory manager scans the list until it finds the first memory chunk which is at least slightly larger than what the user asked for. This chunk is then removed from the linked list, and the manager logically splits it to three sections. In the first (small and fixed-size) section, it writes the size of the allocated chunk. The second section is the chunk whose address will be returned to the user. The third section contains the part of the chunk that goes beyond what the user required - this third section is turned into a new (free) chunk, and is added back into the linked list, instead of the original chunk. Note that if the original chunk had fit the exact size the user asked for, no third chunk will be left to be linked back to the list.
Thus, Assuming an unsigned long is used to represent a chunk's size, the chunk allocated has a size X plus the size of memory needed to store an unsigned long number (on a 32-bit architecture, an unsigned long occupies 4 bytes of memory).
Let's see how the above list looks after a user asked to allocate a chunk of memory with a size of 62 bytes. Let's assume the machine's word size is 4 bytes, so the manager will actually try to allocate a memory chunk of 64 bytes. in fact, since the manager needs to prepend the chunk's size in front of the chunk, it will actually allocate a 68 bytes block - 4 bytes to store the size, and 64 bytes to return to the user.
Freeing A Chunk
Freeing a chunk is a done by reversing the above process, with a few optimizations. The user gives the memory manager the address of the chunk it had previusly got from the manager. the manager looks at the 4 bytes right before this address for the size of the chunk. It then inserts the chunk back into its place on the linked list, checking if it can merge it with consecutive chunks already found on the list.
The reason for this merging is to avoid memory fragmentation - that is, to avoid turning one large memory chunk into several small ones after a set of allocation and free operations. This potential fragmentation would have caused allocation requests for large chunks, to be unable to use the freed chunks, if the manager hadn't noticed those chunks were actually consecutively placed in memory.
This chunk merging operation also implies that the linked list would better be stored in a sorted manner, so that it will be easy to find the neighbours of a chunk in the list.
In our specific case, if the user asks to free the memory chunk at address 0x2018, the memory manager will look at the 4 bytes before this chunk, seeing that the chunk's size is 68 bytes. Then, a scan of the linked list will show that this chunk belongs after the 1st chunk in the list. Calculating the chunk's end address will show that it is adjucent to the 3rd chunk in the list, and so the manager will merge these 2 chunks into a single chunk, and store this chunk on the linked list. The "next chunk" pointer in the previous list item will of course be updated to point to this merged chunk, and the list resulting will be the same as before the allocation we've shown above.
Note that in actuall cases, the order of memory allocation and free operations is arbitrary, so a freed chunk might be merged with a chunk it wasn't part of initially. It might also merge with two chunks - one on each side.
Algorithm's Efficiency
The usage of linked lists is not considered efficient in general. However, for short lists, this is far more efficient than storing sorted binary trees or similar structures, due to the overhead of the balancing operations performed on these trees. When dealing with memory allocation, lists tend to be short. Initially, the list will contain a single large memory chunk. The first allocation will split this chunk in two, returning the first part to the user, and leaving a single (smaller) chunk on the list. The same goes for the next allocations. When the user begins to free memroy, chunks return to the pool, often causing some fragmentation. however, these same chunks are likely to be used again on the next allocation, reducing the size of the list again. Thus, for common patterns of memory allocation, a linked list would turn out to be quite efficient both in speed, and in use of memory.
More Info On Memory Management Algorithms
The issue of memory management algorithms has been extensively researched over the years, and a lot of information about it can be found on the Web. For an extensive coverage of memory management, look at the following 1995 postscript article, that provides a survey of dynamic storage allocation techniques, from the univerisity of Texas, USA.
Interaction With Unix's Memory Management
We have claimed that the memory manager begins with one chunk on its list of free memory chunks. This chunk has to be taken from some place - this is the role of the operating system's virtual memory manager. As you remember, the memory the process accesses is actually virtual memory. Also, most virtual memory sections are not mapped anywhere. This is done for efficiency - when a process's virtual address space is of a size of several giga-bytes, it will be wasteful to allocate a map to handle this ammount of virtual memory, especially since most processes use a much smaller ammount of memory.
The way the operating system handles this situation is by not allocating
any memory for the process, unless it asks for it. Of course, the prorgam's
binary and shared library sizes can be known in advance, but memory used
to store data is of an arbitrary size. When the process requires more memory
to store data pages, it uses the brk() system call (or
sbrk()) to ask the system to enlarge its allocated address
space. The system than adds entries to the translation map for this process,
and the pages themselves will be then mapped into physical memory when they
are accessed.
Actually, the system call interface is hardly ever used by programmers.
The memory manager of the runtime environment of our programming language
is the one that uses this interface. Thus, whenever it runs out of free
memory chunks on its list, it'll call the brk() system
call to allocate more memory pages, and add the new block to its chunks
list. The ammount of memory it will ask the system to allocate can vary.
Asking for too small sizes will cause many calls to brk(),
which is time consuming. Allocating a too large block will take time for
the system to update the virtual translation map for the process.
Note: there is usually no system call that can be used to return unused memory back to the operating system. when you free memory, the memory manager cannot return this memory back to the operating system. If this freed memory is never used again, it will eventually be paged out to disk - but won't be freed until the process exits. Thus, the ammount of virtual memory used by a process always increases - never decreases. That is, unless...:
Note 2: the above rule has an exeption - shared memory, and
mmap()-ed memory.
Shared memory is counted as part of the memory space used by the process.
Thus, when the process attached to a shared memory chunk, its 'accounted'
virtual address space size is increased by the ammount of space in the shared
memory page. When the process releases a shared memory chunk, the process's
'accounted' virtual address space size is decreased. The same goes for memory
mapped using the mmap() system call.
Specialized Memory Managers
Since general memry managers are... well - general, it makes sense to implement specialized memory managers in cases where the general memory manager causes too much overhead (in CPU usage or of memory used due to fragmentation).
A good example is when a program needs to allocate many (hundreads of thousands, millions) memory chunks all of the same size. This can be done by allocating a large block (whose size is a multiple of the chunk's size) and slicing it into a set of such chunks. This allows both for faster allocation, as well as for less memory usage overhead - since we don't need to keep the size of each chunk when we allocate it - they are all of the same size, after all.
Memory Manager's Interaction With Program's Code
As we have seen, the memory manager keeps its own data structures in the
address space of the process for which it works. This means that if the
process overwrites memory out of bounds (e.g. referencing a location right
before an array it has allocated using the manager) - the data structures of
the memory manager itself may be corrupted. This would explain cases where
a program we write crashes, and the stack shows it was in a middle of a
call to the memery manager's functions (e.g. malloc() or
free() in C, new or delete in C++).
Note that these kinds of bugs are not always easy to track - the memory corruption might have occured in a completely different part of the code, and just looking at the functions on the stack and the code they execute often won't help locating the cause of the problem.
The cause of such a problem could be that our program had overwritten a location that had stored the size of a chunk, and thus when it was freed, the memory manager overtook a larger chunk than had originally been allocated. The manager later allocated parts of this chunk for other purposes, and thus we get the same chunk of memory used by different parts of our code for different purposes. Or a bug in our code might cause the list of free memory chunks to be corrupted, causing one of its pointers to contain an invalid value (e.g. pointing outside the process's address space).
C Runtime Memory Management
The C programming language and its runtime environment serve as the basic language used on Unix systems (if we ignore assembly language for the time being). The runtime environment defines not only how memory is allocated and freed, but also how the process's different pieces of information are layed out in memory. Also, being a langauge that works on a low level, it allows the programmer a lot of control on how memory is used. This means a lot of power - together with responsibility.
Program Memory Segments
A C program is layed out in memory in several seperate segments - code segment, stack segment and data segment. These segments are layed out inside the virtual address space of the process.
The code segment holds the executable code of the program (both coming from the program's binary itself, and any shared libraries it loads). All memory pages of the code segment are marked as read-only, and are shared with any other process using the same program file and/or shared library files.
Note: not all pages are shared in this way, due to 'relocation' issues. These are cases when a part of the program needs to know the address of another part, but the exact address will only be known at run-time. This address might be different for different processes (since they load a different set of shared libraries, for example), and thus cannot be written in a shared page. The dynamic loader (which is responsible for handling such situations, among other things) uses "indirect jumps" - it stores non-shared pages that contain these varied addresses in them, and the program's code is compiled in a way that it will read the addresses from these non-shared pages. If this was not done, complete pages of executeable code will need to be copied, just to change one or two locations holding such addresses. For a clearer and more complete explanation about relocations and how exactly they are handled, please refer to Ulrich Drepper's home page, and in particular to his How to write shared libraries paper.
The stack segment is used to hold the contents of the stack of the process. The stack size is defined at process startup, and cannot be increased while the process runs. For a multi-threaded process, several stacks will exist in the stack segment.
The data segment occupies the space from the stack segment, to the last
address allocated by the process using the brk() system call.
This segment can grow as the process runs and needs more virtual memory.
The Stack And Local Variables
The stack is used by the process to store the chain of functions which are currently in the middle of execution. Each function call causes the process to add an execution frame to the top of the stack. This execution frame would contain the contents of CPU registers as they were before the function call was made, the return address (i.e. where in the code this function was invoked, so we can return there when the function returns), the parameters passed to the function and all local variables of the function. Thus, when we have fucntion 'main' calling function 'a' which calls function 'b', we will have 3 frames on the stack.
The 'allocation' of a stack frame does not require performing real memory allocation - as the space for the stack was reserved during process startup. Instead, this is done by merely marking a part of the stack as containing the frame, and copying data to that frame. Note that local variables are "allocated" by simply advancing the stack pointer beyond their location on the stack, so allocating local variables takes a fixed ammount of time, no matter their size. When a function returns - its stack frame is freed by simply modifying the stack pointer to point below its frame.
The local variables are not initialized - they just contain the values that were accidentally placed in the memory locations these variables occupy. You may consider a situation in which a function was called, its variables used and given some values. Later the function returned, and its frame was released. Then, another function was called. the stack frame of this new function is located in the same place in memory as the frame of the former function, so the new function's local variables will get the values that were left there by the local variables of the former function. This can explain why the values of unintialized local variables are often neither 0, nor look like total garbage.
Since all local variables are stored in consecutive memory on the stack, if we take a pointer to such a variable, and later on write right above or right below this pointer, we'll be modifying the contents of another local variable, or even that of the function's return address, if we're "lucky". For example, consider the following code:
int foo()
{
int numbers[2];
int j;
j = 2;
printf("j - %d\n", j);
numbers[2] = 3;
printf("j - %d\n", j);
}
During execution of this function, we first assign 2 to 'j', and thus the first
print command will show "j - 2". Then, we try to assign a value to 'numbers[2]'.
However, the 'numbers' array only has 2 cells - 0 and 1. Writing into subscript
'2' of this array will cause us to write just beyond the array (which is
fully located on the stack). The variable 'j' just happens to be stored in that
location in memory, and thus, the value '3' will be actually assigned to 'j'.
Our second print command will thus show "j - 3". Note that this assumes that
the variables are stored in memory in the same order as they were declared
inside the function's code. With some compilers, this might not be the case,
and the out-of-range assignment might overwrite a different variable, or
a part of the stack that does not hold variables, leading to other unexpected
results.
Note: local variables (as well as function parameters) might be stored in registers, rather than on the stack. This could be either because we used the 'register' keyword when declaring these variables, or because the compiler's optimization chose to place a variable in a register. Ofcourse, such variables cannot be over-written by stack overflows.
The Essence Of Pointers
So what realy is a pointer? As we are taught in any C programming course, a pointer is an entity that holds the address of another entity - a variable, or some memory address. Normally, we can treat a pointer as a variable pointing to another variable. This is important - since it implies that the pointer, by itself, is a piece of data stored in memory. The only thing differentiating a pointer from a simple number, is how we treat the pointer's value. Thus, a pointer is a memory cell (or a register) which contains a number, that we happen to interpret as an address of another memory cell.
Lets view a small function that uses pointers, and see how its memory is layed out:
void
hello_pointer()
{
char data[8] = "abcdefgh";
char* p = data;
int i;
/* 1 */
for (i=0; i<9; i++, p++) {
*p = 'a';
}
/* 2 */
*p = 'b';
}
Here is the layout of the stack during this function call, at the location
marked with '/* 1 */':
As you can see, 'data' occupies 8 characters (memory cells). 'p' occupies 4 memory cells (assuming a pointer is 4 bytes long - which is true for 32 bit architectures, such as on Intel/AMD 80386 and compatible CPUs). We ignore 'i' for now. As you can see, 'p' contains the address of 'data[0]' (that is, the first byte of the array 'data'), before the loop runs.
While we execute the loop above, we modify the contents of the memory cell
into which 'p' points, to contain the ascii code of 'a', and then advance 'p'
to point to the next byte. This is all nice and dandy up to the 8th iteration,
in which 'i' equals '7', and 'p' contains the address of 'data[7]':
Then things begin to get ugly. On the next iteration, 'p' is advanced again,
and points to 'data[8]'. But wait - the array only contains 8 bytes, which
are indexed 0 through 7. Pointing to 'data[8]' means 'pointing one byte after
the end of the 'data' array. And what do we find there? that's right - the
first byte of 'p' itself. Thus, after the next iteration, we will have the
following contents on the stack:
As you can see, 'p' actually pointed to the location of memory in which its
own contents resides, and thus, writing into '*p', we modified 'p' itself
to contain some random address (0x61, in hex, is the ascii code of 'a').
Read this line again please, to be sure you understood it.
When we will try to execute the command after the loop (*p = 'b'), we will
smear some random location in memory (possibly even unmapped memory), leading
to varius random effects (either smearing another memory cell, or, in case the
address now in 'p' points to an unmapped memory page, causing the program
to crash with a SEGV signal).
Now, what would have happened if the source code of the previous example looked like this:
void
hello_pointer()
{
char data[10] = "abcdefghij";
char* p = data;
int i;
/* 1 */
for (i=0; i<11; i++, p++) {
*p = 'a';
}
/* 2 */
*p = 'b';
}
Lets see the memory layout of the stack in this case:
As you can see, the 'data' array here occupies 10 bytes, and since 'p' has to be aligned to a 4-byte boundary, there are 2 unused bytes between 'data[9]' and 'p' (the contents of these bytes is, ofcourse, undefined, since there is no need to initialize them). This time, our 'off-by-1' error in the loop will cause us to override the contents of an otherwise unused memory cell. Thus, we will not even notice this kind of memory smear, until we sometime in the future modify the size of the 'data' array, or add a 'short' (2 bytes) variable between 'data' and 'p'.
Pointers And Assembly Language
Note: This section will be readable if you know assembly. If not - skip to the heap management section.
As you know, assembly (and ofcourse machine code) uses several addressing modes to access operands of commands. Direct addressing assumes that an operand is found in a register. Indirect addressing assumes that the register contains the address of a memory cell (or memory word) that contains the data. As you can see, indirect addressing treats the register as a pointer. Memory based indirect addressing assumes that the command contains the address of a memory cell, whose contents contains the address of another memory cell, which contains the operand. This kind of addressing can be used to handle pointers - after all, pointers are just variables, and variables are just addresses of memory cells.
To illustrate this point, lets see a small fragment of C source code, and how it translates into assembly code. We will compile this code using 'gcc -S <filename>' to get the assembly code. First, here is the C code:
int
main()
{
char c;
char* p = &c;
*p = 'b';
return;
}
And here is the resulting assembly code (as generated on linux, in the AT&T
assembly format). Just a few notes about this assembly variant:
- In 2 operand commands, the first operand is the 'source', the second is the 'target'.
- Register names are prefixed with '%', like '%eax' and '%ebp'.
- Indirect addressing is denoted by surrounding a register with pharanteses, as in '(%eax)'. To add a displacement (an offset), this is prefixed with a number, as in '4(%eax)'.
- Numeric arguments are prefixed with '$', like '$98' to denote the numeric value '98'.
- Register '%esp' is the stack pointer.
bottom of stack padding %esp
| .____|_____. |
\|/ | | | \|/
v \|/ \|/ \|/ v
v v v
----------------------------------------------- ----
| p | p | p | p | | | | c | | -> towards higher memory
----------------------------------------------- ----
.
/|\
|
-8(%esp)
Now lets see the code itself:
# first come some assembly mambo-jambo - not interesting for demonstrating
# pointers.
.file "pointer.c"
.version "01.01"
gcc2_compiled.:
.text
.align 4
.globl main
.type main,@function
# next comes the 'main' function.
main:
pushl %ebp # the following lines initialize register '%ebp'
movl %esp,%ebp # to point to the top of the stack frame, where
# variables 'c' and 'p' are stored.
subl $8,%esp # then we update '%esp' (the stack pointer) to
# the bottom of the stack.
leal -1(%ebp),%edx # we want 'p' to contain the address of 'c'.
# since 'c' is a single char, we need to take an
# offset of '1' from the top of the stack frame.
# we store this address in register '%edx'.
# leal = Load Effective Address (Long).
movl %edx,-8(%ebp) # then we move that address into the location
# on the stack where 'p' is stored.
movl -8(%ebp),%eax # 'p' is stored on the stack, at location
# '%ebp' minus 8 bytes. we use indirect
# addressing, in order to fetch the contents
# of the memory word ('l') whose address is
# the contents of '%ebp', minus 4.
# so now we have the contents of 'p' (which is
# the address of 'c') in register '%eax'.
movb $98,(%eax) # and then we copy the value '98' (ascii code of
# 'b') into where '%eax' points.
xorl %eax,%eax # that's it. from here, its the code used to
# return '0' from the function.
jmp .L1
.p2align 4,,7
.L1:
leave
ret
# and some more assembly mambo-jambo.
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
If you don't get it all at first glance - stare at the code a while longer.
It will become clear with enough staring ;)
Heap Management With malloc() and free()
When programming in C, we normally use the standard memory allocator that comes
with the C library, to allocate memory from the heap (the data segment). This
memory allocator defines functions such as malloc() and
free(), for memory allocation handling.
Usage Of malloc()
When we want to allocate a chunk of memory from the heap, we use the
malloc() function. This function accepts a numeric parameter
denoting the size of the memory chunk we want, and returns a pointer to
a memory chunk with the following properties:
- The size of the returned chunk is usually enlarged to fit some 'rounded'
size for which the memory manager is optimized. For instance, if we
allocate an array of 10 chars (10 bytes), the manager will most likely
pad the chunk so it will contain a number of complete words, thus
returning a pointer to a 12 bytes chunk (three 4-byte words). We
should NOT rely on this behaviour, and not try to use more
bytes than we asked for.
- The starting address of the returned chunk is aligned based on the size of
the machine's "native" data, or rather to the size of the largest 'native'
C language object (usually a 'long long' or 'double'). This is required,
since the
malloc()function does not know what we wish to do with this memory chunk (it returns a char pointer), so it must returned memory that may be used to store any type of object.
malloc() is assigned to a pointer
variable, and cast to the type of object this pointer points to. since this
is just a general memory chunk, we may later cast it into other types
without a problem - as long as the chunk is long enough to hold an object
of that type. Here is an example:
char* p_c;
int* p_i;
p_c = malloc(8);
strcpy(p_C, "bla");
printf("%s\n", p_c);
p_i = (int*)p_c;
(*p_i) = 4;
printf("%d\n", *p_i);
printf("p_i is '%p', p_c is '%p'\n", p_i, p_c);
We allocated memory for use as a char pointer (a C "string"), and used it
like that. We then cast this memory into an int pointer, assigned a value
to it, and printed its contents as if it was an integer. Finally, with the
last printf() call, we printed out the address that
p_i and p_c contain, to show that both point to the
same memory chunk (note - the use of '%p' in the format of
printf() tells printf() to print the address
that the variable contains, rather than the value stored in that address).
Usage Of free()
When we want to free a memory chunk previously allocated by
malloc(), we use the free function. This function
accepts a char pointer to a previously allocated memory chunk, and frees it -
that is, adds it to the list of free memory chunks, that may be re-allocated.
Several notes about free():
- The size of the chunk was stored by
malloc()previously in its memory map, and that is howfree()knows how many bytes to free. - The freed memory is not being cleared or erased in any manner. This is
why accessing memory that was just freed often does not cause a crash -
any data in it is still the same as before calling
free(). - The
free()function cannot nullify pointers to the given memory chunk that might still exist in our program. After we callfree(), it is up to us (the programmers) not to try and dereference pointers that still point to that memory chunk. Such pointers are known as 'dangling pointers' - they point to memory that was already freed, and thus they should NOT be dereferenced again, unless they are assigned the address of a different (not-freed) memory chunk.
free() only marks the memory chunk
as free - there is no enforcement of this freeing operation. Accessing
memory chunks that were previously freed is the cause of many memory errors
for novices and experienced programmers. A good practice is that always nullify
a pointer that was just freed, as in:
char* p = malloc(10);
strcpy(p, "hello");
printf("%s\n", p);
free(p);
p = NULL;
20 Random Tutorials from the same category :
How to change a file's owner and group in Linux
How and when to use the dd command?
Redirecting standard input and output
Access Windows partitions from Linux
Execute a task 'at' the time you want..
Accessing User Information On A Unix System
Linux file permissions
Parallel Programming - Basic Theory For The Unwary
How to view text files in Linux
What to do if Linux refuses to boot after a power failure?
The humble Linux cheat sheet
Debugging "C" And "C++" Programs Using "gdb"
Asking questions in a discussion forum
What are the SUID, SGID and the Sticky Bits?
The great features of Linux - an alternative view
Scheduling tasks using Cron - Part II
Network programming under Unix systems
About your files on Linux
Basic Graphics Programming With The Xlib Library
Moving around in the Linux file system













