source : http://www.makelinux.net/books/lkd2/ch11lev1sec4
kmalloc()
The kmalloc() function's operation is very similar to that of user-space's familiar malloc() routine, with the exception of the addition of a flags parameter. The kmalloc() function is a simple interface for obtaining kernel memory in byte-sized chunks. If you need whole pages, the previously discussed interfaces might be a better choice. For most kernel allocations, however, kmalloc() is the preferred interface.
The function is declared in <linux/slab.h>:
void * kmalloc(size_t size, int flags)
The function returns a pointer to a region of memory that is at least size bytes in length[3]. The region of memory allocated is physically contiguous. On error, it returns NULL. Kernel allocations always succeed, unless there is an insufficient amount of memory available. Thus, you must check for NULL after all calls to kmalloc() and handle the error appropriately.
[3] It may allocate more than you asked, although you have no way of knowing how much more! Because at its heart the kernel allocator is page-based, some allocations may be rounded up to fit within the available memory. The kernel never returns less memory than requested. If the kernel is unable to find at least the requested amount, the allocation fails and the function returns NULL.
Let's look at an example. Assume you need to dynamically allocate enough room for a fictional dog structure:
struct dog *ptr;
ptr = kmalloc(sizeof(struct dog), GFP_KERNEL);
if (!ptr)
/* handle error ... */
If the kmalloc() call succeeds, ptr now points to a block of memory that is at least the requested size. The GFP_KERNEL flag specifies the behavior of the memory allocator while trying to obtain the memory to return to the caller of kmalloc().
gfp_mask Flags
You've seen various examples of allocator flags in both the low-level page allocation functions and kmalloc(). Now it's time to discuss these flags in depth.
The flags are broken up into three categories: action modifiers, zone modifiers, and types. Action modifiers specify how the kernel is supposed to allocate the requested memory. In certain situations, only certain methods can be employed to allocate memory. For example, interrupt handlers must instruct the kernel not to sleep (because interrupt handlers cannot reschedule) in the course of allocating memory. Zone modifiers specify from where to allocate memory. As you saw earlier in this chapter, the kernel divides physical memory into multiple zones, each of which serves a different purpose. Zone modifiers specify from which of these zones to allocate. Type flags specify a combination of action and zone modifiers as needed by a certain type of memory allocation. Type flags simplify specifying numerous modifiers; instead, you generally specify just one type flag. The GFP_KERNEL is a type flag, which is used for code in process context inside the kernel. Let's look at the flags.
Action Modifiers
All the flags, the action modifiers included, are declared in <linux/gfp.h>. The file <linux/slab.h> includes this header, however, so you often need not include it directly. In reality, you will usually use only the type modifiers, which are discussed later. Nonetheless, it is good to have an understanding of these individual flags. Table 11.3 is a list of the action modifiers.
Flag | Description |
---|---|
__GFP_WAIT | The allocator can sleep. |
__GFP_HIGH | The allocator can access emergency pools. |
__GFP_IO | The allocator can start disk I/O. |
__GFP_FS | The allocator can start filesystem I/O. |
__GFP_COLD | The allocator should use cache cold pages. |
__GFP_NOWARN | The allocator will not print failure warnings. |
__GFP_REPEAT | The allocator will repeat the allocation if it fails, but the allocation can potentially fail. |
__GFP_NOFAIL | The allocator will indefinitely repeat the allocation. The allocation cannot fail. |
__GFP_NORETRY | The allocator will never retry if the allocation fails. |
__GFP_NO_GROW | Used internally by the slab layer. |
__GFP_COMP | Add compound page metadata. Used internally by the hugetlb code. |
These allocations can be specified together. For example,
ptr = kmalloc(size, __GFP_WAIT | __GFP_IO | __GFP_FS);
instructs the page allocator (ultimately alloc_pages()) that the allocation can block, perform I/O, and perform filesystem operations, if needed. This allows the kernel great freedom in how it can find the free memory to satisfy the allocation.
Most allocations specify these modifiers, but do so indirectly by way of the type flags we will discuss shortly. Don't worryyou won't have to figure out which of these weird flags to use every time you allocate memory!
Zone Modifiers
Zone modifiers specify from which memory zone the allocation should originate. Normally, allocations can be fulfilled from any zone. The kernel prefers ZONE_NORMAL, however, to ensure that the other zones have free pages when they are needed.
There are only two zone modifiers because there are only two zones other than ZONE_NORMAL (which is where, by default, allocations originate). Table 11.4 is a listing of the zone modifiers.
Flag | Description |
---|---|
__GFP_DMA | Allocate only from ZONE_DMA |
__GFP_HIGHMEM | Allocate from ZONE_HIGHMEM or ZONE_NORMAL |
Specifying one of these two flags modifies the zone from which the kernel attempts to satisfy the allocation. The __GFP_DMA flag forces the kernel to satisfy the request from ZONE_DMA. This flag says, with an odd accent, I absolutely must have memory into which I can perform DMA. Conversely, the __GFP_HIGHMEM flag instructs the allocator to satisfy the request from either ZONE_NORMAL or (preferentially) ZONE_HIGHMEM. This flag says, I can use high memory, so I can be a doll and hand you back some of that, but normal memory works, too. If neither flag is specified, the kernel fulfills the allocation from either ZONE_DMA or ZONE_NORMAL, with a strong preference to satisfy the allocation from ZONE_NORMAL. No zone flag essentially says, I don't care so long as it behaves normally.
You cannot specify __GFP_HIGHMEM to either __get_free_pages() or kmalloc(). Because these both return a logical address, and not a page structure, it is possible that these functions would allocate memory that is not currently mapped in the kernel's virtual address space and, thus, does not have a logical address. Only alloc_pages() can allocate high memory. The majority of your allocations, however, will not specify a zone modifier because ZONE_NORMAL is sufficient.
Type Flags
The type flags specify the required action and zone modifiers to fulfill a particular type of transaction. Therefore, kernel code tends to use the correct type flag and not specify the myriad of other flags it might need. This is both simpler and less error prone. Table 11.5 is a list of the type flags and Table 11.6 shows which modifiers are associated with each type flag.
Flag | Description |
---|---|
GFP_ATOMIC | |
GFP_NOIO | |
GFP_NOFS | |
GFP_KERNEL | |
GFP_USER | |
GFP_HIGHUSER | |
GFP_DMA |
Flag | Modifier Flags |
---|---|
GFP_ATOMIC | __GFP_HIGH |
GFP_NOIO | __GFP_WAIT |
GFP_NOFS | (__GFP_WAIT | __GFP_IO) |
GFP_KERNEL | (__GFP_WAIT | __GFP_IO | __GFP_FS) |
GFP_USER | (__GFP_WAIT | __GFP_IO | __GFP_FS) |
GFP_HIGHUSER | (__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HIGHMEM) |
GFP_DMA | __GFP_DMA |
Let's look at the frequently used flags and when and why you might need them. The vast majority of allocations in the kernel use the GFP_KERNEL flag. The resulting allocation is a normal priority allocation that might sleep. Because the call can block, this flag can be used only from process context that can safely reschedule (that is, no locks are held and so on). Because this flag does not make any stipulations as to how the kernel may obtain the requested memory, the memory allocation has a high probability of succeeding.
On the far other end of the spectrum is the GFP_ATOMIC flag. Because this flag specifies a memory allocation that cannot sleep, the allocation is very restrictive in the memory it can obtain for the caller. If no sufficiently sized contiguous chunk of memory is available, the kernel is not very likely to free memory because it cannot put the caller to sleep. Conversely, the GFP_KERNEL allocation can put the caller to sleep to swap inactive pages to disk, flush dirty pages to disk, and so on. Because GFP_ATOMIC is unable to perform any of these actions, it has less of a chance of succeeding (at least when memory is low) compared to GFP_KERNEL allocations. Nonetheless, the GFP_ATOMIC flag is the only option when the current code is unable to sleep, such as with interrupt handlers, softirqs, and tasklets.
In between these two flags are GFP_NOIO and GFP_NOFS. Allocations initiated with these flags might block, but they refrain from performing certain other operations. A GFP_NOIO allocation does not initiate any disk I/O whatsoever to fulfill the request. On the other hand, GFP_NOFS might initiate disk I/O, but does not initiate filesystem I/O. Why might you need these flags? They are needed for certain low-level block I/O or filesystem code, respectively. Imagine if a common path in the filesystem code allocated memory without the GFP_NOFS flag. The allocation could result in more filesystem operations, which would then beget other allocations and, thus, more filesystem operations! This could continue indefinitely. Code such as this that invokes the allocator must ensure that the allocator also does not execute it, or else the allocation can create a deadlock. Not surprisingly, the kernel uses these two flags only in a handful of places.
The GFP_DMA flag is used to specify that the allocator must satisfy the request from ZONE_DMA. This flag is used by device drivers, which need DMA-able memory for their devices. Normally, you combine this flag with the GFP_ATOMIC or GFP_KERNEL flag.
In the vast majority of the code that you write you will use either GFP_KERNEL or GFP_ATOMIC. Table 11.7 is a list of the common situations and the flags to use. Regardless of the allocation type, you must check for and handle failures.
Situation | Solution |
---|---|
Process context, can sleep | Use GFP_KERNEL |
Process context, cannot sleep | Use GFP_ATOMIC, or perform your allocations with GFP_KERNEL at an earlier or later point when you can sleep |
Interrupt handler | Use GFP_ATOMIC |
Softirq | Use GFP_ATOMIC |
Tasklet | Use GFP_ATOMIC |
Need DMA-able memory, can sleep | Use (GFP_DMA | GFP_KERNEL) |
Need DMA-able memory, cannot sleep |
kfree()
void kfree(const void *ptr)
The kfree() method frees a block of memory previously allocated with kmalloc(). Calling this function on memory not previously allocated with kmalloc(), or on memory which has already been freed, results in very bad things, such as freeing memory belonging to another part of the kernel. Just as in user-space, be careful to balance your allocations with your deallocations to prevent memory leaks and other bugs. Note, calling kfree(NULL) is explicitly checked for and safe.
Look at an example of allocating memory in an interrupt handler. In this example, an interrupt handler wants to allocate a buffer to hold incoming data. The preprocessor macro BUF_SIZE is the size in bytes of this desired buffer, which is presumably larger than just a couple of bytes.
char *buf;
buf = kmalloc(BUF_SIZE, GFP_ATOMIC);
if (!buf)
/* error allocting memory ! */
kfree(buf);