How to get a struct page from any address in the Linux kernel
S

5

29

I have existing code that takes a list of struct page * and builds a descriptor table to share memory with a device. The upper layer of that code currently expects a buffer allocated with vmalloc or from user space, and uses vmalloc_to_page to obtain the corresponding struct page *.

Now the upper layer needs to cope with all kinds of memory, not just memory obtained through vmalloc. This could be a buffer obtained with kmalloc, a pointer inside the stack of a kernel thread, or other cases that I'm not aware of. The only guarantee I have is that the caller of this upper layer must ensure that the memory buffer in question is mapped in kernel space at that point (i.e. it is valid to access buffer[i] for all 0<=i<size at this point). How do I obtain a struct page* corresponding to an arbitrary pointer?

Putting it in pseudo-code, I have this:

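void lower_layer(struct page *pg);

void upper_layer(void *buffer, size_t size)
{
    unsigned long start = (unsigned long)buffer & PAGE_MASK;
    unsigned long end = (unsigned long)buffer + size;
    unsigned long addr;

    /* Visit every page that overlaps [buffer, buffer + size). */
    for (addr = start; addr < end; addr += PAGE_SIZE) {
        struct page *pg = vmalloc_to_page((void *)addr);
        lower_layer(pg);
    }
}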

and I now need to change upper_layer to cope with any valid buffer (without changing lower_layer).
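The loop bound in this kind of page iteration is easy to get off by one: a range that ends exactly on a page boundary does not touch the following page. Here is a minimal user-space sketch of the arithmetic, assuming 4 KiB pages. The `SKETCH_*` macros and `pages_spanned` are hypothetical names used only for illustration; they are not kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. Kernel code would use PAGE_SHIFT and PAGE_MASK. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uintptr_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Number of pages overlapped by the byte range [addr, addr + size). */
size_t pages_spanned(uintptr_t addr, size_t size)
{
    uintptr_t first, last;

    if (size == 0)
        return 0;
    first = addr & SKETCH_PAGE_MASK;              /* page holding the first byte */
    last = (addr + size - 1) & SKETCH_PAGE_MASK;  /* page holding the last byte */
    return (size_t)((last - first) >> SKETCH_PAGE_SHIFT) + 1;
}
```

For example, a page-aligned buffer of exactly one page spans one page, while a 32-byte buffer that starts 16 bytes before a page boundary spans two.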

I've found virt_to_page, which Linux Device Drivers indicates operates on “a logical address, [not] memory from vmalloc or high memory”. Furthermore, is_vmalloc_addr tests whether an address comes from vmalloc, and virt_addr_valid tests if an address is a valid virtual address (fodder for virt_to_page; this includes kmalloc(GFP_KERNEL) and kernel stacks). What about other cases: global buffers, high memory (it'll come one day, though I can ignore it for now), possibly other kinds that I'm not aware of? So I could reformulate my question as:

  1. What are all the kinds of memory zones in the kernel?
  2. How do I tell them apart?
  3. How do I obtain page mapping information for each of them?

If it matters, the code is running on ARM (with an MMU), and the kernel version is at least 2.6.26.

Squishy answered 12/5, 2011 at 17:42 Comment(2)
Will the target of buffer need to have any alignment or size requirements?Amide
@Karmastan: No, there's no alignment constraint. The lower layer will map whole pages anyway. I can start upper_layer with (the clean equivalent of) buffer &= ~(PAGE_SIZE-1).Degrade
C
15

I guess what you want is a page table walk, something like (warning, not actual code, locking missing etc):

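struct mm_struct *mm = current->mm;
pgd_t *pgd = pgd_offset(mm, address);       /* top-level entry covering address */
pmd_t *pmd = pmd_offset(pgd, address);      /* middle-level entry */
pte_t pte = *pte_offset_map(pmd, address);  /* leaf entry; real code must pair this with pte_unmap() */
struct page *page = pte_page(pte);          /* struct page backing that entry */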

But you should be very, very careful with this. The kmalloc address you got might very well not be page-aligned, for example. This sounds like a very dangerous API to me.
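To make the walk less abstract, here is a small user-space sketch of how an address is split into table indices. It assumes the classic 32-bit x86 layout without PAE (10 + 10 + 12 bits), which is not the ARM layout from the question. `struct walk_index`, `split_address` and the `SKETCH_*` macros are hypothetical names, not kernel APIs.

```c
#include <stdint.h>

/* Assumption: 32-bit x86 without PAE, two levels of 1024 entries, 4 KiB pages. */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_PGDIR_SHIFT  22
#define SKETCH_PTRS_PER_PTE 1024u

struct walk_index {
    uint32_t pgd_index;   /* which top-level entry */
    uint32_t pte_index;   /* which entry inside the second-level table */
    uint32_t page_offset; /* byte offset inside the page */
};

/* Split a virtual address into the indices a two-level walk would use. */
struct walk_index split_address(uint32_t address)
{
    struct walk_index idx;

    idx.pgd_index = address >> SKETCH_PGDIR_SHIFT;
    idx.pte_index = (address >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
    idx.page_offset = address & ((1u << SKETCH_PAGE_SHIFT) - 1);
    return idx;
}
```

In this model the low 12 bits only select a byte within the page, so two addresses inside the same page yield the same `pgd_index` and `pte_index`.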

Cammiecammy answered 23/5, 2011 at 15:42 Comment(9)
Hmm, maybe (off to read up and test now). What are the constraints on address? (At the end of the day, the buffer needs to be shared with a device, meaning that all the pages it spans will be shared. The code is working now for the apparently harder case of vmalloc. This isn't an external API, the upper/lower layer in my question is just our internal design.)Degrade
Bounty awarded because this has helped me understand a little better what vmalloc_to_page is doing. However in my tests I've only managed to trigger an oops: “Unable to handle kernel paging request at virtual address 340009f4” when dereferencing ptep (returned by pte_offset_map), for an address on the stack of the calling function or returned by kmalloc.Degrade
@Gilles: perhaps because the kernel is using large pages (for instance 2MB pages) for itself? The way you do a page table walk would be different in this case (and perhaps not give you a struct page - if one even exists). vmalloc_to_page does not care probably because it knows it is using normal 4K pages. To be sure, take a look at the page table attributes on each page table level - if it is a large page, one of the levels must have an attribute saying so (else the hardware itself would have no way of knowing).Indicant
@CesarB: No, we have the usual 4kB/1MB levels. Hmm, two levels only, does this change anything? (vmalloc_to_page is arch-independent, so I guess not)Degrade
@Gilles: for less than 4 levels, the kernel "folds" them, so the unused levels appear to only have a single entry, and the compiler optimizes out the extra calls.Indicant
I guess it is because stack addresses and kmalloc addresses are not page aligned.Cammiecammy
With modern kernels (> 2.6.38) you must take into account that there might be transparent huge-pages lwn.net/Articles/423584, so you'll need to walk through them...Niles
@Gilles Were you able to solve this? I also get a crash when I attempt to translate a (4K-aligned) address on the stack.Laniary
@IgorR. No, I never got that code to run with stack buffers, and now the whole driver's been rewritten from scratch with a different architecture.Degrade
D
7

Mapping Addresses to a struct page

There is a requirement for Linux to have a fast method of mapping virtual addresses to physical addresses and for mapping struct pages to their physical address. Linux achieves this by knowing where, in both virtual and physical memory, the global mem_map array is because the global array has pointers to all struct pages representing physical memory in the system. All architectures achieve this with very similar mechanisms, but, for illustration purposes, we will only examine the x86 carefully.

Mapping Physical to Virtual Kernel Addresses

Any virtual address in the kernel's direct mapping can be translated to the physical address by simply subtracting PAGE_OFFSET, which is essentially what the function virt_to_phys() with the macro __pa() does:

/* from <asm-i386/page.h> */
#define __pa(x)        ((unsigned long)(x)-PAGE_OFFSET)

/* from <asm-i386/io.h> */
static inline unsigned long virt_to_phys(volatile void * address)
{
        return __pa(address);
}

Obviously, the reverse operation involves simply adding PAGE_OFFSET, which is carried out by the function phys_to_virt() with the macro __va(). Next we see how this helps the mapping of struct pages to physical addresses.
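As a rough illustration of that arithmetic, here is a user-space sketch assuming 32-bit x86 with PAGE_OFFSET at 3 GiB and 4 KiB pages. The `sketch_*` functions and `SKETCH_*` macros are hypothetical stand-ins for `__pa()`, `__va()` and the page frame number calculation. They only apply to addresses in the direct mapping.

```c
#include <stdint.h>

/* Assumptions: PAGE_OFFSET at 3 GiB and 4 KiB pages (32-bit x86 defaults). */
#define SKETCH_PAGE_OFFSET 0xC0000000u
#define SKETCH_PAGE_SHIFT  12

/* Stand-in for __pa(): direct-mapped virtual address to physical address. */
uint32_t sketch_pa(uint32_t vaddr)
{
    return vaddr - SKETCH_PAGE_OFFSET;
}

/* Stand-in for __va(): physical address to direct-mapped virtual address. */
uint32_t sketch_va(uint32_t paddr)
{
    return paddr + SKETCH_PAGE_OFFSET;
}

/* Page frame number: with a flat mem_map this is the index of the struct page. */
uint32_t sketch_pfn(uint32_t vaddr)
{
    return sketch_pa(vaddr) >> SKETCH_PAGE_SHIFT;
}
```

Under these assumptions, virtual address 0xC0100000 corresponds to physical address 0x00100000, which is page frame 256.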

There is one exception where virt_to_phys() cannot be used to convert virtual addresses to physical ones. Specifically, on the PPC and ARM architectures, virt_to_phys() cannot be used to convert addresses that have been returned by the function consistent_alloc(). consistent_alloc() is used on PPC and ARM architectures to return non-cached memory for use with DMA.

What are all the kinds of memory zones in the kernel? <---see here

Darryl answered 23/5, 2011 at 16:21 Comment(2)
I've read that. What I'm asking is what's missing from LDD3 ch.15 or from my incomplete understanding of it: what is a virtual address (not the address of a global buffer, experimentally)? How do I get to the page data (struct page, not the physical address) from a virtual (or any other) address?Degrade
@Gilles: maybe this article helps to explain what a virtual address is: virtual memoryDarryl
W
3

For user-space allocated memory, you want to use get_user_pages, which will give you the list of pages associated with the malloc'd memory, and also increment their reference counter (you'll need to call page_cache_release on each page once done with them.)

For vmalloc'd pages, vmalloc_to_page is your friend, and I don't think you need to do anything.

Wirework answered 27/8, 2014 at 17:25 Comment(1)
page_cache_release is not available anymore, use put_page instead.Calendre
T
2

For 64 bit architectures, the answer of gby should be adapted to:

 pgd_t *pgd;
 pud_t *pud;
 pmd_t *pmd;
 pte_t *pte;
 struct page *page = NULL;
 void *kernel_address;

 /* Walk each level of the page table down to the PTE for address. */
 pgd = pgd_offset(mm, address);
 pud = pud_offset(pgd, address);
 pmd = pmd_offset(pud, address);
 pte = pte_offset_map(pmd, address);
 page = pte_page(*pte);
 pte_unmap(pte);  /* release the mapping taken by pte_offset_map() */

 // mapping in kernel memory:
 kernel_address = kmap(page);

 // work with kernel_address....

 kunmap(page);
Tupiguarani answered 4/6, 2015 at 13:3 Comment(0)
I
1

You could try virt_to_page. I am not sure it is what you want, but at least it is somewhere to start looking.
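One way to combine this suggestion with the predicates mentioned in the question is to classify the address first and refuse anything that is neither a vmalloc address nor a valid linear-mapped address. The sketch below is a user-space model only: the address ranges are made up, and `model_is_vmalloc_addr` and `model_virt_addr_valid` are hypothetical stand-ins for the kernel's `is_vmalloc_addr()` and `virt_addr_valid()`. In kernel code, the two recognized cases would presumably go on to call `vmalloc_to_page()` and `virt_to_page()` respectively.

```c
#include <stdint.h>

/* Made-up address ranges for the model; real layouts differ per architecture. */
#define MODEL_LINEAR_START  0xC0000000u
#define MODEL_LINEAR_END    0xE0000000u
#define MODEL_VMALLOC_START 0xE0800000u
#define MODEL_VMALLOC_END   0xFF000000u

enum addr_kind { ADDR_LINEAR, ADDR_VMALLOC, ADDR_UNKNOWN };

/* Hypothetical stand-in for the kernel's is_vmalloc_addr(). */
int model_is_vmalloc_addr(uint32_t addr)
{
    return addr >= MODEL_VMALLOC_START && addr < MODEL_VMALLOC_END;
}

/* Hypothetical stand-in for the kernel's virt_addr_valid(). */
int model_virt_addr_valid(uint32_t addr)
{
    return addr >= MODEL_LINEAR_START && addr < MODEL_LINEAR_END;
}

/* Decide which translation routine would apply to addr. */
enum addr_kind classify(uint32_t addr)
{
    if (model_is_vmalloc_addr(addr))
        return ADDR_VMALLOC;   /* in-kernel: vmalloc_to_page() */
    if (model_virt_addr_valid(addr))
        return ADDR_LINEAR;    /* in-kernel: virt_to_page() */
    return ADDR_UNKNOWN;       /* refuse rather than guess */
}
```

Returning an explicit "unknown" kind keeps the caller from handing an unsupported address to either translation routine.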

Indicant answered 16/5, 2011 at 9:51 Comment(1)
According to LDD, virt_to_page only works for logical addresses, not for vmalloc buffers or high memory. So maybe it's part of the solution, but I don't know enough to be confident that I'm writing sufficiently robust code.Degrade
