How to convert a virtual address to a physical address from user space? Three different methods gave different results on Linux kernel 4.x

First of all, sorry for such a long question.

I am working on a simulation modeling task and need to translate user-space virtual addresses into physical addresses. I used three different methods and got three different results. Could you please advise which method is correct and why they give different results?

CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

OS: Linux Fedora release 22 (Twenty Two)

Kernel: 4.4.4-200.fc22.x86_64

I have read many different sources, but I am confused by a lot of old information that relates to pre-2.6 kernels or to architectures other than 64-bit Intel.

Method 1.

As I understand, there is only one way to convert an address (VA->PA) from user space: /proc/self/pagemap. This file provides the PFN (Page Frame Number) for each virtual page, with the virtual page number used as the index into the file.

virtualPageNumber = virtualAddress / systemPageSize;
lseek(pageMapFile, virtualPageNumber * sizeof(uint64_t), SEEK_SET);   /* one 64-bit entry per page */
ssize_t bytesRead = read(pageMapFile, &pagemapValue, sizeof(uint64_t));
physicalAddressPFN = pagemapValue & ((1ULL << 55) - 1);               /* bits 0-54 hold the PFN */

and physicalAddressPFN * systemPageSize + (virtualAddress % systemPageSize) will be the physical address itself.
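The pagemap lookup described above can be sketched as follows. This is a minimal illustration, not the exact code used for the measurements: the function names are hypothetical, and the bit layout (bit 63 = present, bits 0-54 = PFN) is taken from the kernel's pagemap documentation. As far as I know, since Linux 4.2 the PFN field reads as zero for a process without CAP_SYS_ADMIN, so the sketch treats a zero PFN as "not available".

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: PFN when present */

/* Combine one pagemap entry with the in-page offset of vaddr.
 * Returns 0 if the page is not present or the PFN field reads as zero. */
uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr, uint64_t page_size)
{
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;

    /* The offset mask below only works for a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
        return 0;
    return pfn * page_size + (vaddr & (page_size - 1));
}

/* Read the 64-bit pagemap entry covering vaddr in the calling process.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_pagemap_entry(uint64_t vaddr, uint64_t *entry)
{
    long page_size = sysconf(_SC_PAGESIZE);
    off_t pos;
    int fd, ok;

    if (page_size <= 0)
        return -1;
    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    pos = (off_t)(vaddr / (uint64_t)page_size * sizeof(*entry));
    ok = lseek(fd, pos, SEEK_SET) != (off_t)-1 &&
         read(fd, entry, sizeof(*entry)) == (ssize_t)sizeof(*entry);
    close(fd);
    return ok ? 0 : -1;
}
```

With the pagemap value from the example row (0x8180000000138622) and VA 0x15920f0, pagemap_entry_to_phys() with a 4096-byte page yields 0x1386220f0.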

This approach gave me the following example address conversion:

Virtual Address : PGD  : PUD  : PMD  : PTE  :offset:  pagemap value :physical address:  PGD address   :  PUD address   :  PMD address   : PTE address
         15920f0:     0:     0:     a:   192:    f0:8180000000138622:       1386220f0:       137090000:       137090000:       138490000:   138622000

PGD:PUD:PMD:PTE are extracted from the virtual address using bit fields, as described here https://lwn.net/Articles/117749/
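The bit-field extraction can be sketched like this, assuming x86-64 four-level paging with 4 KiB pages (9 index bits per level, 12 offset bits). The struct, macro and function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shifts for x86-64 four-level paging with 4 KiB pages. */
#define VA_PTE_SHIFT 12
#define VA_PMD_SHIFT 21
#define VA_PUD_SHIFT 30
#define VA_PGD_SHIFT 39
#define VA_INDEX_MASK 0x1ffULL   /* 9 bits: 512 entries per table */
#define VA_OFFSET_MASK 0xfffULL  /* 12 bits: offset inside a 4 KiB page */

struct va_parts {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Split a virtual address into its four table indices and page offset. */
struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.pgd    = (unsigned)((va >> VA_PGD_SHIFT) & VA_INDEX_MASK);
    p.pud    = (unsigned)((va >> VA_PUD_SHIFT) & VA_INDEX_MASK);
    p.pmd    = (unsigned)((va >> VA_PMD_SHIFT) & VA_INDEX_MASK);
    p.pte    = (unsigned)((va >> VA_PTE_SHIFT) & VA_INDEX_MASK);
    p.offset = (unsigned)(va & VA_OFFSET_MASK);
    return p;
}

/* Reproduces the PGD:PUD:PMD:PTE:offset columns of the example row. */
void check_split_va(void)
{
    struct va_parts p = split_va(0x15920f0ULL);

    assert(p.pgd == 0 && p.pud == 0);
    assert(p.pmd == 0xa && p.pte == 0x192 && p.offset == 0xf0);
}
```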

“pagemap value” is the 8 bytes extracted from /proc/self/pagemap

I am also interested in the addresses of the page tables themselves (each level of the four-level page table). I looked at http://lxr.free-electrons.com/source/arch/x86/mm/dump_pagetables.c#L376 which, as I understand, is used to produce the kernel page table dump in /sys/kernel/debug/kernel_page_tables

cat /sys/kernel/debug/kernel_page_tables
Example output from my system:
---[ User Space ]---
0x0000000000000000-0xffff800000000000    16777088T                               pgd
---[ Kernel Space ]---
0xffff800000000000-0xffff880000000000           8T                               pgd
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000099000         612K     RW                 GLB NX pte
0xffff880000099000-0xffff88000009a000           4K     ro                 GLB NX pte
0xffff88000009a000-0xffff88000009b000           4K     ro                 GLB x  pte
0xffff88000009b000-0xffff880000200000        1428K     RW                 GLB NX pte
etc

I used this approach in the reverse direction (calculating the PGD/PUD/PMD/PTE addresses from the physical address). The results are shown in the fields “PGD address:PUD address:PMD address:PTE address” above.

For example:

addrPTE = physicalAddress - vaOffset;
addrPMD = addrPTE - (vaIdxPTE * PTE_LEVEL_MULT);
addrPUD = addrPMD - (vaIdxPMD * PMD_LEVEL_MULT);
addrPGD = addrPUD - (vaIdxPUD * PUD_LEVEL_MULT);

where vaIdxPUD is the index of the PUD extracted from the virtual address and PMD_LEVEL_MULT = PTRS_PER_PTE * PTE_LEVEL_MULT, as described in http://lxr.free-electrons.com/source/arch/x86/mm/dump_pagetables.c#L101

All calculations above are done from user space.

The next question is how to check the correctness of these address manipulations.
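For reference, the subtraction chain above can be written out as a self-contained sketch. It only reproduces the arithmetic behind the "PGD address ... PTE address" columns; whether those values correspond to real page-table locations is exactly what is being asked. The multipliers assume 4 KiB pages and 512 entries per table, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multipliers in the style of dump_pagetables.c, written out for
 * 4 KiB pages and 512 entries per table (assumed values). */
#define PTE_LEVEL_MULT 0x1000ULL
#define PMD_LEVEL_MULT (512 * PTE_LEVEL_MULT)
#define PUD_LEVEL_MULT (512 * PMD_LEVEL_MULT)

struct level_addrs {
    uint64_t pte, pmd, pud, pgd;
};

/* Apply the reverse calculation to a physical address and VA indices. */
struct level_addrs reverse_levels(uint64_t pa, unsigned offset,
                                  unsigned idx_pte, unsigned idx_pmd,
                                  unsigned idx_pud)
{
    struct level_addrs a;

    a.pte = pa - offset;
    a.pmd = a.pte - idx_pte * PTE_LEVEL_MULT;
    a.pud = a.pmd - idx_pmd * PMD_LEVEL_MULT;
    a.pgd = a.pud - idx_pud * PUD_LEVEL_MULT;
    return a;
}

/* Reproduces the four address columns of the example row. */
void check_reverse_levels(void)
{
    struct level_addrs a = reverse_levels(0x1386220f0ULL, 0xf0, 0x192, 0xa, 0);

    assert(a.pte == 0x138622000ULL);
    assert(a.pmd == 0x138490000ULL);
    assert(a.pud == 0x137090000ULL);
    assert(a.pgd == 0x137090000ULL);
}
```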

Method 2.

I implemented a simple LKM (Linux Kernel Module) that performs the address translation. This LKM interacts with the user-space program through /proc files. I write the virtual address that I would like to translate to /proc/my_input and read the translated information from /proc/my_output.

I can’t use this method in my actual task; I implemented it only to check the correctness of the address translation.

In the LKM I used

unsigned long addrPA = virt_to_phys((void*)addrVA);

and this method gave me

[259084.890129]
                Given virtual address 0x15920f0
[259084.890131] Physical address provided by virt_to_phys() 0x7800015920f0

This differs from method 1, which gave PA 0x1386220f0 as shown above.
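One possible reading of this result, offered as a sketch rather than a confirmed diagnosis: virt_to_phys() is documented as valid only for directly mapped kernel addresses, and for those it amounts to subtracting the base of the direct mapping. The user-space model below assumes that base is 0xffff880000000000 (where "Low Kernel Mapping" starts in the kernel_page_tables output above) and that the direct mapping spans 64 TB; both constants and all names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base and size of the kernel direct mapping on this system. */
#define ASSUMED_PAGE_OFFSET 0xffff880000000000ULL
#define ASSUMED_DIRECT_MAP_SIZE (64ULL << 40)   /* 64 TB */

/* Nonzero if addr lies inside the assumed direct-mapping window. */
int in_direct_map(uint64_t addr)
{
    return addr >= ASSUMED_PAGE_OFFSET &&
           addr - ASSUMED_PAGE_OFFSET < ASSUMED_DIRECT_MAP_SIZE;
}

/* Model of the direct-map conversion: plain subtraction, wrapping
 * modulo 2^64 when addr is below the base (for example a user address). */
uint64_t direct_map_virt_to_phys(uint64_t addr)
{
    return addr - ASSUMED_PAGE_OFFSET;
}

/* Feeding the user-space VA from the example into the model. */
void check_direct_map_model(void)
{
    assert(!in_direct_map(0x15920f0ULL));
    assert(direct_map_virt_to_phys(0x15920f0ULL) == 0x7800015920f0ULL);
}
```

Under these assumptions the model gives 0x7800015920f0 for the user-space address 0x15920f0, which is the value printed by the module. That suggests the output is the result of applying a kernel-address formula to a user-space address, not a translation through the process page tables.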

Method 3.

I tried to walk the page table and find the addresses of all four page-table levels.

    struct page *page = NULL;
    struct mm_struct *mm = current->mm;

    pgd = pgd_offset(mm, addr);
    if (pgd_none(*pgd) || pgd_bad(*pgd)) {
        goto out;
    }
    printk(KERN_NOTICE "Valid pgd %p", pgd);
    *pgdVar = (unsigned long) pgd;

    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud)) {
        goto out;
    }
    printk(KERN_NOTICE "Valid pud %p", pud);
    *pudVar = (unsigned long) pud;
etc...

This method gave me the following:

                Given virtual address 0x15920f0
[259084.890132] Valid pgd ffff880038ac0000
[259084.890134] Valid pud ffff880147039000
[259084.890134] Valid pmd ffff88018a2f6050
[259084.890135] page frame struct is @ ffffea0004e18880 pte ffff88003876ec90
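The pointers printed above are kernel virtual addresses of the table entries, so they can be cross-checked against the indices from the virtual address. The sketch below assumes 8-byte entries in 4 KiB tables, and, for the struct page pointer, a vmemmap base of 0xffffea0000000000 with sizeof(struct page) == 64. The last two are assumptions about this kernel build, not values confirmed by the output, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 0x1000ULL                        /* one 4 KiB table page */
#define ENTRY_SIZE 8ULL                             /* 64-bit table entry   */
#define ASSUMED_VMEMMAP_BASE 0xffffea0000000000ULL  /* assumption           */
#define ASSUMED_STRUCT_PAGE_SIZE 64ULL              /* assumption           */

/* Index of an entry inside its table, derived from the entry's address. */
unsigned entry_index(uint64_t entry_addr)
{
    return (unsigned)((entry_addr & (TABLE_SIZE - 1)) / ENTRY_SIZE);
}

/* PFN implied by a struct page pointer under the assumptions above. */
uint64_t struct_page_to_pfn(uint64_t page_addr)
{
    return (page_addr - ASSUMED_VMEMMAP_BASE) / ASSUMED_STRUCT_PAGE_SIZE;
}

/* Cross-check of the printed pointers against VA 0x15920f0. */
void check_walk_output(void)
{
    assert(entry_index(0xffff880038ac0000ULL) == 0);      /* PGD index */
    assert(entry_index(0xffff880147039000ULL) == 0);      /* PUD index */
    assert(entry_index(0xffff88018a2f6050ULL) == 0xa);    /* PMD index */
    assert(entry_index(0xffff88003876ec90ULL) == 0x192);  /* PTE index */
    assert(struct_page_to_pfn(0xffffea0004e18880ULL) == 0x138622ULL);
}
```

Under these assumptions the entry indices agree with the PGD:PUD:PMD:PTE fields of 0x15920f0, and the struct page pointer corresponds to PFN 0x138622, the same PFN that /proc/self/pagemap reported in method 1.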

The PTE address is far from the physical addresses found by the previous methods.

So, these three methods provide different results.

Method 1: VA: 0x15920f0 -> PA: 0x1386220f0

Method 2: VA: 0x15920f0 -> PA: 0x7800015920f0

Method 3: VA: 0x15920f0 -> PA: 0xffff88003876ec90

What is the right method to get the VA->PA translation from user space and to find the page table addresses?

How can I check the correctness of the method (the check can be done at kernel level)?

Thank you

Sergey

Update 1.

I collect event traces from large HPC MPI applications to simulate how different addresses behave at the HW level. I need to know the physical addresses to understand how many different addresses are used at a particular time by the application at a particular MPI rank. I can't do the VA->PA translation in kernel space because no HPC cluster administrator will give me root privileges on a real machine (I need Top500-class machines for this investigation).

This is quite a big task and I didn't want to bother readers with this extra explanation.
