First of all, I’m sorry for such a long question.
I’m working on a simulation modeling task, and I need to translate user-space virtual addresses into kernel-space physical addresses. I used three different methods and got three different results. Could you please advise which method is correct and why they give different results?
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
OS: Linux Fedora release 22 (Twenty Two)
Kernel: 4.4.4-200.fc22.x86_64
I have read many different sources, but I’m confused by a lot of outdated information that applies to kernels before 2.6 or to architectures other than 64-bit Intel.
Method 1.
As I understand it, there is only one way to convert an address (VA->PA) from user space: /proc/self/pagemap. This file provides the PFN (Page Frame Number) of each virtual page, indexed by the virtual page number used as the offset into the file:
uint64_t virtualPageNumber = virtualAddress / systemPageSize;
uint64_t physicalAddressPFN; /* raw 64-bit pagemap entry; the PFN is in bits 0-54 */
lseek(pageMapFile, virtualPageNumber * sizeof(uint64_t), SEEK_SET);
ssize_t bytesRead = read(pageMapFile, &physicalAddressPFN, sizeof(uint64_t));
and physicalAddressPFN + (virtualAddress - virtualPageNumber) will be the physical address itself.
This approach gives me the following example address conversion:
Virtual address:  15920f0
PGD index:        0
PUD index:        0
PMD index:        a
PTE index:        192
offset:           f0
pagemap value:    8180000000138622
physical address: 1386220f0
PGD address:      137090000
PUD address:      137090000
PMD address:      138490000
PTE address:      138622000
The PGD, PUD, PMD, and PTE indices are extracted from the virtual address using the bit fields described here: https://lwn.net/Articles/117749/
“pagemap value” is the 8-byte entry read from /proc/self/pagemap.
I’m also interested in the addresses of the page table levels (each level of the four-level page table). I looked at http://lxr.free-electrons.com/source/arch/x86/mm/dump_pagetables.c#L376, which, as I understand it, is used to produce the kernel page table dump in /sys/kernel/debug/kernel_page_tables:
cat /sys/kernel/debug/kernel_page_tables
Example output from my system:
---[ User Space ]---
0x0000000000000000-0xffff800000000000 16777088T pgd
---[ Kernel Space ]---
0xffff800000000000-0xffff880000000000 8T pgd
---[ Low Kernel Mapping ]---
0xffff880000000000-0xffff880000099000 612K RW GLB NX pte
0xffff880000099000-0xffff88000009a000 4K ro GLB NX pte
0xffff88000009a000-0xffff88000009b000 4K ro GLB x pte
0xffff88000009b000-0xffff880000200000 1428K RW GLB NX pte
etc
I used this approach in the reverse direction (calculating the PGD/PUD/PMD/PTE addresses from the physical address). The results are shown in the “PGD address”, “PUD address”, “PMD address”, and “PTE address” fields above.
For example:
addrPTE = physicalAddress - vaOffset;
addrPMD = addrPTE - (vaIdxPTE * PTE_LEVEL_MULT);
addrPUD = addrPMD - (vaIdxPMD * PMD_LEVEL_MULT);
addrPGD = addrPUD - (vaIdxPUD * PUD_LEVEL_MULT);
where vaIdxPUD is the PUD index taken from the virtual address, and PMD_LEVEL_MULT = PTRS_PER_PTE * PTE_LEVEL_MULT, as defined in http://lxr.free-electrons.com/source/arch/x86/mm/dump_pagetables.c#L101
All of the calculations above are done from user space.
The next question is how to check the correctness of these address manipulations.
Method 2.
I implemented a simple LKM (Linux Kernel Module) that performs the address translation. The LKM interacts with the user-space program through /proc files: I write the virtual address I want to translate to /proc/my_input and read the translated information from /proc/my_output.
I can’t use this method in my actual task; I implemented it only to check the correctness of the address translation.
In the LKM I used:
unsigned long addrPA = virt_to_phys((void*)addrVA);
and this method gave me:
[259084.890129] Given virtual address 0x15920f0
[259084.890131] Physical address provided by virt_to_phys() 0x7800015920f0
This differs from the PA 0x1386220f0 that Method 1 produced above.
Method 3.
I tried walking the page table to find the addresses at all four levels:
struct page *page = NULL;
struct mm_struct *mm = current->mm;
pgd_t *pgd;
pud_t *pud;

pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || pgd_bad(*pgd)) {
    goto out;
}
printk(KERN_NOTICE "Valid pgd %p\n", pgd);
*pgdVar = (unsigned long) pgd;

pud = pud_offset(pgd, addr);
if (pud_none(*pud) || pud_bad(*pud)) {
    goto out;
}
printk(KERN_NOTICE "Valid pud %p\n", pud);
*pudVar = (unsigned long) pud;
etc...
This method gave me the following:
Given virtual address 0x15920f0
[259084.890132] Valid pgd ffff880038ac0000
[259084.890134] Valid pud ffff880147039000
[259084.890134] Valid pmd ffff88018a2f6050
[259084.890135] page frame struct is @ ffffea0004e18880 pte ffff88003876ec90
The PTE address is far from the physical addresses found by the previous methods.
So these three methods give different results:
Method 1: VA: 0x15920f0 -> PA: 0x1386220f0
Method 2: VA: 0x15920f0 -> PA: 0x7800015920f0
Method 3: VA: 0x15920f0 -> PA: 0xffff88003876ec90
What is the right way to get a VA->PA translation from user space and to find the page table addresses?
How can I check the correctness of the method (checking at kernel level is fine)?
Thank you
Sergey
Update 1.
I collect event traces from large HPC MPI applications to simulate address behavior at the hardware level. I need physical addresses to understand how many distinct addresses an application uses at a particular time on a particular MPI rank. I can’t do the VA->PA translation in kernel space, because no HPC cluster administrator will give me root privileges on a real machine (I need Top500-class machines for this investigation).
This is quite a big task, and I didn’t want to bother readers with this extra explanation.
physicalAddressPFN + (virtualAddress - virtualPageNumber): you need to scale by systemPageSize – Concern
virt_to_phys within the kernel module: are you sure it's the user space virtual address that's being converted? – Concern