Enabling write-combining IO access in userspace

I have a PCIe device with a userspace driver. I'm writing commands to the device through a BAR. The commands are latency sensitive and the amount of data is small (~64 bytes), so I don't want to use DMA.

If I remap the physical address of the BAR in the kernel using ioremap_wc and then write 64 bytes to the BAR inside the kernel, I can see that the 64 bytes are written as a single TLP over PCIe. If I allow my userspace program to mmap the region with the MAP_SHARED flag and then write 64 bytes, I see multiple TLPs on the PCIe bus rather than a single transaction.

According to the kernel PAT documentation I should be able to export write-combined pages through to userspace:

Drivers wanting to export some pages to userspace do it by using mmap interface and a combination of

1) pgprot_noncached()

2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()

With PAT support, a new API pgprot_writecombine is being added. So, drivers can continue to use the above sequence, with either pgprot_noncached() or pgprot_writecombine() in step 1, followed by step 2.

Based on this documentation, the relevant kernel code from my mmap handler looks like this:

 vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

 return io_remap_pfn_range(vma,
                           vma->vm_start,
                           info->mem[vma->vm_pgoff].addr >> PAGE_SHIFT,
                           vma->vm_end - vma->vm_start,
                           vma->vm_page_prot);

My PCIe device shows up in lspci with the BARs marked as prefetchable as expected:

    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at d8000000 (64-bit, prefetchable) [size=32M]
    Region 2: Memory at d4000000 (64-bit, prefetchable) [size=64M]

When I call mmap from userspace I see a log message (with the debugpat kernel boot parameter set):

reserve_memtype added [mem 0xd4000000-0xd7ffffff], track write-combining, req write-combining, ret write-combining

I can also see in /sys/kernel/debug/x86/pat_memtype_list that the PAT entry looks correct and that there are no overlapping regions:

write-combining @ 0xd4000000-0xd8000000
uncached-minus  @ 0xd8000000-0xda000000
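When debugging, this check can be scripted. The following is a minimal sketch in C, assuming the entries have exactly the `type @ 0xstart-0xend` layout shown above (other kernel versions may print a different format) and that the end address is exclusive, as the 64M region here suggests. The helper names `parse_memtype_line` and `range_is_wc` are hypothetical and not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line in the form "write-combining @ 0xd4000000-0xd8000000".
 * Assumption: this is the layout of pat_memtype_list on the running kernel.
 * Returns 1 and fills type/start/end on success, 0 for any other line. */
static int parse_memtype_line(const char *line, char *type, size_t type_len,
                              unsigned long long *start,
                              unsigned long long *end)
{
    char buf[32];

    /* %llx accepts an optional "0x" prefix, like strtoull() with base 16. */
    if (sscanf(line, "%31s @ %llx-%llx", buf, start, end) != 3)
        return 0;
    snprintf(type, type_len, "%s", buf);
    return 1;
}

/* Returns 1 if [addr, addr + len) lies entirely inside the entry described
 * by `line` and that entry is tracked as write-combining, 0 otherwise. */
static int range_is_wc(const char *line, unsigned long long addr,
                       unsigned long long len)
{
    char type[32];
    unsigned long long start, end;

    if (!parse_memtype_line(line, type, sizeof type, &start, &end))
        return 0;
    return strcmp(type, "write-combining") == 0 &&
           addr >= start && addr + len <= end;
}
```

Feeding each line of the file to `range_is_wc` together with the physical address and length of the 64-byte command window shows whether the window is covered by a write-combining entry.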

I have also checked that there are no MTRR entries that would conflict with the PAT configuration. As far as I can see, everything is set up correctly for write-combining to occur in userspace. However, when I observe the transactions with a PCIe analyser, the userspace access pattern is completely different from that of the same write performed from the kernel after an ioremap_wc call.

Why is write-combining not working as expected from userspace?

What can I do to debug further?

I'm currently running on a single socket 6-core i7-3930K.

Mcguinness answered 23/4, 2014 at 15:11 Comment(2)
What is the userspace visible address for the range? Maybe it is not properly aligned? – Secunda
@Secunda Good suggestion, but the userspace visible address looks like it's aligned. – Mcguinness

I don't know if this will help, but this is how I got write-combining working on PCIe. Granted, it was in kernel space, but this complies with the Intel documentation. It's worth trying if you're stuck.

Globally defined:

unsigned int __attribute__ ((aligned(0x20))) srcArr[ARR_SIZE];

In your function:

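int *pDestAddr;                    /* must point into the write-combining mapping of the BAR */
unsigned int *pSrcAddr = srcArr;   /* source data, aligned as declared above */
int i;

/* _mm_stream_si32() (SSE2, declared in <emmintrin.h>) issues a non-temporal 32-bit store */
for (i = 0; i < ARR_SIZE; i++) {
    _mm_stream_si32(pDestAddr + i, pSrcAddr[i]);
}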
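For a fixed 64-byte command, the same idea can be written with 128-bit streaming stores. This is only a sketch under assumptions: `stream_copy_64` is a hypothetical helper name, the destination must be 16-byte aligned (a 64-byte aligned slot in the write-combining mapping is assumed), and the CPU must support SSE2. It does not guarantee a single TLP; whether the stores are combined still depends on the memory type of the mapping and on how the CPU drains its write-combining buffers.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128 */
#include <xmmintrin.h>  /* SSE:  _mm_sfence */

/* Copy exactly 64 bytes to dst using four non-temporal 128-bit stores.
 * dst must be 16-byte aligned; src may be unaligned.
 * The trailing sfence orders the streaming stores before later stores. */
static void stream_copy_64(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    int i;

    for (i = 0; i < 4; i++)
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));

    _mm_sfence();
}
```

On ordinary cacheable memory this behaves like a 64-byte copy, which makes it easy to check functionally before pointing it at the BAR.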
Archives answered 23/4, 2014 at 22:52 Comment(1)
Thanks for the answer, but this gives the same pattern of TLPs: in kernel space a single 64-byte TLP, in userspace a stream of individual TLPs. – Mcguinness