How to generate a zero-length read on PCIE Bus using x86-64 and Linux?

In the PCI Express Base Specification, section 2.2.5 "First/Last DW Byte Enables Rules", it says that a zero-length read can be used as a flush request. However, most examples in the Linux kernel documentation just use either a 1B or 4B read request:

Bus-Independent Device Accesses

How To Write Linux PCI Drivers

I'm wondering whether the x86-64 architecture has an instruction that causes a zero-length read on PCIe and, if it does, whether some Linux kernel function generates that instruction.

Fritz answered 20/10, 2020 at 22:34 Comment(6)
This looks like an XY problem. You should explain why you would want to send a 0-byte read TLP in the first place. I'm not that knowledgeable on PCIe, but as I understand it, TLPs are up to the PCI controller to generate; the kernel does not just send TLPs and has no control over them (related question on the kernel mailing list). To me, it looks like the controller might decide to send 0-byte read request TLPs if needed, and the OS should not worry about it. - Saury
I would use a 0-byte read immediately after a write to a PCIe device to ensure the write occurred (since writes are posted asynchronously). Using a 0-byte read instead of a 1B or 4B read would reduce bandwidth use on the bus. I agree, it does not seem I have direct control over the TLPs, but I'm wondering if an x86-64 instruction to an MMIO space would somehow translate to a 0-byte read. - Fritz
That's an interesting question. I do not think so, as I suppose those kinds of requests are only intended to be performed by the chip controller. I think you should not worry about having to manually ensure data is written. I could be wrong though, as I don't know much about PCIe; let's see if anybody answers... - Saury
I wonder if clflush / mfence could get the CPU to wait for write completion? Probably not. That works (I think) for waiting for data to commit to NVRAM connected to the memory controllers (on Intel CPUs that support persistent memory, e.g. Skylake-X), but the system agent is a separate thing from a memory controller on the ring bus or mesh. At best you'd probably only wait for the write to be initiated, not completed. Maybe not even that. - Lonnalonnard
In device drivers, the usual practice is to perform a dummy read back of the same size as the write. I have never heard of 0-length reads being used for that, so I will wait to see whether an expert appears. - Superhuman
I/O address space is not cached. - Hazy

The two examples you mentioned involve MMIO accesses or legacy I/O port accesses from the CPU to an I/O device, but the zero-length read implementation note in Section 2.2.5 of the PCIe specification is about accesses issued by an I/O device. The PCIe spec and the Intel/AMD64 x86 manuals are obviously different from each other and they use different terms, so I don't understand how you confused the two. No, there is no such thing as a zero-length read in x86.

The code from the first link is the following:

WRT_REG_WORD(&reg->ictrl, 0);
/*
 * The following read will ensure that the above write
 * has been received by the device before we return from this
 * function.
 */
RD_REG_WORD(&reg->ictrl);

There is a 16-bit MMIO write followed by a 16-bit MMIO read to the same address. The memory type of the target location is most probably UC, which ensures that all UC accesses appear on the system bus in program order. This means that they reach the PCIe root complex (which is integrated on modern processors) in order. An MMIO write is translated by the processor's I/O unit to a posted write PCIe transaction, and the read is translated to a non-posted read PCIe transaction. Both of these transactions would have the same traffic class, with relaxed ordering disabled. According to the transaction ordering rules, such a non-posted read cannot be reordered with any earlier posted write. The overall effect is that when the UC read gets back the result, the preceding UC write must have already completed at the target I/O device.
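The write-then-read-back pattern can be sketched in plain C as follows. This is only a user-space illustration: fake_ictrl is a hypothetical stand-in for a device register, and reg_write16, reg_read16, and reg_write16_flush are made-up helper names. In a real driver the pointer would come from an ioremap()'d UC mapping and the accesses would go through accessors such as writew() and readw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit device register. In a real driver this
 * would be a UC mapping of a BAR, not ordinary memory. */
static volatile uint16_t fake_ictrl = 0xffff;

/* 16-bit MMIO-style store; on PCIe this would become a posted write. */
static inline void reg_write16(volatile uint16_t *reg, uint16_t val)
{
    assert(reg != NULL);
    *reg = val;
}

/* 16-bit MMIO-style load; on PCIe this would become a non-posted read. */
static inline uint16_t reg_read16(volatile uint16_t *reg)
{
    assert(reg != NULL);
    return *reg;
}

/* Write a register, then read the same register back. On real hardware the
 * read completion cannot arrive before the earlier posted write has reached
 * the device, so returning from this function implies the write landed. */
static uint16_t reg_write16_flush(volatile uint16_t *reg, uint16_t val)
{
    reg_write16(reg, val);
    return reg_read16(reg); /* the value itself is usually discarded */
}
```

Calling reg_write16_flush(&fake_ictrl, 0) mirrors the WRT_REG_WORD / RD_REG_WORD pair above. The volatile qualifier only keeps the compiler from dropping or merging the accesses; the ordering guarantee itself comes from the UC memory type and the PCIe ordering rules, which ordinary memory does not model.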

The second link you provided also includes an example of MMIO ordering that works exactly the same way. Issuing a read after a posted write is a commonly used technique to determine when the write has completed. A UC read is not a fully serializing operation in x86. If you don't want any later instructions, including ones that are not UC accesses, to execute until the read completes, you need to add a fully serializing instruction after the read. The Linux kernel itself defines numerous MMIO barriers used in different situations.

The second link also mentions that a legacy I/O write doesn't require a following read because "I/O Port space guarantees write transactions reach the PCI device before the CPU can continue." I/O instructions provide more ordering guarantees than UC accesses, but they are still not fully serializing. These guarantees include waiting for previous instructions to commit before executing an I/O instruction and not allowing later instructions to execute until the I/O instruction completes. These guarantees, combined with the fact that I/O instructions are translated by the I/O controller to PCIe I/O transactions, where an I/O write transaction is a non-posted transaction, ensure that by the time the next instruction executes, the I/O write has completed at the target I/O device.

Zero-length reads can be used by an I/O device to determine that earlier writes have completed at the destination. This is how, for example, an I/O device can ensure that a write has reached the persistence domain on a platform that supports Asynchronous DRAM Refresh (ADR) or that a write has become observable by the device driver.
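For reference, here is a rough sketch of what such a request looks like on the wire, assuming the 3DW (32-bit address) Memory Read Request header layout from the PCIe specification. A zero-length read is encoded as a 1 DW read with all byte enables cleared. The function names and the requester ID, tag, and address arguments are hypothetical, and attribute bits (TC, Attr, TD, EP, AT) are simply left at zero. This models what a device would build in hardware; it is not something an x86 instruction or a kernel API emits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 12-byte header of a 3DW Memory Read Request (sketch only).
 * length_dw is the Length field in DWs; first_be/last_be are the 4-bit
 * First/Last DW Byte Enables fields. */
static void build_mrd32_header(uint8_t hdr[12], uint16_t requester_id,
                               uint8_t tag, uint32_t addr,
                               uint16_t length_dw,
                               uint8_t first_be, uint8_t last_be)
{
    assert(length_dw >= 1 && length_dw <= 1023); /* 0 would encode 1024 DW */
    assert((addr & 0x3u) == 0);                  /* DW-aligned address */

    memset(hdr, 0, 12);
    hdr[0] = 0x00;                                 /* Fmt=000b, Type=00000b: MRd, 3DW */
    hdr[2] = (uint8_t)((length_dw >> 8) & 0x3);    /* Length[9:8] */
    hdr[3] = (uint8_t)(length_dw & 0xff);          /* Length[7:0] */
    hdr[4] = (uint8_t)(requester_id >> 8);         /* Requester ID, high byte */
    hdr[5] = (uint8_t)(requester_id & 0xff);       /* Requester ID, low byte */
    hdr[6] = tag;
    hdr[7] = (uint8_t)(((last_be & 0xf) << 4) | (first_be & 0xf));
    hdr[8]  = (uint8_t)(addr >> 24);               /* Address[31:2], big-endian */
    hdr[9]  = (uint8_t)(addr >> 16);
    hdr[10] = (uint8_t)(addr >> 8);
    hdr[11] = (uint8_t)(addr & 0xfc);
}

/* A "zero-length read": Length = 1 DW with no bytes enabled. */
static void build_zero_length_read(uint8_t hdr[12], uint16_t requester_id,
                                   uint8_t tag, uint32_t addr)
{
    build_mrd32_header(hdr, requester_id, tag, addr, 1, 0x0, 0x0);
}
```

Note that the Length field is still 1 DW, and per the same section of the spec the completion for such a request still carries a 1 DW payload with unspecified contents, so the saving over an ordinary 4-byte read is smaller than the name suggests.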

Folse answered 13/1, 2022 at 4:25 Comment(0)
