The two examples you mentioned involve MMIO accesses or legacy I/O port accesses from the CPU to an I/O device, but the zero-length read implementation note from Section 2.2.5 of the PCIe specification is about accesses issued by an I/O device. The PCIe spec and the Intel/AMD64 x86 manuals are different documents that use different terms, so I don't understand how you confused the two. No, there is no such thing as a zero-length read in x86.
The code from the first link is the following:
WRT_REG_WORD(&reg->ictrl, 0);
/*
 * The following read will ensure that the above write
 * has been received by the device before we return from this
 * function.
 */
RD_REG_WORD(&reg->ictrl);
There is a 16-bit MMIO write followed by a 16-bit MMIO read to the same address. The memory type of the target location is most probably UC, which ensures that all UC accesses appear on the system bus in program order. This means that they reach the PCIe root complex (which is integrated on modern processors) in order. An MMIO write is translated by the processor's I/O unit to a posted write PCIe transaction and the read is translated to a non-posted read PCIe transaction. Both of these transactions would have the same traffic class, with relaxed ordering disabled. According to the transaction ordering rules, such a non-posted read cannot be reordered with any earlier posted write. The overall effect is that when the UC read gets back the result, the preceding UC write must have already completed at the target I/O device.
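As a rough illustration of the write-then-read-back pattern, here is a minimal sketch in C. The helper names (reg_write16, reg_read16, reg_write16_flush) are hypothetical stand-ins for macros like WRT_REG_WORD and RD_REG_WORD, not the driver's actual definitions. The code only guarantees that the compiler emits both accesses in order; the hardware ordering described above still depends on the register being mapped UC and on the PCIe ordering rules.

```c
#include <stdint.h>

/* Hypothetical stand-in for WRT_REG_WORD: a single 16-bit store.
 * The volatile qualifier stops the compiler from dropping or
 * reordering the access relative to other volatile accesses. */
static inline void reg_write16(volatile uint16_t *reg, uint16_t val)
{
    *reg = val;
}

/* Hypothetical stand-in for RD_REG_WORD: a single 16-bit load. */
static inline uint16_t reg_read16(volatile uint16_t *reg)
{
    return *reg;
}

/* Write a register, then read the same register back.
 * On a real UC-mapped device register, the read is a non-posted
 * transaction that cannot pass the earlier posted write, so by the
 * time it returns the write has reached the device. */
static inline uint16_t reg_write16_flush(volatile uint16_t *reg, uint16_t val)
{
    reg_write16(reg, val);
    return reg_read16(reg);
}
```

With an ordinary variable standing in for the device register, reg_write16_flush(&fake_reg, 0) simply returns the value just written. In a real driver the pointer would come from an MMIO mapping of the device's BAR.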
The second link you provided also includes an example of MMIO ordering that works exactly the same way. Issuing a read after a posted write is a commonly used technique to determine when the write has completed. A UC read is not a fully serializing operation in x86. If you don't want any later instructions (not UC accesses) to execute until the read completes, you need to add a fully serializing instruction after the read. The Linux kernel itself defines numerous MMIO barriers used in different situations.
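To make the last point concrete, the following sketch follows a read with CPUID, which the Intel manuals list among the fully serializing instructions. It assumes GCC or Clang inline assembly on x86; the function names serialize_cpu and reg_read16_serialized are made up for this example, and on non-x86 targets the sketch falls back to a compiler-only barrier, which does not serialize the CPU. This is not how the Linux kernel implements its MMIO barriers.

```c
#include <stdint.h>

/* Hypothetical helper: execute a fully serializing instruction.
 * Assumes GCC/Clang extended asm on x86. The "memory" clobber also
 * makes this a compiler barrier. */
static inline void serialize_cpu(void)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = 0, ebx, ecx = 0, edx;
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                     :
                     : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;
#else
    /* Not x86: compiler barrier only, no CPU serialization. */
    __asm__ volatile("" ::: "memory");
#endif
}

/* Read a register, then serialize so that no later instruction
 * starts executing until the read has completed. In a real driver
 * the pointer would refer to a UC-mapped device register. */
static inline uint16_t reg_read16_serialized(volatile uint16_t *reg)
{
    uint16_t val = *reg;
    serialize_cpu();
    return val;
}
```

CPUID is expensive, especially under virtualization where it typically traps to the hypervisor, so a pattern like this only makes sense when later non-UC work really must not begin before the read completes.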
The second link also mentions that a legacy I/O write doesn't require a following read because "I/O Port space guarantees write transactions reach the PCI device before the CPU can continue." I/O instructions provide more ordering guarantees than UC accesses, but they are still not fully serializing. These guarantees include waiting for previous instructions to commit before executing an I/O instruction and not allowing later instructions to execute until the I/O instruction completes. These guarantees, combined with the fact that I/O instructions are translated by the I/O controller to PCIe I/O transactions, where an I/O write transaction is a non-posted transaction, ensure that when the next instruction executes, the I/O write has already completed at the target I/O device.
Zero-length reads can be used by an I/O device to determine that earlier writes have completed at the destination. This is how, for example, an I/O device can ensure that a write has reached the persistence domain on a platform that supports Asynchronous DRAM Refresh (ADR) or that a write has become observable by the device driver.
clflush/mfence could get the CPU to wait for write completion? Probably not. That works (I think) for waiting for data to commit to NVRAM connected to the memory controllers (on Intel CPUs that support persistent memory, e.g. Skylake-X), but the system agent is a separate thing from a memory controller on the ring bus or mesh. At best you'd probably only wait for the write to be initiated, not completed. Maybe not even that. – Lonnalonnard