My understanding is that on Intel CPUs in general, demand loads to consecutive physical addresses train the L2 hardware stream prefetcher (the "streamer"), which can then prefetch quite far ahead of the demand stream, but only up to the 4 KiB page boundary.
I want to trigger this mechanism without polluting L1. My idea is to issue software prefetches with an L2 hint (e.g. PREFETCHT1) to consecutive physical addresses instead of demand loads. Does anyone know whether these will trigger the same mechanism?
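
To make the idea concrete, here is a minimal sketch of the access pattern I have in mind. It only shows what I mean by "L2 software prefetches to consecutive addresses"; it does not demonstrate whether the streamer actually trains on them, which is the open question. The names `PREFETCH_L2` and `prefetch_l2_sequential` are hypothetical, the 64-byte line size is an assumption, and the non-x86 fallback is there only so the snippet compiles elsewhere. Within a single 4 KiB page, consecutive virtual addresses are also consecutive physical addresses, so I would call this on one page at a time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <xmmintrin.h>
/* _MM_HINT_T1 emits PREFETCHT1 (the L2-level hint). */
#define PREFETCH_L2(p) _mm_prefetch((const char *)(p), _MM_HINT_T1)
#else
/* Fallback for non-x86 builds (GCC/Clang builtin, moderate locality). */
#define PREFETCH_L2(p) __builtin_prefetch((p), 0, 2)
#endif

enum { CACHE_LINE = 64 }; /* assumed cache line size in bytes */

/*
 * Issue one L2-hint software prefetch per cache line of [buf, buf + len),
 * in ascending address order. Returns the number of prefetches issued.
 * No demand loads are performed, so nothing is explicitly pulled into L1.
 */
size_t prefetch_l2_sequential(const void *buf, size_t len)
{
    assert(buf != NULL);
    const char *p = (const char *)buf;
    size_t issued = 0;

    for (size_t off = 0; off < len; off += CACHE_LINE) {
        PREFETCH_L2(p + off);
        issued++;
    }
    return issued;
}
```

For example, calling `prefetch_l2_sequential(page, 4096)` on a page-aligned buffer would issue 64 prefetches, one per line of the page. What I would like to know is whether, after the first few of these, the L2 streamer starts running ahead on its own the way it does for demand loads.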