I have a few questions about the STREAM benchmark (http://www.cs.virginia.edu/stream/ref.html#runrules).
- Below is a comment from stream.c. What is the rationale for requiring each array to be at least 4 times the size of the cache?
* (a) Each array must be at least 4 times the size of the
* available cache memory. I don't worry about the difference
* between 10^6 and 2^20, so in practice the minimum array size
* is about 3.8 times the cache size.
- I originally assumed that STREAM measures peak memory bandwidth. However, I later found that when I add extra arrays and array accesses, I get larger bandwidth numbers. So it looks to me like STREAM is not guaranteed to saturate memory bandwidth. My question, then, is: what does STREAM really measure, and how should the numbers it reports be used?
For example, I added two extra arrays and made sure they are accessed together with the original a/b/c arrays, and I modified the byte accounting accordingly. With these two extra arrays, the reported bandwidth goes up by ~11.5%.
> diff stream.c modified_stream.c
181c181,183
< c[STREAM_ARRAY_SIZE+OFFSET];
---
> c[STREAM_ARRAY_SIZE+OFFSET],
> e[STREAM_ARRAY_SIZE+OFFSET],
> d[STREAM_ARRAY_SIZE+OFFSET];
192,193c194,195
< 3 * sizeof(STREAM_TYPE) * STREAM_ARRAY_SIZE,
< 3 * sizeof(STREAM_TYPE) * STREAM_ARRAY_SIZE
---
> 5 * sizeof(STREAM_TYPE) * STREAM_ARRAY_SIZE,
> 5 * sizeof(STREAM_TYPE) * STREAM_ARRAY_SIZE
270a273,274
> d[j] = 3.0;
> e[j] = 3.0;
335c339
< c[j] = a[j]+b[j];
---
> c[j] = a[j]+b[j]+d[j]+e[j];
345c349
< a[j] = b[j]+scalar*c[j];
---
> a[j] = b[j]+scalar*c[j] + d[j]+e[j];
CFLAGS = -O2 -fopenmp -D_OPENMP -DSTREAM_ARRAY_SIZE=50000000
My last level cache is around 35MB.
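For reference, here is a minimal sketch of the sizing arithmetic behind this setup. It assumes STREAM_TYPE is the default double (8 bytes) and that "35MB" means 35 * 10^6 bytes. The helper names are hypothetical and are not part of stream.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of stream.c. */

/* Size in bytes of one STREAM array with n elements. */
static unsigned long long array_bytes(unsigned long long n, size_t elem_size)
{
    assert(elem_size > 0);
    return n * (unsigned long long)elem_size;
}

/* Returns 1 if one array is at least 4x the cache size (run rule (a)). */
static int meets_run_rule(unsigned long long arr_bytes,
                          unsigned long long cache_bytes)
{
    return arr_bytes >= 4ULL * cache_bytes;
}

/* Bytes credited to one pass of a kernel that touches n_arrays arrays,
 * following the "N * sizeof(STREAM_TYPE) * STREAM_ARRAY_SIZE" pattern
 * visible in the diff above. */
static unsigned long long counted_bytes(unsigned n_arrays,
                                        unsigned long long n,
                                        size_t elem_size)
{
    return (unsigned long long)n_arrays * array_bytes(n, elem_size);
}
```

With STREAM_ARRAY_SIZE=50000000 and 8-byte elements, array_bytes gives 400,000,000 bytes per array, roughly 11.4 times a 35MB cache, so the 4x rule is satisfied. counted_bytes gives 1,200,000,000 bytes per pass for the original three-array kernels and 2,000,000,000 for my five-array versions.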
Any comments?
Thanks!
This is on a Skylake server running Linux.