Main memory bandwidth measurement

I want to measure main memory bandwidth. While looking for a methodology, I found two common approaches:

  1. Many people use the 'bcopy' function to copy bytes from a source to a destination, time the copy, and report the result as the bandwidth.
  2. Others allocate an array and walk through it (with some stride), which basically gives the time to read the entire array.

I tried (1) with a data size of 1 GB and measured a bandwidth of 700 MB/sec (I used rdtsc to count the number of cycles elapsed during the copy). I suspect this is not correct, because my RAM configuration is as follows:

  1. Speed: 1333 MHz
  2. Bus width: 32bit

As per Wikipedia, the theoretical bandwidth is calculated as follows:

clock speed * bus width * # bits per clock cycle per line (2 for DDR3 RAM)
1333 MHz * 32 * 2 ~= 8 GB/sec

So my measurement is completely different from the estimated bandwidth. Any idea what I am doing wrong?

=========

My other question: bcopy involves both a read and a write. Does that mean I should divide the calculated bandwidth by two to get the read-only or write-only bandwidth? I would also like to confirm whether bandwidth is just the inverse of latency. Please suggest any other ways of measuring the bandwidth.

Fluorspar answered 12/11, 2011 at 21:38 Comment(2)
You seem to have forgotten the importance of caching on current machines. And how do you define your memory bandwidth? From a programmer's point of view, it is essentially what memcpy achieves. Also, you probably have other processes running on your machine (so extra context switches, etc.). I don't understand what exactly you want to measure! - Whiffle
Basile's comment goes to the heart of the matter: modern consumer PCs are ferociously complicated beasts, and the performance that you see depends intimately on what you are doing. There are multiple levels of cache; branch-predicting, speculatively executing pipelines in the CPU; interrupts; other processes; DMA peripherals wanting to use the (multiple!) buses; etc. This question would have made a lot more sense on my Apple ][+. - Kreiner

I can't comment on the effectiveness of bcopy, but the most straightforward approach is the second method you stated (with a stride of 1). Additionally, you are confusing bits with bytes in your memory bandwidth equation: 32 bits = 4 bytes. Modern computers use 64-bit-wide memory buses, so your effective transfer rate (assuming DDR3) is:

1333 MHz * 64 bits / (8 bits/byte) ~= 10666 MB/s (modules of this speed are sold as PC3-10600, sometimes labeled PC3-10666)

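The 1333 MHz figure is the effective transfer rate (1333 MT/s), so it already has the 2 transfers per clock factored in; the actual I/O bus clock is about 667 MHz.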

Check out the wiki page for more info: http://en.wikipedia.org/wiki/DDR3_SDRAM

Regarding your results, try again with the array access: malloc 1 GB and traverse the entire thing. Sum the elements of the array and print the result so your compiler doesn't treat the loop as dead code.

Something like this:

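#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
  size_t size = (size_t)1024 * 1024 * 1024;  // 1 GiB
  long sum = 0;                              // must start at zero
  char *array = malloc(size);
  if (!array) return 1;
  memset(array, 1, size);    // touch every page so it is really backed by RAM
  clock_t start = clock();   // start timer here
  for (size_t i = 0; i < size; i++)
    sum += array[i];
  double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;  // end timer
  printf("time taken: %f \tsum is %ld\n", elapsed, sum);
  free(array);
}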
Cora answered 28/8, 2012 at 20:20 Comment(0)
