CPU huffman compression faster after first execution?
Asked Answered
T

1

2

I've recently constructed a CPU implementation of Huffman encoding in C++. I've also constructed a GPU version in CUDA in order to compare times, but I've come across a problem when testing the CPU's times:

When stress testing by compressing large files, for instance a 97mb text file with almost every letter in the alphabet and various other ascii characters, my CPU implementation will take approximately 8.3 seconds the first time it executes. After that, the time drops significantly to 1.7 seconds. NOTE: I'm only timing the CPU's counting of the frequency, not the encoding of the string and writing to a file.

Any ideas how this could be? I'm closing all file pointers and shouldn't be caching anything as far as I know.

Let me know if any source code is needed, thanks.

Toplofty answered 30/4, 2012 at 4:48 Comment(0)
U
5

After the first run the file content is cached by the system (and is shared by all processes), so the next run you are actually reading the file from memory.

Umberto answered 30/4, 2012 at 4:59 Comment(5)
So should I use the timings when the file is first run, or when it's cached by the system and read from memory? If the first, how would I ensure that I'm doing the run and it's not being read from memory?Toplofty
@superXero: I have a program on my system rougly like 'try{while(true) new char[1000];} catch(...) {return 0;}' which will make the system flush all the buffers it possibly can. Normally people just use the second timing though. It's easier and more reliable.Lilt
I guess what you really want is the time used by your algorithm without the time used by IO operations. So if I were you I will read the whole file into memory, record the start time, run the algorithm with the file content in the memory, and then record the end time.Umberto
Thanks, will use the second timing, where I'm reading the file from memory.Toplofty
It may not only be memory cache. CPU caches might be at work here. It is better to look for a way to flush all caches ... Or restart computer each time. Or make a slave -server computer to run algorithm and restart automatically and wait noxt job.Fete

© 2022 - 2024 — McMap. All rights reserved.