Accuracy of clock() function in C

I have some code that tries to determine the execution time of a code block.

#include <time.h>
#include <stdio.h>

int main()
{
   clock_t start_t, end_t, total_t;
   int i;

   start_t = clock(); //clock start
   printf("Starting of the program, start_t = %ld\n", start_t);

   printf("Going to scan a big loop, start_t = %ld\n", start_t);
   for(i=0; i< 10000000; i++)  //trying to determine execution time of this block
   {
   }
   end_t = clock(); //clock stopped
   printf("End of the big loop, end_t = %ld\n", end_t);

   total_t = end_t - start_t;
   printf("Total time taken by CPU: %ld\n", total_t);

   return(0);
}

The output of the code snippet on my machine is

Starting of the program, start_t = 8965
Going to scan a big loop, start_t = 8965
End of the big loop, end_t = 27259
Total time taken by CPU: 18294

So if my CPU was running at 21 MHz and assuming that this was the only thing getting executed, each machine cycle would be approximately equal to 47 nanoseconds, so (18294 * 47) = 859818 nanoseconds.

Would this be the execution time for the for loop in my code? Am I making some incorrect assumptions here?

Delius answered 8/10, 2016 at 18:23 Comment(8)
To get the time in seconds you should divide the number, e.g. total_t in your case, with CLOCKS_PER_SEC. Note that you need to cast total_t into a floating point value for it to work.Indented
Also a small nitpick on your naming scheme: symbols ending with the suffix _t are usually used for type aliases (as created with typedef), for example size_t or time_t and even clock_t.Indented
@JoachimPileborg I reviewed the documentation for the clock() function and CLOCKS_PER_SEC would give the time accurate up to 1/100th of a second, and I am looking for a resolution of up to 10 microseconds, hence I used the mentioned approach. Also I want this to work on different platforms and architectures, so I thought that just calculating the difference and then multiplying it by the clock speed would be a better option since CLOCKS_PER_SEC would change with architectures.Delius
The reason CLOCKS_PER_SEC is different on different platforms is that the "ticks" are platform dependent. It depends not only on hardware, but also on the operating system. The resolution and precision are something you have to find out from your operating system documentation. The only way to reliably and portably get the number of seconds is to divide the (floating point) difference by CLOCKS_PER_SEC. The numbers themselves are otherwise pretty meaningless.Indented
@JoachimPileborg That was very useful. Is there any way I can increase the resolution to the order of 10 microseconds? Maybe use an alternate set of functions with better accuracy?Delius
You can't really change the clock function or its internal workings. But depending on the operating system there might be higher-resolution timers available. And if you're on an embedded system with only a minimal operating system, then the hardware might have timers you could use instead.Indented
Do you really mean to include the printf times?Eugenides
The question is good but the example is hopeless. An empty loop can be optimised to nothing, and printf() could do a lot of work or essentially pass everything to a system thread, so timing it doesn't tell you much. And you even use _t for an identifier. If you want to time instructions, you have to code in assembly and count the instructions, and you need about a tenth of a second (rule of thumb) for an accurate measurement.Brasier

The unit of time used by the clock function is arbitrary. On most platforms, it is unrelated to the processor speed. It's more commonly related to the frequency of an external timer interrupt — which may be configured in software — or to a historical value that's been kept for compatibility through years of processor evolution. You need to use the macro CLOCKS_PER_SEC to convert to real time.

printf("Total time taken by CPU: %fs\n", (double)total_t / CLOCKS_PER_SEC);

The C standard library was designed to be implementable on a wide range of hardware, including processors that don't have an internal timer and rely on an external peripheral to tell the time. Many platforms have more precise ways to measure wall clock time than the time function, and more precise ways to measure CPU consumption than clock. For example, on POSIX systems (e.g. Linux and other Unix-like systems), you can use getrusage, which has microsecond precision.

#include <stdio.h>
#include <sys/time.h>       /* struct timeval */
#include <sys/resource.h>   /* getrusage (POSIX) */

struct timeval start, end;
struct rusage usage;

getrusage(RUSAGE_SELF, &usage);
start = usage.ru_utime;   /* user CPU time consumed so far */
…
getrusage(RUSAGE_SELF, &usage);
end = usage.ru_utime;
printf("Total time taken by CPU: %fs\n", (double)(end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6);

Where available, clock_gettime with CLOCK_THREAD_CPUTIME_ID or CLOCK_PROCESS_CPUTIME_ID may give better precision: it reports values with nanosecond resolution.
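
A minimal sketch of the process-CPU-time variant, assuming the platform defines CLOCK_PROCESS_CPUTIME_ID (on older glibc you may also need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
    /* ... code to measure ... */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);

    printf("Total time taken by CPU: %fs\n", (double)(end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9);
    return 0;
}

CLOCK_THREAD_CPUTIME_ID is used the same way, but counts only the calling thread's CPU time.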

Note the difference between precision and accuracy: precision is the unit in which the values are reported; accuracy is how close the reported values are to the real values. Unless you are working on a real-time system, there are no hard guarantees as to how long a piece of code takes, including the invocation of the measurement functions themselves.

Some processors have cycle clocks that count processor cycles rather than wall clock time, but this gets very system-specific.
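
As a non-portable illustration, assuming an x86 machine and GCC or Clang, the time-stamp counter can be read with the __rdtsc() intrinsic from <x86intrin.h>; it counts TSC ticks rather than seconds, and converting to time requires knowing the counter's rate:

#include <stdio.h>
#include <x86intrin.h>   /* __rdtsc(); GCC/Clang on x86 only */

int main(void)
{
    unsigned long long start = __rdtsc();
    /* ... code to measure ... */
    unsigned long long end = __rdtsc();

    printf("Elapsed: %llu TSC ticks\n", end - start);
    return 0;
}

Note that on many modern x86 CPUs the TSC ticks at a constant reference rate rather than at the current core frequency, so it behaves more like a high-resolution wall clock than a true cycle counter.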

Whenever making benchmarks, beware that what you are measuring is the execution of this particular executable on this particular CPU in these particular circumstances, and the results may or may not generalize to other situations. For example, the empty loop in your question will be optimized away by most compilers unless you turn optimizations off. Measuring the speed of unoptimized code is usually pointless. Even if you add real work in the loop, beware of toy benchmarks: they often don't have the same performance characteristics as real-world code. On modern high-end CPUs such as those found in PCs and smartphones, benchmarks of CPU-intensive code are often very sensitive to cache effects, and the results can depend on what else is running on the system, on the exact CPU model (due to different cache sizes and layouts), on the address at which the code happens to be loaded, etc.

Loss answered 8/10, 2016 at 20:6 Comment(2)
@Giles This is exactly what I needed. It has a resolution of up to 1 us compared to the clock function, which had a resolution of 100 ms. But do you know if this code is portable? I need this to run on an ARM M0 system. Is there a way I could make this code portable?Delius
@Delius If you need something beyond clock then it won't be as portable; you will create a dependency on either the operating system or the CPU or both. Check what your OS provides. If you're running on bare metal and want close to 1µs accuracy, you'll need a cycle-accurate counter; check what debugging functionality is present on your CPU (I think it's an optional feature). If you don't need that much accuracy then you can use the SysTick timer, which is optional but widespread.Sacramentarian
