I am trying to profile code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing RDTSCP vs CPUID+RDTSC here and here.
In the above-mentioned white paper, the CPUID+RDTSC method is termed unreliable, and this is also backed up with statistics.
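For context, here is a minimal sketch (GCC inline assembly; variable names are mine) of the measurement sequence I understand the white paper to recommend. The "unreliable" variant it compares against replaces the closing RDTSCP;CPUID pair with another CPUID;RDTSC:

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t cycles_high, cycles_low, cycles_high1, cycles_low1;
    uint64_t start, end;

    /* Start timestamp: CPUID serializes the pipeline, then RDTSC reads
       the TSC. CPUID clobbers eax/ebx/ecx/edx, hence the clobber list. */
    asm volatile ("cpuid\n\t"
                  "rdtsc\n\t"
                  "mov %%edx, %0\n\t"
                  "mov %%eax, %1\n\t"
                  : "=r" (cycles_high), "=r" (cycles_low)
                  :: "%rax", "%rbx", "%rcx", "%rdx");

    /* ... code under measurement ... */

    /* End timestamp: RDTSCP waits for preceding instructions to retire
       before reading the TSC; the trailing CPUID keeps later instructions
       from being reordered above the read. */
    asm volatile ("rdtscp\n\t"
                  "mov %%edx, %0\n\t"
                  "mov %%eax, %1\n\t"
                  "cpuid\n\t"
                  : "=r" (cycles_high1), "=r" (cycles_low1)
                  :: "%rax", "%rbx", "%rcx", "%rdx");

    /* Combine the 32-bit halves into 64-bit counts and report the delta. */
    start = ((uint64_t)cycles_high << 32) | cycles_low;
    end   = ((uint64_t)cycles_high1 << 32) | cycles_low1;
    printf("elapsed cycles: %llu\n", (unsigned long long)(end - start));
    return 0;
}
```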
What might be the reason for CPUID+RDTSC being unreliable?
Also, the graphs in Figure 1 (minimum value behavior) and Figure 2 (variance behavior) in the same white paper show a "square wave" pattern. What explains such a pattern?