Massive time loss in simple for loop
Asked Answered
S

1

5

I have an extremely simple program to measure how much time a function is taking.

#include <iostream>
#include <vector>
#include <chrono>
struct Foo
{
    void addSample(uint64_t s)
    {
    }
};

void test(const std::vector<uint64_t>& samples)
{
    uint32_t onlyCallTime = 0;
    uint32_t loopOnlyTime = 0;
    Foo stats;
    std::chrono::high_resolution_clock::time_point callStart,callEnd;
    auto start = callStart = callEnd = std::chrono::high_resolution_clock::now();
    for(auto &s : samples)
    {
        callStart = std::chrono::high_resolution_clock::now();
        loopOnlyTime += std::chrono::duration_cast<std::chrono::microseconds>(callStart-callEnd).count();
        stats.addSample(s);
        callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += std::chrono::duration_cast<std::chrono::microseconds>(callEnd-callStart).count();
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::cout << "overall duration: " << std::chrono::duration_cast<std::chrono::microseconds>(end-start).count() << std::endl;
    std::cout << "only call duration: " << onlyCallTime << std::endl;
    std::cout << "only loop duration: " << loopOnlyTime << std::endl;
}

int main()
{
    std::vector<uint64_t> dataSetDecreasing;
    for(uint32_t i = 0; i < 1000000; ++i)
        dataSetDecreasing.push_back(1000000-i);
    test(dataSetDecreasing);
}

The output is really confusing. Here are some examples:

overall duration: 56047
only call duration: 195
only loop duration: 285
overall duration: 40984
only call duration: 177
only loop duration: 243
overall duration: 47328
only call duration: 187
only loop duration: 177

How I see it is that callEnd-callStart captures the call to addSample + the duration_cast.
callStart-callEnd captures everything else, so the loop initialization, iteration, condition and the second duration_cast. What am I missing? Where are the other ~40000 microseconds going?

Compiled with g++ -Wall -Wextra -std=c++17 -O3
g++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
My OS is Fedora release 32 (Thirty Two)

Speight answered 8/9, 2020 at 17:46 Comment(5)
How are you compiling this (compiler version and options), and what OS and hardware? Could lazy dynamic linking account for the extra overhead in the outer timed region? Page faults if you're touching new memory? No, you're touching memory before the timed region. Or maybe some kind of sample granularity issue, I didn't look at the code very carefully. What's the overall time for the whole process, like with perf stat to report it about as precisely as the OS measures it.Brahui
You forgot to measure the duration of the chrono calls.Reign
@PeterCordes Added the compiler, options and OS.Speight
Niote: Timing addSample is pointless. At O3 compiler should be optimizing it out. You will have to add some observable effect for the function to make it to the executable.Poetry
@UlrickEckhardt either way the individual times should add up to the total. The lines measured are (almost) identicalVend
I
12

1 000 000 iterations took roughly 50 000 microseconds. That is well under 1μs per iteration on average and std::chrono::duration_cast<std::chrono::microseconds> will round any time less than 1μs to 0. Meaning your loop only counts the iterations that took way longer than average for some reason (scheduling, page faults, caches probably).

Since each measurement has a basic error independent of the duration measured (plus other errors), taking many small measurements and adding them up will be so much less precise than measuring the whole duration once.

Immaterialize answered 8/9, 2020 at 17:59 Comment(2)
Good catch. Running it with nanoseconds instead adds up almost exactlySpeight
@JeJo: 1μs isn't code, it's a real interval of time. Code-formatting is not necessary or even appropriate. We're not talking about those symbols appearing as part of a program. Physical quantities in general like 1 cm aren't code and shouldn't be code-formatted. Using C++ number formatting for the iteration count is optional but not a real improvement. (I commented in such detail for the benefit of your future edits, and to explain my rollback, not to call you out.)Brahui

© 2022 - 2024 — McMap. All rights reserved.