std::chrono now() induces very significant drift with Ubuntu 24.04 under WSL2
Asked Answered
M

2

5

I've got an algorithm I've used for years to emulate a real-time clock in a non-real-time environment. It's always worked exceptionally well until used on Ubuntu 24.04 in WSL2 in Windows 11. The logic predicts in system clock time when each next time-step should occur and repeatedly calls chrono's now() in a loop until that point in time is reached, then continues with execution of the next frame. When set up in this environment, now()'s return value will occasionally jump a significant amount forward in time (e.g., ~25 seconds) between successive calls. Simply observing the application run reveals that its not a matter of now() taking ~25 seconds to return a value as the whole app will complete in ~5 seconds when it should have taken ~30. The implication is that calling now() has actually altered the clock's time. To verify this, I wrapped my sample code in a bash script where I could monitor and resync WSL's clock ... now() is indeed altering the clock's time.

The sample code has been built in this environment with both g++ and clang++, C++17 and C++20. Same effect observed in all four cases.

The sample code has been observed to work properly with no alteration to clock's time on a machine running Ubuntu-24.04 (older h/w) and under Windows11/WSL2/Ubuntu-22.04 (identical h/w)

I've found countless posts discussing a Linux clock drift problem when running under WSL. Most claim to have 'fixed' the problem by resyncing with the h/w clock. This does not 'fix' the drift problem, it merely resets it. There is clearly a bug somewhere in the system, could this be related? Is now() in some WSL configurations affecting the time and is my extreme case of hammering now() in a loop simply pronouncing the problem. How do I fix this?

reproducible example code:

#include <iostream>
#include <chrono>
#include <ctime>
#include <iomanip>
#include <sstream>

void print(std::chrono::duration<long double> elapsed_seconds, 
    std::chrono::_V2::system_clock::time_point, 
    std::chrono::duration<long double> error);

int main() 
{
    std::cout << "commanded_elapsed_time, corresponding_measured_time_points, error" << std::endl; 
    using namespace std::chrono_literals;
    auto start_tp = std::chrono::system_clock::now();
    auto dt = 1.0s;
    auto elapsed = 0.0s;
    auto next_tp = start_tp + elapsed;
    std::chrono::_V2::system_clock::time_point now;
    print(elapsed, start_tp, start_tp - next_tp);
    for (int t = 1; t < 30; t++)
    {
        elapsed = dt * t;
        next_tp = start_tp + elapsed;
        while((now = std::chrono::system_clock::now()) < next_tp) {}
        print(elapsed, now, now - next_tp);
    }
    return 0;
}

void print(
    std::chrono::duration<long double> elapsed_seconds, 
    std::chrono::_V2::system_clock::time_point tp, 
    std::chrono::duration<long double> error)
{
    std::time_t currentTime = std::chrono::system_clock::to_time_t(tp);
    std::tm* localTime = std::localtime(&currentTime);
    std::stringstream ss;
    ss << std::fixed << std::setprecision(3) << std::setfill('0') << std::setw(6) << elapsed_seconds.count() << ", ";
    ss << std::put_time(localTime, "%H:%M:%S");
    auto nowInMicroseconds = std::chrono::time_point_cast<std::chrono::microseconds>(tp);
    auto epoch = nowInMicroseconds.time_since_epoch();
    auto microseconds = std::chrono::duration_cast<std::chrono::microseconds>(epoch).count() % 1000000;
    ss << '.' << std::setfill('0') << std::setw(9) << microseconds * 1000 << ", ";
    ss << std::fixed << std::setprecision(9) << error.count();
    std::cout << ss.str() << std::endl;
}

test script:

#! /bin/bash
TRUE=0
FALSE=1
using_wsl=$FALSE
echo "kernel name:........" $(uname -s)
echo "kernel release:....." $(uname -r)
echo "kernel version:....." $(uname -v)
echo "machine:............" $(uname -m)
echo "processor:.........." $(uname -p)
echo "hardware platform..." $(uname -i)
echo "operating system...." $(uname -o)
if [ "$(systemd-detect-virt)" = "wsl" ]; then
    echo "running in wsl...... yes"
    using_wsl=$TRUE
    if ! command -v hwclock >/dev/null 2>&1; then
        echo ""
        echo "When running this script within WSL, the 'hwclock'"
        echo "command is used to sync the WSL clock with the"
        echo "host system's clock. 'hwclock' is not found on"
        echo "this system. Install and rerun test."
        exit 1
    fi
else
    echo "running in wsl...... no"
    using_wsl=$FALSE
fi
echo ""
if [ $using_wsl -eq $TRUE ]; then
    echo "WSL time before syncing with host...:" $(date +"%T.%6N")
    echo "syncing WSL clock with host ..."
    sudo hwclock -s
fi
start_time=$(date +"%s.%N")
echo "test system start time..............: $(date -d "@$start_time" +"%T.%6N")"
echo "test starting ..."
echo ""
./zero_drift_dt
end_test_time=$(date +"%s.%N")
echo ""
echo "... test complete"
echo "test system finish time............................: $(date -d "@$end_test_time" +"%T.%6N")"
if [ $using_wsl -eq $TRUE ]; then
    sudo hwclock -s
    echo "WSL time after syncing with host (again)...........:" $(date +"%T.%6N")
fi
final_time=$(date +"%s.%N")
expected_duration=$(awk "BEGIN {print $end_test_time - $start_time}")
actual_duration=$(awk "BEGIN {print $final_time - $start_time}")
error_duration=$(awk "BEGIN {print $end_test_time - $final_time}")
echo "Expected approximate test duration (s).............:" $expected_duration
echo "Measured test duration (s).........................:" $actual_duration
echo "During this test, now() moved clock forward (s)....:" $error_duration

Results on problem system:

~/Projects/proving_grounds/now_error$ sudo ./test.sh
[sudo] password for blumert:
kernel name:........ Linux
kernel release:..... 5.15.153.1-microsoft-standard-WSL2
kernel version:..... #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine:............ x86_64
processor:.......... x86_64
hardware platform... x86_64
operating system.... GNU/Linux
running in wsl...... yes

WSL time before syncing with host...: 13:39:30.320309
syncing WSL clock with host ...
test system start time..............: 13:39:31.001844
test starting ...

commanded_elapsed_time, corresponding_measured_time_points, error
00.000, 13:39:31.003072000, 0.000000000
01.000, 13:39:32.003072000, 0.000000070
02.000, 13:39:52.464335000, 19.461263572
03.000, 13:39:52.464537000, 18.461465685
04.000, 13:39:52.464569000, 17.461497372
05.000, 13:39:52.464573000, 16.461501714
06.000, 13:39:52.464575000, 15.461503561
07.000, 13:39:52.464577000, 14.461505319
08.000, 13:39:52.464579000, 13.461506932
09.000, 13:39:52.464580000, 12.461508617
10.000, 13:39:52.464582000, 11.461510324
11.000, 13:39:52.464584000, 10.461512480
12.000, 13:39:52.464586000, 9.461514319
13.000, 13:39:52.464588000, 8.461515990
14.000, 13:39:52.464589000, 7.461517739
15.000, 13:39:52.464591000, 6.461519320
16.000, 13:39:52.464593000, 5.461520922
17.000, 13:39:52.464594000, 4.461522511
18.000, 13:39:52.464596000, 3.461524055
19.000, 13:39:52.464597000, 2.461525592
20.000, 13:39:52.464599000, 1.461527167
21.000, 13:39:52.464600000, 0.461528720
22.000, 13:39:57.473118000, 4.470046550
23.000, 13:39:57.473178000, 3.470106269
24.000, 13:39:57.473182000, 2.470110610
25.000, 13:39:57.473184000, 1.470112218
26.000, 13:39:57.473185000, 0.470113725
27.000, 13:40:02.476388000, 4.473316514
28.000, 13:40:02.476452000, 3.473380680
29.000, 13:40:02.476457000, 2.473385528

... test complete
test system finish time............................: 13:40:02.477504
WSL time after syncing with host (again)...........: 13:39:43.002630
Expected approximate test duration (s).............: 31.4757
Measured test duration (s).........................: 12.0018
During this test, now() moved clock forward (s)....: 19.4738
  • Note that when the clock at the end of the test, it reset backward in time. If the system is left alone for awhile, the first resync will exhibit this behavior as well.
  • By its defined purpose, the app should have taken slightly more than 30 seconds to execute. As measured with respect to resynced times, it only took 12.
  • In a perfect run (as in all other environments described above), the corresponding_measured_time_points progress in exact 1 second increments, and the error is rarely more than a few 10 nanoseconds (i.e., like the first two samples).
  • Every run is a bit different in that the first big jump can seemingly happen anytime in the first 10 seconds or so, but usually occur in the first 3.
Millihenry answered 21/9, 2024 at 19:13 Comment(1)
std::chrono::_V2::system_clock::time_point. why _V2? It's both implementation only, and version manipulation. Can it be a problem? also does steady_clock behave similar or it's just system_clock?Giannini
E
5

System clock is the wrong type to use, as it can (and does) vary. From cppreference (emphasis mine):

Class std::chrono::system_clock represents the system-wide real time wall clock.

It may not be monotonic: on most systems, the system time can be adjusted at any moment

You want to use steady_clock

Class std::chrono::steady_clock represents a monotonic clock. The time points of this clock cannot decrease as physical time moves forward and the time between ticks of this clock is constant. This clock is not related to wall clock time (for example, it can be time since last reboot), and is most suitable for measuring intervals

Eunuchize answered 21/9, 2024 at 19:31 Comment(5)
I totally agree, system_clock is indeed not monotonic (it can even go backward in time when the system synchronizes with an NTP server).Glosso
And it's not just WSL2, virtual machines can also "appear" to have very strange timing issues with system clock. (If the VM doesn't get any scheduling time, the wall clock just jumps seconds ahead in a few instructions executed)Glosso
@PepijnKramer wsl is pretty much a lightweight vm and uses some of hyperv in it's implementation (presumably including the clock emulsion)Domett
And, in general, when use "clocks" in a way that requires some sort of accuracy, ALWAYS design your implementation under the assumption that the clock can jump ahead, and that your ticks aren't precise.Syngamy
@ChrisMM, thanks, this fixed the issue for my application. However, I still find it curious as to why every time I run the app as presented, with system_clock, the system time advances ~25 seconds. Being able to predictably alter system time by calling system_clock::now() just doesn't seem right.Millihenry
M
1

While steady_clock is probably the more appropriate clock to use for this application, it is not the major part of the described problem. The erratic behavior is a combination of using system_clock with the system clock not synchronized (with host) and with local NTP service active. Ideally, the WSL/Ubuntu-24.04 would not be syncing with the NTP service but rather would rely on the Windows11 host to do that and simply sync with the host. The fix:

Check current status:

$ timedatectl status
               Local time: Mon 2024-09-23 10:13:11 CDT
           Universal time: Mon 2024-09-23 15:13:11 UTC
                 RTC time: Mon 2024-09-23 15:13:10
                Time zone: America/Chicago (CDT, -0500)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

Enable system clock syncing. Add or modify the following lines in /etc/wsl.conf:

[interop]
enabled=true
appendWindowsPath=true

[automount]
enabled=true
options="metadata"

[boot]
systemd=true

Exit Linux, restart WSL instance:

$ exit
> wsl.exe --shutdown
> wsl ~

Set the NTP service to inactive:

sudo systemctl stop systemd-timesyncd
sudo systemctl disable systemd-timesyncd

Check status again:

$ timedatectl status
               Local time: Mon 2024-09-23 10:28:05 CDT
           Universal time: Mon 2024-09-23 15:28:05 UTC
                 RTC time: Mon 2024-09-23 15:28:06
                Time zone: America/Chicago (CDT, -0500)
System clock synchronized: yes
              NTP service: inactive
          RTC in local TZ: no

Results of posted test after this fix:

kernel name:........ Linux
kernel release:..... 5.15.153.1-microsoft-standard-WSL2
kernel version:..... #1 SMP Fri Mar 29 23:14:13 UTC 2024
machine:............ x86_64
processor:.......... x86_64
hardware platform... x86_64
operating system.... GNU/Linux
running in wsl...... yes

WSL time before syncing with host...: 10:37:01.290106
syncing WSL clock with host ...
test system start time..............: 10:37:02.004335
test starting ...

commanded_elapsed_time, corresponding_measured_time_points, error
00.000, 0:11:39.930802312, 0.000000000
01.000, 0:11:40.930802335, 0.000000023
02.000, 0:11:41.930802336, 0.000000024
03.000, 0:11:42.930802314, 0.000000002
04.000, 0:11:43.930802313, 0.000000001
05.000, 0:11:44.930802328, 0.000000016
06.000, 0:11:45.930802326, 0.000000014
07.000, 0:11:46.930802323, 0.000000011
08.000, 0:11:47.930802339, 0.000000027
09.000, 0:11:48.930802339, 0.000000027
10.000, 0:11:49.930802333, 0.000000021
11.000, 0:11:50.930802318, 0.000000006
12.000, 0:11:51.930802333, 0.000000021
13.000, 0:11:52.930802335, 0.000000023
14.000, 0:11:53.930802328, 0.000000016
15.000, 0:11:54.930802315, 0.000000003
16.000, 0:11:55.930802317, 0.000000005
17.000, 0:11:56.930802324, 0.000000012
18.000, 0:11:57.930802327, 0.000000015
19.000, 0:11:58.930802327, 0.000000015
20.000, 0:11:59.930802335, 0.000000023
21.000, 0:12:00.930802328, 0.000000016
22.000, 0:12:01.930802317, 0.000000005
23.000, 0:12:02.930802343, 0.000000031
24.000, 0:12:03.930802349, 0.000000037
25.000, 0:12:04.930802328, 0.000000016
26.000, 0:12:05.930804110, 0.000001798
27.000, 0:12:06.930802334, 0.000000022
28.000, 0:12:07.930802328, 0.000000016
29.000, 0:12:08.930802314, 0.000000002

... test complete
test system finish time............................: 10:37:31.009184
WSL time after syncing with host (again)...........: 10:37:32.001678
Expected approximate test duration (s).............: 29.0048
Measured test duration (s).........................: 29.998
During this test, now() moved clock forward (s)....: -0.993178

Note that the last three values (after test complete) do not account for the time required to start up and shutdown the app, which is likely where the .99s difference is coming from.

Also note that the 26 s sample shows a relatively larger error than the rest. This is the magnitude of error I would expect from using system_clock over steady_clock.

Millihenry answered 23/9, 2024 at 16:27 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.