Meaning of the benchmark variables in snakemake

I added a benchmark directive to some of the rules in my snakemake workflow, and the resulting files have the following header:

s   h:m:s   max_rss max_vms max_uss max_pss io_in   io_out  mean_load

The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".

I can guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
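
As a quick illustration of how those two time columns relate, here is a minimal sketch that turns a number of seconds into an h:m:s string using only the standard library. The function name is hypothetical, and that snakemake formats the column exactly this way is an assumption.

```python
import datetime


def seconds_to_hms(seconds):
    """Format a duration in seconds as h:m:s, dropping fractional seconds."""
    return str(datetime.timedelta(seconds=int(seconds)))


print(seconds_to_hms(3725.42))  # 1:02:05
```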

io_in and io_out are likely related to disk read and write activity, but in what units are they measured?

What are the others? Is this documented somewhere?

Edit: Looking at the source code

I've found the following piece of code in /snakemake/benchmark.py, which might well be where the benchmark data come from:

def _update_record(self):
    """Perform the actual measurement"""
    # Memory measurements
    rss, vms, uss, pss = 0, 0, 0, 0
    # I/O measurements
    io_in, io_out = 0, 0
    # CPU seconds
    cpu_seconds = 0
    # Iterate over process and all children
    try:
        main = psutil.Process(self.pid)
        this_time = time.time()
        for proc in chain((main,), main.children(recursive=True)):
            meminfo = proc.memory_full_info()
            rss += meminfo.rss
            vms += meminfo.vms
            uss += meminfo.uss
            pss += meminfo.pss
            ioinfo = proc.io_counters()
            io_in += ioinfo.read_bytes
            io_out += ioinfo.write_bytes
            if self.bench_record.prev_time:
                cpu_seconds += proc.cpu_percent() / 100 * (
                    this_time - self.bench_record.prev_time)
        self.bench_record.prev_time = this_time
        if not self.bench_record.first_time:
            self.bench_record.prev_time = this_time
        rss /= 1024 * 1024
        vms /= 1024 * 1024
        uss /= 1024 * 1024
        pss /= 1024 * 1024
        io_in /= 1024 * 1024
        io_out /= 1024 * 1024
    except psutil.Error as e:
        return
    # Update benchmark record's RSS and VMS
    self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
    self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
    self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
    self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
    self.bench_record.io_in = io_in
    self.bench_record.io_out = io_out
    self.bench_record.cpu_seconds += cpu_seconds

So this seems to come from functionalities provided by psutil.
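
To check my reading of that bookkeeping, here is a small sketch that imitates it without psutil, using invented samples. The function name, the sample layout and all the numbers are hypothetical; it only mirrors what the quoted code appears to do: memory keeps the peak over all polls, I/O keeps the latest cumulative counter, and CPU seconds are accumulated from the load between polls.

```python
def summarize(samples):
    """Fold per-poll samples into one record.

    Each sample is (timestamp_s, rss_bytes, read_bytes_total, cpu_percent).
    All values are invented for illustration.
    """
    mib = 1024 * 1024
    record = {"max_rss": 0.0, "io_in": 0.0, "cpu_seconds": 0.0}
    prev_time = None
    for timestamp, rss, read_bytes, cpu_percent in samples:
        # Memory: keep the peak over all polls.
        record["max_rss"] = max(record["max_rss"], rss / mib)
        # I/O: the counter is already cumulative, so keep the latest value.
        record["io_in"] = read_bytes / mib
        # CPU: integrate the load over the time since the previous poll.
        if prev_time is not None:
            record["cpu_seconds"] += cpu_percent / 100 * (timestamp - prev_time)
        prev_time = timestamp
    return record


samples = [
    (0.0, 50 * 1024 * 1024, 1 * 1024 * 1024, 0.0),
    (1.0, 80 * 1024 * 1024, 3 * 1024 * 1024, 150.0),
    (2.0, 60 * 1024 * 1024, 4 * 1024 * 1024, 50.0),
]
print(summarize(samples))
```

If this reading is right, the max_* columns are peaks rather than averages, while io_in and io_out are totals at the last poll.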

Beasley answered 18/10, 2017 at 15:3 Comment(0)

Benchmarking in snakemake could certainly be better documented, but psutil is documented here:

get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps. 
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.

psutil.disk_io_counters(perdisk=False)

Return system disk I/O statistics as a namedtuple including the following attributes:
    read_count: number of reads
    write_count: number of writes
    read_bytes: number of bytes read
    write_bytes: number of bytes written
    read_time: time spent reading from disk (in milliseconds)
    write_time: time spent writing to disk (in milliseconds)

The code you found confirms that all the memory usage and I/O counts are reported in MB (= bytes / (1024 * 1024)).
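
The code quoted in the question divides each byte count by 1024 * 1024. A minimal sketch of that conversion follows; the helper name is hypothetical and the byte count is an invented example.

```python
def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# A process that wrote about 20 kB shows up as a small fractional value.
print(round(bytes_to_mib(20971), 2))  # 0.02
```

This is why fractional values in io_in and io_out are expected: they are sizes, not operation counts.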

Alkalinity answered 9/11, 2017 at 11:56 Comment(0)

I will just leave this here for future reference.

Reading through the psutil documentation, as previously suggested:

colname    type (unit)      description
s          float (seconds)  Running time in seconds
h:m:s      string (-)       Running time in hours, minutes, seconds format
max_rss    float (MB)       Maximum "Resident Set Size", this is the non-swapped physical memory a process has used.
max_vms    float (MB)       Maximum "Virtual Memory Size", this is the total amount of virtual memory used by the process
max_uss    float (MB)       "Unique Set Size", this is the memory which is unique to a process and which would be freed if the process was terminated right now.
max_pss    float (MB)       "Proportional Set Size", is the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only)
io_in      float (MB)       the number of MB read (cumulative).
io_out     float (MB)       the number of MB written (cumulative).
mean_load  float (-)        CPU usage over time, divided by the total running time (first row)
cpu_time   float (-)        CPU time summed for user and system
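
For anyone who wants to work with these columns programmatically, here is a minimal sketch that reads a benchmark file with the standard csv module. The file content is invented, the function name is hypothetical, and treating "-" or "NA" as missing values is an assumption about how gaps might be written.

```python
import csv
import io

# Invented example content; a real file would be opened with open(path, newline="").
example = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\n"
    "12.3456\t0:00:12\t85.20\t310.44\t80.10\t81.00\t0.50\t0.02\t97.30\n"
)


def read_benchmark(handle):
    """Return one dict per row, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        parsed = {}
        for name, value in row.items():
            if name == "h:m:s":
                # Keep the formatted duration as text.
                parsed[name] = value
            elif value in ("-", "NA", ""):
                # Assumed placeholders for missing measurements.
                parsed[name] = None
            else:
                parsed[name] = float(value)
        rows.append(parsed)
    return rows


rows = read_benchmark(io.StringIO(example))
print(rows[0]["max_rss"], rows[0]["io_out"])
```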
Pantomimist answered 30/3, 2021 at 14:9 Comment(3)
Huh, how come I'm having values such as 0.02 in io_out? Number of operations should be an int, shouldn't it? – Musical
According to @heathobrien's answer, io_in and io_out correspond to "number of MB read" and "number of MB written", so they are also in MB. – Ranita
You are correct, I read some stuff too literally from the docs. I edited the original to reflect the correct measurements. – Pantomimist