SSD vs. tmpfs speed

I made a tmpfs filesystem in my home directory on Ubuntu using this command:

$ mount -t tmpfs -o size=1G,nr_inodes=10k,mode=0777 tmpfs space
$ df -h space .
Filesystem                   Size  Used Avail Use% Mounted on
tmpfs                        1.0G  100M  925M  10% /home/user/space
/dev/mapper/ubuntu--vg-root  914G  373G  495G  43% /

Then I wrote this Python program:

#!/usr/bin/env python3

import pickle
import time


def f(fn):
    """Load a pickle from fn and print the elapsed time."""
    start = time.time()
    with open(fn, "rb") as fh:
        data = pickle.load(fh)
    end = time.time()
    print(str(end - start) + "s")
    return data


obj = list(map(str, range(10 * 1024 * 1024)))  # pickles to approx. 100 MB


def l(fn):
    """Dump obj as a pickle to fn."""
    with open(fn, "wb") as fh:
        pickle.dump(obj, fh)


print("Dump obj.pkl")
l("obj.pkl")            # dump to SSD
print("Dump space/obj.pkl")
l("space/obj.pkl")      # dump to tmpfs

_ = f("obj.pkl")        # load from SSD
_ = f("space/obj.pkl")  # load from tmpfs

The result:

Dump obj.pkl
Dump space/obj.pkl
0.6715312004089355s
0.6940639019012451s

I am confused about this result. Isn't tmpfs a RAM-backed file system, and isn't RAM supposed to be notably faster than any disk, including SSDs?

Furthermore, I noticed that this program uses over 15 GB of RAM when I increase the target file size to approximately 1 GB.
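The memory use is less surprising once you consider that each list element is a full CPython `str` object, not just its raw characters. A minimal sketch of the arithmetic (exact sizes vary by Python version; a short ASCII string typically costs ~49 bytes of object header plus one byte per character, and the list adds an 8-byte pointer per element):

```python
import sys

# A typical element near the end of the list from the question.
s = "10485759"
per_item = sys.getsizeof(s) + 8  # str object plus the list's pointer to it

print(f"one str object: {sys.getsizeof(s)} bytes")
# Lower bound for the in-memory list at the question's two scales
# (~100 MB and ~1 GB pickle files).
for n in (10 * 1024 * 1024, 100 * 1024 * 1024):
    print(f"{n} items: >= {n * per_item / 2**30:.1f} GiB")
```

On top of that, `pickle.load` builds the whole structure in memory while also holding its own buffers, so peak usage during loading exceeds this lower bound.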

How can this be explained?

The background of this experiment is that I am trying to find alternative caching locations to the hard disk and Redis that are faster and available to multiple worker processes.
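One way to narrow this down is to time the raw file read separately from the pickle deserialization. A minimal sketch (the `obj_small.pkl` filename and the smaller, 1M-item list are illustrative, chosen to keep the run quick):

```python
import os
import pickle
import time

# Smaller list than in the question so the sketch runs quickly;
# the read-vs-deserialize split behaves the same way.
obj = list(map(str, range(1024 * 1024)))
with open("obj_small.pkl", "wb") as fh:
    pickle.dump(obj, fh)

# Time the raw read on its own ...
start = time.time()
with open("obj_small.pkl", "rb") as fh:
    raw = fh.read()
read_time = time.time() - start

# ... and the deserialization on its own.
start = time.time()
data = pickle.loads(raw)
load_time = time.time() - start

print(f"read:  {read_time:.4f}s")
print(f"loads: {load_time:.4f}s")
os.remove("obj_small.pkl")
```

If `loads` dominates, the storage medium barely matters to the total.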

Agglutinogen answered 25/9, 2020 at 14:42 Comment(10)
Wouldn't you use cPickle if in a hurry? – Inurn
More of a discussion point than an answer; sorry about the formatting this inflicts. I created a tmpfs using the same means as you (with the same name under my home, space). $ time dd if=/dev/zero of=space/test.img bs=1048576 count=100 → 104857600 bytes (105 MB, 100 MiB) copied, 0.0231555 s, 4.5 GB/s; real 0m0.030s, user 0m0.000s, sys 0m0.030s – Epimorphosis
And to SSD: $ time dd if=/dev/zero of=test.img bs=1048576 count=100 → 104857600 bytes (105 MB, 100 MiB) copied, 0.165582 s, 633 MB/s; real 0m0.178s, user 0m0.000s, sys 0m0.060s – Epimorphosis
Could be Python responsible for the time, not the FS/medium of choice. 0m0.030s vs. 0m0.178s ... seems like a clear winner for tmpfs ... – Epimorphosis
@Epimorphosis Yes, I can replicate your observations, so it is probably a Python issue. I would speculate that reconstructing the Python data structure accounts for most of the time, so the short read times barely affect the total. – Rateable
@MarkSetchell Using _pickle instead of pickle makes no difference to the final time measurements. A module called cPickle apparently does not exist in Python 3. – Rateable
Glad that's settled, then ;) – Epimorphosis
Would still be curious about the actual reasons, though. – Rateable
@Green - new question? Timing of pickling in Python? – Epimorphosis
OK, thanks for your contribution! – Rateable

Answer flowing on from comments:

The elapsed time seems to be a Python issue rather than a property of the medium of choice.

In a similar set-up (SSD vs. tmpfs), using OS commands on Linux, the speed difference when writing a 100 MiB file is notable:

To tmpfs:

$ time dd if=/dev/zero of=space/test.img bs=1048576 count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0231555 s, 4.5 GB/s

real    0m0.030s
user    0m0.000s
sys 0m0.030s

To SSD:

$ time dd if=/dev/zero of=test.img bs=1048576 count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.165582 s, 633 MB/s

real    0m0.178s
user    0m0.000s
sys 0m0.060s
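The same comparison can be made from Python itself, bypassing pickle entirely. A sketch under the question's setup (the `space/` path assumes the tmpfs mount from the question and is skipped if absent):

```python
import os
import time


def write_100mib(path):
    """Write 100 MiB in 1 MiB blocks, fsync, and return elapsed seconds."""
    block = b"\0" * 1048576
    start = time.time()
    with open(path, "wb") as fh:
        for _ in range(100):
            fh.write(block)
        fh.flush()
        os.fsync(fh.fileno())  # force the data out of the page cache
    return time.time() - start


for path in ("test.img", "space/test.img"):
    if not os.path.isdir(os.path.dirname(path) or "."):
        continue  # skip if the tmpfs mount point is absent
    try:
        print(f"{path}: {write_100mib(path):.3f}s")
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Note the `fsync`: without it, writes to the SSD path land in the page cache, i.e. in RAM, and the gap mostly disappears. That is likely another reason the original benchmark showed near-identical load times: the SSD file had just been written, so the subsequent `pickle.load` probably read it straight from the page cache rather than from the disk.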
Epimorphosis answered 26/9, 2020 at 4:14 Comment(3)
You wrote 100 MiB, not 100 MB :) de.wikipedia.org/wiki/Byte#Vergleichstabelle – Indulgent
@Indulgent - and in German, no less! Thanks ;) – Epimorphosis
Yep, the German article has a nice comparison table that the English version lacks! – Indulgent
