def load_pickle(file: int) -> dict: with open(f"D:/data/batched/{file}.pickle", "rb") as handle: return pickle.load(handle) for i in range(0, 9): print(f"\nIteration {i}") start_time = time.time() file = None print(f"Unloaded file in {time.time() - start_time:.2f} seconds") start_time = time.time() file = load_pickle(0) print(f"Loaded file in {time.time() - start_time:.2f} seconds")

Iteration 0 Unloaded file in 0.00 seconds Loaded file in 18.80 seconds Iteration 1 Unloaded file in 14.78 seconds Loaded file in 30.51 seconds Iteration 2 Unloaded file in 28.67 seconds Loaded file in 30.21 seconds Iteration 3 Unloaded file in 35.38 seconds Loaded file in 40.25 seconds Iteration 4 Unloaded file in 39.91 seconds Loaded file in 41.24 seconds Iteration 5 Unloaded file in 43.25 seconds Loaded file in 45.57 seconds Iteration 6 Unloaded file in 46.94 seconds Loaded file in 48.19 seconds Iteration 7 Unloaded file in 51.67 seconds Loaded file in 51.32 seconds Iteration 8 Unloaded file in 55.25 seconds Loaded file in 56.11 seconds

I'd love to know the reason for this slow down as well as I have encountered it with a similar task. Note that I can only verify the issue on Windows 11 but not on Linux (identical hardware spec for each). I 'solved' it with using h5py instead of pickle.

My task is to read millions of numpy images and get a region dynamically. The application dictates that the images are stored in batches of about 3000 to 6000 in files.

pickle

imagesDict = {i: np.random.randint(0, 255, (300, 300), dtype=np.uint8) for i in range(4000)}
with open(filePath, 'wb') as file:
    pickle.dump(imagesDict, file, pickle.HIGHEST_PROTOCOL)


thumbs = []
num_image_sets = 0
durations_s_sum = 0.
for i in range(500):
    start_s = time.perf_counter()
    with open(filePath, 'rb') as file:
        imagesDict: dict[int, np.ndarray] = pickle.load(file)
        for key in imagesDict.keys():
            image = imagesDict[key]
            thumb = image[:50, :50].copy()
            thumbs.append(thumb)

    durations_s_sum += (time.perf_counter() - start_s)
    num_image_sets += 1
    if 50 <= num_image_sets:
        memory_info = psutil.Process(os.getpid()).memory_info()
        print(f"{durations_s_sum:4.1f}s for 50 image sets of 4000 images, rss={memory_info.rss/1024/1024:6,.0f}MB, vms={memory_info.vms/1024/1024:6,.0f}MB")
        durations_s_sum = 0.
        num_image_sets = 0

The speed of pickle.load() slows down with every iteration, quickly getting to an unacceptable level:

10.6s for 50 image sets of 4000 images, rss= 1,575MB, vms= 1,579MB
10.0s for 50 image sets of 4000 images, rss= 2,117MB, vms= 2,134MB
11.5s for 50 image sets of 4000 images, rss= 2,632MB, vms= 2,662MB
14.2s for 50 image sets of 4000 images, rss= 3,150MB, vms= 3,193MB
16.3s for 50 image sets of 4000 images, rss= 3,670MB, vms= 3,726MB
19.1s for 50 image sets of 4000 images, rss= 4,212MB, vms= 4,280MB
22.6s for 50 image sets of 4000 images, rss= 4,746MB, vms= 4,824MB
25.4s for 50 image sets of 4000 images, rss= 5,276MB, vms= 5,367MB
29.2s for 50 image sets of 4000 images, rss= 5,817MB, vms= 5,919MB
35.3s for 50 image sets of 4000 images, rss= 6,360MB, vms= 6,472MB

h5py

with h5py.File(filePath, 'w') as h5:
    for i in range(4000):
        image = np.random.randint(0, 255, (300, 300), dtype=np.uint8)
        h5.create_dataset(str(i), data=image)

thumbs = []
num_image_sets = 0
durations_s_sum = 0.
for i in range(500):
    start_s = time.perf_counter()
    with h5py.File(filePath, "r") as h5:
        for key in h5.keys():
            image = h5[key]
            thumb = image[:50, :50]
            thumbs.append(thumb)

    durations_s_sum += (time.perf_counter() - start_s)
    num_image_sets += 1
    if 50 <= num_image_sets:
        memory_info = psutil.Process(os.getpid()).memory_info()
        print(f"{durations_s_sum:4.1f}s for 50 image sets of 4000 images, rss={memory_info.rss/1024/1024:6,.0f}MB, vms={memory_info.vms/1024/1024:6,.0f}MB")
        durations_s_sum = 0.
        num_image_sets = 0

h5py is slower but the duration is almost constant at about 19s, so it wins over time:

20.3s for 50 image sets of 4000 images, rss=   646MB, vms=   637MB
20.3s for 50 image sets of 4000 images, rss= 1,166MB, vms= 1,167MB
19.7s for 50 image sets of 4000 images, rss= 1,685MB, vms= 1,697MB
19.4s for 50 image sets of 4000 images, rss= 2,208MB, vms= 2,229MB
19.7s for 50 image sets of 4000 images, rss= 2,731MB, vms= 2,764MB
19.8s for 50 image sets of 4000 images, rss= 3,255MB, vms= 3,298MB
19.4s for 50 image sets of 4000 images, rss= 3,778MB, vms= 3,832MB
19.9s for 50 image sets of 4000 images, rss= 4,303MB, vms= 4,366MB
19.6s for 50 image sets of 4000 images, rss= 4,826MB, vms= 4,899MB
19.9s for 50 image sets of 4000 images, rss= 5,349MB, vms= 5,434MB

Also, if memory fragmentation was the issue, why does h5py not show a similar behaviour?

pickle

h5py

Recommended topics

Hot tags