Saving a dataset with ImagingCore objects pickle

I have created a dataset that maps images to their labels for an ML project. I want to save the dataset in one script and load it in another, so that I don't have to rebuild it every time I rerun the model code. The dataset is a list of tuples, each containing an ImagingCore object and two strings. The output of print(dataset) is:

[(<ImagingCore object at 0x12091a825>, 'R', 'FIRST'),(<ImagingCore object at 0x12091a850>, 'L', 'THIRD'),...]

The second element of each tuple is either left or right, and the third ranges from first to fourth.

I have tried to save the dataset with pickle, json, dill, and hickle, using code like this:

import pickle

with open('data.pickle', 'wb') as f:
    pickle.dump(dataset, f)

but I always get the same error:

TypeError: cannot pickle 'ImagingCore' object

Can someone help? This has been literally driving me crazy for weeks

Descender answered 28/2, 2021 at 1:39 Comment(2)
Please share your code/data structure. – Blocking
@TenaciousB I have added my code and the dataset output. – Descender

That error tells me PIL's objects aren't picklable. They're native objects (likely implemented in C rather than Python), so it makes sense that they don't automatically support pickling. You could either pickle a different set of data (a proxy object) or use a custom Pickler that knows how to pickle their data.

See the Pickleable Image Object question for how to pull data out of PIL.Image that can be pickled.
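
As a rough sketch of the proxy approach: swap each image for its raw bytes plus the size and mode needed to rebuild it, and pickle those plain tuples instead. This assumes Pillow is installed and that you still have (or can keep) the PIL.Image.Image objects rather than the bare ImagingCore ones. The helper names to_record and from_record are made up for illustration, and the tiny synthetic images only stand in for your real data.

```python
import pickle

from PIL import Image  # assumes Pillow is installed


def to_record(img, side, position):
    """Replace the image with plain bytes plus the metadata needed to rebuild it."""
    return (img.tobytes(), img.size, img.mode, side, position)


def from_record(record):
    """Rebuild the (image, side, position) tuple from a pickled record."""
    pixels, size, mode, side, position = record
    return (Image.frombytes(mode, size, pixels), side, position)


# Tiny synthetic images stand in for the real dataset (assumption).
dataset = [
    (Image.new("RGB", (2, 2), (255, 0, 0)), "R", "FIRST"),
    (Image.new("L", (3, 1), 128), "L", "THIRD"),
]

# Only bytes, tuples and strings reach pickle, so nothing native is involved.
data = pickle.dumps([to_record(*row) for row in dataset])

# In the other script: unpickle and turn the records back into images.
restored = [from_record(record) for record in pickle.loads(data)]
```

Raw bytes do not carry a palette, so palette ("P") images would need extra handling. To write to disk instead, pass the same list of records to pickle.dump with an open file.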

Persistence of External Objects in the pickle docs shows how to write your own pickler. Looks like the code should be something like this (untested):

import pickle
from PIL import Image  # Assuming Pillow
from collections import namedtuple

# Proxy the PIL.Image by storing the bytes.
ImageProxy = namedtuple("ImageProxy", "pixels, size, mode")


class PilPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Image is the module; the class to test against is Image.Image.
        if isinstance(obj, Image.Image):
            # Create our proxy that (I think) will get pickled in lieu of a PIL object.
            return ImageProxy(
                pixels=obj.tobytes(),
                size=obj.size,
                mode=obj.mode,
            )
        else:
            # Fallback to default pickle.
            return None


class PilUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # pid is the object returned by PilPickler.
        if isinstance(pid, ImageProxy):
            return Image.frombytes(pid.mode, pid.size, pid.pixels)
        else:
            # Always raise an error if you cannot return the correct object.
            raise pickle.UnpicklingError("unsupported persistent object")


def main():
    import io
    import pprint

    images = []  # [... make images here ...]

    # Save the records using our custom PilPickler.
    file = io.BytesIO()
    PilPickler(file).dump(images)

    print("Pickled records:")
    pprint.pprint(images)

    # Load the records from the pickle data stream.
    file.seek(0)
    images = PilUnpickler(file).load()

    print("Unpickled records:")
    pprint.pprint(images)


if __name__ == "__main__":
    main()

I think your ImagingCore objects are the native part of PIL.Image, so this custom Pickler approach lets you keep your existing setup: any PIL.Image objects inside your data structures will get pickled. If you are using other PIL objects, you may need to add support for them with more isinstance checks and proxy objects.
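
One way to make "more isinstance checks" manageable is a small registry that maps a tag to a type, a function that extracts picklable state, and a function that rebuilds the object. The sketch below uses only the standard library. NativeHandle is a hypothetical stand-in for an object that refuses to be pickled, and HANDLERS, ProxyPickler and ProxyUnpickler are names I made up; for Pillow you would register the real classes with your own extract and rebuild functions.

```python
import io
import pickle


class NativeHandle:
    """Hypothetical stand-in for a native object that refuses to be pickled."""

    def __init__(self, payload: bytes):
        self.payload = payload

    def __reduce__(self):
        raise TypeError("cannot pickle 'NativeHandle' object")


# Registry: tag -> (type, extract picklable state, rebuild from state).
# Add one entry per unpicklable type you need to support.
HANDLERS = {
    "native": (NativeHandle, lambda obj: obj.payload, NativeHandle),
}


class ProxyPickler(pickle.Pickler):
    def persistent_id(self, obj):
        for tag, (cls, to_state, _rebuild) in HANDLERS.items():
            if isinstance(obj, cls):
                # The (tag, state) pair is pickled in place of the object.
                return (tag, to_state(obj))
        return None  # Everything else is pickled normally.


class ProxyUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, state = pid
        if tag not in HANDLERS:
            raise pickle.UnpicklingError(f"unsupported persistent object: {tag!r}")
        _cls, _to_state, rebuild = HANDLERS[tag]
        return rebuild(state)


def dumps(obj) -> bytes:
    buffer = io.BytesIO()
    ProxyPickler(buffer).dump(obj)
    return buffer.getvalue()


def loads(data: bytes):
    return ProxyUnpickler(io.BytesIO(data)).load()


dataset = [(NativeHandle(b"\x00\x01"), "R", "FIRST"), (NativeHandle(b"\x02"), "L", "THIRD")]
restored = loads(dumps(dataset))
```

The tuples, strings and any other ordinary values go through the default pickling path; only instances of registered types are replaced by their (tag, state) pair.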

Vivanvivarium answered 12/2 at 19:26 Comment(0)
