PyTorch - RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Problem

I'm trying to load a file using PyTorch, but the error states archive/data.pkl does not exist.

Code

import torch
cachefile = 'cacheddata.pth'
torch.load(cachefile)

Output

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-8edf1f27a4bd> in <module>
      1 import torch
      2 cachefile = 'cacheddata.pth'
----> 3 torch.load(cachefile)

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    582                     opened_file.seek(orig_position)
    583                     return torch.jit.load(opened_file)
--> 584                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    585         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    586 

~/opt/anaconda3/envs/matching/lib/python3.8/site-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, **pickle_load_args)
    837 
    838     # Load the data (which may in turn use `persistent_load` to load tensors)
--> 839     data_file = io.BytesIO(zip_file.get_record('data.pkl'))
    840     unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    841     unpickler.persistent_load = persistent_load

RuntimeError: [enforce fail at inline_container.cc:209] . file not found: archive/data.pkl

Hypothesis

I'm guessing this has something to do with pickle, from the docs:

This save/load process uses the most intuitive syntax and involves the least amount of code. Saving a model in this way will save the entire module using Python’s pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is because pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors.
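For reference, the alternative the same docs recommend, saving only the model's state_dict, avoids the class/path coupling described above; here is a minimal sketch of that round trip, using a throwaway nn.Linear purely for illustration:

import torch
import torch.nn as nn

# hypothetical tiny model, just to show the state_dict round trip
model = nn.Linear(4, 2)
torch.save(model.state_dict(), 'cacheddata.pth')    # saves only the parameters, not the class

model = nn.Linear(4, 2)                             # the architecture is re-created in code
model.load_state_dict(torch.load('cacheddata.pth'))
model.eval()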

Versions

  • PyTorch version: 1.6.0
  • Python version: 3.8.0
Pelion answered 5/10, 2020 at 9:37 Comment(0)

Turned out the file was somehow corrupted. After generating it again it loaded without issue.
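One way to confirm the corruption before regenerating: files written by torch.save in PyTorch 1.6+ use a zip-based container, so a quick sanity check of the archive usually exposes a truncated or otherwise damaged file. A sketch of my own, assuming the file was saved with that zip-based format:

import os
import zipfile

cachefile = 'cacheddata.pth'
print('size on disk:', os.path.getsize(cachefile), 'bytes')
print('valid zip container:', zipfile.is_zipfile(cachefile))   # False hints at truncation (or the legacy format)

if zipfile.is_zipfile(cachefile):
    with zipfile.ZipFile(cachefile) as zf:
        print(zf.namelist())                     # a healthy file contains a .../data.pkl entry
        print('first bad entry:', zf.testzip())  # None means every entry passes its CRC check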

Pelion answered 5/10, 2020 at 10:0 Comment(1)
I had this issue for exactly this reason: a common cause of this can be interrupting a model training process (Ctrl-C, killed by a job scheduler for running out of time, etc.) while you are in the middle of writing to the filesystem. – Rancid
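A pattern that avoids leaving such half-written checkpoints behind (my own sketch, not from the answer above) is to save to a temporary file and only rename it into place once torch.save has finished:

import os
import tempfile
import torch

def atomic_save(obj, path):
    # write to a temp file in the same directory, then rename it into place;
    # os.replace is atomic on the same filesystem, so an interrupted run never
    # leaves a partially written checkpoint under the final name
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dirname, suffix='.tmp')
    os.close(fd)
    try:
        torch.save(obj, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:        # includes KeyboardInterrupt (Ctrl-C)
        os.remove(tmp_path)
        raise

atomic_save({'step': 0}, 'cacheddata.pth')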

In my case, my disk drive was full. Clear some space and try again, deleting any partial files if necessary.
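To catch this failure mode up front, a small free-space check before saving helps. This is my own sketch; the 1 GB threshold is arbitrary and should match your checkpoint size:

import shutil

free_bytes = shutil.disk_usage('.').free     # check the drive the checkpoint is written to
print(f'free space: {free_bytes / 1e9:.1f} GB')
if free_bytes < 1e9:
    raise RuntimeError('less than 1 GB free; torch.save would likely leave a partial file')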

Kopp answered 28/4, 2021 at 6:49 Comment(2)
Actually, it was my issue! – Aubree
This was super helpful, and it's not obvious from the error message that the root cause was a full hard drive. – Zen

I was facing the same problem. I had downloaded the GPU-trained model (.pt) directly from a notebook on GCP AI Platform. When I loaded it locally with torch.load('models/model.pt', map_location=device), I got this error:

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory.

I noticed that the downloaded file was much smaller than expected. So, same as @Ian, it turned out the file had been corrupted while downloading from the notebook. In the end I had to transfer the file from the notebook into a bucket on Google Cloud Storage (GCS) first, instead of downloading it directly, and then download it from GCS. It works now.
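One way to catch such a truncated transfer before torch.load ever sees it is to compare the local file against the object in the bucket. This is my own sketch; the expected size is a placeholder you would read from the bucket listing (e.g. gsutil ls -L):

import hashlib
import os

local_path = 'models/model.pt'
expected_size = 123456789            # placeholder: size reported for the GCS object

md5 = hashlib.md5()
with open(local_path, 'rb') as f:
    for chunk in iter(lambda: f.read(1 << 20), b''):   # hash in 1 MiB chunks
        md5.update(chunk)

print('size matches:', os.path.getsize(local_path) == expected_size)
print('md5:', md5.hexdigest())       # compare with the hash shown for the GCS object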

Paint answered 22/3, 2021 at 18:40 Comment(0)

I encountered this issue not with a single file, but consistently with every file I was dealing with. Judging by their size, you could say the files were corrupted, in the sense that they were too small and incomplete, but why were they always created that way?

I think the issue was that I had made a seemingly harmless modification to a simple class I was saving: I defined a class Foo, kept its data the same but added a method, and then tried to save an older instance when only the newer class definition of Foo was available.

Here is an example of what I think happened, but it doesn't reproduce it exactly:

import torch

class Foo(object):
    def __init__(self):
        self.contents = [1, 2, 3]

torch.save(Foo(), "foo1.pth")

foo1 = torch.load("foo1.pth")  # saved with class version 1 of Foo

# some days later the code looks like this
class Foo(object):
    def __init__(self):
        self.contents = [1, 2, 3]

    def __len__(self):
        return len(self.contents)

foo1 = torch.load("foo1.pth")  # still works
torch.save(foo1, "foo2.pth")   # try to save a version-1 object whose class definition has changed

The first time around I got an error like PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo, but with Jupyter Notebook's autoreload feature it's hard to tell exactly what happened. Normally older instances can be loaded into newer class definitions without problems.

Whatever actually happened, my solution was to load the old version and manually copy the data fields into a freshly instantiated Foo, like so:

old = torch.load("foo1.pth")
new = Foo()
# new = old # this was the code that caused issues
new.contents = old.contents
torch.save(new, "foo2.pth")
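
If Foo had many fields, a more generic variant of the same idea (my own sketch, not part of the original fix) would be to copy every instance attribute at once:

old = torch.load("foo1.pth")
new = Foo()
new.__dict__.update(vars(old))   # copies contents and any other instance attributes
torch.save(new, "foo2.pth")
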
Delisadelisle answered 25/3, 2021 at 14:14 Comment(0)

In my case, the main reason for this error was that the .pt file was corrupted: I had started downloading the file while it was still being written.

So, to avoid the error, copy the .pt file into another directory and download the copy from that directory.
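If you cannot control the writer, a rough heuristic (my own sketch, with placeholder paths) is to wait until the file size stops changing before copying it; an atomic rename on the writing side is more robust, but this at least avoids grabbing a file mid-write:

import os
import shutil
import time

src = 'cacheddata.pth'            # placeholder: the file still being written
dst = 'export/cacheddata.pth'     # placeholder: the directory you download from

last = -1
while True:
    size = os.path.getsize(src)
    if size == last:              # unchanged for one polling interval
        break
    last = size
    time.sleep(5)

os.makedirs(os.path.dirname(dst), exist_ok=True)
shutil.copy2(src, dst)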

Antagonize answered 22/11, 2021 at 7:31 Comment(0)

Why not try cleaning up your disk? A full disk could be the most probable reason.

Snowstorm answered 15/6, 2023 at 1:28 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Kneedeep
