NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported

I am trying to load a dataset with the datasets Python module in a local Python notebook. The notebook runs a Python 3.10.13 kernel, the same version as my virtual environment.

I cannot load the dataset used in a tutorial I am following. Here is the error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
/Users/ari/Downloads/00-fine-tuning.ipynb Cell 2 line 3
      1 from datasets import load_dataset
----> 3 data = load_dataset(
      4     "jamescalam/agent-conversations-retrieval-tool",
      5     split="train"
      6 )
      7 data

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/load.py:2149, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, ignore_verifications, keep_in_memory, save_infos, revision, token, use_auth_token, task, streaming, num_proc, storage_options, **config_kwargs)
   2145 # Build dataset for splits
   2146 keep_in_memory = (
   2147     keep_in_memory if keep_in_memory is not None else is_small_dataset(builder_instance.info.dataset_size)
   2148 )
-> 2149 ds = builder_instance.as_dataset(split=split, verification_mode=verification_mode, in_memory=keep_in_memory)
   2150 # Rename and cast features to match task schema
   2151 if task is not None:
   2152     # To avoid issuing the same warning twice

File ~/Documents/fastapi_language_tutor/env/lib/python3.10/site-packages/datasets/builder.py:1173, in DatasetBuilder.as_dataset(self, split, run_post_process, verification_mode, ignore_verifications, in_memory)
   1171 is_local = not is_remote_filesystem(self._fs)
   1172 if not is_local:
-> 1173     raise NotImplementedError(f"Loading a dataset cached in a {type(self._fs).__name__} is not supported.")
   1174 if not os.path.exists(self._output_dir):
   1175     raise FileNotFoundError(
   1176         f"Dataset {self.dataset_name}: could not find data in {self._output_dir}. Please make sure to call "
   1177         "builder.download_and_prepare(), or use "
   1178         "datasets.load_dataset() before trying to access the Dataset object."
   1179     )

NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported.

How do I resolve this? I don't understand how this error applies, given that I am fetching the dataset remotely, so it cannot be cached in my LocalFileSystem in the first place.

Godunov answered 6/11, 2023 at 17:29 Comment(3)
Run pip install fsspec==2023.9.2 and then try again. - Heigho
@Heigho I tried that, but it didn't work. I am using Python 3.10.13. What is your Python kernel version? - Godunov
See Goku's answer. The issue has nothing to do with the Python version; it's the fsspec version. - Heigho
52

Try upgrading datasets:

pip install -U datasets

This error stems from a breaking change in fsspec. It was fixed in datasets 2.14.6, so upgrading datasets as shown above should resolve it.

GitHub issue: https://github.com/huggingface/datasets/issues/6352


Alternatively, pin fsspec to an earlier version:

pip install fsspec==2023.9.2

There is a problem with fsspec==2023.10.0

GitHub issue: https://github.com/huggingface/datasets/issues/6330
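
To see whether an environment matches the combination described above, a small check along these lines may help. It is only a sketch: the helper names (installed_version, release_tuple, looks_affected) are made up for illustration, and the thresholds are simply the two versions quoted in this answer (datasets older than 2.14.6 together with fsspec 2023.10.0 or newer). It does not cover the later 2.17/2.18 regression mentioned in the edit below.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def release_tuple(version_string):
    """Turn '2023.10.0' into (2023, 10, 0); stops at the first non-numeric piece."""
    numbers = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def looks_affected(datasets_version, fsspec_version):
    """True for the combination this answer describes (assumed thresholds)."""
    return (
        release_tuple(datasets_version) < (2, 14, 6)
        and release_tuple(fsspec_version) >= (2023, 10, 0)
    )


if __name__ == "__main__":
    ds = installed_version("datasets")
    fs = installed_version("fsspec")
    print("datasets:", ds, "| fsspec:", fs)
    if ds and fs:
        print("matches the broken combination:", looks_affected(ds, fs))
```

This reads the versions installed on disk. As a comment below notes, a running notebook kernel keeps the old modules loaded until it is restarted, so restart the kernel after upgrading.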



Edit: It looks like this broke again in 2.17 and 2.18; downgrading to 2.16 should work.

Ronaronal answered 6/11, 2023 at 17:37 Comment(3)
This worked only after I restarted my kernel. I didn't explicitly install fsspec either. Thanks! - Godunov
pip install -U datasets worked for me when running dataset = load_dataset("OpenAssistant/oasst1") - Morgun
Installing from the pip repo now works without problems, but the Conda repo still has the issue. - Dickman
0

I managed to get around it by deleting the files from the cached Hugging Face datasets folder. This is not the best way to solve the problem, but loading worked afterwards. Do bear in mind that I was only using datasets for one dataset, so it didn't affect anything else.
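
A rough sketch of that cleanup in Python, for anyone who would rather remove a single dataset than the whole folder. Everything here rests on assumptions: the cache is taken to live at ~/.cache/huggingface/datasets unless HF_DATASETS_CACHE or HF_HOME is set, a Hub dataset such as "user/name" is taken to be stored in a folder called "user___name", and the function names are made up for illustration. Check the folder on your own machine before deleting anything.

```python
import os
import shutil
from pathlib import Path


def datasets_cache_dir(env=None):
    """Guess the datasets cache folder (assumed defaults, see the note above)."""
    env = os.environ if env is None else env
    if env.get("HF_DATASETS_CACHE"):
        return Path(env["HF_DATASETS_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "datasets"
    return Path.home() / ".cache" / "huggingface" / "datasets"


def remove_cached_dataset(dataset_id, cache_dir, dry_run=True):
    """Delete the cached folder for one dataset; return its path, or None if absent.

    With dry_run=True (the default) nothing is deleted, the path is only reported.
    """
    # Assumption: "user/name" is cached under a folder named "user___name".
    target = Path(cache_dir) / dataset_id.replace("/", "___")
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)
    return target


if __name__ == "__main__":
    cache = datasets_cache_dir()
    found = remove_cached_dataset(
        "jamescalam/agent-conversations-retrieval-tool", cache, dry_run=True
    )
    print("cache folder:", cache)
    print("would delete:", found)
```

Run it once with dry_run=True to confirm the path, then switch to dry_run=False. The next load_dataset call will download the data again.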

Bedelia answered 5/3 at 23:44 Comment(0)
