Importing COCO datasets to google colaboratory
The COCO dataset is too large for me to upload to Google Colab. Is there any way I can download the dataset directly into Google Colab?

Vietnam answered 7/4, 2019 at 8:18 Comment(2)
Hi there and welcome to SO. Can you provide more details, for example what "very large" means, and the steps to reproduce your issue? A screenshot with more details of what you see can also help. – Rib
It's around 20 GB including the train, test, and validation sets. I want to import the dataset directly from cocodataset.org into my Google Colab notebook. – Vietnam
A
5

You can download it directly with wget:

!wget http://images.cocodataset.org/zips/train2017.zip

Also, you should use a GPU instance, which gives you a larger local disk (about 350 GB).
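As a sketch, the same download-and-unzip flow can be done from Python so the archive lands on the Colab VM's local disk. The destination directory and helper name here are my own assumptions, not part of the answer:

```python
import os
import urllib.request
import zipfile

def fetch_and_extract(url, dest_dir="/content/coco"):
    """Download a zip archive (e.g. COCO train2017.zip) and extract it
    onto the Colab VM's local disk. Skips the download if the archive
    is already present."""
    os.makedirs(dest_dir, exist_ok=True)
    zip_path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(zip_path):
        urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return zip_path

# e.g. fetch_and_extract("http://images.cocodataset.org/zips/train2017.zip")
```

Note that anything written to the VM's local disk disappears when the Colab session ends, so this is best for data you can cheaply re-download.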

Along answered 7/4, 2019 at 10:34 Comment(2)
I'm wondering why Google doesn't offer these datasets directly in Colab. They are wasting a lot of resources, since this dataset is probably loaded into Colab several thousand times a day. – Raeannraeburn
@Raeannraeburn The Keras API already ships pre-trained models; that's probably why they don't bother. – Ardelia
E
6

One more approach is to upload just the annotations file to Google Colab; there is no need to download the image dataset itself. We make use of the PyCoco API. Then, when preparing an image, instead of reading the image file from Drive or a local folder, you can read it from its URL:

import skimage.io as io

# The usual method: read from a local folder / Drive
I = io.imread('%s/images/%s/%s' % (dataDir, dataType, img['file_name']))

# Instead, load the image directly from its URL
I = io.imread(img['coco_url'])

This method saves you plenty of space, download time, and effort. However, you'll need a working internet connection during training to fetch the images (which you have anyway, since you are using Colab).
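To make the fallback concrete, here is a small helper sketch (the `data_dir`/`data_type` layout mirrors the snippet above; the function name is my own assumption) that returns a local path when the image file is on disk and the `coco_url` otherwise:

```python
import os

def image_source(img, data_dir=None, data_type=None):
    """Return a local path to the image if it exists under
    data_dir/images/data_type/, otherwise the image's coco_url.
    `img` is a COCO image record, e.g.
    {'file_name': ..., 'coco_url': ...}."""
    if data_dir and data_type:
        path = os.path.join(data_dir, "images", data_type, img["file_name"])
        if os.path.exists(path):
            return path
    return img["coco_url"]
```

Since `io.imread` accepts either a path or a URL, training code can stay unchanged whether the images live locally or are fetched over HTTP.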

If you are interested in working with the COCO dataset, have a look at my post on Medium.

Etana answered 6/5, 2020 at 7:28 Comment(0)
G
1

You can download it to Google Drive and then mount the drive in Colab:

from google.colab import drive
drive.mount('/content/drive')

Then you can cd to the folder containing the dataset, e.g.

import os
os.chdir("drive/My Drive/cocodataset")


Gennie answered 9/4, 2019 at 1:35 Comment(1)
How much space does COCO occupy? Does Drive require a paid subscription? – Mauchi
S
1

Using Drive is better for reuse later. Also, unzip the archive in Colab (!unzip) rather than with the zip extractor on Drive, which takes much longer. I've tried both. :D

Sheriesherif answered 6/2, 2021 at 18:37 Comment(0)
L
0

These days, the easiest way to download COCO is with the Python tool FiftyOne. It lets you download, visualize, and evaluate the full dataset, or any subset you are interested in.

It also works directly in Colab so you can perform your entire workflow there.

import fiftyone as fo
import fiftyone.zoo as foz

#
# Only the required images will be downloaded (if necessary).
# By default, only detections are loaded
#

dataset = foz.load_zoo_dataset(
    "coco-2017",
    splits=["validation","train"],
    classes=["person", "car"],
    # max_samples=50,
)

# Visualize the dataset in the FiftyOne App
session = fo.launch_app(dataset)


Litigable answered 19/10, 2021 at 15:9 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.