Is there a command to download data from a particular folder of a Kaggle competition using the Kaggle API?

I'm trying to download data from the Kaggle competition state-farm-distracted-driver-detection.

The dataset has the following directory structure:

|-driver_imgs_list.csv
|-sample-submission.csv
|imgs
|   |test
|   |train
|       |c0
|       |c1
|       |c2
|          |-img_100029.jpg
|          |-img_100108.jpg

I only want to download the imgs/train/c2 folder. I know how to download the full dataset and individual files, but I can't figure out how to download a particular folder using the API.
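If downloading the full competition archive once is acceptable, one workaround is to extract only the members under imgs/train/c2/ afterwards. This is a minimal sketch using only Python's standard zipfile module; `extract_folder` is a hypothetical helper, and the archive path is an assumption (the CLI usually saves the download as a zip named after the competition, but check your download directory).

```python
import zipfile


def extract_folder(zip_path, folder, dest):
    """Extract only the archive members under `folder` (e.g. "imgs/train/c2").

    `zip_path` may be a file path or an open binary file object.
    Returns the list of extracted member names.
    """
    prefix = folder.rstrip("/") + "/"
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Keep files inside the folder; skip other folders and directory entries.
            if name.startswith(prefix) and not name.endswith("/"):
                zf.extract(name, dest)
                extracted.append(name)
    return extracted


# Hypothetical usage, assuming the archive was saved locally under this name:
# extract_folder("state-farm-distracted-driver-detection.zip", "imgs/train/c2", "data")
```

This still transfers the whole dataset, so it only helps if bandwidth is not the constraint; it avoids one API request per image.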

First I tried the Kaggle CLI, with which I can download a particular image as follows:

kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2/img_100029.jpg

But when I try the following command to download the c2 folder, I get a "not found" error:

kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2
404 - Not Found

Is there a command to download a particular folder from a competition using the Kaggle API?
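Since the -f option takes the name of a single file (the folder path above returns 404), one CLI-only workaround is to issue one command per file. This is a sketch, not an official feature: `build_download_commands` and `run_commands` are hypothetical helpers, the list of file paths has to come from elsewhere (for example driver_imgs_list.csv), and the return codes are whatever the CLI reports; I am not asserting how it signals a 404.

```python
import shlex
import subprocess


def build_download_commands(competition, file_paths):
    """One `kaggle competitions download <competition> -f <file>` command per file."""
    return [
        ["kaggle", "competitions", "download", competition, "-f", path]
        for path in file_paths
    ]


def run_commands(commands, dry_run=True):
    """Print the commands (dry run) or execute them and collect return codes."""
    results = []
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
            continue
        # check=False so that one failing file does not stop the remaining downloads.
        completed = subprocess.run(cmd, check=False)
        results.append((cmd, completed.returncode))
    return results
```

Running with dry_run=True first shows exactly which commands would be issued.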

As another attempt, I used the Kaggle API from Python to download that folder.

My idea was this: the file "driver_imgs_list.csv" lists every image file together with its class name (c0, c1, c2, ...). Since I want the c2 class folder, I used pandas to collect the c2 image file names and then tried to download each file in a for loop:

from kaggle.api.kaggle_api_extended import KaggleApi
import pandas as pd

# Reads credentials from ~/.kaggle/kaggle.json (or KAGGLE_USERNAME / KAGGLE_KEY)
api = KaggleApi()
api.authenticate()

data = pd.read_csv("driver_imgs_list.csv")

# All image file names listed under class c2
images = data[data["classname"] == "c2"]["img"].tolist()

for img in images:
    file_name = "imgs/train/c2/{}".format(img)
    api.competition_download_file('state-farm-distracted-driver-detection', file_name, quiet=False, force=True)

Even with the above code I get the same "not found" error:

HTTP response body: b'{"code":404,"message":"NotFound"}'
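To see whether every file fails or only some of them, a small wrapper can record failures and keep going instead of stopping at the first 404. In this sketch `download_all` is a hypothetical helper and `download_fn` is any callable taking (competition, file_name); with the authenticated client from the code above you could pass `api.competition_download_file`. It catches a bare Exception on purpose, because I am not asserting which exception class the Kaggle client raises.

```python
def download_all(download_fn, competition, file_paths):
    """Try every file; return (succeeded, failed) instead of stopping at the first error."""
    succeeded, failed = [], []
    for path in file_paths:
        try:
            download_fn(competition, path)
        except Exception as exc:  # e.g. the 404 "NotFound" response shown above
            failed.append((path, str(exc)))
        else:
            succeeded.append(path)
    return succeeded, failed


# Hypothetical usage with the client and file list from the question:
# ok, bad = download_all(api.competition_download_file,
#                        'state-farm-distracted-driver-detection', paths)
```

If only some names fail, the problem is likely specific files rather than the download call itself.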

How can I download a particular folder, either with the Kaggle CLI or from Python?

Herbartian answered 9/3, 2020 at 19:10
did you solve it? – Pastrami

Could it be that the error message is accurate, and the file really is not in the dataset's folder?

Another idea is that it has something to do with the order (?), because I was able to get your code running when calling .sort_values() on the Series of image names:

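import pandas as pd
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
data = pd.read_csv('driver_imgs_list.csv')
# Build "imgs/train/c2/<name>" paths for class c2, sorted by file name
filenames = 'imgs/train/c2/' + data[data['classname'] == 'c2']['img'].sort_values()

for filename in filenames:
    api.competition_download_file('state-farm-distracted-driver-detection', filename)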

However, I only let it run for about 10 files, so again, there may be a mismatch between the files listed in the CSV file and the files actually available in the dataset.
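One way to check for such a mismatch is to compare the paths built from driver_imgs_list.csv with a listing of the files the competition actually serves. This is a standard-library sketch; `class_paths` and `compare_listings` are hypothetical helpers, it assumes the CSV has the `classname` and `img` columns used above, and it assumes you have obtained the available file names separately (for example from the output of `kaggle competitions files state-farm-distracted-driver-detection`) as a list of paths.

```python
import csv


def class_paths(csv_file, classname, folder="imgs/train"):
    """Build "<folder>/<classname>/<img>" paths for one class from driver_imgs_list.csv."""
    reader = csv.DictReader(csv_file)
    return [
        f"{folder}/{classname}/{row['img']}"
        for row in reader
        if row["classname"] == classname
    ]


def compare_listings(csv_paths, available_paths):
    """Return (listed_but_unavailable, available_but_unlisted) as sorted lists."""
    listed, available = set(csv_paths), set(available_paths)
    return sorted(listed - available), sorted(available - listed)


# Hypothetical usage:
# with open("driver_imgs_list.csv", newline="") as f:
#     paths = class_paths(f, "c2")
# missing, extra = compare_listings(paths, available_paths)
```

A non-empty first list would mean the CSV names files that the API cannot serve, which would explain the 404s.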

Cyanite answered 12/3, 2020 at 9:4

© 2022 - 2024 — McMap. All rights reserved.