Documentation for the Kaggle API *within* Python?

I want to write a python script that downloads a public dataset from Kaggle.com.

The Kaggle API is written in Python, but almost all of the documentation and resources I can find explain how to use the API from the command line, and very little explains how to use the kaggle library from within Python.

Some users seem to know how to do this (see, for example, several answers to this question), but their hints are not enough to resolve my specific issue.

Namely, I have a script that looks like this:

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi('content of my json metadata file')

file = api.datasets_download_file(
    owner_slug='the-owner-slug',
    dataset_slug='the-dataset-slug',
    file_name='the-file-name.csv',
)

I have come up with this by looking at the method's signature:
api.datasets_download_file(owner_slug, dataset_slug, file_name, **kwargs)

I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 12: invalid start byte

Beyond the solution to this specific problem, I would really like to know how to troubleshoot errors with the Kaggle library other than by reading its source code. Perhaps the issue has nothing to do with UTF-8 decoding at all, but I don't know how to find out. What if the file name is simply wrong, or something equally silly?

The CSV file is nothing special: three columns, the first a timestamp and the other two integers.

Nuthouse answered 1/5, 2019 at 10:19
Have you tried the Kaggle CLI (github.com/floydwch/kaggle-cli)? You can already download datasets with it. This is the command to download the files: kg download -u <username> -p <password> -c <competition> - Blessington
This is not what I am asking about. I am asking about Kaggle as a Python library to be used in a Python script. - Nuthouse

I published a blog post that explains most of the common use cases for competition, dataset and kernel interactions.

Here are the steps involved in using the Kaggle API from within Python.

Setting up API Key

Go to your Kaggle account tab at https://www.kaggle.com/<username>/account and click ‘Create API Token’. A file named kaggle.json will be downloaded. Move this file into the ~/.kaggle/ folder on Mac and Linux, or into C:\Users\<username>\.kaggle\ on Windows.
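
If you would rather script this step, here is a minimal standard-library sketch. The helper names are hypothetical, it copies (rather than moves) the key, and it assumes the client also honors a KAGGLE_CONFIG_DIR environment variable as an alternative location, as the 1.5.x client does (check your version). Restricting the file to owner read/write also avoids the "readable by other users" warning the client prints on Linux and Mac.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def kaggle_config_dir() -> Path:
    """Folder the client is assumed to read kaggle.json from.

    KAGGLE_CONFIG_DIR wins if set (assumption based on the 1.5.x client);
    otherwise ~/.kaggle, where Path.home() is the user's profile folder on Windows.
    """
    override = os.environ.get("KAGGLE_CONFIG_DIR")
    return Path(override) if override else Path.home() / ".kaggle"


def install_kaggle_json(downloaded: Path, config_dir: Optional[Path] = None) -> Path:
    """Copy a downloaded kaggle.json into the config folder and return its new path."""
    target_dir = Path(config_dir) if config_dir is not None else kaggle_config_dir()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)
    # Owner read/write only (chmod 600); on Windows this only affects the read-only flag.
    os.chmod(target, 0o600)
    return target
```

For example, install_kaggle_json(Path.home() / "Downloads" / "kaggle.json") puts the key in place; the Downloads path is only an example.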

Alternatively, you can populate the KAGGLE_USERNAME and KAGGLE_KEY environment variables with the values from kaggle.json to let the API authenticate.
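
A minimal sketch of the environment-variable route, assuming kaggle.json has its usual {"username": ..., "key": ...} layout. export_kaggle_env is a hypothetical helper; call it before anything from the kaggle package is imported, so the variables are already set when the client looks for credentials.

```python
import json
import os
from pathlib import Path


def export_kaggle_env(json_path: Path) -> None:
    """Set KAGGLE_USERNAME and KAGGLE_KEY from a kaggle.json file.

    Intended to run before `import kaggle`. Assumes the file looks like
    {"username": "...", "key": "..."}.
    """
    creds = json.loads(Path(json_path).read_text())
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
```

In a notebook or CI job you would more typically set the two variables from a secret store instead of a file; the effect on the client is the same.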

Authenticating With API Server

from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
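
When authentication fails, the error message does not always point at the cause. A small pre-flight check (a hypothetical helper, not part of the kaggle package) can report which credential source is available before you import anything from kaggle. As far as the 1.5.x client goes, importing the kaggle package already calls authenticate() (see kaggle/__init__.py in your install), so such a check has to run before the import line, not just before api.authenticate().

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_kaggle_credentials(env: Optional[Mapping[str, str]] = None,
                            home: Optional[Path] = None) -> str:
    """Return "env" or "file" for the first credential source found, else raise.

    Hypothetical pre-flight check; the env-before-file order mirrors the
    1.5.x client but is an assumption for other versions.
    """
    env = os.environ if env is None else env
    # Environment variables: both must be present and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return "env"
    # Otherwise look for kaggle.json, honoring a KAGGLE_CONFIG_DIR override.
    override = env.get("KAGGLE_CONFIG_DIR")
    base = Path(override) if override else (home or Path.home()) / ".kaggle"
    if (base / "kaggle.json").is_file():
        return "file"
    raise FileNotFoundError(
        f"No Kaggle credentials: set KAGGLE_USERNAME and KAGGLE_KEY, "
        f"or create {base / 'kaggle.json'}"
    )
```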

Downloading Datasets

# Download all files of a dataset
# Signature: dataset_download_files(dataset, path=None, force=False, quiet=True, unzip=False)
api.dataset_download_files('avenn98/world-of-warcraft-demographics')

# Download a single file
# Signature: dataset_download_file(dataset, file_name, path=None, force=False, quiet=True)
api.dataset_download_file('avenn98/world-of-warcraft-demographics', 'WoW Demographics.csv')
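
To rule out a wrong file name (one of the suspicions in the question) before downloading, you can list the dataset's files first. This sketch assumes api.dataset_list_files(dataset) returns an object with a .files list whose items have a .name attribute, as in the 1.5.x client; newer versions may paginate that listing, so a long file list could come back truncated. Both helpers are hypothetical, and api is any authenticated KaggleApi instance.

```python
from typing import List


def list_dataset_file_names(api, dataset: str) -> List[str]:
    """Names of the files in a dataset, via api.dataset_list_files().

    Assumes the result has a `.files` list whose items expose `.name`
    (1.5.x client); verify against your installed version.
    """
    return [f.name for f in api.dataset_list_files(dataset).files]


def download_dataset_file(api, dataset: str, file_name: str, path: str = ".") -> None:
    """Download one file, failing early with a readable error on a bad name."""
    names = list_dataset_file_names(api, dataset)
    if file_name not in names:
        raise ValueError(f"{file_name!r} not in {dataset}; available: {names}")
    api.dataset_download_file(dataset, file_name, path=path)
```

With an authenticated api, download_dataset_file(api, 'avenn98/world-of-warcraft-demographics', 'WoW Demographics.csv', path='data') raises a ValueError that lists the real names if the file name is misspelled.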

Downloading Competition Files

# Download all files for a competition
# Signature: competition_download_files(competition, path=None, force=False, quiet=True)
api.competition_download_files('titanic')

# Download single file for a competition
# Signature: competition_download_file(competition, file_name, path=None, force=False, quiet=False)
api.competition_download_file('titanic', 'gender_submission.csv')
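
Unlike dataset_download_files, the competition_download_files signature above has no unzip parameter, so you may want to extract the archive yourself. This sketch assumes the client saves a single <competition>.zip into path (the 1.5.x behaviour; adjust if your version names it differently) and that you have already accepted the competition's rules on the website, since otherwise the download is refused. download_competition is a hypothetical helper.

```python
import zipfile
from pathlib import Path
from typing import List


def download_competition(api, competition: str, path: str = ".") -> List[str]:
    """Download every competition file and extract the archive into `path`.

    Assumes competition_download_files() writes <path>/<competition>.zip
    (1.5.x behaviour). Returns the names of the extracted members.
    """
    api.competition_download_files(competition, path=path)
    archive = Path(path) / f"{competition}.zip"
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path)
        return zf.namelist()
```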

Submitting to competitions

# Signature: competition_submit(file_name, message, competition, quiet=False)
api.competition_submit('gender_submission.csv', 'API Submission', 'titanic')
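
A cheap local check before submitting can catch a malformed file early. This hypothetical helper compares the CSV header with the one from the competition's sample submission (PassengerId,Survived for titanic, as in gender_submission.csv) and only then calls competition_submit with the argument order from the signature above.

```python
import csv
from typing import Sequence


def submit_csv(api, file_name: str, message: str, competition: str,
               expected_header: Sequence[str]) -> None:
    """Check the CSV header locally, then call api.competition_submit().

    `expected_header` should come from the competition's sample submission.
    """
    with open(file_name, newline="") as fh:
        header = next(csv.reader(fh), [])
    if list(header) != list(expected_header):
        raise ValueError(f"{file_name}: header {header} != {list(expected_header)}")
    api.competition_submit(file_name, message, competition)
```

For example: submit_csv(api, 'gender_submission.csv', 'API Submission', 'titanic', ('PassengerId', 'Survived')).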

Retrieving Leader Board

# Signature: competition_view_leaderboard(id, **kwargs)
leaderboard = api.competition_view_leaderboard('titanic')
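
What competition_view_leaderboard returns may differ between client versions, so this sketch for saving leaderboard rows to CSV accepts entries as either dicts or attribute objects; you may first need to pull the list of entries out of the returned value. The default field names teamName and score are an assumption (they match the columns the 1.5.x CLI prints for a leaderboard), so inspect one entry before relying on them. leaderboard_to_csv is a hypothetical helper.

```python
import csv
from typing import Any, Iterable, Sequence


def _field(entry: Any, name: str) -> Any:
    # Works whether entries are dicts or objects with attributes.
    return entry[name] if isinstance(entry, dict) else getattr(entry, name)


def leaderboard_to_csv(entries: Iterable[Any], out_path: str,
                       fields: Sequence[str] = ("teamName", "score")) -> int:
    """Write the selected fields of each entry to a CSV file; return the row count.

    The default field names are an assumption; check them against your client.
    """
    rows = 0
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(fields)
        for entry in entries:
            writer.writerow([_field(entry, f) for f in fields])
            rows += 1
    return rows
```
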
Octant answered 19/2, 2020 at 21:54

You can check the implementation of the Kaggle API.

But if you are lazy, you can just install kaggle on your server with pip install kaggle.

Then, to download a whole competition, you can call the CLI from Python:

import os
os.system('kaggle competitions download -c "dogs-vs-cats-redux-kernels-edition"')
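
If you do stay with the CLI, subprocess.run with an argument list is a somewhat safer sketch than os.system: nothing goes through a shell, and check=True turns a failed download into a CalledProcessError instead of an exit code that is easy to ignore. The helper names are hypothetical; -p is the kaggle CLI's option for the download directory.

```python
import subprocess
from typing import List, Optional


def competition_download_cmd(competition: str, path: Optional[str] = None) -> List[str]:
    """Build `kaggle competitions download` as an argument list (no shell quoting)."""
    cmd = ["kaggle", "competitions", "download", "-c", competition]
    if path is not None:
        cmd += ["-p", path]
    return cmd


def run_kaggle(cmd: List[str]) -> str:
    """Run a command, return its stdout; raise CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run_kaggle(competition_download_cmd("dogs-vs-cats-redux-kernels-edition", "data")) downloads into ./data and returns whatever the CLI printed.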

For more, check the docs.

Industry answered 1/5, 2019 at 14:46
Thanks, I am aware of all that. Right now I am using the "lazy" solution of invoking os.system, but it is simply absurd, given that kaggle is itself written in Python. - Nuthouse
Your comment does not tell me anything I didn't already know or hadn't already tried, so no vote this time, I guess. - Nuthouse
So indeed, the best resource I have been able to find so far is the actual source for the API: github.com/Kaggle/kaggle-api/blob/master/kaggle/api/… - Armillary
