I want to write a Python script that downloads a public dataset from Kaggle.com.
The Kaggle API is written in Python, but almost all of the documentation and resources that I can find are about using the API from the command line, and very little covers how to use the kaggle library within Python.
Some users seem to know how to do this, see for example several answers to this question, but the hints are not enough to resolve my specific issue.
Namely, I have a script that looks like this:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi('content of my json metadata file')
file = api.datasets_download_file(
    owner_slug='the-owner-slug',
    dataset_slug='the-dataset-slug',
    file_name='the-file-name.csv',
)
I have come up with this by looking at the method's signature:
api.datasets_download_file(owner_slug, dataset_slug, file_name, **kwargs)
I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 12: invalid start byte
Beyond the solution to this specific problem, I would be really happy to know how to go about troubleshooting errors with the Kaggle library, other than by reading through its source code. In fact, perhaps the issue has nothing to do with UTF-8 encoding at all, but I don't know how to figure this out. What if the file name is simply wrong, or something equally silly?
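The error message itself narrows things down a little: something is trying to decode bytes as UTF-8 text, and the byte at offset 12 (0xa1) cannot start a UTF-8 character. One possible reading, and it is only an assumption, is that the bytes being decoded are not plain text at all (for example a compressed archive). A minimal sketch for checking this, assuming the raw bytes can be obtained somehow (say, from a file saved to disk); `sniff_payload` is a hypothetical helper, not part of the kaggle library:

```python
def sniff_payload(raw: bytes) -> str:
    """Guess what a downloaded payload is by looking at its first bytes.

    Hypothetical troubleshooting helper; it only knows about gzip, zip
    and UTF-8 text, which is an assumption about the likely candidates.
    """
    # gzip streams start with the two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        return "gzip"
    # A zip archive containing files starts with a local file header, "PK\x03\x04".
    if raw[:4] == b"PK\x03\x04":
        return "zip"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte,
        # the same number the traceback reports as "position".
        return f"not utf-8 (byte {raw[err.start]:#04x} at position {err.start})"
    return "utf-8 text"
```

If this reports `gzip` or `zip`, the encoding error would be a symptom of decoding binary data rather than a problem with the CSV itself; if it reports `utf-8 text`, the cause lies elsewhere.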
The csv file is nothing special: three columns, the first is a timestamp and the other two are integers.
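For comparison, here is a sketch of the route through the higher-level wrapper methods on `KaggleApi` instead of the generated `datasets_download_file` call. It assumes credentials are available in the usual place (a `kaggle.json` file or the `KAGGLE_USERNAME` / `KAGGLE_KEY` environment variables) so that `KaggleApi()` can be constructed without arguments and `authenticate()` picks them up. `build_dataset_ref` and `download_one_file` are hypothetical names, and the `data` destination folder is likewise an assumption:

```python
from pathlib import Path


def build_dataset_ref(owner_slug: str, dataset_slug: str) -> str:
    """Join the two slugs into the 'owner/dataset' form the wrapper methods take."""
    return f"{owner_slug}/{dataset_slug}"


def download_one_file(owner_slug, dataset_slug, file_name, dest="data", api=None):
    """Download a single dataset file into `dest` and return that directory.

    `api` may be passed in (e.g. a stub for testing); otherwise a KaggleApi
    client is created and authenticated from the stored credentials.
    """
    if api is None:
        # Imported lazily so the helper can be exercised without the package.
        from kaggle.api.kaggle_api_extended import KaggleApi

        api = KaggleApi()
        api.authenticate()

    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # The wrapper writes to disk itself rather than returning the content.
    api.dataset_download_file(
        build_dataset_ref(owner_slug, dataset_slug),
        file_name,
        path=str(dest_path),
    )
    return dest_path
```

Calling `download_one_file('the-owner-slug', 'the-dataset-slug', 'the-file-name.csv')` should leave the file in `data/`, possibly as a zip archive depending on the dataset. Whether this avoids the `UnicodeDecodeError` above is untested here.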
kg download -u <username> -p <password> -c <competition> – Blessington
Kaggle as a python library to be used in a python script. – Nuthouse