Seaborn load_dataset

7

67

I am trying to get a grouped boxplot working using Seaborn, as per the example.

I can get the above example working, however the line:

tips = sns.load_dataset("tips")

is not explained at all. I have located the tips.csv file, but I can't seem to find adequate documentation on what load_dataset specifically does. I tried to create my own csv and load this, but to no avail. I also renamed the tips file and it still worked...

My question is thus:

Where is load_dataset actually looking for files? Can I actually use this for my own boxplots?

EDIT: I managed to get my own boxplots working using my own DataFrame, but I am still wondering whether load_dataset is used for anything more than mysterious tutorial examples.

Shutter answered 19/5, 2015 at 21:16 Comment(1)

load_dataset is just a convenience function for the seaborn documentation. – Misunderstood 20/5, 2015 at 0:40

78

load_dataset fetches CSV files from the online repository at https://github.com/mwaskom/seaborn-data. Here's the docstring:

Load a dataset from the online repository (requires internet).

Parameters

name : str
    Name of the dataset (name.csv on https://github.com/mwaskom/seaborn-data). You can obtain a list of available datasets using :func:`get_dataset_names`.

kws : dict, optional
    Passed to pandas.read_csv.

If you want to modify that online dataset or bring in your own data, you likely have to use pandas. load_dataset actually returns a pandas DataFrame object, which you can confirm with type(tips).

If you already created your own data in a csv file called, say, tips2.csv, and saved it in the same location as your script, use this (after installing pandas) to load it in:

import pandas as pd

tips2 = pd.read_csv('tips2.csv')
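As a minimal sketch of the whole round trip, the snippet below reads CSV data with pandas and passes the resulting DataFrame to a grouped boxplot. The column names and values are made up for illustration, and an in-memory string stands in for a local file such as tips2.csv so the example runs anywhere.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Hypothetical contents standing in for a local file like tips2.csv.
csv_text = """day,sex,total_bill
Thur,Male,18.2
Thur,Female,14.5
Fri,Male,21.0
Fri,Female,16.3
Thur,Male,25.1
Fri,Female,12.9
"""

# With a real file this would be pd.read_csv('tips2.csv').
tips2 = pd.read_csv(io.StringIO(csv_text))

# Grouped boxplot: one box per (day, sex) combination.
ax = sns.boxplot(x="day", y="total_bill", hue="sex", data=tips2)
```

Any DataFrame works here; seaborn only needs the column names passed as x, y, and hue to exist in the data.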

Kuroshio answered 19/5, 2015 at 22:40 Comment(1)

Weird that the load_dataset documentation doesn't actually state what it returns. I know that it is obvious to those that have used it a couple of times, but how can one not document that basic fact? https://seaborn.pydata.org/generated/seaborn.load_dataset.html – Readily 29/10, 2019 at 8:3

11

Just to add to 'selwyth's' answer.

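import pandas as pd

# Raw string so backslashes are not treated as escapes; a backslash right before the closing quote is a syntax error.
Data = pd.read_csv(r'Path\to\csv')
Data.head(10)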

Once these steps complete successfully, plotting works like this.

Let's say you want to plot a bar plot.

sns.barplot(x=Data.Year, y=Data.Salary)  # Year and Salary are columns in my dataset

The same pattern works with seaborn's other plotting functions.

Also, we cannot add our own datasets to the seaborn Git repository.

Chrysoberyl answered 21/3, 2018 at 11:10 Comment(0)

0

load_dataset is for seaborn's own example datasets. If you want to use your own dataset, read it with pandas first; after that you can use seaborn functions to draw plots. For example, in a Jupyter Notebook I saved my own dataset on my local drive and read it like this:

import pandas as pd
import seaborn as sns

AI_df = pd.read_csv('AI.csv')
ai_cor = AI_df.corr()
sns.heatmap(ai_cor, annot=True, cmap='coolwarm', linewidths=1)
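One caveat with this pattern, sketched below with made-up data: if the DataFrame contains text columns, recent pandas versions may raise an error when computing the correlation matrix. Passing numeric_only=True (available in pandas 1.5 and later, which is an assumption about the installed version) restricts the calculation to numeric columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
import seaborn as sns

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
})

# Only the numeric columns x and y enter the correlation matrix.
cor = df.corr(numeric_only=True)

ax = sns.heatmap(cor, annot=True, cmap="coolwarm", linewidths=1)
```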

Filia answered 25/3, 2023 at 10:25 Comment(0)

0

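Try passing cache=False, which makes load_dataset fetch the file from the online repository again instead of reading a locally cached copy: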

tips = sns.load_dataset("tips", cache=False)
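To see where cached copies live on your machine, the sketch below uses sns.get_data_home(), which returns the directory load_dataset uses for its cache and creates it if it does not exist. The exact location depends on the seaborn version and operating system, so treat the printed path as specific to your setup.

```python
import os

import seaborn as sns

# Directory that load_dataset uses for cached copies of the example CSVs.
cache_dir = sns.get_data_home()
print(cache_dir)

# Files already cached there (empty if nothing has been downloaded yet).
print(sorted(os.listdir(cache_dir)))
```

If the tips.csv you renamed was not in this directory, or if load_dataset simply downloaded a fresh copy, that would explain why the example kept working.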

Necrose answered 10/1 at 2:16 Comment(0)

0

If you are using Colab, for instance, and want to see which datasets load_dataset can load, run the command below:

sns.get_dataset_names()

It returns the names of all the datasets it can load:

['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'dowjones',
 'exercise',
 'flights',
 'fmri',
 'geyser',
 'glue',
 'healthexp',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'seaice',
 'taxis',
 'tips',
 'titanic']

NB: You can also upload files from your computer or from external sources to your Colab workspace. Click the folder icon on the left side of the workspace to display its files and folders, right-click in an empty space below the files, and select 'upload'. You can then read the uploaded file with the lines below (in my case the file is goodreads.csv):

import pandas as pd
df = pd.read_csv('goodreads.csv', on_bad_lines='skip')
print(df)

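Use on_bad_lines='skip' if some rows are malformed: it tells pandas to skip lines with too many fields instead of raising a parsing error. It is not needed for null, NaN, or empty cells, which are read as missing values anyway.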
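A small check of what on_bad_lines='skip' does in practice, using made-up data in place of goodreads.csv. The parameter requires pandas 1.3 or later. In this sketch the row with an extra field is dropped, while the row with an empty cell is kept and read as NaN.

```python
import io

import pandas as pd

# Hypothetical CSV: row B has an empty cell, row C has one field too many.
raw = "title,rating\nA,4.5\nB,\nC,3.9,extra\nD,4.1\n"

df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

print(df)
print(df["rating"].isna().sum())  # number of missing ratings
```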

Wehrle answered 2/2 at 17:41 Comment(0)

-1

Download all the CSV files (zipped) used in the examples from here.

Extract the zip file to a local directory and launch your Jupyter notebook from the same directory. Then run the following commands in the notebook:

import pandas as pd
tips = pd.read_csv('seaborn-data-master/tips.csv')

You're now ready to work with your example!

Sidon answered 11/5, 2018 at 3:18 Comment(0)

-1

You need an internet connection: the CSV files are not stored on your local computer, so it must be online to download the dataset.

Bobbi answered 5/3, 2022 at 9:3 Comment(0)
