How to read CSV file into Jupyter Notebook

Asked 1/10, 2019 at 13:10 Answered 24/4, 2024 at 6:50

I have been having issues reading a CSV file into Jupyter Notebook. This is the code:

import pandas as pd
mpg = pd.read_csv('C:/Users/Ajibola/Documents/mpg.csv')
mpg.head()

And this is the error I got:

File "<ipython-input-138-844bace16611>", line 1
    mpg = pd.read_csv('C:\Users\Ajibola\Documents\mpg.csv')
                     ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

After prefixing the path with r, I got this error:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-140-a1289650ba91> in <module>
----> 1 mpg = pd.read_csv(r'C:\Users\Ajibola\Documents\mpg.csv')
      2 mpg.head()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701 
--> 702         return _read(filepath_or_buffer, kwds)
    703 
    704     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    427 
    428     # Create the parser.
--> 429     parser = TextFileReader(filepath_or_buffer, **kwds)
    430 
    431     if chunksize or iterator:

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
    893             self.options['has_index_names'] = kwds['has_index_names']
    894 
--> 895         self._make_engine(self.engine)
    896 
    897     def close(self):

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
   1120     def _make_engine(self, engine='c'):
   1121         if engine == 'c':
-> 1122             self._engine = CParserWrapper(self.f, **self.options)
   1123         else:
   1124             if engine == 'python':

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in __init__(self, src, **kwds)
   1851         kwds['usecols'] = self.usecols
   1852 
-> 1853         self._reader = parsers.TextReader(src, **kwds)
   1854         self.unnamed_cols = self._reader.unnamed_cols
   1855 

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._get_header()

UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte

I've searched the community for related problems and answers but have made no headway. An answer would be really appreciated.

Androcles answered 1/10, 2019 at 13:10 Comment(5)

What OS are you running Jupyter in? Are you sure it's Windows and not running inside a Linux container or a remote / virtual machine? I recommend you use pathlib rather than strings for referencing file paths. Another thought: it could be a weird character in your CSV file, so you might need to specify the encoding. You could try adding an argument like encoding="latin1" to your read_csv call, but you'd have to figure out which encoding was used to create the CSV. – Lott 1/10, 2019 at 13:22

@Lott My bad - thanks for letting me know. – Agueweed 1/10, 2019 at 13:29

Thank you so much @dan. I've resolved it. I was using the wrong folder. Silly me. – Androcles 2/10, 2019 at 14:42

If it was just a typo you should probably delete the question :/ – Lott 2/10, 2019 at 14:46

Okay. Sorry I'm new here. – Androcles 2/10, 2019 at 14:47

Put your .csv file in the same folder as your notebook. A relative path will then work:

import pandas as pd
data = pd.read_csv('data.csv')
print(data)
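
If the relative path still fails, it may help to check which folder the notebook is actually running in, since a relative path is looked up in the current working directory. The following is a minimal sketch under that assumption; the helper name resolve_csv is hypothetical and not part of pandas:

```python
from pathlib import Path


def resolve_csv(name, folder=None):
    """Return the absolute path to `name`, or raise with a helpful message.

    Hypothetical helper for illustration; `folder` defaults to the
    current working directory, where relative paths are resolved.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    candidate = (base / name).resolve()
    if not candidate.is_file():
        # List the CSV files that are present, to make a wrong folder obvious.
        found = sorted(p.name for p in base.glob("*.csv"))
        raise FileNotFoundError(
            f"{candidate} not found; CSV files in {base}: {found}"
        )
    return candidate
```

Calling pd.read_csv(resolve_csv('data.csv')) then either reads the file or reports which CSV files the working folder really contains.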

Varix answered 22/5, 2020 at 13:21 Comment(0)

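import pandas as pd
mpg = pd.read_csv('C://Users//Ajibola//Documents//mpg.csv')
mpg.head()

This works because the first error is a unicode escape error caused by the backslashes in the path: with forward slashes, Python no longer reads \U as the start of an escape sequence. A single forward slash between folders works as well.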
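
For context, in a normal Python string literal \U starts an eight-hex-digit Unicode escape, which is why 'C:\Users\...' fails to parse. The sketch below compares three spellings of the asker's path that avoid the problem. It uses PureWindowsPath only to compare the strings, so it does not touch the file system and the file does not need to exist:

```python
from pathlib import PureWindowsPath

# Three spellings of the same Windows path that avoid the '\U' escape problem.
raw = r'C:\Users\Ajibola\Documents\mpg.csv'         # raw string: backslashes are literal
escaped = 'C:\\Users\\Ajibola\\Documents\\mpg.csv'  # each backslash escaped
forward = 'C:/Users/Ajibola/Documents/mpg.csv'      # forward slashes

# The raw and escaped literals produce the identical string.
same_string = raw == escaped

# Parsed as Windows paths, all three spellings refer to the same location.
same_path = len({PureWindowsPath(p) for p in (raw, escaped, forward)}) == 1

print(same_string, same_path)
```

Any of the three strings can be passed to pd.read_csv on Windows. Note that this only fixes the SyntaxError; it does not help if the file is in a different folder or uses a different encoding.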

Roscoeroscommon answered 15/9, 2020 at 12:33 Comment(0)

Headers refer to the column names of your dataset. For some datasets you might encounter, the headers may be completely missing, partially missing, or they might exist, but you may want to rename them. enter link description here

I hope this article is helpful to you.

Thayer answered 1/10, 2019 at 13:18 Comment(1)

Please quote from the link so readers can understand which part of that tutorial is relevant in this situation. – Lott 1/10, 2019 at 13:25

The error says that the 'utf-8' codec cannot decode the data in your file. This is probably due to special characters in the file. Try passing another encoding (such as 'utf-16' or 'latin-1') as a parameter in your call:

import pandas as pd
mpg = pd.read_csv('C:/Users/Ajibola/Documents/mpg.csv', encoding='utf-16')
mpg.head()

For more info refer to:

the pandas read_csv documentation, to see how to use the encoding parameter, and the list of Python standard encodings.
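
Rather than guessing by hand, one option is to inspect the raw bytes first. The following is a rough sketch, not a reliable detector: the function name guess_encoding is hypothetical, it only recognises a byte order mark or valid UTF-8, and the 'latin-1' fallback accepts any bytes, so the decoded text may still be wrong (for example if the file is really cp1252).

```python
import codecs


def guess_encoding(path):
    """Return a plausible encoding name for the file at `path`.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        raw = fh.read()
    # A UTF-16 byte order mark is a strong hint for 'utf-16'.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 'utf-8-sig' strips a leading UTF-8 byte order mark when decoding.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so it never fails.
        return "latin-1"
```

The result can then be passed on, for example pd.read_csv(path, encoding=guess_encoding(path)).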

Linsk answered 1/10, 2019 at 13:28 Comment(1)

Thank you so much. I've resolved it. I was using the wrong folder. – Androcles 2/10, 2019 at 14:43

On a Mac, try this:

import pandas as pd

data = pd.read_csv("/Users/qiyunchu/Downloads/nyc_temperature_2019.csv")

data.head()
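
The path above is tied to one user's account. As a small sketch, assuming the file sits in the current user's Downloads folder, the same path can be built with pathlib so it does not hard-code the user name:

```python
from pathlib import Path

# Path.home() is the current user's home directory, e.g. /Users/<name> on macOS.
csv_path = Path.home() / "Downloads" / "nyc_temperature_2019.csv"

# Prints the full path that would be passed to pd.read_csv.
print(csv_path)
```

pd.read_csv accepts a Path object directly, so pd.read_csv(csv_path) works without converting it to a string.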

Shrewish answered 24/4, 2024 at 6:50 Comment(0)
