Import pandas DataFrame column as string, not int

I would like to import the following CSV column as strings, not as int64. Pandas read_csv automatically converts it to int64, but I need this column as a string.

ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166
from pandas import read_csv

df = read_csv('sample.csv')

df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Unfortunately using converters gives the same result.

df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID
Slavonic answered 8/11, 2012 at 16:54

Just want to reiterate this will work in pandas >= 0.9.1:

In [2]: read_csv('sample.csv', dtype={'ID': object})
Out[2]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

I'm also creating an issue about detecting integer overflows.

EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247

Update, as it may help others:

To read all columns as str, one can do this (from the comments below):

pd.read_csv('sample.csv', dtype = str)

To read only selected columns as str, one can do this:

# list of column names that should be read as strings
lst_str_cols = ['prefix', 'serial']
# build a dtype mapping with a dict comprehension
dict_dtypes = {x: 'str' for x in lst_str_cols}
# pass the mapping to read_csv
pd.read_csv('sample.csv', dtype=dict_dtypes)
Bloody answered 14/11, 2012 at 17:58 Comment(5)
It also seems that if you want all columns to be interpreted as strings, you can do the following: dtype=str. – Gurias
It seems empty fields still come through as np.nan. – Methylene
Same question here, but using keep_default_na=False resolved my issue. – Melissa
Thank you for the comments. I also had to use dtype=str AND keep_default_na=False so that null values weren't NaN. – Meiny
Reading the high-digit integers as strings saves a lot of headaches. Hero or villain? YOU'RE A HERO!! – Bourbonism
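
A minimal sketch combining the suggestions from the comments above, assuming the sample.csv from the question (possibly containing empty fields):

import pandas as pd

# Read every column as str; keep_default_na=False stops pandas from
# converting empty fields to NaN, so they remain empty strings.
df = pd.read_csv('sample.csv', dtype=str, keep_default_na=False)
print(df['ID'].dtype)  # object (each value is a plain Python str)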

This probably isn't the most elegant way to do it, but it gets the job done.

In[1]: import numpy as np

In[2]: import pandas as pd

In[3]: df = pd.DataFrame(np.genfromtxt('/Users/spencerlyon2/Desktop/test.csv', dtype=str)[1:], columns=['ID'])

In[4]: df
Out[4]: 
                           ID
0  00013007854817840016671868
1  00013007854817840016749251
2  00013007854817840016754630
3  00013007854817840016781876
4  00013007854817840017028824
5  00013007854817840017963235
6  00013007854817840018860166

Just replace '/Users/spencerlyon2/Desktop/test.csv' with the path to your file.

Deauville answered 9/11, 2012 at 2:54

Since pandas 1.0 this has become much more straightforward. The following reads column 'ID' with dtype 'string':

pd.read_csv('sample.csv', dtype={'ID': 'string'})

As described in the Getting started guide, a dedicated 'string' dtype has been introduced (previously strings were stored as dtype 'object').
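
As a quick check (a sketch, assuming the same sample.csv from the question), the column then reports the dedicated string dtype rather than object:

import pandas as pd

df = pd.read_csv('sample.csv', dtype={'ID': 'string'})
print(df['ID'].dtype)  # string (pandas StringDtype), not object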

Dior answered 14/4, 2020 at 3:3

The following approach seems to work for reading every column as a string:

import pandas as pd
from collections import defaultdict

df = pd.read_csv(
    data_path,  # path to your CSV file
    dtype=defaultdict(lambda: 'string'),
    keep_default_na=False,
)
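
Because the dtype mapping is a defaultdict, any column not listed explicitly falls back to 'string', and specific columns can still be overridden (defaultdict support for dtype was added around pandas 1.5). A sketch reusing data_path, with a hypothetical numeric column named 'amount':

import pandas as pd
from collections import defaultdict

# Every column defaults to 'string'; 'amount' is a hypothetical column
# overridden to a numeric dtype for illustration.
dtypes = defaultdict(lambda: 'string', {'amount': 'float64'})
df = pd.read_csv(data_path, dtype=dtypes, keep_default_na=False)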
W answered 10/1 at 14:52
