Python - Loading Zip Codes into a DataFrame as Strings?
Asked Answered
F

5

7

I'm using Pandas to load an Excel spreadsheet which contains zip code (e.g. 32771). The zip codes are stored as 5 digit strings in spreadsheet. When they are pulled into a DataFrame using the command...

xls = pd.ExcelFile("5-Digit-Zip-Codes.xlsx")
dfz = xls.parse('Zip Codes')

they are converted into numbers. So '00501' becomes 501.

So my questions are, how do I:

a. Load the DataFrame and keep the string type of the zip codes stored in the Excel file?

b. Convert the numbers in the DataFrame into a five digit string e.g. "501" becomes "00501"?

From answered 15/10, 2015 at 0:4 Comment(0)
G
8

As a workaround, you could convert the ints to 0-padded strings of length 5 using Series.str.zfill:

df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)

Demo:

import pandas as pd
df = pd.DataFrame({'zipcode':['00501']})
df.to_excel('/tmp/out.xlsx')
xl = pd.ExcelFile('/tmp/out.xlsx')
df = xl.parse('Sheet1')
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
print(df)

yields

  zipcode
0   00501
Gilder answered 15/10, 2015 at 1:0 Comment(0)
C
2

You can avoid panda's type inference with a custom converter, e.g. if 'zipcode' was the header of the column with zipcodes:

dfz = xls.parse('Zip Codes', converters={'zipcode': lambda x:x})

This is arguably a bug since the column was originally string encoded, made an issue here

Caveman answered 15/10, 2015 at 0:18 Comment(1)
if you have 00501 in excel, numeric with 2 leading zeroes, that would give 501 in pandas....Chucklehead
L
1
str(my_zip).zfill(5)

or

print("{0:>05s}".format(str(my_zip)))

are 2 of many many ways to do this

Listing answered 15/10, 2015 at 0:9 Comment(0)
M
0

The previous answers have correctly suggested using zfill(5). However, if your zipcodes are already in float datatype for some reason (I recently encountered data like this), you first need to convert it to int. Then you can use zfill(5).

df = pd.DataFrame({'zipcode':[11.0, 11013.0]})
    zipcode
0   11.0
1   11013.0
df['zipcode'] = df['zipcode'].astype(int).astype(str).str.zfill(5)
    zipcode
0   00011
1   11013
Mechellemechlin answered 8/8, 2020 at 13:48 Comment(0)
V
0

Pandas.read_excel docs say that you can preserve the data exactly as in the Excel sheet by specifying dtype as object: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html

dtypeType name or dict of column -> type, default None Data type for data or columns. E.g. {‘a’: np.float64, ‘b’: np.int32} Use object to preserve data as stored in Excel and not interpret dtype. If converters are specified, they will be applied INSTEAD of dtype conversion.

So, something like this should work:

xls = pd.read_excel("5-Digit-Zip-Codes.xlsx", dtype=dtype={'zip_code': object, 'other_col': str})

(note: not at my work pc right now, so wasn't able to test it yet)

Vitrification answered 14/9, 2023 at 20:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.