Convert a pandas column from month-name strings (JAN, FEB, ...) into integers
A

3

12

In Python 2.7.11 & Pandas 0.18.1:

If we have the following csv file:

YEAR,MONTH,ID
2011,JAN,1
2011,FEB,1
2011,MAR,1

Is there any way to read it as a Pandas data frame and convert the MONTH column into integers like this?

YEAR,MONTH,ID
2011,1,1
2011,2,1
2011,3,1

Some pandas functions such as "dt.strftime('%b')" don't seem to work. Could someone enlighten me?

Aboriginal answered 9/3, 2017 at 0:28 Comment(0)
S
28

The easiest approach, and one of the fastest, is to create a mapping dict and apply it with map:

In [2]: df
Out[2]:
   YEAR MONTH  ID
0  2011   JAN   1
1  2011   FEB   1
2  2011   MAR   1

In [3]: d = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, }

In [4]: df.MONTH = df.MONTH.map(d)

In [5]: df
Out[5]:
   YEAR  MONTH  ID
0  2011      1   1
1  2011      2   1
2  2011      3   1

You may want to use df.MONTH = df.MONTH.str.upper().map(d) if not all MONTH values are in upper case.
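
For completeness, here is a minimal end-to-end sketch of the mapping approach, starting from the CSV in the question. It assumes Python 3 and a recent pandas; the names csv_text, month_to_int and mapped are made up for illustration, and the CSV is inlined only so the snippet is self-contained. Because map() silently turns unknown keys into NaN, the sketch checks for that before overwriting the column.

```python
import io

import pandas as pd

# The CSV from the question, inlined so the sketch runs on its own.
csv_text = """YEAR,MONTH,ID
2011,JAN,1
2011,FEB,1
2011,MAR,1
"""

# Mapping for all twelve English abbreviations.
month_to_int = {
    'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
    'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12,
}

df = pd.read_csv(io.StringIO(csv_text))

# Upper-case first so 'Jan' or 'jan' would also match the dict keys.
mapped = df['MONTH'].str.upper().map(month_to_int)

# map() yields NaN for values missing from the dict; fail loudly instead.
if mapped.isna().any():
    bad = df.loc[mapped.isna(), 'MONTH'].unique()
    raise ValueError('unmapped MONTH values: %s' % list(bad))

df['MONTH'] = mapped.astype(int)
```

With a real file you would pass its path to pd.read_csv instead of the StringIO wrapper.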

Another, slower but more robust, method:

In [11]: pd.to_datetime(df.MONTH, format='%b').dt.month
Out[11]:
0    1
1    2
2    3
Name: MONTH, dtype: int64
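
If the column may contain values that are not month abbreviations, one option is to let the parser mark them as missing rather than raise. The sketch below assumes a recent pandas with the nullable Int64 dtype; the sample series and the names parsed and months are hypothetical.

```python
import pandas as pd

s = pd.Series(['JAN', 'FEB', 'XYZ'], name='MONTH')

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(s, format='%b', errors='coerce')

# .dt.month is NaN where parsing failed; the nullable Int64 dtype keeps
# the valid entries as integers and the failures as <NA>.
months = parsed.dt.month.astype('Int64')
```

Rows where months is missing can then be inspected with s[months.isna()].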

UPDATE: we can create a mapping automatically (thanks to @Quetzalcoatl)

import calendar

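# month_abbr[0] is '' and the names are title-cased ('Jan'), so skip the blank and upper-case the keys
d = dict((v.upper(), k) for k, v in enumerate(calendar.month_abbr) if v)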

or alternatively (using only Pandas):

# keys must be the (upper-cased) abbreviations and values the month numbers
d = {m.upper(): i for i, m in enumerate(pd.date_range('2000-01-01', freq='MS', periods=12).strftime('%b'), 1)}
Suh answered 9/3, 2017 at 0:31 Comment(1)
To generate the dictionary: import calendar; dict((v,k) for k,v in enumerate(calendar.month_abbr)) (courtesy of #3418550) - Boracite
F
2

Here's a one-liner using the pandas API and the calendar.month_abbr convenience:

from calendar import month_abbr

lower_ma = [m.lower() for m in month_abbr]

# one-liner with Pandas
df['MONTH'] = df['MONTH'].str.lower().map(lambda m: lower_ma.index(m)).astype('Int8')
  1. Lower-case the title-cased abbreviations in calendar.month_abbr >> [m.lower() for m in month_abbr]
  2. Lower-case the MONTH series so its values match that list >> .str.lower()
  3. Map each value to its position in the list with the list's .index method; because month_abbr[0] is an empty string, the position equals the month number >> .map(lambda m: lower_ma.index(m))
  4. Convert to a nullable integer dtype >> .astype('Int8')
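
One caveat with the lambda: list.index raises ValueError for any value that is not in the list, including missing values. A possible variation, sketched below, builds a dict from the same abbreviations so that unknown or missing entries come out as <NA>. The name lower_to_int and the sample data are hypothetical, and calendar.month_abbr is assumed to be in an English locale.

```python
from calendar import month_abbr

import pandas as pd

# month_abbr[0] is an empty string; skipping it keeps index == month number.
lower_to_int = {name.lower(): i for i, name in enumerate(month_abbr) if name}

df = pd.DataFrame({'MONTH': ['JAN', 'Feb', None, 'mar']})

# Mapping with a dict leaves unmatched and missing values as NaN,
# which the nullable Int8 dtype then stores as <NA>.
df['MONTH'] = df['MONTH'].str.lower().map(lower_to_int).astype('Int8')
```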
Faso answered 3/8, 2021 at 13:50 Comment(0)
H
-1

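Following Max's last point, you can build the same kind of mapping from the values actually present in your dataframe:

# create mapping: months are numbered in order of first appearance,
# so this is only correct if the data starts in January and runs in calendar order
d = dict((v,k) for k,v in zip(range(1, 13), df['MONTH'].unique()))
# create column
df['month_index'] = df['MONTH'].map(d)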
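
A short sketch of why the order of appearance matters for this approach: unique() returns values in the order they first occur, so if the data does not start in January the numbering no longer matches the calendar. The sample frame is hypothetical.

```python
import pandas as pd

# MAR appears before JAN in this sample.
df = pd.DataFrame({'MONTH': ['MAR', 'JAN', 'FEB', 'JAN']})

# Same construction as above: number the months in order of first appearance.
d = dict((v, k) for k, v in zip(range(1, 13), df['MONTH'].unique()))

print(d)  # {'MAR': 1, 'JAN': 2, 'FEB': 3}
```

Here MAR is numbered 1 and JAN 2, so a fixed mapping such as one built from calendar.month_abbr is the safer choice unless the row order is known.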
Housebreaking answered 15/12, 2020 at 19:2 Comment(0)
