I have a Pandas DataFrame that contains several string values. I want to replace them with integer values in order to calculate similarities. For example:
stores[['CNPJ_Store_Code','region','total_facings']].head()
Out[24]:
CNPJ_Store_Code region total_facings
1 93209765046613 Geo RS/SC 1.471690
16 93209765046290 Geo RS/SC 1.385636
19 93209765044084 Geo PR/SPI 0.217054
21 93209765044831 Geo RS/SC 0.804633
23 93209765045218 Geo PR/SPI 0.708165
and I want to replace region == 'Geo RS/SC' ==> 1, region == 'Geo PR/SPI'==> 2 etc.
Clarification: I want to do the replacement automatically, without creating a dictionary first, since I don't know in advance what my regions will be. Any ideas? I am trying to use DictVectorizer, with no success.
I'm sure there's a way to do it in intelligent way, but I just can't find it.
Anyone familiar with a solution?