I'm looking for a way to replicate the encode behaviour in Stata, which will convert a categorical string column into a number column.
import pandas as pd
x = pd.DataFrame({'cat':['A','A','B'], 'val':[10,20,30]})
x = x.set_index('cat')
Which results in:
     val
cat
A     10
A     20
B     30
I'd like to convert the cat column from strings to integers, mapping each unique string to an (arbitrary) integer 1-to-1. It would result in:
     val
cat
1     10
1     20
2     30
Or, just as good:
   cat  val
0    1   10
1    1   20
2    2   30
Any suggestions?
Many thanks as always, Rob
... encode does. It produces one-to-one mappings. – Swaggering
... 'A' becomes 1, each instance of 'B' becomes 2, etc. – Cornice
... encode does in Stata. – Swaggering
... cat column that's getting the mapping, not the val column. The val column remains unchanged and is of no relevance to the example. The important thing is that cat goes from ['A','A','B'] to [1,1,2] as per my example. – Cornice
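To make the requested mapping concrete, here is a minimal sketch of one way it could be done, assuming pandas' pd.factorize is acceptable for the job. It is not claimed to reproduce every detail of Stata's encode. The names encoded, indexed and mapping are illustrative only, and sort=True is used on the assumption that codes should follow sorted label order rather than order of first appearance.

```python
import pandas as pd

x = pd.DataFrame({'cat': ['A', 'A', 'B'], 'val': [10, 20, 30]})

# pd.factorize returns (codes, uniques). Codes are 0-based, so add 1
# to get the 1-based numbering used in the question.
codes, uniques = pd.factorize(x['cat'], sort=True)

# Variant with cat as an ordinary integer column.
encoded = x.assign(cat=codes + 1)

# Variant with the integer codes as the index.
indexed = encoded.set_index('cat')

# Label-to-code lookup, kept so the original strings can be recovered.
mapping = {label: i + 1 for i, label in enumerate(uniques)}

print(encoded)
print(mapping)
```

Keeping the mapping around plays roughly the role of a value label: each integer can be translated back to its original string, and the val column is left untouched throughout.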