My data consists of a mix of continuous and categorical features. Below is a small snippet of how my data looks like in the csv format (Consider it as data collected by a super store chain that operates stores in different cities)
city,avg_income_in_city,population,square_feet_of_store_area, store_type ,avg_revenue
NY ,54504 , 3506908 ,3006 ,INDOOR , 8000091
CH ,44504 , 2505901 ,4098 ,INDOOR , 4000091
HS ,50134 , 3206911 ,1800 ,KIOSK , 7004567
NY ,54504 , 3506908 ,1000 ,KIOSK , 2000091
Her you can see that avg_income_in_city, square_feet_of_store_area and avg_revenue are continuous values where as city,store_type etc are categorical classes (and few more which I have not shown here to maintain the brevity of the data).
I wish to model the data in order to predict the revenue. The question is how to 'Discretizate' the continuous values using sklearn? Does sklearn provide any "readymade" class/method for Discretization of the continuous values? (like we have in Orange e.g Orange.Preprocessor_discretize(data, method=orange.EntropyDiscretization())
Thanks !