ValueError: Specifying the columns using strings is only supported for pandas DataFrames

I am using a titanic.csv dataset and trying to combine ColumnTransformer and Pipeline, but I get an error when I call pipe.predict(x_test). Here is my code.

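import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Small sample standing in for titanic.csv
titanic = {
    'sex': ['M', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'M'],
    'Pclass': [2, 2, 2, 1, 1, 2, 3, 1, 3, 3],
    'age': [58, 45, 20, 27, 38, 43, 40, 35, 60, 72],
    'embarked': ['s', 'c', 'c', 's', 's', 's', 's', 's', 'c', 'c'],
    'survived': [1, 0, 1, 0, 1, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(data=titanic)
x = df.drop(['survived'], axis=1)
y = df.survived
x_train, x_test, y_train, y_test = train_test_split(x, y)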

col_tra_1 = ColumnTransformer([
    ('trf1', SimpleImputer(), ['Pclass', 'age'])
], remainder='passthrough')

col_tra_2 = ColumnTransformer([
    ('ohe1', OneHotEncoder(sparse=False, handle_unknown='ignore'), ['sex', 'embarked'])
], remainder='passthrough')

col_tra_3 = ColumnTransformer([
    ('scale', MinMaxScaler(), ['Pclass', 'age'])
], remainder='passthrough')


from sklearn.pipeline import Pipeline

model = DecisionTreeClassifier()
pipe = Pipeline([
    ('col_tra_1', col_tra_1),
    ('col_tra_2', col_tra_2),
    ('col_tra_3', col_tra_3),
    ('model', model)
])
pipe.fit(x_train, y_train)

After that I get an error: ValueError: Specifying the columns using strings is only supported for pandas DataFrames.
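
A likely cause, sketched under the assumption that scikit-learn's default output configuration is in use: a ColumnTransformer returns a NumPy array rather than a DataFrame, so the next ColumnTransformer in the chain has no column names to look up. The small frame below is made-up illustration data, not the real titanic.csv.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Illustrative data only; the column names mirror the question.
df = pd.DataFrame({
    "sex": ["M", "F", "F", "M"],
    "Pclass": [2, 1, 3, 2],
    "age": [58, 27, 40, 43],
    "embarked": ["s", "s", "c", "c"],
})

col_tra_1 = ColumnTransformer(
    [("trf1", SimpleImputer(), ["Pclass", "age"])],
    remainder="passthrough",
)

out = col_tra_1.fit_transform(df)

# With default settings the result is a plain ndarray: the column names
# are gone, so a later step cannot select 'sex' or 'embarked' by name.
print(type(out))

# The transformed columns come first and the passthrough columns follow,
# so the order is now Pclass, age, sex, embarked.
print(out[0])
```

Note that the column order changes as well, which matters if positional indexes are used in the later steps. In scikit-learn 1.2 and later, calling set_output(transform="pandas") on the transformers may keep DataFrames flowing between steps, but whether that is available depends on the installed version.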

If I use column indexes instead of column names, I get a different error: ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float: 'F'

col_tra_1 = ColumnTransformer([
    ('trf1', SimpleImputer(strategy='constant'), [0, 1])
], remainder='passthrough')

col_tra_2 = ColumnTransformer([
    ('ohe1', OneHotEncoder(sparse=False, handle_unknown='ignore'), [0, 3])
], remainder='passthrough')

col_tra_3 = ColumnTransformer([
    ('scale', MinMaxScaler(), [0, 1])
], remainder='passthrough')

Pipeline(steps=[('col_tra_1', ColumnTransformer(remainder='passthrough', transformers=[('trf1', SimpleImputer(strategy='constant'), [0, 1])])), ('col_tra_2', ColumnTransformer(remainder='passthrough', transformers=[('ohe1', OneHotEncoder(handle_unknown='ignore', sparse=False), [0, 3])])), ('col_tra_3', ColumnTransformer(remainder='passthrough', transformers=[('scale', MinMaxScaler(), [0, 1])])), ('model', DecisionTreeClassifier())])
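
One possible way around both errors, offered as a sketch rather than the only fix: use a single ColumnTransformer that receives the original DataFrame, and chain the imputer and scaler for the numeric columns inside a nested Pipeline. The step names ("num", "cat", "preprocess") and the random_state values are assumptions added here for illustration. OneHotEncoder is left at its default sparse setting, because the name of that parameter differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Same sample data as in the question.
titanic = {
    "sex": ["M", "M", "M", "F", "F", "M", "F", "F", "M", "M"],
    "Pclass": [2, 2, 2, 1, 1, 2, 3, 1, 3, 3],
    "age": [58, 45, 20, 27, 38, 43, 40, 35, 60, 72],
    "embarked": ["s", "c", "c", "s", "s", "s", "s", "s", "c", "c"],
    "survived": [1, 0, 1, 0, 1, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(titanic)
x = df.drop(columns=["survived"])
y = df["survived"]
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Impute, then scale, the numeric columns as one unit.
numeric = Pipeline([
    ("impute", SimpleImputer()),
    ("scale", MinMaxScaler()),
])

# A single ColumnTransformer sees the original DataFrame, so selecting
# columns by name works for every branch.
preprocess = ColumnTransformer([
    ("num", numeric, ["Pclass", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
])

pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(x_train, y_train)
pred = pipe.predict(x_test)
print(pred)
```

Because every branch selects its columns from the input DataFrame, there is no intermediate array whose column order has to be tracked by hand.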
