ValueError: Specifying the columns using strings is only supported for pandas DataFrames

I am using a titanic.csv dataset and trying to combine ColumnTransformer with Pipeline. When I call pipe.predict(x_test) I get an error. Here is my code.

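import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

titanic = {
    'sex': ['M', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'M'],
    'Pclass': [2, 2, 2, 1, 1, 2, 3, 1, 3, 3],
    'age': [58, 45, 20, 27, 38, 43, 40, 35, 60, 72],
    'embarked': ['s', 'c', 'c', 's', 's', 's', 's', 's', 'c', 'c'],
    'survived': [1, 0, 1, 0, 1, 1, 1, 1, 0, 0],
}
df = pd.DataFrame(data=titanic)
x = df.drop(['survived'], axis=1)  # features
y = df.survived                    # target
x_train, x_test, y_train, y_test = train_test_split(x, y)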

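# Step 1: impute the numeric columns, pass the rest through unchanged.
col_tra_1 = ColumnTransformer([
    ('trf1', SimpleImputer(), ['Pclass', 'age'])
], remainder='passthrough')

# Step 2: one-hot encode the categorical columns.
col_tra_2 = ColumnTransformer([
    ('ohe1', OneHotEncoder(sparse=False, handle_unknown='ignore'), ['sex', 'embarked'])
], remainder='passthrough')

# Step 3: scale the numeric columns.
col_tra_3 = ColumnTransformer([
    ('scale', MinMaxScaler(), ['Pclass', 'age'])
], remainder='passthrough')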


model = DecisionTreeClassifier()
from sklearn.pipeline import Pipeline, make_pipeline
pipe = Pipeline([
    ('col_tra_1', col_tra_1),
    ('col_tra_2', col_tra_2),
    ('col_tra_3', col_tra_3),
    ('model', model)
])
pipe.fit(x_train, y_train)

After that I get an error: ValueError: Specifying the columns using strings is only supported for pandas DataFrames.

If I use indices instead of column names, I get a different error: ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float: 'F'

Obrian answered 2/4, 2022 at 7:49 Comment(0)

The problem is that you are stacking ColumnTransformers on top of each other. Each one returns a NumPy array, which has no column names, so the next transformer cannot select columns by string and the error occurs. You can verify this by removing two of the transformers. The easiest fix, for me, is to use indices instead of column names. Code:

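# Indices are positions in the array each step receives, not DataFrame labels.
# Input columns are sex, Pclass, age, embarked, so [0, 1] is sex and Pclass;
# strategy='constant' lets the imputer accept the string column.
col_tra_1 = ColumnTransformer([
    ('trf1', SimpleImputer(strategy='constant'), [0, 1])
], remainder='passthrough')

# Positions 0 and 3 are still sex and embarked at this point.
# Note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2.
col_tra_2 = ColumnTransformer([
    ('ohe1', OneHotEncoder(sparse=False, handle_unknown='ignore'), [0, 3])
], remainder='passthrough')

# The one-hot columns now come first, so [0, 1] selects two of them;
# Pclass and age have moved to the end of the array.
col_tra_3 = ColumnTransformer([
    ('scale', MinMaxScaler(), [0, 1])
], remainder='passthrough')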

Output:

Pipeline(steps=[('col_tra_1',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('trf1',
                                                  SimpleImputer(strategy='constant'),
                                                  [0, 1])])),
                ('col_tra_2',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('ohe1',
                                                  OneHotEncoder(handle_unknown='ignore',
                                                                sparse=False),
                                                  [0, 3])])),
                ('col_tra_3',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('scale', MinMaxScaler(),
                                                  [0, 1])])),
                ('model', DecisionTreeClassifier())])
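
To see why names stop working and why positions shift between steps, it can help to run the first transformer on its own. The snippet below is only a sketch on a small made-up frame with the same column order as the question; `first_step` and `out` are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Illustrative frame with the question's column order: sex, Pclass, age, embarked.
df = pd.DataFrame({
    'sex': ['M', 'F', 'M', 'F'],
    'Pclass': [2, 1, 3, 1],
    'age': [58, 27, 60, 35],
    'embarked': ['s', 's', 'c', 'c'],
})

first_step = ColumnTransformer(
    [('trf1', SimpleImputer(), ['Pclass', 'age'])],
    remainder='passthrough',
)
out = first_step.fit_transform(df)

# With default settings the result is a plain ndarray: the labels are gone,
# so a later ColumnTransformer cannot look up 'sex' or 'embarked' by name.
print(type(out).__name__)

# Transformed columns come first and passthrough columns are appended on the
# right, so the positional order is now Pclass, age, sex, embarked.
print(out[0].tolist())
```

Because every step reorders the array this way, each index list has to be worked out against the output of the step before it rather than against the original DataFrame.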
Crumpler answered 2/4, 2022 at 8:8 Comment(2)
After switching to indices I get a different error: ValueError: Cannot use mean strategy with non-numeric data: could not convert string to float: 'F' - Obrian
Thank you so much, buddy, it is running now. You saved me a lot of time. - Obrian
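
An alternative that avoids index bookkeeping altogether is to do all preprocessing in a single ColumnTransformer, so every branch still sees the original DataFrame and can select columns by name. This is a sketch under assumptions, not the approach from the answer above: the step names (`numeric`, `preprocess`), the `random_state`, and fitting on the full frame instead of a train/test split are choices made here for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Same data as in the question.
df = pd.DataFrame({
    'sex': ['M', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'M'],
    'Pclass': [2, 2, 2, 1, 1, 2, 3, 1, 3, 3],
    'age': [58, 45, 20, 27, 38, 43, 40, 35, 60, 72],
    'embarked': ['s', 'c', 'c', 's', 's', 's', 's', 's', 'c', 'c'],
    'survived': [1, 0, 1, 0, 1, 1, 1, 1, 0, 0],
})
x = df.drop(columns=['survived'])
y = df['survived']

# Numeric branch: impute, then scale, chained inside one sub-pipeline.
numeric = Pipeline([
    ('impute', SimpleImputer()),
    ('scale', MinMaxScaler()),
])

# One ColumnTransformer, so both branches select from the DataFrame by name.
preprocess = ColumnTransformer(
    [
        ('num', numeric, ['Pclass', 'age']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['sex', 'embarked']),
    ],
    sparse_threshold=0,  # always return a dense array
)

pipe = Pipeline([
    ('preprocess', preprocess),
    ('model', DecisionTreeClassifier(random_state=0)),
])
pipe.fit(x, y)          # fitted on the full frame only to keep the sketch short
preds = pipe.predict(x)
print(preds)
```

Here the imputer and scaler are applied to Pclass and age as intended, and the encoder is applied to sex and embarked, without any step depending on the column order produced by an earlier one.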
