I'm new to machine learning and trying to work through an error I get when using the OneHotEncoder class. The error is: "Expected 2D array, got 1D array instead". As I understand it, a 1D array looks like [1,4,5,6] and a 2D array looks like [[2,3], [3,4], [5,6]], but I still cannot figure out why this is failing. It fails on this line:
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
Here is my whole code:
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)
# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
As the code shows, I have two columns that hold labels. I used LabelEncoder to turn those columns into numbers. I'd like to take it one step further with OneHotEncoder and turn them into a matrix, so that each row would look something like this:
0 1 0
1 0 1
The only thing that came to mind was how I encoded the labels: I did them one at a time instead of all at once. I'm not sure whether this is the problem.
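To make the target format concrete, here is a minimal sketch of a one-hot matrix built with plain NumPy. The `codes` array is a hypothetical label-encoded column, not data from Data2.csv, and indexing an identity matrix is only an illustration of the output, not how OneHotEncoder works internally. Each row ends up with exactly one 1, in the position of its category.

```python
import numpy as np

# Hypothetical label-encoded column with three categories (0, 1, 2).
codes = np.array([1, 0, 2, 1])

# Row i of the identity matrix is the one-hot vector for category i,
# so indexing the identity matrix with the codes gives one row per sample.
one_hot = np.eye(3, dtype=int)[codes]

print(one_hot)
# [[0 1 0]
#  [1 0 0]
#  [0 0 1]
#  [0 1 0]]
```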
I was hoping to do something like this:
# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])
# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()
# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])
# Transform into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [1])
X[:, 1] = onehotencoder2.fit_transform(X[:, 1]).toarray()
Below you will find my whole error:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
"if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[ 2. 1. 3. 2. 3. 5. 5. 0. 4. 0.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Any help in the right direction would be great.
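The two reshape calls named in the error message can be checked directly in NumPy. This is a small sketch using the values printed in the traceback; the variable names are made up for illustration. Selecting a single column with `X[:, 0]` gives a 1D array, which is why the encoder complains.

```python
import numpy as np

# The values printed in the error message: one label-encoded column.
column = np.array([2., 1., 3., 2., 3., 5., 5., 0., 4., 0.])
print(column.shape)        # (10,)  -> 1D: ten values, no second axis

# One feature, many samples: ten rows, one column.
as_feature = column.reshape(-1, 1)
print(as_feature.shape)    # (10, 1)

# One sample, many features: one row, ten columns.
as_sample = column.reshape(1, -1)
print(as_sample.shape)     # (1, 10)
```

For a single categorical column, `reshape(-1, 1)` is the form that matches "a single feature".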
X[:, 0] = onehotencoder1.fit_transform(X[:, 0].reshape(-1,1)).toarray()
– Georgeanngeorgeanna
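The reshape in the comment above addresses the reported error. One further point: the encoder returns one column per category, so the result generally cannot be assigned back into the single column `X[:, 0]`. A rough sketch of one way around that follows, assuming a recent scikit-learn where, as far as I know, `OneHotEncoder` no longer accepts `categorical_features`. The small `X` array is a hypothetical stand-in for the real data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for X: column 0 holds label-encoded categories,
# column 1 holds an ordinary numeric feature.
X = np.array([[2, 10.0],
              [1, 20.0],
              [3, 30.0],
              [2, 40.0]])

encoder = OneHotEncoder()

# reshape(-1, 1) turns the 1D column into a 2D array with a single feature.
encoded = encoder.fit_transform(X[:, 0].reshape(-1, 1)).toarray()
print(encoded.shape)    # (4, 3): one column per category (1, 2, 3)

# Put the encoded columns in place of column 0 instead of assigning into it.
X_encoded = np.hstack([encoded, X[:, 1:]])
print(X_encoded.shape)  # (4, 4)
```

Stacking with `np.hstack` keeps the remaining columns unchanged; the same pattern could be repeated for the second categorical column.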