Error "Expected 2D array, got 1D array instead" Using OneHotEncoder

D

6

11

I'm new to machine learning and am trying to work through an error I get when using the OneHotEncoder class. The error is: "Expected 2D array, got 1D array instead". As I understand it, a 1D array is something like [1,4,5,6] and a 2D array would be [[2,3], [3,4], [5,6]], but I still cannot figure out why this is failing. It fails on this line:

X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()

Here is my whole code:

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import Dataset
dataset = pd.read_csv('Data2.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 5].values
df_X = pd.DataFrame(X)
df_y = pd.DataFrame(y)

# Replace Missing Values
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values = 'NaN', strategy = 'mean', axis = 0)
imputer = imputer.fit(X[:, 3:5 ])
X[:, 3:5] = imputer.transform(X[:, 3:5])


# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])

# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()

# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])

As you can probably tell from this code, I have two columns that contain labels. I used LabelEncoder to turn those columns into numbers. I'd like to use OneHotEncoder to go one step further and turn them into a matrix, so that each row would look something like this:

0  1  0
1  0  1

The only thing that came to mind was how I encoded the labels: I did them one by one instead of all at once. I'm not sure whether this is the problem.

I was hoping to do something like this:

# Encoding Categorical Data "Name"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x = LabelEncoder()
X[:, 0] = labelencoder_x.fit_transform(X[:, 0])

# Transform into a Matrix
onehotencoder1 = OneHotEncoder(categorical_features = [0])
X[:, 0] = onehotencoder1.fit_transform(X[:, 0]).toarray()

# Encoding Categorical Data "University"
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x1 = LabelEncoder()
X[:, 1] = labelencoder_x1.fit_transform(X[:, 1])

# Transform into a Matrix
onehotencoder2 = OneHotEncoder(categorical_features = [1])
X[:, 1] = onehotencoder1.fit_transform(X[:, 1]).toarray()

Below you will find my whole error:

File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))

ValueError: Expected 2D array, got 1D array instead:
array=[ 2.  1.  3.  2.  3.  5.  5.  0.  4.  0.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Any help in the right direction would be great.

Dib answered 24/12, 2017 at 0:27 Comment(1)

Use X[:, 0] = onehotencoder1.fit_transform(X[:, 0].reshape(-1,1)).toarray() – Georgeanngeorgeanna 26/12, 2017 at 6:27

F

2

This is a known issue with scikit-learn's OneHotEncoder, raised in https://github.com/scikit-learn/scikit-learn/issues/3662. Most scikit-learn estimators need a 2D array rather than a 1D array.

The standard practice is to pass in a multidimensional array. Since you have already specified which column to treat as categorical with categorical_features = [0], you can rewrite the next line as follows to pass the whole dataset (or a part of it). The encoder will one-hot encode only the first column, while still receiving a multidimensional array to work with.

onehotencoder1 = OneHotEncoder(categorical_features = [0])
X = onehotencoder1.fit_transform(X).toarray()

(I hope your dataset doesn't have any more categorical values. I'd advise you to label-encode everything first, then one-hot encode.)
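
Note that categorical_features was deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older versions. A rough equivalent on newer versions is ColumnTransformer from sklearn.compose. The sketch below assumes a small made-up array (hypothetical data, not the asker's Data2.csv) with the categorical column at index 0:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for X: one categorical column, two numeric columns.
X = np.array([
    ["Ann", 3.5, 21],
    ["Bob", 3.9, 22],
    ["Ann", 3.1, 23],
], dtype=object)

transformer = ColumnTransformer(
    transformers=[("name", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",                        # keep the other columns as-is
    sparse_threshold=0,                             # always return a dense array
)

# The whole 2D array is passed in; the one-hot columns come first in the output.
X_encoded = transformer.fit_transform(X)
print(X_encoded.shape)  # (3, 4): 2 one-hot columns + 2 passthrough columns
```

With this approach no separate LabelEncoder step is needed, since OneHotEncoder accepts string categories directly.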

Floreated answered 7/1, 2018 at 15:35 Comment(0)

D

9

I got the same error, and the error message itself ends with a suggestion:

"Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."

Since my data was an array, I used X.values.reshape(-1,1), and it worked. (There was another suggestion to use X.values.reshape instead of X.reshape.)
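
To make the shapes concrete, here is a minimal sketch. It assumes a NumPy array holding the same label-encoded values that appear in the error message in the question; the variable names are only illustrative:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Label-encoded values copied from the error message in the question.
codes = np.array([2, 1, 3, 2, 3, 5, 5, 0, 4, 0])
print(codes.shape)   # (10,) -> 1D, rejected by OneHotEncoder

# reshape(-1, 1) means "as many rows as needed, exactly one column".
column = codes.reshape(-1, 1)
print(column.shape)  # (10, 1) -> 2D: 10 samples, 1 feature

encoder = OneHotEncoder()
encoded = encoder.fit_transform(column).toarray()
print(encoded.shape)  # (10, 6): one column per distinct category 0..5
```

Note that the result has six columns, so it cannot be assigned back into the single column X[:, 0]; it has to be stored separately or stacked next to the remaining columns.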

Disembark answered 21/5, 2018 at 2:56 Comment(0)

F

2

I came across a fix: adding

X=X.reshape(-1,1)

The error appears to be gone now, but I'm not sure whether this is the right way to fix it.

Foredate answered 21/1, 2018 at 11:52 Comment(0)

V

1

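At the point where you transform the categorical feature, you need to add another pair of brackets so that the selection stays two-dimensional. The encoded result has one column per category, so store it in its own variable rather than assigning it back to X[:, 0]:

encoded = pd.DataFrame(onehotencoder1.fit_transform(X[:, [0]]).toarray())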
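
For a NumPy array, the extra brackets go around the column index, as in X[:, [0]]. The short sketch below uses a made-up array (an assumption for illustration) to show how the different ways of selecting a column affect the shape:

```python
import numpy as np

# Hypothetical 3x2 array standing in for X.
X = np.array([[2, 10], [1, 20], [3, 30]])

print(X[:, 0].shape)    # (3,)   integer index drops the column axis -> 1D
print(X[:, [0]].shape)  # (3, 1) list index keeps the column axis -> 2D
print(X[:, 0:1].shape)  # (3, 1) a slice also keeps the column axis -> 2D
```

Either of the last two forms gives OneHotEncoder the 2D input it expects without an explicit reshape.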

Vanward answered 28/10, 2020 at 2:57 Comment(0)

H

0

You need to reshape your data, since the method expects a multidimensional array, as mentioned previously. X = X.reshape(-1,1) worked for me as well.

Hedger answered 11/5, 2019 at 0:25 Comment(0)

R

0

For use with Pandas DataFrame input data, I found this video very helpful: One Hot Encoder with Python Machine Learning (Scikit-Learn)

Basically, use

OneHotEncoder(sparse_output=False).set_output(transform='pandas')

to use the encoder object on a Pandas DataFrame with a selected feature column.

Regulation answered 24/4 at 14:35 Comment(0)
