Why do we flatten the data before we feed it into TensorFlow?

Asked 15/6, 2017 at 16:4 Answered 16/6, 2017 at 9:19

I'm following the Udacity MNIST tutorial. Each MNIST image is originally a 28*28 matrix. However, right before feeding the data in, they flatten each image into a 1-D array with 784 columns (784 = 28 * 28).

For example, the original training set has shape (200000, 28, 28): 200000 rows (samples), each of which is a 28*28 matrix.

They convert this into a training set of shape (200000, 784).
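A minimal NumPy sketch of that conversion, assuming a small random array (5 samples) stands in for the tutorial's 200000 images; the variable names are hypothetical. It shows where each pixel lands after flattening, and that the reshape is cheap and reversible.

```python
import numpy as np

# Small stand-in for the (200000, 28, 28) training set: 5 random "images".
n_samples = 5
images = np.random.rand(n_samples, 28, 28).astype(np.float32)

# Flatten each 28x28 image into one 784-element row (row-major order:
# image row 0 first, then image row 1, and so on).
flat = images.reshape(n_samples, 28 * 28)   # equivalent to images.reshape(-1, 784)

print(images.shape)  # (5, 28, 28)
print(flat.shape)    # (5, 784)

# Pixel (r, c) of image i ends up in column r * 28 + c of row i.
i, r, c = 2, 10, 7
assert flat[i, r * 28 + c] == images[i, r, c]

# For a contiguous array, reshape returns a view (no data is copied),
# and the original shape can be restored at any time.
assert np.shares_memory(flat, images)
assert np.array_equal(flat.reshape(n_samples, 28, 28), images)
```

No information is lost here: only the indexing changes, so the 2-D layout can be recovered later if a layer needs it.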

Can someone explain why they flatten the data before feeding it to TensorFlow?

Otolith asked 15/6, 2017 at 16:4 Comment(3)

Your link starts with localhost; you should fix this. – Haight 15/6, 2017 at 16:10

Thanks, fixed now. – Otolith 15/6, 2017 at 16:31

You don't have to flatten it before you send it to TensorFlow. You could flatten it in TensorFlow instead. – Strengthen 19/6, 2017 at 14:39

Because when you add a fully connected layer, you want your data to be a 2-dimensional matrix (or a 1-dimensional vector for a single sample), where each row is the vector representing one sample. That way, the fully connected layer is just a matrix multiplication between your input (of shape (batch_size, n_features)) and the weights (of shape (n_features, n_outputs)), plus the bias and the activation function, and you get an output of shape (batch_size, n_outputs). Also, a fully connected layer makes no use of the original shape information, so it's OK to lose it.
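A minimal NumPy sketch of a fully connected layer on flattened input, assuming 784 input features, 10 outputs (one per digit class) and ReLU as the activation; the weights are random placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, n_features, n_outputs = 32, 784, 10

x = rng.random((batch_size, n_features))            # flattened images, one per row
W = rng.standard_normal((n_features, n_outputs)) * 0.01
b = np.zeros(n_outputs)

def relu(z):
    # Element-wise activation; any element-wise function would do here.
    return np.maximum(z, 0.0)

# The whole layer: one matrix multiply, a broadcast bias add, an activation.
out = relu(x @ W + b)
print(out.shape)  # (32, 10)

# The 1-D case is the same formula applied to a single row.
assert np.allclose(out[0], relu(x[0] @ W + b))
```

Because every sample is a row, the batch dimension passes straight through the multiply, which is what makes the (batch_size, n_features) layout convenient.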

Getting the same result without reshaping first would be more complicated and less efficient, which is why we always reshape before a fully connected layer. For a convolutional layer, on the other hand, you'll want to keep the data in its original format (width, height).
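To illustrate the "more complicated" part, here is a sketch that computes the same dense layer both ways, assuming the weights are stored as a hypothetical (28, 28, n_outputs) array when the input is not flattened. Without reshaping, the product has to contract over two axes at once (here with `np.tensordot`). With reshaping, it is a plain matrix multiply. NumPy's own `tensordot` is implemented by reshaping both operands to 2-D and calling `dot`.

```python
import numpy as np

rng = np.random.default_rng(1)
batch_size, n_outputs = 8, 10

images = rng.random((batch_size, 28, 28))        # unflattened input
W_3d = rng.standard_normal((28, 28, n_outputs))  # one weight per (pixel, output) pair
b = rng.standard_normal(n_outputs)

# Without flattening: contract over both spatial axes of the input
# against the first two axes of the weights.
out_3d = np.tensordot(images, W_3d, axes=([1, 2], [0, 1])) + b

# With flattening: an ordinary (batch, 784) @ (784, n_outputs) multiply.
out_2d = images.reshape(batch_size, -1) @ W_3d.reshape(-1, n_outputs) + b

assert out_3d.shape == out_2d.shape == (batch_size, n_outputs)
assert np.allclose(out_3d, out_2d)
```

Both results are identical, so the spatial axes carry no extra meaning for this layer; they only add index bookkeeping.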

Haight answered 15/6, 2017 at 16:15 Comment(0)

That is a convention for fully connected layers. A fully connected layer connects every node in the previous layer to every node in the next layer, so spatial locality plays no role in this type of layer.
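One way to see that locality does not matter: if you shuffle the pixel positions with any fixed permutation and reorder the weight rows the same way, a fully connected layer gives exactly the same output. A minimal NumPy sketch, assuming random placeholder inputs and weights:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features, n_outputs = 4, 784, 10

x = rng.random((n_samples, n_features))           # flattened images
W = rng.standard_normal((n_features, n_outputs))
b = rng.standard_normal(n_outputs)

# Scramble the pixel order with one fixed random permutation ...
perm = rng.permutation(n_features)
x_shuffled = x[:, perm]
# ... and reorder the weight rows in the same way.
W_shuffled = W[perm, :]

# The output is unchanged: the layer has no notion of which pixels
# were neighbours in the original 28x28 grid.
assert np.allclose(x @ W + b, x_shuffled @ W_shuffled + b)
```

A convolutional layer, by contrast, slides a small window over neighbouring pixels, so scrambling the layout would change what it sees.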

Additionally, defining the layer this way lets us compute the next step efficiently with the formula f(Wx + b) = y. This would not be as straightforward with multidimensional input, and reshaping the input is cheap and easy.
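Written out with shapes for flattened MNIST input (assuming n_out output units, batch size B, and f applied element-wise), the single-sample formula and its batched form are as follows. Row i of X is the flattened sample x_i transposed, and row i of Y is the corresponding y_i transposed. Note that the batched form multiplies by the transpose of W, which is exactly the (n_features, n_outputs) weight matrix described in the first answer.

```latex
\begin{aligned}
\mathbf{y} &= f(W\mathbf{x} + \mathbf{b}), &
\mathbf{x} &\in \mathbb{R}^{784},\;
W \in \mathbb{R}^{n_{\text{out}} \times 784},\;
\mathbf{b}, \mathbf{y} \in \mathbb{R}^{n_{\text{out}}} \\
Y &= f\!\left(X W^{\top} + \mathbf{1}_B \mathbf{b}^{\top}\right), &
X &\in \mathbb{R}^{B \times 784},\;
Y \in \mathbb{R}^{B \times n_{\text{out}}}
\end{aligned}
```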

Astrology answered 16/6, 2017 at 9:19 Comment(0)
