What is the difference between conv1d with kernel_size=1 and dense layer?

Asked 16/8, 2019 at 15:2 Answered 31/8, 2020 at 21:53

Solved tensorflow keras neural-network conv-neural-network tf.keras

I am building a CNN with Conv1D layers, and it trains pretty well. I'm now looking into how to reduce the number of features before feeding them into a Dense layer at the end of the model, so I've been reducing the size of the Dense layer, but then I came across this article. The article talks about the effect of using Conv2D filters with kernel_size=(1,1) to reduce the number of features.

I was wondering what the difference is between using a Conv2D layer with kernel_size=(1,1), i.e. tf.keras.layers.Conv2D(filters=n, kernel_size=(1,1)), and using a Dense layer of the same size, tf.keras.layers.Dense(units=n). From my perspective (I'm relatively new to neural nets), a filter with kernel_size=(1,1) is a single number, which is essentially equivalent to a weight in a Dense layer, and both layers have biases, so are they equivalent, or am I misunderstanding something? And if my understanding is correct, does anything change in my case, where I am using Conv1D layers rather than Conv2D layers? That is, is tf.keras.layers.Conv1D(filters=n, kernel_size=1) equivalent to tf.keras.layers.Dense(units=n)?

Please let me know if you need anything from me to clarify the question. I'm mostly curious about whether Conv1D layers with kernel_size=1 and Conv2D layers with kernel_size=(1,1) behave differently from Dense layers.

Staunch asked 16/8, 2019 at 15:2

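Yes. Since a Dense layer is applied on the last dimension of its input (see this answer), Dense(units=N) and Conv1D(filters=N, kernel_size=1) (and likewise Dense(units=N) and Conv2D(filters=N, kernel_size=1)) are basically equivalent to each other, both in terms of connections and in the number of trainable parameters. Note that this holds when the Dense layer is applied directly to the multi-dimensional tensor; if the input is flattened first, each Dense unit is connected to every position and the two are no longer equivalent.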
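To make the equivalence concrete, here is a minimal NumPy sketch of what the two layers compute. It is not Keras code: the helper names dense_last_axis and conv1d_valid are made up for illustration, and it assumes channels-last data, stride 1, no padding and no activation. The weight shapes follow the Keras layouts, (in_channels, units) for Dense and (kernel_size, in_channels, filters) for Conv1D.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, steps, channels, n = 2, 5, 3, 4
x = rng.normal(size=(batch, steps, channels))

# Dense(units=n) weights: kernel (channels, n), bias (n,)
w_dense = rng.normal(size=(channels, n))
b = rng.normal(size=(n,))


def dense_last_axis(x, w, b):
    """Dense applied on the last axis: the same weights at every time step."""
    return x @ w + b


def conv1d_valid(x, kernel, b):
    """Naive channels-last 1D convolution (stride 1, no padding).

    kernel has shape (kernel_size, in_channels, filters).
    """
    k = kernel.shape[0]
    out_steps = x.shape[1] - k + 1
    out = np.empty((x.shape[0], out_steps, kernel.shape[2]))
    for t in range(out_steps):
        window = x[:, t:t + k, :]  # (batch, k, in_channels)
        # Sum over the kernel position and input channel axes.
        out[:, t, :] = np.tensordot(window, kernel, axes=([1, 2], [0, 1])) + b
    return out


# A kernel_size=1 convolution kernel is the Dense kernel with one extra axis.
w_conv = w_dense[np.newaxis, :, :]  # (1, channels, n)

y_dense = dense_last_axis(x, w_dense, b)
y_conv = conv1d_valid(x, w_conv, b)

print(y_dense.shape, y_conv.shape)
print(np.allclose(y_dense, y_conv))
print(w_dense.size + b.size, w_conv.size + b.size)
```

With the same weights, both paths give the same output and have the same number of parameters (channels * n weights plus n biases). In tf.keras the analogous check would be to copy the Dense kernel, reshaped with a leading axis of size 1, into the Conv1D layer and compare the outputs.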

Ashcroft answered 16/8, 2019 at 15:42 Comment(3)

The elitist but technically reasonable explanation I learned for this question is that a pure CNN has no MLPs, and instead uses kernel size 1 to achieve a similar function. – Lamdin 18/8, 2019 at 8:54

@Lamdin to clarify, when you say MLPs, I think you are referencing Dense layers, is that correct? – Staunch 3/2, 2020 at 18:27

Precisely. Of course, in the vast majority of practical circumstances, there's no need to be pedantic about it; might as well use dense layers if you can. I can imagine some hardware neural networks perhaps optimized by 'pure' CNNs, but otherwise... :) – Lamdin 7/2, 2020 at 4:19

In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional per sample (steps and channels, not counting the batch axis). It is mostly used on time-series data, natural language processing tasks, etc. You will definitely see people using it in Kaggle NLP competitions and notebooks.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional per sample (height, width and channels). It is mostly used on image data. You will definitely see people using it in Kaggle image processing competitions and notebooks.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional per sample (three spatial axes and channels). It is mostly used on 3D image data (MRI, CT scans). I haven't personally seen it applied in competitions.
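The dimensionalities above are per sample, with the batch axis left out. A small sketch of the shape bookkeeping, assuming channels-last data, stride 1 and no padding; conv_output_shape is a hypothetical helper written for this illustration, not a Keras function:

```python
def conv_output_shape(input_shape, kernel_size, filters):
    """Per-sample output shape of a channels-last convolution.

    Assumes stride 1 and no padding. input_shape is (*spatial, channels)
    and kernel_size has one entry per spatial axis.
    """
    *spatial, _channels = input_shape
    if len(spatial) != len(kernel_size):
        raise ValueError("kernel_size must have one entry per spatial axis")
    new_spatial = [s - k + 1 for s, k in zip(spatial, kernel_size)]
    return (*new_spatial, filters)


# 1D: (steps, channels) -> 2-dimensional input and output
shape_1d = conv_output_shape((100, 8), (3,), filters=16)
# 2D: (height, width, channels) -> 3-dimensional input and output
shape_2d = conv_output_shape((28, 28, 3), (3, 3), filters=16)
# 3D: (depth, height, width, channels) -> 4-dimensional input and output
shape_3d = conv_output_shape((32, 32, 32, 1), (3, 3, 3), filters=16)

print(shape_1d)  # (98, 16)
print(shape_2d)  # (26, 26, 16)
print(shape_3d)  # (30, 30, 30, 16)
```

With a kernel size of 1 along every spatial axis, the spatial dimensions are unchanged and only the last (channel) axis is remapped, which is the case the question asks about.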

Deckard answered 31/8, 2020 at 21:53 Comment(0)
