The PyTorch tutorial (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py) trains a convolutional neural network (CNN) on the CIFAR-10 dataset:
    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)        # 3 input channels, 6 output channels, 5x5 kernel
            self.pool = nn.MaxPool2d(2, 2)         # 2x2 max pooling
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)           # one output per CIFAR-10 class

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)             # flatten to (batch, 400)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)                        # raw scores for 10 classes, no softmax
            return x
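The tutorial then trains this network with cross-entropy loss; paraphrasing the relevant lines (with net and trainloader defined as in the tutorial):

    import torch.optim as optim

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(2):
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            outputs = net(inputs)                  # raw outputs straight from fc3
            loss = criterion(outputs, labels)      # cross-entropy computed on these outputs
            loss.backward()
            optimizer.step()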
The network looks good except for the very last layer, fc3, which produces scores for the 10 classes without a softmax. Shouldn't we apply a softmax first, so that the outputs of the fc layer lie between 0 and 1 and sum to 1, before calculating the cross-entropy loss?
I tested this by adding a softmax at the end of forward and rerunning, but the accuracy dropped to around 35%. This seems counterintuitive. What is the explanation?
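For reference, this is the only change I made, a sketch of my modified forward with the same Net as above:

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.softmax(self.fc3(x), dim=1)          # added: outputs now in [0, 1] and sum to 1
        return x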