I'm trying to learn more about how computer vision models work. To better understand how to interpret feature vectors, I'm using PyTorch to extract one from an image. Below is the code I've pieced together from various places:
import torch
import torch.nn as nn
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image
# Load the pretrained model
model = models.resnet18(pretrained=True)
# Use the model object to select the desired layer
layer = model._modules.get('avgpool')
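# Set model to evaluation mode
model.eval()

# Resize, crop to 224x224, convert to a tensor, and normalize with the ImageNet statistics
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])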
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open(image_name)
    # Apply the preprocessing transforms to get a tensor
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        # Flatten the layer output so it fits the 512-element vector
        my_embedding.copy_(o.detach().flatten())
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
When I do this I get the following error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead
I'm sure this is an elementary error, but I can't figure out how to fix it. I was under the impression that the ToTensor transformation would make my data 4-D, but either it isn't working correctly or I'm misunderstanding it. I'd appreciate any help, or any resources I can use to learn more about this!
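For reference, here is a minimal sketch of the shapes involved. As far as I can tell, ToTensor returns a single image as a 3-D tensor laid out as [channels, height, width], and it does not add a batch dimension. The random tensor below is only a stand-in for a transformed image (an assumption, so the sketch needs no image file), and the standalone convolution is a hypothetical layer built to match the [64, 3, 7, 7] weight shape from the error message, not the actual ResNet layer.

```python
import torch
import torch.nn as nn

# Stand-in for the output of ToTensor(): one image as [channels, height, width].
# Random values are an assumption so that no image file is needed.
t_img = torch.rand(3, 224, 224)

# Add a batch dimension of size 1 at the front: [1, 3, 224, 224].
batch = t_img.unsqueeze(0)

# Hypothetical layer whose weight has the shape from the error message: [64, 3, 7, 7].
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)

# No gradients are needed just to inspect shapes.
with torch.no_grad():
    out = conv(batch)

print(t_img.shape)  # torch.Size([3, 224, 224])
print(batch.shape)  # torch.Size([1, 3, 224, 224])
print(out.shape)    # torch.Size([1, 64, 112, 112])
```

If that reading is right, passing t_img.unsqueeze(0) to the model instead of t_img would likely resolve the dimension error. The avgpool output would then presumably have shape [1, 512, 1, 1], so it may need flattening before it is copied into a 512-element vector.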