Background
I have been playing around with Deep Dream
and Inceptionism
, using the Caffe
framework to visualize layers of GoogLeNet
, an architecture built for the Imagenet
project, a large visual database designed for use in visual object recognition.
You can find Imagenet
here: Imagenet 1000 Classes.
To probe into the architecture and generate 'dreams', I am using three notebooks:
https://github.com/kylemcdonald/deepdream/blob/master/dream.ipynb
https://github.com/auduno/deepdraw/blob/master/deepdraw.ipynb
The basic idea here is to extract some features from each channel in a specified layer from the model or a 'guide' image.
Then we input an image we wish to modify into the model and extract the features in the same layer specified (for each octave), enhancing the best matching features, i.e., the largest dot product of the two feature vectors.
So far I've managed to modify input images and control dreams using the following approaches:
- (a) applying layers as
'end'
objectives for the input image optimization. (see Feature Visualization)- (b) using a second image to guide de optimization objective on the input image.
- (c) visualize
Googlenet
model classes generated from noise.
However, the effect I want to achieve sits in-between these techniques, of which I haven't found any documentation, paper, or code.
Desired result (not part of the question to be answered)
To have one single class or unit belonging to a given
'end'
layer (a) guide the optimization objective (b) and have this class visualized (c) on the input image:
An example where class = 'face'
and input_image = 'clouds.jpg'
:
please note: the image above was generated using a model for face recognition, which was not trained on the Imagenet
dataset. For demonstration purposes only.
Working code
Approach (a)
from cStringIO import StringIO
import numpy as np
import scipy.ndimage as nd
import PIL.Image
from IPython.display import clear_output, Image, display
from google.protobuf import text_format
import matplotlib as plt
import caffe
model_name = 'GoogLeNet'
model_path = 'models/dream/bvlc_googlenet/' # substitute your path here
net_fn = model_path + 'deploy.prototxt'
param_fn = model_path + 'bvlc_googlenet.caffemodel'
model = caffe.io.caffe_pb2.NetParameter()
text_format.Merge(open(net_fn).read(), model)
model.force_backward = True
open('models/dream/bvlc_googlenet/tmp.prototxt', 'w').write(str(model))
net = caffe.Classifier('models/dream/bvlc_googlenet/tmp.prototxt', param_fn,
mean = np.float32([104.0, 116.0, 122.0]), # ImageNet mean, training set dependent
channel_swap = (2,1,0)) # the reference model has channels in BGR order instead of RGB
def showarray(a, fmt='jpeg'):
a = np.uint8(np.clip(a, 0, 255))
f = StringIO()
PIL.Image.fromarray(a).save(f, fmt)
display(Image(data=f.getvalue()))
# a couple of utility functions for converting to and from Caffe's input image layout
def preprocess(net, img):
return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']
def deprocess(net, img):
return np.dstack((img + net.transformer.mean['data'])[::-1])
def objective_L2(dst):
dst.diff[:] = dst.data
def make_step(net, step_size=1.5, end='inception_4c/output',
jitter=32, clip=True, objective=objective_L2):
'''Basic gradient ascent step.'''
src = net.blobs['data'] # input image is stored in Net's 'data' blob
dst = net.blobs[end]
ox, oy = np.random.randint(-jitter, jitter+1, 2)
src.data[0] = np.roll(np.roll(src.data[0], ox, -1), oy, -2) # apply jitter shift
net.forward(end=end)
objective(dst) # specify the optimization objective
net.backward(start=end)
g = src.diff[0]
# apply normalized ascent step to the input image
src.data[:] += step_size/np.abs(g).mean() * g
src.data[0] = np.roll(np.roll(src.data[0], -ox, -1), -oy, -2) # unshift image
if clip:
bias = net.transformer.mean['data']
src.data[:] = np.clip(src.data, -bias, 255-bias)
def deepdream(net, base_img, iter_n=20, octave_n=4, octave_scale=1.4,
end='inception_4c/output', clip=True, **step_params):
# prepare base images for all octaves
octaves = [preprocess(net, base_img)]
for i in xrange(octave_n-1):
octaves.append(nd.zoom(octaves[-1], (1, 1.0/octave_scale,1.0/octave_scale), order=1))
src = net.blobs['data']
detail = np.zeros_like(octaves[-1]) # allocate image for network-produced details
for octave, octave_base in enumerate(octaves[::-1]):
h, w = octave_base.shape[-2:]
if octave > 0:
# upscale details from the previous octave
h1, w1 = detail.shape[-2:]
detail = nd.zoom(detail, (1, 1.0*h/h1,1.0*w/w1), order=1)
src.reshape(1,3,h,w) # resize the network's input image size
src.data[0] = octave_base+detail
for i in xrange(iter_n):
make_step(net, end=end, clip=clip, **step_params)
# visualization
vis = deprocess(net, src.data[0])
if not clip: # adjust image contrast if clipping is disabled
vis = vis*(255.0/np.percentile(vis, 99.98))
showarray(vis)
print octave, i, end, vis.shape
clear_output(wait=True)
# extract details produced on the current octave
detail = src.data[0]-octave_base
# returning the resulting image
return deprocess(net, src.data[0])
I run the code above with:
end = 'inception_4c/output'
img = np.float32(PIL.Image.open('clouds.jpg'))
_=deepdream(net, img)
Approach (b)
"""
Use one single image to guide
the optimization process.
This affects the style of generated images
without using a different training set.
"""
def dream_control_by_image(optimization_objective, end):
# this image will shape input img
guide = np.float32(PIL.Image.open(optimization_objective))
showarray(guide)
h, w = guide.shape[:2]
src, dst = net.blobs['data'], net.blobs[end]
src.reshape(1,3,h,w)
src.data[0] = preprocess(net, guide)
net.forward(end=end)
guide_features = dst.data[0].copy()
def objective_guide(dst):
x = dst.data[0].copy()
y = guide_features
ch = x.shape[0]
x = x.reshape(ch,-1)
y = y.reshape(ch,-1)
A = x.T.dot(y) # compute the matrix of dot-products with guide features
dst.diff[0].reshape(ch,-1)[:] = y[:,A.argmax(1)] # select ones that match best
_=deepdream(net, img, end=end, objective=objective_guide)
and I run the code above with:
end = 'inception_4c/output'
# image to be modified
img = np.float32(PIL.Image.open('img/clouds.jpg'))
guide_image = 'img/guide.jpg'
dream_control_by_image(guide_image, end)
Question
Now the failed approach how I tried to access individual classes, hot encoding the matrix of classes and focusing on one (so far to no avail):
def objective_class(dst, class=50):
# according to imagenet classes
#50: 'American alligator, Alligator mississipiensis',
one_hot = np.zeros_like(dst.data)
one_hot.flat[class] = 1.
dst.diff[:] = one_hot.flat[class]
To make this clear: the question is not about the dream code, which is the interesting background and which is already working code, but it is about this last paragraph's question only: Could someone please guide me on how to get images of a chosen class (take class #50: 'American alligator, Alligator mississipiensis'
) from ImageNet (so that I can use them as input - together with the cloud image - to create a dream image)?
cat.jpg
and you get out its predicted class, and you do not need to have all of the input images and labels that GoogLeNet uses to train and test itself. That is the advantage of the pretrained models. Instead, you want to filter the ImageNet input images classified as #50 in gist.github.com/yrevar/942d3a0ac09ec9e5eb3a. Thus, is this a pure ImageNet question? – Woodleyout = model.predict(img) # note: the model has three outputs
, but rather how to get the class #50 from ImageNet. Which should not be so difficult? I expect a simple filter on the input images using the assigned label. Is this correct? If you really want to use the output of GoogLeNet, it will be just a predicted class of images of your chosen input. Insofar, GoogLeNet does not play a role here unless you want to predict the classes of your own images and use them for Deep Dream. – Woodley