So i think you are on the right track w/r/t your step 1 (apply some algorithm to the image, which converts it into a set of features).
This project is more challenging that most ML problems because here you will actually have to create your training data set from the raw data (the individual frames comprising the cartoons). For instance, grab a frame, identify two characters in that frame, Dilbert and the character with horns (Dilbert's boss i believe, don't know his name), extract those two characters from that frame and append to each the appropriate class label (e.g., "1" for Dlibert).
Step 1
To extract the individual characters from each of the frames comprising the a Dilbert cartoon, i would suggest a spectral decomposition of each frame. If you are not familiar with this technique, at its core, it's just an eigenvector decomp.
If you like python (or R, given that you can use python-to-R bindings like RPy) then i would strongly encourage you to look at sklearn. In particular, this excellent library (which was originally developed under the SciPy scikits project umbrella, and therefore uses NumPy + SciPy for matrix computation) has several algorithms for image segmentation, one of which is based on spectral clustering. For this step in your Project, you would most likely want to look at these two scikits.learn modules
Included with these two modules are two good example scripts, one segmenting a digital photograph and the other segmenting an image comprised of three partially super-imposed circles with minimal contrast w/r/t each other and w/r/t the background--both, i suspect are more difficult problems that the decompositions you will need to perform. In other words, sklearn has two complete, well-documented example scripts included in the source distribution, both of which process data similar to yours. Either or both would be an excellent template for this step.
Step 2
So that's the first step; here's the second: sort all of the components of the decomposed images into groups, one group for each Dilbert character. Next, assign a class label to each Group, e.g., if there are four characters from your decomposition step, then a decent choice for class labels is "0", "1", "2", and "3." Append those class labels to the component matrices (the decomposition products from step 1) so each character matrix is mapped to its corresponding class (Dilbert character).
Step 3
Select a suitable ML technique. You have many choices for this step; the only criteria are that the technique is in the supervised category (because you have assigned class labels to your data) and that it function as a classifier (i.e., it returns a class label, versus a regressor which outputs a numerical value). Given this is a personal project, i would chose the one that seems most interesting to you. A few that satisfy the criteria i just mentioned are: multi-layer perceptron (neural network), support vector machine (SVM), and k-nearest neighbors (kNN).
Step 4
train, validate, and test your classifier
Alternative Technique: Template Matching
Once Step 1 is completed (each image is decomposed into a set of objects, some of which will no doubt represent the characters) you can manually sift through these decomposition products and collect exemplars for each character in the cartoon. The are the templates.
Next you compare objects segmented from an image with this set of unique templates. In scikit-image, another scipy scikit, you can use the method match_template, to which you pass in a template image and a candidate image, and this method returns a 2D array showing the pixel-by-pixel correlation (between -1 and 1).