I just gave a talk at SecondConf where I demonstrated using the iPhone's camera to track a colored object with OpenGL ES 2.0 shaders. The post accompanying that talk, including my slides and sample code for all demos, can be found here.
The sample application I wrote, whose code can be downloaded from here, is based on an example produced by Apple for demonstrating Core Image at WWDC 2007. That example is described in Chapter 27 of the GPU Gems 3 book.
The basic idea is that you can use custom GLSL shaders to process images from the iPhone camera in real time, determining which pixels match a target color within a given threshold. Those pixels then have their normalized X, Y coordinates embedded in their red and green color components, while all other pixels are marked as black. The color of the whole frame is then averaged to obtain the centroid of the colored object, which you can track as it moves across the view of the camera.
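To make those two passes concrete, here is a minimal CPU-side sketch in Python. It is not the shader code from the sample application. The function names, the Euclidean RGB distance, and the final division by the fraction of matching pixels are my assumptions for illustration. That last step is needed because a plain average over the whole frame is scaled down by however much of the frame the object covers.

```python
def encode_matches(pixels, target, threshold):
    """Pass 1 (hypothetical stand-in for the fragment shader).

    pixels: rows of (r, g, b) tuples, components in [0, 1].
    Matching pixels become (norm_x, norm_y, 1.0); all others become black.
    """
    height, width = len(pixels), len(pixels[0])
    encoded = []
    for y, row in enumerate(pixels):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            # Assumed metric: Euclidean distance in RGB space.
            dist = ((r - target[0]) ** 2 +
                    (g - target[1]) ** 2 +
                    (b - target[2]) ** 2) ** 0.5
            if dist < threshold:
                # Normalized coordinates of the pixel center go in red/green;
                # blue is used here as a "this pixel matched" flag.
                out_row.append(((x + 0.5) / width, (y + 0.5) / height, 1.0))
            else:
                out_row.append((0.0, 0.0, 0.0))
        encoded.append(out_row)
    return encoded


def centroid_from_average(encoded):
    """Pass 2: average the whole frame, then undo the coverage scaling."""
    total = len(encoded) * len(encoded[0])
    avg_r = sum(p[0] for row in encoded for p in row) / total
    avg_g = sum(p[1] for row in encoded for p in row) / total
    coverage = sum(p[2] for row in encoded for p in row) / total
    if coverage == 0:
        return None  # nothing in the frame matched the target color
    return (avg_r / coverage, avg_g / coverage)
```

For example, a 4x4 black frame with a 2x2 red square in the middle gives a centroid at the center of the frame, and a frame with no matching pixels gives None.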
While this doesn't address the case of tracking a more complex object like a foot, it should be possible to write similar shaders that pick out such a moving object.
As an update to the above: in the two years since I wrote this, I've developed an open source framework that encapsulates OpenGL ES 2.0 shader processing of images and video. One of its recent additions is a GPUImageMotionDetector class that processes a scene and detects any kind of motion within it. It gives you back the centroid and intensity of the overall motion it detects via a simple callback block. Using this framework should be a lot easier than rolling your own solution.
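To illustrate what "centroid and intensity of the overall motion" delivered through a callback can look like, here is a rough Python sketch based on simple frame differencing. This is not the framework's implementation or its API. The function name, the grayscale input, the per-pixel difference threshold, and the definition of intensity as the fraction of changed pixels are all assumptions made for the example.

```python
def detect_motion(previous, current, threshold, on_motion):
    """Hypothetical frame-differencing motion detector.

    previous, current: rows of grayscale intensities in [0, 1], same size.
    on_motion: callback invoked once as on_motion(centroid, intensity),
    where centroid is a normalized (x, y) tuple or None if nothing moved.
    """
    height, width = len(current), len(current[0])
    sum_x = sum_y = 0.0
    changed = 0
    for y in range(height):
        for x in range(width):
            # A pixel counts as "moving" if it changed by more than threshold.
            if abs(current[y][x] - previous[y][x]) > threshold:
                sum_x += (x + 0.5) / width
                sum_y += (y + 0.5) / height
                changed += 1
    # Assumed definition: intensity is the fraction of pixels that changed.
    intensity = changed / (width * height)
    centroid = (sum_x / changed, sum_y / changed) if changed else None
    on_motion(centroid, intensity)
```

A caller would pass two consecutive frames and a function that receives the result, much like handing a block to the detector and reacting whenever it fires.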