What methods/algorithms are used for gesture recognition in a multi-touch environment?

In a multi-touch environment, how does gesture recognition work? What mathematical methods or algorithms are utilized to recognize or reject data for possible gestures?

I've created some retro-reflective gloves and an IR LED array, coupled with a Wii remote. The Wii remote does internal blob detection, tracks 4 points of IR light, and transmits this information to my computer via a Bluetooth dongle.

This is based on Johnny Chung Lee's Wii research. My precise setup is exactly like the one the graduate students from the Netherlands displayed here. I can easily track four points' positions in 2D space, and I've written basic software to receive and visualize these points.

The Netherlands students have gotten a lot of functionality out of their basic pinch-click recognition. I'd like to take it a step further if I could, and implement some other gestures.

How is gesture recognition usually implemented? Beyond anything trivial, how could I write software to recognize and identify a variety of gestures: various swipes, circular movements, letter tracing, etc.?

Horripilation answered 19/3, 2009 at 19:13 Comment(0)

Gesture recognition, as I've seen it anyway, is usually implemented using machine learning techniques similar to image recognition software. Here's a cool project on CodeProject about doing mouse gesture recognition in C#. I'm sure the concepts are quite similar, since you can likely reduce the problem down to 2D space. If you get something working with this, I'd love to see it. Great project idea!
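
The simplest version of that idea is template matching rather than full-blown machine learning: resample the drawn path to a fixed length, normalize it, and nearest-neighbour match it against stored examples. A rough from-scratch Python sketch (not the CodeProject code; `resample`, `normalize`, and `classify` are made-up names):

```python
import math

def _seg(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def resample(path, n=32):
    """Resample a drawn path to n evenly spaced points along its length."""
    pts = [tuple(p) for p in path]
    total = sum(_seg(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    if total == 0:
        return [pts[0]] * n
    interval = total / (n - 1)
    out = [pts[0]]
    d = 0.0
    i = 1
    while i < len(pts) and len(out) < n:
        step = _seg(pts[i - 1], pts[i])
        if d + step >= interval:
            t = (interval - d) / step
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)       # treat the new point as a path vertex
            d = 0.0
        else:
            d += step
        i += 1
    while len(out) < n:            # floating-point rounding can leave us short
        out.append(pts[-1])
    return out

def normalize(path):
    """Translate to the centroid and scale to unit size."""
    cx = sum(x for x, _ in path) / len(path)
    cy = sum(y for _, y in path) / len(path)
    pts = [(x - cx, y - cy) for x, y in path]
    scale = max(math.hypot(x, y) for x, y in pts) or 1.0
    return [(x / scale, y / scale) for x, y in pts]

def classify(path, templates):
    """Nearest-neighbour match of a drawn path against template paths."""
    probe = normalize(resample(path))
    def distance(name):
        ref = normalize(resample(templates[name]))
        return sum(math.hypot(px - qx, py - qy)
                   for (px, py), (qx, qy) in zip(probe, ref))
    return min(templates, key=distance)
```

With a handful of recorded example paths per gesture as templates, this is often enough for swipes and simple shapes before reaching for a trained model.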

Therewithal answered 19/3, 2009 at 19:20 Comment(0)

One way to look at it is as a compression / recognition problem. Basically, you want to take a whole bunch of data, throw out most of it, and categorize the remainder. If I were doing this (from scratch) I'd probably proceed as follows:

  • work with a rolling history window
  • take the center of gravity of the four points in the start frame, save it, and subtract it out of all the positions in all frames.
  • factor each frame into two components: the shape of the constellation and the movement of its CofG relative to the last frame's.
  • save the absolute CofG for the last frame too
  • the series of CofG changes gives you swipes, waves, etc.
  • the series of constellation morphing gives you pinches, etc.
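
A rough Python sketch of that decomposition (assuming each frame is a list of `(x, y)` tuples; the names are made up):

```python
def centre_of_gravity(points):
    """Mean position of the visible points in one frame."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def decompose(frames):
    """Split a rolling window of frames into (CofG deltas, shapes).

    The series of CofG deltas drives swipe/wave recognisers; the residual
    'shape' (points relative to their own CofG) drives pinch recognisers.
    """
    deltas, shapes = [], []
    prev = centre_of_gravity(frames[0])
    for frame in frames:
        cog = centre_of_gravity(frame)
        deltas.append((cog[0] - prev[0], cog[1] - prev[1]))
        shapes.append([(x - cog[0], y - cog[1]) for x, y in frame])
        prev = cog
    return deltas, shapes
```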

After seeing your photo (two points on each hand, not four points on one, doh!) I'd modify the above as follows:

  • Do the CofG calculation on pairs, with the caveats that:
    • If there are four points visible, pairs are chosen to minimize the product of the intrapair distances
    • If there are three points visible, the closest two are one pair, the other one is the other
    • Use prior / following frames to override when needed
  • Instead of a constellation, you've got a nested structure of distance / orientation pairs (i.e., one D/O between the hands, and one more for each hand).
  • Pass the full reduced data to recognizers for each gesture, and let them sort out what they care about.
  • If you want to get cute, do a little DSL to recognize the patterns, and write things like:

    fire when
        in frame.final: rectangle(points) 
      and
        over frames.final(5): points.all (p => p.jerk)
    

    or

    fire when
        over frames.final(3): hands.all (h => h.click)
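
The pairing caveats above could be sketched like this in Python (a from-scratch illustration; `pair_points` and `pair_three` are invented names):

```python
import math
from itertools import combinations

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pair_points(points):
    """With four visible points, pick the pairing that minimises the product
    of the two intra-pair distances (two fingertips on one hand sit close)."""
    a, b, c, d = points
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(pairings, key=lambda pr: dist(*pr[0]) * dist(*pr[1]))

def pair_three(points):
    """With three visible points, the closest two form one pair."""
    pair = min(combinations(points, 2), key=lambda pr: dist(pr[0], pr[1]))
    lone = next(p for p in points if p not in pair)
    return pair, lone
```

The ambiguous cases (both hands close together, momentary occlusion) are where the "use prior / following frames to override" caveat earns its keep.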
    
Oblation answered 19/3, 2009 at 19:45 Comment(2)
@MarkusQ, thanks for the comments. Just for reference, the Netherlands students' clicking algorithm works as follows: if the wiimote loses track of a point and the other point in its pair was within a closeness threshold, then it is a "click". – Horripilation
As the fingers come together, the wiimote sees both fingers as one blob rather than two, so it loses one of its points. This can also happen when your hands are no longer visible, so the closeness threshold is used to prevent false positives. – Horripilation
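
That heuristic fits in a few lines of Python (the `threshold` value is invented; tune it for your camera's units and resolution):

```python
import math

def detect_click(prev_pair, curr_points, threshold=12.0):
    """Pinch-click heuristic: report a click when a tracked pair loses a
    point (the wiimote merges two nearby blobs into one) while the pair
    was within a closeness threshold in the previous frame."""
    lost_a_point = len(curr_points) < 2
    were_close = math.dist(prev_pair[0], prev_pair[1]) < threshold
    return lost_a_point and were_close
```
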

Here's a video of what has been done with this sort of technology, if anyone is interested:

Pattie Maes demos the Sixth Sense - TED 2009

Dincolo answered 21/3, 2009 at 5:13 Comment(0)

Err.. I've been working on gesture recognition for the past year or so now, but I don't want to say too much because I'm trying to patent my technology :) But... we've had some luck with adaptive boosting, although what you're doing looks fundamentally different. You only have 4 points of data to process, so I don't think you really need to "reduce" anything.

What I would investigate is how programs like Flash turn a freehand-drawn circle into an actual circle. It seems like you could track the points for a duration of about a second, then "smooth" the path in some fashion, and then you could probably get away with hardcoding your gestures (if you make them simple enough). Otherwise, yes, you're going to want to use a learning algorithm. Neural nets might work... I don't know. Just tossing out ideas :) Maybe look at how OCR is done too... or even Hough transforms. It looks to me like this is a problem of recognizing shapes more than it is of recognizing gestures.
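
For instance, a crude sketch of the smooth-then-test idea in Python: moving-average smoothing plus a radial test for "is this roughly a circle?" (the function names and the tolerance are invented):

```python
import math

def smooth(path, window=5):
    """Moving-average smoothing of a 2D path."""
    half = window // 2
    out = []
    for i in range(len(path)):
        lo, hi = max(0, i - half), min(len(path), i + half + 1)
        n = hi - lo
        out.append((sum(x for x, _ in path[lo:hi]) / n,
                    sum(y for _, y in path[lo:hi]) / n))
    return out

def looks_like_circle(path, tolerance=0.2):
    """Crude circle test: every point roughly equidistant from the centroid."""
    cx = sum(x for x, _ in path) / len(path)
    cy = sum(y for _, y in path) / len(path)
    radii = [math.hypot(x - cx, y - cy) for x, y in path]
    mean_r = sum(radii) / len(radii)
    if mean_r == 0:
        return False
    return (max(radii) - min(radii)) / mean_r < tolerance
```

Similar ad hoc tests (aspect ratio for swipes, direction reversals for an "x") can cover a small hardcoded gesture set before any learning algorithm is needed.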

Tampa answered 21/3, 2009 at 5:03 Comment(2)
Well, fundamentally, drawing a circle, an x, or swiping all 4 points across in different directions ~are~ gestures. In my 2D world my gestures are shapes. I'll have to look further into the learning algorithms though. – Horripilation
Well, yes, they are gestures; I just mean that if you can figure out what shape it makes, you can also figure out what gesture it was. I.e., I think the gesture recognition is reducible to shape recognition (which may be an easier problem to solve -- less probabilistic). – Tampa

Most simple gesture-recognition tools I've looked at use a vector-based template to recognize gestures. For example, you can define a right-swipe as "0", a checkmark as "-45, 45, 45", a clockwise circle as "0, -45, -90, -135, 180, 135, 90, 45, 0", and so on.
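
A Python sketch of that idea: quantize each movement into 45-degree codes and compare the resulting sequence against templates like the ones above (`min_step` is an invented jitter threshold, and this assumes y grows upward; flip the sign of `dy` for screen coordinates):

```python
import math

def to_directions(path, min_step=2.0):
    """Quantize a drawn path into 45-degree direction codes
    (0 = right, 45 = up-right, 180 = left, -90 = down)."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        if math.hypot(dx, dy) < min_step:      # ignore jitter
            continue
        angle = math.degrees(math.atan2(dy, dx))
        code = round(angle / 45) * 45
        if not codes or codes[-1] != code:     # collapse repeated codes
            codes.append(code)
    return codes
```

Matching can then be plain list equality against each template, or an edit distance if you want tolerance for missed or extra segments.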

Posit answered 22/3, 2009 at 9:02 Comment(0)

I'm not very well versed in this type of mathematics, but I have read somewhere that people sometimes use Markov chains or hidden Markov models to do gesture recognition.

Perhaps someone with a little more background in this side of Computer Science can illuminate it further and provide some more details.
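
For the curious, the scoring half of that approach is small enough to write from scratch: train one HMM per gesture over some discrete observation alphabet (e.g. quantized movement directions), then pick the model that gives the observed sequence the highest likelihood. A minimal forward-algorithm sketch, not taken from any particular library:

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the scaled forward algorithm.
    start[i] = P(state i at t=0); trans[i][j] = P(i -> j);
    emit[i][o] = P(symbol o | state i)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    s = sum(alpha)
    log_p = math.log(s)
    alpha = [a / s for a in alpha]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        s = sum(alpha)          # rescale each step to avoid underflow
        log_p += math.log(s)
        alpha = [a / s for a in alpha]
    return log_p
```

Classification is then `max(models, key=lambda m: forward_log_likelihood(obs, *m))`; estimating the parameters from recorded gestures (Baum-Welch) is the harder part.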

Corroborate answered 21/3, 2009 at 4:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.