In this generation of consoles, motion controls have come to prominence. The most ambitious implementation is Microsoft’s Kinect. Marketed with the phrase “You are the controller,” its aim is to free us from the tyranny of the button by providing a more natural way to interact with our entertainment.
Using a camera as an input device creates some interesting new opportunities for developers. Acting out context-sensitive movements, like climbing a rope, can add another layer of immersion to a game. It also introduces new challenges: the lack of tactile feedback can make the fact that you’re miming movements even more apparent.
For my first post at #AltDev I will walk through creating a simple interaction using a camera with only the image data. By this I mean I’m working only in 2D, neither accounting for depth nor attempting to build a skeleton. Computer vision is a rather hefty subject, so I’ll be glossing over the finer points and treating this as a primer to the material. The main goal is to talk about how the interaction is created rather than to delve into vision algorithms. I can go into those in a future post, so let me know in the comments if that’s something you’re interested in.
So with that said, let’s begin.
Determining the Foreground
So you’re sitting at your desk reading this article. You peer over your monitor and see a coworker standing off in the distance. How did you know he wasn’t there the entire time? Well, the last time you looked he wasn’t there, so he must have shown up sometime since then.
Background subtraction is a camera applying this same logic. The gist of the operation is entirely in the name. If you take an image of a space with no people in it, you have your background. You then compare this image to the current image, and any difference between the two indicates that something has appeared in the foreground. How that difference is determined depends on the algorithm used, as implementations vary in speed and accuracy.
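To make the idea concrete, here is a minimal sketch of the simplest form of background subtraction: a per-pixel absolute difference against a stored background, with a fixed threshold. This is only one of many possible approaches and not necessarily what any shipping system uses. The function name foreground_mask, the threshold value of 30, and the tiny 4x4 grayscale images are all assumptions made up for illustration.

```python
def foreground_mask(background, frame, threshold=30):
    """Return a 2D list of booleans, True where frame differs from background.

    Both images are lists of rows of grayscale values (0-255).
    The threshold is an arbitrary illustrative value that absorbs sensor noise.
    """
    mask = []
    for bg_row, fr_row in zip(background, frame):
        mask.append([abs(f - b) > threshold for b, f in zip(bg_row, fr_row)])
    return mask


# Hypothetical image of the empty space: this is the background.
background = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

# The same view with something bright in the middle, plus a little noise.
frame = [
    [12, 10, 9, 10],
    [10, 200, 210, 10],
    [10, 190, 205, 11],
    [10, 10, 10, 10],
]

mask = foreground_mask(background, frame)

# Print the silhouette: '#' is foreground, '.' is background.
for row in mask:
    print("".join("#" if px else "." for px in row))
```

The resulting mask is the silhouette of whatever entered the scene. A fixed threshold like this is fast but fragile under lighting changes, which is one reason more elaborate algorithms exist.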
Collision Detection
Imagine you have a cup of coffee in front of you (feel free to substitute your beverage of choice). The catch is that every time you reach for it, your hand passes right through. How would that make you feel? Frustrated, to say the least.
This isn’t something you’d expect to happen in the real world, and those expectations carry over into the virtual one. If you can’t interact with the world, you don’t feel like you’re a part of it. You’re essentially a ghost at that point.
Adding collision detection and response grounds the person in the world. The details of the collision system are beyond the scope of this article, but whatever its implementation, it shouldn’t allow an object to enter the silhouette of the person.
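As a rough illustration of that one rule, the sketch below treats the silhouette as a 2D grid of booleans and refuses to move a square object into any foreground pixel, reversing its velocity instead. This is a deliberately naive stand-in for a real collision system. The names overlaps_silhouette and step, the square object shape, and the bounce response are all assumptions chosen for brevity.

```python
def overlaps_silhouette(mask, x, y, half_size):
    """True if the square centred on (x, y) covers any foreground pixel."""
    height, width = len(mask), len(mask[0])
    for py in range(max(0, y - half_size), min(height, y + half_size + 1)):
        for px in range(max(0, x - half_size), min(width, x + half_size + 1)):
            if mask[py][px]:
                return True
    return False


def step(mask, pos, vel, half_size=0):
    """Advance the object one frame without letting it enter the silhouette.

    If the next position would overlap the silhouette, the object stays put
    and its velocity is reversed (a crude bounce).
    """
    nx, ny = pos[0] + vel[0], pos[1] + vel[1]
    if overlaps_silhouette(mask, nx, ny, half_size):
        return pos, (-vel[0], -vel[1])
    return (nx, ny), vel


# Hypothetical 5x3 silhouette mask: the person occupies the two right columns.
mask = [[False, False, False, True, True] for _ in range(3)]

# A one-pixel object starts at the left edge and moves right.
pos, vel = (0, 1), (1, 0)
history = []
for _ in range(4):
    pos, vel = step(mask, pos, vel)
    history.append(pos)

print(history)
```

In a live setting the mask changes every frame as the person moves, so a real system also has to handle the silhouette moving into the object, not just the object moving into the silhouette.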
At this point a simple interaction should be possible, with the person using their silhouette to manipulate the object.