Peripheral-Free ASCII Input Using Intel® RealSense™ Technology

(This white paper is also published in the Intel® Developer Zone.)

If virtual input devices, like those that can be created by merging Intel® RealSense™ technology with an appropriate Natural User Interface (NUI), are to compete with or replace established physical inputs like the mouse and keyboard, they must address the matter of text input. While current-generation NUI technology has done a reasonable job of competing with the mouse, a visual and spatially contextualized input method, it has fallen notably short of competing with the keyboard.

The primary problem facing a virtual keyboard replacement is speed of input, and speed is necessarily related to accuracy. However, as sensor accuracy continues to improve, other challenges arise that may prove more difficult to address.

In this paper I start with the assumption that sensor accuracy does, or soon will, allow the detection of small hand movements on the scale of individual keystrokes. I then examine the opportunities, challenges, and some possible solutions for virtualized text input that requires no physical peripheral beyond the camera sensor.
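To make that assumption concrete, the raw signal such a system needs is a per-frame fingertip position with roughly millimeter precision, from which a press can be inferred as a small, quick downward displacement. The sketch below (in Python) is purely illustrative: the input is a hypothetical list of fingertip heights rather than any actual Intel RealSense SDK call, and the thresholds are assumptions, not measured values.

# Illustrative only: treats a "virtual keystroke" as a small, quick downward
# fingertip movement. The input is a hypothetical per-frame list of fingertip
# heights in millimeters; it is NOT produced by any Intel RealSense SDK call.

PRESS_DEPTH_MM = 4.0      # assumed downward travel that counts as a press
PRESS_WINDOW_FRAMES = 6   # assumed window, roughly 100 ms at a 60 FPS sensor

def detect_presses(fingertip_height_mm):
    """Yield the frame index at which a press-like downward motion completes."""
    history = []
    for frame, height in enumerate(fingertip_height_mm):
        history.append(height)
        if len(history) > PRESS_WINDOW_FRAMES:
            history.pop(0)
        # A press: the fingertip dropped by PRESS_DEPTH_MM within the window.
        if len(history) == PRESS_WINDOW_FRAMES and history[0] - height >= PRESS_DEPTH_MM:
            yield frame
            history.clear()   # reset so one press is not reported twice

# Synthetic example: a resting fingertip that dips once and returns.
samples = [120.0] * 10 + [119.0, 118.0, 116.5, 115.5] + [120.0] * 10
print(list(detect_presses(samples)))   # -> [13]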

A Few Ground Rules For This Discussion

For the sake of this paper, I’ll be discussing the potential replacement of a Western-style keyboard. The specific layout, QWERTY or otherwise, is irrelevant to the main point. With that in mind, keyboards can be considered in a hierarchy of complexity, from a numeric 10-key pad through extended computer keyboards that include letters, numbers, punctuation, and various macro and function keys.

As noted above, the factors most likely to make or break any proposed keyboard replacement are speed, followed by accuracy. Sources disagree on the average typing speed of a modern tech employee, and Words Per Minute (WPM) may be an imperfect measure of keyboarding skill for writing code, but it will serve as a useful comparative metric. I will assume that 40 WPM is a reasonable speed, and that methods an experienced user cannot realistically bring to that goal should be discarded.
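To put that target in concrete time terms, the usual convention counts one "word" as five characters. The quick calculation below (a Python sketch; the figures follow directly from that convention, not from measurement) shows that 40 WPM leaves a budget of roughly 300 milliseconds per character, which is the window any virtual detection-and-disambiguation step would have to fit inside.

# Time budget per keystroke implied by a 40 WPM target.
# Assumes the common convention of 5 characters per "word".
WPM_TARGET = 40
CHARS_PER_WORD = 5

chars_per_minute = WPM_TARGET * CHARS_PER_WORD       # 200 characters per minute
keystrokes_per_second = chars_per_minute / 60.0      # ~3.3 keystrokes per second
ms_per_keystroke = 1000.0 / keystrokes_per_second    # ~300 ms per keystroke

print(f"{keystrokes_per_second:.1f} keystrokes/s, "
      f"{ms_per_keystroke:.0f} ms budget per keystroke")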

I will also focus on the Latin alphabet; however, other alphabetic writing systems are similar in application. What is not considered here, though well worth exploring, is the virtualization of logographic input. It is conceivable that gestural encoding of conceptual content would be faster than, and superior to, gestures that encode phonemes, and might even represent a linguistic evolutionary advance on logographic systems such as Kanji. That said, a very likely use case for this kind of technology is writing computer code, in which case letter-by-letter input is an overarching consideration.