I recently watched John Underkoffler's presentation on 3D UIs, and on how he helped create the interface for the film Minority Report. You know the scene, the one where Tom Cruise is working his way through the UI with a series of hand gestures (although the one in Iron Man 2 is an upgrade). As I watched the clip, Underkoffler worked through a prototype, wearing gloves (as Cruise did, but Downey did not), and he had to make all of these esoteric gestures to make it behave.

As he did so, I flashed on the way my wife uses her iPhone, with the weird gestures she has to make to get it to behave. There's an entire library of gestures, and even worse, those gestures can mean completely different things in different contexts-- in different programs, or even in different modes of the same program.

Underkoffler talks about how the WIMP (Window, Icon, Menu, Pointer) interface was a miracle when it debuted popularly in the Macintosh, but we haven't really progressed far from there. He wants us to stretch beyond that interactive format.

The thing about the WIMP interface, and one of the reasons we haven't progressed far from that original design, is that it's absolutely minimal in what you need to know from the start to make the system behave. Point, click, read. You don't need to memorize a whole slew of esoteric commands, as you did with DOS (or as we Linux people pride ourselves on doing). Well-written UIs have discoverability and affordance, with the written word and the icon as the primary cues as to what to do next.

Underkoffler's demonstration shows a world where affordance and discoverability don't exist; you have to know the gestures, or be shown them, before you can do anything. Maybe we'll have the bandwidth per application to teach that, maybe not. But the 3D UI (and all gestural UIs, like those in tablets and phones) is a step back to the era when we had to know some esoteric and unfamiliar activity-- a code word, a gesture-- to get anything done.

Most people don't love Emacs. I understand that.

On the other hand, I did love one bit of Underkoffler's talk. Back in the summer of 1992, I had the good luck to accompany a student group to a presentation and dinner by Dr. Timothy Leary. At that dinner, Dr. Leary and I got into a rather heated discussion about virtual reality.

Leary's contention was that virtual reality was never going to be the stuff of home installations. It was too expensive, too complicated. We'd have to go to places, like we go to theaters, to get the full virtual reality experience. He was adamant; by 2010, there'd be these places in malls where you'd go to have what sounded a lot like Huxley's "feelies."

I argued that we were already there. We had MUCKs at the time, which were the beginnings of a communal virtual experience. He was highly dismissive: after all, that was still text on a screen. The whole goggles-and-gloves thing would never happen in the home. I argued that the problem was one of bandwidth, which had grown by leaps and bounds in the ten years since the earliest BBSes.

Underkoffler's vision is that five years from now every object we buy will have spatial sensors in the bezel, and that interaction with the real world is just a matter of time and effort: the development of software to meaningfully interpret our gestures and convert them into commands.

I look forward to that.