Exploratory building of rich non-traditional (not necessarily hands-free) user interfaces is becoming increasingly accessible. For instance, here's a web demo of Google MediaPipe's face and iris tracking[1]. And hand tracking[2] with a downward-facing camera permits enriching keyboards and touchpads/tablets: annotating events with which finger was used, and where on a keycap it was pressed; hand and finger gestures in 2D and 3D; and positional touchless events. And speech to text... sigh.
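For a concrete sketch of what "enriching" a keypress might look like, here is a hypothetical event shape in TypeScript (field names are illustrative, not from MediaPipe or any keyboard API):

    // Hypothetical enriched keypress event; names are illustrative only.
    interface EnrichedKeyPress {
      key: string;                                // e.g. " " for spacebar
      timestampMs: number;                        // when the keyswitch fired
      // Annotations that arrive later from the downward-facing camera:
      hand?: "left" | "right";
      finger?: "thumb" | "index" | "middle" | "ring" | "pinky";
      keycapPosition?: { x: number; y: number };  // 0..1 across the keycap
    }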
But doing sensor fusion is hard, and it strongly impacts system architecture. "Launch the missiles"... 1000 ms later... oh, nope, that was "Lunch is mussels in butter". "Spacebar keypress event"... 50 ms later... "btw, that was a thumb, at the 20% position". "Ok, so 2000 ms ago, just before the foo, there was a 3D gesture bar". So app state needs to roll backward and forward easily, because you won't fully know what happened now until seconds from now. The upside is that traditional "have to wait a bit until we know whether mumble" latencies can be optimistically speculated away.
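One way to cope (a minimal sketch, not the only possible architecture, and none of these names come from a real library) is event sourcing with retroactive amendment: keep an append-only log of timestamped input events, derive app state by folding a pure reducer over the log, and when a late annotation or correction arrives for an old event, amend that log entry and replay:

    // Minimal sketch of retroactive-amendment event sourcing (TypeScript).
    // All names are illustrative, not tied to any real framework.

    interface SensedEvent {
      id: number;
      timestampMs: number;
      kind: string;            // "keypress", "gesture3d", "speech", ...
      payload: unknown;        // raw event data
      annotations: unknown[];  // late-arriving details (finger, corrected transcript, ...)
    }

    interface AppState { text: string }

    const initialState: AppState = { text: "" };

    // Pure reducer: app state is fully determined by the event log.
    function apply(state: AppState, ev: SensedEvent): AppState {
      if (ev.kind === "keypress") {
        return { text: state.text + (ev.payload as { key: string }).key };
      }
      return state;
    }

    class SpeculativeStore {
      private log: SensedEvent[] = [];
      state: AppState = initialState;

      // Optimistically apply the event now, before all annotations are in.
      append(ev: SensedEvent): void {
        this.log.push(ev);
        this.state = apply(this.state, ev);
      }

      // Seconds later, more information arrives about an old event:
      // amend that entry, then roll back and replay the log.
      amend(id: number, annotate: (ev: SensedEvent) => SensedEvent): void {
        this.log = this.log.map(ev => (ev.id === id ? annotate(ev) : ev));
        this.state = this.log.reduce(apply, initialState);
      }
    }

    // Usage: apply a spacebar press now; learn 50 ms later it was a thumb.
    const store = new SpeculativeStore();
    store.append({ id: 1, timestampMs: 0, kind: "keypress",
                   payload: { key: " " }, annotations: [] });
    store.amend(1, ev => ({ ...ev, annotations: [{ finger: "thumb", keycapX: 0.2 }] }));

A real system would snapshot periodically rather than replay the whole log, and the reducer would be far richer, but the shape is the point: events are facts that can gain detail, or be retracted, seconds after they fired.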
[1] https://viz.mediapipe.dev/demo/iris_tracking (the "run" button is at the top right)
[2] https://viz.mediapipe.dev/demo/hand_tracking