I worked on Dictation and Siri interaction designs for the world’s first true “spatial computer”: Apple Vision Pro, announced in June 2023. The eye-tracking capability of the device unlocks incredibly magical experiences, such as searching the web merely by looking at Safari’s address bar and speaking.
My contributions to Dictation included multimedia mockups illustrating how a machine-learned model trained on gaze coordinate streams could distinguish intended dictation events from incidental speech.
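To make the idea concrete, here is a deliberately simplified sketch of the kind of gating such a model performs. This is not Apple's model or API; it's a toy heuristic stand-in in which speech only counts as an intended dictation event if gaze has dwelled on the target (say, the address bar) for a minimum duration when speech begins. The class name, threshold, and method signatures are all hypothetical.

```python
class GazeIntentGate:
    """Toy heuristic stand-in for a learned gaze/speech intent model:
    treat speech as an intended dictation event only if gaze has dwelled
    on the target element for a minimum duration when speech begins."""

    def __init__(self, dwell_threshold_s=0.3):  # hypothetical dwell threshold
        self.dwell_threshold_s = dwell_threshold_s
        self.dwell_start = None  # timestamp when gaze entered the target

    def on_gaze_sample(self, t, on_target):
        """Consume one gaze sample: start or reset the dwell timer."""
        if on_target:
            if self.dwell_start is None:
                self.dwell_start = t
        else:
            self.dwell_start = None

    def speech_is_intended(self, t):
        """At speech onset time t, decide whether to treat it as dictation."""
        return (self.dwell_start is not None
                and t - self.dwell_start >= self.dwell_threshold_s)
```

A real system would learn this decision from gaze coordinate streams rather than hard-coding a dwell rule, but the input/output shape is the same: gaze samples in, an intended-or-incidental judgment out at speech onset.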
The visionOS behavior follows the precedent set by Safari Voice Search for iOS/iPadOS, but it’s even more natural and fluid in that you don’t have to target a graphical element (e.g., a microphone button) before you can start speaking.
Siri also benefits from the superpower of gaze awareness, taking into account where you’re looking when you say commands like “add this to Up Next” or “share this with Tasha.”
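The resolution step behind commands like these can be sketched as binding a deictic word ("this," "that") to whatever the user is looking at. The function below is a hypothetical illustration, not Apple's implementation; the word list and parameter names are my own.

```python
def resolve_deictic_target(utterance, gaze_target,
                           deictic_words=("this", "that", "it")):
    """Toy resolver: if the command contains a deictic reference and the
    user is looking at an identifiable item, bind the reference to it."""
    words = utterance.lower().replace(",", "").split()
    if any(w in words for w in deictic_words) and gaze_target is not None:
        return gaze_target
    return None  # no deictic reference, or nothing identifiable under gaze
```

So "add this to Up Next" spoken while looking at an episode resolves to that episode, while "play some music" resolves to nothing and proceeds without a gaze-bound object.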
My Siri contributions included multimedia mockups of real-time feedback in apps during active conversations with the assistant, as well as mockups, policy bullets and state diagrams covering invocation, listening, responding and dismissal behavior.
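A state diagram of that shape can be expressed as a small transition table. The states, event names, and transitions below are hypothetical placeholders meant only to show the structure such a policy document captures, not the actual visionOS Siri policy.

```python
from enum import Enum, auto

class AssistantState(Enum):
    IDLE = auto()        # assistant not invoked
    LISTENING = auto()   # invoked and capturing speech
    RESPONDING = auto()  # speaking or rendering a response

# Hypothetical transition table covering invocation, listening,
# responding and dismissal, in the spirit of a policy state diagram.
TRANSITIONS = {
    (AssistantState.IDLE, "invoke"): AssistantState.LISTENING,
    (AssistantState.LISTENING, "end_of_speech"): AssistantState.RESPONDING,
    (AssistantState.LISTENING, "dismiss"): AssistantState.IDLE,
    (AssistantState.RESPONDING, "follow_up"): AssistantState.LISTENING,
    (AssistantState.RESPONDING, "response_complete"): AssistantState.IDLE,
    (AssistantState.RESPONDING, "dismiss"): AssistantState.IDLE,
}

def step(state, event):
    """Return the next state, staying put on events the policy doesn't cover."""
    return TRANSITIONS.get((state, event), state)
```

Writing the policy down this way makes gaps obvious: every (state, event) pair either has a defined destination or is an explicit no-op, which is exactly the kind of completeness a state diagram review is meant to check.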
We’re only scratching the surface of what’s possible at the intersection of gaze, gesture and voice on Vision Pro. We’re certainly not the first to explore such territory — check out this pioneering work from 1980 (!) — but I’m proud of what we accomplished.