Siri Activation and Feedback


People want Siri to be easy to talk to, but unobtrusive. To put it another way, you want it when you want it, and you don’t when you don’t. This is a challenge because the hardware button that summons Siri is overloaded: short press to lock the screen, double-press to launch Wallet, and long press to launch Siri.

Using just your voice to summon Siri is natural and convenient, but that convenience comes at the cost of occasional false-activation blunders.

Finding the right balance between these opposing poles, accessibility and unobtrusiveness, was at the heart of my work as the designer responsible for the mechanics of the conversation with Siri. That included multimodal feedback about Siri’s state: whether it is listening, processing, or ready for additional input from the user. (That’s just a high-level summary; it gets more complicated than that.)


The goal is, of course, an interface that feels fluid and responsive even though the microphone isn’t always (fully) on. There’s a surprising amount of technical sophistication behind the removal of the required “Hey” before “Siri,” but the gain in fluidity is tremendous.

But doing more with less speech signal is only the tip of the invocation iceberg. My direct report and I collaborated with acoustics, computer vision and location/ranging researchers to build sensor fusion techniques that can model natural engagement and disengagement behaviors across a wide range of devices and scenarios.

Our contributions included vision videos, diagrams and audiovisual mockups illustrating improvements at the turn and micro-turn levels. We also contributed to Setup and Settings UI and copy.

We also oversaw the design of the related Back-to-Back Requests feature (both the “Siri” trigger phrase and Back-to-Back Requests debuted in iOS 17). Once you have Siri’s attention, you can continue talking to it without saying its name again. For example, in the interaction at left I said “Siri, how tall is the Empire State Building?” and then “How about the Eiffel Tower?”

On iPhone 11 and later, you can even interrupt Siri while it’s talking, or speak over music, thanks to hardware and firmware support for echo cancellation. 

These interruption and back-to-back request capabilities are on by default. With Alexa and Google Assistant, by contrast, the user has to turn the equivalent features on in settings.

To achieve this simplicity, we built machine-learned models that classify Siri-directed versus human-directed speech. Siri ignores the latter, and dismisses itself automatically after a few seconds.
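To make the follow-up behavior concrete, here is a minimal sketch of the decision logic after Siri responds. It is not Siri’s implementation. The classifier is represented only by a hypothetical `directed_score` (an assumed probability that the speech is addressed to Siri), and the names `Utterance` and `FollowUpWindow`, the 4-second window standing in for “a few seconds,” and the 0.5 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    """One chunk of speech heard after Siri responds (hypothetical structure)."""
    start_s: float         # seconds since Siri finished its last response
    text: str
    directed_score: float  # assumed classifier output: P(speech is Siri-directed)


@dataclass
class FollowUpWindow:
    window_s: float = 4.0   # assumed stand-in for "a few seconds"
    threshold: float = 0.5  # assumed decision threshold on the classifier score

    def decide(self, utterances: List[Utterance]) -> Optional[Utterance]:
        """Return the first Siri-directed utterance heard inside the window,
        or None, meaning Siri should dismiss itself."""
        for u in sorted(utterances, key=lambda x: x.start_s):
            if u.start_s > self.window_s:
                break  # window has expired; later speech is not considered
            if u.directed_score >= self.threshold:
                return u  # treat it as a back-to-back request
            # Below threshold: human-directed speech, ignore it and keep listening.
        return None  # nothing addressed to Siri within the window: dismiss


if __name__ == "__main__":
    heard = [
        Utterance(1.2, "did you feed the cat?", 0.08),
        Utterance(2.5, "how about the Eiffel Tower?", 0.93),
    ]
    request = FollowUpWindow().decide(heard)
    print(request.text if request else "dismiss")
```

Run as a script, this prints `how about the Eiffel Tower?`: the aside to another person scores low and is ignored, and the follow-up question is accepted without the name. If no utterance in the window crosses the threshold, `decide` returns `None` and Siri would dismiss itself.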