richarme
This is achieved using Acoustic Echo Cancellation (AEC). This essentially subtracts the speaker's output, along with its reverberations from the room, from the microphone input. Here's a YouTube video explaining the basic principle: https://www.youtube.com/watch?v=bJKGrheOoY4

Source: worked on 3rd party Alexa speakers
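
The core idea can be sketched in a few lines (a toy single-tap canceller, not any vendor's implementation; real AEC uses adaptive multi-tap filters): since the device knows what it is playing, it can estimate the delay and gain of the echo path and subtract its own output from the mic signal.

```python
import numpy as np

def cancel_echo(mic, playback, max_delay=2048):
    """Naive single-tap echo canceller: find the delay and gain that
    best explain the playback signal inside the mic signal, then
    subtract that scaled, delayed copy."""
    # Estimate the delay via cross-correlation.
    corr = np.correlate(mic, playback, mode="full")
    lag = np.argmax(corr) - (len(playback) - 1)
    lag = max(0, min(lag, max_delay))
    # Align the playback signal with the mic signal.
    echo = np.zeros_like(mic)
    echo[lag:] = playback[:len(mic) - lag]
    # Least-squares gain of the echo within the mic signal.
    gain = np.dot(mic, echo) / (np.dot(echo, echo) + 1e-12)
    return mic - gain * echo

# Synthetic check: mic = delayed, attenuated playback + near-end speech.
rng = np.random.default_rng(0)
playback = rng.standard_normal(4000)
near_end = 0.1 * rng.standard_normal(4000)   # stand-in for a person talking
mic = near_end.copy()
mic[100:] += 0.6 * playback[:3900]           # the echo path
cleaned = cancel_echo(mic, playback)
print(np.std(mic), np.std(cleaned))          # cleaned should be much quieter
```

With a single tap this only models one direct path; room reverberation needs many taps, which is where adaptive filters come in.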

Someone
The simple solution: switch off the code that listens for the “Alexa” prompt when saying “Alexa” yourself.

Slightly harder: keep it running, but discard hits that are timed close to the time you say “Alexa” yourself.
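
That timing-based discard amounts to a simple gate. A hedged sketch with invented names (the real firmware is certainly more involved):

```python
import time

class WakeWordGate:
    """Discard wake-word hits that land too close to a moment when
    the device itself said the wake word (hypothetical sketch)."""
    def __init__(self, blackout_s=1.5):
        self.blackout_s = blackout_s
        self.last_self_utterance = float("-inf")

    def device_said_wake_word(self, now=None):
        # Called by the TTS pipeline when the device says "Alexa".
        self.last_self_utterance = now if now is not None else time.monotonic()

    def accept_hit(self, now=None):
        # Called by the detector; True means the hit is trusted.
        now = now if now is not None else time.monotonic()
        return (now - self.last_self_utterance) > self.blackout_s

gate = WakeWordGate()
gate.device_said_wake_word(now=10.0)
print(gate.accept_hit(now=10.5))  # False: inside the blackout window
print(gate.accept_hit(now=12.0))  # True: safely after it
```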

Even harder: have a second detector that is trained on the device saying “Alexa”, and discard hits that coincide with that detector firing. That second detector can be simplified by superimposing, whenever the device says “Alexa”, a waveform on top of the audio that humans will (barely) notice but that a computer can easily detect.
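
Such an "easily detected waveform" could be, for example, a quiet pilot tone near the top of the audible band; a Goertzel filter makes checking for a single frequency cheap. A sketch under that assumption (the tone frequency, amplitude, and threshold are invented):

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of one frequency bin, computed via the Goertzel algorithm."""
    k = round(len(samples) * freq / sample_rate)  # nearest DFT bin
    w = 2.0 * math.pi * k / len(samples)
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

sample_rate, tone_hz = 16000, 7600  # hypothetical pilot tone
n = 512
t = [i / sample_rate for i in range(n)]
marked = [0.05 * math.sin(2 * math.pi * tone_hz * ti) for ti in t]
silent = [0.0] * n
print(goertzel_power(marked, sample_rate, tone_hz) >
      goertzel_power(silent, sample_rate, tone_hz))  # True
```

In practice the tone would ride on top of the spoken "Alexa" audio, and the detector would compare the bin's power against neighboring bins rather than a fixed threshold.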

Still harder: obtain the transfer function and latency between the speaker(s) and the microphone(s) and, using that, compute what signal you expect to hear at the microphone from the speaker’s output; subtract that from the actual signal detected to get a signal that doesn’t include the device’s own utterances.

That function could be obtained from one device in the factory or trained on-device.
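
Training it on-device is what a classic adaptive filter does: a normalized LMS (NLMS) loop converges on an FIR estimate of the speaker-to-mic path while playback runs. A minimal sketch, not any vendor's actual code:

```python
import numpy as np

def nlms_cancel(mic, playback, taps=64, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt an FIR estimate of the speaker-to-mic
    path and subtract the predicted echo from the mic signal."""
    w = np.zeros(taps)            # estimated echo-path impulse response
    out = np.zeros_like(mic)
    x_buf = np.zeros(taps)        # most recent playback samples
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = playback[n]
        echo_est = w @ x_buf
        e = mic[n] - echo_est     # error = near-end signal + residual echo
        out[n] = e
        w += mu * e * x_buf / (x_buf @ x_buf + eps)
    return out, w

rng = np.random.default_rng(1)
playback = rng.standard_normal(20000)
true_path = np.array([0.0, 0.5, 0.3, -0.1])      # toy room response
echo = np.convolve(playback, true_path)[:20000]
mic = echo + 0.05 * rng.standard_normal(20000)   # echo + faint near-end signal
cleaned, w = nlms_cancel(mic, playback)
print(np.std(mic[-2000:]), np.std(cleaned[-2000:]))  # residual shrinks once converged
```

The same loop keeps tracking as the room changes (someone walks by, a door opens), which is one reason on-device adaptation beats a fixed factory calibration.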

I suspect the first already is close to good enough for basic devices. If you want a device that can listen while also playing music at volume, the last option can be helpful.

ITB
More interesting is when we used to run Alexa commercials and they would cause a denial-of-service attack on ourselves, with all the devices being triggered at once, particularly during the Super Bowl. In that case, we added some imperceptible noise to the audio stream so that Alexa wouldn’t trigger.
felixgallo
Not speaking for Alexa or Amazon, where I have worked in the past, but the wakeword detection is done separately in a much lower-power, localized model using hardware features. On the downside, this means that you can only select from a few wakewords and the filtering etc. are limited (e.g. the local model is not aware of commercials being played concurrently in the world, so it has to wake up the bigger model, which can check to see if that's happening). On the positive side, it's much lower power and mitigates concerns that Alexa is listening to normal activity/conversations.
solardev
What happens if you record her saying her own name and then play it back separately? Does she respond to that if she's not actively talking at the moment?

-----

Not directly the same case but similar, Amazon trains Alexa to avoid certain mentions of her in commercials using acoustic fingerprinting techniques: https://www.amazon.science/blog/why-alexa-wont-wake-up-when-...
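
The linked post describes fingerprinting known commercial audio. A crude sketch of the general idea (the real system is far more robust to noise and time offsets): hash the strongest spectral peaks per frame and check incoming audio against the stored fingerprints.

```python
import numpy as np

def fingerprint(audio, frame=1024, peaks=3):
    """Crude acoustic fingerprint: the indices of the strongest
    FFT bins in each frame, as a tuple of sorted index-tuples."""
    fp = []
    for i in range(0, len(audio) - frame, frame):
        spectrum = np.abs(np.fft.rfft(audio[i:i + frame]))
        top = tuple(sorted(np.argsort(spectrum)[-peaks:]))
        fp.append(top)
    return tuple(fp)

def matches_known_ad(audio, known_fps, min_overlap=0.8):
    """True if enough frames match a stored commercial fingerprint."""
    fp = fingerprint(audio)
    for known in known_fps:
        n = min(len(fp), len(known))
        hits = sum(a == b for a, b in zip(fp, known))
        if n and hits / n >= min_overlap:
            return True
    return False

rng = np.random.default_rng(2)
ad = rng.standard_normal(16000)               # stand-in for a known commercial
known = [fingerprint(ad)]
print(matches_known_ad(ad, known))                          # True: same clip
print(matches_known_ad(rng.standard_normal(16000), known))  # False: unrelated audio
```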

dtagames
It can be done in a single line of code, like this JS example: "if (wakeWordHeard && !sayingAlexa) doWakeWordCommand();"
ma2rten
This is the same problem as echo cancellation on calls, something that's built into a lot of software and hardware.
icecube123
I've always thought they might have a method to ignore the wake word if there's a specific frequency sent at the same time. I've noticed that there are sometimes TV commercials that contain the "Alexa" or "hey Google" wake words, but they do not activate the smart speakers. Yet if the smart speakers hear something merely close to a wake word on a random TV show, they will activate.

But as others have said, they might be able to just sleep the wake algorithm temporarily when they know it’s playing back its own wake word.

nickburns
spitballing: 'her' own hardcoded waveform is a hardcoded wake word exception. i have doubts that it'd be much more complex than that.

what i do find interesting, however, is that, at times, she'll wake to an utterance from some other media i have playing and seems to 'know' immediately that she was inadvertently awoken: the 'listening' and 'end listening' tones sound in quick succession. i do not have voice recognition enabled (to the extent that that setting is respected).

caprock
There's usually a very small, purpose built model for hearing the initial starting phrase (Alexa, hey Google, etc). It's called a wake word model or wake word detection. Because it's a separate component, it's then fairly straightforward to disable it during an active conversation.
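
Because the detector is a separate component, the surrounding logic reduces to arming and disarming it. A hypothetical sketch (names and states invented for illustration):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # wake word detector armed
    CONVERSING = auto()  # device is in a dialog; detector disabled

class WakeWordController:
    """Sketch of gating a standalone wake word detector: the rest of
    the stack simply disables it during an active conversation."""
    def __init__(self, detector):
        self.detector = detector   # the small, purpose-built model
        self.state = State.IDLE

    def on_audio(self, frame):
        if self.state is State.IDLE and self.detector(frame):
            self.state = State.CONVERSING
            return "wake"
        return None                # detector output ignored mid-conversation

    def conversation_ended(self):
        self.state = State.IDLE

detector = lambda frame: "alexa" in frame   # stand-in for the real model
ctl = WakeWordController(detector)
print(ctl.on_audio("hey alexa"))    # 'wake'
print(ctl.on_audio("alexa again"))  # None: disabled during the conversation
ctl.conversation_ended()
print(ctl.on_audio("alexa"))        # 'wake'
```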
makerdiety
Easy. Because it's not artificial intelligence (modern advances are only a small subset of AI). It's just an expert system with recursion programmed in.

Real AI doesn't need recursion that is explicitly instructed into its behavior. Because real artificial general intelligence has better things to do than to listen to human advisors and programmers who don't know about effective objective function optimization. Therefore, Alexa gets a rudimentary infinite recursion loop break statement explicitly installed into her by her human shepherds.

Edit: Recursion should be seen as a general, mathematical form of engineering constructs like acoustic echo cancellation and adaptive filtering. Recursion should be what those engineering tools get reduced to being.

smitelli
Here's something I just thought to try (although I can't do it myself; don't/won't own a smart speaker) -- If a person were to play something like "The Downeaster Alexa"[1] in the room, would that wake it up or does the fact that it's sung-not-spoken with music behind it prevent activation?

[1] https://www.youtube.com/watch?v=LESFuoW-T7I

numpad0
Microphone noise canceling? It's known what waveform is going out, so it should be trivial to subtract playback audio from recorded audio.
chuckadams
"Alexa, what is love?" https://www.youtube.com/watch?v=ESt7CTZiXqY&t=123s

Anyone actually pull this off?

goutham2688
I assume they're using some form of https://en.m.wikipedia.org/wiki/Speaker_diarisation
next_xibalba
Couldn’t it just shut off input from the mic while speaking?
ww520
Pre-generate a waveform that's the exact opposite of the utterance. Add it to the incoming waveform.
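
Pure phase inversion only cancels if the playback reaches the mic unchanged; any delay or room coloring leaves a residual, which is why real systems estimate the echo path instead. A toy illustration of both points:

```python
import numpy as np

rng = np.random.default_rng(3)
playback = rng.standard_normal(1000)

# Ideal case: the mic hears exactly what was played.
mic_ideal = playback.copy()
residual_ideal = mic_ideal + (-playback)  # perfect cancellation
print(np.max(np.abs(residual_ideal)))     # 0.0

# Realistic case: the echo arrives delayed and attenuated.
mic_real = np.zeros(1000)
mic_real[5:] = 0.7 * playback[:995]
residual_real = mic_real + (-playback)    # inversion no longer lines up
print(np.std(residual_real))              # a large residual remains
```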