Humans are good at focusing on one person’s voice even when many people are talking in a crowd, and Google’s AI is now as good as us! You might have noticed that the AI in smart speakers or phone assistants often can’t pick out your voice when another person is talking at the same time, and you end up repeating the command…
Google’s Blog Post on AI
But Google has trained its AI to separate voices that talk at the same time, and it showed the public how this works using video recordings. In the videos Google posted to explain the new ability, the AI relies on both the sound and the visual signal, in effect reading the lips of the people in the footage. The post reads:
“The visual signal not only improves the speech separation quality significantly in cases of mixed speech (compared to speech separation using audio alone, as we demonstrate in our paper), but, importantly, it also associates the separated, clean speech tracks with the visible speakers in the video.”
In one of the videos Google used to demonstrate the AI’s ability to isolate sounds, it is very impressive to see it pick out a single voice while two comedians talk loudly over each other, muting the other person’s speech.
Google stated that all the user needs to do is select the face of the person they want to hear, or the face can be selected algorithmically, based on context.
Worries About Privacy
The technology is impressive, but some members of the public are a little suspicious about privacy, because it implies using cameras for speech recognition. Google, however, was quick to explain its intentions.
The company stated that the current generation of smart speakers doesn’t use cameras to interact with users, but future devices could, in order to offer video calling from the comfort of your couch at home. Other possible uses include better voice control on devices such as phones, PCs, tablets or TVs.