We Tried New AI Lip Reading Tool—Here’s How It Fared


Audio tech startup Symphonic Labs has released an online tool showcasing how its AI performs at lip reading. We put it to the test.

The San Francisco and Canada-based startup creates what it calls tools for “multimodal speech understanding” with applications including voice calls in highly noisy environments or whispering to your voice assistant in public.

“Wanna know what people like Blake Lively, Taylor Swift, LeBron James, and more are saying when the microphones aren’t around? We just launched readtheirlips.com, allowing you to upload a video of any speaker and identify inaudible speech using our AI Lip Reading model,” the startup posted on LinkedIn.

Anyone can upload a short video clip to the site and it will return text of what it calculates is being said. The video must clearly show the face and lips of the speaker.

We tested Symphonic Labs’ lip reading AI on a 26-second Getty Images clip of U.S. VP Kamala Harris speaking at an event on Gun Violence Awareness Day at Kentland Community Center on June 7, 2024 in Landover, Maryland.

For the most part, the software was accurate, though it got some minor parts of the speech wrong, such as “to try to comfort them” instead of “to try and comfort them,” and made some moderate errors too: “will recall every day in gun violence” instead of “or what we call everyday gun violence.” Overall, as long as the face was clearly visible, it performed well.

VP Kamala Harris during the presidential debate and Taylor Swift, who has endorsed the Democratic presidential candidate. Symphonic Labs has tested its lip reading AI on Taylor Swift while Newsweek tested it on footage of… (SAUL LOEB and ANDRE DIAS NOBRE/AFP via Getty Images)

We also tested it on some silent film era clips to see how it fared with grainy old black and white footage. While we can’t confirm what was actually being said, it was interesting to see what movie stars like Gloria Swanson might have been saying.

In a 23-second newsreel clip from 1925, Swanson can be seen on a boat in New York Harbor with the Statue of Liberty in the background. The clip is silent and voiced over by a newsreader. Symphonic Labs’ software guesses the actress is turning to her husband and saying something approximating “I’ve been doing this for a long time, I’ve been doing it for so long,” as she waves to the camera.

Readtheirlips.com is a showcase of what Symphonic Labs is really working on. Its macOS application, MAMO, integrates this technology with personal computers, allowing users to issue voice commands “without making a sound,” Chris Samra, an engineer with the startup, posted on X (formerly Twitter).

Speaking to Newsweek, Samra said that the reason he and his co-founder created the startup was “to build an interface that felt telepathic, without the need for an implant or bulky hardware.”

“In terms of novelty, our AI model serves two purposes. On the one hand it could let anyone communicate 3x faster than typing without making a sound, and on the other hand it has the ability to analyze speech at long distances or in loud environments,” added Samra.

He explained that readtheirlips.com is more of a tech demo and “not our main goal in the long-term,” although “it’s amazing to see people try to decode inaudible videos from the past that otherwise couldn’t have been decoded without our model.”

“I really think the big opportunities are in enabling the mass consumer to use conversational interfaces with much less friction, and accessibility for people with Dysphonia, RSI, and those who are hard of hearing,” said Samra.

A new update to the software allows for the addition of personal context and vocabulary, meaning users can train it to work better with their own speech and interactions.

“You can dictate in public and noisy environments and it will transcribe for you by reading your lips. No vocalization, additional hardware, or wearable mic required,” added Samra.

This might prove useful for many. A PwC survey on how U.S. consumers interact with voice assistants found that most people feel uncomfortable using them in public.

“Despite being accessible everywhere, three out of every four consumers (74 percent) are using their mobile voice assistants at home. The majority of focus group participants were quick to say that they prefer privacy when speaking to their voice assistant and that using it in public ‘just looks weird’,” said the report.
