Voice AI Systems Are Vulnerable to Hidden Audio Attacks

62 points • by SVI • today at 11:51 AM • 17 comments

Comments

Isn't it the "adversarial image" attack, well known from (earlier) visual recognition models [1]? That would be quite an obvious vector.

[1]: https://www.science.org/content/article/turtle-or-rifle-hack...
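
For anyone unfamiliar with the attack mentioned above, here is a minimal sketch of the core idea (a fast-gradient-sign step), assuming a toy linear "command detector" rather than any real speech model. The names `score`, `fgsm_perturb`, the random weights, and the 16 kHz tone are all hypothetical and for illustration only. Real attacks on speech recognizers backpropagate through the full network, but the principle is the same: nudge every sample by a tiny amount in the direction that most increases the model's output.

```python
import math
import random


def score(weights, signal):
    """Toy linear 'command detector' (hypothetical): higher means 'command heard'."""
    return sum(w * s for w, s in zip(weights, signal))


def fgsm_perturb(weights, signal, epsilon):
    """One fast-gradient-sign step against the toy linear model.

    For score = w . x the gradient with respect to x is simply w, so moving
    each sample by +epsilon * sign(w_i) gives the largest possible score
    increase when no sample may change by more than epsilon.
    """
    def sign(v):
        return (v > 0) - (v < 0)

    return [s + epsilon * sign(w) for w, s in zip(weights, signal)]


random.seed(0)
sample_rate = 16000  # hypothetical: one second of audio at 16 kHz
weights = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

# Benign "audio": a quiet 440 Hz tone with amplitude 0.1.
signal = [0.1 * math.sin(2 * math.pi * 440 * t / sample_rate)
          for t in range(sample_rate)]

epsilon = 0.001  # per-sample change of 1% of the tone's amplitude
adv = fgsm_perturb(weights, signal, epsilon)

print(f"clean score:       {score(weights, signal):+.2f}")
print(f"adversarial score: {score(weights, adv):+.2f}")
print(f"max sample change: {max(abs(a - s) for a, s in zip(adv, signal)):.4f}")
```

The score shifts by exactly epsilon times the sum of the absolute weights, so even an inaudibly small per-sample change adds up across thousands of samples. That accumulation over many input dimensions is why these attacks carry over from images to audio.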

JoblessWonder • today at 4:29 PM

This isn't fully "hidden", but I've always wondered whether AI scraping is the reason short-form videos on YouTube/TikTok/Instagram featuring film/TV clips sometimes have two audio tracks: the clip's actual audio, a little louder, and a computer-generated narrator giving running commentary on what is happening and why. As a human I'm able to tune the narration out, but it is very weird and jarring.

In case anyone hasn't had the displeasure of viewing these, I'll link some in a comment below once I scroll through my feed and find one.

moffkalast • today at 4:26 PM

Phreaking is back on the menu, boys.

naveenraj-17 • today at 3:24 PM

I believe that will depend entirely on how the AI models store voices in their neural networks. If we could debug that, we would be able to send a secret sound that an AI model might understand because of its internal connections, but that makes no sense to us. Until then, in my view, there's no harm.

leonulicnik • today at 3:43 PM

Does this transfer to Whisper / CLAP-type audio models, or is it ASR-decoder specific? Whisper would be interesting given how widely it's used in prod.

wutwutwat • today at 4:15 PM

Related: Benn Jordon shows how to poison-pill music against AI harvesting it for training

The Art Of Poison-Pilling Music Files

https://www.youtube.com/watch?v=xMYm2d9bmEA