Hacker News

Voice AI Systems Are Vulnerable to Hidden Audio Attacks

62 points by SVI today at 11:51 AM | 17 comments

Comments

nine_k today at 3:51 PM

Isn't this the "adversarial image" attack, well known from (earlier) visual recognition models [1]? That would be quite an obvious vector.

[1]: https://www.science.org/content/article/turtle-or-rifle-hack...

JoblessWonder today at 4:29 PM

This isn't fully "hidden", but I've always wondered if AI scraping is the reason why short-form videos on YouTube/TikTok/Instagram featuring film/TV clips will sometimes have two audio tracks: one with the actual audio from the clip, a little louder, and one with a computer-generated narrator providing running commentary on what is happening and why. As a human I'm able to tune it out, but it is very weird/jarring.

In case anyone hasn't had the displeasure of viewing these, I'll link some in a comment below once I scroll through my feed and find one.

moffkalast today at 4:26 PM

Phreaking is back on the menu, boys.

naveenraj-17 today at 3:24 PM

I believe that will depend purely on how the AI models store voices in their neural networks. If we can debug that, then we would be able to send a secret sound that an AI model might understand because of its internal connections, but that makes no sense to us. Until then, there's no harm, in my view.

leonulicnik today at 3:43 PM

Does this transfer to Whisper / CLAP-type audio models, or is it ASR-decoder specific? Whisper would be interesting given how widely it's used in prod.

wutwutwat today at 4:15 PM

Related: Benn Jordan shows how to poison-pill music against AI harvesting it for training

The Art Of Poison-Pilling Music Files

https://www.youtube.com/watch?v=xMYm2d9bmEA