Hacker News

dijksterhuis today at 3:57 PM

In general, if you zoom all the way out, yes, the high-level optimization problem is very similar: find some `delta` such that `target_y = model_inference(delta + x)`, where `target_y != real_y` and `size_of(delta) < threshold`.
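A minimal sketch of that optimization problem, assuming a hypothetical toy linear classifier in place of a real model, the L-infinity norm (largest absolute per-sample change) standing in for `size_of`, and a non-strict `<=` bound enforced by projection. It uses projected gradient descent (PGD) with signed gradient steps, a common generic technique and not the attack from the PhD mentioned below. `W`, `b`, `targeted_pgd`, `step` and `max_iters` are all illustrative names.

```python
import math

# Hypothetical toy model: a 2-class linear classifier, logits = W @ x + b.
# A real attack would target a trained network and use autodiff for gradients.
W = [[1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 0.0]


def logits(x):
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def model_inference(x):
    """Predicted class = index of the largest logit (ties go to the lower index)."""
    z = logits(x)
    return max(range(len(z)), key=lambda i: z[i])


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def input_gradient(x, target_y):
    """Gradient w.r.t. x of cross-entropy(softmax(W x + b), target_y).

    For a linear model this is W^T (softmax(z) - onehot(target_y)).
    """
    p = softmax(logits(x))
    err = [pi - (1.0 if i == target_y else 0.0) for i, pi in enumerate(p)]
    return [sum(W[i][j] * err[i] for i in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def targeted_pgd(x, target_y, threshold, step=0.05, max_iters=100):
    """Search for delta with model_inference(x + delta) == target_y and
    max_j |delta_j| <= threshold (L-infinity norm standing in for size_of)."""
    delta = [0.0] * len(x)
    for _ in range(max_iters):
        x_adv = [xi + di for xi, di in zip(x, delta)]
        if model_inference(x_adv) == target_y:
            return delta
        g = input_gradient(x_adv, target_y)
        # Signed gradient step that lowers the targeted loss, then project
        # delta back into the box [-threshold, threshold].
        delta = [min(max(di - step * sign(gi), -threshold), threshold)
                 for di, gi in zip(delta, g)]
    return None  # budget too small (or too few iterations) to reach target_y


x = [1.0, 0.8]
real_y = model_inference(x)  # 0 for this toy setup
delta = targeted_pgd(x, target_y=1, threshold=0.2)
x_adv = [xi + di for xi, di in zip(x, delta)]
print(real_y, model_inference(x_adv), max(abs(d) for d in delta))
```

The projection step is what keeps the `size_of(delta)` constraint satisfied at every iteration; with a budget that is too small (e.g. `threshold=0.05` here) the search gives up and returns `None`.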

But (1) older audio models typically used different architectures, like RNNs (recurrent neural networks), which came with additional challenges compared to the CNNs (convolutional neural networks) that image models used, e.g. the exploding gradients problem. During RNN training, vanishing gradients are a potential problem; during advex (adversarial example) optimization, the problem gets inverted and you have to do different things to solve it.
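To make the gradient issue concrete, here is a minimal sketch using a hypothetical scalar linear RNN (not a real audio model). Backpropagating through time, the gradient of the output with respect to an input `T - t` steps back is `w ** (T - t)`, so it explodes when `|w| > 1` and vanishes when `|w| < 1`. `clip_by_norm` shows gradient-norm clipping, one generic and widely used mitigation for exploding gradients; it is an illustration, not necessarily what the attack described here used.

```python
import math


def bptt_input_grads(w, T):
    """Hypothetical scalar linear RNN: h_t = w * h_{t-1} + x_t, output y = h_T.

    Backpropagation through time gives dy/dx_t = w ** (T - t): the gradient
    for an early input is a product of T - t copies of the recurrent weight.
    Returns [dy/dx_1, ..., dy/dx_T].
    """
    grads = [0.0] * T
    g = 1.0  # dy/dh_T
    for t in range(T, 0, -1):
        grads[t - 1] = g  # dh_t/dx_t = 1, so dy/dx_t = dy/dh_t
        g *= w            # dy/dh_{t-1} = dy/dh_t * w
    return grads


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction unchanged)."""
    norm = math.sqrt(sum(gi * gi for gi in grad))
    if norm > max_norm:
        return [gi * max_norm / norm for gi in grad]
    return grad


exploding = bptt_input_grads(1.5, 50)  # |w| > 1: early-timestep gradients blow up
vanishing = bptt_input_grads(0.5, 50)  # |w| < 1: early-timestep gradients shrink to ~0
clipped = clip_by_norm(exploding, 1.0)
print(exploding[0], vanishing[0], math.sqrt(sum(g * g for g in clipped)))
```

Real RNNs multiply Jacobian matrices rather than a single scalar, but the exponential dependence on sequence length is the same effect, and audio inputs tend to be long sequences.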

Also (2), the human side of imperceptibility is very different with audio: ears vs. eyes.
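One simple way to turn `size_of(delta)` into something audio-specific is to measure how loud the perturbation is relative to the original clip, in decibels. A minimal sketch, assuming audio is a list of float samples in [-1, 1] and using a peak-based measure along the lines of the one in Carlini and Wagner's 2018 targeted attack on speech-to-text. It is only a proxy: it ignores psychoacoustic effects such as masking that shape what ears actually notice. `db` and `relative_db` are illustrative names.

```python
import math


def db(signal):
    """Peak level in decibels: max_i 20 * log10(|x_i|)."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        return float("-inf")  # silence: log10(0) is undefined
    return 20.0 * math.log10(peak)


def relative_db(delta, x):
    """Loudness of the perturbation relative to the original audio, in dB.

    More negative means the perturbation is quieter relative to the signal,
    so a hypothetical size_of(delta) could be relative_db(delta, x).
    """
    return db(delta) - db(x)


x = [0.5, -1.0, 0.25]          # toy waveform samples in [-1, 1]
delta = [0.001, -0.01, 0.005]  # perturbation peaking 100x below x's peak
print(relative_db(delta, x))   # roughly -40 dB
```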

So, they're the same, but different.

Source: this is what my (unfinished) PhD was on. I should really write up the attack I crafted; it never got published :(