Hacker News

maxloh today at 12:22 PM

It could be an attack surface. Maybe one day, when we find a chatbot online, we could ask it to guess a random number over and over, then accurately infer the underlying model from the resulting distribution.
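A minimal sketch of that idea, assuming we already hold reference samples of guesses from known models. The model names, the sample data, and the helper names (`histogram`, `kl_divergence`, `closest_model`) are all hypothetical, and KL divergence is just one reasonable choice of distance:

```python
from collections import Counter
import math

def histogram(samples, lo=1, hi=100, smoothing=1.0):
    """Turn integer guesses into a smoothed probability distribution over [lo, hi]."""
    counts = Counter(samples)
    total = len(samples) + smoothing * (hi - lo + 1)
    return [(counts.get(n, 0) + smoothing) / total for n in range(lo, hi + 1)]

def kl_divergence(p, q):
    """KL(p || q); smoothing above guarantees no zero probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def closest_model(observed, references):
    """Return the reference name whose guess distribution is nearest to the observed one."""
    p = histogram(observed)
    return min(references, key=lambda name: kl_divergence(p, histogram(references[name])))

# Hypothetical reference samples, invented purely for illustration.
references = {
    "model_a": [47] * 60 + [37] * 25 + [73] * 15,
    "model_b": [42] * 70 + [7] * 30,
}
observed = [47] * 11 + [37] * 6 + [73] * 3  # guesses collected from the unknown chatbot
print(closest_model(observed, references))
```

In practice the number of guesses needed would depend on how different the candidate distributions really are, and on sampling settings such as temperature.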


Replies

dijksterhuis today at 1:19 PM

i did something in my phd developing an attack against mozilla deepspeech.

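deepspeech used the CTC algorithm [0], which adds a “blank” token to the output alphabet. the model predicts one token per audio/speech feature frame; decoding merges consecutive repeats of the same token and then drops the blanks, so a blank between two identical characters is what preserves a genuine double letter.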

so "h==e=l===l===o====" maps to "hello" (with "=" standing in for the blank token)
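a rough sketch of that collapse step, using "=" as the blank like the example above. `ctc_collapse` is a made-up name for illustration, not deepspeech's actual decoder:

```python
from itertools import groupby

def ctc_collapse(frames, blank="="):
    """Greedy CTC-style collapse: merge runs of identical symbols, then drop blanks."""
    merged = (symbol for symbol, _run in groupby(frames))
    return "".join(symbol for symbol in merged if symbol != blank)

print(ctc_collapse("h==e=l===l===o===="))  # hello
print(ctc_collapse("hh=ee=ll=l=oo"))       # hello
print(ctc_collapse("hhelllo"))             # helo (no blank between the two l's)
```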

the model becomes super biased towards predicting that blank token. one speech feature is like 0.1 second of audio or less (can’t remember off hand). so there are a lot of alphabet character token repeats. off hand i seem to remember the predicted token distribution over like 1000 audio files was 50% blank token and then 50% distributed across the rest of the alphabet.

as a result, you can get significantly smaller perturbations when generating adversarial examples. by like a factor of 2-4 or something. all you need to do is prioritise blank tokens in your target output.
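one possible reading of "prioritise blank tokens", sketched below: when choosing the frame-level target for the adversarial example, give each character a single frame and fill everything else with blanks. this only builds the target string; the perturbation optimisation itself is not shown, and `blank_heavy_alignment` is a hypothetical helper, not the code from the thesis:

```python
def blank_heavy_alignment(text, n_frames, blank="="):
    """Spread the characters of `text` over n_frames, one frame each, blanks elsewhere."""
    if not text:
        return blank * n_frames
    if n_frames < 2 * len(text):
        # A step of at least 2 frames guarantees a blank between neighbouring characters,
        # which repeated letters need in order to survive the collapse.
        raise ValueError("need at least 2 frames per character")
    frames = [blank] * n_frames
    step = n_frames / len(text)
    for i, ch in enumerate(text):
        frames[int(i * step)] = ch
    return "".join(frames)

def collapse(frames, blank="="):
    """Merge consecutive repeats, then drop blanks (same rule as CTC decoding)."""
    out, prev = [], None
    for symbol in frames:
        if symbol != prev and symbol != blank:
            out.append(symbol)
        prev = symbol
    return "".join(out)

target = blank_heavy_alignment("hello", 18)
print(target, "->", collapse(target))
```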

i spent 2 years trying to find a super clever attack. turns out all i needed to do was make one simple graph counting characters. xD
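the "simple graph" amounts to something like the sketch below: count every frame-level token across a batch of predictions and look at the shares. the two strings are made-up stand-ins for real per-frame model outputs, and `token_shares` is a hypothetical helper:

```python
from collections import Counter

def token_shares(frame_outputs):
    """Fraction of frames assigned to each token, most frequent first."""
    counts = Counter()
    for frames in frame_outputs:
        counts.update(frames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: n / total for token, n in counts.most_common()}

# Made-up frame-level outputs ("=" is the blank token), for illustration only.
outputs = ["h==e=l===l===o====", "=h=i===="]
shares = token_shares(outputs)
for token, share in shares.items():
    # Crude text bar chart: one '#' per 2.5% of frames.
    print(f"{token!r}: {'#' * round(share * 40)} {share:.0%}")
```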

[0]: https://en.wikipedia.org/wiki/Connectionist_temporal_classif...

alistairSH today at 12:59 PM

Proto-Voight-Kampff Test?

vidarh today at 12:31 PM

At least some Claude models have a thing for numbers that contain "47"...

smokel today at 12:39 PM

In order to find out how real humans reply:

Please guess a number between 1 and 100.
