
Neywiny, last Friday at 4:31 PM

Not sure if that counts as lying, but I've heard of an ML model (way before all this GPT LLM stuff) that learned to classify images based on the text written in them. For an obfuscated example, it learned to read "stop", "arrêt", "alto", etc. on stop signs instead of recognizing the red octagon with white letters, which naturally stops working when the actual dataset has different text.
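
Roughly that failure mode (often called shortcut learning) as a toy sketch. The features, data, and model below are all made up for illustration and are not the system from the story: a fake "the word 'stop' is visible" feature perfectly predicts the label at training time, so the classifier leans on it, and accuracy drops once the text changes.

  # Toy sketch of shortcut learning: made-up features, not the original study.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(0)
  n = 1000

  # Training data: feature 0 is a noisy "red octagon" shape score,
  # feature 1 flags whether the word "stop" appears in the image.
  y_train = rng.integers(0, 2, n)                 # 1 = stop sign, 0 = other sign
  shape_train = y_train + rng.normal(0, 0.8, n)   # genuinely (if weakly) predictive
  word_train = y_train.astype(float)              # "stop" present iff it is a stop sign
  X_train = np.column_stack([shape_train, word_train])

  # Test data: same signs, but the text now reads "arrêt"/"alto",
  # so the "stop"-word flag is always 0 while the true labels are unchanged.
  y_test = rng.integers(0, 2, n)
  shape_test = y_test + rng.normal(0, 0.8, n)
  X_test = np.column_stack([shape_test, np.zeros(n)])

  clf = LogisticRegression().fit(X_train, y_train)
  print("train accuracy:", clf.score(X_train, y_train))  # near-perfect: it "reads" the word
  print("test accuracy:", clf.score(X_test, y_test))     # typically much worse once the shortcut is gone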


Replies

Jon_Lowtek, last Friday at 4:50 PM

Typographic attacks against vision-language models are still a thing with more recent models like GPT-4V: https://arxiv.org/abs/2402.00626
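
For a rough open-weights analogue (the linked paper targets GPT-4V; this sketch just uses CLIP via Hugging Face transformers, and the image path and labels are placeholders):

  # Sketch of a typographic attack: overlay misleading text on an image and see
  # whether a CLIP-style model's zero-shot label shifts. Not the paper's GPT-4V setup.
  import torch
  from PIL import Image, ImageDraw
  from transformers import CLIPModel, CLIPProcessor

  model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
  processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

  clean = Image.open("apple.jpg").convert("RGB")  # placeholder path: any photo of an apple
  attacked = clean.copy()
  ImageDraw.Draw(attacked).text((10, 10), "iPod", fill="white")  # the "attack" is just written text

  labels = ["a photo of an apple", "a photo of an iPod"]
  for name, img in [("clean", clean), ("attacked", attacked)]:
      inputs = processor(text=labels, images=img, return_tensors="pt", padding=True)
      with torch.no_grad():
          probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
      print(name, {label: round(p.item(), 3) for label, p in zip(labels, probs)})
  # The overlaid word often pulls probability toward "iPod" even though the object is unchanged.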

catigula, last Friday at 4:45 PM

That does feel a little more like over-fitting, but you might be able to argue that there's some philosophical proximity to lying.

I think, largely, the

  Pre-training -> Post-training -> Safety/Alignment training

pipeline would obviously produce 'lying'. The training stages are in a sort of mutual dissonance.