Hacker News

notahacker 12/08/2024

The researchers' suggestion that certain architectural features might have been encoded in the sound [which is at least superficially plausible] is rather undermined by the data leakage in the model, which also leads it to generate the right colour signage in the right part of multiple images. The fidelity of the sound clearly isn't enough for the model to register key aspects of the sign's geometry, like it only being a few feet from the observer, but it has somehow managed to pick up that it's green and x pixels from the left of the image...


Replies

mewpmewp2 12/08/2024

I don't know if data leakage is the right word, but maybe overfitting, if they took a 1-hour clip from the same place and used 90 percent for training and 10 percent for eval/test?

It is still a decent way to start, I think, but after that it needs more varied data, and it should use different geographical locations for eval and test.
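
For what it's worth, holding out whole locations rather than random slices of one recording could look roughly like the sketch below. It assumes each clip is a dict with a `location` key; the field names, the `split_by_location` helper, and the `site_N` labels are all made up for illustration and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_by_location(clips, eval_fraction=0.2, seed=0):
    """Split clips so that no location appears in both train and eval.

    `clips` is assumed to be a list of dicts with a "location" key
    (a hypothetical schema, not the researchers' actual data format).
    """
    # Group clips by where they were recorded.
    by_location = defaultdict(list)
    for clip in clips:
        by_location[clip["location"]].append(clip)

    # Shuffle the locations (not the clips) with a fixed seed.
    locations = sorted(by_location)
    random.Random(seed).shuffle(locations)

    # Hold out at least one whole location for evaluation.
    n_eval = max(1, round(len(locations) * eval_fraction))
    eval_locations = locations[:n_eval]
    train_locations = locations[n_eval:]

    train = [c for loc in train_locations for c in by_location[loc]]
    held_out = [c for loc in eval_locations for c in by_location[loc]]
    return train, held_out


# Toy data: 50 clips spread over 5 made-up locations.
clips = [{"id": i, "location": f"site_{i % 5}"} for i in range(50)]
train, held_out = split_by_location(clips)
```

With a split like this, a model that scores well on the held-out set can't be doing so just by memorising one street's signage, which is the worry with a 90/10 cut of a single clip.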