wavemode • last Wednesday at 8:51 PM • 0 replies

Data leakage is an eval problem, not an accuracy problem.

That is, the problem is not that the AI is wrong X% of the time. The problem is that, in the presence of a data leak, there is no way of knowing what the value of X even is.

This problem is recursive: in the presence of a data leak, you also cannot know for sure how much data has leaked.
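To make this concrete, here is a minimal Python sketch under one simplifying assumption of my own (not taken from any real benchmark): a fraction `leak_frac` of the test items leaked into training and are answered correctly by memorization, while the rest are answered at the model's true accuracy X. The function names are hypothetical. The eval reports a single number, but that number is consistent with many (leak fraction, X) pairs, so X can only be recovered if the leak fraction is known, and that is exactly the quantity a leak prevents you from knowing.

```python
# Hypothetical mixture model: a fraction `leak_frac` of test items leaked into
# training and are answered with accuracy `leaked_acc` (1.0 = pure memorization);
# the remaining items are answered at the model's true accuracy `true_acc`.


def measured_accuracy(true_acc: float, leak_frac: float, leaked_acc: float = 1.0) -> float:
    """Accuracy an eval would report on a partially leaked test set."""
    return leak_frac * leaked_acc + (1.0 - leak_frac) * true_acc


def true_accuracy_given(measured: float, leak_frac: float, leaked_acc: float = 1.0) -> float:
    """Invert the mixture. Only possible if leak_frac is known (and < 1)."""
    return (measured - leak_frac * leaked_acc) / (1.0 - leak_frac)


if __name__ == "__main__":
    observed = 0.80  # the single number the eval gives us
    # Each assumed leak fraction implies a different true accuracy,
    # and the observed score alone cannot tell them apart.
    for leak_frac in (0.0, 0.25, 0.5, 0.6):
        x = true_accuracy_given(observed, leak_frac)
        print(f"leak_frac={leak_frac:.2f} -> true accuracy={x:.3f}")
```

With an observed score of 0.80, a 0% leak implies X = 0.80, while a 50% leak implies X = 0.60. Nothing in the score itself distinguishes the two cases.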
