Maybe I don’t understand this data labeling issue - are you talking about imbalanced classification ...

p1esk • yesterday at 10:14 PM • 1 reply

Maybe I don’t understand this data labeling issue. Are you talking about an imbalanced classification dataset? Are the hard classes under-represented, or are their labels missing completely?

Replies

srean • yesterday at 10:37 PM

None of those (but they could be added to the mix to complicate matters).

Consider the case where the labeler creates the labelled training set by cherry-picking the examples that are easy to label. He labels many, but selects which items to label according to his preference.

First question: is this even a problem? Yes, most likely. But why? How do you fix it? When are such fixes even possible?
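A toy sketch of one way the "why" can show up, under assumptions that are purely illustrative: a single feature x, a true label that is a noisy function of the sign of x, and a hypothetical labeler who only labels items far from the decision boundary (the `easy_margin` threshold). None of these names or numbers come from the thread. The sketch compares the error rate of a fixed rule measured on the cherry-picked labelled set against its error rate on the whole population.

```python
import random


def simulate(n=20000, noise=0.3, easy_margin=0.5, seed=0):
    """Compare error measured on a cherry-picked labelled set vs. the full population.

    All parameters are illustrative assumptions, not values from the discussion.
    """
    rng = random.Random(seed)

    # Population: x is uniform on [-1, 1]; the true label is a noisy sign of x.
    population = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x + rng.gauss(0.0, noise) > 0 else 0
        population.append((x, y))

    # Hypothetical labeler: only labels the "easy" items, far from the boundary at 0.
    labelled = [(x, y) for x, y in population if abs(x) > easy_margin]

    def error_rate(items):
        # Fixed rule under evaluation: predict 1 when x > 0, else 0.
        wrong = sum(1 for x, y in items if (1 if x > 0 else 0) != y)
        return wrong / len(items)

    return error_rate(labelled), error_rate(population)


cherry_picked_error, true_error = simulate()
print(f"error on cherry-picked labels: {cherry_picked_error:.3f}")
print(f"error on full population:      {true_error:.3f}")
```

In this setup the labelled set contains none of the ambiguous items near the boundary, so any metric computed on it looks better than it would on the data the model will actually see. Whether a real labeling process behaves like this depends on how "easy" correlates with the features and the labels.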
