Maybe I don’t understand this data labeling issue. Are you talking about an imbalanced classification dataset? Are the hard classes under-represented, or are their labels missing completely?
None of those (but they could be added to the mix to complicate matters).
Consider the case where the labeler creates the labeled training set by cherry-picking the examples that are easy to label. He labels many items, but chooses which ones to label according to his own preference.
First question: is this even a problem? Yes, most likely. But why? How can it be fixed? And when are such fixes even possible?
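To make the "why" a bit more concrete, here is a toy simulation of one way the cherry-picking can hurt. Everything in it is an assumption made up for illustration: a single feature `x`, a label that gets noisier as `x` approaches 0, a labeler who only labels items with `|x| > easy_margin`, and a fixed rule "predict positive if `x > 0`". It is a sketch of the effect, not a model of any real labeling process.

```python
import math
import random


def simulate(n=20000, easy_margin=0.5, seed=0):
    """Compare accuracy on cherry-picked labels vs. the full pool.

    All modelling choices here are hypothetical:
    - x is uniform on [-1, 1]
    - P(y = 1 | x) is a logistic curve, so labels near x = 0 are noisy ("hard")
    - the labeler only labels items with |x| > easy_margin ("easy")
    """
    rng = random.Random(seed)

    pool = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        p_pos = 1.0 / (1.0 + math.exp(-6.0 * x))
        y = 1 if rng.random() < p_pos else 0
        pool.append((x, y))

    # The labeler's selection: keep only the items far from the boundary.
    labeled = [(x, y) for x, y in pool if abs(x) > easy_margin]

    def accuracy(data):
        # Fixed rule: predict the positive class when x > 0.
        return sum((x > 0) == (y == 1) for x, y in data) / len(data)

    return accuracy(labeled), accuracy(pool), len(labeled) / n


acc_easy, acc_all, frac_labeled = simulate()
print(f"fraction of pool labeled:        {frac_labeled:.2f}")
print(f"accuracy on cherry-picked items: {acc_easy:.3f}")
print(f"accuracy on the full pool:       {acc_all:.3f}")
```

Under these assumptions the same rule scores noticeably better on the cherry-picked items than on the full pool, so any evaluation done only on the labeled set looks better than what you would see in deployment. The labeled set is large, but it is not a random sample of the data the model will face.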