Cool post! I'm somewhat curious whether the data quality scoring has actually translated into better data; do you have numbers on how much more of your data is useful for training vs in May?
So the real-time neural quality checking was the most important thing here. Before we rewrote the backend, only 58-64% of participant hours were actually usable data; now it's 90-95%.
If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we now filter which participants we invite back based on their text quality scores.)