Cool post! I'm somewhat curious whether the data quality scoring has actually translated into better data; do you have numbers on how much more of your data is useful for training vs in May?
So the real-time neural quality checking was the most important thing here. Before we rewrote the backend, only 58-64% of participant hours were actually usable data; now it's 90-95%.
If you mean the text quality scoring system: when we added that, it improved the amount of text we got per hour of neural data by 30-35%. (That includes the fact that we now filter which participants we invite back based on their text quality scores.)