I don't buy the notion that tests do not test relevant skills.
In my long career I've noticed a strong correlation between SAT scores and academic performance as well as job performance.
> I don't think the comparison to flight school is relevant enough in this context because it's too different a world from traditional academia.
My dad kept his flight school tests for flying all sorts of airplanes. They bear a lot of similarity to the SATs. There's a lot of math in there for things like fuel consumption, wind, maximum landing weight, glide distance, and so on.
For example, one day he was cruising along in his F-86 when the engine failed. He radioed the tower, and they told him to bail out. But he calculated his speed, altitude, distance, wind, sink rate, air temperature, etc., and figured he could make the field after configuring the airplane for maximum glide. He made a perfect landing, but was still reprimanded for risking his life to bring the airplane back. He had worked the math, though, and disagreed that bringing it in was riskier than bailing out.
> I don't buy the notion that tests do not test relevant skills. In my long career I've noticed a strong correlation between SAT scores and academic performance as well as job performance.
The SAT tests intelligence (aptitude), not skills. That is why it correlates with job performance, where intelligence can, over time, matter as much as or more than a starting level of relevant skills.
Do you also think LLM leaderboards accurately reflect the capabilities of the models being tested? If you do, then I can easily point you to numerous academic papers pointing out the various flaws in many leaderboards (from poorly designed benchmarks like bAbI and the original SQuAD, to data contamination, and more).
In the same way, any test, including the SAT and GRE, has flaws. They can be gamed in ways similar to LLM leaderboards: test prep makes you better at them. That's one of the main reasons universities moved away from the SAT; they were afraid that it disenfranchised students of lower socioeconomic status (and it does to some degree). But the test is positively correlated with success in an undergraduate program, so they threw out the baby with the bathwater. The real issue is that the SAT cannot distinguish between students' capabilities as finely as it purports to.
And if you want an anecdote to match all yours, the first time I took a GRE practice test, I got a 3 on the writing. Not because I'm poor at writing, but because I didn't really know what they were looking for. After reading a test prep book, I got a 4.5 on my next practice test and a 5 on my final practice test. When I finally took the actual GRE, I got a 6 on the Analytical Writing section. Trust me, nothing changed in my writing ability over that time. In fact, I didn't even practice the skill except through those three practice tests. Clearly the test was not capable of determining my real ability to make an argument; it merely tested my ability to adapt my writing to what was supposedly being tested.
Interestingly, the vast majority of universities that got rid of the GRE requirements for PhD programs are not going back on that. Turns out that the students with the highest GRE scores are the ones most likely to drop out of their STEM PhD. [1]
[1]: https://journals.plos.org/plosone/article?id=10.1371/journal...
> I don't buy the notion that tests do not test relevant skills.
> In my long career I've noticed a strong correlation between SAT scores and academic performance as well as job performance.
A test doesn't need to test the relevant skills for that; it just needs to test _something_ that correlates with academic performance and job success.