How many of the 170k English words do you know?

473 points • by abnry • last Friday at 1:51 PM • 543 comments

Comments

78,500.

The very first one was "Unique". I wondered if "the only one of its kind" was still the correct answer, having seen "very unique" used all too often recently. They accept "only one of its kind".

Missed "hegemony" (wasn't sure a hegemony had a leader), "quotidian" (should have known that, seen it before), "ultracrepidarian" (new word to me), "absquatulate" (19th century slang), and "fartlek" (Swedish interval training).

rcfox • yesterday at 1:31 AM

Interesting that this showed up here now. I did it a week ago after hearing about it on The Rest Is Science. https://www.youtube.com/watch?v=9t-5lQ2mzuw

yorwba • last Friday at 2:45 PM

There is a typo in "Hippopotomonstrosesquippedaliophobia," it should be "Hippopotomonstrosesquipedaliophobia" instead. (Also, it breaks the layout.)

iandanforth • yesterday at 2:43 PM

Even though it said "Unbelievable. Are you actually Stephen Fry in disguise?", it still estimates I know less than half the English vocabulary. Humbling.

thimabi • last Friday at 9:12 PM

I got 68,900 words, with the vast majority of the errors being on the grandmaster level.

As a non-native English speaker, I found that result pretty good! Though being a native Portuguese speaker certainly helped me as many difficult words in English borrow from Latin, and in Portuguese the Latin influence is more pronounced.

poisonfountain • yesterday at 2:43 AM

Once you get to the Advanced/Expert words onwards it's too easy to guess the correct answer: it's usually the longest option. And once you notice this pattern it's impossible to try to guess fairly.

jrrv • last Friday at 2:56 PM

Presumably it's a random batch of words since you can run the test again. I wonder how much the word selection affects the outcome. I got 66,750 with 20/20/15/17/14.

I'm curious how the difficulty is chosen, because "obfuscate" was included in the hardest difficulty but I would not consider that to be a difficult word.

Also I found that some of the definitions were not completely correct.

dsenkus • yesterday at 4:10 AM

I'm sure everyone's scores would be a lot lower if we had to describe each word instead of selecting between silly/smart sounding definitions. As was mentioned before, it needs an "I don't know" button, otherwise it's too easy to guess.

This approach could also work for getting more accurate results:

1. Show word without any definitions

2. User clicks "I know" or "I don't know"

3. If user clicked "I know", show actual definition of word

4. User selects "I was correct" or "I was not correct"

ssaakaash • yesterday at 4:25 PM

A quirk of LLM-generated MCQs is that in the majority of cases, the longest option is the right one.

jan_Sate • yesterday at 10:58 AM

I got 35000, 18/13/9/9/6. Not my first language.

Interesting how literally everyone here's performing better than I do. Perhaps that's because I just clicked on the first option whenever I don't know about a word.

jcd000 • yesterday at 2:03 PM

90/100 and 13/20 expert & 17/20 gm. Not too bad for a non-native speaker (but I've read books in English daily for years).

sceptic123 • last Friday at 4:11 PM

Yarborough is _also_ an English town so I should have got one more

waterpowder • last Friday at 3:28 PM

69,250 (91/100) - I think being French helped a lot for the most complex words, as they're basically the same!

yousif_123123 • last Friday at 3:19 PM

This was fun! And it told me I know 55k words which made me a little happy.

I'm not sure exactly how you did this, but I think you asked an LLM to come up with the wrong options. Two things to consider:

1. While the LLM can come up with good options, they won't always be hard to guess. I wonder if instead you can have the LLM generate very close words (or skip using an LLM entirely) and put those as the options.

2. If you do generate options with an LLM, be mindful of its inability to shuffle things around. The correct answer was overwhelmingly the first or second option in the list. You should ask the model to give the options in a fixed order (say, the true meaning first, then decreasing amount of replayability), then shuffle them yourself so that each position (A, B, C or D) holds the correct answer 25% of the time.
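
A minimal sketch of the manual shuffle suggested in point 2, assuming each question stores one correct definition and three distractors. The function name `shuffle_options` and the sample definitions are illustrative and not taken from the quiz.

```python
import random
from collections import Counter


def shuffle_options(correct, distractors, rng=random):
    """Return (options, index_of_correct) with the order chosen uniformly at random."""
    options = [correct, *distractors]
    rng.shuffle(options)  # in-place Fisher-Yates shuffle from the standard library
    return options, options.index(correct)


# Sanity check: over many shuffles the correct answer should land in each
# of the four positions about 25% of the time.
rng = random.Random(0)  # seeded only to make the check repeatable
positions = Counter(
    shuffle_options(
        "only one of its kind",
        ["very common", "slightly odd", "made of one piece"],
        rng,
    )[1]
    for _ in range(10_000)
)
```

Shuffling in code after generation means the position of the right answer no longer depends on whatever ordering habits the model has.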

naishoya • last Friday at 3:21 PM

77,250 words: "Unbelievable. Are you actually Stephen Fry in disguise?"

I do concur that a refined collection of incorrect proposed responses which includes selections among terms with semantic proximity, conflated synonyms and plausible morphology could refine the accuracy of evaluations; and if the test was intended to bestow authentic assessments of lexicographical capability this would in all probability become an efficacious approach, but as a simply presentable quiz for folks with sesquipedalian proclivities I was not unduly discomfited by anything moreso than the extraneous clicks leading to and following the display of dichotomous determinations.

vhayda • last Friday at 9:23 PM

The longest answer choice is correct 80%+ of the time, when it should be closer to 25%. I was able to breeze through unfamiliar words just by picking the longest option every time…

fp64 • last Friday at 3:24 PM

When there are two options that describe exactly the opposite of each other, it will be one of them. That reduced the fun a bit, but then again, for some words I understood what they were dealing with, just not whether positively or negatively.

asdfasgasdgasdg • last Friday at 3:13 PM

Not a very good test. Too easy to guess many of the words, and the words seem to follow a theme. For example my list had five or six that had to do with speaking too much or too little (verbose, lugubrious, and a few others in that vein). And many easy words were placed late in the test (e.g. zeitgeist, facetious being in the expert and grand master categories?).

And it didn't even tell me at the end how many words I know!

There is a similar variant of such a test where you just go down a list of words of increasing obscurity, ticking the ones you are familiar with. If you do this once or twice, you can get a fairly good estimate of the actual number of words you know.

HyperL0gi • last Friday at 3:42 PM

UX suggestion to make going through this much faster:

1. Bind each option to one key (1, 2, 3, 4). The user presses 2 to select the second option.

2. Let the user change options if they want until they press Enter. Enter submits the answer.

3. Once submitted, another Enter brings the next one

jurgenaut23 • yesterday at 5:59 AM

I did it and achieved 69’400. English is a second language to me and I think this is quite overestimated, though. Mostly due to French being my first language and most of the advanced words in the tests were derived from French. Or some more academic use.

WalterBright • yesterday at 5:00 AM

What I read long ago in a book on English:

TV vocabulary is targeted at 6th grade reading level.

Conversational English is about 2,000 words.

High school vocabulary is about 10,000 words.

College degree vocabulary is about 30,000 words

English has over a million words.

Which heartens me, because it means I can be "fluent" in another language by learning just 2,000 words.

ChoGGi • last Friday at 10:49 PM

I flubbed a couple advanced/master and half of grandmaster, eh good enough.

Be fun to start at Master and up, but is kerfuffle really grandmaster?

Gaikwar and Kowtow are English words?

benob • yesterday at 8:47 AM

The longest definition and semicolons are strong signals for the right answer. Also, my run contained a lot of adjectives for which it is pretty obvious that the noun definitions do not match.

jstanley • last Friday at 2:48 PM

Cool idea, am working through.

It's annoying that you need to click 3 times per question, and the buttons are in 2 different places.

Maybe would be better to just let me click the answer I want and then instantly show me the next question?

Also who is Sandi?

alentred • last Friday at 3:36 PM

Good fun! At first I was scared of having to answer 100 questions, but when the words got more sophisticated it turned out to be more engaging. Also, the result is good for self-esteem! :) Many thanks to the author!

I wonder if the test is calibrated to the fact that some answers are just well guessed? I am not a native English speaker, but I speak 3 languages overall and have basic notions in Latin, and I have to admit it helped a lot in "deciphering" a few words that I didn't know at all. And in at least 2 cases I just guessed correctly.

dtagames • last Friday at 2:17 PM

This was fun! The progression seems logical.

I scored 71,000.

piekvorst • yesterday at 7:49 AM

English being my language of choice, but not my first language, I got 75/100. Performance breakdown: 18/20, 18/20, 11/20, 18/20, 10/20.

(My first language is Russian.)

stephbook • yesterday at 1:20 PM

Should use an Elo rating to find your level faster. Slogging through 100 basics is pointless.
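
For reference, the standard Elo update could be adapted roughly like this: treat each word as an opponent with its own difficulty rating and move the player's rating after every answer. This is only a sketch. The ratings, the K-factor of 32, and the function names are assumptions, and nothing here reflects how the quiz actually scores.

```python
def expected_score(user_rating, word_rating):
    """Standard Elo expectation: probability that the user 'beats' the word."""
    return 1.0 / (1.0 + 10 ** ((word_rating - user_rating) / 400))


def update_rating(user_rating, word_rating, correct, k=32):
    """Move the user's rating toward the observed result (1 = correct, 0 = wrong)."""
    actual = 1.0 if correct else 0.0
    return user_rating + k * (actual - expected_score(user_rating, word_rating))


# Hypothetical session: the next word would be drawn near the current rating,
# so easy words stop appearing once the rating has climbed past them.
rating = 1500.0
for word_rating, correct in [(1500, True), (1600, True), (1700, False)]:
    rating = update_rating(rating, word_rating, correct)
```

Picking each next word close to the current rating is what would let a strong player skip most of the basic tier.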

fcatalan • last Friday at 3:10 PM

71050, not bad for a non-native speaker I guess. I missed 9/100.

But to be honest many that might catch out a native speaker are just the Spanish/French/Latin word, so it was too easy in a way.

micw • yesterday at 11:48 AM

It is missing an "I don't know" button. So it has a 20% false positive by guessing bias built in, right?
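
One textbook way to adjust for this is the classic correction for guessing. The sketch below assumes four options per question (as other comments describe), so a blind guess is right 25% of the time, and it assumes unknown words are guessed uniformly at random. The function name `known_fraction` is made up for illustration.

```python
def known_fraction(observed_correct, n_options=4):
    """Estimate the share of words actually known from the share answered correctly.

    Model: observed = p + (1 - p) / n_options, where p is the known share,
    so p = (observed - chance) / (1 - chance).
    """
    chance = 1.0 / n_options
    return max(0.0, (observed_correct - chance) / (1.0 - chance))


# Example: a raw score of 91 out of 100 shrinks once lucky guesses are discounted.
adjusted = known_fraction(91 / 100)
```

If guessers can beat 25% by picking the longest option, as several comments report, the real inflation would be larger than this correction removes.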

golol • yesterday at 12:00 PM

Cute, but for strange words clicking the longest explanation turned out to be almost always the correct one :)

srean • last Friday at 3:32 PM

In addition to how much fun it was, it has potential pedagogic value for teaching sampling based estimation.

It would have paired well with an exposition of vanilla Monte Carlo and the benefits of stratified sampling.

Although stratified sampling is good, one can do better in this case by using adaptive sampling, where one uses a runtime (Bayesian) estimate of vocabulary to maximize information gain per question -- preferentially sample from those strata where the current stratum-specific estimate has higher variance.
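
A rough sketch of that idea, using a Beta posterior per stratum and a simple variance-based heuristic in place of a full information-gain computation. The stratum names, sizes, and answer counts below are invented for illustration (the sizes are chosen only so they sum to 170,000) and the uniform prior is an assumption.

```python
def beta_mean_var(correct, wrong):
    """Posterior mean and variance of the known-share under a uniform Beta(1, 1) prior."""
    a, b = 1 + correct, 1 + wrong
    n = a + b
    return a / n, (a * b) / (n * n * (n + 1))


def estimate_vocabulary(strata):
    """Stratified estimate: sum over strata of size times posterior mean share."""
    return sum(size * beta_mean_var(c, w)[0] for size, c, w in strata.values())


def next_stratum(strata):
    """Pick the stratum contributing the most variance to the total estimate."""
    def contribution(name):
        size, c, w = strata[name]
        return size ** 2 * beta_mean_var(c, w)[1]
    return max(strata, key=contribution)


# name -> (hypothetical number of words in stratum, correct so far, wrong so far)
strata = {
    "basic": (10_000, 5, 0),
    "advanced": (40_000, 3, 2),
    "grandmaster": (120_000, 1, 4),
}
total = estimate_vocabulary(strata)
ask_from = next_stratum(strata)
```

With these made-up numbers the large, uncertain "grandmaster" stratum dominates the variance, so the next question would be drawn from it.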

air7 • yesterday at 7:35 AM

With the risk of giving a spoiler, it seems the correct answer is almost always the longer, more elaborate one.

I would guess this causes an upward shift in results even if not consciously noticed.

cs02rm0 • yesterday at 10:53 AM

Having the name of a former Indian state doesn't seem to be cricket.

At least I can step away from the laptop now I've got RSI.

glove2477 • yesterday at 11:03 AM

It's made with AI and I don't know to what extent. That's enough to have no trust in the results. As a non-native speaker I find those words weird. Some "core words" I have no idea about, but many of the expert ones are easy. So yeah, at least I hope the author had fun vibe-coding it.

Johnny_Bonk • last Friday at 3:03 PM

I like this, but it should be fully operable with the keyboard to be faster, i.e. up/down and 1234 for options, and if it's right you just move on. Maybe show synonyms in the success UI.

canpan • yesterday at 1:17 PM

Picking max(len(answer)) is the right choice almost every time at the higher levels.
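
A small sketch of how one might measure that bias on a question set. The function name and the two sample questions are made up for illustration and are not taken from the quiz; with four options, an unbiased set should score close to 0.25.

```python
def longest_option_hit_rate(questions):
    """Share of questions where the longest option is also the correct one.

    `questions` is a list of (options, correct_index) pairs.
    """
    hits = 0
    for options, correct_index in questions:
        longest_index = max(range(len(options)), key=lambda i: len(options[i]))
        if longest_index == correct_index:
            hits += 1
    return hits / len(questions)


# Illustrative data only.
sample = [
    (["a fish", "interval training that alternates fast and slow running", "a hat", "a tax"], 1),
    (["to leave abruptly", "a kind of very elaborate ceremonial dance", "to sing", "to dig"], 0),
]
rate = longest_option_hit_rate(sample)
```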

alkyon • last Friday at 3:57 PM

I only got 4 wrong as a non-native speaker. Okay, I'm widely read in English, but among LLM-generated definitions it's just too easy to spot the right one.

miqkt • yesterday at 6:36 AM

Only scored 93... One of those, "yclept", I've never ever encountered before (as a native Australian English speaker) and only lucked out by way of elimination.

firefoxd • yesterday at 6:19 AM

Good thing I read this post this morning: https://news.ycombinator.com/item?id=48603664

alkonaut • last Friday at 11:09 PM

I did 81/100 (not my first language) but I probably only knew 60 from before. But I speak other languages and so I can usually decode an origin of a word or I have seen other words in English or another language.

So it’s not a test of how many words you know but how good you are at guessing what words mean.

Liftyee • yesterday at 6:52 AM

Far too slow to complete and too many clicks. I'm surprised it's not using a binary search method (easy, hard, easy, ...). Then it could show an in-progress metric.
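
A minimal sketch of the binary search idea, under a strong simplifying assumption: words are ranked from easiest to hardest and the player knows every word up to some rank and none beyond it. Real knowledge is not that tidy, so a practical version would need to tolerate noisy answers. The names `find_vocabulary_boundary` and `knows` are hypothetical.

```python
def find_vocabulary_boundary(total_words, knows):
    """Binary-search the rank of the first unknown word.

    `knows(rank)` asks the player about the word at that difficulty rank.
    Returns (estimated number of known words, questions asked).
    """
    lo, hi = 0, total_words  # invariant: ranks below lo are known, ranks from hi up are unknown
    asked = 0
    while lo < hi:
        mid = (lo + hi) // 2
        asked += 1
        if knows(mid):
            lo = mid + 1
        else:
            hi = mid
    return lo, asked


# Simulated player who knows exactly the 66,750 easiest words out of 170,000.
estimate, asked = find_vocabulary_boundary(170_000, lambda rank: rank < 66_750)
```

Under this assumption the search needs at most 18 questions for 170,000 words, and the shrinking `hi - lo` interval could double as the in-progress metric.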

bw86 • last Friday at 9:05 PM

84 total, with this breakdown: Core Basics 19/20, Intermediate 20/20, Advanced 13/20, Expert 15/20, Grandmaster 17/20.

Scientific Estimate: 69 100 words

It began very simply, so for a moment I didn't take it very seriously, but I had never heard many of the later words. Thanks to knowing some Latin and other languages, though, I could understand many of them.

A fun idea!

herczegzsolt • yesterday at 2:56 PM

Desperately needs a skip button for words I don't know.

himata4113 • yesterday at 8:05 AM

Fascinating how many of the words I didn't know but got correct from how they sound in my head, which makes me believe this test is flawed.

bialpio • last Friday at 10:26 PM

Pretty bad that there is no option of "I don't know". A couple of times I tried to guess the wrong word on purpose when I knew I had no clue what the word meant and accidentally got the right answer. I'd expect that admitting ignorance would be an option in such an app...

amarant • last Friday at 3:17 PM

Fun game! I did worse than many others here, only 69.9k estimated words. But then English is my second language, so I'm pretty pleased with the result!

HaloZero • last Friday at 3:09 PM

I wish it had keyboard shortcuts, it's a bit of a slog to click through twice.

Got 64,650: 20/19/17/18/12 (the intermediate one was a dumb mistake)

uberex • yesterday at 3:32 AM

87/100 64,250

A lot of words used in Software Engineering as metaphors helped.

Also one weird tip. If I didn't know the answer, I went for the option describing negative human behaviour, which I'd guess gave me a 50% chance rather than 1 in 4.

sfupysbsu • yesterday at 12:39 PM

Major flaw in the quiz: you can do great by just picking the longest definition.
