Jordan-117, last Tuesday at 4:08 PM

How is a largely text-based app 3.47 GB? Is the dictionary/semantic DB just that large or is there other stuff going on?

michaeld123, last Tuesday at 4:13 PM

I wish it were not so! Here's the breakdown:

1.5M headwords × ~2 KB average per entry >= 3 GB

Each entry contains:

- 40 associations in the core graph
- Multiple senses (up to 8) × 17 associations each = up to 136 more
- Stems and morphological variants
- In-game clue definitions
- Longer definition entries with several types of related word lists

The only good news is it works offline.
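
For anyone who wants to check the arithmetic in the breakdown above, here is a rough back-of-the-envelope sketch. The function and constant names are made up for illustration, and it assumes "KB" means 1,000 bytes and "GB" means 10^9 bytes; it says nothing about how the app actually stores its data.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 10**9 bytes).

HEADWORDS = 1_500_000          # "1.5M headwords"
AVG_ENTRY_BYTES = 2 * 1000     # "~2KB average per entry"

CORE_ASSOCIATIONS = 40         # associations in the core graph
MAX_SENSES = 8                 # "up to 8" senses per entry
ASSOCIATIONS_PER_SENSE = 17    # associations per sense


def estimate_size_gb(headwords: int, avg_entry_bytes: int) -> float:
    """Estimated total size in decimal gigabytes."""
    return headwords * avg_entry_bytes / 1e9


def max_associations(core: int, senses: int, per_sense: int) -> int:
    """Upper bound on associations stored for a single entry."""
    return core + senses * per_sense


if __name__ == "__main__":
    print(estimate_size_gb(HEADWORDS, AVG_ENTRY_BYTES))  # 3.0
    print(max_associations(CORE_ASSOCIATIONS, MAX_SENSES, ASSOCIATIONS_PER_SENSE))  # 176
```

So the per-entry estimate alone accounts for about 3 GB, and a single entry can carry up to 40 + 136 = 176 associations.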

hx8, last Tuesday at 4:20 PM

The first really large text data I ever encountered was Google Ngram[0], the total size of which is about 3TB. I would have guessed it was closer to 3GB before I started downloading it.

[0] https://storage.googleapis.com/books/ngrams/books/datasetsv3...