I wish it were not so! Here's the breakdown:

1.5M headwords × ~2KB average per entry >= 3GB

Each entry contains:
- 40 associations in the core graph
- Multiple senses (up to 8) × 17 associations each = up to 136 more
- Stems and morphological variants
- In-game clue definitions
- Longer definition entries with several types of related word lists
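For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch. It only uses the figures quoted above; treating "~2KB" as 2,000 bytes is my assumption, and the constant names are made up for illustration.

```python
# Back-of-the-envelope check of the figures quoted in the breakdown.
HEADWORDS = 1_500_000
AVG_ENTRY_BYTES = 2_000  # assumption: "~2KB" taken as 2,000 bytes

CORE_ASSOCIATIONS = 40
MAX_SENSES = 8
ASSOCIATIONS_PER_SENSE = 17

# Total dataset size from the per-entry average.
total_bytes = HEADWORDS * AVG_ENTRY_BYTES

# Upper bound on associations for a single entry.
max_sense_associations = MAX_SENSES * ASSOCIATIONS_PER_SENSE
max_associations = CORE_ASSOCIATIONS + max_sense_associations

print(f"dataset size: {total_bytes / 1e9:.1f} GB")
print(f"sense associations per entry (max): {max_sense_associations}")
print(f"total associations per entry (max): {max_associations}")
```

So an entry with all 8 senses carries up to 176 associations before stems, clues and definitions are counted.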
The only good news is it works offline.
I do think that as a practical matter it might be better to store the word graph in the cloud and query it from the client.
You could either store the word graph as a partitioned set of S3 buckets, or have a back-end that serves individual words and does rate-limiting. I guess that the back-end might be better to avoid surprise egress charges from anyone trying to download the entire dataset.
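To make the back-end option a bit more concrete, here is a minimal sketch of a per-client token bucket in front of a single-word lookup. Everything in it is hypothetical: `TokenBucket`, `WordService`, the in-memory dict standing in for the word graph, the limits, and the HTTP-style status codes are all assumptions for illustration, not anything the game actually does.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, now=None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class WordService:
    """Serves one word per request, with a separate bucket per client."""

    def __init__(self, graph, capacity=5, refill_per_sec=1.0):
        self.graph = graph  # hypothetical: dict mapping headword -> entry
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}

    def lookup(self, client_id, word, now=None):
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[client_id] = bucket
        if not bucket.allow(now):
            return 429, None  # rate limited
        entry = self.graph.get(word)
        if entry is None:
            return 404, None  # unknown headword
        return 200, entry


if __name__ == "__main__":
    service = WordService({"cat": {"associations": ["dog", "kitten"]}}, capacity=2)
    for _ in range(3):
        print(service.lookup("client-1", "cat"))
```

With limits like these, normal play is unaffected, while pulling all 1.5M headwords at one request per second would take more than two weeks per client. A real service would also need to expire idle buckets and decide how to identify clients.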
I want to try out the game but I'm discouraged by the download size.