logoalt Hacker News

lscharentoday at 11:59 AM3 repliesview on HN

I was halfway through the article and began thinking that his described data structure sounded very familiar to something I used about 20 years ago.

Sure enough, the first paragraph on the Wikipedia entry for DAFSA is:

DAFSA is the rediscovery of a data structure called Directed Acyclic Word Graph (DAWG)


Replies

hiAndrewQuinntoday at 12:43 PM

Apparently the structure itself has a bit of a history. The word 'rediscovery' tipped me off to go to Wikipedia myself and read up more about this.

First Blumer et al., 1983 came up with a "DAWG", but reading the abstract [1] I was left a little confused as to how exactly we get from 'here is how we store all substrings of a string in O(|string|) space, with "is this a substring [yn]" recognition in O(|substring|) time' to the modern DAFSA, as cool and useful as that is. Come to think of it I bet I could use that in some LeetCode problems.

But the structure we actually think of as a DAWG or DAFSA (or FST, I guess, thanks to this Rust crate) is in the paper "The World’s Fastest Scrabble Program". That worked but you had to construct a whole trie first, then compact it down, so build time was a memory hog. Then Dr. Daciuk of 3city sharpened the blade in 2000 by proving that this was as good as it gets in the unsorted case, but on a sorted set you could build the DAFSA incrementally, because an increasingly large part of the graph you were building was already optimized.

And then from there BurntSushi got involved with the implementation and the rest is history.

[1]: https://www.sciencedirect.com/science/article/pii/0304397585...

(That's what I can glean from ~30 minutes of not particularly focused reading. Forgive me if I have made any mistakes.)

show 1 reply
carefulfungitoday at 1:10 PM

I recently learned about a related data structure designed for scrabble style word searches: https://en.wikipedia.org/wiki/GADDAG.

zozbot234today at 4:39 PM

Isn't this just a regular expression?