logoalt Hacker News

genewitchtoday at 6:18 AM1 replyview on HN

it is a lot of fun and rewarding to do this! I've done it several times for medium-sized datasets, like wikipedia dumps, the entire geospatial dataset to mapreduce it (pgsql). The wikipedia one was great, i had it set up to query things like "show me all ammunition manufactured after 1950 that is between .30 and .40" and it could just return it nearly instantly. The wikimedia dumps keep the infoboxes and relations intact, so you can do queries like this easily.


Replies

3eb7988a1663today at 6:54 AM

Do you have a write-up of this somewhere? When I last looked at the Wikipedia dumps, they looked like a mess to parse. How were you getting structured information?