Hacker News

scarsam, yesterday at 12:24 PM

US government auctions are scattered across at least 28 platforms. GSA sells decommissioned federal fleet vehicles. DLA Disposition moves military gear. The US Marshals list seized property through bid4assets. PublicSurplus runs school-district and state-agency lots. GovDeals is the storefront for thousands of county and municipal agencies. Fannie Mae and HUD auction foreclosed homes. None of these sites share an index, and most have search UX that lost a fight with 2008.

So I scraped them all and put one search box in front. 180,276 active listings as of today, normalized into a shared schema in Postgres with full-text search. About 53,000 new listings come in every week.
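
Here is a rough idea of what a shared schema plus one per-source normalizer might look like, as a TypeScript sketch. The `Listing` fields, the `RawExampleRecord` feed shape, `parsePriceCents`, `normalizeExample`, and the `example.invalid` URL are all hypothetical; the post doesn't describe the actual schema. Prices are stored as integer cents so that sorting and median math never touch floating-point dollars.

```typescript
// Hypothetical shared shape that every source is normalized into.
// Field names are illustrative assumptions, not the site's real schema.
interface Listing {
  source: string;                  // e.g. "govdeals", "gsa"
  sourceId: string;                // the listing's ID on the source platform
  title: string;
  state: string | null;            // two-letter US state code when known
  startingBidCents: number | null; // integer cents, null if unparseable
  endsAt: Date | null;
  url: string;
}

// Parse a display price like "$33,000.00" into integer cents.
// Returns null for anything it doesn't recognize (e.g. "$250k").
function parsePriceCents(raw: string | null | undefined): number | null {
  if (raw == null) return null;
  const cleaned = raw.replace(/[$,\s]/g, "");
  if (!/^\d+(\.\d{1,2})?$/.test(cleaned)) return null;
  const [whole, frac = ""] = cleaned.split(".");
  return Number(whole) * 100 + Number(frac.padEnd(2, "0"));
}

// Hypothetical raw record from one source's JSON feed.
interface RawExampleRecord {
  id: number;
  name: string;
  location?: { state?: string };
  minBid?: string;
  closeTime?: string; // ISO 8601
}

// One normalizer per source maps its quirks onto the shared Listing.
function normalizeExample(raw: RawExampleRecord): Listing {
  const ends = raw.closeTime ? new Date(raw.closeTime) : null;
  return {
    source: "example",
    sourceId: String(raw.id),
    title: raw.name.trim(),
    state: raw.location?.state?.toUpperCase() ?? null,
    startingBidCents: parsePriceCents(raw.minBid),
    // Drop unparseable dates instead of storing an Invalid Date.
    endsAt: ends && !Number.isNaN(ends.getTime()) ? ends : null,
    url: `https://example.invalid/listing/${raw.id}`,
  };
}
```

Each scraper only has to emit its own raw shape; everything downstream (search, dedup, scoring) then works against `Listing` alone.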

A few real things you can buy this week, all live in the data:

- A 2000 Bell 430 helicopter (executive model), $250k starting, 0 bids: https://www.govdeals.com/asset/8103/23762

- A 1985 Cessna 182R aircraft in Missouri, $33k starting, 0 bids: https://www.govdeals.com/asset/36476/430

- An M75 armored personnel carrier (APC) on Ritchie Bros, no bids yet: https://www.rbauction.com/pdp/armored-tank-m75-apc-personnel...

- A Rolls-Royce ship thruster, never used, $500k starting: https://www.govdeals.com/asset/247/16144

- A 2.3 kg iridium-platinum ingot (police seizure on PropertyRoom), 52 bids, currently $175k: https://www.propertyroom.com/l/iridium-platinum-ingot-ir90-p...

- A 1927 Seagrave fire truck, "runs, drives, and titled," $24k, 0 bids: https://www.govdeals.com/asset/285/16223

- A truck-mounted forklift from a manufacturer literally named "Donkey & Burro": https://www.govplanet.com/for-sale/Forklifts/14842632

The part that took longest wasn't the scraping (each source has its own quirky JSON or HTML); it was the dedup. The same Fannie Mae foreclosure shows up under three different addresses across three platforms. A "2008 Ford F-150" from GSA Fleet looks structurally identical to one from PublicSurplus, but they're different vehicles with different VINs, and the only way to tell is to fingerprint enough metadata to make a confident match.
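
A minimal sketch of fingerprint-based grouping, assuming a VIN is authoritative for vehicles when present and that properties can be matched on a normalized street address plus ZIP. `DedupInput`, `fingerprint`, and `groupDuplicates` are hypothetical names, and this is not necessarily how the author does it. Suffix normalization only catches formatting differences ("Street" vs "St."); the three-different-addresses case above would likely need extra signals beyond this.

```typescript
// Hypothetical input: just the fields the fingerprint looks at.
interface DedupInput {
  source: string;
  sourceId: string;
  category: "vehicle" | "real_estate" | "other";
  vin?: string;
  address?: string;
  zip?: string;
}

// Small, illustrative suffix table (assumption). A Map avoids
// accidental hits on Object.prototype keys like "constructor".
const SUFFIXES = new Map<string, string>([
  ["street", "st"], ["avenue", "ave"], ["road", "rd"], ["drive", "dr"],
  ["boulevard", "blvd"], ["lane", "ln"], ["court", "ct"],
]);

// Lowercase, strip common punctuation, collapse whitespace, abbreviate suffixes.
function normalizeAddress(addr: string): string {
  return addr
    .toLowerCase()
    .replace(/[.,#]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => SUFFIXES.get(w) ?? w)
    .join(" ");
}

function fingerprint(item: DedupInput): string {
  if (item.category === "vehicle" && item.vin) {
    const vin = item.vin.toUpperCase().replace(/[^A-Z0-9]/g, "");
    // Only 17-character VINs are trusted; older serials fall through.
    if (vin.length === 17) return `vin:${vin}`;
  }
  if (item.category === "real_estate" && item.address && item.zip) {
    // ZIP+4 is truncated to the 5-digit ZIP before comparing.
    return `addr:${normalizeAddress(item.address)}|${item.zip.slice(0, 5)}`;
  }
  // Not enough metadata for a confident match: keep the listing distinct.
  return `src:${item.source}:${item.sourceId}`;
}

// Bucket listings by fingerprint; buckets with more than one entry are duplicates.
function groupDuplicates(items: DedupInput[]): Map<string, DedupInput[]> {
  const groups = new Map<string, DedupInput[]>();
  for (const item of items) {
    const key = fingerprint(item);
    const bucket = groups.get(key);
    if (bucket) bucket.push(item);
    else groups.set(key, [item]);
  }
  return groups;
}
```

The fallback key means anything without enough metadata is never merged, so the sketch errs toward showing a duplicate rather than hiding a distinct vehicle, such as two F-150s with different VINs.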

There's a deal score per listing (price vs. category median, bid velocity, time remaining, starting-bid ratio). There are also SEO landing pages for each state-and-category combination, mostly because long-tail government-auction queries on Google go almost entirely unanswered.
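
One way the four signals could be combined into a 0 to 100 score, sketched in TypeScript. The weights, the one-week horizon for time remaining, and the direction chosen for bid velocity (fewer bids per hour treated as less competition, hence a better deal) are all assumptions; the post names the inputs but not the formula. `ScoreInput` and `dealScore` are hypothetical names.

```typescript
// Hypothetical per-listing inputs; prices in any consistent unit.
interface ScoreInput {
  currentPrice: number;
  startingBid: number;
  categoryMedian: number; // median price of comparable listings
  bidCount: number;
  hoursListed: number;
  hoursRemaining: number;
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Illustrative weights (assumption). They sum to 1, so the score lands in [0, 100].
const WEIGHTS = { price: 0.4, velocity: 0.2, time: 0.15, start: 0.25 };

function dealScore(s: ScoreInput): number {
  const hasMedian = s.categoryMedian > 0;
  // Cheaper than the category median scores higher; at or above median scores 0.
  const price = hasMedian ? clamp01(1 - s.currentPrice / s.categoryMedian) : 0;
  // Same idea for the starting bid relative to the median.
  const start = hasMedian ? clamp01(1 - s.startingBid / s.categoryMedian) : 0;
  // Bids per hour since listing; guard against dividing by tiny durations.
  const bidsPerHour = s.bidCount / Math.max(s.hoursListed, 1);
  const velocity = 1 / (1 + bidsPerHour); // 1 with no bids, approaches 0 as bidding heats up
  // Closing within the week scores higher; 168 hours is an assumed horizon.
  const time = clamp01(1 - s.hoursRemaining / 168);
  const raw =
    WEIGHTS.price * price +
    WEIGHTS.velocity * velocity +
    WEIGHTS.time * time +
    WEIGHTS.start * start;
  return Math.round(raw * 100);
}
```

Clamping each component to [0, 1] keeps one extreme input (say, a listing far below median) from swamping the others; the weights then decide how far each signal can move the total.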

Stack: Next.js, Postgres, TypeScript scrapers per source, daily refresh.
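
A sketch of the diff step a daily refresh might run once the per-source scrapers finish, assuming listings are keyed as `source:sourceId` and that a listing missing from today's crawl has closed. `RefreshDiff`, `diffRefresh`, and the `minRatio` threshold are hypothetical. The guard is the interesting part: a source that errors out, or suddenly returns far fewer listings (which may be what a blocked scraper looks like), is skipped rather than treated as having closed everything.

```typescript
// Result of comparing the previous active set with today's crawl.
interface RefreshDiff {
  added: string[];          // IDs seen today but not before
  closed: string[];         // IDs active before but missing today
  skippedSources: string[]; // sources whose results were not trusted
}

// Hypothetical keying: every ID is "source:sourceId".
// Sources absent from `scraped` are never visited, so their listings stay as-is.
function diffRefresh(
  previous: Map<string, Set<string>>,        // source -> IDs active before this run
  scraped: Map<string, Set<string> | Error>, // source -> IDs seen today, or the scrape error
  minRatio = 0.5,                            // assumed threshold for a suspicious drop
): RefreshDiff {
  const diff: RefreshDiff = { added: [], closed: [], skippedSources: [] };
  for (const [source, result] of scraped) {
    const before = previous.get(source) ?? new Set<string>();
    // A failed or suspiciously small scrape says nothing reliable about
    // which listings closed, so leave this source untouched.
    if (result instanceof Error || (before.size > 0 && result.size < before.size * minRatio)) {
      diff.skippedSources.push(source);
      continue;
    }
    for (const id of result) if (!before.has(id)) diff.added.push(id);
    for (const id of before) if (!result.has(id)) diff.closed.push(id);
  }
  return diff;
}
```

Counting `added` per run is also a cheap way to track a figure like new listings per week.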

Happy to answer questions about scraping the federal sites (some of them really do not want to be scraped) or how the deal scoring works.


Replies

djmips, yesterday at 7:35 PM

Don't forget the Aircraft Jet Blast Deflector you've always dreamed of!

https://www.govdeals.com/en/asset/2402/13188
