Google Books (or similar) all book scans – $200k bounty (2025)

396 points • by Cider9986 • yesterday at 4:51 PM • 213 comments

Comments

ahmedfromtunis • yesterday at 6:09 PM

I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.

If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.

Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)

dr_dshiv • yesterday at 6:56 PM

https://SourceLibrary.org has about 16,000 rare books translated, most for the first time. 50,000 books archived (will be translated when we have $$ for it). More tokens than English Wikipedia and about 0.75 petabytes.

Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…

tangenter • yesterday at 9:20 PM

Anna’s came clutch for me yesterday. I spent a few days trying to find a zip file of a CD that came with an old programming book from the early 2000s. One of those Thomson Publishing slap jobs that I actually enjoyed. I checked used copies; all of them said they do not come with the CD. I tried googling around, nothing. LLMs couldn’t find it. ChatGPT kept saying it is on the archive (no it isn’t, you useless piece of shit). Anyway, on a whim I went to AA and, lo and behold, zip files for both the first and second editions. Godsend.

hedora • yesterday at 6:05 PM

I wonder how long it will be before they offer bounties for internet scrapes.

Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.

trilogic • yesterday at 6:01 PM

Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.

jagged-chisel • today at 1:42 AM

> Please read [this] carefully before working on a bounty.

[this] appears as a link to a .li address, and that goes bad places.

Should be https://annas-archive.gl/volunteering#bounties

DeepYogurt • yesterday at 6:30 PM

Anyone afraid of being laid off at Google right now? Perhaps this is a backup :)

bix6 • yesterday at 5:39 PM

Piracy / copyright predictions?

The current situation feels untenable with renting. So many regular people I know have learned about VPN, NAS, etc.

wxw • yesterday at 5:40 PM

Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...

> Purchase all Library of Congress MARC datasets — $3,000 bounty

> English Wikipedia pages about relevant institutions — up to $100 per new page

> Internet Archive Digital Lending — $5000 per 1 million pdf files

> Text version of our full library — $20,000

...

hereme888 • yesterday at 7:16 PM

The link sort of reads like people who have very easy access to the requested material. Almost like they're Google employees.

anyaya1 • yesterday at 7:42 PM

Does Anna's Archive use a completely different "source repository" from LibGen?

alkyon • yesterday at 9:38 PM

Gemini should be trained on those books already, so in theory it could regurgitate some verbatim fragments (as the NYT lawsuit against OpenAI showed some time ago).

FerritMans • yesterday at 5:39 PM

So AA is a front for OpenAI?

stephenlf • yesterday at 7:58 PM

Anna’s archive rocks

thenthenthen • yesterday at 8:30 PM

There was a time when you would get a random page preview; some artists found a way to extract full books that way (F.A.T. Lab?).

stephenlf • yesterday at 8:09 PM

The only legal hurdle keeping Anna’s Archive away from its noble goal (piracy laws) has been shown to mean zilch in the age of AI.

neilv • yesterday at 6:07 PM

The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".

Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.

We're killing the goose that lays the eggs, for selfish gain.

leoc • yesterday at 8:36 PM

Just do it and be legends, Larry. ;)

mmooss • yesterday at 8:35 PM

How is Anna's Archive funded? I see they have memberships, but it's hard to believe that can fund all these bounties - some going into six figures. Ask any FOSS project about funding by that method.

It seems like there are some deep pockets funding them.

vagab0nd • today at 12:39 AM

Another source I'd love to see scraped or opened up is the New York Times archive, along with other newspaper archives.

cubefox • yesterday at 10:07 PM

HN logic:

Training on copyrighted material

--> bad

Actually distributing copyrighted material

--> good

Needless to say, this is backwards. Any copyright holder will be much more worried about the latter.

TZubiri • yesterday at 8:46 PM

I think this would cross the line from civil copyright claims into criminal activity

https://chatgpt.com/share/6a4970e8-7fe8-83e9-8f81-3aefd76b6b...

On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.

OrangeDelonge • yesterday at 6:07 PM

Curious as to how you would approach this. I have no experience in this area; is anyone on this forum willing to share their expertise?

ThrowawayTestr • yesterday at 5:37 PM

One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.

rvba • yesterday at 10:18 PM

Comment by Borja is a great example of eternal September.

b112 • yesterday at 5:40 PM

[dead]

tolerance • yesterday at 9:37 PM

    If you shouldn't be able to copyright GRAPES...you shouldn't be able to copyright BOOKS.
