Hacker News

Google Books (or similar) all book scans – $200k bounty (2025)

396 points by Cider9986 yesterday at 4:51 PM | 213 comments

Comments

ahmedfromtunis yesterday at 6:09 PM

I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.

If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.

Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)

dr_dshiv yesterday at 6:56 PM

https://SourceLibrary.org has about 16,000 rare books translated, most for the first time, and 50,000 books archived (to be translated when we have $$ for it). That is more tokens than English Wikipedia and about 0.75 petabytes.

Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…

tangenter yesterday at 9:20 PM

Anna’s came clutch for me yesterday. I spent a few days trying to find a zip file of a CD that came with an old programming book from the early 2000s. One of those Thomson Publishing slap jobs that I actually enjoyed. I checked used copies; all of them said the CD was not included. I tried googling around, nothing. LLMs couldn’t find it. ChatGPT kept saying it is on the archive (no it isn’t, you useless piece of shit). Anyway, on a whim I went to AA and, lo and behold, zip files for both the first and second editions. Godsend.

hedora yesterday at 6:05 PM

I wonder how long it will be before they offer bounties for internet scrapes.

Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.

trilogic yesterday at 6:01 PM

Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.

jagged-chisel today at 1:42 AM

> Please read [this] carefully before working on a bounty.

[this] appears as a link to a .li address, and that leads to bad places.

Should be https://annas-archive.gl/volunteering#bounties

DeepYogurt yesterday at 6:30 PM

Anyone afraid of being laid off at Google right now? Perhaps this is a backup :)

bix6 yesterday at 5:39 PM

Piracy / copyright predictions?

The current situation with renting feels untenable. So many regular people I know have learned about VPNs, NAS, etc.

wxw yesterday at 5:40 PM

Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...

> Purchase all Library of Congress MARC datasets — $3,000 bounty

> English Wikipedia pages about relevant institutions — up to $100 per new page

> Internet Archive Digital Lending — $5000 per 1 million pdf files

> Text version of our full library — $20,000

...

hereme888 yesterday at 7:16 PM

The link sort of reads like it is aimed at people who have very easy access to the requested material. Almost like they're Google employees.

anyaya1 yesterday at 7:42 PM

Does Anna's Archive use a completely different "source repository" from LibGen?

alkyon yesterday at 9:38 PM

Gemini should be trained on those books already, so in theory it could regurgitate some verbatim fragments (as the NYT lawsuit against OpenAI showed some time ago).

FerritMans yesterday at 5:39 PM

So AA is a front for OpenAI?

stephenlf yesterday at 7:58 PM

Anna’s Archive rocks.

thenthenthen yesterday at 8:30 PM

There was a time when you would get a random page preview; some artists found a way to extract full books that way (F.A.T. Lab?).

stephenlf yesterday at 8:09 PM

The only legal hurdle keeping Anna’s Archive away from its noble goal (piracy laws) has been shown to mean zilch in the age of AI.

neilv yesterday at 6:07 PM

The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".

Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.

We're killing the goose that lays the eggs, for selfish gain.

leoc yesterday at 8:36 PM

Just do it and be legends, Larry. ;)

mmooss yesterday at 8:35 PM

How is Anna's Archive funded? I see they have memberships, but it's hard to believe those can fund all these bounties, some going into six figures. Ask any FOSS project about funding by that method.

It seems like there are some deep pockets funding them.

vagab0nd today at 12:39 AM

Another source I'd love to see scraped or opened up is the New York Times archive, along with other newspaper archives.

cubefox yesterday at 10:07 PM

HN logic:

Training on copyrighted material

--> bad

Actually distributing copyrighted material

--> good

Needless to say, this is backwards. Any copyright holder will be much more worried about the latter.

TZubiri yesterday at 8:46 PM

I think this would cross the line from civil copyright claims into criminal activity.

https://chatgpt.com/share/6a4970e8-7fe8-83e9-8f81-3aefd76b6b...

On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.

OrangeDelonge yesterday at 6:07 PM

Curious as to how you would approach this. I have no experience in this area. Is anyone on this forum willing to share their expertise?

ThrowawayTestr yesterday at 5:37 PM

One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.

rvba yesterday at 10:18 PM

The comment by Borja is a great example of Eternal September.

b112 yesterday at 5:40 PM

[dead]

tolerance yesterday at 9:37 PM

    If you shouldn't be able to copyright GRAPES...you shouldn't be able to copyright BOOKS.