https://SourceLibrary.org has about 16,000 rare books translated, most for the first time. Another 50,000 books are archived (they will be translated when we have $$ for it). That's more tokens than English Wikipedia and about 0.75 petabytes.
Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…
Anna’s came in clutch for me yesterday. I spent a few days trying to find a zip file of a CD that came with an old programming book from the early 2000s. One of those Thomson Publishing slap jobs that I actually enjoyed. I checked used copies; all of them said "does not come with CD". I tried googling around, nothing. LLMs couldn’t find it. ChatGPT kept saying it is on the archive (no it isn’t, you useless piece of shit). Anyway, on a whim I went to AA and, lo and behold, zip files for both the first and second editions. Godsend.
I wonder how long it will be before they offer bounties for internet scrapes.
Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.
Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.
> Please read [this] carefully before working on a bounty.
[this] appears as a link to a .li address, and that goes bad places.
Anyone afraid of being laid off at Google right now? Perhaps this is a backup :)
Piracy / copyright predictions?
The current situation, where everything is rented, feels untenable. So many regular people I know have learned about VPNs, NAS, etc.
Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...
> Purchase all Library of Congress MARC datasets — $3,000 bounty
> English Wikipedia pages about relevant institutions — up to $100 per new page
> Internet Archive Digital Lending — $5000 per 1 million pdf files
> Text version of our full library — $20,000
...
The link sort of reads like people who have very easy access to the requested material. Almost like they're Google employees.
Does Anna's Archive use a completely different "source repository" from LibGen?
Gemini should be trained on those books already, so in theory it could regurgitate some verbatim fragments (as the NYT lawsuit against OpenAI showed some time ago).
Anna’s archive rocks
There was a time when you would get a random page preview; some artists found a way to extract full books that way (F.A.T. Lab?).
The only legal hurdle keeping Anna’s Archive away from its noble goal (piracy laws) has been shown to mean zilch in the age of AI.
The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".
Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.
We're killing the goose that lays the eggs, for selfish gain.
How is Anna's Archive funded? I see they have memberships, but it's hard to believe that can fund all these bounties - some going into six figures. Ask any FOSS project about funding by that method.
It seems like there are some deep pockets funding them.
Another source I'd love to see scraped or opened up is the New York Times archive, along with other newspaper archives.
HN logic:
Training on copyrighted material
--> bad
Actually distributing copyrighted material
--> good
Needless to say, this is backwards. Any copyright holder will be much more worried about the latter.
I think this would cross the line from civil copyright claims into criminal activity:
https://chatgpt.com/share/6a4970e8-7fe8-83e9-8f81-3aefd76b6b...
On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.
Curious as to how you would approach this. I have no experience in this area. Is anyone on this forum willing to share their expertise?
One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.
The comment by Borja is a great example of Eternal September.
If you shouldn't be able to copyright GRAPES... you shouldn't be able to copyright BOOKS.
I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.
If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.
Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)