logoalt Hacker News

duskwuffyesterday at 11:58 PM3 repliesview on HN

Tape is also an extraordinarily poor option for a service like Internet Archive which intends to provide interactive, on-demand access to its holdings.


Replies

BryantDtoday at 5:21 AM

Back in the day, if you loaded a page from the web archive that wasn’t in cache, it’d tell you to come back in a couple of minutes. If it was in cache, it was reasonably speedy.

Cache in this case was the hard drives. If I recall correctly, we were using SAM-FS, which worked fairly well for the purpose even though it was slow as dirt —- we could effectively mount the tape drive on Solaris servers, and access the file system transparently.

Things have gotten better. I’m not sure if there were better affordable options in the late 1990s, though. I went from Alexa/IA to AltaVista, which solved the problem of storing web crawl data by being owned by DEC and installing dozens of refrigerator sized Alpha servers. Not an option open to Alexa/IA.

EvanAndersontoday at 12:41 AM

I presume backing-up the archive is a desirable thing. That's a place where I would see tape fitting well for them.

show 1 reply
stonogotoday at 12:28 AM

This is a common use for tape, which can via tools like HPSS have a couple petabytes of disk in front of it, and present the whole archive in a single POSIX filesystem namespace, handling data migration transparently and making sure hot data is kept on low-latency storage.

show 1 reply