
ninkendo · today at 2:18 AM

Why don’t you take a moment to explain to the class why you think web crawling means you can’t cache anything?

It seems to me that the very first thing I'd try to solve, if I were writing a tool for an LLM to search the web, would be caching.

An LLM should have to go through a proxy to fetch any URL. That proxy should cache results, and the cache should live on the LLM company's own servers. The model should not independently hit the same endpoint over and over every time it wants to fetch the same URL for its users.
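For what it's worth, here's a minimal sketch of the kind of thing I mean, in Python. This is not anyone's actual implementation; it assumes the `requests` library, and the cache directory and TTL are made-up placeholders. The idea is just a content-addressed on-disk cache keyed by a hash of the URL, consulted before any fetch ever reaches the origin:

    import hashlib
    import time
    from pathlib import Path

    import requests  # assumes the requests library is installed

    CACHE_DIR = Path("/var/cache/llm-fetch")  # hypothetical location
    DEFAULT_TTL = 24 * 3600  # treat cached copies as fresh for one day

    def cached_fetch(url: str, ttl: int = DEFAULT_TTL) -> bytes:
        """Fetch a URL, serving from the on-disk cache when a fresh copy exists."""
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        key = hashlib.sha256(url.encode()).hexdigest()
        path = CACHE_DIR / key
        # Cache hit: the file exists and is younger than the TTL.
        if path.exists() and time.time() - path.stat().st_mtime < ttl:
            return path.read_bytes()
        # Cache miss: hit the origin once, then store the body for later callers.
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        path.write_bytes(resp.content)
        return resp.content

A real version would also want per-host rate limiting and to honor Cache-Control headers, but even something this naive stops the "fetch the same URL a thousand times" behavior.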

Is it expensive to cache everything the LLM fetches? You betcha. Can they afford to spend some of the billions they have for capex to buy some fucking hard drives? Absolutely. If archive.org can do it on donation funding, a trillion-dollar AI company should have no problem.