
direwolf20 today at 5:56 PM

I hope they cache search results to further reduce the number of calls to Google.

And Marginalia Search was not mentioned? Marginalia Search says they are licensing their index to Kagi. Perhaps it's counted under "Our own small-web index" which is highly misleading if true.


Replies

z64 today at 7:56 PM

There is a practical limit: we can't cache results for too long, because search engine users are particularly sensitive to stale data, especially around current events. Without a holistic and reliable way to know when the cache ought to be invalidated, our caching is mostly focused on mitigating "abuse", e.g., one person or a group of people spamming the same search in a short timespan; there's no sense in repeating all those upstream calls.
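A minimal sketch of the short-lived caching idea described above, not Kagi's actual implementation. The class name `ShortTTLCache`, the 30-second TTL, and the `fetch` callback are all assumptions made up for illustration; the only point is that identical queries arriving in a burst share one upstream call, while anything older than the TTL is refetched.

```python
import time
from typing import Callable, Dict, List, Tuple


class ShortTTLCache:
    """Hypothetical cache that absorbs bursts of identical queries.

    Entries expire after a short TTL, so results never get very stale.
    """

    def __init__(
        self,
        fetch: Callable[[str], List[str]],
        ttl_seconds: float = 30.0,  # assumed value, chosen for illustration
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self.upstream_calls = 0  # counts how often we actually hit upstream

    def search(self, query: str) -> List[str]:
        # Normalize case and whitespace so trivially different spellings
        # of the same query share one cache entry.
        key = " ".join(query.lower().split())
        now = self._clock()

        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the upstream call

        # Missing or expired: call upstream and store the new result.
        self.upstream_calls += 1
        results = self._fetch(query)
        self._entries[key] = (now, results)
        return results
```

A real deployment would also need to bound the cache size and evict expired entries, which this sketch leaves out.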

Most "cost saving engineering" goes into finding cases/heuristics where we only need to use a subset of sources, omitting calls in the first place without compromising quality. For example, we probably don't need to fire all of our sources to service a query like "youtube" or "facebook".
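One way to picture that kind of heuristic, purely as a hypothetical sketch: the source names below are placeholders, and the navigational set contains only the two example queries from the comment. Nothing here reflects how Kagi actually routes queries.

```python
from typing import List

# Placeholder source names; the real set of upstream sources is not stated here.
ALL_SOURCES = ["source_a", "source_b", "source_c", "small_web"]

# Assumed list of "navigational" queries, seeded with the two examples above.
NAVIGATIONAL_QUERIES = {"youtube", "facebook"}

# Assumption: a single source is enough to answer a navigational query.
NAVIGATIONAL_SOURCES = ["source_a"]


def select_sources(query: str) -> List[str]:
    """Return the subset of sources worth calling for this query."""
    normalized = query.strip().lower()
    if normalized in NAVIGATIONAL_QUERIES:
        # The user almost certainly wants the site itself, so skip the rest.
        return list(NAVIGATIONAL_SOURCES)
    # Default: fan out to every source.
    return list(ALL_SOURCES)
```

With these placeholder values, `select_sources("YouTube")` returns only `["source_a"]`, while a longer query such as `"how to cache search results"` still fans out to all four sources.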

Marginalia data is physically consolidated into the same infra that we use for small web results in our SERP, along with other small-scale sources besides those two. That line is simply referring directly to https://kagi.com/smallweb (https://github.com/kagisearch/smallweb).

xnx today at 7:26 PM

> "Our own small-web index"

Has Kagi ever said what this is? I wouldn't be at all surprised if it is just kagi.com pages or a download of Wikipedia.

packetlost today at 5:59 PM

The index is not necessarily the code, but the dataset. IMO it would be better to be more open about the technical stack, but I don't think this feels dishonest to me.