logoalt Hacker News

woodruffwyesterday at 4:55 AM2 repliesview on HN

The searchable context for a distribution on PyPI is unbounded in the general case, assuming the goal is to allow search over READMEs, distribution metadata, etc.

(Which isn’t to say I disagree with you about scale not being the main issue, just to offer some nuance. Another piece of nuance is the fact that distributions are the source of metadata but users think in terms of projects/releases.)


Replies

bastawhizyesterday at 3:34 PM

> assuming the goal is to allow search over READMEs, distribution metadata, etc.

Why would you build a dedicated tool for this instead of just using a search engine? If I'm looking for a specific keyword in some project's very long README I'm searching kagi, not npm.

I'd expect that the most you should be indexing is the data in the project metadata (setup.py). That could be unbounded but I can't think of a compelling reason not to truncate it beyond a reasonable length.

show 1 reply
coldteayesterday at 5:42 PM

>The searchable context for a distribution on PyPI is unbounded in the general case, assuming the goal is to allow search over READMEs, distribution metadata, etc.

Even including those, it's what? Sub-20-30GB.