> assuming the goal is to allow search over READMEs, distribution metadata, etc.
Why would you build a dedicated tool for this instead of just using a search engine? If I'm looking for a specific keyword in some project's very long README, I'm searching Kagi, not npm.
I'd expect that the most you should be indexing is the data in the project metadata (setup.py). That could be unbounded, but I can't think of a compelling reason not to truncate it to a reasonable length.
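To make the truncation idea concrete, here is a minimal sketch of capping metadata fields before they go into an index. The 2,000-character limit and the `truncate_for_index` helper are hypothetical; nothing above specifies what "a reasonable length" would be.

```python
# Hypothetical cap on indexed text; the thread only says "a reasonable length".
MAX_FIELD_CHARS = 2_000


def truncate_for_index(metadata: dict[str, str], limit: int = MAX_FIELD_CHARS) -> dict[str, str]:
    """Return a copy of `metadata` with every string field cut to at most `limit` characters."""
    return {key: value[:limit] for key, value in metadata.items()}


# Example: a 5,000-character description is cut down; short fields are untouched.
indexed = truncate_for_index({"name": "demo", "description": "x" * 5000})
```

Truncating per field (rather than the whole record) keeps short, high-value fields like the name intact even when a long description hits the cap.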
You would definitely use a search engine. I was just responding to a specific design constraint.
(Note, however, that PyPI can't index metadata from a `setup.py`, since that would involve running arbitrary code. PyPI needs to be given structured metadata, and not all distributions provide that.)
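For contrast with executing `setup.py`, structured metadata (the core-metadata `METADATA` file in a wheel, or `PKG-INFO` in an sdist) is plain email-style headers followed by an optional long-description body, so it can be read with the standard library's `email.parser` without running any project code. This is a sketch only; it is not how PyPI itself ingests uploads, and the sample package and the `read_static_metadata` helper are made up for illustration.

```python
from email.parser import Parser

# Made-up core metadata for illustration; real files come from a wheel or sdist.
SAMPLE_METADATA = """\
Metadata-Version: 2.1
Name: example-pkg
Version: 1.0.0
Summary: A hypothetical package used only for illustration
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License

This long description lives in the message body.
"""


def read_static_metadata(text: str) -> dict:
    """Parse email-style core metadata; no project code is executed."""
    msg = Parser().parsestr(text)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "summary": msg.get("Summary", ""),
        # Multi-use fields such as Classifier appear once per value.
        "classifiers": msg.get_all("Classifier", []),
        # In newer metadata versions the long description may be the body.
        "description": msg.get_payload(),
    }


info = read_static_metadata(SAMPLE_METADATA)
```

A distribution that only ships a `setup.py` has no such file to parse, which is the gap the note above describes.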