Hacker News

starkparker · 10/10/2024 · 1 reply

Specifically:

- managing edge and object caches (and caching for anonymous viewers vs. logged-in editors, with separate frontend and backend caches) while mitigating the effects of cache misses
- minimizing the impact of the job queue when many pages are changed at once
- optimizing image storage, thumbnailing, and caching
- figuring out when to use a wikitext template vs. a Scribunto/Lua module vs. a MediaWiki extension in PHP (and, if Scribunto, which Lua runtime to use)
- figuring out which structured data backend to use and how to tune it
- deciding whether to rely on API bots (expensive on the backend) vs. cache scrapers (expensive on the frontend) vs. database dump bots (no cost to the live site, but already outdated before the dump finishes) for automated content maintenance jobs
- tuning rate limiting
- and load balancing it all
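For a sense of where those knobs live, here's a rough LocalSettings.php sketch of the caching, job queue, and rate-limit settings involved (hostnames and numbers are invented, and real tuning depends entirely on the deployment):

```php
<?php
// LocalSettings.php (sketch, not production values) -- the cache,
// job queue, and rate-limit settings discussed above.

// Object cache: parser output, sessions, and general objects in memcached.
$wgMainCacheType    = CACHE_MEMCACHED;
$wgParserCacheType  = CACHE_MEMCACHED;
$wgSessionCacheType = CACHE_MEMCACHED;
$wgMemCachedServers = [ '10.0.0.10:11211', '10.0.0.11:11211' ]; // hypothetical hosts

// Edge cache: let a Varnish/CDN layer serve anonymous page views;
// logged-in editors bypass it and hit the backend (and parser cache) instead.
$wgUseCdn     = true;
$wgCdnServers = [ '10.0.0.20' ];
$wgCdnMaxAge  = 86400; // seconds before the edge revalidates

// Job queue: don't run jobs on web requests; a separate runner
// (maintenance/runJobs.php via cron or a daemon) drains the queue, so a
// template edit that touches thousands of pages doesn't stall page views.
$wgJobRunRate = 0;

// Rate limiting: throttle expensive write paths per user and per IP.
$wgRateLimits['edit']['user'] = [ 8, 60 ]; // 8 edits per 60s
$wgRateLimits['edit']['ip']   = [ 8, 60 ];
```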

At especially large scales, spinning the API and job queues off altogether into microservices and insulating the live site from the performance impact of logging this whole rat's nest.
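On the MediaWiki side, one concrete version of that job-queue split is pointing $wgJobTypeConf at a Redis-backed queue drained by dedicated runner hosts instead of the default DB-backed queue; roughly something like the sketch below (server name invented, and Wikimedia itself has since moved to a Kafka-based changeprop job queue):

```php
<?php
// LocalSettings.php (sketch) -- push jobs into an external Redis-backed
// queue so dedicated runner hosts can drain it, keeping job execution
// off the web servers that serve page views.
$wgJobTypeConf['default'] = [
    'class'       => 'JobQueueRedis',
    'redisServer' => 'jobqueue-redis.internal:6379', // hypothetical host
    'redisConfig' => [ 'connectTimeout' => 1 ],
    'claimTTL'    => 3600, // reclaim jobs from runners that die mid-job
    'daemonized'  => true, // expects a long-running job runner service
];
```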


Replies

bawolff · 10/10/2024

Everything is hard at scale. You have to be at a pretty big scale before some of that stuff starts to matter (some of it, of course, matters at smaller scales).
