I disagree with the AI part, because hugepages are one of the things one can guess will improve performance when working with a substantial amount of data.
So anyone familiar with the space could have suggested something like that without knowing the details of the problem. Hence it is not useful advice IMO.
That aside, the blog post was really cool to read and an instant favorite; I wish there were more English posts on the blog.
I especially like the hardware-limit-based expectations, the detailed measurements, and the writing style.
Thank you for liking this blog. I agree with your point. Actually, I’ve just recently transitioned from building data infrastructure on the cloud to taking on a high-performance computing role that truly handles massive amounts of data. So, although I’d heard about the benefits of hugepages before, I had never actually reproduced these issues in my own environment. This time, even though I initially suspected the problems were related to hugepages and the TLB, I didn’t write this blog from a seasoned perspective. Instead, I wanted to methodically investigate and eliminate all other possible issues I could think of. (Interestingly, my agent attributed the effectiveness of hugepages to the root causes of these bugs, which piqued my curiosity and drove my deeper exploration.)
Finally, thank you very much for your appreciation, which means a lot to me. Previously, I was working on open-source projects, but now that I’ve changed jobs, I may not have the same amount of energy to contribute to open-source code as before. However, I think blogging might be a new way for me to contribute. I hope I can keep it up.
(My English writing skills are poor, so I wrote in Chinese and used AI to translate it; I hope you don’t mind.)