OP here. I wrote this implementation to deeply understand the mechanics behind HNSW (layers, entry points, neighbor selection) without relying on external libraries. While PHP isn't the typical choice for vector search engines, I found it surprisingly capable for this use case, especially with JIT enabled on PHP 8.x. It serves as a drop-in solution for PHP monoliths that need semantic search features without adding the complexity of a separate service like Qdrant or Pinecone. Happy to answer any questions about the implementation details! If you want to jump straight to the code, the open-source repo is here: https://github.com/centamiv/vektor
Great writeup. Thanks for taking the time to organise and share.
It's tempting to use this in projects that use PHP.
Is it usable with a corpus of, say, 1,000 Markdown files of about 3 KB each? What about 10,000 files?
Can I also index PHP files so that searches include function and class names? Perhaps comments?
How much RAM and disk space would we be talking about?
And the speed?
My first goal would be to index a PHP project and its documentation so that an LLM agent could perform semantic search using my MCP tool.
Great article! I also read your other post and love it! This is exactly my thinking: Locality of Behavior (LoB)
Never heard this term before, but I like it.
https://centamori.com/index.php?slug=basics-of-web-developme...
The only small thing you forgot to mention: it requires the use of AI, OpenAI to be specific. I got baited.
Thanks a lot, I liked the fantasy-based examples used to explain the concept.
Programming is chanting magic incantations and spells after all. (And fighting against evil spirits and demons.)