Hey HN!
Over the past few months, I've been working on building Omni - a workplace search and chat platform that connects to apps like Google Drive/Gmail, Slack, Confluence, etc. Essentially an open-source alternative to Glean, fully self-hosted.
I noticed that some orgs find Glean to be expensive and not very extensible. I wanted to build something that small to mid-size teams could run themselves, so I decided to build it all on Postgres (ParadeDB to be precise) and pgvector. No Elasticsearch, or dedicated vector databases. I figured Postgres is more than capable of handling the level of scale required.
To bring up Omni on your own infra, all it takes is a single `docker compose up`, and some basic configuration to connect your apps and LLMs.
What it does:
- Syncs data from all connected apps and builds a BM25 index (ParadeDB) and HNSW vector index (pgvector) - Hybrid search combines results from both - Chat UI where the LLM has tools to search the index - not just basic RAG - Traditional search UI - Users bring their own LLM provider (OpenAI/Anthropic/Gemini) - Connectors for Google Workspace, Slack, Confluence, Jira, HubSpot, and more - Connector SDK to build your own custom connectors
Omni is in beta right now, and I'd love your feedback, especially on the following:
- Has anyone tried self-hosting workplace search and/or AI tools, and what was your experience like? - Any concerns with the Postgres-only approach at larger scales?
Happy to answer any questions!
The code: https://github.com/getomnico/omni (Apache 2.0 licensed)
Interesting!
I also started to build something similar for us, as an PoC/alternative to Glean. I'm curious how you handle data isolation, where each user has access to just the messages in their own Slack channels, or Jira tickets from only workspaces they have access to? Managing user mapping was also super painful in AWS Q for Business.
I've done some RAG using postgres and the vector db extension, look into it if you're doing that type of search; it's certainly simpler than bolting another solution for it.
Multiple pages link to a `API Reference` that returns a 404
[dead]
[dead]
How well does the Postgres-only approach hold up as data grows — did you benchmark it against Elasticsearch or a dedicated vector DB?