Hacker News

aaronsteers, today at 12:57 AM

Hi, @jessewmc. Thanks for your reply. Regarding your points:

> If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?

While we haven't yet published details on the backend implementation, I can say that it performs very well without needing to prioritize specific fields for indexing. Our goal is for lookups on large text fields to perform decently and for retrieval on small, compressible fields (like ints) to be fast. (More to come on this in the coming months.)

> Have you done any testing on guided indexing, or metadata layers on top of the data?

We've been testing with a range of data scales and shapes. We have nothing detailed to share yet, but so far, Context Store performance has never been the bottleneck in our agent testing. (The LLM's own thinking time is often the bottleneck.)

> My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful/reliable answers enough of the time.

Airbyte has rich metadata on our upstream connectors' data models, which I think helps us a lot in delivering helpful context to the agent. Another option, when optimizing for specific use cases, is to build your own agent tools on top of our Agent SDK. This lets you design tools whose calls feel natural to the agent, regardless of the source's data shape or which system(s) the data comes from.
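To make the custom-tool idea a bit more concrete, here is a minimal sketch of the general pattern. It does not use Airbyte's Agent SDK API: the records, `lookup_customer`, `CustomerSummary`, and the JSON-schema-style `LOOKUP_CUSTOMER_TOOL` description are all hypothetical. The point is that one task-oriented tool can hide two differently shaped sources behind a single answer shape the agent can reason about.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical raw records from two systems with different shapes.
CRM_CONTACTS = [
    {"id": 1, "email_address": "ada@example.com", "full_name": "Ada Lovelace"},
]
BILLING_ACCOUNTS = [
    {"account_id": "acct_9", "customer": {"email": "ADA@example.com"}, "balance_cents": 1250},
]


@dataclass
class CustomerSummary:
    """One agent-friendly answer shape, independent of each source's layout."""
    email: str
    name: Optional[str] = None
    balance_cents: Optional[int] = None
    sources: List[str] = field(default_factory=list)  # which systems matched


def lookup_customer(email: str) -> CustomerSummary:
    """Hypothetical tool: find a customer by email across CRM and billing."""
    key = email.strip().lower()  # normalize so case differences between systems don't matter
    summary = CustomerSummary(email=key)
    for contact in CRM_CONTACTS:  # CRM stores the email at the top level
        if contact["email_address"].lower() == key:
            summary.name = contact["full_name"]
            summary.sources.append("crm")
    for account in BILLING_ACCOUNTS:  # billing nests the email under "customer"
        if account["customer"]["email"].lower() == key:
            summary.balance_cents = account["balance_cents"]
            summary.sources.append("billing")
    return summary


# Hypothetical tool description an agent framework could pass to the LLM.
LOOKUP_CUSTOMER_TOOL: Dict[str, Any] = {
    "name": "lookup_customer",
    "description": "Look up a customer by email across CRM and billing; "
                   "returns name, balance in cents, and which systems matched.",
    "parameters": {
        "type": "object",
        "properties": {"email": {"type": "string", "description": "Customer email address"}},
        "required": ["email"],
    },
}
```

Calling `lookup_customer("Ada@Example.com")` merges the CRM name and the billing balance into one `CustomerSummary`, so the agent never has to know that one system calls the field `email_address` and the other nests it under `customer`.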

> This does look like a good foundation for that kind of tooling though!

We agree! Thanks again for sharing your thoughts here.