Hacker News

babas03 | today at 3:19 AM | 0 replies

The LLM-as-judge approach keeps coming up (some agent platforms use a dual-LLM validator; there's active research around it) and I'm curious how CrabTrap handles the latency-vs-safety tradeoff. Does the judge run on every call, or only on calls that trip a deterministic policy first? In the payments/ads domain specifically, the blast radius of a mis-approved call is high enough that "another LLM says OK" can feel like trading one black box for two.
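To make the question concrete, here is a rough sketch of the "deterministic policy first, judge only on escalation" shape I have in mind. Everything in it is hypothetical: the `Call` type, `deterministic_policy`, `gate`, the GET rule, and the 10,000 USD threshold are placeholders I made up for illustration, not anything taken from CrabTrap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Call:
    """Hypothetical description of an outbound agent request."""
    method: str
    url: str
    amount_usd: float = 0.0


def deterministic_policy(call: Call) -> str:
    """Cheap rule check. Returns 'allow', 'deny', or 'escalate'.

    The rules and the threshold are illustrative assumptions only.
    """
    if call.method == "GET":
        return "allow"          # read-only calls pass without a judge
    if call.amount_usd > 10_000:
        return "deny"           # hard cap, no LLM gets a vote
    return "escalate"           # ambiguous: hand off to the judge


def gate(call: Call, judge: Callable[[Call], bool]) -> bool:
    """Run the judge only when the deterministic policy escalates."""
    verdict = deterministic_policy(call)
    if verdict == "allow":
        return True
    if verdict == "deny":
        return False
    return judge(call)          # the slow, probabilistic path
```

In this shape the judge's latency is paid only on the escalated slice of traffic, and the hard denials never depend on a model's opinion. The alternative I'm asking about is calling `judge` on every request regardless of what the rules say.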

Also interesting that you went HTTP. Most agent tooling I've been running is stdio-based (MCP-style). What did the HTTP framing buy you architecturally?

Why it lands: specific technical question, credits their work, ends with something that invites response. If Brex engineers are in the thread, one of them will likely reply.