stephantul • today at 7:15 AM • 2 replies

Tokens saved should not be your north star metric. You should be able to show that tool call performance is maintained while consuming fewer tokens. I have no idea whether that is the case here.

As an aside: this is a cool idea, but the prose in the readme and the above post seems to be fully generated, so who knows whether it is actually true.

Replies

hrmtst93837 • today at 9:00 AM

Token counts alone tell you nothing about correctness, latency, or developer ergonomics. Run a deterministic test suite that exercises representative MCP calls against both native MCP and mcp2cli while recording token usage, wall time, error rate, and output fidelity.

Measure fidelity with exact diffs and embedding similarity, and include streaming behavior, schema-change resilience, and rate-limit fallbacks in the cases you care about. Check the repo for a runnable benchmark, archived fixtures captured with vcrpy or WireMock, and a clear test harness that reproduces the claimed 96 to 99 percent savings.
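
A rough sketch of what such a harness could look like, to make the idea concrete. Everything here is an assumption for illustration: `fake_native` and `fake_cli` are hypothetical stand-ins for the two backends (not the real MCP client or mcp2cli), `count_tokens` is a whitespace placeholder rather than a real tokenizer, and the fixtures are made up. Fidelity is scored with `difflib.SequenceMatcher` from the standard library; embedding similarity, streaming, and rate-limit cases are left out.

```python
import difflib
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A backend takes a request and returns (context_sent_to_model, output).
Backend = Callable[[str], Tuple[str, str]]


def count_tokens(text: str) -> int:
    # Placeholder tokenizer (whitespace split); swap in a real one for real numbers.
    return len(text.split())


@dataclass
class CaseResult:
    name: str
    tokens: int
    seconds: float
    error: bool
    fidelity: float  # 1.0 means the output matches the recorded fixture exactly


def run_case(name: str, backend: Backend, reference: str) -> CaseResult:
    start = time.perf_counter()
    try:
        context, output = backend(name)
        error = False
    except Exception:
        context, output, error = "", "", True
    elapsed = time.perf_counter() - start
    fidelity = difflib.SequenceMatcher(None, reference, output).ratio()
    tokens = count_tokens(context) + count_tokens(output)
    return CaseResult(name, tokens, elapsed, error, fidelity)


def summarize(results: List[CaseResult]) -> Dict[str, float]:
    n = len(results)
    return {
        "total_tokens": sum(r.tokens for r in results),
        "total_seconds": sum(r.seconds for r in results),
        "error_rate": sum(r.error for r in results) / n,
        "exact_match_rate": sum(r.fidelity == 1.0 for r in results) / n,
    }


def benchmark(backend: Backend, fixtures: Dict[str, str]) -> Dict[str, float]:
    return summarize([run_case(k, backend, v) for k, v in fixtures.items()])


# Hypothetical backends, only here so the sketch runs end to end.
def fake_native(request: str) -> Tuple[str, str]:
    schema = "tool schema " * 50  # pretend full tool schemas are injected
    return schema + request, f"result for {request}"


def fake_cli(request: str) -> Tuple[str, str]:
    if request == "list_files":
        raise RuntimeError("simulated failure")
    return request, f"result for {request}"


# Made-up fixtures: request name -> expected output.
FIXTURES = {
    "get_user": "result for get_user",
    "list_files": "result for list_files",
}

if __name__ == "__main__":
    for label, backend in [("native", fake_native), ("cli", fake_cli)]:
        print(label, benchmark(backend, FIXTURES))
```

The fake CLI backend is rigged to fail one case on purpose: it uses far fewer tokens but also has a worse error rate and exact-match rate, which is the kind of regression a tokens-only number would hide.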

rakag • today at 9:44 AM

The AI prose is getting so tiring to read

"We measured this. Not estimates — actual token counts using the cl100k_base tokenizer against real schemas, verified by an automated test suite."
