Etheryte • yesterday at 11:57 AM • 2 replies

The modern state of the art is inherently not verifiable. Which way you give it input is really secondary to that fact. When you don't see weights or know anything else about the system, any idea of verifiability is an illusion.

Replies

mikaelaast • yesterday at 12:12 PM

Sure. Verifiability is far-fetched. But say I want to produce a statistically significant evaluation result from this – essentially testing a piece of prose. How do I go about this, short of relying on a vague LLM-as-a-judge metric? What are the parameters?
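One possible way to approach the question above, as a rough sketch rather than an established recipe: run the same task many times with and without the piece of prose, score each run with a deterministic pass/fail check instead of an LLM judge, and compare the two pass rates with Fisher's exact test. The function name and the counts in the usage note are hypothetical; the parameters that matter are the number of runs per arm, the pass/fail criterion, and the significance threshold you commit to beforehand.

```python
from math import comb


def fisher_exact_two_sided(pass_a, fail_a, pass_b, fail_b):
    """Two-sided Fisher's exact test on a 2x2 table of pass/fail counts.

    Arm A: runs with the prose included. Arm B: runs without it.
    Returns the probability, under the null hypothesis that the prose
    makes no difference, of a table at least as unlikely as the observed one.
    """
    n_a = pass_a + fail_a
    n_b = pass_b + fail_b
    total_pass = pass_a + pass_b
    denom = comb(n_a + n_b, total_pass)

    def prob(x):
        # Hypergeometric probability that arm A gets exactly x passes.
        return comb(n_a, x) * comb(n_b, total_pass - x) / denom

    lowest = max(0, total_pass - n_b)
    highest = min(n_a, total_pass)
    p_observed = prob(pass_a)

    # Sum every table that is no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

For example, `fisher_exact_two_sided(41, 9, 29, 21)` would compare 41 of 50 passes with the prose against 29 of 50 without it (made-up numbers). This only tells you whether the pass rates differ; it says nothing about why, and it is only as good as the pass/fail check behind it.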


hu3 • yesterday at 12:16 PM

At least MCPs can be unit tested.

With Skills, however, you just selectively append more text to the prompt and pray.
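To illustrate the unit-testing point, here is a minimal sketch, assuming the tool's logic lives in a plain function that can be called without a model or a running server. The handler `add_numbers_tool` and its argument names are hypothetical, and the returned dictionary only loosely mimics a text-content tool result; no real MCP SDK is used here.

```python
def add_numbers_tool(arguments: dict) -> dict:
    """Hypothetical tool handler: adds two numbers and returns text content."""
    a = arguments["a"]
    b = arguments["b"]
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise ValueError("a and b must be numbers")
    # Assumed result shape, for illustration only.
    return {"content": [{"type": "text", "text": str(a + b)}]}


# pytest-style tests: deterministic input, deterministic expected output.
def test_adds_two_integers():
    result = add_numbers_tool({"a": 2, "b": 3})
    assert result["content"][0]["text"] == "5"


def test_rejects_non_numeric_input():
    try:
        add_numbers_tool({"a": "2", "b": 3})
    except ValueError:
        return
    raise AssertionError("expected ValueError for non-numeric input")
```

The contrast is that each test pins down an exact input and output, whereas text appended to a prompt has no comparable contract to assert against.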
