I'm having an hard time getting my mind to see this. > Users should re-tune their prompts ...

dataviz1000 • last Thursday at 11:03 PM • 3 replies • view on HN

I'm having an hard time getting my mind to see this.

> Users should re-tune their prompts and harnesses accordingly.

I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long running harnesses with a section about testing which lead me to a little more confusion.

Yes, the word 'harness' is consistently used in the context as a wrapper around the LLM model not as 'test harness'.

Replies

dboreham • last Friday at 7:11 AM

This field is chock full of people using terms incorrectly, defining new words for things that already had well known names, overloading terms already in use. E.g. shard vs partition. TUI which already meant "telephony user interface ". "Client" to mean "server" in blockchain.

suttontom • yesterday at 3:23 PM

Some people also call evaluations "tests". There are unexpected things that come along with new models, like the model in a workflow you'd set up suddenly starts calling a tool and never stops or decides to no longer call a particular tool, so running your existing evaluations to catch regressions like this and potentially updating the prompts is considered "testing" your prompts and harnesses.

kreig • last Friday at 6:57 PM

I understood this concept with this simple equation: Agent = LLM + harness

alt Hacker News

Replies