Hacker News

danudey, today at 12:58 AM

One thing my team lead is working on is using Claude to 'generate' integration tests and add new tests to our e2e runs.

Asking Claude directly to run the tests, or to write a test, could produce inconsistencies between runs, between tests, between models, and so on. So instead he built a tool that defines a test: its inputs and outputs, plus some other details. Now we have a directory full of markdown files describing each test suite (parameters, test cases, error cases, etc.), and Claude generates invocations of the tool rather than the tests themselves.
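To make the spec-file idea concrete, here is a minimal Python sketch of a strict parser for one such markdown file. The format (a `# Suite:` heading, `## Case:` / `## Error:` headings, and `- input:` / `- expect_status:` bullets) and the names `parse_suite`, `SpecSuite`, and `SpecCase` are hypothetical; the comment above doesn't describe the team's actual format. What it illustrates is that anything outside the agreed structure is rejected rather than guessed at.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical spec format (an assumption, not the team's real one):
#   # Suite: <name>
#   ## Case: <name>     or     ## Error: <name>
#   - input: <JSON value>
#   - expect_status: <integer>
HEADING = re.compile(r"^(#{1,2})\s+(Suite|Case|Error):\s*(.+?)\s*$")
BULLET = re.compile(r"^-\s+(\w+):\s*(.+?)\s*$")
REQUIRED_FIELDS = {"input", "expect_status"}


@dataclass(frozen=True)
class SpecCase:
    name: str
    inputs: dict
    expect_status: int
    is_error_case: bool


@dataclass
class SpecSuite:
    name: str
    cases: list = field(default_factory=list)


def _build_case(name: str, is_error: bool, fields: dict) -> SpecCase:
    # Every case must declare exactly the required fields: no more, no fewer.
    if set(fields) != REQUIRED_FIELDS:
        raise ValueError(f"case {name!r} has fields {sorted(fields)}")
    return SpecCase(name, json.loads(fields["input"]), int(fields["expect_status"]), is_error)


def parse_suite(markdown: str) -> SpecSuite:
    """Parse one markdown suite file, rejecting any line it does not understand."""
    suite = None
    pending = None  # (name, is_error, fields) for the case currently being read
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        heading = HEADING.match(line)
        bullet = BULLET.match(line)
        if heading:
            level, kind, title = heading.groups()
            # Suites use "#", cases and error cases use "##".
            if level != ("#" if kind == "Suite" else "##"):
                raise ValueError(f"wrong heading level: {line!r}")
            if pending:
                suite.cases.append(_build_case(*pending))
                pending = None
            if kind == "Suite":
                if suite is not None:
                    raise ValueError("only one suite per file")
                suite = SpecSuite(title)
            elif suite is None:
                raise ValueError("case appears before the '# Suite:' heading")
            else:
                pending = (title, kind == "Error", {})
        elif bullet and pending:
            key, value = bullet.groups()
            if key in pending[2]:
                raise ValueError(f"duplicate field {key!r} in case {pending[0]!r}")
            pending[2][key] = value
        else:
            # Bullets outside a case, or any other text, are not allowed.
            raise ValueError(f"unrecognized line: {line!r}")
    if suite is None:
        raise ValueError("missing '# Suite:' heading")
    if pending:
        suite.cases.append(_build_case(*pending))
    return suite


if __name__ == "__main__":
    SPEC = """
# Suite: user-api
## Case: create user
- input: {"name": "alice"}
- expect_status: 201
## Error: missing name
- input: {}
- expect_status: 400
"""
    suite = parse_suite(SPEC)
    print(suite.name, [(c.name, c.expect_status, c.is_error_case) for c in suite.cases])
    # prints: user-api [('create user', 201, False), ('missing name', 400, True)]
```

Because unknown lines and extra, missing, or duplicated fields all raise `ValueError`, a model that drifts from the format fails loudly instead of quietly producing a slightly different test.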

Because everything goes through the tool, whatever run-to-run variation or drift over time Claude (or any other LLM) might have, its output still has to be funneled through a strictly defined filter, which ensures we keep doing the same things the same way over time.
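The "strictly defined filter" could be as simple as validating every model-proposed tool call against a fixed allow-list before anything runs. A minimal sketch, assuming the model emits JSON of the form `{"tool": ..., "args": {...}}`; the tool names `run_suite` and `run_case` and the `validate_call` helper are hypothetical, not part of the tool described above.

```python
import json

# Hypothetical allow-list (an assumption for illustration): the only tool calls
# the model may emit, with the exact argument names and types each one takes.
TOOL_SCHEMAS = {
    "run_suite": {"suite": str},
    "run_case": {"suite": str, "case": str},
}


class RejectedCall(ValueError):
    """Raised when model output does not match a known tool schema exactly."""


def validate_call(raw: str) -> tuple:
    """Parse a model-proposed tool call and return (tool, args) only if it is well-formed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RejectedCall(f"not valid JSON: {exc}") from exc
    # The top level must be an object with exactly these two keys.
    if not isinstance(call, dict) or set(call) != {"tool", "args"}:
        raise RejectedCall("expected an object with exactly 'tool' and 'args'")
    tool, args = call["tool"], call["args"]
    # Check the type first so an unhashable value (e.g. a list) can't break the lookup.
    if not isinstance(tool, str) or tool not in TOOL_SCHEMAS:
        raise RejectedCall(f"unknown tool: {tool!r}")
    schema = TOOL_SCHEMAS[tool]
    # No missing and no extra arguments.
    if not isinstance(args, dict) or set(args) != set(schema):
        raise RejectedCall(f"{tool} expects exactly the arguments {sorted(schema)}")
    for name, expected_type in schema.items():
        if not isinstance(args[name], expected_type):
            raise RejectedCall(f"{tool}.{name} must be {expected_type.__name__}")
    return tool, args
```

The design choice here is to reject rather than repair: a call with an unknown tool, an extra argument, or a wrong type never reaches the test runner, so differences between runs or models show up as rejections you can log instead of as divergent test behavior.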


Replies

latentsea, today at 1:55 AM

I'm looking at implementing https://github.com/coleam00/Archon as a way to solve this. It lets you build arbitrary workflows customized to your codebase, and it looks like it brings a bit of much-needed determinism.

zx8080, today at 1:01 AM

What kind of system/area (or product) are you working on?