Hacker News

mg, today at 7:37 AM (3 replies)

Considerations about what goes on in agents internally will probably not be part of software development for long.

Personally, I already see LLMs and agents as black boxes. I give each feature request to multiple LLMs and then compare the results. I don't manually use "sessions" at all. I just look at the outcome. When I dislike it, I "git reset --hard", change my prompts and restart the feature request.

To have an ongoing sense of which agents perform best, I keep a log and calculate an Elo score of which agents meet my demands best. This score is important to me, not so much how the agent achieves it.
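
A minimal sketch of how such a log could be turned into Elo ratings, assuming each log entry records which agent's result was preferred over another's for one feature request. The agent names, the K-factor of 32, and the starting rating of 1500 are illustrative assumptions, not details from the original setup.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings, winner, loser, k=32.0, initial=1500.0):
    """Update ratings in place after one pairwise comparison.

    k and initial are assumed values; unseen agents start at `initial`.
    """
    r_win = ratings.get(winner, initial)
    r_lose = ratings.get(loser, initial)
    # Points transferred shrink as the win becomes more expected.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta
    return ratings


# Hypothetical log: (preferred agent, rejected agent) per feature request.
log = [
    ("agent_a", "agent_b"),
    ("agent_a", "agent_c"),
    ("agent_c", "agent_b"),
]

ratings = {}
for winner, loser in log:
    update_elo(ratings, winner, loser)

# Print agents from highest to lowest rating.
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

When more than two agents answer the same request, one option is to log a comparison for each pair; the update only looks at outcomes, which fits the black-box approach.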


Replies

hypfer, today at 7:45 AM

This is an absolutely crazy wasteful thing to do considering the actual cost of all that inference and nothing to be proud of.

justinclift, today at 11:09 AM

What kind of projects/code do you have them work on?

Asking because I'd guess that approach would be OK for the kinds of front-end work that don't require much security or other validation.

But it sounds like it wouldn't be suitable for work in regulated industries or anything that needs to have extreme care taken.

?

perching_aix, today at 8:54 AM

Which model is leading the pack for you?
