logoalt Hacker News

binalpateltoday at 3:35 AM4 repliesview on HN

I went down (continue to do down) this rabbit hole and agree with the author.

I tried a few different ideas and the most stable/useful so far has been giving the agent a single run_bash tool, explicitly prompting it to create and improve composable CLIs, and injecting knowledge about these CLIs back into it's system prompt (similar to have agent skills work).

This leads to really cool pattens like: 1. User asks for something

2. Agent can't do it, so it creates a CLI

3. Next time it's aware of the CLI and uses it. If the user asks for something it can't do it either improves the CLI it made, or creates a new CLI.

4. Each interaction results in updated/improved toolkits for the things you ask it for.

You as the user can use all these CLIs as well which ends up an interesting side-channel way of interacting with the agent (you add a todo using the same CLI as what it uses for example).

It's also incredibly flexible, yesterday I made a "coding agent" by having it create tools to inspect/analyze/edit a codebase and it could go off and do most things a coding agent can.

https://github.com/caesarnine/binsmith


Replies

rcarmotoday at 7:52 AM

The point where that breaks down is “next time it’s aware of the CLI and uses it”. That only really works well inside the same session, and often the next session it will create a different tool and use that one.

show 2 replies
fudged71today at 7:15 AM

I've been on a similar path. Will have 1000 skills by the end of this week arranged in an evolving DAG. I'm loving the bottoms-up emergence of composable use cases. It's really getting me to rethink computing in general.

show 1 reply
meander_watertoday at 7:16 AM

Have you done a comparison on token usage + cost? I'd imagine there would be some level of re-inventing the wheel (i.e. rewriting code for very similar tasks) for common tasks, or do you re-use previously generated code?

show 1 reply
skybriantoday at 6:30 AM

That’s pretty cool. Is it practical? What have you used it for?

show 1 reply