See also: agent harness in 50 lines (based on mini-swe-agent).
I followed this tutorial earlier today and I'm having a lot of fun with it.
https://gist.github.com/a-n-d-a-i/cb5e929b4c87b8d185760d0264...
I added a 2nd while loop so that it takes user input. And vendored my tiny llm lib (so it's 150 lines now, and dependency free :)
---
As for context-sculpting, the economics are different when not touching the context gives you the >98% discount everyone's doing now. (Although it might be worth fiddling with the suffix... not sure yet!)
e.g. this issue: "ToolSearch saves ~15K tokens per request in prompt size, but at the cost of breaking prefix-based caching for models like DeepSeek that rely on stable prefixes. For heavy users of DeepSeek through OpenRouter, the savings from smaller prompts are dwarfed by the increased cost from cache misses."