Curious if you've tested something such as:
- "First, calculate the orbital radius. To do this accurately, measure the average diameter of each planet, p, and the average distance from the center of the image to the outer edge of the planets, x, and calculate the orbital radius r = x - p"
- "Next, write a unit test script that we will run that reads the rendered page and confirms that each planet is on the orbital radius. If a planet is not, output the difference you must shift it by to make the test pass. Use this feedback until all planets are perfectly aligned."
Hm, I didn't try exactly this, but I probably should!
Wrt the unit test script, let's take Claude out of the equation: how would you design the unit test? I kept running into either Claude or some library not being able to consistently identify planet vs. non-planet, which hindered Claude's ability to make decisions based on fine detail or "pixel coordinates", if that makes sense.
Congratulations, we finally created 'plain English' programming languages. It only took 1/10th of the world's electricity and 40% of its semiconductor production.
Yes, this is a key step when working with an agent: if it's able to check its own work, it can iterate pretty quickly. If you have to stay in the loop, something is wrong.
That said, I love this project. haha
This is my experience with using LLMs for complex tasks: if you're lucky, they'll figure it out from a simple description, but getting most things done the way you expect requires a lot of explicit direction, test creation, iteration, and tokens.
One of the keys to being productive with LLMs is learning to recognize when babysitting the LLM into the right result will take much more effort than simply doing the work yourself.