>> You no longer need to review the code. Or instruct the model at the level of files or functions. You can test behaviors instead.
I think this is where things will ultimately head. You generate random code, purely random, in raw machine-readable binary, and simply evaluate its behavior. Most randomly generated code will not work. Some, however, will. And within that working code, some will be far faster, and this is the code that gets used.
No different from what a geneticist might do when evaluating generated mutants for favorable traits. Knowledge of the exact genes or pathways involved is not even required; one can still select for desired traits, and thereby select for the best-fit mechanism without even knowing it exists.
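To make the generate-and-select idea concrete, here is a minimal sketch under heavy assumptions: instead of raw machine code it uses a hypothetical five-instruction toy machine (`OPS`), the desired behavior is given only as input/output pairs (`TESTS`), and "fastest" is approximated by "fewest instructions". None of these names come from the comment above; they are purely illustrative.

```python
import random

# Hypothetical toy instruction set acting on a single integer accumulator.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "sq": lambda x: x * x,
    "neg": lambda x: -x,
}

# The desired behavior, stated only as input/output pairs (here: 2*x + 2).
TESTS = [(0, 2), (1, 4), (2, 6), (5, 12), (-3, -4)]

def run(program, x):
    """Execute a program (a tuple of op names) on input x."""
    for op in program:
        x = OPS[op](x)
    return x

def passes(program):
    """True if the program reproduces every expected output."""
    return all(run(program, x) == want for x, want in TESTS)

def random_search(trials=20_000, max_len=4, seed=0):
    """Generate random programs and keep the ones that pass the tests."""
    rng = random.Random(seed)
    names = list(OPS)
    survivors = set()
    for _ in range(trials):
        length = rng.randint(1, max_len)
        program = tuple(rng.choice(names) for _ in range(length))
        if passes(program):
            survivors.add(program)
    return survivors

survivors = random_search()
# Fewest instructions is used as a stand-in for "fastest".
best = min(survivors, key=len)
print(len(survivors), "distinct passing programs; shortest:", best)
```

This only works because the search space is tiny (780 programs of length 1 to 4). The same loop over real machine code would face a space that is astronomically larger.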
The problem is that programming logic/state is discrete and not continuous, so you can't assume similar behaviour given "similar state", and the number of possible states grows exponentially. Selecting the desired state will mean writing an extremely detailed spec that is akin to a programming language, which is what Dijkstra hinted at in the past.
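A rough illustration of the discreteness point, assuming a hypothetical five-instruction toy machine (`OPS`) rather than any real instruction set: take a correct two-instruction program, change exactly one instruction, and check whether any of the "nearby" programs still behaves correctly. The last loop shows how the number of candidate programs grows with length.

```python
# Hypothetical toy instruction set acting on a single integer accumulator.
OPS = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "dbl": lambda x: x * 2,
    "sq": lambda x: x * x,
    "neg": lambda x: -x,
}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def behaviour(program, inputs=(0, 1, 2, 5, -3)):
    """Observable behaviour: the outputs on a fixed set of inputs."""
    return tuple(run(program, x) for x in inputs)

def neighbours(program):
    """All programs that differ from `program` in exactly one instruction."""
    for i, old in enumerate(program):
        for new in OPS:
            if new != old:
                yield program[:i] + (new,) + program[i + 1:]

correct = ("inc", "dbl")  # computes 2*x + 2
target = behaviour(correct)

all_neighbours = list(neighbours(correct))
still_correct = [p for p in all_neighbours if behaviour(p) == target]
print(len(still_correct), "of", len(all_neighbours), "one-step neighbours still work")

# The search space grows exponentially with program length.
for n in (2, 8, 32):
    print(f"length {n}: {len(OPS) ** n} candidate programs")
```

In this toy setting no one-instruction change preserves the behaviour, so there is no gentle slope for a search to climb: a program is either right or it is not.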
Maybe, but you won't be able to test all behaviors and you won't have enough time to try a million alternatives. Just because of the number of possibilities, it'll be faster to just read the code.
For this to work, you'd have to fully specify the behavior of your program in the tests. Put another way, at that point your tests are the program. So the question is, which is a more convenient way to specify the behavior of a program: a traditional programming language, or tests written in that language? I think the answer should be fairly obvious.
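A small sketch of why the tests would have to specify everything, assuming a made-up test suite for a sort function (`TESTS` and `memorised_sort` are hypothetical names for illustration): an "implementation" that merely memorises the test cases passes the whole suite while being wrong on any input the suite does not mention.

```python
# Hypothetical test suite for a function that should sort a list of integers.
TESTS = [
    ([], []),
    ([1], [1]),
    ([2, 1], [1, 2]),
    ([3, 1, 2], [1, 2, 3]),
]

# A degenerate "implementation" that just memorises the test cases.
_TABLE = {tuple(inp): out for inp, out in TESTS}

def memorised_sort(xs):
    # Known input: return the memorised answer. Unknown input: return it unchanged.
    return list(_TABLE.get(tuple(xs), xs))

def passes_suite(fn):
    return all(fn(list(inp)) == out for inp, out in TESTS)

print("memorised_sort passes suite:", passes_suite(memorised_sort))
print("memorised_sort sorts [5, 4]:", memorised_sort([5, 4]) == [4, 5])
```

A selection process that only sees the suite cannot tell this apart from a real sort. Closing that gap means adding cases or properties until the suite pins down the behavior completely.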
I can see a lot of negatives in relation to removing the human readable aspect of software development. Thorough testing would be virtually impossible because we’d be relying on fuzzing to iron out potential edge cases or bugs.
In this situation, AI companies are incentivised to host the services their tooling generates. If we don't get source code, it is much easier for them to justify not sharing it. Plus, who is to say the machine code even works on consumer hardware anyway? It leads to a future where users specify inputs while companies generate programs and handle execution. Everything becomes a black box. No thank you.
You're describing genetic algorithms: https://en.wikipedia.org/wiki/Genetic_algorithm
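For readers who haven't seen one, a minimal genetic algorithm looks roughly like this. It is a sketch on a toy objective (count the 1 bits in a bit string, often called OneMax), not a program synthesizer. The parameter values, truncation selection, and single-point crossover are arbitrary choices made for illustration.

```python
import random

GENOME_LEN = 20

def fitness(genome):
    # Toy objective ("OneMax"): count the 1 bits.
    return sum(genome)

def mutate(genome, rng, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def crossover(a, b, rng):
    # Single-point crossover: prefix from one parent, suffix from the other.
    cut = rng.randint(1, GENOME_LEN - 1)
    return a[:cut] + b[cut:]

def evolve(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    history = []  # best fitness seen at each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 2]   # truncation selection: keep the top half
        children = [pop[0]]              # elitism: carry the best over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print("best fitness per generation:", history[0], "->", history[-1])
```

Note that the fitness function here is smooth in the sense that each extra 1 bit helps a little. Pass/fail behavioral tests on code generally do not give that kind of gradient.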
All fun and games until you need to debug the rat's nest that you've been continually building. I am actually shocked that people who have coded before have been one-shotted into believing this.
We would of course need to specify the behaviors to test for. The more precisely we specify these behaviors, the more complex the behavior of our end product could be. We might invent a formal language for writing down these behaviors, and some people might be better at thinking about what kind of tests would need to be written to coax a certain type of end result out of the machine.
But that's future music, forgive a young man for letting his imagination run wild! ;)
> You generate random code,
Code derived from a training set is not at all "random."
This is a recurring fantasy in LLM threads but makes little sense. Writing machine code is very difficult (even writing byte code for simple VMs is annoying and error-prone). Abstractions are beneficial and increase productivity (per human, per token). It makes essentially no sense to throw away seven decades of productivity-increasing technologies to have neural nets punch cards again, and it's not going to happen unless tokens become unimaginably cheap.
How do you handle the larger number of tests? I did this, but my PRs are larger because more tests are needed.
> You generate random code, purely random, in raw machine-readable binary, and simply evaluate its behavior. Most randomly generated code will not work. Some, however, will. And within that working code, some will be far faster, and this is the code that gets used.
Humans are expensive, but this approach seems incredibly inefficient and expensive. Even a junior can make steady progress on implementing a function. With your approach, just monkey coding like that, it could take you ages to write a single function. Estimates in software are already bad; they will get worse with your approach.
And how exactly do you foresee probabilistic systems working out in real life? Nobody wants software that seldom does what they expect and that only trends toward desirable behavior over time (where "desirable" behavior is determined by the sum of global feedback and revenue/profit of the company producing it).
Today you send some money to your spouse but it's received by another person with the same name. Tomorrow you order food but your order gets mixed up with someone else's.
Tough luck, the system is probabilistic and you can only hope that the evolutionary pressures influence the behavior to change in desirable ways. This fantasy is a delusion.
I can tell you why this won't go this way:
Customers.
When you sell them a technological solution to their problem, they expect it to work. When it doesn't, someone needs to be responsible for it.
Now, maybe I'm wrong, but I don't see any of the current AI leaders being like, "Yeah, you're right, this solution didn't meet your customer's needs, and we'll eat the resulting costs." They didn't get to be "thought leaders" in the current iteration of Silicon Valley by taking responsibility for things that got broken, not at all.
So that means you will need to take responsibility for it, and how can you make that work as a business model? Well, you pay someone - a human - who knows what they're looking at to review at least some of the code that the AI generates.
Will some of that be AI-aided? Of course. Can you make a lot of the guesswork go away by saying "use commonly-accepted design patterns" in your CLAUDE.md? Sure. But you'll still need someone to enforce it and take responsibility at the end of the day if it screws up.
Why should we throw away decades of development in deterministic algorithms? Why do tech people keep mentioning "geneticists"? I would never select an algorithm with a "good" flying trait to make an airplane work. That's nuts.