There’s a ton of crossover between your method and RL. I guess instead of training directly on episodes and updating model weights, you just store episodes in RAM and sample from the most promising ones. It could be a neat way of getting past the infamous RL cold-start problem by collecting some examples of rewards first. Thanks for sharing.
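For anyone curious what that pattern looks like, here's a minimal sketch of how I'd picture it (the names and structure are my own guess, not the article's actual code): a buffer that keeps the highest-reward episodes and replays them with reward-weighted sampling.

    import random
    from collections import namedtuple

    # Hypothetical sketch: keep episodes in memory and sample replays
    # biased toward high-reward ones, instead of updating model weights.
    Episode = namedtuple("Episode", ["actions", "reward"])

    class EpisodeBuffer:
        def __init__(self, capacity=1000):
            self.episodes = []
            self.capacity = capacity

        def add(self, actions, reward):
            self.episodes.append(Episode(actions, reward))
            # Keep only the most promising episodes when over capacity.
            self.episodes.sort(key=lambda e: e.reward, reverse=True)
            del self.episodes[self.capacity:]

        def sample(self):
            # Reward-weighted sampling: promising episodes replay more often.
            weights = [max(e.reward, 1e-6) for e in self.episodes]
            return random.choices(self.episodes, weights=weights, k=1)[0]

The nice part is that the "cold start" examples of reward come for free: any episode that ever scored well stays in the buffer and keeps getting resampled.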
AI is already much more powerful than humans in closed domains like games and defense. AlphaGo was the first to prove that.
We built an autonomous testing example that plays Super Mario Bros. to explore how behavior models combine with autonomous testing. Instead of manually writing test cases, it systematically explores the game's massive state space while a behavior model validates correctness in real time: write your validation once, use it with any testing driver. A fun way to learn how it all works and find bugs along the way. All code is open source: https://github.com/testflows/Examples/tree/v2.0/SuperMario.
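To illustrate the "write your validation once" idea, here's a hypothetical sketch (not the actual TestFlows API; state keys and the driver interface are invented for illustration): the behavior model is just a function over observed state transitions, so it doesn't care which driver produced them.

    # Hypothetical illustration: a behavior model checks invariants on
    # each state transition, independent of the driver that produced it.
    def validate_transition(prev, curr):
        # Lives should never increase without a 1-up event.
        if curr["lives"] > prev["lives"]:
            assert prev.get("one_up", False), "lives increased without a 1-up"
        # Score is monotonically non-decreasing.
        assert curr["score"] >= prev["score"], "score decreased"

    # Example: check a single observed transition.
    validate_transition({"lives": 3, "score": 100}, {"lives": 3, "score": 150})

    # The same model plugs into any driver loop, e.g. a random explorer:
    def run(driver, model):
        prev = driver.reset()
        while not prev.get("done"):
            curr = driver.step(driver.choose_action(prev))
            model(prev, curr)  # identical validation for every driver
            prev = curr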
The start of the article is good, but it begins to sound LLM-written starting at the "Why this maps to Genetic Algorithms?" section. Is that the case?