Mentlo • today at 9:24 AM

But StarCraft training is not done by mimicking human strategies. It was pure RL with a reward function shaped around winning, which allows non-human and eventually super-human strategies (such as worker oversaturation) to emerge.

The current training loop for coding is RL as well, so a departure from human coding patterns is not unexpected (even if a departure from human coding structure would be, since that would require developing a new programming language).