This isn't quite RL, right? It's an evolutionary search over specifically labeled sections of code, optimizing toward a set of metrics defined by human-written evaluation functions.
I suppose you could consider that last part (optimizing some metric) "RL".
However, it's missing a key concept of RL: the exploration/exploitation tradeoff.
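
To make concrete what I mean by "evolutionary approach plus human-written evaluation function", here's a rough toy sketch. Everything in it is my own stand-in, not the actual system: the "labeled section of code" is reduced to a list of three numbers, and `evaluate`, `mutate`, and `evolve` are hypothetical names I made up for illustration.

```python
import random


def evaluate(candidate):
    """Hypothetical human-written metric; higher is better."""
    target = [3.0, -1.0, 2.0]  # assumed optimum, purely for illustration
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))


def mutate(candidate, rng, scale=0.5):
    """Return a copy of the candidate with one position randomly perturbed."""
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child


def evolve(generations=200, population_size=20, survivors=5, seed=0):
    """Plain mutate-and-select loop; returns the best candidate and score history."""
    rng = random.Random(seed)
    population = [[0.0, 0.0, 0.0] for _ in range(population_size)]
    history = []
    for _ in range(generations):
        # Rank by the human-defined metric and keep only the top few.
        population.sort(key=evaluate, reverse=True)
        parents = population[:survivors]
        history.append(evaluate(parents[0]))
        # Refill the population with mutated copies of the survivors.
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - survivors)
        ]
        population = parents + children
    best = max(population, key=evaluate)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best candidate:", best)
    print("best score:", evaluate(best))
```

Nothing in that loop explicitly trades off exploration against exploitation: there's no policy, no value estimate, just random mutation and keeping the top scorers under a fixed metric.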