Figure 12 shows probabilities, I think; it actually does seem to reach 100% at temperature 0.1 for certain pretraining runs.
Also, Figure 12 is not about the Dyck/balanced-brackets grammar. It is about something that is not properly described in the paper.