The repo contains only the PDF, not actual runnable code for the RL training pipeline.
Publishing a high-level description of the training algorithm is good, but it doesn't count as "open-sourcing" as the term is commonly understood.