What is the goal of doing that vs using L2 loss?
The goal of using CRPS is to produce an ensemble that is a good probabilistic forecast without needing calibration/post-processing.
[edit: "without", not "with"]
To encourage diversity between the different members of an ensemble. I think people are doing very similar things for MoE networks, but I'm not that deep into that topic.
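To make the diversity point concrete, here is a minimal sketch of the standard sample-based CRPS estimate, E|X - y| - 0.5 * E|X - X'|. The second term rewards spread between members, so an ensemble that collapses onto the mean scores worse than one that samples the true distribution. The setup (truth drawn from a standard normal, 16 members, the names `crps_ensemble` and `mean_crps`) is purely illustrative and not taken from any particular model or library.

```python
import random

def crps_ensemble(members, obs):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    # Skill term: mean absolute error of the members against the observation.
    skill = sum(abs(x - obs) for x in members) / m
    # Spread term: mean absolute difference over all member pairs.
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return skill - 0.5 * spread

rng = random.Random(0)
# Assumption for illustration: the "truth" is drawn from N(0, 1).
observations = [rng.gauss(0.0, 1.0) for _ in range(2000)]

def mean_crps(make_ensemble):
    """Average CRPS over all observations for a given ensemble generator."""
    scores = [crps_ensemble(make_ensemble(), y) for y in observations]
    return sum(scores) / len(scores)

# Collapsed ensemble: every member predicts the mean (what an L2 loss favours).
collapsed = mean_crps(lambda: [0.0] * 16)
# Diverse ensemble: members are samples from the true distribution.
diverse = mean_crps(lambda: [rng.gauss(0.0, 1.0) for _ in range(16)])

print(f"collapsed: {collapsed:.3f}  diverse: {diverse:.3f}")
```

Lower is better, so the diverse ensemble should come out ahead here. With an L2 loss on each member there is no spread term, so nothing stops all members from converging to the same mean prediction.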
To add to the existing answers - L2 losses induce a "blurring" effect when you autoregressively roll out these models. That means you not only lose important spatial features, you also truncate the extrema of the predictions - in other words, you can't forecast high-impact extreme weather with these models at moderate lead times.
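A toy sketch of where the blurring comes from: the prediction that minimises expected squared error is the mean over possible outcomes, and averaging sharp features whose position is uncertain flattens them. The 1-D "field", the Gaussian bump and the position range below are all made-up assumptions for illustration. This only shows a single step; in an autoregressive rollout the smoothing would be fed back in at every step.

```python
import math
import random

def bump(center, n=64, width=2.0):
    """A 1-D field with one sharp Gaussian feature at `center`."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n)]

rng = random.Random(0)
# Assumption: the feature's position is uncertain, uniform over grid points 20..44.
outcomes = [bump(rng.uniform(20.0, 44.0)) for _ in range(500)]

# The L2-optimal deterministic forecast is the pointwise mean of the outcomes.
mean_field = [sum(column) / len(outcomes) for column in zip(*outcomes)]

peak_of_mean = max(mean_field)
typical_peak = sum(max(field) for field in outcomes) / len(outcomes)

print(f"typical peak in one outcome: {typical_peak:.2f}")
print(f"peak of the L2-optimal mean: {peak_of_mean:.2f}")
```

Every individual outcome has a peak close to 1, but the mean field is a low, wide plateau, which is the truncated-extrema behaviour described above.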