I find it interesting that they quantify the improvement in speed and in the number of forecasted scenarios but lack details on how that results in improved forecast accuracy, per:
``` WeatherNext 2 can generate forecasts 8x faster and with resolution up to 1-hour. This breakthrough is enabled by a new model that can provide hundreds of possible scenarios. ```
As an end user, all I care about is that there's one accurate forecasted scenario.
For lay-users they could have explained that better. I think they may not have completely uninformed users in mind for this page though.
Developing an ensemble of possible scenarios has been the central insight of weather forecasting since the 1960s, when Edward Lorenz discovered that tiny differences in initial conditions can grow exponentially (the "butterfly effect"). Since running ensembles became practical in the 1990s, all competitive forecasts have been based on these ensemble models.
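To make the "tiny differences grow exponentially" point concrete, here is a minimal sketch using the Lorenz-63 toy system with its textbook parameters. The step size, starting point, and perturbation size are my own arbitrary choices, and this is a three-variable toy, not a weather model.

```python
def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz-63 system by one classical RK4 step."""
    def deriv(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = deriv(state)
    k2 = deriv(shifted(state, k1, dt / 2))
    k3 = deriv(shifted(state, k2, dt / 2))
    k4 = deriv(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(perturbation=1e-6, steps=4000):
    """Distance between two runs whose initial x differs by `perturbation`."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)
    distances = []
    for _ in range(steps):
        a, b = lorenz_step(a), lorenz_step(b)
        distances.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return distances


distances = separation_over_time()
for step in (0, 1000, 2000, 3000, 3999):
    print(f"t={(step + 1) * 0.01:5.2f}  separation={distances[step]:.3e}")
```

The two runs start one part in a million apart and end up on completely different parts of the attractor, which is why a single run tells you little about how much to trust it.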
When you hear "a 70% chance of rain," it more or less means "there was rain in 70 of the 100 scenarios we ran."[0] There is no "single accurate forecast scenario."
[0] Acknowledging that this dramatically oversimplifies the models and the location where the rain could occur.
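In the same oversimplified spirit, the "70 of the 100 scenarios" reading is just a counting exercise. A rough sketch, where the member values and the 0.1 mm "it rained" threshold are made up for illustration:

```python
def chance_of_rain(member_rain_mm, threshold_mm=0.1):
    """Fraction of ensemble members with rainfall at or above the threshold."""
    wet = sum(1 for mm in member_rain_mm if mm >= threshold_mm)
    return wet / len(member_rain_mm)


# Hypothetical 100-member ensemble for one location and time window:
# 70 members produce some rain, 30 stay dry.
members = [2.5] * 40 + [0.4] * 30 + [0.0] * 30

print(f"{chance_of_rain(members):.0%} chance of rain")
```

Real products also calibrate these raw frequencies and deal with where the rain falls, which this ignores.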
As an end user, I also want to see the variance to get a feeling for the uncertainty.
Quite a lot of weather sites offer this data in an easily digestible visual format.
Indeed. The most important benchmark is accuracy and how well it stacks up against existing physics-based models like GFS or ECMWF.
Sure, those big physics-based models are very computationally intensive (national weather bureaus run them on sizeable HPC clusters), but you only need to run them every few hours in a central location and then distribute the outputs online. It's not like every forecaster in a country needs to run a model; they just need online access to the outputs. Even if they could run the models themselves, they would still need the mountains of raw observation data that feed the models (weather stations, satellite imagery, radars, wind profilers...). And these are usually distributed by... the national weather bureau of that country. So the weather bureau might as well do the number crunching as well and distribute that.
As others have explained, ensembles are useful.
As a layperson, what _is_ useful is to look at the difference between models. My long-range favourite is to compare ECMWF and GFS27, and if the deviation is high (the Windy app shows this), you can bet that at least one of them is wrong.
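If you wanted to do that comparison by hand rather than by eye, a minimal sketch could look like the following. The wind values, the variable names, and the 4-knot cutoff are all invented for illustration; nothing here comes from an actual ECMWF or GFS feed.

```python
def model_disagreement(forecast_a, forecast_b):
    """Mean absolute difference between two forecasts over matching time steps."""
    if len(forecast_a) != len(forecast_b):
        raise ValueError("forecasts must cover the same time steps")
    total = sum(abs(a - b) for a, b in zip(forecast_a, forecast_b))
    return total / len(forecast_a)


# Hypothetical wind speeds in knots for the same spot, one value per day.
ecmwf_wind = [12, 14, 15, 18, 22, 25]
gfs_wind = [11, 13, 17, 24, 31, 36]

# Arbitrary cutoff: above this, treat the long-range forecast as shaky.
DISAGREEMENT_LIMIT_KT = 4.0

gap = model_disagreement(ecmwf_wind, gfs_wind)
verdict = "low confidence" if gap > DISAGREEMENT_LIMIT_KT else "models agree"
print(f"mean gap {gap:.1f} kt: {verdict}")
```

Note that agreement between two models does not prove either is right; a large gap only tells you that at least one of them must be off.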
They integrated "MetNet-3" into Google products, and my personal perception was that accuracy decreased.
This is really important: You're not the end user of this product. These types of models are not built for laypeople to access them. You're an end user of a product that may use and process this data, but the CRPS scorecard, for example, should mean nothing to you. This is specifically addressing an under-dispersion problem in traditional ensemble models, due to a limited number (~50) and limited set of perturbed initial conditions (and the fact that those perturbations do very poorly at capturing true uncertainty).
Again, you, as an end user, don't need to know any of that. The CRPS scorecard is a very specific measure of error. I don't expect them to reveal the technical details of the model, but an industry expert instantly knows what WeatherBench[1] is, the code it runs, the data it uses, and how that CRPS scorecard was generated.
By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts (aka the ones you get on your phone). These are a piece of the puzzle, though, and not one that you will ever actually encounter as a layperson.
[1]: https://sites.research.google/gr/weatherbench/
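For anyone curious what a CRPS number actually measures, here is a small sketch of the standard empirical ensemble estimator (mean absolute error of the members minus half the mean absolute difference between members). The temperatures are invented, and I am not claiming this is the exact variant WeatherBench computes.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS for one ensemble forecast and one observed value.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|; lower is better.
    """
    m = len(members)
    error_term = sum(abs(x - observation) for x in members) / m
    spread_term = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return error_term - spread_term


# Hypothetical temperature ensembles (deg C) with the same mean of 16.0.
under_dispersed = [15.0, 15.5, 16.0, 16.5, 17.0]
better_dispersed = [12.0, 14.0, 16.0, 18.0, 20.0]
observed = 20.0  # the outcome lands outside the narrow ensemble

print(f"narrow ensemble CRPS: {crps_ensemble(under_dispersed, observed):.2f}")
print(f"wide ensemble CRPS:   {crps_ensemble(better_dispersed, observed):.2f}")
print(f"single forecast CRPS: {crps_ensemble([16.0], observed):.2f}")
```

In this one made-up case the overconfident narrow ensemble scores worse than the wider one, and a single deterministic forecast collapses to plain absolute error. A wider ensemble is not automatically better: if the observation had landed near 16, the narrow one would win.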