Long tails and freak weather are the hottest topics of research in the area of data-driven weather forecasting. ECMWF, highlighted in this article, is attempting to extend its ML forecast system to ensemble predictions (https://www.ecmwf.int/en/about/media-centre/aifs-blog/2024/e...).
If these methods work, they'll likely improve our ability to model long tails. Traditional NWP is extremely expensive, so cutting-edge models can have either high resolution xor large ensembles. You need high resolution for the detail, but you need large ensembles to see into the tails of the distribution; it's a persistent problem.
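To put a rough number on why ensemble size matters for the tails, here is a minimal sketch. It assumes an idealised ensemble whose members are independent draws from the true forecast distribution (real ensembles are not), and the function name is made up for illustration.

```python
def prob_tail_sampled(tail_prob: float, n_members: int) -> float:
    """Chance that at least one member lands in a tail of probability tail_prob.

    Assumes members are independent draws, which is an idealisation.
    """
    return 1.0 - (1.0 - tail_prob) ** n_members


# How often does an ensemble contain even one example of a 1-in-100 outcome?
for n in (10, 50, 1000):
    print(f"{n:5d} members: {prob_tail_sampled(0.01, n):.3f}")
```

Under these assumptions a 50-member ensemble contains a 1-in-100 outcome well under half the time, which is the sense in which small ensembles cannot see into the tails.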
In inference, ML-based models run a bit over two orders of magnitude faster than traditional NWP, with the gains split between running on GPUs (possibly replicable) and fantastic levels of numerical intensity thanks to everything being matrix-matrix products (much harder to replicate with conventional algorithms). That opens a lot of freedom to expand ensemble sizes and the like.
The limitations of the non-physics-based models are under rapid exploration, and the bounds of their utility will be much clearer in a year or two. For now, they seem to outperform physics-based models at larger scales. The future may be a hybrid: https://arxiv.org/abs/2407.06100v2
Well, as I said, I would expect them to outperform at large scales, specifically because they're learning and memoising large, stable patterns (ed. in the sense of teleconnections) at low wavenumbers.
I hope they have a switch to turn it off if we ever mess up and go back to a single-cell Hadley configuration :)
Hybrid models are almost certainly the future, i.e., developing structural equations and filling in data or modeling gaps with relatively small NNs for approximation.
Hybrid models have the conceptual edge, but it's not yet obvious that they'll become the dominant AI forecasting paradigm.
The hybridization acts as a strong regularizer. This is a good thing, but it's not yet obvious that it's a necessary thing for short to medium-term forecasts. There seems to be enough extant data that pure learning models figure out the dynamics relatively easily.
Hybrid models are more obviously appropriate if we think about extending forecasts to poorly-constrained environments, like non-modern climates or exoweather. You can run an atmospheric model like WRF but with parameters set for Mars (no, not for colonization, but for understanding dust storms and the like), and we definitely don't have enough data to train a "Mars weather predictor."
The difficulty in training NeuralGCM is that one has to backpropagate over many dynamics steps (essentially all the time between data snapshots) to train the NN parameterizations. That's very memory-intensive, and for now NeuralGCM-like models run at coarser resolutions than fully-learned peers.
A senior weather modeler who is US-based but works on some of the Euro models spoke recently about the use of AI from his perspective. More than half of his comments were about communicating with the audience, so it was not a math lecture. Overall, I believe that this UC Berkeley speaker said almost exactly what the parent post says.
How do ML models do with chaotic systems in general? I am a beginner in terms of ML, but based on my limited knowledge it seems like ML models would generally do fairly poorly with chaotic systems. Then again, maybe with enough data it doesn't really matter.
I think there is potential in this approach, but I don't think we have the training data yet. We have 100+ years of storm tracks, but we don't have 100+ years of surface observations at the mesonet granularity (like, every city has a reliable weather station). We only measure the upper atmosphere from a few points twice a day. I think that the "chaos" and "butterfly effect" type influences can be controlled, but probably not without really granular surface data, including over the ocean.
So while GPUs are ready to crunch the numbers, we don't actually have the numbers yet.
Seems to be a "pattern" in AI for weather forecasting --
1. Due to AI hype and funding (even predecessor hype cycles like "big data", ML, even GOF time-series statistics), generate 100s of AI models.
2. Of these, a few perform better than current models and are seen as the "future".
3. One year later models in 2. fail catastrophically.
4. Go back to step 1.
At first blush, AI seems to be fantastic for weather prediction:
1. Hugely chaotic model, which means deterministic prediction is extremely expensive, and accurate prediction more so.
2. Certain trends seem to dominate the observed time period, hence the field of meteorology.
3. AI can clue you in as to which trends seem to be most likely to dominate, or if uncertainty is too high to predict, without the investment of major labor costs.
While I'm not sure what the market offers, I can say that if the demand-side seems unsatisfied this is the perfect place for a state-operated solution to enter: a zero-margin service where quality in the long-term is demanded over short-term profits.
Well with "AI" all we're talking about is theory-free frequentist modelling.
We can already do that for the weather: just take the mode of the last 10 years of the weather on this day, and some sort of weighted mean for temp/humidity/etc.
All "AI" is doing here is overfitting this process. On the overfit range, yes, you'll get better performance. But the baseline performance here, from taking averages, is already pretty good.
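For concreteness, the averaging baseline described above could look something like the sketch below. The data layout, the linearly increasing recency weights, and the sample values are all made up for illustration.

```python
from collections import Counter


def climatology_forecast(history):
    """Baseline forecast from past observations on the same calendar day.

    history: list of (condition, temp_c) tuples, oldest year first.
    Returns the most common condition and a recency-weighted mean temperature.
    """
    conditions = [cond for cond, _ in history]
    temps = [temp for _, temp in history]
    most_common = Counter(conditions).most_common(1)[0][0]
    # Linearly increasing weights so recent years count more (an arbitrary choice).
    weights = range(1, len(temps) + 1)
    weighted_temp = sum(w * t for w, t in zip(weights, temps)) / sum(weights)
    return most_common, weighted_temp


# Made-up observations for one calendar day over three years.
sample = [("rain", 10.0), ("sun", 12.0), ("sun", 14.0)]
condition, temp = climatology_forecast(sample)
print(condition, round(temp, 2))
```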
What we need for the "future" of weather is long-tail events which cannot be modelled by curve-fitting to historical data.
This is simply not true and an uninformed opinion of what modeling physical events with AI looks like.
You seem to assume it’s a purely data-driven approach; it is not. You could use a physics-informed neural network (PINN) that combines both approaches. That is, it trains on historical data (and likely synthetic data from physics models) while also including physical equations in the loss function (in this case, the fluid equations of the atmosphere). It can truly be the best of both worlds if approached correctly.
That being said, 99% of AI out there is just master's-thesis-level data in -> prediction out, but that is far from what the useful AI models currently being developed to predict and forecast dynamical physical systems look like.
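As a toy illustration of putting physical equations in the loss function, here is a sketch of a composite loss: a data-misfit term plus a penalty on the residual of a governing equation. Everything specific is an assumption for illustration: 1D linear advection (u_t + c u_x = 0) stands in for the atmospheric equations, the candidate field lives on a fixed grid, and derivatives use finite differences, whereas a real PINN would differentiate the network output with autodiff.

```python
import numpy as np


def physics_informed_loss(u, obs, obs_mask, c, dx, dt, weight=1.0):
    """Data misfit plus physics-residual penalty for u_t + c * u_x = 0.

    u        : candidate field, shape (n_time, n_space)
    obs      : observations on the same grid (only used where obs_mask is True)
    obs_mask : boolean array marking the observed grid points
    """
    data_term = np.mean((u[obs_mask] - obs[obs_mask]) ** 2)
    # Centred finite differences on interior points only.
    du_dt = (u[2:, 1:-1] - u[:-2, 1:-1]) / (2.0 * dt)
    du_dx = (u[1:-1, 2:] - u[1:-1, :-2]) / (2.0 * dx)
    residual = du_dt + c * du_dx  # ~0 wherever u obeys the advection equation
    physics_term = np.mean(residual ** 2)
    return data_term + weight * physics_term


# Toy setup: an exact travelling wave versus the same wave with added noise.
c = 1.0
x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
t = np.linspace(0.0, 1.0, 50)
dx, dt = x[1] - x[0], t[1] - t[0]
T, X = np.meshgrid(t, x, indexing="ij")
truth = np.sin(X - c * T)

mask = np.zeros_like(truth, dtype=bool)
mask[::10, ::8] = True  # sparse "observations"

rng = np.random.default_rng(0)
noisy = truth + 0.1 * rng.standard_normal(truth.shape)

loss_truth = physics_informed_loss(truth, truth, mask, c, dx, dt)
loss_noisy = physics_informed_loss(noisy, truth, mask, c, dx, dt)
print(loss_truth, loss_noisy)
```

The physics term penalises the noisy field even at grid points with no observations, which is the regularising effect being described.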
Additionally, you can generate synthetic data with the physical models of “edge” and “tail” events to train the model on. This by itself allows the ML model to be able to model almost all events that we can physically model, so at its base it’s at least as useful as the big O order models we use while being orders of magnitude faster. This doesn’t even account for using the physical equations to assist in training the model directly (through architecture tricks or in the loss function).
Source: I work on AI models that merge data and physics for dynamical physical systems
> That being said, 99% of AI out there is just master's-thesis-level data in -> prediction out, but that is far from what the useful AI models currently being developed to predict and forecast dynamical physical systems look like.
I think one surprising outcome of the recent wave of ML-based weather forecasting models is just how accurate the "dumb" approaches are. Pangu-Weather and a couple of its peers are essentially vision transformers without much explicit physics-informed regularization.
If you have explanatory models constraining the space of possible function fits, etc. etc. then I concede the point -- though, I rather regard it as my point.
The comment I replied to used "AI" in its generic sense which I take to name the theory-free frequentist stats currently in vogue. I don't regard theories as AI -- so adding physics to a NN is, in large part, computational physics. You can call it "AI", but then so-goes any use of a computer model of any kind.
Well, the difference is the data-driven aspect of parts of the model. While it's constrained by physics during the learning process, it isn't just running a forward physics model to get the solution. The upfront computational load and extremely fast inference times through parameterization IS what makes it AI, and what makes it useful versus a normal numerical computer model.
Physics has used "empirical/phenomenological models" where curve-fitting to data has served to preclude the need for simulation, or if it's computationally intractable, etc. I'd agree that it had been underused, since I'd say such modelling is held somewhat in contempt as giving up on doing physics.
Do you have a paper that discusses any of this work in these terms? I'm presently writing a larger survey on XAI towards a theory-informed approach, and it seems these mixed models might have some novel explanatory upside/needs. At the moment I'm inclined to partition the world into theory-based and theory-free.
Although it's obviously difficult to crack open an ML model, they do perform enough computation to have potentially learned something like the dynamical equations for the atmosphere.
At the same time, some ML models are surprisingly parsimonious. GraphCast has about 37 million trainable parameters, but its output is a forecast (increment) of six variables on a 37-level, quarter-degree lat/lon grid. That's about 235 million outputs for a single forecast date, so it's safe to conclude that GraphCast cannot memorize its training set.
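The back-of-envelope arithmetic behind that figure can be checked with a few lines. Two inputs are my assumptions rather than anything stated above: that the quarter-degree grid is 721 x 1440 points (both poles included), and that the model also outputs 5 single-level surface variables, which is what brings the total from roughly 230 million to roughly 235 million.

```python
# Quarter-degree global lat/lon grid (assumed to include both poles).
n_lat = 180 * 4 + 1  # 721
n_lon = 360 * 4      # 1440
grid_points = n_lat * n_lon

upper_air_fields = 6 * 37  # six variables on 37 levels, as stated above
surface_fields = 5         # assumption: single-level surface variables

outputs_upper_air = grid_points * upper_air_fields
outputs_total = grid_points * (upper_air_fields + surface_fields)

trainable_params = 37_000_000  # "about 37 million", as stated above

print(f"upper-air outputs:     {outputs_upper_air:,}")
print(f"total outputs:         {outputs_total:,}")
print(f"outputs per parameter: {outputs_total / trainable_params:.1f}")
```

Either way, one forecast date contains several times more output values than the model has parameters.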
Researchers are also explicitly interested in probing the out-of-sample behaviour of ML models for plausibility. A paper last year by Hakim and Masanam (https://arxiv.org/abs/2309.10867) put Pangu-Weather through some simplified but out-of-sample test cases and saw physically plausible outputs, so the ML models have at least not fallen at the first hurdle.
Meanwhile, it's also not quite correct to give traditional models an automatic pass for out-of-sample behaviour. The large-scale dynamical equations of the atmosphere are well-understood, but so much of the chaos comes from poorly-resolved, poorly-modeled, or poorly-constrained processes near and below the grid scale. The microstructure of clouds, for example, is completely invisible to models that must necessarily run at kilometer or tens-of-kilometer scales. Operational weather models rely on parameterizations to close the system and statistical correlations to assimilate observational data.
As far as I'm aware, all of the operational weather models missed the rapid intensification of hurricane Otis last year, an out-of-sample event with deadly consequences.
There really isn't anything to crack open. The models are curves fit to data, the units of the weights are whatever the units of the data are... so, e.g., if fit to temp data, then temp.
If you draw a line through measurement data of one kind, you aren't getting a function of another: a function is just a map within this space.
Why it should be that drawing a line around shadows is a good prediction for future shadows isn't very mysterious -- no more and no less regardless of the complexity of the object. There isn't anything in the model here which explains why this process works: it works because the light casts shadows in the future the same way it does in the past. If the objects changed, or the light, the whole thing falls over.
Likewise, "generalization" as used in the ML literature is pretty meaningless. It has never hitherto been important that a model 'generalizes' to the same distribution. In science it would be regarded as ridiculous that it could even fail to.
The science sense of generalisation was concerned with whether the model generalizes across scenarios where the relevant essential properties of the target system generated novel distributions in the measure domain. I.e., the purpose of generalization was explanation -- not some weird BS about models remembering data. It's a given that we can always replay-with-variation some measurement data. The point is to learn the data-generating process (DGP).
No explanatory model can "remember" data, since if it could, it would be unfalsifiable. I.e., any model built from fitting to historical cases can never fail to model the data, and hence can never express a theory about its generation.
> weather models missed the rapid intensification of hurricane Otis last year
Which happened because there was very little data to feed into the models. AI isn't going to help with this. The Atlantic Ocean and Gulf of Mexico have tons of data-collecting buoys, and the Hurricane Hunter aircraft fly from the eastern US. Hurricane Hunters that go to the Pacific fly out of Mississippi, which adds quite a lot of latency to the data collection probes.
We should be adding more buoys to the Pacific, and need to add a Hurricane Hunter crew in San Diego (or perhaps the government of Mexico would like to host and pay for them).
Then we can start seeing what the models and AI will do.
I'm not up to date on the latest literature re: the Otis miss. Is the conventional thought that the ocean was in fact warmer than the models supposed, either at the surface or with a warmer upper-mixed layer?
If the problem was lack of constraint from data, this is still fixable in a probabilistic sense: we'd "just" (noting it's not that simple) need to assume more variability in ocean conditions in data-poor regions.
> What we need for the "future" of weather is long-tail events which cannot be modelled by curve-fitting to historical data.
Yes, it seems like weather forecasting is a simulation problem, not a low-shot prediction problem. I assume it's one of the computationally irreducible problems Stephen Wolfram talks about.
This has not really been my experience in my information bubble. But I admit I haven't kept up too well with the latest models and their failure modes.
Can you provide any examples of the scenario you described?
IBM tried Watson-based modeling on weather.com and it was a spectacular failure. Hopefully things have improved enough to beat traditional weather models, but I honestly don't know enough about this area to speculate. Classical models, i.e. non-neural-network models, have been iteratively improved for decades by the brightest minds in weather science and are barely able to eke out something close to 50% accuracy in forecasts. About 5 years ago that was 10x better than the AI-based modeling available. It would be cool to see some advancement here!
Should we not get comparisons and absolute values based on true|false positives|negatives?
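For yes/no events, that kind of comparison is usually summarised from a 2x2 contingency table. A minimal sketch follows; the function name and the sample forecasts are made up, and the scores are the standard probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and plain accuracy.

```python
def contingency_scores(forecast, observed):
    """Scores for paired yes/no forecasts and observations."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)                   # true positives
    false_alarms = sum(1 for f, o in pairs if f and not o)       # false positives
    misses = sum(1 for f, o in pairs if not f and o)             # false negatives
    correct_negatives = sum(1 for f, o in pairs if not f and not o)

    def ratio(num, den):
        # Undefined scores (e.g. no events observed) come back as NaN.
        return num / den if den else float("nan")

    return {
        "pod": ratio(hits, hits + misses),
        "far": ratio(false_alarms, hits + false_alarms),
        "csi": ratio(hits, hits + misses + false_alarms),
        "accuracy": ratio(hits + correct_negatives, len(pairs)),
    }


# Made-up ten-day record of "rain forecast?" versus "rain observed?".
fc = [True, True, False, False, True, False, False, False, False, False]
ob = [True, False, False, False, True, True, False, False, False, False]
scores = contingency_scores(fc, ob)
print(scores)
```

Accuracy alone flatters forecasts of rare events, since "never" is right most of the time, which is why POD, FAR and CSI are normally reported alongside it.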
The most relevant parts seem to be:
> In seconds... [the ML based] GraphCast can produce a 10-day forecast that would take a supercomputer [crunching through traditional deterministic methods] more than an hour
> GraphCast outperformed the best forecasting model of the European Center for Medium-Range Weather Forecasts more than 90 percent of the time
> Dr. Lam said the study found that GraphCast locked in on landfall in Nova Scotia three days before the supercomputers reached the same conclusion
> the European center was considered the world’s top weather agency because comparative tests have regularly shown its forecasts to exceed all others in accuracy
> Weather experts say the A.I. systems are likely to complement the supercomputer approach because each method has its own particular strengths.
So it's not really that exciting; it's just a new tool for weather forecasting, adding to the existing tool chain. I feel like this is only a news story because it's AI. New mathematical models are developed all the time, but they don't make it into the mainstream news, because math is boring.
Yeah statistical games with historical datasets on the verge of a new climate epoch threatening us with unprecedented weather events is a great way to be caught by surprise when we can least afford to.
Umbrella marketing terms like this are misleading, and they hype adjacent tech like GenAI.
"AI" is a family of statistical techniques, and weather forecasting has been using many of them for decades. Are the GNNs displayed here more "AI" than the usual stochastic Monte Carlo simulations? I guess the word "Math" is not sexy anymore.
I currently develop an ML model to forecast temperature (spatial resolution 10 m x 10 m) by leveraging these models (in this case ClimaX by Microsoft https://github.com/microsoft/climax ). Feel free to ask!
Do you use WRF as an input? I think GraphCast uses some kind of NWP input (which in my mind I've equated to WRF). Thank you for offering to answer questions!
We use a commercial meteorology service (aka Openweathermap with resolution 500m x 500m ) as one input so they might use WRF data (and indirectly my model as well).
OWM was chosen because they provide an easy API for general weather data.
Last I checked, the AI models still used the same initial conditions used in models like the GFS and ECMWF, which are a combination of a previous forecast adjusted with observations.
They've been using NN for this field for a long time as I understand it. Electricity demand, wind...etc. Plenty of vendors already out there since before AI became trendy again.
> Honestly I'd be surprised if we weren't already using ML in weather models.
Statistical modeling has long been used for "post-processing" the raw, gridded outputs of numerical models. This performs fairly well, and it is intended to correct for local factors that are not observable at the scale of the underlying model. For example, local topography might channel winds into a few primary directions, but an O(100 m)-scale valley is too fine to show up in the large-scale grid of a global weather model.
You can also consider data assimilation to be a form of machine learning. ECMWF's main, traditional weather model essentially uses a form of backpropagation to find the Jacobian of observation values with respect to the input weather state, and from there it's straightforward (but far from simple) to optimize the weather system. Other centres take related approaches, such as ensemble Kalman filtering (when the underlying model is not so easily differentiated with respect to its inputs).
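To make the data-assimilation idea concrete, here is a sketch of the textbook linear Kalman analysis step, which is the building block behind the ensemble Kalman filter family. It is not ECMWF's variational scheme, and the two-variable state, covariances, and observation below are made-up numbers.

```python
import numpy as np


def kalman_analysis(x_b, B, y, H, R):
    """Blend a background state with observations (linear Kalman update).

    x_b : background (prior forecast) state, shape (n,)
    B   : background error covariance, shape (n, n)
    y   : observations, shape (m,)
    H   : linear observation operator, shape (m, n)
    R   : observation error covariance, shape (m, m)
    """
    innovation = y - H @ x_b             # observation minus background
    S = H @ B @ H.T + R                  # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_a = x_b + K @ innovation           # analysis state
    A = (np.eye(len(x_b)) - K @ H) @ B   # analysis error covariance
    return x_a, A


# Two correlated state variables; only the first one is observed.
x_b = np.array([10.0, 5.0])
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
y = np.array([12.0])

x_a, A = kalman_analysis(x_b, B, y, H, R)
print(x_a)
```

The unobserved second variable still gets nudged, purely through its assumed error correlation with the first. That is the kind of statistical correlation operational systems lean on to spread sparse observations across the model state.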
The remarkable product of the last year or two has been the discovery that full, relatively unconstrained ML models can do a good job at the core NWP prediction task. It was not at all obvious that the historic record (the ERA5 reanalysis, mostly: https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-...) was large enough for pure data-driven models to learn appropriate responses.
The images with the forecast don’t look like any forecasts I was watching. They predicted pretty well even on July 1st. The majority of models predicted it would go up like it did. The trend between EPS, GEFS, UKMET, and CMC all had it going up. Not saying an AI LLM won’t help, but this seems a little disingenuous.
We have efficiently computable state-update equations available for weather. If neural nets are actually able to outperform those in general (and not just because of some form of sampling bias), why would that be?
1. They're exploiting information not available in the raw inputs to those weather models. Perhaps the number of cars driving vs parked in garages on a given day, contributing some local heat and perturbing PDE solutions away from the optimum. If this is the case, there's probably a fruitful line of research in using neural nets to estimate unmeasured parameters and using those directly in the applicable physics.
2. They just look better, e.g. by comparing raw neural outputs to sanitized weather reports, or by using a metric biased toward low prediction variance at the expense of accuracy (this could be as simple as grading the neural net's probability distribution against the weather model's raw predicted output, without considering the epistemologically tweaked predictions the physics model also produced, thus implicitly giving the neural net an edge because the physics model can't get partial credit, even when worthy of it, and the AI can). Those sorts of mistakes are shockingly easy to make, even with much thought and careful review. Time will tell, I hope.
3. The gains are in some form of auxiliary data, like cheaply extrapolating a coarse forecast into a fine-grained forecast (superresolution/hallucination), or cheaply approximating a single weather estimate without fleshing out the details of the entire grid. This is potentially incredibly valuable, at the obvious cost of occasionally being very wrong. Use such predictions with care.
I don't think that list is exhaustive, but I'm not holding my breath too much either. Gaining an extra day of hurricane touchdown accuracy, for example, takes immense amounts of data. We've slowly made those gains, but for a neural net to do noticeably better there would have to be major problems in our original problem formulation.
Slightly off-topic:
I'd love an easy way to get conditional probabilities in my forecasts. Some sort of raw, fine-grained data allowing those computations would be a god-send. When a forecast says there's a 30% chance of rain each hour, do they mean we're definitely getting rain and don't know which hour (like the very tall, narrow thunderstorms often traversing west->east in tornado alley), do they mean we're getting a spotty amount of rain that entire period (drizzling off and on like in southeast Alaskan summers), do they mean there's a 30% chance the storm hits us and rains continuously and a 70% chance it skirts by (it's commonly easy to predict there will be a storm but not necessarily exactly where or when with respect to communities/times on the boundaries)? Those have drastic impacts on my plans for the day, and the raw data has that nuance, but it's not obtainable from any weather report I've seen.
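To show how much the joint structure matters, here is a toy sketch with three made-up 10-member, 10-hour ensembles. They are constructed by hand to mimic the three scenarios above, and every one of them reports exactly 30% chance of rain each hour.

```python
import numpy as np

HOURS, MEMBERS = 10, 10


def pattern(offsets):
    """Member m rains at hours (m + k) mod HOURS for each offset k."""
    ens = np.zeros((MEMBERS, HOURS), dtype=bool)
    for m in range(MEMBERS):
        for k in offsets:
            ens[m, (m + k) % HOURS] = True
    return ens


# Scenario 1: everyone gets one 3-hour storm, but the timing is uncertain.
narrow_storm = pattern((0, 1, 2))
# Scenario 2: everyone gets isolated showers scattered through the period.
spotty = pattern((0, 3, 6))
# Scenario 3: 30% of members get rain all day, the rest stay dry.
hit_or_miss = np.zeros((MEMBERS, HOURS), dtype=bool)
hit_or_miss[:3, :] = True


def summarize(ens):
    hourly = ens.mean(axis=0)            # the usual per-hour chance of rain
    any_rain = ens.any(axis=1).mean()    # chance of any rain in the period
    # P(rain next hour | rain this hour), pooled over consecutive hour pairs.
    both = (ens[:, :-1] & ens[:, 1:]).sum()
    persistence = both / ens[:, :-1].sum()
    return hourly, any_rain, persistence


results = {
    "narrow storm": summarize(narrow_storm),
    "spotty": summarize(spotty),
    "hit or miss": summarize(hit_or_miss),
}
for name, (hourly, any_rain, persistence) in results.items():
    print(f"{name:12s} hourly={hourly[0]:.2f} any={any_rain:.2f} persist={persistence:.2f}")
```

The hourly numbers are identical, but the chance of any rain at all and the chance that rain continues once it starts differ completely. That is exactly the information a per-hour percentage throws away.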
My niece recently asked me why it's important to make weather forecasts. I had never really thought about it myself. But it is a very important job! Knowing the weather can affect decisions about travel, outdoor events and work schedules; forecasts help people prepare for severe weather conditions, such as storms, hurricanes or extreme temperatures, potentially saving lives and reducing injuries!
Pilots usually need to fly regardless. They reroute based on present data, not future. Bad weather still usually cancels all flights at an airport since they can't afford to close down the x% of the time when the local forecast is wrong. I suppose the airport might get the de-icers out of the garage early, but prediction, rather than current state, seems less important for pilots.
> emergency services
Do you have an example? I believe staffing is almost constant for these, since most are union, until extra are needed. I would suspect the only difference would be "it's going to rain heavy next week. you guys on call should expect to be called".
There are military, commercial, and private pilots. Each group has different look-ahead needs and approaches to planning and routing flights. If you want to give your brain a little exercise and learn new stuff, read up on various levels of pilots' licenses, endorsements, type ratings, etc. Add to that limits imposed by the aircraft (max operating altitude, speed, de-icing equipment, weight, etc.), airspace classes (how high can you/have to fly, will you be cleared to fly above weather), and jurisdictions (does my aircraft rating allow me to land at my destination given the weather forecast in that area). When you do that, you will understand how important weather forecasting is for pilots.
As for emergency services, it's good to know if/when things are going to get biblical, because you may need to transport additional machinery and personnel to another area, or you may need to prep temporary shelters.
I've never known someone so mindful of the weather forecast as my uncle the farmer: is it too soon to plant? Should I irrigate or wait just a few days longer? If I don't harvest now will my fields get too muddy for the tractor?
Agriculture is a big one. We have an independent forecaster locally that posts on Facebook. People tend to check in with him around hurricane and snow storm times. It seems like his paying clients are mainly farmers.
Nothing suits a generative AI better, in terms of quotidian prognostication, than telling you whether it will be vaguely sunny-ish in the next 24 to 48 hours.
Whatever next? Perhaps that most egregious example of just about vaguely getting it right enough of the time to seem plausible: AI horoscopes?