
To call this impressive is an understatement. Using a single GPU, it outperforms models that run on the world's largest supercomputers. Completely open sourced, not just the model weights. And fairly simple training / input data.

> ... with the current version being the largest we can practically fit under current engineering constraints, but which have potential to scale much further in the future with greater compute resources and higher resolution data.

I can't wait to see how far other people take this.




It builds on top of supercomputer model output and does better at the specific task of medium term forecasts.

It is a kind of iterative refinement on the data that supercomputers produce — it doesn’t supplant supercomputers. In fact the paper calls out that it has a hard dependency on the output produced by supercomputers.


I don't understand why this is downvoted. This is a classic thing to do with deep learning: take something that has a solution that is expensive to compute, and then train a deep learning model from that. And along the way, your model might yield improvements, too, and you can layer in additional features, interpolate at finer-grained resolution, etc. If nothing else, the forward pass in a deep learning model is almost certainly way faster than simulating the next step in a numerical simulation, but there is room for improvement as they show here. Doesn't invalidate the input data!
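As a concrete illustration of that pattern, here's a minimal sketch (nothing to do with GraphCast's actual GNN architecture; the "expensive" simulator and the tiny MLP are both made up) of training a network to emulate one step of a costly simulation:

    # Toy sketch: learn a cheap emulator from expensive-simulator output.
    # The "simulator" here is a made-up 1-D diffusion/advection step, not a
    # real weather model, and the network is a tiny MLP, not GraphCast's GNN.
    import numpy as np
    import torch
    import torch.nn as nn

    def expensive_step(state):
        # Stand-in for one step of a costly numerical simulation.
        return (state
                + 0.1 * (np.roll(state, 1) - 2 * state + np.roll(state, -1))
                - 0.05 * (state - np.roll(state, 1)))

    # Build training pairs (state_t, state_t+1) by running the simulator offline.
    rng = np.random.default_rng(0)
    states = rng.standard_normal((4096, 64)).astype(np.float32)
    targets = np.stack([expensive_step(s) for s in states]).astype(np.float32)

    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.from_numpy(states), torch.from_numpy(targets)

    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

    # At inference time, one forward pass replaces one expensive simulator step.
    with torch.no_grad():
        prediction = model(x[:1])

Once trained, the emulator never calls the simulator again; the expensive runs only pay off the training data.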


Because "iterative refinement" is sort of wrong. It's not a refinement and it's not iterative. It's an entirely different model to physical simulation which works entirely differently and the speed up is order of magnitude.

Building a statistical model to approximate a physical process isn't a new idea, for sure; there are literally dozens of them for weather. The idea itself isn't really even iterative, it's the same idea... but it's all in the execution. If you built a model to predict tomorrow's stock prices and it generated 1000% p.a., it wouldn't be reasonable for me to call it iterative.


It is iterative when you look at the scope of "humans trying to solve things over time".


lol, touche.


"amortized inference" is a better name for it


> the forward pass in a deep learning model is almost certainly way faster than simulating the next step in a numerical simulation

Is this the case for most such refinements (architecture-wise)?


Practically speaking, yes. You wouldn't be likely to build a statistical model if a good simulation of the underlying process were already really fast and accurate.


"BLD,ENH: Dask-scheduler (SLURM,)," https://github.com/NOAA-EMC/global-workflow/issues/796

Dask-jobqueue https://jobqueue.dask.org/ :

> provides cluster managers for PBS, SLURM, LSF, SGE and other [HPC supercomputer] resource managers
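For anyone who hasn't used it, a minimal sketch of spinning up Dask workers through SLURM with dask-jobqueue (partition name, cores, and memory below are placeholders for whatever a real cluster provides):

    # Minimal dask-jobqueue sketch: run Dask workers as SLURM jobs.
    # Queue/partition, cores, and memory are site-specific placeholders.
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client

    cluster = SLURMCluster(
        queue="normal",        # SLURM partition (site-specific)
        cores=8,               # cores per worker job
        memory="32GB",         # memory per worker job
        walltime="01:00:00",
    )
    cluster.scale(jobs=10)     # submit 10 SLURM jobs as Dask workers
    client = Client(cluster)   # Dask collections now run through the HPC queue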

Helpful tools for this work: Dask-labextension, Dask-ML, CuPy, SymPy's lambdify(), Parquet, Arrow

GFS: Global Forecast System: https://en.wikipedia.org/wiki/Global_Forecast_System

TIL about Raspberry-NOAA and pywws in researching and summarizing for a comment on "Nrsc5: Receive NRSC-5 digital radio stations using an RTL-SDR dongle" (2023) https://news.ycombinator.com/item?id=38158091


So best case scenario, we can avoid some computation at inference time, assuming that the historical system dynamics are still valid. This model needs to be constantly checked against full-scale simulations and corrected over time.


Could you point me to the part where it says it depends on supercomputer output?

I didn't read the paper, but the linked post seems to say otherwise? It mentions that the supercomputer output was used to impute data during training. But for prediction it just needs:

> For inputs, GraphCast requires just two sets of data: the state of the weather 6 hours ago, and the current state of the weather. The model then predicts the weather 6 hours in the future. This process can then be rolled forward in 6-hour increments to provide state-of-the-art forecasts up to 10 days in advance.
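That rollout is just the same one-step model applied autoregressively; a rough sketch of the loop (graphcast_step is a placeholder, not DeepMind's actual API):

    # Illustrative autoregressive rollout: the model maps (state 6h ago, current
    # state) -> state 6h ahead, and predictions are fed back in to reach 10 days.
    # `graphcast_step` is a placeholder, not the real GraphCast interface.
    def rollout(graphcast_step, state_minus_6h, state_now, days=10):
        steps = days * 24 // 6                # 40 six-hour steps for 10 days
        prev, curr = state_minus_6h, state_now
        forecasts = []
        for _ in range(steps):
            nxt = graphcast_step(prev, curr)  # one cheap forward pass
            forecasts.append(nxt)
            prev, curr = curr, nxt            # slide the two-state window
        return forecasts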


You can read more about it in their paper, specifically page 36. Their training dataset, ERA5, is created using a process called reanalysis, which combines historical weather observations with modern weather models to produce a consistent record of past weather conditions.

https://storage.googleapis.com/deepmind-media/DeepMind.com/B...


I can't find the details, but if the supercomputer job only had to run once, or a few times, while this model can make accurate predictions repeatedly in new situations, then it doesn't matter as much that a supercomputer was required. The goal is to use the supercomputer once to create a high-value simulated dataset, then repeatedly make predictions from the lower-cost model.


Ah nice. Thanks!


Why can't they just train on historical data?


ERA5 is based on historical data. See it for yourself https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysi..., https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-...

I don't think raw historical data would work for any data-intensive model. Afaik the data is patchy; there are spots where we don't have that many datapoints, e.g. the middle of the ocean... Also, there are new satellites that have only been available for the last x years, and you want to be able to use these for the new models. So you need a re-analysis of what it would have looked like if you had that data 40 years ago...

Also, it's a very convenient dataset because many other models were trained on it: https://github.com/google-research/weatherbench2 so it's easy to do benchmarking.
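If anyone wants to poke at ERA5 directly, the CDS has a Python client; a minimal sketch (needs a free CDS account and API key; the variable and dates below are just an arbitrary example):

    # Fetch a tiny slice of ERA5 reanalysis from the Copernicus Climate Data Store.
    # Requires a CDS account and ~/.cdsapirc credentials; the request below is an
    # arbitrary example (one surface variable, one day, four times).
    import cdsapi

    c = cdsapi.Client()
    c.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "variable": "2m_temperature",
            "year": "2020",
            "month": "01",
            "day": "01",
            "time": ["00:00", "06:00", "12:00", "18:00"],
            "format": "netcdf",
        },
        "era5_sample.nc",
    )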


We don't have enough data. There's only one universe, and it's helpful to train on counter-factual events.


To be fair, they said a single TPU machine, which means something like 8 TPUs (still impressive).



