
To call this impressive is an understatement. Using a single GPU, it outperforms models that run on the world's largest supercomputers. Completely open sourced, not just the model weights. And fairly simple training / input data.

> ... with the current version being the largest we can practically fit under current engineering constraints, but which have potential to scale much further in the future with greater compute resources and higher resolution data.

I can't wait to see how far other people take this.




It builds on top of supercomputer model output and does better at the specific task of medium term forecasts.

It is a kind of iterative refinement on the data that supercomputers produce — it doesn’t supplant supercomputers. In fact the paper calls out that it has a hard dependency on the output produced by supercomputers.


I don't understand why this is downvoted. This is a classic thing to do with deep learning: take something that has a solution that is expensive to compute, and then train a deep learning model from that. And along the way, your model might yield improvements, too, and you can layer in additional features, interpolate at finer-grained resolution, etc. If nothing else, the forward pass in a deep learning model is almost certainly way faster than simulating the next step in a numerical simulation, but there is room for improvement as they show here. Doesn't invalidate the input data!
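As a concrete illustration of that pattern, here's a minimal sketch (nothing to do with GraphCast's actual GNN architecture; the "expensive" simulator and the tiny MLP are both made up) of training a network to emulate one step of a costly simulation:

    # Toy sketch: learn a cheap emulator from expensive-simulator output.
    # The "simulator" here is a made-up 1-D diffusion/advection step, not a
    # real weather model, and the network is a tiny MLP, not GraphCast's GNN.
    import numpy as np
    import torch
    import torch.nn as nn

    def expensive_step(state):
        # Stand-in for one step of a costly numerical simulation.
        return (state
                + 0.1 * (np.roll(state, 1) - 2 * state + np.roll(state, -1))
                - 0.05 * (state - np.roll(state, 1)))

    # Build training pairs (state_t, state_t+1) by running the simulator offline.
    rng = np.random.default_rng(0)
    states = rng.standard_normal((4096, 64)).astype(np.float32)
    targets = np.stack([expensive_step(s) for s in states]).astype(np.float32)

    model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x, y = torch.from_numpy(states), torch.from_numpy(targets)

    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

    # At inference time, one forward pass replaces one expensive simulator step.
    with torch.no_grad():
        prediction = model(x[:1])

Once trained, the emulator never calls the simulator again; the expensive runs only pay off the training data.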


Because "iterative refinement" is sort of wrong. It's not a refinement and it's not iterative. It's an entirely different model to physical simulation which works entirely differently and the speed up is order of magnitude.

Building a statistical model to approximate a physical process isn't a new idea, for sure; there are literally dozens of them for weather. The idea itself isn't really even iterative, it's the same idea... but it's all in the execution. If you built a model to predict tomorrow's stock prices and it generated 1000% p.a., it wouldn't be reasonable for me to call it iterative.


It is iterative when you look at the scope of "humans trying to solve things over time".


lol, touche.


"amortized inference" is a better name for it


> the forward pass in a deep learning model is almost certainly way faster than simulating the next step in a numerical simulation

Is this the case for most such refinements (architecture-wise)?


Practically speaking, yes. You wouldn't be likely to build a statistical model if a good simulation of the underlying process were already really fast and accurate.


"BLD,ENH: Dask-scheduler (SLURM,)," https://github.com/NOAA-EMC/global-workflow/issues/796

Dask-jobqueue https://jobqueue.dask.org/ :

> provides cluster managers for PBS, SLURM, LSF, SGE and other [HPC supercomputer] resource managers
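For anyone who hasn't used it, a minimal sketch of spinning up Dask workers through SLURM with dask-jobqueue (partition name, cores, and memory below are placeholders for whatever a real cluster provides):

    # Minimal dask-jobqueue sketch: run Dask workers as SLURM jobs.
    # Queue/partition, cores, and memory are site-specific placeholders.
    from dask_jobqueue import SLURMCluster
    from dask.distributed import Client

    cluster = SLURMCluster(
        queue="normal",        # SLURM partition (site-specific)
        cores=8,               # cores per worker job
        memory="32GB",         # memory per worker job
        walltime="01:00:00",
    )
    cluster.scale(jobs=10)     # submit 10 SLURM jobs as Dask workers
    client = Client(cluster)   # Dask collections now run through the HPC queue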

Helpful tools for this work: Dask-labextension, Dask-ML, CuPy, SymPy's lambdify(), Parquet, Arrow

GFS: Global Forecast System: https://en.wikipedia.org/wiki/Global_Forecast_System

TIL about Raspberry-NOAA and pywws in researching and summarizing for a comment on "Nrsc5: Receive NRSC-5 digital radio stations using an RTL-SDR dongle" (2023) https://news.ycombinator.com/item?id=38158091


So best case scenario, we can avoid some computation at inference time, assuming that the historical system dynamics are still valid. This model needs to be constantly checked against full-scale simulations and corrected over time.


Could you point me to the part where it says it depends on supercomputer output?

I didn't read the paper, but the linked post seems to say otherwise? It mentions that the supercomputer output was used to impute data during training. But for prediction it just needs:

> For inputs, GraphCast requires just two sets of data: the state of the weather 6 hours ago, and the current state of the weather. The model then predicts the weather 6 hours in the future. This process can then be rolled forward in 6-hour increments to provide state-of-the-art forecasts up to 10 days in advance.
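That rollout is just the same one-step model applied autoregressively; a rough sketch of the loop (graphcast_step is a placeholder, not DeepMind's actual API):

    # Illustrative autoregressive rollout: the model maps (state 6h ago, current
    # state) -> state 6h ahead, and predictions are fed back in to reach 10 days.
    # `graphcast_step` is a placeholder, not the real GraphCast interface.
    def rollout(graphcast_step, state_minus_6h, state_now, days=10):
        steps = days * 24 // 6                # 40 six-hour steps for 10 days
        prev, curr = state_minus_6h, state_now
        forecasts = []
        for _ in range(steps):
            nxt = graphcast_step(prev, curr)  # one cheap forward pass
            forecasts.append(nxt)
            prev, curr = curr, nxt            # slide the two-state window
        return forecasts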


You can read more about it in their paper, specifically page 36. Their training dataset, ERA5, is created using a process called reanalysis, which combines historical weather observations with modern weather models to produce a consistent record of past weather conditions.

https://storage.googleapis.com/deepmind-media/DeepMind.com/B...


I can't find the details, but if the supercomputer job only had to run once, or a few times, while this model can make accurate predictions repeatedly in new situations, then it doesn't matter as much that a supercomputer was required. The goal is to use the supercomputer once to create a high-value simulated dataset, then repeatedly make predictions from the lower-cost model.


Ah nice. Thanks!


Why can't they just train on historical data?


ERA5 is based on historical data. See it for yourself https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysi..., https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-...

I don't think raw historical data would work for any data-intensive model. Afaik the data is patchy; there are spots where we don't have that many datapoints, e.g. the middle of the ocean... Also, there are new satellites that have only been available for the last x years, and you want to be able to use these for the new models. So you need a re-analysis of what it would have looked like if you had that data 40 years ago...

Also, it's a very convenient dataset because many other models were trained on it: https://github.com/google-research/weatherbench2 so it's easy to do benchmarking.
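If anyone wants to poke at ERA5 directly, the CDS has a Python client; a minimal sketch (needs a free CDS account and API key; the variable and dates below are just an arbitrary example):

    # Fetch a tiny slice of ERA5 reanalysis from the Copernicus Climate Data Store.
    # Requires a CDS account and ~/.cdsapirc credentials; the request below is an
    # arbitrary example (one surface variable, one day, four times).
    import cdsapi

    c = cdsapi.Client()
    c.retrieve(
        "reanalysis-era5-single-levels",
        {
            "product_type": "reanalysis",
            "variable": "2m_temperature",
            "year": "2020",
            "month": "01",
            "day": "01",
            "time": ["00:00", "06:00", "12:00", "18:00"],
            "format": "netcdf",
        },
        "era5_sample.nc",
    )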


We don't have enough data. There's only one universe, and it's helpful to train on counter-factual events.


To be fair, they said a single TPU machine, which means something like 8 TPUs (still impressive).



