This is great. I am very curious about the architectural decisions you've made here. Is there a blog post or article about them? 80 yrs of historical data: are you storing that somewhere in PG and the APIs are just fetching it? If so, what indices have you set up to make the API fetches faster? I just fetched 1960 to 2022 in about 12 secs.
Traditional database systems struggle to handle gridded data efficiently. Using PG with time-based indices is memory- and storage-intensive. It works well for a limited number of locations, but global weather models at 9-12 km resolution have 4 to 6 million grid cells.
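To put rough numbers on the scale problem, here is a back-of-envelope sketch for a "one row per grid cell per hour" table holding a single variable. The inputs are assumptions, not figures from the post: hourly steps, 80 years (leap days ignored), and 5 million cells as a midpoint of the 4-6 million range above.

```python
# Rough size of a "one row per grid cell per hour" table for ONE variable.
# Hypothetical inputs: hourly steps, 80 years (ignoring leap days), and
# 5 million cells as a midpoint of the 4-6 million range quoted above.
GRID_CELLS = 5_000_000
HOURS_PER_YEAR = 24 * 365
YEARS = 80
BYTES_PER_FLOAT32 = 4

rows = GRID_CELLS * HOURS_PER_YEAR * YEARS
# The values alone: no timestamp, no coordinates, no row overhead, no index entries.
payload_bytes = rows * BYTES_PER_FLOAT32

print(f"{rows:.2e} rows, {payload_bytes / 1e12:.1f} TB of raw float32 values")
# prints: 3.50e+12 rows, 14.0 TB of raw float32 values
```

A row store would add a timestamp, a cell key or coordinates, per-row overhead and index entries on top of each 4-byte value, which is the overhead a headerless gridded layout avoids.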
I am exploiting the homogeneity of gridded data. In a 2D field, calculating the data position for a geographical coordinate is straightforward. Once you add time as a third dimension, you can pick any timestamp at any point on earth. To optimize read speed, all time steps are stored sequentially on disk in a rotated/transposed OLAP cube.
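The position arithmetic can be sketched as follows for a regular lat/lon grid where all time steps of one cell are stored next to each other. This is a minimal illustration, not Open-Meteo's code: the `RegularGrid` class, its fields, and the 0.25-degree example grid are hypothetical, nearest-cell lookup is assumed, and longitude wraparound and projected (non lat/lon) grids are not handled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical regular lat/lon grid; not Open-Meteo's actual grid code."""
    lat_min: float
    lon_min: float
    d_lat: float  # grid spacing in degrees
    d_lon: float
    ny: int       # number of latitude rows
    nx: int       # number of longitude columns
    nt: int       # number of time steps stored per cell

    def cell_index(self, lat: float, lon: float) -> int:
        """Nearest grid cell, flattened row-major as y * nx + x."""
        y = math.floor((lat - self.lat_min) / self.d_lat + 0.5)
        x = math.floor((lon - self.lon_min) / self.d_lon + 0.5)
        if not (0 <= y < self.ny and 0 <= x < self.nx):
            raise ValueError("coordinate outside grid")
        return y * self.nx + x

    def offset(self, lat: float, lon: float, t: int) -> int:
        """Element offset of step t at (lat, lon) in a time-contiguous layout."""
        if not 0 <= t < self.nt:
            raise ValueError("time step outside range")
        return self.cell_index(lat, lon) * self.nt + t

    def series_range(self, lat: float, lon: float) -> range:
        """All nt steps of one cell: a single contiguous run of elements."""
        start = self.cell_index(lat, lon) * self.nt
        return range(start, start + self.nt)


# Example: a hypothetical 0.25-degree global grid with one year of hourly steps per cell.
grid = RegularGrid(lat_min=-90.0, lon_min=-180.0, d_lat=0.25, d_lon=0.25,
                   ny=721, nx=1440, nt=24 * 365)
print(grid.cell_index(52.52, 13.41))  # nearest cell to Berlin: 821574
```

With float32 values, the byte offset is simply `offset * 4`. Because time is the innermost dimension, fetching a long time series for one point is one sequential read instead of millions of index lookups.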
Although the data now consists of millions of floating-point values without accompanying attributes like timestamps or geographical coordinates, the storage requirements are still high. Open-Meteo therefore splits the data into small chunks, each covering 10 locations and 2 weeks of data, and compresses each chunk individually using an optimized compression scheme.
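Below is a minimal sketch of per-chunk compression with the chunk shape from the post (10 locations x 2 weeks), assuming hourly steps. The post does not spell out Open-Meteo's codec, so this swaps in a generic scheme: fixed-point quantization (lossy, to the nearest 0.1 by default), delta encoding along time, then `zlib`. All function names are hypothetical.

```python
import struct
import zlib

# Chunk shape from the post: 10 locations x 2 weeks. Hourly steps are an assumption.
LOCATIONS_PER_CHUNK = 10
STEPS_PER_CHUNK = 14 * 24


def chunk_coords(cell: int, t: int) -> tuple[int, int]:
    """Return (location block, time block) of the chunk holding (cell, t)."""
    return cell // LOCATIONS_PER_CHUNK, t // STEPS_PER_CHUNK


def compress_chunk(series: list[list[float]], scale: float = 10.0) -> bytes:
    """Quantize each location's series to 1/scale, delta-encode over time, zlib it.

    Lossy: values are rounded to the nearest 1/scale (0.1 for scale=10).
    """
    deltas: list[int] = []
    for values in series:
        prev = 0
        for v in values:
            q = round(v * scale)
            deltas.append(q - prev)  # consecutive hours differ little -> small ints
            prev = q
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw, 9)


def decompress_chunk(blob: bytes, n_locations: int, n_steps: int,
                     scale: float = 10.0) -> list[list[float]]:
    """Invert compress_chunk: inflate, undo the delta coding, rescale."""
    count = n_locations * n_steps
    deltas = struct.unpack(f"<{count}i", zlib.decompress(blob))
    out: list[list[float]] = []
    for loc in range(n_locations):
        q = 0
        values = []
        for d in deltas[loc * n_steps:(loc + 1) * n_steps]:
            q += d
            values.append(q / scale)
        out.append(values)
    return out


# Usage: one chunk of synthetic hourly temperatures (a daily cycle per location).
series = [[20.0 + 0.1 * ((h + loc) % 24) for h in range(STEPS_PER_CHUNK)]
          for loc in range(LOCATIONS_PER_CHUNK)]
blob = compress_chunk(series)
restored = decompress_chunk(blob, LOCATIONS_PER_CHUNK, STEPS_PER_CHUNK)
print(len(blob), "compressed bytes vs", LOCATIONS_PER_CHUNK * STEPS_PER_CHUNK * 4, "raw")
```

Small chunks keep the read amplification low: a point query only has to inflate the chunks covering that location and time window, not a whole file.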
While this approach isn't groundbreaking and is supported by file formats like NetCDF, Zarr, or HDF5, the challenge lies in efficiently working with multiple weather models and updating the data with every new model run, which arrives every few hours.
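To make the update cost concrete, here is a minimal sketch of the bookkeeping a naive read-modify-write strategy would face, reusing the 10-location x 2-week chunk shape from above with hourly steps assumed. The function names are hypothetical, and the post does not describe how Open-Meteo actually schedules these rewrites.

```python
# Chunk shape from the post: 10 locations x 2 weeks. Hourly steps are an assumption.
LOCATIONS_PER_CHUNK = 10
STEPS_PER_CHUNK = 14 * 24


def time_blocks_touched(t_start: int, t_end: int) -> range:
    """Time-block indices overlapping the half-open step range [t_start, t_end)."""
    if t_end <= t_start:
        return range(0)
    return range(t_start // STEPS_PER_CHUNK, (t_end - 1) // STEPS_PER_CHUNK + 1)


def chunks_to_rewrite(n_cells: int, t_start: int, t_end: int) -> int:
    """Chunks a model run covering [t_start, t_end) forces a naive updater to rewrite.

    A global run updates every cell, so every location block is affected.
    """
    location_blocks = (n_cells + LOCATIONS_PER_CHUNK - 1) // LOCATIONS_PER_CHUNK
    return location_blocks * len(time_blocks_touched(t_start, t_end))


# Hypothetical: a 7-day hourly forecast starting at step 300 on a 5-million-cell grid.
print(chunks_to_rewrite(5_000_000, 300, 300 + 7 * 24))  # 1000000
```

Under this naive approach, each of those chunks would have to be decompressed, patched and recompressed for every variable, which illustrates why keeping several models current every few hours is the hard part.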