This is such a weird headline and dataset. It's not a very large model, especially for geospatial, and the dataset is microscopic: not even 1k image tiles.
A typical geospatial UNET would be trained on anywhere from 10x to 100x this much data.
This is more like a toy dataset I would give an intern to play with. But to be clear, one would need much, much more data to do anything interesting. Likewise, there are a lot of data filtering and data processing considerations that come into play with satellites, like clouds, ascending versus descending passes, and averaging to try to get fewer clouds. Satellite and all remote sensing ML is tricky stuff.
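To make the cloud point concrete, here is a minimal sketch (plain NumPy, hypothetical array shapes, and a random mask standing in for a real QA/cloud band) of the kind of temporal compositing people use to suppress clouds before training:

    import numpy as np

    def median_composite(stack: np.ndarray, cloud_masks: np.ndarray) -> np.ndarray:
        """Per-pixel median over time, ignoring cloudy observations.

        stack:       (T, H, W, C) float array of T acquisitions.
        cloud_masks: (T, H, W) boolean array, True where a pixel is cloudy.
        """
        # Mask cloudy pixels so they do not contribute to the median.
        masked = np.where(cloud_masks[..., None], np.nan, stack)
        # nanmedian skips NaNs; pixels cloudy in every acquisition stay NaN.
        return np.nanmedian(masked, axis=0)

    # Hypothetical usage: 12 monthly acquisitions of a 256x256 tile with 6 bands.
    stack = np.random.rand(12, 256, 256, 6).astype(np.float32)
    clouds = np.random.rand(12, 256, 256) > 0.8
    composite = median_composite(stack, clouds)  # (256, 256, 6)

A real pipeline would read the sensor's QA bands (e.g. Fmask output) instead of a synthetic mask, and would also deal with pass direction and nodata edges.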
A vanilla UNET is around 7-8M parameters; this is ~100M(?), so the model itself is an order of magnitude larger. There are larger models, though, as pointed out in the other Hacker News thread.
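For reference, a quick way to sanity-check a parameter count in PyTorch; the toy model below is just a placeholder, not the released architecture:

    from torch import nn

    def count_params(model: nn.Module) -> int:
        """Total number of trainable parameters."""
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Placeholder model; substitute an actual UNET or the released model.
    toy = nn.Sequential(
        nn.Conv2d(6, 64, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1),
    )
    print(f"{count_params(toy) / 1e6:.2f}M parameters")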
The fine-tuning datasets are much smaller, but that's the point - they don't need to be large, because of the foundation model underneath.
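A rough sketch of what that pattern looks like in practice: freeze a pretrained encoder and train only a small task head on the small labeled set. The encoder here is a stand-in module, not the actual released model or its API:

    import torch
    from torch import nn

    # Stand-in for a pretrained foundation-model encoder, frozen during fine-tuning.
    backbone = nn.Sequential(nn.Conv2d(6, 128, 3, padding=1), nn.ReLU())
    for p in backbone.parameters():
        p.requires_grad = False

    # Small task-specific head, e.g. per-pixel flood / no-flood segmentation.
    head = nn.Conv2d(128, 2, 1)
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
        """images: (B, 6, H, W) float tiles; labels: (B, H, W) integer class ids."""
        with torch.no_grad():  # backbone stays fixed
            features = backbone(images)
        logits = head(features)
        loss = loss_fn(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Hypothetical tiny labeled batch, standing in for the small fine-tuning set.
    loss = train_step(torch.rand(4, 6, 64, 64), torch.randint(0, 2, (4, 64, 64)))

Because only the head's few hundred parameters get trained, a small pile of labeled tiles can be enough, which is the whole argument for the foundation model underneath.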
Yeah, I'm surprised they released the multi-temporal crop prediction demo. Their accuracy is, frankly, pretty terrible. It's basically what I managed the first time I tried to run a classifier against the CDL dataset across years.
> It will be the largest geospatial foundation model on Hugging Face and the first-ever open-source AI foundation model built in collaboration with NASA.
Perhaps it's a sad statement that this is the largest GIS model on HF, but at least it's out there. I would love to see more models that are better, larger, and less entangled with IBM or other megacorps.
In case anyone's wondering what the model does (like I was):
> With additional fine tuning, the base model can be redeployed for tasks like tracking deforestation, predicting crop yields, or detecting and monitoring greenhouse gasses. IBM and NASA researchers are also working with Clark University to adapt the model for applications such as time-series segmentation and similarity research.
Hugging Face (HF) was an incorporated entity writing some well-engineered and easy-to-use deep learning libraries. Notably, around the time the Transformer paper was released and the NLP community started using transformers en masse, HF's implementation became very popular. They started hosting pretrained models that you can download and start using in a single Python line. Then they started doing the same for common datasets. Now it's one of the larger AI startups, with a brilliant set of programmers and engineers, and multiple revenue streams (I'm guessing).
Think of it like a GitHub for sharing datasets and weights.
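The "single Python line" claim above is roughly literal. With the transformers library, downloading and running a pretrained model looks like this (the task and default model are just illustrative):

    from transformers import pipeline

    # Downloads the weights from the Hub on first use, caches them, then runs locally.
    classifier = pipeline("sentiment-analysis")
    print(classifier("Open geospatial foundation models are a welcome change."))

Datasets work the same way via the datasets library's load_dataset().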
I'd liken 'huggingface' to the facehugger alien. You can just rip off the front and back of models and train them on new data. Although I've been told explicitly by HF this is not the case, it's my preferred head canon.
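The joke isn't far from how transfer learning actually works. A common pattern (torchvision's resnet18 here purely as a familiar, hypothetical example) is to swap out the final layer and retrain on new data:

    from torch import nn
    from torchvision import models

    # Grab a pretrained backbone and "rip off" its original classification head.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, 10)  # new head for 10 new classes
    # ...then fine-tune on the new dataset as usual.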
(I don't know that I've ever attempted to paste an emoji into HN before, and I'm rather glad to learn it strips them out, though they're fine in other contexts.)
Seems kind of biased to prevent some Unicode characters but not others. Those characters exist for a purpose: to more accurately convey human communication.
I don't like it. On one hand it's really sappy in a non-endearing way; on the other, https://avp.fandom.com/wiki/Facehugger (which, incidentally, is probably some foreshadowing of the time when the VCs start trying to get their returns).
It's mildly embarrassing to have to refer to it in a professional context.
Wait, it's a 100M-parameter model. It's certainly very good they open-sourced it, but it's definitely not big considering we have 330B models. Perhaps it's big for this type of model?
I for one would love to see a lot more highly capable small models that can run on mobile devices, and that, even on desktops, don't need fiber to download.
This is an exciting and commendable collaboration between IBM and Hugging Face, together with NASA's involvement, to democratize access to AI technology and further climate and Earth science research. The announcement of the open-source availability of IBM's watsonx.ai geospatial foundation model, built from NASA's satellite data, on Hugging Face is a significant step forward in advancing AI applications for climate science.
The fact that this model is trained on Harmonized Landsat Sentinel-2 satellite data and fine-tuned on labeled data for flood and burn scar mapping, resulting in a 15 percent improvement over state-of-the-art techniques using half as much labeled data, is impressive. This model has immense potential to aid in various environmental tasks, such as tracking deforestation, predicting crop yields, and monitoring greenhouse gases, making it a valuable tool for addressing pressing environmental challenges.
Moreover, the commitment to open-source principles and information sharing demonstrated by IBM, Hugging Face, and NASA is laudable. By open-sourcing the model and datasets, they are enabling researchers and scientists worldwide to access and utilize this valuable resource, fostering collaboration and accelerating progress in the field of AI.
Additionally, it's great to see IBM's dedication to creating flexible and reusable AI systems, as well as their focus on developing models that can be adapted for different tasks and scenarios. The commercial availability of the geospatial model through the IBM Environmental Intelligence Suite later this year further underscores IBM's commitment to advancing AI technologies for practical applications.
In summary, this collaboration represents a significant step forward in utilizing AI for the betterment of our planet and addressing climate and Earth science challenges. It's heartening to see leading organizations coming together to harness technology for positive global impact and promoting the open sharing of knowledge to foster innovation and progress.