That's a fair surface-level view, but worth thinking through a bit.

Palantir is several main products, plus a whole lot of custom software projects on top, and a good chunk of those depend on the quality of their NLP & vision systems to stay competitive. My question is about the notion that they are inherently the best when, by all public AI benchmarks, they don't make the best components and, in the context of air-gapped / self-hosted government work, don't even have access to them. Separately, I'm curious how they stack up against their COTS competitors (as opposed to government in-house builds) given the claims here. For example, their ability to essentially privatize the government's data and resell it back to the government, turning that into a network-effects near-monopoly, is remarkable, but it doesn't mean the technology is the best.

I've seen cool things done with them, and, on the flip side, highly frustrated users who have thrown them out (or have been forbidden from doing so). It's been a fascinating company to track over the years. I'm asking for concrete details or comparisons because, so far, there are none in the claims above, which is more consistent with successful government marketing & lobbying than with technical competitiveness.

I mean, the topic of this thread is data management. That's their bread and butter.

It just doesn’t make sense to be having this conversation through the lens of AI.


AI leadership seems existential both to being a top data management company and to delivering top data management capabilities:

* Databricks' data management innovations, now that the basics are in place, are half on the AI side: vector & LLM indexing for any data stored in the platform, an LLM-driven data catalog, genAI interfaces for accessing stored data, ... (see the first sketch below this list)

* Data pipelines spanning ingestion, correction, wrangling, indexing, integration, and feature & model management, especially for tricky unstructured text, photo, and video data and for the wide event/log/transaction recordings important to much of the government, are all moving or have already moved to AI. Whether it's monitoring video, investigating satellite photos, mining social media & news, entity resolution & linking across documents & logs, linking datasets, or OCR+translation of foreign documents, these all live at the intelligence tier (see the entity resolution sketch after this list). Tools like ontology management and knowledge graphs are especially being reset because modern LLMs can drastically improve their quality, scalability, and usability through automation.

* Data protection has long been layering on AI methods for alerting (UEBA, ...), classification, policy synthesis, configuration management, ... (see the anomaly-detection sketch below)
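
To make the first bullet concrete, here is a minimal sketch of the kind of vector indexing these platforms are adding, using off-the-shelf sentence-transformers and FAISS rather than any vendor-specific API; the model name, records, and query are all illustrative:

    from sentence_transformers import SentenceTransformer
    import faiss
    import numpy as np

    # Embed free-text records so they can be queried semantically
    # alongside the structured data already in the platform.
    records = [
        "Shipment delayed at port of Rotterdam",
        "Customs hold on electronics from supplier X",
        "Routine delivery completed on schedule",
    ]
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    vecs = model.encode(records, normalize_embeddings=True).astype("float32")

    index = faiss.IndexFlatIP(vecs.shape[1])  # inner product = cosine on unit vectors
    index.add(vecs)

    query = model.encode(["supply chain disruption"],
                         normalize_embeddings=True).astype("float32")
    scores, ids = index.search(query, 2)  # top-2 semantic neighbors
    for i, s in zip(ids[0], scores[0]):
        print(f"{s:.2f}  {records[i]}")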
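
For the entity resolution mentioned in the second bullet, the classic shape is blocking plus fuzzy matching; everything here (records, block key, threshold) is invented for illustration, and production pipelines layer trained matchers or LLM adjudication on top:

    # Minimal sketch of classic entity resolution: block candidate pairs
    # on a cheap key, then score survivors with fuzzy string similarity.
    from difflib import SequenceMatcher
    from itertools import combinations

    people = [
        {"id": 1, "name": "Jon Smith",  "city": "Boston"},
        {"id": 2, "name": "John Smith", "city": "Boston"},
        {"id": 3, "name": "Jane Doe",   "city": "Austin"},
    ]

    def block_key(rec):
        # Block on first letter of surname + city to avoid O(n^2) comparisons
        return (rec["name"].split()[-1][0].lower(), rec["city"].lower())

    blocks = {}
    for rec in people:
        blocks.setdefault(block_key(rec), []).append(rec)

    for recs in blocks.values():
        for a, b in combinations(recs, 2):
            score = SequenceMatcher(None, a["name"], b["name"]).ratio()
            if score > 0.85:  # illustrative threshold
                print(f"likely match: {a['id']} <-> {b['id']} ({score:.2f})")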
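
And for the UEBA-style alerting in the third bullet, a minimal version is just outlier detection over per-user activity features; the features and numbers are made up, with scikit-learn's IsolationForest standing in for whatever a real product ships:

    # Minimal sketch of UEBA-style alerting: model normal per-user
    # activity and flag statistical outliers for analyst review.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # columns: logins_per_day, MB_downloaded, distinct_hosts_touched
    normal = rng.normal(loc=[5, 200, 3], scale=[1, 50, 1], size=(500, 3))
    suspicious = np.array([[40, 9000, 60]])  # bulk download across many hosts

    clf = IsolationForest(contamination=0.01, random_state=0).fit(normal)
    print(clf.predict(suspicious))  # -1 => flagged as anomalous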

Databricks is a pretty good example of a company here. They don't preconfigure government datasets on the government's behalf and sell them back, but we do see architects using Databricks to build their own data platforms, especially for AI-era workloads. Likewise, they have grown an ecosystem of data management providers on top rather than single-sourcing; e.g., it's been cool to see Altana bring supply chain data as essentially a Databricks implementation. For core parts, Databricks keeps adding more of the data management stack to its own system, such as examining how a high-grade entity resolution pipeline would split between their stack and ecosystem providers.
