
I'd love to know (in as much detail as you are allowed) what you feel the strengths and weaknesses of CHEETAS are.



I cannot comment specifically on CHEETAS, but what I can say is that the USG developing software solutions in-house almost always produces a disastrous product that goes over budget and carries extreme maintenance overhead.

To see why, you can simply ask yourself: do you think that the unelected officials overseeing government agencies that embark on enterprise software development projects have sufficient expertise and enterprise software project management experience to be able to do this well?

Furthermore, do you think that the caliber of engineers the NHS or DoD can attract, at less than half the compensation of an actual software company, stands a chance of developing something good in-house?

It’s unfortunately almost impossible for these projects to go right.


CHEETAS isn't really developed in house though; it's mainly developed by Dell. Certainly the leadership is USG-associated, but I think the leadership is actually really good. Unfortunately I seem to be unable to get _real_ access to CHEETAS and finding anyone who has worked with it is a challenge.

I suspect underneath it's mostly Hadoop but it's impossible to separate the roadmap from the implementation without getting my hands on it.


Interesting, thank you for sharing!

That experience speaks more to the perils of in-housing than to why Palantir is the best COTS option for the specific needs here. Are there specific leading COTS competitors you view it as being so far ahead of for such a contract?

Closer to our own practice: modern LLMs have basically reset the state of the art in this space, with Palantir, by definition, behind OpenAI on the most basic tasks, and thus in the same race as everyone else to retool. Speaking from our own USG experience: we are deep-tech leads in some other intelligence areas (graph, ...), and before OpenAI we often chose to adopt the previous generation of leading language models (BERT, ...) for tasks closer to the NLP side, recognizing that wasn't where our in-house deep tech had an advantage. We basically had to start over on some of those projects as soon as GPT-4 came out, because it changed so much that the incumbent advantage of already delivering on a contract was a dead end for core functionality. Almost a year later, it's now obvious that was the right choice when we get compared to companies that didn't retool. Palantir has been publicly resetting as well to build on GenAI-era tech, which suggests the same situation.


It seems like you don’t know what Palantir is. Nothing OpenAI does is competitive with what Palantir does. Palantir, like every other software company out there, is exploring what “my product + AI” means.


That's a fair surface-level view, but worth thinking through a bit.

Palantir is several main products, plus a whole ton of custom software projects on top, and a good chunk of those rely on the quality of their NLP & vision systems to be competitive with others. My question relates to the notion that they are inherently the best when, by all public AI benchmarks, they don't make the best components and, in the context of air-gapped / self-hosted government work, don't even have access to them. Separately, I'm curious how they compare to their COTS competitors (vs. government in-house) given the claims here. For example, their ability to essentially privatize the government's data, resell it back to the government, and turn that into a network-effects near-monopoly is incredible, but it doesn't mean the technology is the best.

I've seen cool things with them, and on the flip side, highly frustrated users who have thrown them out (or are forbidden from doing so). It's been a fascinating company to track over the years. I'm asking for any concrete details or comparisons because, so far, there are none in the claims above, which is more consistent with their successful government marketing & lobbying efforts than with technical competitiveness.


I mean the topic of this thread is data management. That’s their bread and butter.

It just doesn’t make sense to be having this conversation through the lens of AI.


AI leadership seems existential to being a top data management company and providing top data management capabilities:

* Databricks' data management innovations, now that the basics are in, are half on the AI side: adding vector & LLM indexing for any data stored in it (sketched briefly after this list), moving their data catalog to be LLM-driven, adding genAI interfaces for accessing the data stored in it, ...

* Data pipelines spanning ingestion, correction, wrangling, indexing, integration, and feature & model management, especially for the tricky unstructured text, photo, and video data and the wide event/log/transaction recordings that matter to much of the government, are all moving or have already moved to AI. Whether it is monitoring video, investigating satellite photos, mining social media & news, entity resolution & linking across documents & logs, linking datasets, or OCR+translation of foreign documents, these are all about the intelligence tier. Tools like ontology management and knowledge graphs are especially being reset, because modern LLMs can drastically improve their quality and, through automation, their scalability & usability.

* Data protection has long been layering on AI methods for alerting (UEBA, ...), classification, policy synthesis, configuration management, ...
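
To make the first bullet concrete, here is a minimal sketch of the vector-indexing pattern using the databricks-vectorsearch Python client. The endpoint, catalog/table, column, and embedding-endpoint names are placeholders I made up, and the exact arguments can differ across versions, so treat it as the shape of the workflow rather than a verified recipe:

    # Sketch only: index an existing Delta table of documents for similarity
    # search, then query it. All names below are hypothetical placeholders.
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()  # picks up workspace auth from the environment

    client.create_endpoint(name="doc_search", endpoint_type="STANDARD")

    index = client.create_delta_sync_index(
        endpoint_name="doc_search",
        index_name="main.intel.docs_index",
        source_table_name="main.intel.docs",   # existing Delta table
        pipeline_type="TRIGGERED",              # re-sync on demand
        primary_key="doc_id",
        embedding_source_column="body_text",    # text column to embed
        embedding_model_endpoint_name="databricks-gte-large-en",
    )

    hits = index.similarity_search(
        query_text="shipments routed through shell companies",
        columns=["doc_id", "body_text"],
        num_results=5,
    )

The point isn't the specific API; it's that similarity search over arbitrary stored data is becoming a built-in feature of the data platform rather than a separate system to integrate.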

Databricks is a pretty good example of a company here. They don't preconfigure government datasets on the government's behalf and sell them back to it, but we do see architects using Databricks as a way to build their own data platforms, especially for AI-era workloads. Likewise, they have grown an ecosystem of data management providers on top rather than single-sourcing; e.g., it's been cool to see Altana bring supply chain data as basically a Databricks implementation. For core parts, Databricks keeps adding more of the data management stack to their system, such as examining how a high-grade entity resolution pipeline would break down between their stack and ecosystem providers (a rough sketch of the LLM-assisted matching step such a pipeline might include follows below).
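
For a flavor of why LLMs reset entity resolution specifically, here is a minimal sketch of the pairwise-matching step such a pipeline might hand to an LLM once cheap blocking has produced candidate pairs. The model name, prompt, and records are illustrative assumptions, not a claim about how Databricks, Palantir, or anyone else implements it:

    # Sketch only: LLM-assisted entity resolution as a pairwise match check.
    # Assumes candidate pairs were already generated by cheap blocking
    # (e.g., name n-grams); the model and records are hypothetical.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def same_entity(record_a: dict, record_b: dict) -> bool:
        """Ask the model whether two records refer to the same real-world entity."""
        prompt = (
            "Do these two records describe the same real-world entity? "
            'Answer with JSON: {"match": true or false, "reason": "..."}\n\n'
            f"Record A: {json.dumps(record_a)}\n"
            f"Record B: {json.dumps(record_b)}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
            temperature=0,
        )
        return bool(json.loads(resp.choices[0].message.content)["match"])

    pairs = [
        ({"name": "ACME Global Ltd", "addr": "10 Main St"},
         {"name": "Acme Global Limited", "addr": "10 Main Street"}),
    ]
    links = [(a, b) for a, b in pairs if same_entity(a, b)]

The old approach was hand-tuned string similarity and per-dataset rules; the reason incumbents are retooling is that a generic model now handles the messy long tail (abbreviations, transliterations, partial addresses) without that per-dataset engineering.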



