
> 80% of enterprise data is unstructured

I've seen quotes like this many times. It's silly. I worked at a big bank for over a decade. 95% of the data we cared about was already in a SQL database. Maybe ~80% of our data was "unstructured", but it wasn't stuff we cared about for risk management or other critical functions.

> people are not willing to pay more money for less risk

I'd disagree here. Banks are willing to pay money to reduce risk. It's just unlikely that the reduction will come from scraping data out of PDFs with an LLM, because where that was worth doing they've already done it.




Who, in your example, put the customers' financial data in the SQL database? Because in my part of finance that’s either the customer, or an employee.

Our customers are asking for integration with a lot of their systems (say HR / patrolling), but never ever offer to hook up their accounting system. If we want financial data, we either get a PDF with their audited financial statement or in exceptional cases a custom audited statement (you know, the one where a print of a part of the ledger gets a signature from the CPA for a not insignificant bill).

So I am enthusiastic from a data science point of view. Processing of customer financial data is (or was) scarce because it was limited to what was feasible to process manually. That is nearly in the past.


I created automated decision support systems in asset-based finance. For daily needs you get customer financials and other risk data from both official national sources and the likes of Dun & Bradstreet, Graydon etc. The choice of providers depends on both the customer and deal risk/size. While the "APIs" to these providers might be clunky (putting a structured request file on an FTP server and polling for a response), the data is structured (enough) to process.

Deals that are exceptional enough get assigned to a risk officer who handles each one as a case (sidenote: they use a lot of self-made Excel, VB and low-code tools as they never get IT priority for these cases). Those cases have too little uniformity, and too little tolerance for inaccuracy, to warrant extensive automation.


Thanks. I see the context now. Our asset managers are indeed lucky to have Bloomberg and such, which are easily integratable (and indeed, have been "SQL" for more than a decade now). I'm aware of the third party providers of customer financial information. Lucky to operate in a niche that is not served by them. Graydon (the only one I've been in contact with) is facing a massive disruption though. Their higher tiers are perhaps not enterprise expensive, but Trellis and the likes are probably more integratable and more affordable.

But still, the one building the LLM integration is 4x as expensive as the one manually entering the data. It's all about TCO, scale and risk perception. I also love that "risk people" (in the banking context, I'd say: model people) think their data quality should be exceptional and then use end user computing MacGyver style models. Spit and popsicle sticks.


"think their data quality should be exceptional and then use end user computing MacGyver style models"

The choice they have is to submit a formal request to IT, be rejected 95% of the time with the remaining 5% being put in the planning with an ETA 2-5 years in the future, or to DIY it with the tools at hand. In an ideal world this would not be needed; in reality it is DIY or nothing.


Yep, and nowadays, banks are already deploying this stuff internally via their own IT teams. They have spent 1-2 decades building up ETL/orchestration talent + infra, and have been growing deals with openai/azure/google/aws/databricks for the LLM bits. Internally, big banks are rolling out hundreds of LLM apps each, and generally have freezes on new external AI vendors due to 'AI compliance risk'. NLP has been commoditized, so it's a different world.

It makes sense on paper from a VC perspective as a big bet, but good luck to smaller VC-funded founders competing with massive BD teams fronting top AI dev teams. We compete in adjacent spaces where we can differentiate, and intentionally decided against going in head-on. For those who can, again, good luck!


Am I in another world? (See my response above.) Most of the ‘hundreds of LLM apps’ I see are, well, not very fancy and struggling to keep up on accuracy in comparison to the meatspace solutions they promised to massively outperform.

I agree with your assessment that the IT risk barrier is very high in big corp so that entry might be hard for Trellis. Plus a continuous push afterwards to go back to traditional cloud once their offerings catch up.


I totally agree, and it's useful to play out the shrinking quality gap over time:

- Today: Financial companies are willing to pay cloud providers for DB, LLM, & AI services, and want to paper over the rest with internal teams + OSS, and maybe some already-trusted contractors for stopgaps. Institutional immune system largely rejects innovators not in the above categories.

- Next 6-18mo: Projects continue, and they hit today's typical quality issues. It's easiest to continue to solve these with the current team, maybe pull in a consultant or neighboring team due to good-money-after-bad, and likely, the cloud/AI provider solves more and more for them (GPT5, ..., new Google Vertex APIs, ..)

- Next year or year after: Either the above solved it, or they make a new decision around contractors + new software vendors. But how much is still needed here?

It's a scary question for non-vertical startups: they only still make sense on the assumption that horizontal data incumbents and core AI infra providers don't continue to eat into this territory (data extraction, vector indexing, RAG as a service, data quality, talk to your data, etc.). Throw in the pressure of VC funding and it's even more fun. I think there's opportunity here, but when I think about who is positioned wrt distribution & engineering resources to get at that... I do not envy founders without those advantages.



