Launch HN: Credal.ai (YC W23) – Data Safety for Enterprise AI
114 points by r_thambapillai on June 14, 2023 | hide | past | favorite | 24 comments
Hi Hacker News! We’re Ravin and Jack, the founders of Credal.ai (https://www.credal.ai/). We provide a Chat UI and APIs that enforce PII redaction, audit logging, and data access controls for companies that want to use LLMs with their corporate data from Google Docs, Slack, or Confluence. There’s a demo video here: https://www.loom.com/share/2b5409fd64464dc9b5b6277f2be4e90f?....

One big thing enterprises and businesses are worried about with LLMs is “what’s happening to my data?” The way we see it, there are three big security and privacy barriers companies need to solve:

1. Controlling what data goes to whom: the basic stuff is putting controls in place around customer and employee PII, but it gets trickier when you also want controls around business secrets, so companies can ensure the Coca Cola recipe doesn’t accidentally leave the company.

2. Visibility: Enterprise IT wants to know exactly what data was shared, by whom, when, and what the model responded with (not to mention how much the request cost!). Each provider gives you a piece of the puzzle in their dashboard, but getting all this visibility per request from either of the main providers currently requires writing code yourself.

3. Access Controls: Enterprises have lots of documents that for whatever reason cannot be shared internally to everyone. So how do I make sure employees can use AI with this stuff, without compromising the sensitivity of the data?

Typically this pain is felt most acutely by Enterprise IT, but also of course by the developers and business people who get told not to build the great stuff they can envision. We think it’s critical to solve these issues: the more visibility and control we can give Enterprise IT over how data is used, the more we can actually build on top of these APIs and start applying the awesome capabilities of the foundation models to every business problem.

You can easily grab data from sources like Google Docs via their APIs, but for production use cases, you have to respect the permissions on each Google Doc, Confluence Page, Slack channel etc. This gets tricky when these systems combine some permissions defined totally inside their product, with permissions that are inherited from the company’s SSO provider (often Okta or Azure AD). Respecting all these permissions becomes both hard and vital as the number of employees and tools accessing the data grows.
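The permission check described above can be sketched roughly like this (all names here are hypothetical, not Credal's actual implementation): a document is readable only if the user appears in the tool's own sharing settings or belongs to an SSO group inherited from the identity provider.

```python
# Toy model of combined per-document permissions: each document carries both
# the emails shared inside the tool itself and the groups inherited from the
# company's SSO provider (e.g. Okta or Azure AD).
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    shared_with: set = field(default_factory=set)  # emails set in the tool itself
    sso_groups: set = field(default_factory=set)   # groups inherited from the SSO provider


def can_read(user_email: str, user_groups: set, doc: Document) -> bool:
    """User may read a doc if shared directly OR via any SSO group."""
    return user_email in doc.shared_with or bool(user_groups & doc.sso_groups)


def visible_docs(user_email: str, user_groups: set, docs: list) -> list:
    """Filter a corpus down to what this user is actually allowed to see."""
    return [d for d in docs if can_read(user_email, user_groups, d)]
```

In a real deployment the hard part is keeping `shared_with` and `sso_groups` in sync with the source systems as they change, which is exactly why this gets harder as employees and tools multiply.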

The current state of the art is to use a vector database like Pinecone, Milvus, or Chroma, integrate your internal data with those systems, and then when a user asks a question, dynamically figure out which bits are relevant to the user’s question and send those to the AI as part of the prompt. We handle all this automatically for you (using Milvus for now, which we host ourselves), including the point and click connectors for your data (Google Docs/Sheets, Slack, Confluence with many more coming soon). You can use that data through our UI already and we’re in the process of adding this search functionality to the API as well.
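The retrieval flow above can be sketched in a few lines (names are illustrative, not Credal's API). A production system would use a vector database like Milvus and a learned embedding model; this toy version fakes embeddings with bag-of-words vectors just to show the shape of the pipeline: embed the question, rank documents by similarity, and splice the top hits into the prompt.

```python
# Minimal sketch of retrieval-augmented generation: rank docs by cosine
# similarity to the question, then build a prompt from the best matches.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list, k: int = 2) -> list:
    """Return the k docs most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Expense policy: meals under $50 are reimbursed automatically.",
    "Onboarding: new hires get laptops on day one.",
    "Travel policy: book flights through the internal portal.",
]
question = "how do I get reimbursed for meals"
context = retrieve(question, docs)
prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQ: " + question
```

The permission problem from the previous paragraph bites right here: the `docs` list fed into `retrieve` must already be filtered to what the asking user may see.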

There’s other schlep work that devs would rather not worry about: building out request-level audit logs, staying on top of the rapidly changing API formats from these providers, implementing failover for when these heavily overburdened APIs go down, etc. We think individual devs should not have to do these themselves, but the foundation model providers are unlikely to provide consistent, customer-centric approaches for them. The PII detection piece is in some ways the easiest - there are a lot of good open source models for doing this, and companies using Azure OpenAI and AWS Bedrock seem less concerned with it anyway. We expect that the emphasis companies place on the redactions we provide may actually go down over time, while the emphasis on unified, consistent audit logging and data access controls will increase.
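To make the PII redaction idea concrete, here is a deliberately minimal sketch. Real systems use trained recognizers (e.g. the open source models mentioned below, such as Microsoft's Presidio) rather than regexes alone, but the interface is the same: text in, redacted text out.

```python
# Toy PII redaction pass: replace emails, SSNs, and US-style phone numbers
# with typed placeholders before the text is sent to an LLM provider.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text: str) -> str:
    """Substitute each detected entity with a placeholder like <EMAIL>."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


print(redact("Call 555-123-4567 or mail bob@example.com, SSN 123-45-6789"))
```

Regexes only cover the structured entities; catching free-form secrets (the "Coca Cola recipe" case) needs something semantic, which is where trained models earn their keep.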

Right now we have three plans: a free tier (which is admittedly very limited but intended to give you a feel for the product), the business plan which starts at $500pm which gets you access to the data integration as well as the most powerful models like GPT 4 32k, Anthropic 100k etc, and an enterprise plan which starts at $5000pm, which is a scaled up version of the business tier and lets you go on-prem (more details on each plan are on the website). You can try the free tier self-serve, but we haven’t yet built out fully self service onboarding for the paid plans so for now it is a “book a meeting” button, apologies! (But it only takes 5 minutes and if you want it, we can fully onboard you in the meeting itself).

When Jack and I started Credal, we actually set out to solve a different problem: an ‘AI Chief of Staff’ that could read your documents and task trackers, and guide your strategic decision making. We knew that data security was going to be a critical problem for enterprises. Jack and I were both deep in the Enterprise Data Security + AI space before Credal, so we naturally took a security first approach to building out our AI Chief of Staff. But in reality, when we started showing the product to customers, we learned pretty fast that the ‘Chief of Staff’ features were at best nice to have, and the security features were what they were actually excited by. So we stripped the product back to basics, and built out the thing our customers actually needed. Since then we’ve signed a bunch of customers and thousands of users, which has been really exciting.

Now that our product is concretely helping a bunch of people at work, is SOC 2 T1 Compliant, and is ready for anyone to just walk up and use, we’re super excited to share it with the Hacker News community, which Jack and I have been avid readers of for a decade now. It’s still a very early product (the private beta opened in March), but we can’t wait to get your feedback and see how we can make it even better!




Congratulations on the launch! (I'm a beta user.) Audit trails, access controls and security certifications are big headaches when developing in regulated industries. Having these already set up has made it easier for us to experiment with and build on LLM APIs.


Many thanks! There is a lot of opportunity for LLMs lurking in regulated industries right now - glad to have given you a boost!


Go Ravin and Jack! We're not at sufficient scale to really get use of this product, but would love to try it down the road. Are you using Foundry for data integration and ACLs?


Thank you! We haven't gone down the Foundry route yet. We do have some smaller scale apps and companies using Credal either as their AI API or chat platform respectively - would be interested to hear a bit about your use case and see if it's a match?


Word- we're in the thick of it but I'll reach out once we're ready to start thinking through bringing in chat.


Stay tuned!! Right now no, but Foundry is definitely under consideration


This seems like a very promising product: an enterprise version of the ChatGPT interface is a large gap in the market.

However, this part of your advertising sounds very dubious: "Credal can be deployed fully on-premise for Large Enterprises, including the large language models themselves"

What do you mean, the LLMs themselves? Open source I can understand, but how are you going to move GPT-4 on-prem? OpenAI is not giving you the weights.


Thanks for your encouragement and that is a totally fair criticism - when we say that, we mean two things:

1. We support using Credal with your own, open source LLM, which can of course be fully on prem in every sense

2. We also support using Credal with your own Azure OpenAI instance. As you say, OpenAI aren't giving us the weights, but many of our customers have procured Azure OpenAI from Microsoft and then we point GPT 4 usage at their Azure instance, meaning that the data never goes to Open AI at all.

One of the things that's going to be really interesting to see moving forward is whether the open source models are going to be able to compete with the blistering pace and funding that the closed source ones - Bard, Claude, and GPT-X (and maybe Mistral?) - are going to be able to attract. For the sake of the industry, I really hope that the OS models catch up, but given the amount of funding (and now, in OpenAI's case, revenue) the closed source models are generating, it's hard to see how that happens


Congrats, I think this will be really successful and you've got a very early foot in the door.

Do you consider self hosted LLMs a competitor of sorts? I suppose your premise is if a company uses Google Docs they will also likely never host internal LLMs, right?


Thanks so much! So about half of our enterprise customers use Credal in conjunction with either a self hosted LLM or Azure OpenAI (which you can debate, but most companies we've spoken to seem to treat their Azure OpenAI instance as equivalent to self hosted). In practice, you still need to:

1. Manage permissions, making sure the self hosted LLM is only reading from the documents, Slack channels, etc. that the end user should actually have access to

2. Generate an audit log of exactly who did what, when

So we actually see self hosted LLMs being a big part of how Credal is used! In the long term, we think Credal will actually become a tool for AI app developers to safely request access to data & embeddings from the enterprise on the fly, and make sure the data they get is appropriately controlled and the audit logs exist in a single place for the enterprise to see what data went to whom/when/why etc
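The "who did what, when" audit log from point 2 might look something like this per request (field names are illustrative, not Credal's actual schema): who asked, what was asked, which documents were injected into the prompt, which model answered, and what it cost.

```python
# Hypothetical shape of a per-request audit record for an LLM call.
import json
from datetime import datetime, timezone


def audit_record(user, question, retrieved_docs, model, response, cost_usd):
    """Assemble one append-only audit entry for a single LLM request."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "question": question,
        "retrieved_docs": retrieved_docs,  # sources injected into the prompt
        "model": model,
        "response": response,
        "cost_usd": cost_usd,
    }


record = audit_record(
    user="alice@example.com",
    question="What is our parental leave policy?",
    retrieved_docs=["HR Handbook (Google Doc)"],
    model="gpt-4-32k",
    response="Parental leave is 16 weeks...",
    cost_usd=0.12,
)
print(json.dumps(record, indent=2))
```

Capturing the retrieved documents alongside the response is the part the provider dashboards can't do for you, since only the middle layer knows which internal sources went into the prompt.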


Going through the SOC2 process myself [0].

As I expected, we're hearing from customers they won't use a product that passes the contents of their database tables into an AI model (although some AI products are doing this). So the problem Credal is solving makes sense. Have you considered building an open source Python package for solving just this bit of the problem?

Any tips on the SOC2? Did you use something like Drata / Vanta?

0 - https://www.definite.app/


Thanks!! There are some fairly good OS models for the core stuff (PII, SSNs, etc.) out there already (Presidio, spaCy), so folks that need an OS option have one to start with. Detecting the more complex stuff can sometimes need a little iteration, but I could definitely imagine a world where we publish that in the future

On SOC 2, we used Drata, and spoke to Vanta, Laika and a few others. The price Vanta initially quoted us was waaaay higher than the other two, and between Laika and Drata we went with Drata mostly because there seemed to be more automation in Drata. In the end, the Drata live support was incredible, and it's hard to imagine how we would have gotten the certification so fast without it. We started our infra on DO, so the most painful part of SOC 2 for us was the migration we did to AWS to take advantage of AWS' many security features. My main advice would be to make full use of the Drata live support (I'd guess Vanta have something similar), but maybe on a deeper level - when you're doing SOC 2, don't focus on the certification: focus on the policies and technology that actually make your company secure. In the end, that's what enterprises really care about, especially the ones that have given us 300-question-long questionnaires!


Nice! How long did it take end-to-end to get the SOC2 Type 1?


Our AWS migration wound up taking about 4 weeks, getting all the policies in place took about 8 weeks (which overlapped with about 2 weeks of the migration), and then the audit itself was a couple weeks as well


Apologies for the shameless plug. I generally don't do this, but I just thought our product might be relevant for the use case you mentioned. We do not compete with Credal, but at Adaptive [1], we have been building a platform that helps with infrastructure access management and allows users to automatically generate and collect evidence, especially for CC5 and CC6 (logical access). Vendor security questionnaires become easy to answer when we, as an organisation, use our product.

We have seen that reproducibility and access auditability in organisations adopting products that access schema and metadata from databases, compute infrastructure, etc. comfort customers. Your customers care about security incidents like unauthorised access, privilege abuse, accidental operations, insider threats, etc. on the vendor's side, which in my opinion are real threats.

[1] http://adaptive.live


Looks awesome and will make many enterprises feel more comfortable using AI.

I suspect your intuition about moving emphasis from redaction to unified access control and audit logging over time is right.

The "AI Chief of Staff" sounds interesting though -- can you share a bit more about what you showed to companies and received lukewarm response to?


The AI Chief of Staff had a few layers. The first was data integration of both productivity data (Slack, Notion, etc.) and "big data" lakes/warehouses. The former tells you what is getting done at a human level and the latter has the potential to tell you whether and how it is working. The second layer was modeling of your business strategy, including dependencies between concepts like projects and teams, which allows us to back out things like stakeholders and early warning recipients for any given progress or problem. The third was a presentation layer allowing humans to get a bird's-eye view of what's happening, including generating artifacts like meeting decks.

Ultimately this 1) wasn't successfully solving an urgent enough problem for most businesses and 2) was too difficult to adopt.

LLMs do break open opportunities in this space so I expect to see some more versions of this, perhaps on top of the Credal API!


This is so sorely needed. I used the app after the PH launch and loved how easy the self-serve was!

Do you have plans to let users define "types" of data that can be redacted (like monetary terms in a contract, code embedded in documents etc)? Also, any plans on making this an API that other developers could build on top of?


Great questions and thanks for trying the product!!

Yup, a few thoughts here - we're exploring using embeddings to let you describe what you want to hide, which will then immediately show you which of your already-synced data (or which previous requests) would be caught by that.

On the API side: yes ABSOLUTELY! The API is already live and used intensely by some of our startup customers like Sourceful. The API docs for using the OpenAI models are here: https://credalai.notion.site/OpenAI-Drop-in-API-0ef7cfd18a7c...

and the Anthropic models here: https://credalai.notion.site/Anthropic-Drop-In-API-ad298f6f7...


That looks like what I need for data privacy for my chat-with-PDF tool Documind. https://documind.chat


Nice! Which AI model are you using for it? If you're using ChatGPT, you can actually use our ChatGPT API and get the PII redaction for free, with hopefully hardly any code changes


Very cool. The demo was really impressive, this feels like it could be a very successful product. All the best!


This looks awesome. Congrats on the launch!


Thanks! :) It feels so surreal to be launching on Hacker News! When I was first discovering tech, the people launching YC funded startups on HN seemed like wizened old Gods to me. Now I laugh about it because obviously I'm still learning so much, even the basics, every day. I hope we get to inspire someone else the way the early YC cos inspired me



