I disagree with their risk ranking matrix. The controversial cell is "Prompt Injections" / "3rd Party LLMs". It says: "Medium risk. While the risk exists, the responsibility of fixing this is on the LLM provider."
No. The responsibility of using a vulnerable 3rd party component is always on you, unless there is a clause in the contract that says otherwise (and even then it might not apply or can be found illegal and void). Case in point: the payment info leak from ChatGPT in Italy was entirely due to a bug in a third-party component, redis-py, used by them.
Also, the concept of owning the LLM is used a lot, but not explained in sufficient detail. I don't see a sufficient distinction between LLMs that are both trained and used in-house and LLMs trained by 3rd parties but with inference running in-house.
I don't follow. If you're using a third-party LLM, there is a risk of prompt injection, and unless there have been advances I haven't heard of, it's not something the provider can fix?
1. I agree with your point that prompt injection can still affect the consumer of a third-party LLM.
2. I prefer to categorize it as a supply chain security issue, since the vulnerability is with a software provider that you are consuming.
(Author of the newsletter here)
It's early days, but the simplest use case has been to improve employee productivity (GitHub Copilot, ChatGPT, etc.). The Stripe CEO just tweeted that over half of their employees are using an internal LLM tool they built (folks who build internal tooling know how hard it is to drive adoption of a non-mandatory tool): https://twitter.com/patrickc/status/1681699442817368064?s=20
There are other companies doing some crazy experimental things which may have a large impact. For instance, Truveta is cleaning up millions of medical records, training a model on that data, and using it to drive research about patient care. Too early to tell if LLMs will actually transform companies beyond slight bumps in productivity, but to me it feels like the cloud computing moment from 12-15 years ago.
> It's early days, but the simplest use case has been to improve employee productivity
Does anyone know if the impact has been properly measured? It’s one thing to say that “developers are more productive” and another to really have faster feature delivery (or any other metric).
How does this address the privacy concerns? OpenAI could still accidentally provide your data to other users, they could retain, sell or misuse it despite agreeing not to, or a rogue employee or an intruder could do so, or they could be forced to do so by a court order, etc.
There are privacy concerns with using ChatGPT, as data is collected by OpenAI unless you opt out. Using the API raises fewer privacy concerns, since API data is not used for training by OpenAI.
Agree, imo this is a big, somewhat underrated advantage - GPT's summarising capabilities seem quite a bit more impressive to me than its generative ability.
Can you elaborate on the problem? I've successfully used LLMs to format unstructured data to JSON based on a predefined schema in a number of different scenarios, yet somehow missed whatever issue you're describing in your comment.
Can you share your techniques or resources you used to get this working? We've got something that works maybe 90% of the time and occasionally get malformed json back from OpenAI.
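For illustration, a minimal sketch of the schema-plus-validation approach described above, assuming the OpenAI Python SDK's JSON mode and the jsonschema package; the model name and schema here are placeholders, not a recommendation:

    import json
    import jsonschema          # pip install jsonschema
    from openai import OpenAI  # pip install openai

    client = OpenAI()

    # Illustrative schema -- the real one mirrors whatever data model you need.
    SCHEMA = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "amount": {"type": "number"},
        },
        "required": ["name", "amount"],
    }

    def extract(text: str, retries: int = 3) -> dict:
        """Ask the model for JSON, validate it against the schema, retry on failure."""
        for _ in range(retries):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder; any JSON-mode-capable model
                response_format={"type": "json_object"},  # forces syntactically valid JSON
                messages=[
                    {"role": "system",
                     "content": "Extract the fields described by this JSON Schema and "
                                "reply with JSON only:\n" + json.dumps(SCHEMA)},
                    {"role": "user", "content": text},
                ],
            )
            try:
                data = json.loads(resp.choices[0].message.content)
                jsonschema.validate(data, SCHEMA)  # reject structurally wrong output
                return data
            except (json.JSONDecodeError, jsonschema.ValidationError):
                continue  # re-ask; malformed or off-schema output is rare but happens
        raise ValueError("model never produced schema-conforming JSON")

JSON mode keeps the syntax valid; the jsonschema check plus a bounded retry loop is what catches the remaining ~10% of structurally wrong responses.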
Just last week I went over 8k lines of data, doing a first applicability analysis, i.e. deciding which lines should be considered for further analysis. The information I needed to do so was hidden in manually created comments, because of course it was; I have never ever seen pre-defined classifications used consistently by people. And those pre-defined classes never cover whatever need one has years later anyway.
Thing is, when I started I didn't even know what to look for. I only knew once I was done, so it would have been almost impossible to explain that to an LLM beforehand. Added benefit: I found a lot of other stuff in the dataset that will be very useful in the future. Had I used an LLM for that, I wouldn't know half of what I now know about that data.
That's the risk I see with LLMs. Already my pet peeve is data scientists with no domain knowledge or understanding of the data they analyze, but at least they know the maths. If part of that is outsourced to a black-box AI that hallucinates half the time, I am afraid most of those analyses will be utterly useless, or worse, misleading in a very confident way...
TLDR: In my opinion LLMs take away the curious discovery of going over data or text or whatever. Which is lazy and prevents us from casually learning new things. And we cannot even be sure we can trust the results. Oh, and we are moving to thinking more about the tool, LLMs and prompts, than about doing the job. Again, lazy and superficial, and a dead-sure way to get mediocre, at best, results.
We're already saving thousands of human hours, with just a dozen people playing with ideas over the last three weeks.
Thanks to data processing that humans were previously going about manually, we have saved $40MM a month. I am quite certain we can save a few hundred million by the end of the year.
We have not even started ingesting our own data yet.
At the company I'm working at, we're looking at LLMs to introduce a guided, interrogatory interface for our university students to manage tricky enrolment scenarios. Basically to translate requests from English into formal course combinations, and to translate back to the student any issues with clashes and dependencies. An over-simplification, but I hope you get the idea.
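To make the idea concrete, a rough sketch of the English-to-course-combination translation step being described, assuming a chat-completion API with JSON output; the catalogue, course codes, and clash rules are invented for this sketch:

    import json
    from openai import OpenAI  # assumed SDK; any LLM API with JSON output works

    client = OpenAI()

    # Invented catalogue: real clash and prerequisite data would come from the
    # student records system.
    CATALOGUE = {
        "MATH101": {"clashes_with": ["STAT110"]},
        "STAT110": {"clashes_with": ["MATH101"]},
        "CS201":   {"clashes_with": []},
    }

    def propose_enrolment(request: str) -> dict:
        """Translate a free-text enrolment request into a formal course combination."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            response_format={"type": "json_object"},
            messages=[
                {"role": "system",
                 "content": "Map the student's request onto course codes from this "
                            'catalogue and reply as JSON of the form {"courses": [...]}:\n'
                            + json.dumps(CATALOGUE)},
                {"role": "user", "content": request},
            ],
        )
        plan = json.loads(resp.choices[0].message.content)
        # Deterministic checks stay outside the model: clashes are verified in
        # ordinary code before anything is shown back to the student.
        for course in plan.get("courses", []):
            if course not in CATALOGUE:
                raise ValueError(f"unknown course {course}")
            for other in plan["courses"]:
                if other in CATALOGUE[course]["clashes_with"]:
                    raise ValueError(f"{course} clashes with {other}")
        return plan

The LLM only does the language-to-structure translation; the clash and dependency rules stay in deterministic code, which is what makes the answers handed back to students checkable.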
At a colleague's company, it receives better customer support feedback than 80% of their human live chat staff, with 30% less churn. It also costs 98% less.