
Not sure why you're getting downvoted. Anything sent to a cloud-hosted LLM is subject to being publicly released or used in training.

Setting up a local LLM isn't that hard, although I'd probably air-gap anything truly sensitive. I like Ollama, but it wouldn't surprise me if it's phoning home.
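For reference, "not that hard" really is a few lines: Ollama exposes a plain HTTP API on localhost, so after `ollama pull llama3` you can query it with nothing but the standard library. A minimal sketch (the model name and default port 11434 are Ollama's documented defaults; everything stays on your machine):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    # Build the POST request for Ollama's /api/generate endpoint.
    # stream=False returns one JSON object instead of a token stream.
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt: str) -> str:
    # Send the prompt to the local Ollama daemon and return its reply.
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

If you're worried about phoning home, run the daemon in a network namespace or firewall it to loopback only and watch the traffic; the API itself never needs to leave localhost.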




This is just incorrect. The OpenAI models hosted through Azure are HIPAA-compliant, and Anthropic will also sign a BAA.


I'm open to being wrong. However, for many industries you're still running the risk of leaking data via a third-party service.

You can run Llama 3 on-prem, which eliminates that risk. I try to reduce reliance on third-party services when possible. I still have PTSD from Saucelabs constantly going down and my manager berating me over it.


You are not technically wrong, because the statement "there is a risk of leaking data" is not falsifiable. But your comment is performative cynicism meant to display your own high standards. For the vast majority of people and companies, standards-compliant services (HIPAA-compliant ones, for example) are private enough.


I know my company outright warns us to not share any sensitive information with LLMs, including ones that claim to not use customer data for training.

I can flip your statement around: for the vast majority of use cases, Llama 3 can be hosted on-prem and will have similar performance.


This is not true. Both OpenAI's and Google's LLM APIs have a policy of not using the data sent over them. It's no different from trusting Microsoft's or Google's cloud to store private data.


Can you link to documentation for Google's LLMs? I searched long and hard when Gemma 2 came out, and all of the LLM offerings seemed specifically exempted. I'd love to know if that has changed.



Thanks very much! I think before I looked at docs for Google AI Studio, but also for Google Workspace, and both made no guarantees.

From the linked document, to save someone else a click:

     > The terms in this "Paid Services" section apply solely to your use of paid Services ("Paid Services"), as opposed to any Services that are offered free of charge like direct interactions with Google AI Studio or unpaid quota in Gemini API ("Unpaid Services").


There's some possible confusion because of the Copilot problem where everything in the product stack is called Gemini.

The Gemini API (or Generative Language API) as documented on https://ai.google.dev uses https://ai.google.dev/gemini-api/terms for its terms. Paid usage, or usage from a UK/CH/EEA-geolocated IP address, will not be used for training.

Then there's Google Cloud's Vertex AI Generative AI offering, which has https://cloud.google.com/vertex-ai/generative-ai/docs/data-g.... Data is not used for training, and you can opt out of the 24 hour prompt cache to effectively be zero retention.

And then there's all the different consumer facing Gemini things. The chatbot at https://gemini.google.com/ (and the Gemini app) uses data for training by default: https://support.google.com/gemini/answer/13594961l, unless you pay for Gemini Enterprise as part of Gemini for Workspace.

Gemini in Chrome DevTools uses data for training (https://developer.chrome.com/docs/devtools/console/understan...).

Enterprise features like Gemini for Workspace (generative AI features in the office suite), Gemini for Google Cloud (generative AI features in GCP), Gemini Code Assist, Gemini in BigQuery/SecOps/etc do not use data for training.
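To make the first of those surfaces concrete: the raw Gemini API mentioned above is a plain REST endpoint on generativelanguage.googleapis.com. A minimal sketch of a `generateContent` call (the model name `gemini-1.5-flash` and the `api_key` parameter are illustrative; check the current docs for what's available under your terms):

```python
import json
import urllib.request

# Endpoint for the Gemini API documented on ai.google.dev, i.e. the
# surface governed by https://ai.google.dev/gemini-api/terms.
ENDPOINT = ("https://generativelanguage.googleapis.com/v1beta/"
            "models/gemini-1.5-flash:generateContent")

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    # generateContent takes a list of "contents", each with "parts".
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

Which terms apply depends on the key and billing setup behind that `api_key`, not on anything in the request itself, which is exactly why the free/paid distinction quoted upthread matters.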



