Since there's already an active thread with the correct link at https://news.ycombinator.com/item?id=36881948, I've merged the relevant comments thither. That way we can leave the complaints about the link here and just kill this thread.
No complaint. It's more of a warning about how the main players (OpenAI, LangChain) share notebooks and cookbooks that illustrate how to have LLMs "query" databases directly. At the very least one would expect some language telling people not to do that in production. And it's not unique to SQL, this is just an extreme example.
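For what it's worth, the bare minimum guardrail one would hope those cookbooks mention looks something like this. A rough sketch only: `is_safe_select` is a made-up name, and a real deployment would also use a read-only database role and a proper SQL parser rather than regexes.

```python
# Hypothetical guardrail sketch: before executing LLM-generated SQL,
# reject anything that isn't a single read-only SELECT statement.
import re

def is_safe_select(sql: str) -> bool:
    """Rough allow-list check for LLM-generated SQL.

    This only catches the obvious cases; a production system would
    parse the SQL properly and run it under a read-only DB role.
    """
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # more than one statement
        return False
    if not re.match(r"(?i)^\s*select\b", stripped):
        return False
    forbidden = re.compile(
        r"(?i)\b(insert|update|delete|drop|alter|grant|truncate)\b")
    return not forbidden.search(stripped)

print(is_safe_select("SELECT name FROM users WHERE id = 1"))  # allowed
print(is_safe_select("SELECT 1; DROP TABLE users"))           # rejected
```

Even a check this crude would have made the "LLM drops your table" failure mode harder to stumble into from a copy-pasted notebook.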
> At the very least one would expect some language telling people to not do that in production. And it's not unique to SQL, this is just an extreme example.
In professional communication, is it necessary to repeat the obvious all the time? Does an article in a medical journal or a law journal need to explicitly remind its readers of 101-level stuff? If an unqualified person reads the article, misinterprets it because they don’t understand the basics of the discipline, and causes some harm as a result, how is that the responsibility of the authors of the article? Why should software engineering be any different?
> In professional communication, is it necessary to repeat the obvious all the time?
Based on the “repeat” dev articles I’ve seen on HN over the many years and the “repeat” mistakes replicated in the actual workplace, I think it is necessary.
> Why should software engineering be any different?
I don’t think it is. But also see my point below.
I understand the example you were trying to use but it wasn’t very effective. Dev blogs are not equivalent to medical or law journals in many ways that I don’t need to list. Academic computer science white papers are a bit closer.
Thinking about this more, in my experience and across multiple fields, I always see a phenomenon where colleagues/classmates/whoever reference a _popular_ but _problematic_ resource, which leads to a shitshow.
> Dev blogs are not equivalent to medical or law journals in many ways that I don’t need to list. Academic computer science white papers are a bit closer.
Okay, there are law blogs and medicine blogs too, which are directly comparable to dev blogs. And by that I mean blogs targeted at legal and medical professionals, not blogs on those topics targeted at consumers. For example, BMJ's Frontline Gastroenterology blog [0], whose target audience is practicing and trainee gastroenterologists, and its authors write for their target audience – it is public and anyone can read it, but I don't think the authors spend too much time worrying "what if an unqualified person reads this and misinterprets it due to a lack of basic medical knowledge?"
Or similarly, consider Opinio Juris, the most popular international law blog on the Internet. When a blog post contains the sentence "As most readers will know, lex specialis was created by the International Court of Justice in the Nuclear Weapons Case, to try to explain the relationship between international humanitarian law (IHL) and international human rights law (IHRL)", [1] you know you are not reading something aimed at a general audience.
> but I don't think the authors spend too much time worrying "what if an unqualified person reads this and misinterprets it due to a lack of basic medical knowledge?"
1) You don’t sound too sure about this. Your previous comment sounded like speculation also. Do you actually read these blogs and/or journals?
2) Again, you’re making comparisons that aren’t equivalent. Your argument fails when you replace “unqualified person” with “unqualified target person”. My pizza delivery driver is not reading dev blogs. The junior and senior engineers on my team over the years who passed 5 rounds of interviews yet still make simple but devastating mistakes are reading these blogs.
> lex specialis
1) In your previous comment, you said that medical and law journals _don’t_ explain every basic little thing. And now you provided a quote where the law blog is explicitly explaining a very basic thing even to their _qualified target audience_. If “most readers” already know something, then what’s the point of re-explaining it? You’re proving my point instead.
2) Another comparison that isn’t equivalent. Even if an “unqualified” person were to read a _professional_ law or medical blog/journal, what’s the worst that could happen? Nothing.
The answer to that question above will definitely change if we’re talking about _nonprofessional_ content (e.g. TikTok law and medical advice). Frankly, more dev blogs veer towards the “unprofessional” side than “professional”.
I also am very interested in law. I actually applied to law school once, but didn't get in, and gave up on the idea after that. If they'd accepted me, I might have been a lawyer right now rather than a software engineer. Public international law was always an area of particular fascination for me. I remember being at university, and I was supposed to be at my CS lecture, but instead I was in the library reading books like Restatement (Third) of the Foreign Relations Law of the United States and some textbook (I forget its name now) I found on the EU treaties and ECJ case law. So yes, I do read law blogs sometimes. I went through a period when I was reading SCOTUSblog a lot. Not a blog, but I actually enjoy reading stuff like this: https://legal.un.org/ilc/texts/instruments/english/draft_art...
> And now you provided a quote where the law blog is explicitly explaining a very basic thing even to their _qualified target audience_. If “most readers” already know something, then what’s the point of re-explaining it? You’re proving my point instead.
Even that quoted sentence is assuming the reader already knows what "international humanitarian law" and "international human rights law" are, and what is the difference between them. There are also many cases in that post in which (unlike lex specialis) the author uses technical legal terminology without ever explaining it: for example, his repeated invocation of jus ad bellum, or his mention of the "Inter-American System". Another example is where he cites the Vienna Convention on the Law of Treaties, which assumes the reader understands its significance.
> Even if an “unqualified” person were to read a _professional_ law or medical blog/journal, what’s the worst that could happen? Nothing.
For a medical journal – a person reads an article about using drug X to treat condition Y. They then proceed to misdiagnose themselves with condition Y, and then somehow acquire drug X without having been prescribed it, and start taking it. A person could cause quite serious medical harm to themselves in this way. Reading medical journals can also contribute to the development of illness anxiety disorder – you don't need to be a medical student to develop medical student's disease.
For a law journal - a criminal defendant reads something in a law journal and thinks it helps their case. Their lawyer tries to explain to them that they are misunderstanding it and it isn't actually relevant to their situation, but they refuse to listen. They fire their lawyer and then try to argue in court based on that misunderstanding. It is easy to see how they could end up with a significantly worse outcome as a result, maybe even many extra years in prison.
Conversely, our 10-year-old sometimes writes Python programs. They aren't anything special, but better than I could do at his age. I bet you his Python programs are full of security holes and nasty bugs and bad practices. Who cares, what possible harm could result? And he isn't at the stage yet of reading development blogs, but I've seen him copying code off random websites he found, so maybe he has stumbled onto one of them. My brother is a (trainee) oncologist, but he did an introductory programming course as an undergrad, and he wrote some Python programs in that too, although he hasn't done any programming in years–what harm could his programs have done? If he started trying to modify the software in one of the radiation therapy machines, I'd be worried (but he's too responsible for that); if he decided to try writing a game in Python for fun, why should anyone worry, no matter what the quality of his code is?
Maybe "complaint" was the wrong word but I disagree with the conclusion that LLMs are "not for trustworthy production systems" for the reasons I stated.
Full disclosure, I wrote a blog post called "Text to SQL in Production." Maybe I should add a follow-up covering our guardrails. I agree that they are necessary.
tl;dr: nothing we didn't know. Since the beginning of time, startups with lots of funding have failed for a number of reasons. AI is no different in that regard.
If you try Bard or Claude or character.ai they are not far behind GPT4. They might even be on par in terms of raw LLM capabilities. ChatGPT has better marketing and in some cases better UX. A lot of this is self-fulfilling. We think it's far ahead, so it appears to be far ahead.
> If you try Bard or Claude or character.ai they are not far behind GPT4
Bard is way behind ChatGPT with GPT-3.5, much less GPT-4. Haven’t tried the others, though.
OTOH, that’s way behind qualitatively, not in terms of time of progress. So I don’t think it is an insurmountable lead so much as a big utility gap.
Claude and Character AI are great at holding a conversation, but they lack the ability to do anything specialized that really makes these LLMs useful in my day-to-day life. I ask GPT-4 and ChatGPT questions I would ask on Stack Overflow; I can’t do that with Claude or Character AI. Bard actually seems behind the rest even conversationally.
Thank you! Comparing this and the link the other commenter posted, what handles the actual search querying? Does instructor-xl include an LLM in addition to the embeddings? The other commenter's repo uses Pinecone for the embeddings and OpenAI for the LLM.
My apologies if I am completely mangling the vocabulary here - I have an, at best, rudimentary understanding of this stuff that I am trying to hack my education on.
Edit: If you're at the SF meetup tomorrow, I'd happily buy you a beverage in return for this explanation :)
You first create embeddings. What is this? It's an n-dimensional vector space with your tweets 'embedded' in that space: each item becomes an n-dimensional vector in this space. The vectorization is supposed to preserve 'semantic distance'. Basically, if two items are very close in meaning or related (say, by frequently appearing next to each other in the corpus), they should be 'close' along some of those n dimensions as well. The result at the end is the '.bin' file, the 'semantic model' of your corpus.
For semantic search, you run the same embedding algorithm against the query, take the resulting vector, and do a similarity search against the stored vectors via matrix ops. That yields a result set with similarity scores. Each result points back to the original source, here the tweets, and you just print the tweet(s) that you select from that result set (here the top 10).
Experts can chime in here, but there are knobs such as 'batch size' and the similarity function you use for the index (cosine was used here).
So the performance profile of the process should also be clear: there is a one-time, fixed cost of embedding your data, and then a per-query cost of embedding the query and running the similarity search to produce the result set.
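To make the flow above concrete, here's a toy sketch of the search step. The 3-d vectors are made up for illustration; in practice an embedding model (instructor-xl, etc.) would produce high-dimensional vectors for both the tweets and the query.

```python
# Toy semantic search: cosine similarity between a query vector and
# stored "tweet" vectors, all done with a couple of matrix ops.
import numpy as np

tweets = ["cats are great", "dogs are loyal", "stock market news"]
# Pretend these came out of the embedding model (the '.bin' step).
tweet_vecs = np.array([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.1],
                       [0.0, 0.1, 0.9]])

def cosine_top_k(query_vec, vecs, k=2):
    # Normalize everything, then one matrix product gives all the
    # cosine similarities at once.
    q = query_vec / np.linalg.norm(query_vec)
    v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    scores = v @ q
    top = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in top]

# "Embed" the query with the same (pretend) model, then search.
query_vec = np.array([0.85, 0.15, 0.05])
for i, score in cosine_top_k(query_vec, tweet_vecs):
    print(f"{score:.3f}  {tweets[i]}")
```

The per-query work is just the query embedding plus that matrix product; the fixed cost is building `tweet_vecs` in the first place. Libraries like faiss do the same thing with approximate indexes so it scales past brute force.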