AI companies are pivoting from creating gods to building products (aisnakeoil.com)
133 points by randomwalker 4 months ago | 195 comments



Companies keep going at it the wrong way. Instead of saying "We have AI, let's find products we can make out of AI!" they should be saying, "What products do people want, let's use whatever tools we have (including maybe AI) to make them."

The idea that a company is an AI company should be as ridiculous as a company being a Python company. "We are Python-first, have Python experts, and all of our products are made with Python. Our customers want their apps to have Python in them. We just have to 'productize Python' and find the right killer app for Python and we'll be successful!" Going at it from the wrong direction. Replace Python in that quote with AI, and you probably have something a real company has said in 2024.


It's the same as all the "we are a blockchain company" startups that popped up looking for a problem to solve with their tech rather than the right way round.

However, a lot of those got a bunch of investment or made some decent money in the short term. Very few are still around. We will see the same pattern here.


I had already forgotten about the blockchain


git still works pretty well, I just wouldn't try to use it as a bank account.


Well, Git's also not a blockchain at least in the way commonly meant by the term. But yeah, it'd be a pretty bad bank account (and an even worse way of doing money transfers.)


Git is by the literal definition a blockchain.


Seems a bit of a stretch. Git is not linear, and there are no consensus mechanisms


git is linear; if multiple users have a different main branch history, you have a problem.

Pull requests in GitHub are actually very similar conceptually to a consensus mechanism used in cryptocurrencies. Everyone has an identical copy of the main branch with an identical history of every commit in order; a PR is saying "I think this commit goes next" and, if you use code reviews, the PR approval is consensus.


This is the most unhinged thing I have ever read about git ever. Please share whatever you are hitting.

Have you ever seen a git graph? Does this look linear? https://tortoisegit.org/docs/tortoisegit/images/RevisionGrap...


Have you considered that the single source of truth, the chain that has consensus in the blockchain terminology, is the main/primary branch?

All secondary branches are works in progress that may be proposed as new commits to main.

Sticking with the blockchain comparison, every side branch in git is akin to potential blocks that miners are working on.


Well, that’s it. I am done with this site.


Hah, sounds good. I still don't get your argument here, I would be curious to hear more.

My point is that git is a data store involving a genesis block (initial commit), blocks of changes/diffs, tracked in sequential order, and with a form of consensus (code reviews and merges to primary).

What is missing that makes it not a blockchain?

And my caveat here, I can't stand arguments for cryptocurrencies and have never purchased any. Blockchain as a concept is fine, and git is a blockchain as best I can tell.
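To make the structural comparison concrete, here is a minimal sketch of the hash-linking idea both systems share (not how git actually stores objects, just the append-only structure):

  import hashlib
  import json

  def make_block(parent_hash, payload):
      # Each entry records its parent's hash, so rewriting any earlier
      # entry changes every hash downstream of it - the same property a
      # git commit (via parent hashes) and a blockchain block (via the
      # previous block hash) both rely on.
      body = json.dumps({"parent": parent_hash, "payload": payload}, sort_keys=True)
      digest = hashlib.sha256(body.encode()).hexdigest()
      return {"hash": digest, "parent": parent_hash, "payload": payload}

  genesis = make_block(parent_hash=None, payload="initial commit")
  second = make_block(genesis["hash"], payload="add README")
  third = make_block(second["hash"], payload="fix typo")

  # The other half of the definition - how competing histories get
  # resolved (consensus) - is what the rest of this thread argues about.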


Let it go


No idea why this got downvoted.

git is very much a blockchain:

- sequential list of changes to a data source (commits)

- single, shared history of changes (main branch)

- users creating potential next change(s) to be added to the history (side branches and forks)

- consensus mechanism for new change blocks (merge requests and code reviews or approvals)

What's missing?


Receiving a coin / token every time someone gets their branch merged? This is a joke btw, please don't make a gitcoin YC....


git blockchains are famous for being easy to fork (-;


You can fork a blockchain by building an ecosystem with a different consensus.


Any blockchain is easy to fork, just have a full copy of the chain and make a new block.


It was likely the same back when the steam engine was invented. Everyone who could start a steam engine company, started a steam engine company. Because learning how to be a steam engine company was difficult, new, and unique. It would be a while before finding all the products that could be sold to people incorporating that new tech.


Tech was very different then. The first commercial steam engine appeared in 1712 - a big static thing that could pump water out of mines. It took about 100 years, until 1804, to get to a steam engine small but powerful enough to pull a train. Mines and factories were pretty much the only users for decades, and there were very few people who had a steam engine for the sake of it


How was tech different then? We've been working on AI for 70 years now, seems comparable.


We’ve been working on digital computers for 70 years and AI is just one part of that. Computers are everywhere and suddenly AI is being crammed into every product, it seems. In the first 90 years of steam, steam engines did not suddenly appear everywhere. Outside of factories and mines, they essentially did not exist. No one had a steam engine in their home. Innovation happened very slowly, no equivalent of Moore’s law, no continuous year on year improvements to the basic tech


The products are different, sure. And there are more people participating now than then. But I'm not convinced that the two are that fundamentally different. Some technologies just take the better part of a century to mature.


I don't entirely disagree with you, but "what products do people want" is overly conservative. Pre-ChatGPT, very few people wanted a (more or less) general purpose chatbot.


> Pre-ChatGPT, very few people wanted a (more or less) general purpose chatbot.

And post-ChatGPT, very few people want to have to deal with "a (more or less) general purpose chatbot."


One of my local car dealerships is using a chat system of some kind (probably an LLM?).

It's awful and a complete waste of time. I'm not sure if LLMs are getting good use yet / general chatbots are good or ready for business use.


But this is like companies forcing self-checkout at retail. Companies get to cheap out on their customer support by paying a monthly fee instead of paying a person to do it. It doesn't matter if it's worse for you, as every business is getting sold on using this technology.

Your misery and wasted time is their improved stock price and bonus package.


This works if you have a dominant market position, like Meta or Amazon, but a business like a local car dealership might be very ill served with this model since their customers can just go elsewhere.


Some of these are just text matchers with hardcoded options, no ML involved. Essentially phone trees, except worse because they don't tell you your options.

There are some more advanced ones using ChatGPT now. I'm guessing they simply pre-prompt it. Can lead to funny results like a customer making the Chevy bot implement an algo in Python.


Who cares. I literally use ChatGPT 30 times a day. It answers incredibly complex queries along with citations I can verify. Isn’t the “this isn’t good enough yet” line getting old? There’s nothing else that can estimate the number of cinder blocks I need to use for a project and account for the volume of concrete required for it (while taking into consideration the actual volume available in a cinder block and settling) with a few quick sentences I speak to it. I can think of literally thousands of things I have asked that would have taken hours of googling that I can get an answer for in minutes.

I think the problem is you haven’t shifted your mindset to using AI correctly yet.

Edit: More everyday examples from just the last 3 days

- Use carbide bits to drill into rocks. Googling “best bits for drilling rocks” doesn’t bring up anything obvious about carbide but it was the main thing chatGPT suggested.

- gave it dimensions for a barn I’m building and asked it how many gallons of paint I would need of a particular type. I could probably work that out myself but it’s a bunch of lookups (what’s the total sq footage, how many sq ft per gallon, what type of paint stands up to a lot of scuffing etc.)

- coarse threaded inserts for softwood when I asked it for threaded insert recommendations. I would probably have ended up not caring, and fine thread slips right out of pine.

- lookup ingredients in a face cream and list out any harms (with citations) for any of them.

- speeds and feeds for acrylic cutting for my particular CNC. Don’t use a downcut bit because it might cause a fire, something I didn’t consider.

- an explanation of relevant NEMA outlets. Something that’s very hard to figure out if you’re just dropped into it via googling.


>Who cares.

clearly anyone trying to buy a car, which is already an ordeal with a human as is.

>I literally use ChatGPT 30 times a day

good for you? I use Google. most of my queries aren't complex.

>Isn’t the “this isn’t good enough yet” line getting old?

as long as companies pretend 2024 AI can replace skilled labor, no. It's getting old how many more snake oil salesmen keep pretending that I can just use ChatGPT to refactor this very hot loop of performance sensitive code. And no ChatGPT, I do not have the time budget (real time) to hook into some distributed load for that function.

I'm sure in a decade it will wow me. But I prefer for it to stay in its lane and I stay in mine for that decade.

>There’s nothing else that can estimate the number of cinder blocks I need to use for a project

is Calculus really this insurmountable feat to be defending big tech over? I'm not a great mathematician, but give them excel/sheets and they can do the same in minutes.

>I can think of literally thousands of things I have asked that would have taken hours of googling that I can get an answer for in minutes.

I'm glad it works out for you. I'm more scrutinous in my searches and I see that about half the time its sources are a bit off at best, and dangerously wrong at worst. 50/50 isn't worth any potential time saved for what I research.

>I think the problem is you haven’t shifted your mindset to using AI correctly yet.

perhaps. But for my line of work that's probably for the best.


"I think the problem is you haven’t shifted your mindset to using AI correctly yet"

There is an indictment of AI "products" if I ever heard one


That’s a glib, low effort dismissal but it makes sense if you consider it.

It’s like people that kept going to the library even with Google around. You’re not playing to the strengths of AI, and relying on whatever suboptimal previous method you used to find the answers. It does really, really well with very specific queries with a lot of lookups and dependencies that nothing else can really answer without a lot of work on your end.


How come this spoon won't hold my soup? Don't tell me I'm holding it wrong!


I mean if my dentist adds a Helpful Super GenAI Infused Chatbot that can't book appointments or answer any questions about their services no amount of "you're holding it wrong" insistence about LLMs in general will actually make it useful to me.

The point is ChatGPT's wild success doesn't automatically mean consumers want and possibly will never want a chatbot as their primary interface for your specific app or service.


I feel like I'm being too obvious but maybe try using it for something it's good at.


> with citations I can verify.

And do you? Every time someone tried to show me examples of “how amazing ChatGPT is at reasoning”, the answers had glaring mistakes. It would be funny if it weren’t so sad how it shows people turning off their critical thinking when using LLMs, to the point they won’t even verify answers when trying to make a point.

Here’s a small recent example of failure: I asked the “state of the art” ChatGPT model which Monty Python members have been knighted (it wasn’t a trick question, I really wanted to know). It answered Michael Palin and Terry Gilliam, and that they had been knighted for X, Y, and Z (I don’t recall the exact reasons). Then I verified the answer on the BBC, Wikipedia, and a few others, and determined only Michael Palin has been knighted, and those weren’t even the reasons.

Just for kicks, I then said I didn’t think Michael Palin had been knighted. It promptly apologised, told me I was right, and that only Terry Gilliam had been knighted. Worse than useless.


I do. It’s not complex to click on the citation, skim the abstract and results and check the reputation of the publication. It’s built into how I have always searched for information.

I also usually follow most prompts with “look it up I want accurate information”


> I also usually follow most prompts with “look it up I want accurate information”

That didn’t work so hot for two lawyers in the news a while back.


A while back when the lawyers used it, chatGPT didn’t do lookups and citation links.


Please don't take this as a glib, low effort answer, but... I am glad you're not an engineer. Taking advice from an LLM on things like outlets, ingredient safety, and construction measurements seems like a mistake.


Did you do the cinder block project? Was its estimate close? From everything I’ve seen LLMs are not that great at math.


Yes, I finished the footings for the barn. I had to get two extra bags on an estimate of 68 bags. Not bad in my opinion considering the wastage from mixing, spilling etc. Also I would have had to do a bunch of tedious math that I didn’t have to.

I had about 5-10 cinder blocks left over, not bad for an order of ~150


It works for me, therefore it should work for everyone?


It’s pretty much in the vein of the GP comment I’m responding to.


>I'm not sure if LLMs are getting good use yet / general chatbots are good or ready for business use.

They left room for the idea that the technology could evolve to be useful. You're simply dismissing anyone who cannot use the technology as is as "using it wrong".

As someone who did a tad of UX, that's pretty much the worst thing you can say to a tester. It doesn't help them understand your POV, it builds animosity between you and the tester, and you're ruining the idea of the test because you are not going to be there to say "you're doing it wrong" when the UX releases. There are zero upsides to making such a response.


This isn't what the other comment you're replying to was talking about.


Not sure if this is a sarcasm or "you are holding it wrong" moment.


There’s skill involved in using these. Of course there is.

That doesn’t make them useless.


depends on the tool and purpose. There was skill in navigating a file system, and now the next generation (obscured from folder systems by mobile) seem to be losing that ability.

You can look at it in two ways, neither are particularly wrong unless your job is in fact to navigate file systems.


Of course, LLMs would be more useful to many more people if they could be used without skill, and were as "useful" as a human mentor.

That's true, and they lack that capability. Many people seem to react as though this means they're missing all value, however. I find them incredibly useful; it just isn't possible to get much value out without investing effort myself.


>LLMs would be more useful to many more people if they could be used without skill, and were as "useful" as a human mentor.

That is partially marketing's fault, so I say that confusion is self inflicted. Because marketing isn't focusing on "make yourself more productive!"


If you’re unaware of how something can help you, learning how can only improve your outcome.


I don't think so, people have been wanting a general chatbot for a long time. It's useful for plenty of things, just not useful when embedded in random places.


I kind of remember the Turing Test was a big deal for some 70+ years.

We should have known that once we pass the Turing Test it would almost instantly become as passe as Deep Blue beating Kasparov on the road to general intelligence.

I am taking a break from my LLM subscriptions right now for the first time to gain some perspective, and all I miss it for is as a code assistant. I would also miss it for learning another human language. It seems unsurprising that large language models' use cases are with automated language. What is really surprising is how very limited the use cases for automated language seem to be.


Stanford researchers said that ChatGPT passes the Turing Test. Honestly I don't understand how, since it's pretty easy to tell that you're talking to it, but yeah I don't think it really matters.

Far more useful than simulating a person, OpenAI managed to index so much information and train their models to present it in a compact way, making ChatGPT better than Google search for some purposes. Also, code generation.


To be fair, the overwhelming feedback appears to be that people don't want a general purpose chatbot in every product and website, especially when it's labelled 'AI'.

So... certainly there's a space for new products.

...but perhaps for existing products, it's not as simple as 'slap some random AI on it and hope you ride the wave of AI'.


Chatbots were HUGE in the late 2010s!


And they hugely sucked, and basically were a sign you were dealing with either a fly by night company or a corporation so faceless and shitty you'd never willingly do business with them.

It was literally replacing a hierarchical link tree and that almost always was easier to use.


Or you were told this one chatbot was “haunted” by a creepypasta character and was amazed that it would actually reference that story (not knowing it was just a silly program trained on its interactions, leading to a feedback loop).


>very few people wanted a (more or less) general purpose chatbot.

I mean, I still don't. But from a cynical business point of view, cutting customer service costs (something virtually every company of scale has) for 99% of customer calls is a very obvious application of a general purpose chatbot.

expand that to "better search engine" and "better autocomplete" and you already have very efficient, practical, and valuable tools to sell. but of course companies took the angle of "this can replace all labor" instead of offering these as assistive productivity tools.


A ton of the industrial revolution was actually motivated by that input-driven thinking. You don't decide you want an Eiffel Tower from first principles, you consider "what is the coolest thing I can make out of wrought iron".


But the Eiffel Tower is an art project, not something of actual utility...


It was actually scheduled to be demolished, but then they added an antenna: https://www.toureiffel.paris/en/news/130-years/why-was-eiffe...

Also: https://www.reuters.com/world/europe/eiffel-tower-grows-six-...

Utility may have been an afterthought, but it's still there.


I blogged about our glorious journey Becoming an AI Company (TM)

https://candid.dev/blog/becoming-an-ai-company/


I was unmoved until I hit "Natural Language™".


I 95% agree, but "what people want" is probably not a strong indicator on the thresholds of paradigm shifts, since people don't know what's possible.


I'd look at the corollary. "what people DON'T want" is a stronger (but still imperfect) indicator of how far you can push the Overton window.

If you can't convince people that this is benefiting them, and instead focus on talking to investors about how much you can kill off the working class (aka your "customers", nowadays your "product audience"), you will make it harder to properly sell either your product or your audience. Companies have forgotten who the real customers are; no wonder their products aren't resonating.


I only partially agree with this. Having spent a lot of time in the “find a problem then the solution” way of working, I’ve found the solutions are often too tame and lack innovation.

When you’re truly bringing novel new value to things, sometimes you need to say “we can do this cool thing, but don’t know what that means”. Simply knowing that capability opens you up to better sets of solutions.


What's wrong with tame and lack of innovation? Sometimes people just need to get things done. There are lots of businesses with basic needs that aren't being met.


Especially amongst programmers the feeling of constantly pushing boundaries and grinding to a new unexplored territory is far more interesting than working on practical “tame” solutions. See for instance the amount of people who spend tons of time hacking at and opening up platforms for GNU/Linux that are poor fits for the OS. It would make far more practical sense for say Marcan to spend his time working on the Linux kernel more directly than playing around with Apple devices that are made to burn out in a year and become ewaste. However, he feels more satisfaction working on that “new frontier”, regardless of its practicality.


AI is trending right now. The most important thing for new companies is finding investors, and those people have been throwing cash at any company with AI.

Customers are also more interested in AI products. The tech industry has stagnated for years with incremental improvements on existing products. ChatGPT and generative AI are new capabilities that draw interest, and companies have been doing anything they can to stand out today.


The market is sorting itself out right now, and eventually the wheat will get separated from the chaff.

Every cycle, there are all types of people hopping on board whatever the hype train is... it's the same mindset as prospecting for gold in the wild west.

I just hope we can move along more in the "wheat" direction with AI products. There's so much low-effort crap already out there.


There's still a lot of real work to be done knowing what can be built and operated profitably, because the underlying tech is so new.

So just zooming out, we need people trying to figure out what can be built with this Lego set. We also need people like you're saying to work the other side so everyone can meet in the middle.


This has been the case for decades. Look at the internet and .com’s. Mobile. Etc.


It really is one of the more effective ways to identify a bubble when companies shift to selling themselves on the tools and technology they use rather than the problem they are solving.


You are forgetting marketing is temporal. Fifteen years ago you could sell your software as the Cloud version of a legacy app. Right now, there's a window that being the AI version will get you a call.


That requires that you understand the capabilities and limitations of the tech way better than anyone does. So instead "let's see what we can do with this" is the underlying approach.


Might be a good philosophy for a hobbyist, not (usually so) for a business.


For VC funded startups, this is exactly the business model. Each company is a different lottery ticket for them.


A Python company is too specialized, but software companies are a thing. Maybe AI will be another tool for software companies.


To be fair, Astral is the Python company and thank god they are. I love ruff and uv


People want a faster horse


Also one that flies, fueled by atomic power.


if you're some Python contractor company, the angle makes sense. but of course, very few AI companies are out there trying to help others solve problems.


this is how things evolve. Everything was a .com company when the internet started going mainstream; then real product and service providers were left standing


Ehhh it’s a spectrum. First you innovate, then you commercialise. Even Google took a few years to successfully monetise and they weren’t the first mover in web search. LLMs have been around for, what, coming up on three years? Probably two to four more years to see results.


I’m seeing a lot of meh products that take like 4 units of effort to integrate. I think multiple LLMs, deeply integrated into a cohesive product with 100+ effort units, that can be great. An AI that’s familiar with the use of every settings menu on windows would be awesome


I'm not so sure. When a technological wave is big enough, it seems reasonable to start by asking: "what business can be built on this exponential wave?" This is contrary to standard YC advice (make something people want right away, don't create a solution in search of a problem) but empirically a lot of big companies started this way:

- Bezos saw the growth rate of the internet, spent a few months mulling over the question: "what business would make sense to start in the context of massive internet adoption" and came up with an online bookstore.

- OpenAI's ChatGPT effort really began when they saw Google's paper on transformers and decided to see how far they could push this technology (it's hard to imagine they forecasted all the chatbot usecases; in reality I'm sure they were just stoked to push the technology forward).

- Intel was founded on the discovery of the integrated circuit, and again I think the dominant motivation was to see how far they could push transistor density with a very hazy vision at best of how the CPUs would eventually be used.

I think the reason this strategy works is that the newness of a truly important technology counteracts much of the adverse selection of starting a new business. If you make a new To-Do iPhone app, it's unlikely that people have overlooked a great idea in that space over the last 10 years. But if lithium ion batteries only just barely started becoming energy dense enough to make a car, there's a much more plausible argument why you could be successful now.

Said another way: "why hasn't this been done before?" (both by resource-rich incumbents as well as new entrants) is a good filter (and often a limiting one) for starting a business. New technological capabilities are one good answer to this question. Therefore if you're trying to come up with an idea for a business, it seems reasonable to look at new technologies that you think are actually important and then reason backward to what new businesses they enable.

Two additional positive factors I can think of:

1. A common dynamic is that a new technology is progressing rapidly but is of course far behind traditional solutions at the outset. Thus it is difficult to find immediate applications, even if large applications are almost guaranteed in 10-20 years. Getting in early - during the borderline phase where most applications are very contrived - is often a big advantage. See Tesla Roadster (who wants a $100k electric sports car with 200mi range and minimal charging network?), early computers (what is the advantage of a slow machine with no GUI over doing work by hand?), and perhaps current LLMs (how valuable is a chatbot that frequently hallucinates and has trouble thinking critically in original ways)? It's the classic Innovator's Dilemma - we overweight the initial warts and don't properly forecast how quickly things are improving.

2. There is probably a helpful motivational force for many people if they get to feel that they are on the cutting edge of technology that interests them and building products that simply weren't possible two years ago.


You're suggesting the boring business way to do things. The tech ecosystem is full of startups doing that ridiculous thing you said: chasing the hot new thing and raising huge amounts of money off the hype. This AI hype cycle is really bad, and before that we had cryptocurrency.


> But when developers put AI in consumer products, people expect it to behave like software, which means that it needs to work deterministically. If your AI travel agent books vacations to the correct destination only 90% of the time, it won’t be successful.

This is the fundamental problem that prevents generative AI from becoming a "foundational building block" for most products. Even with rigorous safety measures in place, there are few guarantees about its output. AI is about as solid as sand when it comes to determinism, which is great if you're trying to sell sand, but not so great if you're trying to build a huge structure on top of it.


I've made this statement a bunch in other mediums: The reason AI software is always "AI software" and not just a useful product is because AI is fallible.

The reason we can build such deep and complex software systems is because each layer can assume the one below it will "just work". If it only worked 99% of the time, we'd all still be interfacing with assembly, because we'd have to be aware of the mistakes that were made and deal with them, otherwise the errors would compound until software was useless.

Until AI achieves the level of determinism we have with other software, it'll have to stay at the surface.


Recent work from Meta uses AI to automatically increase test coverage with zero human checking of AI outputs. They do this with a strong oracle for AI outputs: whether the AI-generated test compiles, runs, and hits yet-unhit lines of code in the tested codebase.

We probably need a lot more work along this dimension of finding use cases where strong automatic verification of AI outputs is possible.
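A rough sketch of what that kind of fully automatic filter looks like (the helper names here are hypothetical stand-ins, not Meta's actual TestGen-LLM API): a generated test is only kept if it builds, passes repeatedly, and raises coverage, so nothing ever depends on trusting the LLM's output directly.

  def keep_candidate(test_source, build_test, run_test, coverage_with):
      # build_test / run_test / coverage_with are placeholders for the
      # project's real build system and coverage tooling.
      if not build_test(test_source):                   # must compile/build
          return False
      if not all(run_test(test_source) for _ in range(5)):
          return False                                  # must pass reliably, not flakily
      return coverage_with([test_source]) > coverage_with([])  # must hit new lines

  # Generate many candidates and keep only the survivors; rejected ones
  # are discarded unread, so hallucinated or broken tests cost nothing.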


> with zero human checking of AI outputs

It can be hard enough for humans to just look at some (already consistently passing) tests and think, "is X actually the expected behavior or should it have been Y instead?"

I think you should have a look at the abstract, especially this quote:

> 75% of TestGen-LLM's test cases built correctly, 57% passed reliably, and 25% increased coverage. During Meta's Instagram and Facebook test-a-thons, it improved 11.5% of all classes to which it was applied, with 73% of its recommendations being accepted for production deployment by Meta software engineers

This tool sounds awesome in that it generated real tests that engineers liked! "zero human checking of AI outputs" is very different though, and "this test passes" is very different from "this is a good test"


Good points regarding test quality. One takeaway for me from this paper is that you can increase code coverage with LLMs without any human checking of LLM outputs, because it’s easy to make a fully automated checker. Pure coverage may not be super-interesting but it’s still fairly interesting and nontrivial. LLM-based applications that run fully autonomously without bubbling hallucinations up to users seem elusive but this is an example.


You hit the nail on the head. It's been almost tragically funny how people frantically tried to juggle 5 bars of wet soap over the last 2 years, solving problems that (from what I've seen so far) have already been solved in a (boring) deterministic way consuming much less resources.

Going further, our predecessors put so much work into getting non-deterministic electronics together providing us with a stable and _correct_ platform, it looks ridiculous how people were trying to squeeze another layer of non-determinism in between to solve the same classes of problems.


The irony here is that there are many domains using statistical methods, that bound the complexity and failure modes of statistical methods successfully. A lot of people struggle with statistics but in domains where the glove fits I think AI will slot in all across the stack really nicely.


But software works only 99% of the time. For some definition of work: 99% of days it's run, 99% of clicks, 99% of CPU time in given component, 99% of versions released and linked into some business' production binary, 99% of github tags, 99% of commits, 99% of software that that one guy says is battle-tested


If twenty components work 99% of the time, then they only have a 0.99^20 ≈ 82% chance of working as a collective.

If your 5.1 GHz (billion instructions per second) CPU had a 0.00000001% chance of failing at a given instruction, you'd have a 40% chance of a crash every second.

If a flight had a 1% chance of killing everyone aboard, then with 10 million people flying per day, 1% of 10 million = 100,000 people would die every day from plane crashes.
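The arithmetic behind those three numbers, for anyone who wants to check it:

  # Twenty independent components, each 99% reliable:
  print(0.99 ** 20)                        # ~0.818, i.e. ~82%

  # 5.1 billion instructions/second, 0.00000001% failure per instruction:
  p = 0.00000001 / 100                     # convert the percentage to a probability
  print(1 - (1 - p) ** 5.1e9)              # ~0.40, i.e. ~40% chance of at least one fault each second

  # 10 million passengers/day, each with a 1% chance of a fatal flight:
  print(10_000_000 * 0.01)                 # 100,000 deaths per day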


Gambler's fallacy


Software works so much more than 99% of the time that it's a rather deliberate strawman to claim otherwise.

Newly-"AI"-branded things that I have touched work substantially less than 90% of the time. There are like 3 orders of magnitude difference, even people who aren't paying any attention at all are noticing it.


Do you have to write your code presuming that sometimes 'a + b' will be wrong? I don't.

Software pretty much always "works" when you consider the definition of work to be "does what the programmer told it to". AI? Not so much.


It’s all about limits and edge cases. a+b may “fail” at INT_MAX and at 0.1+0.2. You don’t `==` your doubles, you don’t (a+b)/2 your mid, and you don’t ask ai to just book you vacation. You ask it to “collect average sentiment from `these_5k_reviews()` ignoring apparently fake ones, which are defined as <…>”. You don’t care about determinism because it’s a statistical instrument.
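Two of those edge cases, made concrete (Python floats show the 0.1 + 0.2 problem directly; the midpoint overflow only bites with fixed-width integers, so it's noted in a comment rather than demonstrated):

  import math

  # Doubles: never compare with ==; use a tolerance.
  print(0.1 + 0.2 == 0.3)               # False (0.1 + 0.2 is 0.30000000000000004)
  print(math.isclose(0.1 + 0.2, 0.3))   # True

  # Midpoint: with 32-bit ints, (a + b) / 2 can overflow near INT_MAX,
  # so the safe idiom is a + (b - a) / 2. Python ints are arbitrary
  # precision, so this shows the habit rather than the failure.
  a, b = 2_000_000_000, 2_100_000_000
  print(a + (b - a) // 2)               # 2050000000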


> and you don’t ask ai to just book you vacation. You ask it to “collect average sentiment from `these_5k_reviews()` ignoring apparently fake ones, which are defined as <…>”.

That's exactly my point. You have to interact directly with the A.I. and be aware of what it's doing.


That's not true. If software works correctly today then users can expect it to work correctly tomorrow. If it doesn't work any more that's a bug.


structured outputs help, paired with regular old systems design I think you can get pretty far. it really depends what you're building though.

>If your AI travel agent books vacations to the correct destination only 90% of the time

that would be using the wrong tool for the job. an AI travel agent would be very useful for making suggestions, either for destinations or giving a list of suggested flights, hotels etc, and then hand off to your standard systems to complete the transaction.

there are also a lot of systems that tolerate "faults" just fine such as image/video/audio gen
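A minimal sketch of that suggest-then-hand-off split, with `ask_llm`, `user_confirms`, and `book_trip` as hypothetical stand-ins: the model only ever produces a structured suggestion, and the irreversible part stays in ordinary deterministic code behind an explicit confirmation.

  import json

  REQUIRED = {"destination", "depart_date", "return_date", "max_budget_usd"}

  def parse_suggestion(raw):
      # Validate the model's JSON before anything is acted on; malformed
      # or incomplete output is dropped (and can simply be retried).
      try:
          data = json.loads(raw)
      except json.JSONDecodeError:
          return None
      return data if REQUIRED.issubset(data) else None

  # ask_llm / user_confirms / book_trip are placeholders for the real
  # model client, UI, and booking backend.
  suggestion = parse_suggestion(ask_llm("Suggest a week in Mexico under $2000 as JSON"))
  if suggestion and user_confirms(suggestion):
      book_trip(suggestion)   # plain, testable transaction code from here on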


> that would be using the wrong tool for the job. an AI travel agent would be very useful for making suggestions

But that’s a recommendation engine and we have that already all over the place.


We have lists with shallowly gamed results all over the place, which work in owners/bots favor, not yours. You can’t expect something not running on your device (or on a gpu rented from a third party) to work in your interest.


And hopefully a real recommendation engine won't be weirdly biased towards different answers depending on the exact phrasing, tone, and idiom of the request.


Yeah how am I supposed to raise millions of dollars without a working product by selling yesterday’s tech??


i 100% agree. people get so caught up on trying to do everything 90% right with AI, but they forget there's a reason most websites offer at least 2 9's of uptime.


I’m not really sure what your stance is here, because you say you agree with the GP but then throw out some figures that clearly disagree with the author's point (99% uptime is vastly greater than 90% accuracy).


> If your AI travel agent books vacations to the correct destination only 90% of the time, it won’t be successful.

Well, I don't agree. I think there are ways to make this successful, but you have to be honest about the limitations you're working with and play to your strengths.

How about an AI travel agent that gets your itineraries at a discount with the caveat that you be ready for anything. Like old, cheap standby tickets where you just went wherever there was an empty seat that day.

Or how about an AI Spotify for way less money than current Spotify. It's not competing on quality, it can't. Occasionally you'll hear weird artifacts, but hey it's way cheaper.

That could work, imo


We've had good, free (non ai) media recommendation tools in the past and they got killed by licensing agreements.

AI is creating a post-scarcity content economy where quality is going to be the only driver of value.

If you are the rights holder of any premium human created media content you are not going to let a 'cheap' AI tool get access to recommend it out to people.


The AI travel agent is trivial to solve though. It's the same as the human travel agent. Put the plan and pricing together, then give it to the user to sign and accept. Do it in an app, do it in an email, do it on a piece of paper, whatever floats your boat, but give them something they can review and accept instead of trying to do everything verbally or in a basic chat interface.

I'm not disagreeing with the "needs to work deterministically" -- there is a need for that, but this is a poor example. "Hey robot, plan a trip to Mexico" might still save me time overall if done right, and that has value.


It just needs to beat all the other non-deterministic processes at accuracy.

Call centre workers are often dreadfully inaccurate as well. Same with support engineers.

Heck even for banking, there are enormous teams fixing every screw up made by some other employee.


I have a question for folks working heavily with AI blackboxes related to this - what are methods that companies use to test the quality of outputs? Testing the integration itself can be treated pretty much the same as testing around any third-party service, but what I've seen are some teams using models to test the output quality of models... which doesn't seem great instinctively


Take this with a grain of salt because I haven't done it myself, but I would treat this the same as testing anything that uses some element of random.

If you're writing a random number generator that generates numbers between 0 and 100, how would you test it? Throw your hands up in the air and say nope, can't test it, it's not deterministic! Or maybe you can just run it 1000 times and make sure all the numbers are indeed between 0 and 100. Maybe count up the number frequencies and verify it's uniform. There's lots of things you can check for.

So do the same with your LLMs. Test it on your specific use-cases. Do some basic smoke tests. Are you asking it yes or no questions? Is it responding with yes or no? Try some of your prompts on it, get a feel for what it outputs, write some regexes to verify the outputs stay sane when there's a model upgrade.

For "quality" I don't think there's a substitute than humans. Just try it. If the outputs feel good, add your unit tests. If you want to get scientific, do blind tests with different models and have humans rate them.


But a knowledgeable human can take the itinerary and run with it. I know I’ve done that with code enough from AI generated stuff, it’s basically boilerplate. You still run it through the same tests, reviews, and verification as you would have had to do anyway.


And yet, generative AI also seems to be poor at randomness. When I asked Google Gemini for a list of 50 random words, it gave me a list of 18 unique words, with 16 of them repeated exactly 3 times.

Abyss: 1 Ambiguous: 3 Cacophony: 3 Crescendo: 3 Ephemeral: 3 Ethereal: 3 Euphoria: 3 Labyrinth: 3 Maverick: 3 Melancholy: 3 Mellifluous: 3 Nostalgia: 3 Oblivion: 3 Paradox: 3 Quixotic: 1 Serendipity: 3 Sublime: 3 Zenith: 3


Randomness is difficult. I wouldn't expect any LLM to be able to reliably produce random anything, except in the cases where they have access to tools (ChatGPT Code Interpreter could use Python's random.random() for example).
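Which is the sensible division of labour anyway: let the model write or call the code, and let a real RNG do the sampling. For example (assuming a word list like the one shipped on most Unix systems):

  import random

  # /usr/share/dict/words is present on most Unix systems; any word list works.
  with open("/usr/share/dict/words") as f:
      words = [w.strip() for w in f if w.strip().isalpha()]

  print(random.sample(words, k=50))   # 50 draws, all unique by construction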


Okay, but repeating words?


Makes sense to me: It's not understanding and complying with your request, it's doing statistics to guess what kind of follow-up would be most common.


Are you using regular or pro? As pro has no issues with this task.


Regular user. For useful tasks, how much of a difference does the Pro plan make?


Nowhere near as good as ChatGPT 4o or Claude (in not one case have I had it outperform those other two), but at least it can do math and data science correctly most of the time compared to the regular model.

I use it as a secondary when the other two are chewing on other tasks already.

I only own it as I am an outrageously heavy consumer of LLMs for all sorts of little projects at once and they all seem to pause one window if you use another.


Instead of pivoting, can this behaviour be explained by trying lots of different things and then iterating on the ones that show promise?

It's all well and good to say "Make something people want" but for anything that people want usually one of three things is true

1. Someone else is already making it.

2. Nobody knows how to make it.

3. Nobody knows that people want it.

People experimenting with 2 and 3 will have a lot of failures, but the great successes will come from those groups as well.

Sure, every trend in business has a lot of companies going "we should do this because everyone else is." It was a dumb idea for previous trends and it is a dumb idea now. Consider how many companies did that for the internet. There were a lot of poorly thought out forays into having an internet presence. Of the companies still around, pretty much all have an internet presence now that serves their purposes. They have transitioned from "because everyone else is" as their motivation to "We want specific abilities x, y, & z."

Perhaps the best way to get from "everyone else is doing it" to knowing what to build is to play in the pool.


That's exactly what these companies are doing. They're trying a lot of different ideas, and seeing what sticks. The problem is that they're annoying users and causing distrust.


I’m building an integration platform. There’s a thousand ways to deeply embed AI throughout it, both to build integration workflows faster, and to help us build smarter API wrappers faster.

But AI has always been a secondary augmentation to the product itself. It’s a tool, it shouldn’t be the other way around.


ChatGPT is very useful for me to the point that I pay subscription fee. To me it IS the product.


I haven't found a use for it. What do you use it for?


I find it to be orders of magnitude more helpful at getting me started on the research journey when I don't know how to formulate the question of what I'm researching.

I find it useful for:

  * throwing ideas at a wall and rubber-ducking my emotional state and feelings.
  * creating silly, meme images in strange circumstances, sometimes.
  * answering simple "what's the name of that movie / song / whatever" questions
Is it always right? Absolutely not. Is it a good starting point? Yes.

Think of it like the school and the early days of Wikipedia. "Can I use Wikipedia as a source? No. But you can use it to find primary sources and use those in your research paper!"


It is my new Google, ever since Web Search and the Web itself stopped working as a source of information.

When I look for answers to specific questions, I either search Wikipedia, or ask ChatGPT. "Searching the Internet" doesn't work anymore with all the ads, pop-ups and "optimized" content that I have to consume before I get to find the answers.


Hmm, ChatGPT doesn't seem to provide accurate information often enough to be trustworthy. Have you tried an ad blocker and another search engine like DDG/Bing? Starting with Wikipedia is a good approach too.


It's outrageously good at translation. I can set it to record next to my wife and her grandma speaking in Korean, through phone speakers, and it creates a perfect transcription and translation. Insane.


How are you checking that the translation is correct?


Wife is next to me and checks after. Never had a single wrong word.


what is the delay? i imagine translation specific models would be better?


Like a minute, it's not real time. You click record, let it transcribe then ask it to translate


Everything!

It's like talking to an intelligent person about a topic you want to learn, and they know it well enough to teach you if you keep asking questions.


I've found it to be like talking to someone who pretends to know things, but lacks understanding when you probe further.


Probe for summarised information, don't outsource your thinking to AI.

Like even when you are writing code. Describe the solution and ask AI to write it, don't specify the requirements to AI and hope it will write it.


For many things in general, and IT-related tasks in particular.

a) Write me a shell script which does this and that. b) What Linux command with what arguments do I call to do such and such a thing? c) Write me a function / class in language A/B/C that does this and that. d) Write me a SQL query that does this and that. e) Use it as a reference book for any programming language or whatever other subject.

etc. etc.

The answers sometimes come out wrong and/or contain non-trivial bugs. Being an experienced programmer, I usually have no problem spotting those just by looking at the generated code or running a test case created by ChatGPT. Alternatively there are no bugs but the approach is inefficient; in that case, pointing out why it is inefficient and what to use instead gets ChatGPT to fix the approach. Basically it saves me a shit ton of time.


Do you have any strategy for prompting? I've found I need to spend a lot of effort coaxing it to understand the problem. Hallucinations were pretty common, and it led me down a couple of believable but non-viable paths. Like coding an extension for Chrome, but doing things the permission system will not allow.


I use it to auto generate responses to ridiculous rhetorical questions on hacker news


It gives amazing medical advice and answers, way better than WebMD or frankly even most primary care physicians I've seen.


Are you a doctor? On what basis are you evaluating medical advice?


Whether or not it works!


Terrifying. You must like living dangerously.


I am lucky enough to have several family members who are MDs and am close to a few more.

They google crap all the time. They're as unaware of things as we are.

The tests they have access to are much better than anything we can get our hands on though.


They are NOT as unaware of things as we are. That’s like someone seeing a software developer googling stuff and saying “see, they don’t know much more than me”.

An expert refreshing their knowledge on Google is not the same as a layman learning it for the first time. At all.


How do you know that? Are you a doctor?


Ask ChatGPT -> get advice -> double-check on Google -> try advice -> my medical issue is solved

This has happened to me 3-4 times, hasn't sent me wrong yet. Meanwhile I've had doctors misdiagnose me or my wife a bunch of times in my life. Doctors may have more knowledge but they barely listen, and often don't keep up with the latest stuff.


This is one thing people who shit on chatbots don't get. It doesn't need to be god to be useful, it just needs to beat out a human who is bored, underpaid, and probably under qualified.


Hope it will never hallucinate on you. Doctors will start to need to warn against DoctorGPT MD now, not just Doctor Google MD.

Maybe I'm a low-risk guy but I would never follow a medical solution spit out by an LLM. First, I might be explaining myself badly, hiding important factors that a human would spot immediately. Then, yeah, the hallucinations issue, and if I have to double check everything anyway, well, just trust a (good) professional.


Yeah, ChatGPT itself is amazing. What I don't understand is, why are other companies paying so much for training hardware now? Trying to make more specialized LLMs now that ChatGPT has proven the technology?


Just a heads-up. I'm interested in your online workshop link, but it's private.

https://sites.google.com/princeton.edu/agents-workshop


Google has been productizing AI for a while now. 2021 Pixels have the Tensor SoC which was explicitly marketed as an AI chip. Chatbots weren't part of the equation back then, but offline image translation, magic eraser, etc certainly were.


When I see “AI” in the product description of something I’m almost immediately turned off. It’s plastered everywhere for most tech companies now and doesn’t mean anything practically, despite trying to sound like a differentiator.


While I don't like the blog title, many things said in there rang true for my company (MoveAI.com). We are building an AI-powered moving concierge that can orchestrate your relocation experience end-to-end.

We initially were developing a system that we had hoped could handle everything and eject any workflow issues to a human so the operations team could kick the machine. We were hoping to avoid an interface all together on the customer side.

After a few versions and attempts at building this system, we moved towards a traditional app where we focused on building a product people wanted and automate parts of it over time. But even the parts we automated needed an interface for customers to spot check our work. So we found a great designer.

...Before we knew it, we were building a traditional company, with some AI. The company is doing well and people love what we're building, but it's different than we imagined.

We still believe in the long term vision and promise of the technology, but the article is right, this isn't going to be an overnight process unless some new architecture emerges.

In the meantime, we're focused on helping people get from A to B easily using whatever means necessary, because moving f**ing sucks. If you're moving soon or know anybody who is, we'd be happy to help them. -P


Because you need to make money to get there


People who claim AI is just snake oil are the farthest from reality.


I'll believe it when I see it.


I guess they've worked out making money is an important part of any business?


Their moat has evaporated on the B2C side--no friction, plenty of alternatives, overly generous free tier--and B2B is freaked out about non-local usage.


[flagged]


> Imagine not understanding that their main way of doing money is through their API for other companies, and not through a product.

Or, more to the point: Their primary product is B2B, not B2C.


There's no shortage of valuable B2B products to be built through an AI API.


In principle, yes.

In practice, it's a little more tricky. AI APIs give mostly human-level reliable results in many cases. But they don't work well 100% of the time, which affects trust in an automated B2B product meant to replace human labor.

There are still applications that aren't meant to replace human labor, just to generate natural language very quickly. I've even seen this done in academia. And also, the API AIs are expected to become more reliable over time.


Reliability is much more possible than it was 1-2 years ago.

There was an article early on I'm trying to find that said the direction of attention was counter-intuitive this time around.

Some of this is in the weeds, but those who have kept up actively trying to use the AI for 12-18 months are often saying and seeing different things because they know what capabilities have arrived while some of us hang on to the understanding from the last time we might have looked at something.

Where I think I see things differently, both in what I've been able to show and deliver, is that the application of this technology needs to come before, or right along with, the development of it. This has largely been overlooked so far.

Learning how to use something new and pretty different requires trying to use it, see what it can do reliably, and what it's improving at, and helping organizations with $ to fund something they need.

The big one still, is that something that's not possible today, has to be looked at more through a lens of not if, but when. The when has been happening much quicker.

They can get much closer to working well 100% of the time.

Often by not using them at each step, or not wanting a magic wand to figure it all out.

Human labor replacement isn't anything new. Anything simple or repetitive technology in general (including LLMs) will make a dent.

The applications for natural language, or not, are quite valid. I had someone in academia approach me about designing a guide to help grade in the personalized style of a professor. The irony was there are ways to do it without an LLM that might equal what is possible today, but what they were asking for was valid.

One of the big issues that's improving is not just API "reliability", but the cost of running things per request off the API, or reasonably running some or part of it locally.


If other people build it using OpenAIs API that's still B2B from OpenAI's point of view though.


And at release ChatGPT was meant as a marketing gimmick. A fun way to interact with a slightly finetuned version of GPT3.5 to showcase how good their models had become.

If anything it's remarkable how much they leaned into this success, building an iOS and Android app, speeding up the models, adding a premium plan, lots of new features, and eventually deprecating their text-completion mode and going all in on chat as the interaction mode for their LLMs.


Meanwhile Amazon will host Llama and other models in AWS (which you are already using) at reasonable rates.


Their numbers aren't public, but I'm not 100% certain that they're making significantly more money through the API than they are through paid subscribers to their products.

They have a LOT of paid subscribers, and they're signing big "enterprise" deals with companies that have thousands of seats.


I believe I read somewhere that it’s estimated only ~20% of their revenue is via the API.

Found a similar source[0] saying 15% estimated from API use.

[0] https://www.notoriousplg.ai/p/notorious-openais-revenue-brea...


Of course, you need to have companies that build products on top of it, and that takes time. So I would not be surprised if, in the first few years, subscriptions earn more than API usage. But in the long term, if OpenAI stays the top AI model, they will earn massive money through API calls.


How do you know they have a lot of paid subscribers if their numbers aren't public?

I'd guess that outside of a core "fanbase" / early adopter type they don't have that many subscribers.


Rumors. The Information had a story about this (but it's behind a paywall): https://www.theinformation.com/articles/openais-annualized-r... - their story was based on "people with knowledge of the situation" aka insider leaks.

I don't know the source for https://www.notoriousplg.ai/p/notorious-openais-revenue-brea... but it could just be that same Information article.


They leak like a sieve to their trusted testers.


> Imagine not understanding that their main way of doing money is through their API for other companies, and not through a product. They are focused on doing something they are good: good AI models, they let other companies take the risk to build product on top of it, and reap benefits from theses products.

There is no moat in an API-gated foundation model. One LLM is as good as any other, and it'll be a race to the bottom.

The only way to mint a new FAANG is to build a platform that captivates and ensnares the populace, like iPhone or Instagram.

The value in AI will be accrued at the product layer, not the ML infra tooling, not the foundation model. The product layer.

It might be too late to do this with LLMs and voice assistants, though. OpenAI is super distracted, and there's plenty of time for Google, Meta, and Apple to come in and fill the void.

Everyone was too busy selling the creation of gods, or spreading FOMO to elevate themselves to lofty valuations. At the end of the day, business still looks the same as it always has: create value for customers, ideally in a big market where you can own a large slice. LLMs and foundation models are fungible and easy.


> LLMs and foundation models are fungible and easy.

The top couple LLMs are extraordinarily expensive - will get dramatically more expensive yet - and are one of the most challenging products that have been created in all of human history.

If what you're claiming were true, it wouldn't cost so much for Meta and OpenAI to do their models, and it wouldn't take trillion dollar corporations as sponsors to make it all work.

> One LLM is as good as any other, and it'll be a race to the bottom.

Very clearly not correct. There will be very few top tier LLMs due to the extreme cost involved and the pulling up of the ladders regarding training data. This same effect has helped shield Google and YouTube from competition.

> There is no moat in an API or foundation model.

Do you have a billion dollars? No? Then you can't keep up over the next decade. Soon that won't even get you into the party. Say hello to an exceptional moat.


> The top couple LLMs are extraordinarily expensive - will get dramatically more expensive yet - and are one of the most challenging products that have been created in all of human history.

I disagree. The more we learn about LLMs the more it appears that they're not as difficult to build as it initially seemed.

You need a lot of GPUs and electricity, so you need money, but the core ideas: dump in a ton of pre-training data, then run layers of instruction tuning on top - are straight-forward enough that there are already 4-5 organizations that are capable of training GPT-4 class LLMs - and it's still a pretty young field.

Compared to human endeavors like the Apollo project LLMs are pretty small fry.


100%. I don't think we should at all minimize the decades of research that it took to get to the current "generative AI boom", but once transformers were invented in 2017, we basically found that just throwing more and more data at the problem gave you better results.

And not to discount the other important advances like RLHF, but the reason everyone talks about the big model companies as having "no moat" is because it's not really a secret of how to recreate these models. That is basically the complete opposite of, say, other companies that really do build "the most challenging products that have been created in all of human history." E.g. nobody has really been able to recreate TSMC's success, which requires not only billions but a highly educated, trained, and specialized workforce.


I also immediately thought about Apollo, when i read "most challenging products that have been created in all of human history".


> [Large Language Models] are one of the most challenging products that have been created in all of human history.

That statement seems, er... hyperbolically grandiose.

Are there examples of other products which you think are similarly "challenging" to create?


Sort of like railroads, a super simple product, but insanely expensive to build a network at scale except for those who already have.


Mistral AI has released their updated Mistral Large model and it gets basically the same scores on the chatbot arena leaderboard as a GPT4 version from the end of 2023.

OpenAI has to constantly keep moving and improving their models with zero forgiveness for any complacency, and so far they have only achieved a lead of less than a year versus an underfunded competitor.

Meanwhile Anthropic and Google are managing to build commercial models that are on par with GPT-4.


depends on your definition of "good." if good means creating the next generation of recommendation algorithms that result in massive technology addiction and a mental health crisis, then yeah


Well the whole 'creating gods' thing is just silly fantasy nonsense to cover up for what's really going on, which is novelty and gimmicks.

It's okay, I mean even the internet started out as Charlie_Bit_Me.avi and free porn.


Factually incorrect about the genesis of the internet or even the “mass” internet. But, okay.


Apologies, I totally forgot about the pedantry of arguing on the internet that really drove its adoption.


I mean, the Internet really seemed to take off when people realized they could get free porn, to be honest.

I'm betting the adoption curve for AI hits when the first company sells photorealistic porn of anyone you have a picture of. Hell, half-assed Photoshop porn is lucrative already.


You can do that with local models already.


> Charlie_Bit_Me.avi

Charlie Bit Me is of the YouTube generation, so it wasn't passed around as an avi email attachment like some older memes of the previous generation. From that long ago, Exploding Whale comes to mind.


I’m not sure what book you read on the origins of the internet but it is very very far off from the actual truth.



