The premise of this is flawed. OpenAI is cheap because it has to be right now. They need to establish market dominance quickly, before competitors slide in. The winner of this horse race is not going to be the company with the best-performing AI; it's going to be the one that does the best job of creating an outstanding UX, achieving ubiquitous presence, entrenching users, and building competitive moats that are not feature-differentiated, because at best even cutting-edge features are only 6-12 months ahead of competitors cloning or beating them.
This is Uber/AirBnB/WeWork/literally every VC-subsidized hungry-hungry-hippos market grab all over again. If you're falling in love because the prices are so low, that is ephemeral at best and is not a moat. Try calling an Uber in SF today and tell me how much it costs you and how much worse the experience is vs 2017.
OpenAI is the undisputed future of AI… for timescales 6 months and less. They are still extremely vulnerable to complete disruption and as likely to be the next MySpace as they are Facebook.
It's a race to the bottom on pricing on the provider/infra side. It seems very unlikely that any single LLM provider will achieve a sustained and durable advantage enough to achieve large margins on their product.
Consumers can swap between providers with relative ease, and there is very little stickiness to these LLM APIs, since the interfaces are general and operate on natural language. Compare that to building out a Salesforce integration and then trying to port it to a competitor, or migrating from Mongo to DynamoDB.
Building the LLMs is where the cool tech lives, but I'm surprised so many are seeing that as a compelling investment opportunity.
Certainly the undisputed winners will be the very few firms with enough engineering resources and GPUs to train their own models (not just fine-tune) where the models in question increase the productivity of workers in their non-ai-related profit centers. After that we have the real question of what the future will be of open source LLMs, on the one hand, and the question most relevant to this article of what sort and whether profitable “AI businesses” can be sustained over time. As Stratechery has analyzed, it is very possible that OpenAI turns out to be a very profitable B2C company with ad revenue in ChatGPT not concerned with their B2B sales or even the objective quality of their AI. Right now is an incredible time for AI: cheap Uber rides never qualitatively changed my life, but the current consumer access to AI models is truly incredible and I hope that only improves. However, even ignoring whatever happens on the regulatory front, I don’t think that is guaranteed at all.
This was definitely a theory that made people burn tons of money over the past couple of years, but I don't think it holds water. These models are getting obsolete so fast, and there are so many open ones, that I doubt anyone's privately trained model can stay relevant for long.
It's not anymore. If the model is publicly accessible, its skills can be distilled by performing some API calls and recording input-output pairs. This scheme works so well it has become the main mode to prepare data for small models. Model skills leak.
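As a sketch of how simple that recording step is (a stub function stands in for the real API call, and the prompts are made up for illustration):

```python
import json

def teacher_model(prompt: str) -> str:
    # Stand-in for a real API call to a publicly accessible model
    # (e.g. a chat-completions request); a trivial stub here so the
    # sketch is runnable offline.
    return f"Answer to: {prompt}"

def distill(prompts):
    # Record input-output pairs -- exactly the data a smaller
    # "student" model would later be fine-tuned on.
    pairs = [{"prompt": p, "completion": teacher_model(p)} for p in prompts]
    return [json.dumps(pair) for pair in pairs]  # JSONL-style records

dataset = distill(["What is 2+2?", "Name a moat."])
```

That loop is the whole scheme: hammer the public API, save the pairs, train the small model on them.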
I agree, publicly deployed models seem to be easy to train from. I did say "internally deployed LLM" though. agentcoops said "...where the models in question increase the productivity of workers in their non-ai-related profit centers" above, that's the bit I was thinking about. I think private models, either trained from scratch or fine-tuned, are going to be a big deal though they won't make the PR splash that public models make.
The conclusion for that seems to be that it just yields a model that has the surface look and feel of GPT3 or 4 but without the depth, so the experience quickly becomes unsatisfactory once you go out of the fine tuning dataset.
You may not need to train a model to make use of your data though. Maybe a cheap fine tune would work just as well. Maybe just having the data well indexed and/or part of the prompt context is good enough.
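A minimal sketch of the "just put your data in the prompt context" option, with naive keyword overlap standing in for a real embedding index (docs and query are made up):

```python
def retrieve(query, docs, k=2):
    # Naive keyword-overlap scoring; a real system would use an
    # embedding index, but the principle is the same.
    def score(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def build_prompt(query, docs):
    # Stuff the top-scoring documents into the context, then ask.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = ["Our refund window is 30 days.",
        "Support hours are 9-5 weekdays.",
        "Shipping is free over $50."]
prompt = build_prompt("What is the refund window?", docs)
```

No training run, no fine-tune: the model just reads your data at inference time.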
I don’t think they necessarily will be allowed to train on their data unless they get explicit permission. They will try, but the way I see privacy regulation going, users will have to authorize specific uses of their data and not be surprised by any application.
This could be one of the more interesting privacy fights of the next decade.
I’m sure there are easy cynical takes about how they will just shrink wrap the EULA, and maybe they will. But in a good privacy environment, users should never be surprised and have control over how their data is used. And I think we’ve made some progress there.
> I don’t think they necessarily will be allowed to train on their data unless they get explicit permission. They will try, but the way I see privacy regulation going, users will have to authorize specific uses of their data and not be surprised by any application.
If there's one company that I don't think cares about user permissions or the law, it'd be Twitter.
The EU officially warned Elon about DSA fines and the response was less than serious.
Cloud infra may be a comparable market, since computation is a big share of AI costs. Did consumers win big from competition between AWS, Azure, and GCP? Not sure. I see an uptick in write ups saying “We switched off cloud and reduced costs by 2/3rds.” Not a scientific sample but may leave the question open.
Can confirm: The cloud computing fad is well underway to dying... that is why AI is booming. One need only follow the Wall Street dollars to figure that one out.
Several very big profile names have recently begun moving back to self-hosted, hybrid, or dedicated hosting solutions.
Cloud computing was never good in terms of value; it was only ever good in terms of scalability. AI solutions built on top of 'the cloud' will always be even worse.
Note that I pay a certain "third tier" cloud provider less than $100/mo total for hosting large websites that would cost me more than 10 grand a month on AWS/Azure/Google...while having better uptime. (the biggest differences? the complete lack of IO and bandwidth charges, and much lower storage charges)
That should tell you all you need about these types of bubbles, but then again, most of us that watched the entire tech field unfold since the pre-internet phase already knew this.
Just gonna chip in here and say that anything costing 100 bucks is not valid as an argument in a conversation about cloud.
The prime selling point for cloud for large enterprises was (and still is):
- a signatory that shares blame on several core security issues (iso stuff)
- high amount of flexibility for individual teams used to asking for a vm then waiting four weeks for the itops dept to bring it online
Now the vast majority of cloud moves for large enterprises ends up as a shitshow due to poor implementation sure, but the key points for getting it sold are still there.
And ofc you CAN still cost-optimize with cloud, it's just harder.
Context: Worked in post and pre sales over some years in MSFT in the enterprise segment.
I got out before the downturn and everyone talking about cost, but my approach in selling azure to the c-suite would be fairly similar today I reckon.
Dropbox has been the biggest name who did the whole “cloud repatriation” thing, which was all the rage at the beginning of 2023, with claims that the cloud was soon to be dead—but cloud revenues are supposedly expected to be in excess of $1T by 2026, so whatever.
Some random survey from ESG found 60% of respondents repatriated at least some workloads. Who knows what the N was though.
Much harder to switch cloud providers than to switch LLM models. How much time would it take most companies to move their product from AWS to GCP, for example? What if you use a cloud specific tool like DynamoDB?
Margin is a function of stickiness/cost of switching (among other things).
I suspect eventually we will enter a world where migrating cloud providers is mostly a click of a button, but we're a long ways off from that. Requires vendor agnostic and portable apis/containers/WASM runtimes everywhere.
Swapping an LLM, at least in the current state, is about as close to updating a pointer as you can get.
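Roughly what that pointer update looks like in practice. Because most chat APIs share a similar request shape, switching often means changing a base URL and model name (the endpoints and model names below are hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class LLMConfig:
    # The "pointer": everything provider-specific lives here.
    base_url: str
    model: str

PROVIDERS = {
    "openai": LLMConfig("https://api.openai.com/v1", "gpt-4"),
    "other":  LLMConfig("https://api.example-llm.com/v1", "rival-model-1"),
}

def make_request(provider: str, prompt: str) -> dict:
    # Builds the request payload; only the config values change
    # between providers, not the shape of the payload.
    cfg = PROVIDERS[provider]
    return {"url": f"{cfg.base_url}/chat/completions",
            "json": {"model": cfg.model,
                     "messages": [{"role": "user", "content": prompt}]}}

# The swap: one dictionary key changes, the rest of the app doesn't.
req = make_request("other", "Summarize this ticket.")
```

Contrast that with re-architecting around a different managed database.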
You need to rerun eval after each LLM update. GPTs have a new version every couple of months and their capabilities can change quite drastically pretty much randomly. Maybe they will make it more robust in time, but I think this is the feature of the technology and people will have to adapt to these quirks
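A bare-bones version of the eval harness this implies (the lambdas are stubs standing in for two model versions; a real suite would have many cases and a fuzzier scoring function):

```python
def run_eval(model_fn, cases):
    # Re-run a fixed suite against whichever model version is live;
    # a drop in the score flags a silent capability change.
    passed = sum(1 for prompt, expected in cases
                 if expected.lower() in model_fn(prompt).lower())
    return passed / len(cases)

# Stubs standing in for two versions of the same hosted model.
v1 = lambda p: "Paris is the capital of France."
v2 = lambda p: "I cannot answer that."

cases = [("Capital of France?", "Paris")]
```

Running `run_eval` against each new version is the adaptation the comment describes: you can't assume last month's behavior.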
Well I'd have to change my Terraform provider and the managed Kubernetes resource... Other than that, it'd be the same. So half an hour of coding + half an hour of reconfiguring CI secrets?
Pretty obvious to most that switching cloud providers is not quick or painless for the majority of orgs. There's not really an argument in good faith to suggest otherwise.
Especially given that many orgs use managed or cloud specific solutions that have no 1:1 mapping between vendors
I'm running the tech for a global startup with 100M EUR turnover. It's still pretty small, but it's something.
You need to plan ahead, sure. Portability was one of my main concerns (including the possibility to go self-hosted). But it's definitely not impossible, nor too hard to do.
What's a serious cloud provider-portable replacement for Terraform?
And Kubernetes? It goes way beyond containers, you can replace many cloud provider-specific resources with Kubernetes resources. What alternative gives you that?
There are a HUGE number of startups that would have had a massively harder time if they had to roll their own infra. Who cares if the companies eventually have to move off? (Though even Netflix seems ok with it overall for now). There are a ton of services that just wouldn't exist otherwise
Renting a server and using standard commodity open source software and standard (outsourced) sysadmins is way cheaper and faster than learning and dealing with all the proprietary AWS and Azure junk.
(Probably also less reliable than "cloud", but who cares if you're a startup.)
Sorry not sure if I'm missing something obvious but isn't a VPS gonna be hosted by a cloud provider? How would that be an alternative to using the cloud?
If you're just running VPS on the big clouds, you're not going to get much advantage out of it, indeed.
But tell me what third tier cheap provider has managed scale-to-zero-or-infinity functions? Managed storage with S3-like API? Where can I get an API gateway cheaper than Amazon? What about managed databases? These tools allow me to develop insanely scalable software incredibly easily.
Agreed - all the big clouds are very expensive as VPS hosting. Don't use them for that.
You get portability. Which the functions do not provide. Open source solutions have a longer career utility than proprietary offerings. I remember when NetWare certs were all the rage. Useless now. I remember msce. But if you learned open tools 35 years ago instead... You get the picture.
You can scale from 1 thread 512MB RAM, to 500 threads and 12TB of RAM (off the shelf). Which is good enough for almost everyone who isn't planet scale.
Auto scaling also comes with auto billing. Oops, your accidental infinite loop spawning functions has bankrupted your company. You don't have that risk starting with a VPS.
I agree completely, but the argument remains the same - there's not much utility in using the big clouds as VPS providers, and it's definitely costly to do so.
Ingress that handles SSL. nginx, or caddy. Then stand up your app server behind that on the same VM. Database can be on the same or different VM.
I try to not use anything else if I can avoid it on a new project.
Ingress gives you the ability to load balance and is threaded and will scale with network transfer and cpu. Database should scale with a bump in VM specs as well, CPU and disk IOPS.
If you keep your app server stateless you can simply multiply it for the number of copies you need for your load.
Systemd can keep your app server running, or you can docker it up and use that.
No that's all fine, I mean physically: where would I put a server and stuff if we didn't have cloud providers? I'd need to pay an ISP for an IP address and maybe port forward and stuff like that, right? I don't get why I wouldn't just do what you mentioned on a five-dollar DigitalOcean droplet or an EC2 instance or whatever; the cloud still seems orders of magnitude easier to get off the ground.
If you rent a cloud vps as an ingress you can run an overlay network and your actual hosted services can literally be anywhere. See nebula, netbird, etc. You can also ssh forward, but that doesn't scale well past a handful of services and is a bit fragile.
For new small systems I suggest you start with a cloud VPS. If traffic is low, cost of downtime is low, and system requirements are high then a cheap mini PC ($150) at the home or office can keep your bill microscopic. If your app server and database are small then you can just throw them on the VPS too.
I run light traffic stuff at home in a closet so it doesn't occupy more costly cloud RAM. Production ready saas offerings I'm trying to sell right now are all in the cloud. Hosting all my stuff in the cloud would cost me hundreds per month. My home SLA is fine for the extras. I don't need colocation at this time, but I have spoken with data centers to understand my upgrade path.
You can run a live backup server at a second location and have pretty good redundancy should the primary lose power or connectivity.
When system requirements elevate (SLA, security, etc) you probably want to move into a data center for better physical security, reliable power and network. Bigger VPS is fine if it is big enough. Can also do a colocation if you don't want to rent, and you contract directly with a data center. I wouldn't look at colocation until your actual hosting needs exceed at least $100/mo and you're ready for a year long commitment.
But to the point of the original question that started this thread - the takeaway is still that cloud services made development massively easier right? The answer to the original question seems to still be "Yes cloud providers did lead to big wins for customers" since none of these other suggestions are able to get away from needing a cloud service provider without making starting something intensely difficult. And you wouldn't be able to get a vps for 5 bucks for ingress without all the other cloud competition in the market.
> Cloud infra may be a comparable market, since computation is a big share of AI costs. Did consumers win big from competition between AWS, Azure, and GCP? Not sure. I see an uptick in write ups saying “We switched off cloud and reduced costs by 2/3rds.” Not a scientific sample but may leave the question open.
It's not the tech it's the data. As far as I know the data is not freely shared or at the very least there will be custom built models from data you can't get anywhere else.
Yep. I wonder why there are not more features that would differentiate your LLM API, like for example the Functions in OpenAI's LLM APIs. While not perfect, it is extremely useful, and I'm not aware of a similar offering (I do know there are Python packages offering similar functionality, but my understanding is that they don't work as well as OpenAI Functions).
Because the cost of entry to the market is so absurdly high right now, it is seen as a good investment opportunity. If you throw enough money at it you can stake your place nice and early and then win later by sheer experience in the field. That is the idea.
It seems to be compelling because a genie in a bottle has been the promise of technological progress forever, and let's be honest, it's been marketed as such. Why would people not invest in that, unless they were a technologically savvy skeptic like yourself?
Possibly unrealistic, but my fear is that they will end up more like Netflix than Uber.
Some will scream in horror but I wanted Netflix to be a monopoly. A single place and app and account with all the content I need.
"competition" in streaming space has been nothing but disastrous for me as a consumer. It led to greedy heterogeneous islands of content, with proliferation of crappy apps and pointless restrictions and return to cable package mentality.
Again, possibly irrationally and ignorantly, my fear is that 5 years from now I'll need a dozen subscriptions to less-good services which will hoard their source data and models and be specialized based on which content they got licenses to. I.e. there'll be AI1 with the New York Times and Wikipedia, AI2 with the Washington Post and Encyclopaedia Britannica, AI3 with, I don't know, Fox News and RT, AI4 with the MIT and Harvard business libraries, AI5 focused on math with an extra subscription to Wolfram, AI6 with rights to Stack Overflow and JavaScript, and so on.
There are many scenarios various writers have posited where we are actually at a local maximum for LLMs, with data being increasingly closed and/or poisoned, and possibly segregated in the near future. :-/
Just like The Pirate Bay's online cinema is better than every streaming service, there could be a pirate LLM that uses all the data. Maybe it could even be trained by internet users sharing a bit of compute with some program?
I don’t think you can really compare that. The streaming service market is not elastic because you cannot easily interchange one series for another.
If other vendors' LLMs become good enough, it will actually be easy to interchange them, and then the race for the best UX and integration will be upon us (which the other commenter alluded to).
"If other LLMs are good enough" assumes that in principle they have the same opportunities, access to the same data, or content. My fear is precisely that this assumption may be taken away, i.e. that newspaper publishers or encyclopedia owners or big websites (Stack Overflow, WebMD, etc.) will enter into arrangements with specific LLM companies, just like Netflix, Disney, Prime etc. aren't competing on their app or price or flexibility, but on exclusive underlying content. Nobody WANTS to subscribe to Paramount+ or CBS All Access... but if they hold enough material hostage, some people will 'have to'. I can see a future, not far off, where different LLM organizations' selling feature is not how good their technology is (to your and overvody's point, THAT moat is likely to even out) but what underlying training data they have legal access to.
Your Uber/AirBnB/WeWork examples all have physical base units with costs that ascend with inflation despite theoretical economies of scale.
AI models have some GPU constraints, but could easily reach a state where the cost to operate falls and becomes relatively trivial, with almost no lower bound, for most use cases.
You are correct there is a race for marketshare. The crux in this case will be keeping it. Easy come, easy go. Models often make the worst business model.
But OpenAI appears to have some sort of data moat. I doubt their model is the best in the world, but more/better data generally beats better model, and GPT-4 definitely beats Claude, Bard, Bernie and the rest probably because they curated the best quality and largest data set. Maybe that moat doesn't last long but perhaps they have exclusive rights to some of that dataset through commercial agreements that could be a more durable moat.
> But OpenAI appears to have some sort of data moat.
I'm willing to bet dollars to doughnuts that Google and Facebook have at least one, possibly 2 or more, orders of magnitude more latent training data to work with - not including Google's search index.
My uninformed opinion is that Google and Meta's ML efforts are fragmented - with lots of serious effort going into increasing existing revenue streams with LLMs and the like being treated as a hobby or R&D projects. OpenAI is putting all its effort into a handful of projects that go into a product they sell. The dynamic and headcounts will change if the LLM market grows into billions
> My uninformed opinion is that Google and Meta's ML efforts are fragmented -
It seems more likely that at Google at least they just fell into the classic innovator's dilemma in which they were stuck trying to apply innovation to their current business models in an attempt at incremental innovation instead of seeking an entirely different customer and market.
I got the impression that Google was running Bard on a smaller model with presumably cheaper inference costs. I imagine the unit economics of both Bard and Chat GPT are negative right now and Google is trying to stay in the game without lighting too much money on fire.
Google and Facebook are not interested in pooling all their resources in order to build the next big thing. They are just interested in doing “enough” so that people keep using their platforms. The race is about how often a day every person on the planet spends on either google or Facebook/instagram. It’s about who is “the homepage of the internet”. They just need to be good enough so that traffic doesn’t move off to chat gpt.
I'm sure some people at google and meta were screaming at the top of their lungs to jump on the ai bandwagon before chatgpt - but you know how things work in large companies.
They're not as good at innovating, that's why they acquire startups all the time. It's a blood transfusion
Facebook.com already has decades-worth of natural language text and audio/video from uploads and "live" sessions. That is a deep pool, and wide too because Facebook probably has content in all currently-spoken natural languages, with the exception of those exclusively used by uncontacted peoples. That is a data moat.
I'd bet that there are any number of submarine startups out there sitting on top of full downloads of Common Crawl, archive.org, etc. who are only too happy to let OpenAI be the first penguin off the iceberg.
If OpenAI survives all the legal challenges, they'll just click "Go" and be in business in weeks to months.
If OpenAI gets smacked down, they haven't lost much.
There are probably also some submarine operations that are already doing/have done the training. If OpenAI gets bankrupted for copyright violations, we'll just never hear from those.
The cost is peanuts compared to the potential profit. Apple/Google/Facebook could absolutely eat the costs for a skunkworks project to do that training and just sit quietly waiting on the yes/no from legal.
>submarine startups out there sitting on top of full downloads of Common Crawl, archive.org, etc. who are only too happy to let OpenAI be the first penguin off the iceberg.
Not sure any startup would be ok with sitting on a potentially gamechanging model.
I was using Bard and chatGPT in parallel, but lately I just default to Bard. To me it's a better model with much more accurate answers, while chatGPT just gives you bombastic words.
Codegen is an area that seems to have the best breakout performance compared to the big FMs. In hindsight it should be obvious given that openai created Codex.
However, I am skeptical that smaller and more focused models will do well for more general tasks. I’ve found gpt-4 to just be flatly excellent for tasks that involve emitting a custom DSL and customer-provided business context (the latter being something we cannot train for, not without immense expense).
Sure, open models often require much less hardware than chatGPT3.5 and offer ballpark (and constantly improving) performance and accuracy. ChatGPT3.5 scores 85 in ARC and the huggingface leaderboard is up to 77.
If you need chatGPT4-quality responses they aren't close yet, but it'll happen.
I was kind of curious because I’ve often felt the same about cheap Uber/Lyft rides. So I checked. I initially signed up with Lyft sometime in late 2015. The first airport ride I got from my house in Austin was in December of that year and they billed me $38 base fare and $1.50 airport surcharge. This was before tipping was available in the app as well.
I just took Lyft again to the airport earlier this month same location and I was billed $49 USD, and a $1.30 “Texas Surcharge”.
An inflation calculator says that $38 usd in 2015 is equivalent to $49 in 2023. Color me surprised. I thought the prices had significantly increased since I signed up but it looks like actually no they didn’t.
Trawling back through those old emails I do see constant “50% off all weekday rides” offers from the time I signed up until about March 2016, at which point they stopped. So there were some subsidized incentives when they were early in Austin but it looks like they stopped sometime in early 2016. So if the money train existed, it happened before that, at least in Austin.
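A quick sanity check of that inflation math (the ~29% cumulative CPI factor for late 2015 to mid 2023 is an approximation, not an official figure):

```python
# Rough check of the fare comparison above.
fare_2015 = 38.00
inflation_factor = 1.29  # assumption: ~29% cumulative US CPI, 2015 -> 2023
fare_2015_in_2023_dollars = fare_2015 * inflation_factor  # ~49.02

# Compare against the $49 actually billed in 2023.
difference = abs(fare_2015_in_2023_dollars - 49.00)
```

So the 2023 fare lands almost exactly on the inflation-adjusted 2015 fare, which is the point: no obvious post-subsidy price spike, at least on this route.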
> The premise of this is flawed. OpenAI is cheap because it has to be right now. They need to establish market dominance quickly, before competitors slide in. The winner of this horse race is not going to be the company with the best-performing AI; it's going to be the one that does the best job of creating an outstanding UX, achieving ubiquitous presence, entrenching users, and building competitive moats
I hate this. Not the best will win, but the one with the biggest pockets. Nothing that helps with technofeudalism. Proper competition would be good.
I think the main difference is that OpenAI doesn't have a network effect. AirBnB/Uber/any social network wins because you want to be where everyone else is. ChatGPT is great, but I can switch tomorrow to something else without any issues.
One major difference between Uber and OpenAI is that fundamentally OpenAI's technology will get cheaper for them to run, hardware-wise and software-wise. They just need to hold their position long enough for variants of Moore's law to kick in.
The hardware these LLMs run on isn't going to get 10x faster/cheaper in the span of a couple of years; it will get incrementally faster at the cost of having to buy new, expensive datacenter GPU hardware. It's not going to magically save them from losing money on serving requests.
This is true. The fields are green and lush with things that don't scale: freemium and low prices to get people hooked. The crack dealers will almost inevitably go full Unity when they are forced by their board of directors to turn a profit.
This shows how LLM technology has a lot more to offer than "ChatGPT". The real takeaway is that by training LLMs with real training data (even with a "less powerful" model) you can get an error rate more than 10x lower than you get with the "zero-shot" approach of asking ChatGPT to answer a question for you, the same way Mickey Mouse asked the broom to clean up for him in Fantasia. The "few-shot" approach of supplying a few examples in the attention window was a little better, but not much.
The problem isn't something that will go away with a more powerful model because the problem has a lot to do with the intrinsic fuzziness of language.
People who are waiting for an exponentially more expensive ChatGPT-5 to save them will be pushing a bubble around under a rug endlessly while the grinds who formulate well-defined problems and make training sets will actually cross the finish line.
Remember that Moore's Law is over in the sense that transistors are not getting cheaper generation after generation, that is why the NVIDIA 40xx series is such a disappointment to most people. LLMs have some possibility of getting cheaper from a software perspective as we understand how they work and hardware can be better optimized to make the most of those transistors, but the driving force of the semiconductor revolution is spent unless people find some entirely different way to build chips.
But... people really want to be like Mickey in Fantasia and hope the grinds are going to make magic for them.
If you look back just 2 years we had the grinds build those specialized models for QA, NER, Sentiment, Classification etc. and all their deep investment was rug-pulled by GPT-3 and then GPT-4.
You say that training datasets will win, but this is where OpenAI currently has a big leg up: everyone is dumping tons of real data into them, while the LocalLLM crowd is using GPT-4 to try to keep up.
> Remember that Moore's Law is over in the sense that transistors are not getting cheaper generation after generation, that is why the NVIDIA 40xx series is such a disappointment to most people.
I am unconvinced by the idea of trying to redefine Moore's Law to be about MSRP. The NVIDIA H100 has twice the FLOPS of the A100 on a smaller die. That's Moore's Law, full stop. When NVIDIA has useful competition in the AI space, they'll be forced to cut prices, as has reliably been the case for every semiconductor vendor for the last 60 years.
Agreed. We need Intel to get the software side of ARC together.
Or we need something like the unified RAM of Apple Silicon. Apple has accidentally stumbled into being the most competitive way to run LLMs with their 192GB Studio.
Moore's law is irrelevant. Large language models are going to leave the digital paradigm behind altogether.
Neural nets don't need fully precise digital computing. Especially with quantization we're seeing that losing a bit of precision in the weights isn't impactful. Now that we're serving huge foundation models with static weights there's an enormous incentive to develop analog hardware to run them.
Mark my words, this will lead to a renaissance in analog computing, and in the future we will be shocked at the enormous waste of having run huge models on digital chips.
Just think, how many multiplications per second is the light refracting through your window right now clocking? More or less than is required to ChatGPT do you think? If only the crystals were configured correctly and the patterns of light coming through could be interpreted...
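For what it's worth, the precision-loss claim above is easy to demonstrate: symmetric int8 quantization recovers weights to within half a quantization step (toy weights, pure-Python sketch):

```python
def quantize_int8(weights):
    # Symmetric int8 quantization: map floats to integers in
    # [-127, 127], then map back (dequantize) with the same scale.
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return [v * scale for v in q], scale

weights = [0.12, -0.98, 0.33, 0.07, -0.51]
dequantized, scale = quantize_int8(weights)

# Worst-case round-trip error is bounded by half the step size.
max_err = max(abs(a - b) for a, b in zip(weights, dequantized))
```

If an 8-bit (or 4-bit) grid barely moves the outputs, the tolerance an analog substrate would need is far looser than digital precision suggests.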
> Just think, how many multiplications per second is the light refracting through your window right now clocking?
It really depends where you draw the lines, because you could also say that one single transistor in my electrical CPU is doing a kerjillion calculations for all of the atoms and electrons involved.
Fresh approaches to AI hardware are emerging, like the Groq Chip which utilizes software-defined memory and networking without caches. To simplify reasoning about the chip, Groq makes it synchronous so the compiler can orchestrate data flows between memory and compute and design network flows between chips. Every run becomes deterministic, removing the need for benchmarking models since execution time can be precisely calculated during compilation. With these innovations, Groq achieved state-of-the-art speed of 240 tokens/s on 70B LLaMA.
Fascinating stuff - a synchronous distributed system allows treating 1000 chips as one, knowing exactly when data will arrive cycle-for-cycle and which network paths are open. The compiler can balance loads. No more indeterminism or complexity in optimizing performance (high compute utilization). A few basic operations suffice, with the compiler handling optimization, instead of 100 kernel variants of CONV for all shapes. Of course, it integrates with Pytorch and other frameworks.
In addition to what the other commenter said about Moore's law, innovations like Flash Attention, which reduced memory usage by over 10x, and FlashAttention-2, which made huge leaps in compute efficiency, show there is still a lot of room to improve the models and inference algorithms themselves. Even without more compute, we likely haven't scratched the surface of efficient transformers.
You have to create a prompt/function that, for a wide set of inputs, generates a token sequence that perpetually expands in a manner corresponding to an externally observed truth.
Way too often it feels like you have to shove a universal decoding sequence into a prompt.
“Talk your steps, list your clues, etc.”
Just trying to luck into a prompt that keeps decompressing the model, generating each next token in a way that ensures the next token is true.*
Basically - LLMs don't reason, they regurgitate. If they have the right training data, and the right prompt, they can decompress the training data into something that can be validated as true.
——-
* Also this has to be done in a limited context window, there is no long term memory, and there is no real underlying model of thought.
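The "talk your steps, list your clues" scaffolding described above can be sketched as a simple prompt template. This is purely a hypothetical example of the technique, not any particular product's prompt:

```python
def cot_prompt(question: str) -> str:
    """Wrap a question in a chain-of-thought scaffold. The extra
    instructions steer the token sequence toward step-by-step
    'decompression' rather than a one-shot answer."""
    return (
        "Answer the question below.\n"
        "First, list the relevant clues you notice.\n"
        "Then reason through the steps one at a time.\n"
        "Finally, state the answer on its own line.\n\n"
        f"Question: {question}"
    )

prompt = cot_prompt(
    "If a train leaves at 3pm travelling 60 km/h, how far has it gone by 5pm?"
)
```

The template is trivially cheap to build; whether it "works" depends entirely on whether the model's training data contains something the scaffold can unfold, which is the commenter's point.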
It's too early to say who is winning/will win, of course. But so far the UI and its accessibility have made a huge difference in how different gen AI models are being used.
For example, I struggle to see DALL-E winning over Firefly if Firefly is integrated into a very rich environment, whereas DALL-E is basically a prompt-only UI (even though DALL-E 3 is the better model, IMO).
Only Big Tech (Microsoft, Google, Facebook) can crawl the web at scale, because they own the major content companies and they severely throttle competitors' crawlers, and sometimes outright block them. I'm not saying it's impossible to get around, but it is certainly very difficult, and you could be thrown in prison for violating the CFAA.
I'm not sure that training on a vast amount of content is really necessary, in the sense that linguistic competence and knowledge can probably be separated to some extent. That is, the "ChatGPT" paradigm leads to systems that just confabulate and "make shit up", and making something radically more accurate means going to something retrieval-based or knowledge-graph-based.
In that case you might be able to get linguistic competence with a much smaller model that you end up training with a smaller, cleaner, and probably partially synthetic data set.
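The retrieval-based approach suggested above can be sketched with a toy bag-of-words retriever. Everything here (the `retrieve` helper, the sample documents) is hypothetical and illustrative; a real system would use embedding search over a proper index, but the division of labor is the same: retrieval supplies the facts, the language model only has to phrase them.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

docs = [
    "The Eiffel Tower is in Paris and is 330 metres tall.",
    "Photosynthesis converts sunlight into chemical energy.",
]
context = retrieve("how tall is the eiffel tower", docs)[0]
prompt = (
    "Using only this context, answer the question.\n"
    f"Context: {context}\n"
    "Question: how tall is the Eiffel Tower?"
)
```

Because the facts come from the retrieved context rather than the model's weights, the generator itself can be much smaller, which is the point the comment is making.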
Yep, quality over quantity. The difference between 99.9% accurate and 99.999% accurate can be ridiculously valuable in so many real world applications where people would apply LLMs.
The improvements seem to be leveling off already. GPT-4 isn't really worth the extra price to me. It's not that much better.
What I would really want though is an uncensored LLM. OpenAI is basically unusable now, most of its replies are like "I'm only a dumb AI and my lawyers don't want me to answer your question". Yes I work in cyber. But it's pretty insane now.
I haven't played with the self-hosted LLMs at all yet, but back when Stable Diffusion was brand new I had a ton of fun creating images that lawyers wouldn't want you to create ("Abraham Lincoln and Donald Trump riding a battle elephant"; it's just so much funnier with living people!). I imagine that Llama-2 and friends offer a similar experience.
I use Uber in SF all the time, and while it's absolutely more expensive than it was during those go-go years, it's actually an even better experience than it was before (specifically on how fast it is to get a driver near you).
It’s a funny point. After only taking Ubers and Lyfts all my adult life (young), I have recently switched to cabs because they are cheaper in my city (and I tell everyone I know to do the same). They have dominance now but if I had to guess whether cabs or Uber will still exist in 100 years… I know which I would bet on.
Cabs are an idea. Uber is a company. I can't think of any company that I'd predict to last 100 years. Even Google and Apple probably won't last 100 years. I honestly wouldn't even bet on cars being a dominant form of transportation in 100 years.
> Even Google and Apple probably won't last 100 years.
I'm not sure, there are 100+ yo companies like Kodak, Nikon, IBM, Panasonic, GSK, Merck, etc. With that in mind, it's not hard for me to imagine that some of the tech giants will also have a presence in hundred years, maybe not as dominant as today, but they could survive.
Never underestimate the allure of status symbols. For the longest time I thought the same thing about the iPhone. Why would people spend so much when they can get an Android for considerably cheaper? It's the status, stupid. Of course, now they are ubiquitous and no longer really much of a status symbol, but if you go back a decade, you'll know what I'm talking about. The same will happen with VR as the tech improves and prices drop a bit.
They aren't that ubiquitous in the UK/Europe or Asia. I've never used an iPhone. I'm not sure of the market share figures but I'd guess it's about 50/50. Similarly with Mac laptops you get the impression that everyone buys apple, but it's really not the case and it's probably more like 75% windows at a guess.
Uber's surge pricing is dumb in places where they're competing with traditional taxi ranks. I usually pay $35-$45 for an Uber to/from the airport, but if a couple of flights land at once, suddenly Uber is $85+ and the taxis are only $55.
The point of surge pricing is to rebalance supply and demand when demand for rides outstrips supply of drivers, whether by attracting more drivers or by discouraging price sensitive riders.
Some riders at the airport will strongly prefer Uber, for whatever reason, so they're less price sensitive than you. Because you're happy to substitute a taxi for an Uber, you decrease demand in response to the surge pricing, preserving the limited supply of drivers for the riders who really want them (or at least are price-insensitive enough to pay for that privilege).
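The rebalancing mechanism described above can be shown with a toy model. This is purely hypothetical (real surge algorithms are proprietary and far more involved), but it captures the idea that the multiplier tracks the demand/supply ratio:

```python
def surge_multiplier(ride_requests: int, available_drivers: int,
                     cap: float = 3.0) -> float:
    """Toy surge model: price scales with the demand/supply ratio,
    floored at 1.0 (no discount) and capped to avoid runaway prices.
    Illustrative only; not how any real marketplace prices rides."""
    if available_drivers == 0:
        return cap
    ratio = ride_requests / available_drivers
    return min(cap, max(1.0, ratio))

# A wave of flight arrivals: 120 requests vs 50 drivers -> 2.4x surge.
# Off-peak: 30 requests vs 50 drivers -> no surge, multiplier stays 1.0.
```

The price-sensitive riders (those who would rather take a taxi at 2.4x) drop out of the request pool, which pushes the ratio, and the multiplier, back down.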
It's not clear there's any more of a long-term market for this than there is for compilers. I'm kind of scratching my head trying to figure out where the assumption that there is one comes from.
I think this is under appreciated. I run a "talk-to-your-files" website with 5ish K MRR and a pretty generous free tier. My OpenAI costs have not exceeded $200 / mo. People talk about using smaller, cheaper models but unless you have strong data security requirements you're burdening yourself with serious maintenance work and using objectively worse models to save pennies. This doesn't even consider OpenAI continuously lowering their prices.
I've talked to a good amount of businesses and 90% of custom use cases would also have negligible AI costs. In my opinion, unless you're in a super regulated industry or doing genuinely cutting edge stuff, you should probably just be using the best that's available (OpenAI).
The bleeding obvious is that OpenAI is doing what most tech companies for the last 20 years have done. Offer the product for dirt cheap to kill off competition, then extract as much value from your users as possible by either mining data or hiking the price.
I don’t understand how people are surprised by this anymore.
So yeah, it’s the best option right now, when the company is burning through cash, but they’re planning on getting that money back from you eventually.
> Offer the product for dirt cheap to kill off competition, then extract as much value from your users as possible by either mining data or hiking the price.
Genuine question, what are some examples of companies in that "hiking the price" camp?
I can think of tons of tech companies that sold or sell stuff at a loss for growth, but I'm struggling to find examples where the companies were then able to turn dominant market share into higher prices.
To be clear, I'm definitely not implying they are not out there, just looking for examples.
Uber is probably the biggest pure example. I was in uni when they first spread, and Uber's entire business model was to flood the market with hilariously low prices and steep discounts. People overnight started using them like crazy. They were practically giving away their product. Now, they're as expensive as, if not sometimes more expensive than, any other taxi or ridesharing service in my area.
One thing I'll add is that it's not always that this ends with higher prices in an absolute sense, but that the tech company is able to essentially cut the knees out of their competitors until they're a shell of their former selves. Then when the prices go "up", they're in a way a return to the "norm", only they have a larger and dominant market share because of their crazy pricing in the early stages.
Yeah, I kinda wonder why people even use them anymore. I've long gone back to real taxis because they're cheaper and I don't have to book them; I can just grab one on the street. Much more efficient than slowly watching my driver edge his way to me from 3 kilometers away.
The number of places where you can reliably walk out onto the street and hail a taxi is pretty small. Everywhere else, the relevant decision is whether calling a dispatcher or using a taxi company's app is faster/cheaper/more reliable than Uber/Lyft.
Here in Barcelona it works like that. 2-3 free taxis pass my place per minute or so. Just wave and you've got one. They're also very cheap.
But yeah, hailing a cab off the street like in the movies is not at all unrealistic here. Though at night I usually wave the torch on my phone to attract attention, because they don't always see a raised hand.
Here in Barcelona it's great, I really never have to call one. It's always faster just waiting.
At the busiest time it's a bit harder but at that time the ride-sharing services are also overloaded so it's still faster to just wait for a green light (free taxi). We don't get Uber as far as I know but we do have a similar thing called Cabify. But it's useless if you need something quick and they've put the prices up too much. I now only use them for scheduled stuff like airport dropoffs.
I think the cab companies here have apps. I don't know how good they are, though I've been meaning to find out.
There's no way I want to ever wait on hold for a dispatcher again, or be mystified as to when my cab is arriving (this always seems to involve standing outside in the snow or pouring rain).
If the cab companies have apps comparable to Uber and Lyft, sure, I'll give them a shot.
"Uber/Airbnb is expensive now" is an entirely American phenomenon. In Europe and Latin America, both are still cheaper than the alternatives (comparable hotels and yellow cabs). Most likely in other parts of the world too.
I think Google fits more in the "extract as much value from your users" bucket more than the price hiking one.
Uber/Lyft did raise prices, but interestingly (at least to me), if the strategy was to smother the competition with low prices, it didn't seem to work.
Unity is interesting too, though I'm not sure it would make a good poster child for this playbook. It raised prices but seems to be suffering for it.
Everyone's in "show your profits" mode, as befitting a mature market with smaller growth potential relative to the last few decades. Some of what we're talking about here is just what happens when a company tries to use investment capital to build a moat but fails (the Uber/Lyft issue you mentioned -- there's no obvious moat to ride-hailing, as with many software and app domains). My theory is that, going forward, we're going to see a much lower ceiling on revenue coupled with lots of competition in the market as VC investments cool off and companies can't spend their way into ephemeral market dominance.
As for Unity, they're certainly dealing with a bunch of underperforming PE and IPO-enabled M&A on the one hand (really should have considered that AppLovin offer, folks), but also just a failure to extract reasonable income from their flagship product on the other; I don't think their problems come from raising prices per se (game devs pay for a lot already, an engine fee is nothing new to them) as much as how they chose to do it and the original pricing model they tried to force on their clients. What they chose to do and the way they handled it wasn't just bad, it was "HBS case study bad."
OpenAI doesn’t own transformers, they didn’t even invent them. They just have the best one at this particular time. They have no moat.
At some point, someone else will make a competitive model, if it’s Facebook then it might even be open source, and the industry will see price competition downwards.
This argument has always felt to me like saying “google has no moat in search, they just happen to currently have the best page rank. Nothing is stopping yahoo from creating a better one”
Google has a flywheel where its dominant position in search results in more users, whose data refines the search algorithm over time. The question is whether OpenAI has a similar thing going, or whether they just have done the best job of training a model against a static dataset so far. If they're able to incorporate customer usage to improve their models, that's a moat against competitors. If not, it's just a battle between groups of researchers and server farms to see who is best this week or next.
Well, this assumes the chat (where the ratings are given) is what people are using and paying for. I think most businesses pay for some combination of API access and specific use cases like code generation (at least, that's what I pay for) that don't really contribute RLHF data. General search for consumers is likely to schism, since ChatGPT isn't especially different from Bard or Edge's AI assistant or the myriad of other product surfaces that can add it.
Yes, the chat interactions don't help with capability (what it can do); they only help with alignment (what it should do). And you don't need a lot of data to get good results. Crowdsourcing will be enough.
My understanding is that Google search is a lot more than just Pagerank (Map reduce for example). They had lots of heuristics, data, machine learning before anyone else etc.
Whereas the underlying algorithms behind all these GPTs so far are broadly same. Yes, OpenAI does probably have better data, model finetuning and other engineering techniques now, but I don't feel it's anything special that'll allow themselves to differentiate themselves from competitors in the long run.
(If the data collected from a current LLM user in improving model proves very valuable, that's different. I personally think that's not the case now but who knows).
Google's moat in search has always been systems and data center infrastructure. You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.
I think it's also more than just systems and data centers. It is also difficult to scrape the web the way Google does without using Google IP addresses. A lot of the web now will block you or severely throttle you if you aren't one of the well-known engines that they want indexing them.
> You can create your own search ranking algorithm, but you can't crawl the web and serve search results to billions of worldwide users in a few milliseconds.
rephrasing this for LLMs instead of search: "you can create your own model architecture/training method, but you can't crawl the web and serve language query results to billions of worldwide users in a few milliseconds."
that checks out, right? Google/search == """Open"""AI/LLMs still seems like a decent metaphor to me.
> They just have the best one at this particular time
That is the moat. For developer platforms, it's all about building mindshare and adoption. The more people who know how to use OpenAI, the stronger OpenAI's position on the market. It doesn't matter if there's equivalent or slightly better models unless they start to fall significantly behind (and they're currently well in the lead).
I agree and what you say isn’t incompatible with what I said. But the point of the OP is “why even bother using other models/open source models when OpenAI is cheaper”? Well take away the competition and see what happens.
The difference between OpenAI and the next best model seems to be increasing, not decreasing. Maybe Google's Gemini could be competitive, but I don't believe open source will ever match OpenAI's capability.
Also, OpenAI gets a significant discount on compute due to favourable deals with Nvidia and Microsoft. And they could design their servers better for their homogeneous needs. They are already working on an AI chip.
Did you even read my comment? I specifically highlighted why OpenAI might be cheaper in the long run. One reason is that they are already working on a chip that would be optimized for running a single model.
They are not going to beat NVIDIA. Making a chip for one model is not really a good idea, there are more efficiency gains to be made by improving the model and using a general purpose AI chip, rather than keeping the model architecture static and building a special purpose chip for it. Regardless, whatever OpenAI can do, NVIDIA can do better, and on more recent process nodes because they have the volume.
No, because Nvidia has to work for all models. Nvidia has other constraints they need to satisfy for users, like instruction sets, security, etc., which OpenAI doesn't have.
e.g. Since they have a fixed model which they know will get billions of requests, they could even use an analogue chip, which is significantly cheaper and faster for inference on fixed models. [1] could achieve 10-100x FLOPS/watt compared to Nvidia with their first-gen chip.
However, what will be interesting is if the price of delivering ChatGPT-style experiences drops as the industry matures/advances, and their pricing moat erodes.
Unlike Uber, where prices are dictated by factors unlikely to move significantly (labour/vehicles/fuel/etc.), the LLM space doesn't have these kinds of overheads.
The real problem for the OP is not prices going up but OpenAI making something like the AI equivalent of Excel (an AI Swiss Army knife), such that middleman companies are not needed for chatbots or talk-to-your-PDF-type apps.
If they are confidential they probably shouldn’t be uploaded to any website no matter if it calls out to OpenAI or does all the processing on their own servers.
It's simple, really: lots of businesses share data with third parties to enable various services. OpenAI provides a service contract claiming they do not mine/reshare/etc. the data shared via their API. As the SaaS provider, you just need to call that out in your user service agreement.
If you're a big company dealing with Microsoft, you're not just going to "pay as you go". You'll have a sales rep, dedicated support, custom contracts, and special prices. In the contract, Microsoft just needs to promise they won't ever share your data: if they do, the company can sue.
Microsoft is a behemoth at dealing with enterprises and has been doing it for decades. Even old school enterprises are OK with uploading data to Azure.
I completely agree — open-source models and custom deployments just can't compete with the cost and efficiency here. The only exception here is if open-source models can get way smaller and faster than they are now while maintaining existing quality. That will make private deployments and custom fine-tuning way more likely.
Or FOSS models remain the same size and speed, but hardware for running them, especially locally, steadily improves till the AI is "good enough" for a large enough segment of the market.
5k monthly revenue (or even 100k) isn't big enough for OpenAI to care. It's barely big enough for a competitor to care.
I suspect that OpenAI doesn't have the bandwidth to build most uses of AI and so is in Bill Gates' platform land: a real platform is one where the ecosystem pockets more money than the owner of the platform does.
Was this an accident? If it's a complete thought intending to say "for now" - eh, I would bet on it for quite a while. They don't care about your little use case.
It's possible gpt next or next++ ends up making whatever work you do trivial, but it's likely you'll still have customers
Are any OpenAI powered flows available to public, logged-out user traffic? I’ve worried (maybe irrationally) about doing this in a personal project and then dealing with malicious actors and getting stuck with a big bill.
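One common mitigation for the runaway-bill worry is per-client rate limiting in front of the paid API. This is a minimal token-bucket sketch; the class and its parameters are made up for illustration, and production setups would typically use a gateway or shared store like Redis rather than in-process state:

```python
import time

class TokenBucket:
    """Per-client token bucket: each request spends one token; tokens
    refill at `rate` per second up to `capacity`. Requests that find
    the bucket empty are rejected before any upstream (billed) API
    call is made, bounding worst-case spend per client."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Combined with a hard monthly spend cap on the provider side, this makes exposing an LLM-backed flow to logged-out traffic a bounded risk rather than an open tab.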
Are you using 3.5 Turbo? It's always funny when I test a new fun chatbot or something and see my API usage 10x just from a single GPT-4 API call, although I usually only have a $2 bill every month from OpenAI.
I think OpenAI may eventually have to go upmarket, as basic "good enough" AI becomes increasingly viable and cheap/free on consumer level devices, supplied by FOSS models and apps.
Apple may be leading the way here, with Apple Silicon prioritizing AI processing and built into all their devices. These capabilities are free (or at least don't require an extra sub), and just used to sell more hardware.
OpenAI is clearly going to compete in that market with its upcoming smart phone or device [1]. But what revenue model can OpenAI use to compete with Apple's and not get undercut by it? I suppose hardware + free GPT3.5, and optional subscription to GPT4 (or whatever their highest end version is). Maybe that will be competitive.
I also wonder what mobile OS OpenAI will choose. Probably not Android, otherwise they would have partnered with Google. A revamped and updated Microsoft mobile OS, maybe, given their MS partnership? Or something new and bespoke? I could imagine Jony Ive demanding something new, purpose-built, and designed from scratch for a new AI-oriented UI/UX paradigm.
A market for increasingly sophisticated AI that can only be done in huge GPU datacenters will exist, and that's probably where the margins will be for a long time. I think that's what OpenAI, Microsoft, Google, and the others will be increasingly competing for.
> OpenAI is clearly going to compete in that market with its upcoming phone.
Excuse me, I'm not a native English speaker. You mean like a smartphone? Or do you mean some sort of other new business direction? Where did you get the info that they're planning to launch a phone?
I believe there have been rumors that OpenAI was working with Jony Ive to create a wearable device, but it was unclear whether it would be a phone or something else.
OpenAI will make its money on enterprise deals for fine-tuning their latest and greatest on corporate data. They already have these big enterprise deals, and I think that's where the money is.
They will keep pricing the off-the-shelf AI at-cost to keep competitors at bay.
As for competitors, Anthropic is the most similar to OpenAI in both capabilities and business model. I am not sure what Google is up to, since historically their focus has been on using AI to enhance their products rather than making it a product. The "dark horses" here are Stability and Mistral, which are both OSS and European and will try to make that their edge, giving the models away for _free_ but to institutional clients that are more sensitive to which models are being used and where the data is handled.
Amazon and Apple are probably catching up. Apple likely thinks that all of this just makes their own hardware more attractive. It's not clear to me what Meta's end goal is.
I actually expect open source models will be small _but larger than they are today_ because phones and laptops will get dedicated chips and software for running eg the best open source (weights?) model
So eventually you could be running decent sized models locally (iOS could even provide an API with fine tuning etc)
In my experience apple's ML on iphones is seamless. Tap and hold on your dog in a picture and it'll cut out the background, your photos are all sorted automatically including by person (and I think by pet).
OCR is seamless - you just select text in images as if it was real text.
I totally understand these aren't comparable to LLMs - rumor has it apple is working on an llm - if their execution is anything like their current ML execution it'll be glorious.
(Siri objectively sucks although I'm not sure it's fair to compare siri to an LLM as AFAIK siri does not do text prediction but is instead a traditional "manually crafted workflow" type of thing that just uses S2T to navigate)
Does Android even have native OCR? Last I checked, everything required an OCR app of varying quality (the same goes for Windows/Linux).
On ios/macos you can literally just click on a picture and select the text in it as if it wasn't a picture. I know for sure on iOS you don't even open an app to do it, just any picture you can select it.
Last I checked the Opensource OCR tools were decent but behind the closed source stuff as well.
Not sure about other Android OEMs but OCR has been built in to Samsung Gallery (equivalent to Photos app on iPhones) for a while. Works the same way - long press on text in an image to select it as text. Haven't had any issues with it.
I'm not saying they will on the high-end, but maybe on the low end. Apple's strategy is to embed local AI in all their devices. Local AI will never be as capable as AI running in massive GPU datacenters, but if it can get to a point that it's "good enough" for most average users, that may be enough for Apple to undercut the low end of the market.
> Local AI will never be as capable as AI running in massive GPU datacenters
I'm not sure this is true, even in the short term. For some things yes, that's definitely true. But for other things that are real-time or near real-time where network latency would be unacceptable, we're already there. For example, Google's Pixel 8 launch includes real-time audio processing/enhancing which is made possible by their new Tensor chip.
I'm no fan of Apple, but I think they're on the right path with local AI. It may even be possible that the tendency of other device makers to put AI in the cloud might give Apple a much better user experience, unless Google can start thinking local-first which kind of goes against their grain.
> But for other things that are real-time or near real-time where network latency would be unacceptable, we're already there.
Agreed. Something else I wonder is if local AI in mobile devices might be better able to learn from its real-time interactions with the physical world than datacenter-based AI.
It's walking around in the world with a human with all its various sensors recording in real-time (unless disabled) - mic, camera, GPS/location, LiDAR, barometer, gyro, accelerometer, proximity, ambient light, etc. Then the human uses it to interact with the world too in various ways.
All that data can of course be quickly sent to a datacenter too, and integrated into the core system there, so maybe not. But I'm curious about this difference and wonder what advantages local AI might eventually confer.
This is a fascinating thought! It could send all the data to the cloud, but all those sensors going all the time would be a lot of constant data to send, and would use a lot of mobile data, which would be unacceptable to many people (including probably the mobile networks). If it's running locally, though, the data could be quickly analyzed and probably deleted, avoiding long-term storage issues. There's got to be a lot of interesting things you could do with that kind of data.
> I think OpenAI may eventually have to go upmarket
Let me introduce you to the VC business model. Get comical amounts of money. Charge peanuts for an initial product. Build a moat once you trap enough businesses inside it. Jack up prices.
If you have the new iPhone with the action button, you can set a shortcut to ask questions of ChatGPT. It’s not as fluid as Siri, and can’t control anything, but still much more useful.
Nobody is switching away from Apple over this, so ultimately Tim is doing his job. Under his watch Apple has become the defacto choice for entire generations. Between vendor-lockin/walled gardens and societal/cultural pressures (don't want to be a green bubble!), they have one of the stickiest user bases there are.
True, but that doesn’t mean we shouldn’t complain.
My hope is that the upcoming eu rulings allow competition here. Ie force Apple to get out of the way of making their hardware better with better software.
I think it's shitty and has no excuse, but the parent is right. Apple has no incentive to respond to their users since all roads lead to first-party Rome. It's why stuff like the Digital Market Act is more needed than some people claim.
You know what would get Apple to fix this? Forced competition. You know what Apple spends their trillions preventing?
agreed. I'm not trying to "excuse" shitty work, merely observing the incentives/pressures on them. We can complain about it all we want, but we won't understand it until we understand the incentives.
I think that's a bit glib. Right now it's true that nobody's going to leave the Apple camp because of Siri, but it's also true that nobody's going to leave the Android camp because of it. That state of affairs could change.
It's not a time for complacency, if only because driver assistance is becoming more important every day. There are good, sound business reasons to put competent people on the Siri team.
I think the weird thing about this is that it's completely true right now but in X months it may be totally outdated advice.
For example, efforts like OpenMOE https://github.com/XueFuzhao/OpenMoE or similar will probably eventually lead to very competitive performance and cost-effectiveness for open source models. At least in terms of competing with GPT-3.5 for many applications.
I also believe that within say 1-3 years there will be a different type of training approach that does not require such large datasets or manual human feedback.
> I also believe that within say 1-3 years there will be a different type of training approach that does not require such large datasets or manual human feedback
This makes a lot of sense. A small model that “knows” enough English and a couple of programming languages should be enough for it to replace something like copilot, or use plug-ins or do RAG on a substantially larger dataset
The issue right now is that to get a model that can do those things, the current algorithms still need massive amounts of data, way more than what the final user needs
> I also believe that within say 1-3 years there will be a different type of training approach that does not require such large datasets or manual human feedback.
I guess if we ignore pretraining, don't sample-efficient fine-tuning on carefully curated instruction datasets sort of achieve this? LIMA and OpenOrca show some really promising results to date.
DistilBERT was trained from BERT. There might be an angle in using another model to train your model, especially if you're trying to get something to run locally.
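The DistilBERT angle boils down to minimising a KL divergence between temperature-softened teacher and student output distributions. A rough pure-Python sketch of that objective (the real recipe also mixes in the usual supervised loss and other terms; this only shows the distillation part):

```python
import math

def softmax_t(logits, temperature=1.0):
    """Softmax with a temperature; higher T softens the distribution,
    exposing more of the teacher's 'dark knowledge' about wrong classes."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions:
    zero when the student matches the teacher, positive otherwise."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Minimising this pushes a small student toward the large teacher's full output distribution, which is why a compact local model can inherit much of a bigger model's behavior.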
Yep. Building a project that needs some LLMs. I'm very much of the self-hosting mindset so will try DIY, but it's very obviously the wrong choice by any reasonable metric.
OpenAI will murder my solution by quality, by availability, by reliability and by scalability...all for the price of a coffee.
It's a personal project though & partly intended for learning purposes so there is scope for accepting trainwreck level tradeoffs.
No idea how commercial projects are justifying this though.
Their data retention policy on their APIs is 30 days, and it's not used for training [0]. In addition, qualifying use cases (likely the ones you mentioned) qualify for zero data retention for most endpoints.
In sensitive cases you do not think about the normal policy, you think about the worst case. You just can't afford a leak. Your local installation may be much better protected than a public service, by technology and by policy.
For years people have essentially made a living off FUD like "ignore the literal legal agreement and imagine all the worst case scenarios!!!" to justify absolutely farcical on-premise deployments of a lot of software, but AI is starting to ruin the grift.
There are some cases where you really can't afford to send Microsoft data for their OpenAI offering... but there are a lot more where some figurehead solidified their power by insisting the company build less secure versions of public offerings instead of letting their "gold" go to a 3rd party provider.
As AI starts to appear as a competitive advantage, and the SOTA of self-hosted lagging so ridiculously far behind, you're seeing that work less and less. Take Harvey.ai for example: it's a frankly non-functional product and still manages to spook top law firms with tech policies that have been entrenched for decades into paying money despite being OpenAI based on the simple chance they might get outcompeted otherwise.
Gah, this is just not how it works. You are probably right that e.g. patient information, private conversations, proprietary code, etc would be safe with OpenAI. But it's not the on-prem team that needs to convince the rest of the organization to keep things on prem. Quite the opposite -- every single tech person would love to make our data someone else's problem (and get a big career boost from dealing with cloud tech instead of the dead-end that is local sysadmin!).
But you just can't. You cannot trust the scrappy startup OpenAI. You can't even trust Microsoft's normal cloud offering, because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that?
For these "figureheads": the buck, stopping, here, etc.
> because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that?
You realize that Microsoft is a publicly traded company with multiple privacy certifications? They are subject to detailed data ownership and consumption audits. They most definitely have data ownership and retention logs, and you can request copies of their certification audits to understand how they log/track this. I think the parent is being too optimistic, but your answer is so comically simplistic it's silly. I highly suggest you read about the world of HIPAA, PCI, and FedRAMP instead of just thinking "omg the data".
Absolutely. I am well aware of this, as are most tech people, that's what I'm saying. It's not us that are trying to convince our orgs to build a rack in the basement "to be more secure" just because we want to hear the fans running and see the lights blinking.
It's the lawyers that you need to convince. Good luck convincing any bigco lawyer that your company's data is safe on openAI because their legal agreement says "we don't train on API calls."
Vendor risk management is a thing, and plenty of companies that work with medical, legal, financial, or sensitive government data are, in fact, able to store that data with their vendors. This is a thing that happens all the time, at practically every company.
This nonsense that you can't trust anyone with your data is completely unfounded
You're not saying anything counter to what I said.
> You cannot trust the scrappy startup OpenAI
Not saying you do: Azure has a dedicated capacity driven GPT-4/3.5 offering that you can stick in your VPC with everything from PCI to HITRUST certs. These are the things that come out if you actually care about delivering solutions vs jumping to deliver the right sounding words for the figureheads like "We'd never trust those scrappy OpenAI guys!!!!"
> Quite the opposite -- every single tech person would love to make our data someone else's problem (and get a big career boost from dealing with cloud tech instead of the dead-end that is local sysadmin!).
You're attracting the least equipped people, who tumbled into what you just admitted is a dead-end trajectory, usually paying below-market rates as a result, and then expecting them to outperform the people paying the most money for competent security outlays with much bigger fish (Azure is working with teams that need FedRAMP, DoD certs, HIPAA compliance, and much more).
The end result is that you end up with a poorly maintained leak sieve of an infrastructure in which Azure would likely be the most secure component you have to lean on in your entire organization.
You say:
> because the people who actually give a fuck about the risk NEED to have granular detail of what data, readable by whom, is stored exactly where and for how long, and how can you make sure, and how do you know that access is scoped to the absolute minimum number of people, and is there a paper trail for that
They don't care about risk, they care about flawed perceptions of risk that don't align with reality. These are the same companies that get pwned for years through some basic social engineering, and all that they ever have to show for it is audit logs that show who ac... ah wait no one ever actually checked the logs and it turns out they're useless because subsystem X Y and Z aren't even connected to it.
It's “not be used to train or improve OpenAI models”, which doesn't mean it's not used to gain knowledge about your prompts and your business use case. In fact, the wording of the policy is loose enough that they could train a policy model on it (just not the LLM itself).
They have a SOC 2, so an external auditor has literally looked at their data retention policies. If your business is a customer, you should request access to the report.
But does that matter legally for the health, finance, and legal sectors? I'm not familiar with the laws themselves, but I worked in finance for a long time, and the internal rules were that sensitive data cannot move off premises no matter what the external party promised or had certified.
Yes, there are certifications for each of those sectors. Finance has PCI compliance; health has HIPAA.
For legal issues it's a bit more nuanced (eg new york state has guidelines about best practices, but they're honestly fairly sensible and would probably allow SOC 2 or equivalent)
The best thing anybody can do with your data is not store it for very long. Beyond that, they should take sensible measures, like encrypt it at rest, have policies restricting access, etc
For basic usage, you can get away with a small graphics card or no graphics card at all (albeit it will be very slow).
The general rule of thumb is, take a model size (7B, 13B, 34B, 70B) and multiply that by 0.5 or 0.625. If that number is smaller than the combined amount of system RAM and VRAM in your system, you can run the model at 4-bit and 5-bit quantization respectively.
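A minimal sketch of that rule of thumb in Python (the 0.5 and 0.625 multipliers are the comment's approximations for 4-bit and 5-bit quantization, not exact figures):

```python
# Rough memory check for running a quantized model, per the rule of thumb
# above: ~0.5 GB per billion params at 4-bit, ~0.625 GB at 5-bit.
def can_run_quantized(model_size_b: float, ram_gb: float, vram_gb: float) -> dict:
    total_gb = ram_gb + vram_gb
    return {
        "4-bit": model_size_b * 0.5 <= total_gb,
        "5-bit": model_size_b * 0.625 <= total_gb,
    }

# e.g. a 13B model on a machine with 16 GB RAM + 8 GB VRAM fits at both levels;
# a 70B model does not.
print(can_run_quantized(13, 16, 8))
print(can_run_quantized(70, 16, 8))
```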
A jacked up PC can do really well and there is much fun to be had there.
...but you'd struggle to get close to even GPT 3.5 let alone 4 for generic tasks.
For custom tunes...yeah sure custom rolls will beat generic openAI. But that's a bit like pitting customed tuned cars against street legal manufacturer cars. It's an apple to oranges comparison
A lot of tools for constrained generation, creativity, and the like rely on manipulating the entire log-probability distribution. OpenAI won't expose this information and is therefore shockingly uncompetitive on things like poetry generation.
Maybe this has improved, but a few months ago, OpenAI p99 latency was much worse than a self hosted solution, which would be a problem in certain cases.
They basically are MS by now. Everyone at Microsoft I work with literally calls it an 'acquisition', even though they only own a share. It's pretty clear what their plans are.
> Microsoft will reportedly get a 75% share of OpenAI's profits until it makes back the money on its investment, after which the company would assume a 49% stake in OpenAI.
49% isn't _just_ a share, it's a significant portion of the company.
Of course, and it's also not _just_ a share, which is the comment I was responding to. 49% of a company's outstanding public shares is a significant portion. It's 2% away from controlling, and a fun merger event.
> Or are they just operating at a massive loss to kill off other competition?
Bingo.
> the deal with Microsoft for cloud services making it cheap?
It should make it cheaper, but it takes time and engineers to migrate workloads from AWS (which is reasonably adept at scaling) to Azure, which is not.
I think they run k8s in something like 4k groups of nodes, which is spectacular, because k8s isn't really designed to do that. Running it at that scale is challenging because the traffic required to coordinate is massive (well it was last time I looked into it.)
Probably the first two, plus first-mover brand recognition. Millions of $20 monthly subs for GPT4 add up.
They might also be operating at a loss afaik, but I suspect they're one of the few that can break even just based on scale, brand recognition, and economics.
I haven’t heard any evidence that they have millions of Plus subscribers.
I’ve seen 100 to 200 million active users, but nothing about paid users from them. The surveys I saw when doing a quick google search reported much less than 1% of users paying.
There’s also just the benefit of being in market, at scale, and being exposed to the full problem space of serving and maintaining services that use these models. It’s one thing to train and release an OSS model; it’s another to put it into production and run all the ops around it.
Probably some combination of all the above! I think 1 and 2 are interlinked though — the cheaper they can be, the more they build that moat. They might be eating the cost on these APIs too, but unlike the Uber/Lyft war, it'll be way stickier.
I think it's mostly the scale. Once you have a consistent user base and tons of GPUs, batching inference/training across your cluster allows you to process requests much faster and for a lower marginal cost.
When the ChatGPT API was released 7 months ago, I posted a controversial blog post that the API was so cheap, it made other text-generating AI obsolete: https://news.ycombinator.com/item?id=35110998
7 months later, surprisingly, nothing's changed. Even open-source models are still tricky to make more cost-effective, despite the many inference optimizations since. Anthropic's Claude is now closer in price and quality, but there's no reason to switch.
This article might have a point about the data flywheel, but it's lost in the confused economics in the second half. Why would we expect to hire one engineer per p4.24x instance? Why do we think OpenAI needs a whole p4.24x to run fine tuning? Why do we ignore the higher costs on the inference side for fine-tuned models? Why do we think OpenAI spends _any_ money on racking-and-stacking GPUs rather than just take them at (hyperscaler) cost from Azure?
It was roughly $150 for me to build a small dataset of a few thousand quarter-page chunks of text for a data project using GPT-4. GPT-3 is substantially cheaper but would hallucinate 30% of the time; honestly, a nice fine-tune of Llama is on par with GPT-3, and after the sunk cost it only takes a few cents of electricity to generate the same-sized dataset.
Even GPT3.5 can be much more expensive. In some specific tasks, a finetuned 7B llama can work as well as GPT3.5.
You can rent a 3090 at $0.20/h on vast.ai, or $0.40/h on RunPod. Using vLLM at 400 t/s, that's 1,440,000 generated tokens per hour. Generating that many tokens with GPT-3.5 would cost $2.88.
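That arithmetic checks out; a quick sketch (the hourly rate, throughput, and GPT-3.5 price are the figures quoted above, which vary by provider and workload):

```python
# Compare an hour of a rented 3090 running vLLM against GPT-3.5 API pricing,
# using the figures quoted above.
RENTED_RATE_PER_HOUR = 0.20   # vast.ai 3090, $/h
THROUGHPUT_TPS = 400          # vLLM tokens/second
GPT35_PRICE_PER_1K = 0.002    # $/1k tokens

tokens_per_hour = THROUGHPUT_TPS * 3600
gpt35_cost = tokens_per_hour / 1000 * GPT35_PRICE_PER_1K

print(f"{tokens_per_hour:,} tokens/hour")        # 1,440,000
print(f"GPT-3.5 equivalent: ${gpt35_cost:.2f}")  # $2.88 vs $0.20 rented
```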
Unless your use case isn't in English, in which case Llama is as useful as a one-legged man in an ass-kicking contest.
Llama models only really shine for things that GPTs would refuse to even consider because of corporate RLHF, and if you need to keep your data local, I suppose. For the rest, they're second-rate at best.
Great HN thread! I think it is close to impossible to predict where the market for AI and LLMs in particular will be in two years. The major players are making their best bets. For the value of frontier LLMs, the article by Blaise Agüera y Arcas and Peter Norvig making the case that we might already have AGI: https://www.noemamag.com/artificial-general-intelligence-is-...
I use both OpenAI and Anthropic (I use my own Common Lisp and Racket Scheme client libraries that I implement with similar APIs) and I was amazed last night how well self hosted LLama-based and Mistral LLMs run on a 32G Mac Mini that was delivered to me late yesterday afternoon. This is not expensive hardware. And tools like llama.cpp will keep getting more efficient, etc.
We are going to see unimaginable (at least to me) advances in AI and AGI, at all levels of the tech food chain. And these advances will occur quickly. Place your bets, and remain flexible!
The value I get for that $20/month is astonishing. It's by far the best discretionary subscription I've ever had.
That scares me. I hate moats and actively want out. Running the uncensored 70B parameter Llama 2 model on my MacBook is great, but it's just not a competitive enough general intelligence to entirely substitute for GPT-4 yet. I think our community will get there, but the surrounding water is deepening, and I'm nervous...
> tentatively called “Claude-Next” — that is 10 times more capable than today’s most powerful AI, according to a 2023 investor deck TechCrunch obtained earlier this year.
This is the thing that scares me.
when do these models stop getting smarter? or at least slow down?
I have been struggling to understand the use case. Would you still be using it to ask questions and receive answers, where the benefit is not having to go to their website while still being able to use the GPT-4 model? Or is there another use case, like having the API return specific data?
Your comment reminds me a lot of how I felt when I was given my first email address in 1991 and didn't have anyone to send email to. Little did we all know how important email would become. =)
I'm forming a new company. I was asked for a single sentence to describe the company as well as a longer paragraph. I wrote the single sentence and then fed it into chat.openai.com ChatGPT 3.5 (before I paid) and asked for a longer version.
What it came up with was a bit too heavy on the adjectives (the tone was a bit too much marketing), but wow, it nailed it in terms general concepts about what I'm building. This was all without knowing anything about the business. I can easily edit it back down to an easier tone for people to digest.
When I fed the same input into ChatGPT4, the results were 100x better.
I also like to use it to summarize my thoughts. I can write down a bunch of unfiltered gibberish about how I'm feeling today and what I'm thinking. Feed it in and it'll give me a great summary of what I just said in a non-threatening and neutral tone. Having gone through a lot of professional therapy (and even being married to one in a past life), it feels a lot like that to me... except a lot less expensive.
Unless you're an extremely heavy user, it's cheaper to just use the API. I've been tempted to do that, but OpenAI doesn't have a free trial for me to see the quality of GPT-4 first.
I'm astonished how often this comes up and also how wrong it is.
The cost of the GPT-4 API is ballpark $0.05 / 1,000 tokens. If you want to include a rolling context window, which you basically HAVE TO DO if you want to maintain a persistent conversation, you will easily meet or exceed 1,000+ tokens per query.
ChatGPT Plus gives you 50 GPT-4 queries every three hours. If you're using it all day you might average about 100 daily queries. Using the GPT-4 API directly would run you approximately five dollars a day for the same thing; that's $150 a month as opposed to a flat $20.
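A quick sanity check of that math (the per-token price and usage pattern are this comment's assumptions, not official figures):

```python
# Estimate monthly GPT-4 API cost for a given usage pattern, using the
# assumptions above: ~$0.05/1k tokens and a rolling context window that
# pushes each query to ~1,000 tokens.
def monthly_api_cost(queries_per_day, tokens_per_query,
                     price_per_1k=0.05, days=30):
    return queries_per_day * tokens_per_query / 1000 * price_per_1k * days

heavy = monthly_api_cost(100, 1000)  # heavy user: ~$150/month vs the $20 flat fee
light = monthly_api_cost(10, 300)    # light user: far below $20/month
print(heavy, light)
```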
I think you're being really wasteful with that kind of context window, which is why it's a bit apples and oranges. I keep stats on this: my average message is around 50 tokens, the average individual response is around 200 tokens, and my average conversation length is 1.2 (only counting my messages). 50 convos/day * 30 days is $21, and I don't come close to that usage. Hell, most of the time I don't even turn on GPT-4 because 3.5-instruct is plenty good.
Right but the API is so unbelievably cheap in comparison. I couldn't spend $20 if I tried using it constantly. My bill is a few bucks every month and you don't have to deal with "As a large language model…"
I don't know if it's entirely uncensored but it doesn't have the moderation API in front of it or the other manual tweaks OpenAI added to ChatGPT so largely it will just do whatever you ask it to without paragraphs of disclaimers.
I will abandon ChatGPT, Claude etc... the millisecond that Siri has the same capabilities if for no other reason than it will take me two extra steps to use it and pay for it. Every voice assistant will unquestionably and inevitably implement GPTs eventually. So it's simply back to the question of who has the distribution of chatbots. I trust Apple 1000x more than basically any other tech company (not saying much) so I'm just waiting for that now.
OpenAI can't "win" (whatever that means) long term unless they figure out how to collect user data inputs (text based questions) and reward vectors (Thumbs up and down) persistently, at scale, in the extreme long term - which means building something people rely on all day everyday.
As far as I can tell they have no unique distribution avenues to do this today outside of copilot.
Meanwhile, Apple, Amazon and Alphabet are certainly bringing GPT capabilities to Siri/Alexa/Whatever Google's voice thing is called, albeit slower and more carefully, but they have no need to rush at all here.
I'll bet Microsoft will slowly absorb OpenAI given their investment position and integrate it into Bing or something and fade away.
AWS is extremely overpriced for nearly every service. I don’t know why anyone else outside of startups with VC money to burn or bigcos that need the “no one ever got fired for buying IBM” guarantee would use them. You’re better off with Lambdalabs or others which charge only $1.1/h per A100.
Also that is a 8xA100 system as others have noted, but it is the 40GB one which can be found on eBay for as low as $3k if you go with the SXM4 one (although the price of supporting components may vary) or $5k for the PCI-e version.
It's absolutely worth the money when you look at the whole picture. Also lambda labs never has availability. I actually can schedule a distributed cluster on AWS.
> It's absolutely worth the money when you look at the whole picture.
That highly depends on many things. If you run a business with a relatively steady load that doesn't need to scale quickly multiple times per day, AWS is definitely not for you. Take Let's Encrypt[1] as an example. Just because cloud is the hype doesn't mean it's always worth it.
Edit: Or a personal experience: I had a customer that insisted on building their website on AWS. They weren't expecting high traffic loads and didn't need high availability, so I suggested to just use a VPS for $50 a month. They wanted to go the AWS route. Now their website is super scalable with all the cool buzzwords and it costs them $400 a month to run. Great! And in addition, the whole setup is way more complex to maintain since it's built on AWS instead of just a simple website with a database and some cache.
We built a tool for lambda labs and other clouds that launches a specific instance whenever it becomes available and notifies you. We poll Lambda Labs for availability every 3 seconds. Would this be something that would be useful for you?
Spending the capex/opex to run a cluster of compute isn't easy or cheap. It isn't just the cost of the GPU, but the cost of everything else around it that isn't just monetary.
This could be an interesting comparison. My experience with AWS is that it was super easy and cheap to start on. By the time we could use whole servers we were using so much AWS orchestration that it's going to be put off until we are at least $1M ARR, and probably til we are at $5M.
Make adoption easy, give a free base tier but charge more could be a very effective model to get start ups stuck on you. It even probably makes adoption by small teams in big companies possible that can then grow ...
How much does an A100 consume in power a year (in dollar costs)?
How much does it cost to hire and retain datacenter techs?
How long does it take to expand your fleet after a user says "we're gonna need more A100s?"
How many discounts can you get as a premier customer?
Answer these questions, and the equation shifts a bunch!
A full rack with 16 amps of usable power and some bandwidth is $400/month in Kansas City, MO. That is enough to power 5x A100s 24x7, so roughly $10k up front per A100 plus $80 per month each in rack costs, amortized; of course many more A100s would drop the price.
Once installed in the rack ($250 one-time cost) you shouldn't need to touch it. So $10k up front plus about $1,250 per A100, per year, including power. You can put 2 or 3 A100s per cheapo Celeron-based CPU and motherboard.
Of course if doing very bursty work then it may well make sense to rent...
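A minimal sketch of the recurring part of that amortization, using the figures above (the ~$1,250/year the comment quotes presumably also folds in install and host hardware on top of the raw rack share):

```python
# Recurring colo cost per GPU, using the comment's figures:
# $400/month for a rack that powers five A100s 24x7.
def recurring_cost_per_gpu_per_year(rack_monthly: float, gpus_per_rack: int) -> float:
    return rack_monthly / gpus_per_rack * 12

per_gpu = recurring_cost_per_gpu_per_year(400, 5)
print(per_gpu)  # 960.0 per year in rack power/space, before install and host hardware
```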
Did you also include the network required to make the A100s talk to each other? Both the datacenter network (so the CPUs can load data) and the fabric (so the A100s can talk?)
You also left out the data tech costs- probably at least $50K/individual-year in KC (although I guess I'd just work for free ribs).
If you're putting A100s into celeron motherboards... I don't know what to say. You're not saving money by putting a ferrari engine in a prius.
You are correct that I left out the fabric which needs to be added in. Again a one time expense, though there are different ways to hook it up, it seems, and if the model fits on one system you may not need it.
You hire people who are employed by the DC, by the hour for DC work. How much work is there to do once it is screwed into the rack?
$50m GPU capex (which is A LOT) is about 2-3MW of power, it isn't that much.
The problem though is that getting 2-3MW of power in the US is increasingly difficult and you're going to pay a lot more for it since the cheap stuff is already taken.
Even more distressing is that if you're going to build new data center space, you can't get the rest of the stuff in the supply chain... backup gennies, transformers, cooling towers, etc...
The main flaw I see in the argument is that assumes that AIs should be generalists... But if you look at the reality of our current human economy, you will notice that generalists cannot even get jobs; the economy needs specialists. I think the same will happen with AI; companies will need specialist AIs, not generalists.
And since there are many different industries/specializations with a lot of nuanced, undocumented knowledge which is not available online, it will be difficult for a single large company to acquire all that specialist information. I think the bottleneck isn't going to be hardware costs, but merely putting together the optimal training data. To do this, you need to find the top experts in the world in any given field.
Unfortunately, it's difficult to do right now because top experts are often not given credit these days. Those who are promoted as the top people in any given field are often mostly good at politics and lack the deep nuanced knowledge that would be required to produce top quality training data.
"Too cheap to beat" sounds anti-competitive and monopolistic. Large LLM providers are not dissimilar to industrial operations at scale: it requires a lot of infrastructure, and the more you buy/rent, the cheaper it gets. Early bird gets the worm, I guess.
Not sure I understand your comment, but generally you have to prove anti-competitiveness /beyond/ too cheap to beat (unless it is a proven loss-leader which, viz all big tech companies, seems very hard to prove)
The pricing is too good to be true when you think about it rationally. If they raise prices they seem much, much less attractive than using AWS or Azure.
Amazon seem to have a much better business built around their Bedrock offering. And all their other tools are available there like SageMaker, ec2, integration with MLFlow, etc, etc.
I guess the same goes for Azure, if you are already using it it's much easier to just stick with whatever they are offering for LLM Ops.
OpenAI offering just models doesn't seem like it can last forever, and to compete with AWS or Azure at enterprise level they need to build all the things Amazon/MS have built.
The other side of that coin seems much more realistic.
> The pricing is too good to be true with you think about it rationally
In what way shape or form?
> If they raise prices they seem much, much less attractive than using AWS or Azure.
They're already significantly more expensive than Azure. OpenAI charges something like $30k a month for dedicated capacity on a "call our sales team" basis: GPT 3.5/ GPT-4 on Azure comes with that for free.
And GPT-4 is already slow and expensive enough that no one just chooses it arbitrarily... they're using it for things no other model can do. They could charge double for GPT-4 and GPT-4 would still be the only model that can do those tasks: you wouldn't get to just switch off to some other GPT-4 equivalent provider.
> Amazon seem to have a much better business built around their Bedrock offering
Amazon is literally doing the same thing with Bedrock! They're offering Anthropic at competitive prices to OpenAI for a model that's no cheaper to run based on their own dedicated capacity numbers.
> OpenAI offering just models doesn't seem like it can last forever, and to compete with AWS or Azure at enterprise level they need to build all the things Amazon/MS have built.
OpenAI is not trying to become Azure: They actively go out of their way to hide the fact they even offer half the things they offer to enterprises, instead relying on Azure absorbing demand as much as possible.
OpenAI wants ChatGPT Plus to be the new Prime, as in no one should be able to afford to not pay OpenAI for their immensely valuable offering.
Except unlike Prime, the offering is software, not commerce: If Amazon could get AWS-like margins from their e-commerce business, AWS would be a footnote.
Eh, OpenAI is too cheap to beat at their own game.
But there are a ton of use-cases where a 1 to 7B parameter fine-tuned model will be faster, cheaper and easier to deploy than a prompted or fine-tuned GPT-3.5-sized model.
In fact, it might be a strong statement but I'd argue that most current use-cases for (non-fine-tuned) GPT-3.5 fit in that bucket.
(Disclaimer: currently building https://openpipe.ai; making it trivial for product engineers to replace OpenAI prompts with their own fine-tuned models.)
So how could OpenAI make fine-tuning so cheap? Besides the computing power needed for fine-tuning, OpenAI also has to save your fine-tuned weights and spawn a new instance for you every time you want to run your fine-tuned model. I imagine that must be insanely expensive, because even the inference-only model behind ChatGPT is not what one would call "small". So how did they do that? Can anyone share their "secret"?
The comparison here is not apples to apples. While fine tuning is less costly with OpenAI, I'd argue that running inference using GPT3.5 vs a fine tuned model should be roughly the same. OpenAI is gouging you on inference, thereby being able to offer fine tuning at a seemingly reasonable price.
Also, it's important to note that fine tuning produces a vast amount of data about use cases where fine tuning is useful.
Common sense tells me you need to calculate the value of the data that OpenAI receives. But the article quickly goes into a comparison between OpenAI, which is highly optimized for the AI workload, and the pricing of an instance at AWS, which is renowned for its hefty costs and sits at the center of the cloud repatriation movement. That alone kills the argument that OpenAI is 'too cheap'.
Nothing in that article convinces me the situation couldn't change entirely in any given month. Google Gemini could be more capable. Any number of new players (AWS, Microsoft, Apple) could enter the market in a serious way. The head-start OpenAI has in usage data is small and probably eclipsed by the clickstream and data stores that Google and Microsoft have access to. I see no durable advantage for OpenAI.
Gemini very well might be the biggest threat to OpenAI. ChatGPT has first-mover advantage so has a decent moat, but the amount of people willing to pay $20 per month for something worse[1] than they get for free with google.com is going to dwindle. I'd be very worried if I were them.
[1]: That knowledge cutoff and the terrible UX of web browsing are brutal compared to the experience of Bard.
OpenAI is seeking to make their own chips and hardware.
If they can pull off what Apple has done with its M line, then they can possibly make themselves even more cost-effective than the competition, which will mostly be limited by what Nvidia can supply.
I believe in house manufacturing of their own hardware is definitely the way forward. Top down lock down. Own the hardware, own the models, own the user trained datasets.
It's not even just the cost of finetuning. The API pricing is so low, you literally can't save money by buying a GPU and running your own LLM, no matter how many tokens you generate. It's an incredible moat for OpenAI, but something they can't provide is an LLM that doesn't talk like an annoying HR manager, which is the real use case for self-hosting.
Totally flawed calculation. For many tasks Llama 70B works just fine, and you can run that on a €6,000 server as much as you like.
Renting GPU servers at AWS is just stupid. Did you ever see bitcoin miners calculate the price of mining one bitcoin by looking at renting AWS GPU servers?
I haven't looked into how these new AI products are implemented. Can someone give a ballpark estimate of the current costs if someone wanted to build their own, say:
1. LLM that talks like ChatGPT
2. Image generator that makes realistic portraits from verbal descriptions.
Are the costs in the data acquisition, human training input, training CPU/GPU hours, hardware, or ??
I'll assume that OpenAI does not offer $1 of service for less than $1, at least not after certain scale of economy. If that assumption is true, then OpenAI goes back to the core of the valley: building insanely great product that solves insanely hard problems and that people are willing to buy on the merits of the products.
We just started a service offering different open source models with an OpenAI-compatible API [1]. The pricing isn't final and we haven't officially launched yet, but you should be able to save at least 75% compared to GPT-3.5.
Hey BrunoJo, saw your posts on a couple of threads. Love what you're doing at lemonfox! Do you have any troubles with finding cheap GPUs to host models on? If so, I'm working on service that provides a single API and UI for launching cloud GPUs across 10 different cloud providers so you can always find available gpus. Let me know if this might be useful for you!
Disagree with this. Model quality is THE factor keeping them ahead.
I think a comparison to Netflix and the media industry is fair. Years ago, people claimed Netflix's tech and infrastructure was their moat, but it turns out it was the cheap easy access to loads of high quality content that was the real motivator.
This is _the_ playbook for big, fast scaling companies...Uber subsidized every ride for _a decade_ before finally charging market price, just to make sure that Uber was the only option which made sense.
While it's nice to consume the cheap stuff, it is not good for healthy markets.
This focuses on compute capacity, but wouldn't algorithmic improvements be much more important bang for the buck at this stage? There's so much low-hanging fruit, as evidenced by the constant stream of news about getting better results with less hardware.
It's also worth noting that if you build your business on using OpenAI's LLM or Anthropic etc, then, in the majority of cases I've seen so far (no fine tuning etc), your competitor is just one prompt away from replicating your business.
This is like Uber if self-driving cars had simply been a matter of making a faster GPU.
OpenAI gets cheaper by twiddling their thumbs for the next few years, meanwhile they continue to amass more and more data for RLHF.
It's weird that people are trying to drag non-software scaling into a software scaling problem: Lyft, Doordash, Instacart, etc. all relied on VC dollars to scale non-software growth like software. OpenAI really is just stupidly cheap compared to anything those past high CAC plays were aiming to do.
ok so you'll have to help me here, I'm still learning this stuff.
RLHF I looked it up. Is this really useful? The average human has zero general expertise because people are specialized (I know nothing about say, 1960s avant garde french cinema and my responses in a conversation there would be garbage - given the breadth of human knowledge even the most accomplished scholars are useless for over 99% of it). Won't there be a quality decrease? How is this accommodated for?
If the chat systems simply gave the most popular answers it would cease to be useful real fast.
Classic anti-competition strategy, sell below cost and burn money until competition is out, then sell higher than you could have ever sold with competition.
I signed up for OpenAI's ChatGPT tool, and entered a query like 'What does the notation 1e100 mean?' (just to try it out). And then when displaying the output it would start outputting the reply in a slow way, like it was being drip-fed to me, and I was like: 'what? surely this could be faster?'
Maybe I'm missing something crucial here, but why does it dripfeed answers like this? Does it have to think really hard about the meaning of 1e100? Why can't it just spit it out instantly without such a delay/drip, like with the near-instant Wolfram Alpha?
Under the hood, GPT works by predicting the next token given an input sequence. At each step a single token is generated, conditioned on all the previous tokens.
You could wait for the complete answer, but it would take longer before you see anything. So one way to get faster perceived answers is to stream the response as it is generated. And in GPT-based apps the response is generated token by token (~4 chars), hence what you're seeing.
It's a result of how these transformer models work. It's pretty quick for the amount of work it does, but it's not looking anything up; it's generating one token at a time.
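The comments above can be illustrated with a toy autoregressive loop: the "model" here is a hypothetical canned next-token function, not a real LLM, but the control flow is the same. Each token requires a pass over the whole sequence so far, which is why UIs stream output instead of waiting for the full answer:

```python
def next_token(context: list) -> str:
    """Stand-in for a real model: picks the next token from the context.
    A real LLM would run a full forward pass over all previous tokens here."""
    canned = {"What": "does", "does": "1e100", "1e100": "mean", "mean": "?"}
    return canned.get(context[-1], "<eos>")

def generate(prompt: list, max_tokens: int = 10):
    tokens = list(prompt)
    for _ in range(max_tokens):
        tok = next_token(tokens)  # one model invocation per output token
        if tok == "<eos>":        # stop token ends generation
            break
        tokens.append(tok)
        yield tok                 # stream each token as soon as it's ready
```

Calling `" ".join(generate(["What"]))` here streams out `does 1e100 mean ?` one token at a time, which is the drip-feed effect described above.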
Not really. It's general purpose; running Llama 2 70B has been shown to work out cheaper if you have a high usage rate. And depending on the use case, fine-tuning can likely achieve far superior results.
The one thing OpenAI has that's hard to compete with is a metric shit ton of money and brand recognition.
They've technically been in this game for much longer than anyone else; of course they're more prepared. A lot of the competition popped up overnight on the amazing prospects they demonstrated.
These LLMs are confabulation machines. They're as good as the knowledge of the person who is driving the current session. For coding problems this makes them like a very good teddy bear debugger, but do not expect independently produced creative work from them.
Yep, batching is a feature I really wish the OpenAI API had. That and the ability to intelligently cache frequently used prompts. Much easier to achieve this with a hosted OS model, so I guess it's a speed + customizability/cost tradeoff for the time being.
IMO they don't have batching because they pack sequences before passing them through the model, so a single sequence in a batch on OpenAI might contain requests from multiple customers.
None of it is cheap; "AI" is insanely expensive. Meredith Whittaker, president of the Signal Foundation, talks about it in this interview: https://www.youtube.com/watch?v=amNriUZNP8w
Failing to embrace it will leave you behind. Consultants that use OpenAI’s GPT-4 language model are “significantly more productive and produced significantly higher quality results” than those who do not, according to a new study from the Harvard Business School: https://aibusiness.com/nlp/harvard-study-gpt-4-boosts-work-q...