There is a lot of hype around LLMs, but (BUT!) Mistral well deserves the hype. I use their original 7B model, as well as some derived models, all the time. I can’t wait to see what they release next (which I expect to be a commercial product, although the MoE model set they just released is free).
Another company worthy of some hype is 01.AI, which released their Yi-34B model. I have been running Yi locally on my Mac (use “ollama run yi:34b”) and it is amazing.
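For anyone who'd rather script against it than chat in the terminal: a minimal sketch of calling a local ollama server from Python, assuming the default endpoint on port 11434 and that the model has already been pulled.

    # Hypothetical quick test against a locally running ollama server.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "yi:34b",
            "prompt": "Summarize the tradeoffs of running a 34B model locally.",
            "stream": False,  # ask for one JSON response instead of a stream
        },
    )
    print(resp.json()["response"])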
They are not close to GPT-4. Yet. But the rate of improvement is higher than I expected. I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two. Possibly requiring some new techniques that haven't been invented yet. The rate of adoption of new techniques that work is incredibly fast.
Of course, GPT-5 is expected soon, so there's a moving target. And I can't see myself using GPT-4 much after GPT-5 is available, if it represents a significant improvement. We are quite far from "good enough".
Maybe it should be an independent model in charge only of converting your question to American English and back, instead of trying to make a single model speak all languages.
I don't think this is a good idea. If we are really aiming at anything that resembles AGI (or even a good LLM like GPT-4), a good model is one that has world knowledge. The world is not just English.
There’s a lot of world knowledge that is just not present in an American English corpus. For example, knowledge of world cuisine & culture. There are precious few good English sources on Sichuan cooking.
>I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two.
There are indeed already open source models rivaling GPT-3.5, but GPT-4 is an order of magnitude better.
The sentiment that GPT-4 is going to be surpassed by open source models soon is something I only notice on HN. Makes me suspect people here haven't really tried the actual GPT-4 but instead the various scammy services like Bing that claim they are using GPT-4 under the hood when they are clearly not.
You're 100% right and I apologize that you're getting downvoted, in solidarity I will eat downvotes with you.
HN's funny right now because LLMs are all over the front page constantly, but there's a lot of HN "I am an expert because I read comments sections" type behavior. So many not-even-wrong comments that start from "I know LLaMa is local and C++ is a programming language and I know LLaMa.cpp is on GitHub and software improves and I've heard of Mistral."
LLMs are going to spit out a lot of broken shit that needs fixing. They're great at small context work but full applications require more than they're capable of imo.
I don't see the current tech making supply infinite. Not even close.
Maybe a more advanced type of model they'll invent in the next few years. Who knows... But GPT-like models? Nah, they won't write useful code applicable in prod without supervision by an experienced engineer.
It's weird for programmers to be worried about getting automated out of a job when my job as a programmer is basically to try as hard as I can to automate myself out of a job.
You’re supposed to automate yourself out but not tell anyone. Didn’t you see that old Simpsons episode from the 90s about the self-driving trucks? The drivers rightfully STFU about their innovation and cashed in on great work-life balance, and Homer ruined it by blabbing about it to everyone, causing the drivers to go after him.
We are trying to keep SWE salaries up, and lowering the barrier to entry will drop them.
Curious thought: at some point a competitor’s AI might become so advanced, you can just ask it to tell you how to create your own, analogous system. Easier than trying to catch up on your own. Corporations will have to include their own trade secrets among the things that AIs aren’t presently allowed to talk about, like medical issues or sex.
As someone who doesn’t know much about how these models work or are created I’d love to see some kind of breakdown that shows what % of the power of GPT4 is due to how it’s modelled (layers or whatever) vs training data and the computing resources associated with it.
This isn't precisely knowable now, but it might be something academics figure out years from now. Of course, first principles of 'garbage in garbage out' would put data integrity very high, the LLM code itself is supposedly not even 100k lines of code, and the HW is crazy advanced.
so the ordering is probably data, HW, LLM model
This also fits the general ordering of
data = all human knowledge
HW = integrated complexity of most technologists
LLM = small team
Still requires the small team to figure out what to do with the first two, but it only happened now because the HW is good enough.
LLMs would have been invented by Turing and Shannon et al. almost certainly nearly 100 years ago if they had access to the first two.
That’s true now, but maybe GPT6 will be able to tell you how to build GPT7 on an old laptop, and you’ll be able to summon GPT8 with a toothpick and three cc’s of mouse blood.
What is inherent about AIs that requires spending a billion dollars?
Humans learn a lot of things from very little input. Seems to me there's no reason, in principle, that AIs could not do the same. We just haven't figured out how to build them yet.
What we have right now, with LLMs, is a very crude brute-force method. That suggests to me that we really don't understand how cognition works, and much of this brute computation is actually unnecessary.
Maybe not $1 billion, but you'd want quite a few million.
According to [1], a 70B model needs $1.7 million of GPU time (a back-of-envelope check follows this comment).
And when you spend that - you don't know if your model will be a damp squib like Bard's original release. Or if you've scraped the wrong stuff from the internet, and you'll get shitty results because you didn't train on a million pirated ebooks. Or if your competitors have a multimodal model, and you really ought to be training on images too.
So you'd want to be ready to spend $1.7 million more than once.
You'll also probably want $$$$ to pay a bunch of humans to choose between responses for human feedback to fine-tune the results. And you can't use the cheapest workers for that if you need great English language skills and want them to evaluate long responses.
And if you become successful, maybe you'll also want $$$$ for lawyers after you trained on all those pirated ebooks.
And of course you'll need employees - the kind of employees who are very much in demand right now.
You might not need billions, but $10M would be a shoestring budget.
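The back-of-envelope check promised above, Chinchilla-style; every number here is an assumption of mine, not from the cited source.

    # Rough training-cost arithmetic: compute ~= 6 * params * tokens.
    params = 70e9            # 70B-parameter model
    tokens = 2e12            # ~2T training tokens (Llama2-scale assumption)
    flops = 6 * params * tokens

    a100_peak = 312e12       # A100 bf16 peak FLOP/s
    utilization = 0.5        # assumed real-world utilization
    gpu_hours = flops / (a100_peak * utilization) / 3600

    price = 1.10             # assumed bulk $/GPU-hour
    print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price / 1e6:.1f}M")
    # -> roughly 1.5M GPU-hours and ~$1.6M: the same ballpark as above.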
> And when you spend that - you don't know if your model will be a damp squib like Bard's original release. Or if you've scraped the wrong stuff from the internet, and you'll get shitty results because you didn't train on a million pirated ebooks.
This just screams to me that we don’t have a clue what we’re doing. We know how to build various model architectures and train them, but if we can’t even roughly predict how they’ll perform then that really says a lot about our lack of understanding.
Most of the people replying to my original comment seem to have dropped the “in principle” qualifier when interpreting my remarks. That’s quite frustrating because it changes the whole meaning of my comment. I think the answer is that there isn’t anything in principle stopping us from cheaply training powerful AIs. We just don’t know how to do it at this point.
>Humans learn a lot of things from very little input
And they also need 8 hours of sleep per day, and are mostly worthless for the first 18 years. Oh, and they may tell you to fuck off while they go on a 3000-mile nature walk for 2 years because they like the idea of free love better.
Knowing how birds fly really doesn't make a useful aircraft that can carry 50 tons of supplies, or one that can go over the speed of sound.
This is the power of machines and bacteria: throwing massive numbers at the problem. Being able to solve problems of cognition by throwing 1 GW of power at them will absolutely help us solve, much faster, the problem of how our brain does it with 20 watts.
> Transistors used to cost a billion times more than they do now
However, you would still need billions of dollars if you want state-of-the-art chips today, say 3nm.
Similarly, LLMs may at some point not require a billion dollars; you may be able to get one on par with or surpassing GPT-4 for cheap. State-of-the-art AI will still require substantial investment.
Because that billion dollars gets you the R&D to know how to do it?
The original point was that an “AI” might become so advanced that it would be able to describe how to create a brain on a chip. This is flawed for two main reasons.
1. The models we have today aren’t able to do this. We are able to model existing patterns fairly well but making new discoveries is still out of reach.
2. Any company capable of creating a model which had singularity-like properties would discover them first, simply by virtue of the fact that they have first access. Then they would use their superior resources to write the algorithm and train the next-gen model before you even procured your first H100.
I agree about training time, but bear in mind LLMs like GPT4 and Mistral also have noisy recall of vastly more written knowledge than any human can read in their lifetime, and this is one of the features people like about LLMs.
You can't replace those types of LLM with a human, the same way you can't replace Google Search (or GitHub Search) with a human.
Acquiring and preparing that data may end up being the most expensive part.
Mistral's just-released model is well below GPT-3 out of the box. I've seen people speculate that with fine-tuning and RLHF you could get GPT-3-like performance out of it, but it's still too early to tell.
I'm in agreement with you, I've been following this field for a decade now and GPT-4 did seem to cross a magical threshold for me where it was finally good enough to not just be a curiosity but a real tool. I try to test every new model I can get my hands on and it remains the only one to cross that admittedly subjective threshold for me.
> Mistral's just-released model is well below GPT-3 out of the box
The early information I see implies it is above. Mind you, that is mostly because GPT-3 was comparatively low: for instance its 5-shot MMLU score was 43.9%, while Llama2 70B 5-shot was 68.9%[0]. Early benchmarks[1] give Mixtral scores above Llama2 70B on MMLU (and other benchmarks), thus transitively, it seems likely to be above GPT-3.
Of course, GPT-3.5 has a 5-shot score of 70, and it is unclear yet whether Mixtral is above or below, and clearly it is below GPT-4’s 86.5. The dust needs to settle, and the official inference code needs to be released, before there is certainty on its exact strength.
(It is also a base model, not a chat finetune; I see a lot of people saying it is worse, simply because they interact with it as if it was a chatbot.)
One thing people should keep in mind when reading others’ comments about how good an LLM is at coding, is that the capability of the model will vary depending on the programming language. GPT-4 is phenomenal at Java because it probably ate an absolutely enormous amount of Java in training. Also, Java is a well-managed language with good backwards-compatibility, so patterns in code written at different times are likely to be compatible with each other. Finally, Java has been designed so that it is hard for the programmer to make mistakes. GPT-4 is great for Java because Java is great for GPT-4: it provides what the LLM needs to be great.
If you can run yi34b, you can run phind-codellama. It's much better than yi and mistral for code questions. I use it daily. More useful than gpt3 for coding, not as good as gpt4, except that I can copy and paste secrets into it without sending them to openai.
Typically snippets of a few lines that would require a few minutes of thinking but that ChatGPT provides immediately. It often works, but there are setbacks. For instance, if I'm lazy and don't very carefully check the code, it can produce bugs and cancel the benefits.
It can be useful, but I can see how it'll generate a class of lazy coders who can't think by themselves and just try to get the answer from ChatGPT. An amplified Stack Overflow syndrome.
How do you use these models? If you don't mind sharing. I use GPT-4 as an alternative to googling, haven't yet found a reason to switch to something else. I'll for example use it to learn about the history, architecture, cultural context, etc of a place when I'm visiting. I've found it very ergonomic for that.
I’ve used LM Studio. It’s not reached peak user-friendliness, but it’s a nice enough GUI. You’ll need to fiddle with resource allocation settings and select an optimally quantized model for best performance. But you can do all that in the UI.
I just installed it on my 32GB Mac yesterday. First impressions: it does very well at reasoning, it does very well answering general common-sense world knowledge questions, and so far when it generates Python code, the code works and is well documented. I know this is just subjective, but I have been running a 30B model for a while on my Mac and Yi-34B just feels much better. With 4-bit quantization, I can still run Emacs, terminal windows, and a web browser with a few tabs without seeing much page faulting (rough memory math below). Anyway, please try it and share a second opinion.
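The rough memory math, as a sketch; real quantized files vary a little because some tensors stay at higher precision.

    # Why a 34B model at 4-bit fits on a 32GB machine.
    params = 34e9
    bits_per_weight = 4
    weights_gb = params * bits_per_weight / 8 / 1e9   # ~17 GB of weights
    overhead_gb = 3                                   # assumed KV cache + runtime
    print(f"~{weights_gb:.0f} GB + ~{overhead_gb} GB leaves room for Emacs and a browser")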
But you need to run the top Yi finetunes instead of the vanilla chat model. They are far better. I would recommend Xaboros/Cybertron, or my own merge of several models on huggingface if you want the long context Yi.
Of course, the reason Mistral AI got a lot of press and publicity in the first place was because they open-sourced Mistral-7B despite the not-making-money-in-the-short-term aspect of it.
It's better for the AI ecosystem as a whole to incentivize AI startups to make a business through good and open software instead of building moats and lock-in ecosystems.
I don’t think that counts as open source. They didn’t share any details about their training, making it basically impossible to replicate.
It’s more akin to a SaaS company releasing a compiled binary that usually runs on their server. Better than nothing, but not exactly in the spirit of open source.
This doesn’t seem like a pedantic distinction, but I suppose it’s up to the community to agree or disagree.
A compiled binary is a bad metaphor because it gives the implication that Mistral-7B is an as-is, WYSIWYG project that's not easily modifiable. In contrast, a bunch of powerful new models have been created by modifying or finetuning Mistral-7B, such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta (a loading sketch follows this comment).
The better analogy for Mistral-7B is something like modding Minecraft or Skyrim: although those games are closed source themselves, modding has enabled innovations that help the open-source community directly.
It would be nice to have fully open-source methodologies but lacking them isn't an inherent disqualifier.
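The loading sketch mentioned above: what "building on the weights" looks like in practice, pulling the Zephyr-7B finetune with the Hugging Face transformers library. The model name comes from the link above; the prompt and generation settings are arbitrary.

    # Minimal sketch: load a Mistral-7B derivative and generate from it.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "HuggingFaceH4/zephyr-7b-beta"  # a finetune of Mistral-7B
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype="auto", device_map="auto"  # needs accelerate installed
    )

    inputs = tok("Why do 7B models punch above their weight?",
                 return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=80)
    print(tok.decode(out[0], skip_special_tokens=True))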
I'm well aware of the many open source architectures, and the point stands. Models like GPT-J have open code and data, and that allows using them as a baseline for architecture experiments in a way that Mistral's models can't be. Mistral publishes weights and code, but not the training procedure or data. Not open.
With all the new national supercomputers, scale isn’t really going to be an issue; they all want large language models on 10k GH200s or whatever, and the libraries are getting easier to use.
"Source code is defined as the preferred form of the program for making changes in. Thus, whatever form a developer changes to develop the program is the source code of that developer's version."
According to the Open Source Definition:
"The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."
LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation. "Intermediate" implies a form must undergo further processing before it can be used, but LLM weights are typically used directly. LLMs did not exist when these definitions were written, so they aren't a perfect fit for the terminology used, but there's a reasonable argument to be made that LLM weights can qualify as "source code".
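As a concrete illustration of "modified by changing the model weights directly": a single finetuning step edits the released checkpoint in place, with no retraining from scratch involved. A sketch only, hardware requirements and training-loop details aside.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "mistralai/Mistral-7B-v0.1"   # the released weights
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

    batch = tok("An example of the new behaviour we want.", return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # causal-LM loss
    loss.backward()
    opt.step()                             # the published weights are now different
    model.save_pretrained("my-derived-model")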
> LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation.
They're understood based on knowing the training process though, and a developer working on them would want to have the option of doing a partial or full retraining where warranted.
Also because their model is unconstrained/uncensored, and they are committed to that, according to what they say; they build it so others can build on it. GPTs are not finished business, and hopefully the open source community will surpass the early successes.
Why would it be a dubious valuation scheme? I guess if an investor is looking at just revenue, or only looking at one area of their business finances, maybe? Otherwise it seems like the loss in funds would be weighed against the increase in revenue and wouldn't distort earnings.
Say big green gives a company $100M with the rider that it needs to spend all of it on Nvidia's hardware, in exchange for 10% of the company.
Has Nvidia valued the company at $1B? Say their margin is 80% on the sales. Then Nvidia has lost some cash flow and $20M for that 10%. Has Nvidia valued the company at $200M?
I see :) Thanks for clarifying. I would say that I don't have a strong enough grasp on biz finances to do more than speculate here, but:
1) Is all the money spent up front? Or does it trickle back in over a few years? Cash flow might be impacted more than implied, but I doubt this is much of an issue.
2) I wonder how the 10% ownership at 2B valuation would be interpreted by investors. If it's viewed as a fairly liquid investment with low risk of depreciation then yeah, I could see Nvidia's strategy being quite the way to pad numbers. OTOH, the valuation could be seen as pure marketing fluff and mostly written off by the markets until regulations and profitability are firmly in place.
If it was a good valuation scheme, then Nvidia giving them $100 million at a $2 billion valuation would mean that Nvidia thinks the company is worth $2 billion. But if Mistral uses that money to buy GPUs that Nvidia sells with 75% profit margin, the deal is profitable for Nvidia even if they believe the company is worth only $0.5 billion (since they effectively get 75% of the investment back). And if this deal fuels the wider LLM hype and leads other companies to spend just $50 million more at Nvidia, this investment is profitable for Nvidia even if Mistral had negative value.
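Spelling out that round-trip arithmetic (all numbers are the assumed ones from the comments above):

    # Economics of an equity-for-GPUs deal from the vendor's side.
    investment = 100e6      # cash in, all spent back on the vendor's hardware
    equity = 0.10           # stake received
    gross_margin = 0.75     # assumed hardware margin

    cash_recovered = investment * gross_margin   # flows back as GPU profit
    net_cost = investment - cash_recovered       # $25M actually at risk
    breakeven_valuation = net_cost / equity
    print(f"deal breaks even if the stake is worth ${breakeven_valuation/1e6:.0f}M")
    # -> $250M: far below the $2B headline valuation.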
The problem is that no European VC has that amount of capital. European VCs typically have a couple of hundred million under mgmt. SV VCs have a few billion under mgmt.
Index Ventures has the money. But the truth of the matter is that even most US VCs aren't willing to shell out 2B valuations for a company with no revenue.
Was CTO for some European startups. I'll always remember one where, by the time the EU VC was mid-way through its due diligence for a 500k seed, we already had some millions lined up from US VCs, no questions asked.
There were European VCs investing in the very first round, a French one in particular. The founders are French. This qualifies as European in my book (let’s not get too demanding).
I have realised just how meaningless valuations now are. As much as we use them as a marker of success, you can find someone to write the higher-valuation ticket when it suits their agenda too, e.g. the markup, the status signal, or just getting the deal done ahead of your more rational competitors in the investment landscape. Now that's not to say Mistral isn't a valuable company or that they aren't doing good work. It's just that valuation markers are meaningless, and most of this capital raise in the AI space is about offsetting the cloud/GPU spend. Might get downvoted to death, but watching valuation news feels like no news.
Mistral has a lot of potential, but there's the obvious risk that without proper monetization strategies it might not achieve sustainable profitability in the long term.
The French have an urge to be independent; the French government will hand them some juicy contract as soon as they can provide any product that justifies it.
I would say most European countries have that desire. That, and the fact that these models can easily be fine-tuned to the local language, could make them very popular outside the US.
Coupled with the concern that once you’re charging users money for a product, you are also liable for sketchy things they do with it. Not so much when you post a torrent link on twitter that happens to have model weights.
On their pitch deck it said they will monetise serving of their models.
While it may feel like a low moat if anyone can spin up a cloud instance with the same model, it's still a reasonable starting point. I think they will also be getting a lot of EU clients who can't/don't want to use US providers.
they seem pretty committed to open-source AI (from interviews I've heard with the founders) - but maybe if they manage to train models with truly amazing capabilities somewhere down the line, they will keep some closed source
The old open source, but we'll host it for you? I think Bezos is going to be in fits of evil laughter about that model in 5 years, as all the open source compute moves to the clouds, with dollars flowing his way.
But one thing Mistral could do is have a free foundational model, and have non-free (as in beer, as in speech) "pro" models. I think they will have to.
There are huge economy of scale benefits from providing hosted models.
I've been trying out all sorts of open models, and some of them are really impressive - but for my deployed web apps I'm currently sticking with OpenAI, because the performance and price I get from their API is generally much better than I can get for open models.
If Mistral offered a hosted version which didn't have any spin-up time and was price competitive with OpenAI I would be much more likely to build against their models.
At this valuation and given the strength of the team, it’s not hard to imagine a future acquisition yielding a significant ROI.
Besides, we don’t know what future opportunities will unfold for these technologies. Clearly there’s no shortage of smart investors happy to place bets on that uncertainty.
I really hope that a European startup can successfully compete with the major companies. I do not want to see privacy violations, such as OpenAI's default use of user prompts for training, become standard practice.
How on Earth would it count as European? It's a completely American company. Founded in the US, by Americans, headquartered in the US, funded by American VCs... I genuinely don't get how you arrived at the idea that it's European.
maybe not the distinction you meant but the UK is still in Europe (the continent) and to me, European is a word based on location not membership of the European Union (which the UK left)
The old Masters have a saying: Never fall in love with your creation.
The AI industry is falling into the trap of their own making (marketing).
LLMs are nice toys, but implementation is resource/energy expensive and murky at best.
There are a lot of real-life problems that would be solved through a rational approach.
If someone is thirsty, the water is the most important part, not the type of glass:)
If you compared the efficiency of steam engines during industrial revolution with the ones used today, or power generation from 100 years ago to that of now, or between just about any chemical process, manufacturing method or agricultural technique at its invention and now, you'd be amazed by the difference. In some cases, the activity of today was several orders of magnitude more wasteful just 100 years ago.
Or, I guess look at how size, energy use and speed of computer hardware evolved over the past 70 years. Point is, implementation being, right now, "resource/energy expensive and murky at best" is how many very powerful inventions look at the beginning.
> If someone is thirsty, the water is the most important part, not the type of glass:)
Sure, except here, we're talking about one group selling a glass imbued with breakthrough nanotech, allowing it to keep the water at the desired temperature indefinitely and continuously refill itself by sucking moisture out of the air. Sometimes the type of glass really matters, and then it's not surprising many groups strive to be able to produce it.
Let me say this. Whoever manages to let "normal" Mac users install and run a local copy of an LLM is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line.)
It is nuts to me that we have 100M computers capable of running LLMs properly, and yet only a tiny fraction of them does.
Heck, let us do p2p, and lend our computing power to others.
Let us build a personalized LLM.
This is, IMHO, a really interesting path forward. It seems no one is doing it.
> Whoever manages to let "normal" Mac users install and run a local copy of an LLM is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line.)
Noob questions (I don't know anything about LLMs, I'm just a casual user of ChatGPT):
- is what Mistral does better than Meta or OpenAI?
- will LLMs eventually become open-source commodities with little room for innovation, or shall we expect to see a company with a competitive advantage that will make it the new Google? in other words, how much better can we expect these LLMs to be in the future? should we expect significant progress, or have we reached diminishing returns (after all, this is only statistical prediction of the next word, maybe there's an intrinsic limitation to this method)
- are there some sorts of benchmarks to compare all these new models?
> In a significant development for the European artificial intelligence sector, Paris-based startup Mistral AI has achieved a noteworthy milestone. The company has successfully secured a substantial investment of €450 million, propelling its valuation to an impressive $2 billion.
I’m cracking up. I don’t need to be a rocket scientist to read this and immediately conclude it’s AI-generated. I mean, they didn’t even try to hide that. Haha.
I believe that the rationale is that if you can do an outstanding 7B model, it is likely that you are able to create, in the near future, something that may compete with OpenAI, and something that makes money, too.
We both know that's not how regulations work. Mistral is going to have to get a legal team to understand the regulations, have a line item for each provision, verify each one doesn't apply to them, get it signed off, and continuously monitor for changes both to the laws and the code to make sure it stays compliant. This will just be a mandate from HR/Legal/Investors.
A lot of work for a company with no commercial offering off the bat. And possibly an insurmountable amount of work for new players trying to enter.
Meta didn't have any commercial offering until what, WhatsApp for business a few years ago, around 2018? By your logic they should have never been valued at anything or made any profit, yet they did.
Or another way to put it - if you are an enterprise based in Europe that needs to stay compliant, future regulation will make it very hard to not use Mistral :P.
The part of Meta research that worked on LLaMa happened to be based in the Paris office. Then some of the leads left and started Mistral.
Complex/simple is not really the right way to think about training these models, I'd say its more arcane. Every mistake is expensive because it takes a ton of GPU time and/or human fine tuning time. Take a look at the logbooks of some of the open source/research training runs.
So these engineers have some value as they've seen these mistakes (paid for by Meta's budget).
Because many around here have a preconceived bias that Europe cannot be innovative, and any proof to the contrary needs to be shat upon as not good or innovative enough/only looking for government contracts, or that they're not the size of Meta or Alphabet or Apple so obviously they aren't really innovative, or some other goal post shifting exercise.
It’s kinda weird thinking deep tech companies should be profitable a year in.
Like it takes time to make lots of money and it’s really hard to build state of the art models.
Reality is this market is huge and growing massively, as it is so much more efficient to use these models for many (but not all) tasks.
At Stability I told the team to focus on shipping models, as next year is the year for generative media, where we are the leader, as language models go to the edge.
They didn't say that companies should be profitable a year in.
To my mind they just seemed to be responding to the slightly clickbait-y title, which focuses on the valuation, which has some significance but is still pretty abstract. Still, headlines love the word "billion".
The straight-news version of the headline would probably focus more on a16z's new round.
I acknowledge it’s easy to be an armchair critic. You are the ones in battlefield doing real work and pushing the edge.
The thing is I don’t want the pro-open-source players to fizzle out and implode because funding dried up and they have no path to self sustainability.
AGI could be 6 months away or 6 decades away.
E.g Cruise has a high probability of imploding. They raised too much and didn’t deliver. Now California has revoked their license for driverless cars.
I’m 100% sure AGI, driverless cars and amazing robots will come. Fairly convinced the ones who get us there will be the cockroaches and not the dinosaurs.
I think it's also tough at this early stage of the diffusion (aha) of innovation curve; we are at the point of early adopters and high churn before mass adoption of these technologies over the coming years, as they become good enough, fast enough, and cheap enough.
AGI is a bit of a canard imo; it's not really actionable in a business sense.
Profitability likewise means jack shit. You just need to have a successful acquisition by a lazy dinosaur or make enough income to go public. You can lose money for 10 years straight while transferring wealth from the public to the investors/owners. With that said, I'm short Mistral for being French. I have absolutely zero faith in EU-based orgs.
On profitability: for all the newcomers, I don't think anyone can wager that any of them is going to make money. Capital efficiency is overrated so long as they can survive for the next year-plus; they are all trying to corner the market, and OpenAI is the one that seems to have found a way to milk the cow for now. I truly believe that the true hitmakers are yet to enter the scene.
This is just tangential, but I wouldn't call their APIs "nice", I'd be far less charitable. I spent a few hours (because that's how long it took to figure out the API, due to almost zero documentation) and wrote a nicer Python layer:
Yes, and it can matter in a very bad way if you need to subsequently have a "down round" (more funding at a lower valuation).
Initial high valuations mean the founders get a lot of initial money giving up little stock. This can be awesome if they become strongly cash-flow positive before they run out of that much runway. But if not, they'll get crammed hard in subsequent rounds.
The more key question is: how much funding did they raise at that great valuation, and is it sufficient runway? Looks like €450 million plus an additional €120 million in convertible debt. Might be enough, depending on their expenses...
I'm not saying that either of your concerns is invalid. The LLM space is just the wrong place to be for investors who are worried about cash-flow positivity this early in the game. These models are crazy expensive to develop _currently_, but they are getting cheaper to train all the time. Mistral spent a fraction of what OpenAI did on GPT-3 to train their debut model, and companies started one year from now will be spending a fraction of what both are spending presently to train their debut models.
YUP. Plus, the points at the end of your post, about how much faster and cheaper it is getting to train new models, indicate that Mistral may have hit a real sweet spot. They are getting funding at a moment when the expectation is that huge capital is needed to build these models, just as those costs are declining, so the same investment will buy them a lot more runway than it did for previous competitors...
Instead of "path to profitability", I think path to ROI is more appropriate, though.
WhatsApp never had a path to profitability, but it had a clear path to ROI by building a unique and massive user base that major social networks would fight for.
Perhaps too much off-topic, but I hate how the press (and often the startups themselves) focuses on the valuation number when a company receives funding. As we've seen in very recent history, those valuation numbers are at best a finger in the wind, and of course a big capital intensive project like AI requires a valuation that is at least a couple multiples of the investment, even if it's all essentially based on hope.
I think it would make much more sense to focus on the "reality side" of the transaction, e.g. "Mistral AI received a €450 million investment from top tech VC firms."
I think there are enough genuine use cases. People are saving time using AI tools. There are a lot of people in office jobs. It is a huge market. Not to say it won't overshoot. With high interest rates valuations should be less frothy anyway.
Right now they’re shoveling “potential”. LLMs demonstrate capabilities we haven’t seen before, so there’s high uncertainty about the eventual impact. The pace of progress makes it _seem_ like an LLM “killer app” could appear any day, creating a sense of FOMO.
There's also the race to "AGI" -- companies spending tens of billions on training, hoping they'll hit a major intelligence breakthrough. If they don't hit anything significant that would have been money (mostly) down the drain, but Nvidia made out like a bandit.
I can’t think of any software/service that’s grown more in terms of demand over a single year than ChatGPT (in all its incarnations, like the MS Azure one).
I don’t know what you’re talking about. I use chatGPT extensively. Probably more than 50 times a day. I am extremely excited for anything that can top the already amazing thing we have now. They have a massive paying customer base.
100%. ChatGPT is used heavily in my household (my wife and I both have paid subscriptions) and it’s absolutely worth it. One of the most interesting things for me has actually been watching my wife use it. She’s an academic in the field of education and I’ve seen her come up with so many creative uses of the technology to help with her work. I’m a power user too, but my usage, as a software engineer, is likely more predictable and typical.
- Writing: emails, documentation, marketing. Write an unstructured skeleton of the information, add a prompt about the intended audience and purpose, and possibly ask it to add some detail.
- Coding: especially things like "Is there a method for this in this library?" - a lot quicker than browsing through documentation. For some errors, copy-paste the error from the console, maybe a bit of context, and quite often I get the solution.
And API based:
- Support bot
- Prompt engineering for tasks that would normally require labeling, training, and evaluation for weeks or months. A couple of use cases take unstructured text plus a prompt as input, JSON as output (minimal sketch below).
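A minimal sketch of that last pattern, using the openai v1 Python SDK; the model name, fields, and prompt are placeholders, and an API key is assumed to be set in the environment.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder; any chat model works
        messages=[
            {"role": "system",
             "content": 'Extract fields from the user text. '
                        'Reply with JSON only: {"person": ..., "topic": ...}'},
            {"role": "user",
             "content": "Met with Alice Chen on Tuesday to go over the Q3 roadmap."},
        ],
    )
    print(resp.choices[0].message.content)
    # e.g. {"person": "Alice Chen", "topic": "Q3 roadmap"}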
A lot of very varied things so it’s hard to remember. Yesterday I used it extensively to determine what I need to buy for a chicken coop. Calculating the volume of concrete and cinder blocks needed, the type and number of bags of concrete I would need, calculating how many rolls of chicken wire I would need, calculating the number of shingles I would need, questions on techniques, and drying times for using those things, calculating how much mortar I would need for the cinderblocks (it took into account that I would mortar only on the edges, the thickness of mortar required for each joint, it accounted for the cores in the cinderblocks, it correctly determined I wouldn’t need mortar on the horizontal axis on the bottom row) etc. All of this, I could’ve done by hand, but I was able to sit and literally use my voice to determine all of this in under five minutes.
I use DALLE3 extensively for my woodworking hobby, where I ask it to come up with ideas for different pieces of furniture, and have constructed several based on those suggestions.
For work I use it to write emails, to come up with skeletons for performance reviews and look-back/look-ahead documents, ideas for what questions to bring up during sprint reviews based on data points I provide it, etc.
Not OP but I used it very successfully (not OpenAI but some wrapper solution) for technical/developer support.
Turns out a lot of people prefer talking to a bot that gives a direct answer than reading the docs.
Support workload on our Slack was reduced by 50-75% and the output is steadily improving.
We don’t even know what AI is truly going to look like in 2 years, and 2 years ago nobody cared. Isn’t it a bit too early to regulate a field that’s barely starting?
> Another company worthy of some hype is 01.AI, which released their Yi-34B model. I have been running Yi locally on my Mac (use “ollama run yi:34b”) and it is amazing.
Hype away Mistral and 01.AI, hype away…