Anthropic’s $5B, 4-year plan to take on OpenAI (techcrunch.com)
383 points by isaacfrond on April 11, 2023 | 484 comments



If Apple would wake up to what's happening with llama.cpp etc then I don't see such a market in paying for remote access to big models via API, though it's currently the only game in town.

Currently a Macbook has a Neural Engine that is sitting idle 99% of the time and only suitable for running limited models (poorly documented, opaque rules about what ops can be accelerated, a black box compiler [1] and an apparent 3GB model size limit [2])

OTOH you can buy a Macbook with 64GB 'unified' memory and a Neural Engine today

If you squint a bit and look into the near future it's not so hard to imagine a future Mx chip with a more capable Neural Engine and yet more RAM, and able to run the largest GPT3 class models locally. (Ideally with better developer tools so other compilers can target the NE)

And then imagine it does that while leaving the CPU+GPU mostly free to run apps/games ... the whole experience of using a computer could change radically in that case.

I find it hard not to think this is coming within 5 years (although equally, I can imagine this is not on Apple's roadmap at all currently)

[1] https://github.com/hollance/neural-engine

[2] https://github.com/smpanaro/more-ane-transformers/blob/main/...


If I were Apple I'd be thinking about the following issues with that strategy:

1. That RAM isn't empty, it's being used by apps and the OS. Fill up 64GB of RAM with an LLM and there's nothing left for anything else.

2. 64GB probably isn't enough for competitive LLMs anyway.

3. Inferencing is extremely energy intensive, but the MacBook / Apple Silicon brand is partly about long battery life.

4. Weights are expensive to produce and valuable IP, but hard to protect on the client unless you do a lot of work with encrypted memory.

5. Even if a high end MacBook can do local inferencing, the iPhone won't and it's the iPhone that matters.

6. You might want to fine tune models based on your personal data and history, but training is different to inference and best done in the cloud overnight (probably?).

7. Apple already has all that stuff worked out for Siri, which is a cloud service, not a local service, even though it'd be easier to run locally than an LLM.

And lots more issues with doing it all locally, fun though that is to play with for developers.

I hope I'm wrong, it'd be cool to have LLMs be fully local, but it's hard to see situations where the local approach beats out the cloud approach. One possibility is simply cost: if your device does it, you pay for the hardware, if a cloud does it, you have to pay for that hardware again via subscription.


> but it's hard to see situations where the local approach beats out the cloud approach.

I think the most glaring situation where this is true is simply one of trust and privacy.

Cloud solutions involve trusting 3rd parties with data. Sometimes that's fine, sometimes it's really not.

Personally - LLMs start to feel more like they're sitting in the confidant/peer space in many ways. I behave differently when I know I'm hitting a remote resource for LLMs in the same way that I behave differently when I know I'm on camera in person: Less genuinely.

And beyond merely trusting that a company won't abuse or leak my data, there are other trust issues as well. If I use an LLM as a digital assistant - I need to know that it's looking out for me (or at least acting neutrally) and not being influenced by a 3rd party to give me responses that are weighted to benefit that 3rd party.

I don't think it'll be too long before we see someone try to create an LLM that has advertising baked into it, and we have very little insight into how weights are generated and used. If I'm hitting a remote resource - the model I'm actually running can change out from underneath me at any time, jarring at best and utterly unacceptable at worst.

From my end - I'd rather pay and run it locally, even if it's slower or more expensive.


People have trusted search engines with their most intimate questions for nearly 30 years and there has been what ... one? ... leak of query data during this time, and that was from AOL back when people didn't realize that you could sometimes de-anonymize anonymized datasets. It hasn't happened since.

LLMs will require more than privacy to move locally. Latency, flexibility and cost seem more likely drivers.


You're still focused on trusting that my data is safe. And while I think that matters - I don't really think that's the trust I care most about.

I care more about the trust I have to place in the response from the model.

Hell - since you mentioned search... Just look at the backlash right now happening to google. They've sold out search (a while back, really) and people hate it. Ads used to be clearly delimited from search results, and the top results used to be organic instead of paid promos. At some point, that stopped being true.

At least with google search I could still tell that it was showing me ads. You won't have any fucking clue that OpenAI has entered into a partnering agreement with "company [whatever]" and has retrained the model that users on plans x/y/z interact with to make it more likely to push them towards their new partner [whatever]'s products when prompted with certain relevant contexts.


> Hell - since you mentioned search... Just look at the backlash right now happening to google. They've sold out search (a while back, really) and people hate it. Ads used to be clearly delimited from search results, and the top results used to be organic instead of paid promos. At some point, that stopped being true.

Only people in HN-like communities care about this stuff. Most people find the SEO spam in their results more annoying.

> At least with google search I could still tell that it was showing me ads. You won't have any fucking clue that OpenAI has entered into a partnering agreement with "company [whatever]" and has retrained the model that users on plans x/y/z interact with to make it more likely to push them towards their new partner [whatever]'s products when prompted with certain relevant contexts.

You won't know this for any local models either.


> You won't know this for any local models either.

But you will know the model hasn't changed, and you can always continue using the version you currently have.
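
For what it's worth, verifying that a local model hasn't silently changed is trivial: record a checksum of the weights file when you get it and compare on every load. A minimal sketch in Python; the path and recorded digest are placeholders:

    import hashlib
    from pathlib import Path

    def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
        # Stream the file so multi-GB weight files don't need to fit in RAM
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Placeholder path and digest; use whatever you recorded at download time
    weights = Path("models/ggml-model-q4_0.bin")
    expected = "digest-you-recorded-when-you-downloaded-the-weights"

    if sha256_of(weights) != expected:
        raise RuntimeError("local model file has changed")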

> Most people find the SEO spam in their results more annoying.

This is the same problem. These models will degrade from research quality to mass market quality as there's incentive to change what results they surface. Whether that's intentional (paid ads) versus adversarial (SEO) doesn't matter all that much - In either case the goals will become commercial and profit motivated.

People really don't like "commercial and profit motivated" in the spaces some of these LLMs are stepping into. Just like you don't like SEO in your recipe results.


> But you will know the model hasn't changed, and you can always continue using the version you currently have.

Will you? What happens when an OS update silently changes the model? Again, this is one of those things only HN-types really care/rant about. I've never met a non-technical person who cares about regular updates beyond them being slow or breaking an existing workflow. Most technical folks I know don't care either.

> This is the same problem. These models will degrade from research quality to mass market quality as there's incentive to change what results they surface. Whether that's intentional (paid ads) versus adversarial (SEO) doesn't matter all that much - In either case the goals will become commercial and profit motivated.

Not at all. Search providers have an incentive to fight adversarial actors. They don't have any incentive to fight intentional collaboration.

> People really don't like "commercial and profit motivated" in the spaces some of these LLMs are stepping into. Just like you don't like SEO in your recipe results.

I disagree. When a new, local business pops up and pays for search ads, is this "commercial and profit motivated?" How about advertising a new community space opening? I work with a couple businesses like this (not for SEO, just because I like the space they're in and know the staff) and using ads for outreach is a pretty core part of their strategy. There's no neat and clean definition of "commercial and profit motivated" out there.


You wouldn't know that even if the model ran locally.


This happened with ChatGPT a few weeks ago.

https://news.ycombinator.com/item?id=35291112


Two issues though: leak of data from one party to another, and misuse of data by the party you gave it to. Most big companies don’t leak this type of data, but they sure as hell misuse it and have the fines to prove it.


Almost everyone is willing to trust 3rd parties with data, including enterprise and government customers. I find it hard to believe that there are enough people willing to pay a large premium to run these locally to make it worth the R&D cost.


Having done a lot of Bank/Gov related work... I can tell you this

> Almost everyone is willing to trust 3rd parties with data, including enterprise and government customers.

Is absolutely not true. In its most basic sense - sure... some data is trusted to some 3rd parties. Usually it's not the data that would be most useful for these models to work with.

We're already getting tons of "don't put our code into chatGPT/Copilot" warnings across tech companies - I can't imagine not getting fired if I throw private financial docs for my company in there, or ask it for summaries of our high level product strategy documents.


Yes, just like you might get fired for transacting sensitive company business on a personal gmail account, even if that company uses enterprise gmail.

Saying that cloud models will win over local models is not the same as saying it will be a free-for-all where workers can just use whatever cloud offering they want. It will take time to enterprisify cloud LLM offerings to satisfy business/government data security needs, but I'm sure it will happen.


But right now what incentive have I to buy a new laptop? I got this 16GB M1 MBA two years ago and it's literally everything I need, always feels fast, silent etc

1. the idea would be that now there is a reason to buy loads more RAM, whereas currently the market for 64GB is pretty niche

2. 64GB is a big laptop today; in a few years' time that will be small. And LLaMA 65B int4 quantized should fit comfortably (65B parameters at 4 bits each is roughly 33GB of weights, leaving room for the KV cache and the OS)

4. LLMs will be a commodity. There will be a free one

6. LLMs seem to avoid the need for finetuning by virtue of their size - what we see now with the largest models is you just do prompt engineering. Making use of personal data is a case of Langchain + vectorstores (or however the future of that approach pans out)
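
To make point 6 concrete, the retrieval pattern is simple enough to sketch without even pulling in LangChain. This is just one possible shape of it; the sentence-transformers embedder is an arbitrary choice and the documents are made up:

    import numpy as np
    from sentence_transformers import SentenceTransformer  # one possible embedder, not the only choice

    docs = [
        "Meeting notes: we agreed to ship the billing rewrite in June.",
        "Email from Alice: the conference talk got accepted.",
        "Todo: renew the passport before August.",
    ]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(query, k=2):
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    question = "when are we shipping billing?"
    context = "\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # `prompt` then goes to whatever LLM you have, local or hosted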


1. You're working backwards from a desire to buy more RAM to try and find uses for it. You don't actually need more RAM to use LLMs, ChatGPT requires no local memory, is instant and is available for free today.

2. Why would anybody be satisfied with a 64GB model when GPT-4 or 5 or 6 might even be using 1TB of RAM?

3. That may not be the case. With every day that passes, it becomes more and more clear that large LLMs are not that easy to build. Even Google has failed to make something competitive with OpenAI. It's possible that OpenAI is in fact the new Google, that they have been able to establish permanent competitive advantage, and there will no more be free commodity LLMs than there are free commodity search engines.

Don't get me wrong, I would love there to be high quality local LLMs. I have at least two use cases that can't be done well (or at all) with the OpenAI API, and being able to run LLaMA locally would fix that problem. But I just don't see that being a common case, and at any rate I would need server hardware to do it properly, not a Mac laptop.


> 1. You're working backwards from a desire to buy more RAM to try and find uses for it.

I'm really not

I had no desire at all until a couple of weeks ago. Even now not so much since it wouldn't be very useful to me

But the current LLM business model where there are a small number of API providers, and anything built using this new tech is forced into a subscription model... I don't see it as sustainable, and I think the buzz around llama.cpp is a taste of that

I'm saying imagine a future where it is painless to run a ChatGPT-class LLM on your laptop (sounded crazy a year ago, to me now looks inevitable within few years), then have a look at the kind of things that can be done today with Langchain... then extrapolate


It sounds like we are in a similar position. I had no desire to get a 64gb laptop from apple until all the interesting things from running llama locally came out. I wasn't even aware of the specific benefit of that uniform memory model on the mac. Now I'm looking at do I want to do 64, 96 or 128gb. For an insane amount of money, 5k for that top end one.


The unified memory ought to be great for running LLaMA on the GPU on these Macbooks (since it can't run on the Neural Engine currently)

The point of llama.cpp is most people don't have a GPU with enough RAM, Apple unified memory ought to solve that

Some people have it working apparently:

https://github.com/remixer-dec/llama-mps


Thank you, that's exactly what I was looking for, specific info on perf.


I think the GPU performance for inference is probably limited currently by immaturity of PyTorch MPS (Metal) backend

before I found the repo above I had a naive attempt to get llama running with mps and it didn't "just work" - bunch of ops not supported etc
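
For anyone else attempting it, the usual dance is roughly this (a sketch assuming a recent PyTorch; whether a given model works depends on which ops the MPS backend has implemented, and the fallback env var papers over the gaps at a perf cost):

    import os
    os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # unsupported ops fall back to CPU

    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    # Stand-in for a real model; fp16 keeps memory use closer to what llama needs
    model = torch.nn.Linear(4096, 4096).half().to(device)
    x = torch.randn(1, 4096, dtype=torch.float16, device=device)

    with torch.no_grad():
        y = model(x)
    print(device, y.shape)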


I think llama.cpp will die soon because the only models you can run with it are derivatives of a model that Facebook never intended to be publicly released, which means all serious usage of it is in a legal limbo at best and just illegal at worst. Even if you get a model that's clean and donated to the world, the quality is still not going to be competitive with the hosted models.

And yes I've played with it. It was/is exciting. I can see use cases for it. However none are achievable because the models are (a) not good enough and (b) too legally risky to use.


(A) is very use case dependent. Even with some of the bad smaller models now, I can see devs making use of them to enhance their apps (e.g. local search, summaries, sentiment, translations)

(B) llama.cpp supports gpt4all, which states that it's working on fixing your concern. This is from their README:

Roadmap Short Term

- Train a GPT4All model based on GPTJ to alleviate llama distribution issues.


> is instant and is available for free today.

It's free for the user up to a point, but it costs OpenAI a lot of money.

Apple is a hardware vendor, so commoditization of the software while finding more market segments is definitely something that'd benefit them.

OTOH, if they let OpenAI become the unrivaled leader of AI that ends up being the next Google, they end up losing on a topic they have wanted to lead for a long time (Apple has invested quite a lot in AI, and the existence of a Neural Engine in Apple CPUs isn't an accident)


"A lot of money" is a lot less money per user than to buy 64GB RAM to run an inferior model locally + energy and opportunity costs. The OpenAI APIs are super cheap for a single user needs. I expect them to be at least close to breaking even with their APIs pricing.


> "A lot of money" is a lot less money per user than to buy 64GB RAM

if OpenAI isn't able to get a couple hundred bucks over the typical lifetime of a computer, it means the added value they provide is very low (several times less than Spotify or Netflix, for instance), meaning they'll never be “the next Google”.

And if they are, it means it makes sense to buy it once instead of paying several times the price through a subscription.

> The OpenAI APIs are super cheap for a single user needs. I expect them to be at least close to breaking even with their APIs pricing.

“Close to breaking even” means the price you pay is VC-subsidized; the expected gross margin for this kind of tech company is more than 50%. Expect to pay a lot more if/when the market is captive. And this will scale linearly with your use of the technology.

> energy and opportunity costs

What opportunity cost?


> Expect to pay a lot more if/when the market is captive.

Yes, this is a possibility, but then again cloud computing became a commodity.

But I see why people would pay to have their own private and unfiltered models/embeddings.

> if OpenAI isn't able to get a couple hundred bucks over the typical lifetime of a computer, it means the added value they provide is very low (several times less than Spotify or Netflix, for instance), meaning they'll never be “the next Google”.

They don't have to worry about this today.

> What opportunity cost?

You could utilize the money and the time spent to do other things.


I think it’s quite likely that the RAM onboard these devices expands pretty massively, pretty quickly as a direct result of LLMs.

Google had already done some very convincing demos in the last few years well before ChatGPT and GPT-4 captured the popular imagination. Microsoft’s OpenAI deal I would assume will lead to a “Cortana 2.0” (obviously rebranded, probably “Bing for Windows”, “Windows Copilot” or something similar). Google Assistant has been far ahead of Siri for many years longer than that, and they have extensive experience with LLMs. Apple surely realises the position their platforms are in and the risk of being left behind.

I’m also not sure the barrier on iPhone is as great as you suggest - it’s obviously constrained in terms of what it can support now but if the RAM on the device doubles a few times over the next few years I can see this being less of an issue. Multiple models (like the Alpaca sets) could be used for devices with different RAM/performance profiles and this could be sold as another metric to upgrade (i.e. iPhone 16 runs Siri-2.0-7b while iPhone 17 runs Siri-2.0-30b - “More than 3x smarter than iPhone 16. The smartest iPhone we’ve ever made.” etc).


How much does 64GB of RAM cost, anyway? Retail it's like $200, and I'm sure it's cheaper in terms of Apple cost. Yet we treat it as an absurd luxury because Apple makes you buy the top-end 16" Macbook and pay an extra $800 beyond that. Maybe in the future they'll treat RAM as a requirement and not a luxury good.


and we know that more will be cheaper in future


With the integrated RAM, CPU and GPU on Apple Silicon, however it's done, it yields real perf results. I do think that probably has a higher cost than separately produced RAM. And even apart from that, because they have that unified memory model, unlike every other consumer device, they can charge for it. So 64, 96 or 128GB?


It's not done for perf results; the Xbox doesn't have RAM on package and somehow does 560 GB/s


The perf results I was referring to was the ability to run an llm locally (like llama.cpp) that uses a giant amount of ram in the gpu, like 40gig. Without this uniform memory model, you end up paging endlessly, so it's actually much faster for this application in this scenario. Unlike on a pc with a graphics card, you can use your entire ram for gpu. This isn't possible on the xbox because it doesn't have uniform memory as far as I know. So having incredible throughput still won't match not having to page.

Edit - I found an example from h.n. user anentropic, pointing at https://github.com/remixer-dec/llama-mps . "The goal of this fork is to use GPU acceleration on Apple M1/M2 devices.... After the model is loaded, inference for max_gen_len=20 takes about 3 seconds on a 24-core M1 Max vs 12+ minutes on a CPU (running on a single core). "


> 4. Weights are expensive to produce and valuable IP, but hard to protect on the client unless you do a lot of work with encrypted memory.

No, it'll be a commodity

Apple wouldn't care if the weights can be extracted if you have to have a Macbook to get the sweet, futuristic, LLM-enhanced OS experience


I've been looking into buying a Mac for llm experimentation - 64, 96 or 128gb of ram? I'm trying to decide if 64gb is enough, or should I go to 96gb or even 128gb. But it's really expensive - even for an overpaid software engineer. Then there's the 1 or 2 tb storage question. Apple list price is another $400 for that second tb of storage.

For 64gb of ram, you can get an m2 pro, or get 96gb which requires the upgraded cpu on the pro. The studio does 64gb or 128gb. But the 128 requires you to spend 5k.

I can't decide between 64 or 96 on m2 pro, and 128 on the studio. Probably go for 96gb. Also what's the impact of the extra gpu cores on the various options? And there are still some "m1" 64gb pros & studios out there. What's the perf difference for m1 vs m2? This area needs serious perf benchmarking. If anyone wants to work with me, maybe I would try my hand. But I'm not spending 15k just to get 3 pieces of hardware.

List prices:

64gb/2tb m2 12cpu/30gpu 14" pro $3900

96gb/2tb m2 max 12/38 14" pro $4500

128gb/2tb m2 max 28/48 studio $5200


Check out the LLaMA memory requirements on Apple Silicon GPU here: https://github.com/remixer-dec/llama-mps


I’m pretty sure you can get a purpose-built pc tower in that range. Why would you favor a Mac over that? A lot of this stuff only has limited support for MacOS.


The unified GPU/CPU memory structure on ARM Macs is very, very helpful for running these LLMs locally.


Is there a big difference in principle between that and the "shared video memory" that has long existed on cheap x86 machines?

…or is it just that the latter had a way too weak iGPU and not enough RAM for AI purposes, whereas the bigger ARM MACs have more GPU power and enough RAM (more than most affordable discrete graphic cards) so that they are usable for some AI models?


You can't get that much VRAM on a PC for a comparable price.


Running models locally is the future for most inferencing cycles. There's a lot in your numbered list trying to dissuade people that could be more accurate.

> 64GB probably isn't enough for competitive LLMs anyway

I am trying to be charitable, but this is pretty much not true. And the hedging in your statement only telegraphs your experience.


> Even if a high end MacBook can do local inferencing, the iPhone won't and it's the iPhone that matters

Doesn't the iPhone use the local processor for stuff like the automatic image segmentation they currently do? (Hold on any person in a recent photo you have taken and iOS will segment it)


Yes but I'm not making a general argument about all AI, just LLMs. The L stands for Large after all. Smartphones are small.


>One possibility is simply cost: if your device does it, you pay for the hardware, if a cloud does it, you have to pay for that hardware again via subscription.

Yeah but in the cloud that cost is amortized among everyone else using the service. If you as a consumer buy a gpu in order to run LLMs for personal use, then the vast majority of the time it will just be sitting there depreciating.


But then again, every apple silicon user has an unused neural engine sitting around in the SoC and taking a significant amount of die space, yet people don't seem to worry too much about its depreciation.


> 7. Apple already has all that stuff worked out for Siri, which is a cloud service, not a local service, even though it'd be easier to run locally than an LLM.

iOS actually does already have an offline speech-to-text api. Some part of Siri that translates the text into intents/actions is remote. Since iOS 15, Siri will also process a limited subset of commands while offline.


Chips have a 5-7 year lead time. Apple has been shipping neural chips for years while everyone else is still designing their v1.

Apple is ahead of the game for a change getting their chips in line as the software exits alpha and goes mainstream.


But they haven't exposed them to use. They are missing a tremendous opportunity. They have that unique unified memory model on the m1/m2 arms so they have something no other consumer devices have. If they exposed their neural chips they'd solidify their lead. They could sell a lot more hardware.


They are though. Apple released a library to use Apple Silicon for training via PyTorch recently, and has libraries to leverage the NE in CoreML.


> Apple Silicon for training via PyTorch recently

This is just allowing PyTorch to make use of the Apple GPU, assuming the models you want to train aren't written with hard-coded CUDA calls (I've seen many that are like that, since for a long time that was the only game in town)

PyTorch can't use the Neural Engine at all currently

AFAIK Neural Engine is only usable for inference, and only via CoreML (coremltools in Python)
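
The CoreML route looks roughly like this: trace the PyTorch model, convert it with coremltools, and ask for ALL compute units so the scheduler is at least allowed to place layers on the Neural Engine (whether it actually does is up to the black-box compiler mentioned upthread). A sketch with a toy network, not a real LLM:

    import torch
    import coremltools as ct

    # Tiny stand-in network; a real LLM needs a lot more massaging (see the ANE transformers repos)
    model = torch.nn.Sequential(
        torch.nn.Linear(768, 768),
        torch.nn.GELU(),
        torch.nn.Linear(768, 2),
    ).eval()

    example = torch.randn(1, 768)
    traced = torch.jit.trace(model, example)

    mlmodel = ct.convert(
        traced,
        convert_to="mlprogram",
        inputs=[ct.TensorType(name="x", shape=example.shape)],
        compute_units=ct.ComputeUnit.ALL,  # allow CPU + GPU + Neural Engine; placement is up to CoreML
    )
    mlmodel.save("tiny.mlpackage")
    print(mlmodel.predict({"x": example.numpy()}))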


Thank you! I wasn't aware of that. Let me research that. May 2022 announcement. Is this suitable for apps like llama.cpp, given it's a Python library? It appears to be a library, but they didn't document how to use the underlying hardware - I welcome more info.


iPhones have similar Neural Engine capabilities, obviously far more limited but still quite powerful. You can run some pretty cool DNNs for image generation using e.g. Draw Things app quite quickly: https://apps.apple.com/us/app/draw-things-ai-generation/id64...


1. Quadruple it.

2. see above

Should be cheap, or why else are Samsung, Micron and Kioxia whining about losses?

Maybe go for something like Optane memory while doing so.


Optane is sadly no longer being manufactured.


I know. That's why I wrote something like ;-)


> "If you squint a bit and look into the near future it's not so hard to imagine a future Mx chip with a more capable Neural Engine and yet more RAM, and able to run the largest GPT3 class models locally. (Ideally with better developer tools so other compilers can target the NE)"

Very doubtful unless the user wants to carry around another kilogram worth of batteries to power it. The hefty processing required by these models doesn't come for free (energy wise) and Moore's Law is dead as a nail.


Most of the time I have my laptop plugged in and sit at a desk...

But anyway, there are two trends:

- processors do more with less power

- LLMs get larger, but also smaller and more efficient (via quantizing, pruning)

Once upon a time it was prohibitively expensive to decode compressed video on the fly, later CPUs (both Intel [1] and Apple [2]) added dedicated decoding hardware. Now watching hours of YouTube or Netflix are part of standard battery life benchmarks

[1] https://www.intel.com/content/www/us/en/developer/articles/t...

[2] https://www.servethehome.com/apple-ignites-the-industry-with...


My latest mac seems to have about a kilogram of extra battery already compared to the previous model.


Apple’s move to make stable diffusion run well on the iPhone makes me think they’re watching this space, just waiting for the right open model for them to commit to.


I wonder how good the neural engine with the unified memory is compared to say intel cpu with 32gb ram. Could anyone give some insight?


There seems to be a limit to the size of model you can load before CoreML decides it has to run on CPU instead (see the second link in my previous comment)

If it could use the full 'unified' memory that would be a big step towards getting these models running on it

I'm unsure how the performance compares to a beefy Intel CPU, but there's some numbers here [1] for running a variant of the small distilbert-base model on the Neural Engine... it's ~10x faster than running on the M1 CPU

[1] https://github.com/anentropic/experiments-coreml-ane-distilb...
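
Numbers like that ~10x figure come from loading the same converted model with different compute-unit settings and timing predictions, roughly like this (a sketch; "model.mlpackage" and the input name/shape are placeholders for whatever you converted):

    import time
    import numpy as np
    import coremltools as ct

    def avg_latency(compute_units, runs=50):
        m = ct.models.MLModel("model.mlpackage", compute_units=compute_units)
        x = {"x": np.random.rand(1, 768).astype(np.float32)}  # match your model's input name/shape
        m.predict(x)  # first call includes compilation, so warm up
        start = time.perf_counter()
        for _ in range(runs):
            m.predict(x)
        return (time.perf_counter() - start) / runs

    print("CPU only   :", avg_latency(ct.ComputeUnit.CPU_ONLY))
    print("CPU+GPU+ANE:", avg_latency(ct.ComputeUnit.ALL))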


Siri was launched with a server-based approach. It wouldn't be surprising if Apple's near-term LLM strategy were to put a small LLM on local chips/macOS and a large model in the cloud. The local model would only do basic fast operations while the cloud could provide the heavyweight intensive analysis/generation.
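
If that hybrid setup happens, the routing itself is the easy part. A purely hypothetical sketch, where both backends and the "complexity" heuristic are made up for illustration:

    def run_local(prompt):   # stand-in for a small on-device model
        return "[local] " + prompt

    def call_cloud(prompt):  # stand-in for a hosted large model
        return "[cloud] " + prompt

    def answer(prompt, local_word_budget=64):
        # Crude proxy for "basic fast operation": short request, no explicit ask for deep analysis
        simple = len(prompt.split()) <= local_word_budget and "analyze" not in prompt.lower()
        return run_local(prompt) if simple else call_cloud(prompt)

    print(answer("set a timer for ten minutes"))
    print(answer("analyze this contract and summarize the key risks: ..."))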


I can see how the apple silicon memory system can help with LLMs, but a couple points of reality check:

- such amounts of memory are locked behind very expensive SKUs which most of the Mac userbase will not buy (<5% of new purchases, to be very conservative)

- not too long ago Apple would restrict the amount of RAM in their systems for their own reasons (source: https://9to5mac.com/2016/10/28/apple-macbook-pro-16gb-ram-li...)

- just like mid-2010s GPUs with 6-8GB of VRAM but little to benefit from it, I don't see the ML accelerators/GPU in current models being capable enough to make the most of the memory available to them.


That's today... think of the future

My first computer had 512KB RAM and 20MB was an expensive hard drive.

64GB Macbooks are currently an expensive 'Pro' novelty, they will be the vanilla of tomorrow

> i don't see the ml accelerators/gpu in current models being capable enough to make the most of the memory available to it

that's exactly my point (and apparently today's Neural Engine can't even take advantage of all the unified memory available)

until LLaMA there was no reason to have more than this, they probably imagined it would just run a bit of face-detection and speech-to-text on the side

but if they got serious and beefed it up it could be the next wave of computing IMHO


Update: https://www.bjnortier.com/2023/04/13/Hello-Transcribe-v2.2.h...

The iPhone 14 runs the Whisper model faster than an M1 Max, because it has a newer Neural Engine

I look forward to the M3 Macbook launch eagerly, while expecting mild disappointment


Can we re-invent SETI with such LLMs/new GPU folding/whatever hardware and re-pipe the seti data through a Big Ass Neural whatever you want to call it and see if we have any new datapoints to look into?

What about other older 'questions' we can point an AI lens at?



You need state of the art consumer tech to run a model comparable to GPT-3 locally at a glacial pace.

Or, you can use a superior GPT 3.5 for free.


"Dario Amodei, the former VP of research at OpenAI, launched Anthropic in 2021 as a public benefit corporation, taking with him a number of OpenAI employees, including OpenAI’s former policy lead Jack Clark. Amodei split from OpenAI after a disagreement over the company’s direction, namely the startup’s increasingly commercial focus."

So Anthropic is the Google-supported equivalent of OpenAI? Isn't the founder going to run into the same issues as before (commercialization at OpenAI)? How does Google not use Anthropic as either something commercial or nice marketing material for its AI offerings?


There may have been a disagreement, but now they're focused on profit, like everybody else. From the same article:

“Anthropic has been heavily focused on research for the first year and a half of its existence, but we have been convinced of the necessity of commercialization, which we fully committed to in September [2022],” the pitch deck reads. “We’ve developed a strategy for go-to-market and initial product specialization that fits with our core expertise, brand and where we see adoption occurring over the next 12 months.”


This is hilarious to me, in that the disgruntled departure just did a 180… how long until the next disgruntled spin-out for higher reasons chases the dollar too…


> This is hilarious to me, in that the disgruntled departure just did a 180… how long until the next disgruntled spin-out for higher reasons chases the dollar too…

The cynic in me wants to ask "What makes you think his departure was because of an anti-commercialisation position?"

My take (probably just as wrong as everybody else's take) is that he saw the huge commercialisation potential and realised that he could make even more money by having a larger stake, which he got when he started his own venture.


It’s pretty clear: the words say he was anti, and now the company he helped create apparently has marketing material all about commercialization. Unless he leaves tomorrow for the same reasons, it is quite hard to disbelieve that “cash rules everything around me”.


That does seem more likely. Let's hope his VP of research does the same thing to him (-:


If you read the parent comment in this thread you'd get an answer...


> If you read the parent comment in this thread you'd get an answer...

I looked and I didn't get an answer. hence my comment.

To clarify, we know what he said his reason was, we don't know if that really was his reason.

When people leave they very rarely voice the actual reason for leaving; the reason they give is designed to make them look as good as possible for any future employer or venture.


Everybody in the chip business was a spin-off from Fairchild. This is pretty common when a huge, new tech comes along.


To be fair I think he had the same realization that they had at OpenAI. Sam Altman has gone on the record saying it's basically impossible to raise significant amounts of money as a pure nonprofit and you aren't going to train cutting edge foundation models without a lot of cash. Anthropic is saying they literally need to spend $1B over 18 months to train their next Claude version.


Also, to chase those dollars while being on the leash of Google’s massive investment.

So they lost the plot on the altruistic mission within months of setting up shop, and now are just a pawn in a bigger game between other companies.


The same thing happened back in the processor arms race days and before that in the IC days. Ex-Fairchild engineers created a lot of the most durable IC and chip companies out there. Intel's founders were ex-Fairchild.


It's not about making money, it's about opening up the tech to the public (including source, weights, etc.)


It sure looks like it's about the money to me.


>So Anthropic is the Google-supported equivalent of OpenAI? Isn't the founder going to run into the same issues as before (commercialization at OpenAI)? How does Google not use Anthropic as either something commercial or nice marketing material for its AI offerings?

I think the unstated shift that has happened in the past few years is that we've gone from researchers thinking about Fourier transforms to efficiently encode positional data into vectors to researchers thinking about how to train a model with a 100k+ token batch size on a super-computer-like cluster of GPUs.

I can totally see why people believed the math could be done in a non-profit way, I do not see how the systems engineering could be.


More like FTX-supported. They got half a billion in investment from them according to an earlier blog post by Anthropic.


I believe I read somewhere that that investment may have to be returned.


The article says the shares are expected to be sold as part of the FTX bankruptcy process.


Hence the race to an AI smart enough to figure out a way to keep the money.


What does a policy lead do and how are they relevant to an early stage startup? I would be more interested in seeing which researchers and engineers join.


I assume it's basically this position: https://thriveml.com/jobs/product-policy-lead-e296c565

> As the Product Policy Lead, you will set the foundation for Anthropic’s approach to safe deployments. You will develop the policies that govern the use of our systems, oversee the technical approaches to identifying current and future risks, and build the organizational capacity to mitigate product safety risks at-scale. You will work collaboratively with our Product, Societal Impacts, Policy, Legal, and leadership teams to develop policies and processes that protect Anthropic and our partners.

> You’re a great fit for the role if you’ve served in leadership positions in the fields of Trust & Safety, product policy, or risk management at fast-growing technology companies, and you recognize that emerging technology such as generative AI systems will require creative approaches to mitigating complex threats.

> Please note that in this role you may encounter sensitive material and subject matter, including policy issues that may be offensive or upsetting.


Jack is pretty well known in the community since he runs not only the Import AI newsletter, but also has been a partner in the AI Index report. He also has a media background so is generally well connected even beyond his influential reach. Also, though not relevant to your question, he's a really nice guy :)


Curiously, Anthropic.com was launched in 2021, but a small custom software shop in Arizona around since the mid-late 90s had registered and been using Anthropic.ai in 2020 for a couple projects.

How does that name collision work?


“Hi we’re here to save humanity, and we’re stealing your name! We have a ton of lawyers, buckets of cash from Google to hire more lawyers, and if you don’t like it, you’re fucked. Now please enjoy being saved by us.”


Maybe they bought the domain name for a mutually agreeable price?


It's the trademark that matters, I thought (possibly naively), since anthropic.ai was registered in 2020 for a product built in 2019, it seems, and the Anthropic spin-off from OpenAI was formed in 2021 and seems to have purchased a squatted anthropic.com domain name then.

Kind of unsure how it all works.


Well you can also buy a trademark, right? Though I think different companies are allowed the same trademarks if the things being trademarked aren’t confusable


"public benefit".. Ah, they are so good. May be they will _open_ something after all.. ;)


yeah, LOL on that. Their idea of "public benefit" is of course that they benefit publicly, though for marketing purposes "public benefit" sounds nicer (just like the "Open" in "ClosedAI") because people would tend to emotionally associate something nicer with it.

Reminds me of the (possibly LLM-generated) marketing tirade of a voice faking text-to-speech service recently here on hn, which ended with: "We are thrilled to be sharing our new model, and look forward to feedback!":

https://news.ycombinator.com/item?id=35328698

… "share" yeah right… like: where can I download the model then? Of course they didn't mean to actually share their model but only to rent out remote access to it, but that doesn't sound as nice as "share".


If someone released a chatGPT/characterAI with NSFW content enabled it would eat into a big share of their users (and for characterAI, maybe take all of them). Seriously, look into what people are posting about when it comes to characterAI, and it's 80% "here's how to get around NSFW filters".

Unsure why nobody is filling this very, very obvious hole in AI tech.


The main reason why companies don't allow NSFW content is because of puritan payment processors that see that stuff and then go absolute sicko mode and lock people out of the traditional finance system.


It is amazing that in the year 2023, where things are possible that were science fiction until recently, we still rely on private payment processors, credit card companies, which extract fees for a service that doesn't have any technical necessity anymore. I think the reason is just inertia. They work well enough in most cases, and the fees aren't so high as to be painful, so there is little pressure to switch to something more modern.


> I think the reason is just inertia.

It is not just inertia; it is government malice. The government loves that there are effectively only two payment processors, because this lets them exercise policy pressure without the inconvenience of a democratic mandate.


Yes, financial companies mostly regulate themselves. They have lawyers telling them what regulators are likely to approve of, and make rules based on that for themselves and their customers. If they do something sufficiently bad, regulators go after them. That’s how banks get regulated, too.

That’s how most law works, actually. There’s a question of how detailed the regulations are, but mostly you don’t go to court, and if you do, whether it looks bad to the judge or jury is going to make a difference.

I’m wondering what you’re expecting from democracy? More oversight from a bitterly divided and dysfunctional Congress? People voting on financial propositions?


If entire classes of financial transactions can be blocked through backroom conversations between financial companies and regulators, don't you think that's bad for democracy? We have laws which allow the US to tackle money laundering issues and it's understandable that regulators would create regulations along those laws; they have a clear mandate to. It's not clear to me that other classes of transaction should be blocked based on amorphous dealings with regulators and companies.


Part of the issue is that it's usually American regulators setting these rules, but they're applied globally due to the more-or-less duopoly of credit card companies.


>we still rely on private payment processors, credit card companies, which extract fees for a service that doesn't have any technical necessity anymore

The technical necessity is there; for your chase-backed visa card to pull money from chase and deposit it into your shop's citibank, there needs to be some infrastructure. Whether a private company or the government provides this infrastructure is another story.

(Although if the government provided it, you could argue that there would likely be even more political headaches over what goes across the wire).


Do you have an estimate of the cost of the infrastructure required vs. how much credit card companies charge today?


In Q4 2022, Visa had a revenue of 7.94B, net income of 4.18B, and net profit margin of 52.66%.


Bank transfers are a thing. They don't require an intermediary credit card company. The problem is that currently such a transfer is usually slow because of software/protocol issues.


I think companies accept credit card payments because that’s what their customers want and companies want to get paid.


Yes, the current system is a good-enough solution, and any better alternative has to be not just better but so much better that it is worth the large cost of switching to a different solution. Game theoretically, it's an "inadequate equilibrium".


And conversely: For online payments, credit card payments are my least preferred method. But I still use them quite often, because everyone accepts them.


> switch to something more modern

such as?


I would be very surprised if something based on Blockchain or similar software doesn't offer a solution here. Another route would be to establish a protocol for near instantaneous bank transfers, and try to get a lot of banks on board. The immediacy of transfers seems to be the main reason why companies use credit card services, not buyer protection or actual credit.


I have no particular love of legacy systems (whether they be banks, imperialism, or Electoral Colleges), but what about your comment is plausible given the widespread recognition that blockchain technologies have been oversold, leading to widespread fraud and failure?

Maybe I’m missing something, but the above comment reminds me of the blockchain naïveté of 10 years ago. I don’t mean to be dismissive; I’m suspending disbelief long enough to ask what you mean in detail.


This is possible, and I don't have any deeper knowledge of cryptocurrencies / Blockchain. But payment systems don't seem to have a necessary connection to speculation and the high volatility which comes with holding a cryptocurrency. Maybe I overestimate the amount of problems those payment systems can solve.


There are permissioned "blockchains" which are just private ledgers, that banks could use with permission from the US gov't. These can be anything from a centrally run DB with ACLs or something like Hyperledger with permissioned P2P access. Whether you call it a blockchain or a DB with ACLs is immaterial; it's still much cheaper, faster, and pleasant to use this system over the current system of complex intermediaries in the US. Europe seems to have solved this problem with SWIFT.


There is a system called Faster Payments in the UK, which is "near instantaneous" between the UK banks which participate (most of them offering current accounts as far as i know).

But it is a permanent and final transfer, no easy charge backs like with a credit card, or fraud protection from debit cards.

You have to know which account you are paying into (sort code and account number), which is the main part of what Visa/Mastercard do. They are the layer in front of the bank account which means customers don't have to send money directly to an account.

I suppose now everyone has a smart phone it would be easier to hook up something like Faster Payments in a user friendly way with an app and a QR code/NFC reader that the merchant has. But Visa/Mastercard are entrenched obviously.


> The immediacy of transfers seems to be the main reason why companies use credit card services, not buyer protection or actual credit.

I think the above is quite wrong (with moderate confidence). Is the above claim consistent with survey data? It is my understanding that:

1. companies care a lot about risk reduction. This includes protection from chargebacks.

2. companies benefit when customers have credit: it enables more spending and can smooth out ups and downs in individual purchasing ability

3. Yes, quick transfers matter, but not in isolation from the above two.


Well, chargebacks are not possible for ordinary bank transfers. The problem is that they are too slow and not convenient enough. This is a software / standardization issue. Credit: PayPal is successful despite it only offering very short credits in order to ensure quick transfers. And in physical shops credit cards often seem to be no more than a convenient way to pay without cash. In Germany you can actually pay in all shops and restaurants with a form of debit card, which is just as convenient as paying with credit cards, but has lower fees for the shop, since there is no credit card company in the middle. As a result most people don't own a credit card. This doesn't work so well online though.


> I would be very surprised if something based on Blockchain or similar software doesn't offer a solution here.

There is, it's a layer-2 on Ethereum called zkSync. It's not totally satisfactory (the company that makes it can steal your money, centralized sequencer, etc), but it's pretty mature and works quite well. To replace Visa you want high throughput and low latency and zk-rollups like zkSync can provide both. (There are other options too, like Starknet, but AFAIK zkSync is the most mature.)


How long does it take before someone says "Blockchain"

Still faith in magic on HN


PayPal?


No, PayPal is basically a credit card company. It is an intermediary which gives short credits in order to achieve near instantaneous payments. And it extracts fees along the way.


Crypto obviously solved this. If you remove the speculation idiocy that surrounds it, yes crypto does work as an anti-censorship currency.

Someone is going to mention flashbots or something. “See this specific example proves…”


The main selling point to online shops would have to be a substantial reduction in fees compared to credit cards / PayPal. Most shops don't care about censorship since they wouldn't be affected anyway.


Yeah I don’t see that happening anymore. Crypto is always going to have fees and off-ramps, although I do think it’s helped create competition in the transfer space.

The real way crypto will work or not work is programmable money. If that works that will be huge, if it doesn’t then maybe someone will pick it back up 50 years from now.


The problem in this specific space is that many people don't want their payments to a NSFW company to be associated with them. Most blockchains makes this trivially traceable by design.

The ones that don't (eg Tornado cash) end up being used for money laundering so on/off ramps won't touch them. We'll see what happens with the ZK-based chains, but this seems a systematic problem that is difficult to fix.


Monero?



Wow, this seems to be just what I meant. Unfortunately it appears it is so far only widely supported by Indian banks. (In Germany there is a similar system, called GiroPay, but it hasn't really caught on yet. And it isn't even intended as an international solution.)


I think it's equally likely that they just don't want their product to be known as "the porn bot".


Why not? As long as it's not official. Bing was/is known as "the porn search engine" which never seemed to bother Microsoft.


I think the difference is that OpenAI wants to sell their text generation services to big companies that will show AI content directly on their platforms (think chat support bots), whereas Bing is selling eyeballs to advertisers (who also don't want their ads shown alongside porn by the by).

If OpenAI has the reputation of serving up porn to whoever asks, there's no way the Walmarts of the world will sign up.


It's also because the companies are backed by VCs. VCs get their money from limited partners like pension funds who don't want their money invested in porn.


I don’t buy this, if this was the reason then paid porn couldn’t exist, and we know that’s not the case.


It's because NSFW content has higher risks of chargeback and fraud (there's a reason their payment processors charge 20%+). Besides, companies don't want to be on the bad side of outrage; it only takes one mistake of processing a payment for child pornography and your name will be plastered everywhere as a child porn enabler.

Do you really think the execs at Visa and Mastercard are puritans and not profiteering capitalists that will process payments for NSFW content if they were able to?


Nothing to do with outrage.

Everything to do with one politician essentially getting their way by targeting a payment processor with legal shit concerning potential enablement of CP/CT. Nobody wants that kind of attention.


The whole US society seems more puritan while more capitalist at the same time, seen from this side of the pond. It’s a paradox I can’t really explain, any clues?


US society isn’t some anti-sex dystopia. It's average compared to the rest of the world; it’s just Europe that is super pro-nudity etc. and projects that onto everyone else. Like everything else, they think they are objectively right in their beliefs and systems.


Not allowing sex apps on AppStores and banks and credit cards refusing to process sex-related transactions seems pretty anti-sex to me.

Also, getting all bent out of shape at the image of a nipple, breast or pubic hair while not batting an eye at a person dying in evening TV movies seems a bit unbalanced.


> US society isn’t some anti-sex dystopia

Not a dystopia, but certainly US society has, shall we say, a very strange and complicated relationship with sex and nudity.


https://en.wikipedia.org/wiki/Protestant_work_ethic

Besides that: 'There is no such thing as society!'



Perhaps it’s as easy as “ethics and laws are not the same thing”. One can profit either way, but unethical profiteering may not be prevented by a law.


You're conflating capitalism and greed. Plenty of greedy people in non-capitalist systems.


> Plenty of greedy people in non-capitalist systems.

Totally agreed. But I am not placing any moral value on either greed or capitalism. I would think, however, that capitalists would not ignore such an obvious profit center as the sex industry. Thus my bafflement.


What you're missing is that by choosing this obvious profit center they risk a much larger profit center because of the backlash. It's not a moral thing, it's a calculated choice. That's why whoever takes this risk also charges a much higher fee to make up for the opportunity cost in other areas.


> But I am not placing any moral value on either greed or capitalism

That is a missed opportunity

* Capitalism: A system where who owns resources matters more than who needs them is a morally bankrupt system. A system where starvation and homelessness are acceptable outcomes

* Greed: Greed is bad for everybody. It concentrates scarce resources where they are not needed; that too is moral bankruptcy


Funny enough my country was starving under communism but we are living in plenty under capitalism. Since I lived under the alternative and I have seen its evilness, I will take capitalism any day - the very system that allowed and incentivized us to create those resources you are eyeing in the first place.

As for greed, I have yet to meet a person more greedy than the ones claiming to know where to direct those scarce resources they did not create, if only we’d give them the power to do so. Such high morals too, unlike those "morally bankrupt" capitalists who greedily built businesses, jobs, countless goods and services to only enslave us and enrich themselves, obviously.


I'm glad you chimed in with this. This is the point: capitalism knows self-interest exists, and creates a system to harness it. Communism and similar pretend greed doesn't exist, and creates overly powerful central bodies to make everything fair.



> I would think, however, that capitalists would not ignore such an obvious profit center as the sex industry

Because you're conflating capitalism and greed. Capitalism doesn't mean "do anything for money". It means "as much as possible, people get to decide among themselves how to allocate their money and time". Some of them will invest in anything, just as people in non-capitalist countries. Most will only invest in certain things.


But look at how investment in weed, which was once considered "drugs == bad", flourished after legalization, with ETFs and such. Lots of sex work, including porn, is legal afaik. However banks and other civilian gatekeepers (Apple App Store, etc.) keep stifling investment in it.


I'm sorry, I don't see how that relates to what I was saying.


> Capitalism doesn't mean "do anything for money".

In the abstract, perhaps not. The way it exists in the US, though, it means exactly that.


This very thread is exactly about how, in US, it doesn’t.


> Do you really think the execs at Visa and Mastercard are puritans and not profiteering capitalists that will process payments for NSFW content if they were able to?

Pornhub was blocked by Visa and Mastercard after an op-ed in NYT generated a lot of outrage


> Do you really think the execs at Visa and Mastercard are puritans and not profiteering capitalists...?

Yes, "and"


This comment right here can be shown to snobs who still denounce crypto btw


This is not an argument for crypto, it's an argument for better regulations so that processors don't make up their own rules.


> This is not an argument for crypto, it's an argument for better regulations so that processors don't make up their own rules.

Better (which I assume is your euphemism for "more") regulation isn't necessarily the answer, or even particularly the answer. Do you want to force payment processors to do work they don't want to do? Isn't there a word for that?


Not necessarily more. Better in this context means clearer and enforced.

PayPal is the prime example where it's operating very similar to a bank. You have an account with a balance and can send and receive money, but it doesn't see itself as a bank and in many countries doesn't have a bank license. At least in part this is done to avoid the regulatory work that comes with it.

I absolutely want to force payment processors to do work they don't want to do. For example, banks in Germany are forced to provide you with a basic bank account regardless of whether they want to or not. That's because a bank account is simply a must-have to take part in modern life. If PayPal decides it doesn't want to do business with you, for whatever arbitrary reason, you are effectively locked out of a lot of online stores that only accept PayPal as a payment method. There are plenty of examples of PayPal's really sketchy behaviour online. Every few months you can even see complaints on HN about it.


> it's operating very similar to a bank

We might be talking at cross purposes; I'm not sure! How is it like a bank?


PayPal offers you a virtual account that you can pay money into. You can use that money to make purchases online, send and receive money from friends or other businesses. In effect, it acts like a bank account. However, it's not an actual bank account. In Europe, any money you put into that account is also not insured by the government, like a normal account would be.

If I pay with a credit card, there are processes in place to deal with fraud and charge backs. PayPal is well known to automatically close accounts with little recourse to access the money on those accounts.

They should absolutely be regulated.


I agree they should be regulated.

But they are nothing like a bank.

The defining feature of a bank is credit creation: lending more money than they hold.

Unless I missed some news, PayPal does not do that.


This is what I was wondering - to my understanding the main reason banks need to be regulated is to stop them over-lending.


> Do you want to force payment processors to do work they don't want to do? Isn't there a word for that?

Public utility. That’s what payment processors are at this point, and they should be regulated as such.


If we think there's no more innovation to be had then this could happen, but I'm not sure that's the case.


Authoritarian solutions are very attractive today.


Reasonably regulating payment processors is far from authoritarian.

If you are on a scale like Visa and MasterCard you're not just any private company anymore. Just those 2 companies control well over 75% of the US market alone. Not having access to a debit/credit card today will effectively block you from taking part in many aspects of modern life. It's absolutely reasonable to place stipulations on what they can and cannot do.


I don't disagree with your objective, it's the path you are taking to get there. Legislating obedience is authoritarian, and it is a solution that many people love due to its simplicity.

Regulators love working with large businesses like your card duopoly, I don't think you will see much improvement.


In what sense do they control the market?


Well you can wait a lifetime or you can take control away from them with a couple clicks. The choice is obvious.


As a rule of thumb, whenever anyone says "the choice is obvious", the choice they're talking about is usually far from obvious.


crypto + NSFW generative AI = ????

that's not going to lead to a whole lot of black market images.


It certainly stretches the bounds of reason for me that you could put a person in an isolation chamber with a powerful computer with no network connection, and after they type a few words into it, if the output of the computer has certain qualities, they are now a felon and the output is illegal to possess.

But this seems like the world the “AI-regulators” seem to want.


You don't think it would be problematic for someone to create deepfake images of someone's kids in explicit sexual positions?

I certainly think that if the parents found out about it and the law wouldn't do anything about it, the parents would take the law into their own hands.

I'm sorry if this wasn't phrased very well. I just didn't know how else to make my point without being very specific.


A skilled artist can already easily do that and there's no law against it that I know of. (Granted, I haven't researched it because I'm neither an artist nor a pervert...)

Now, if they were drawn to resemble specific people and the producer of the "artwork" used them to harass those people, that's harassment. If they used them to groom other kids, that's an existing crime too. But my point was that the production of gross art in isolation, or the possession of it, didn't need to be criminalized. (Actual photographs of the same were criminalized because of the pretty decent assumption that minors were coerced, harmed, exploited. Probably all of the above.)


That's already illegal - you're using someone's image and likeness in a way they did not approve of.


Taking all payment in Ethereum doesn't matter when you have to pay for servers and domain names in fiat.


Servers and domains are one of the easiest things to buy with crypto.

I actually just migrated away from Hetzner last week (for unrelated reasons) to two new providers to whom I'm paying crypto (no KYC required) based on this list: https://bitcoin-vps.com/


Would be nice if you could pay in your own token.


I'm not sure to what you're referring by "your own token", but most do offer a range of popular tokens by using one of the 3rd party payment providers like Coingate.

I paid for my servers with some Litecoin that I usually use for small purchases because of the low fees.


Kind of like a free tier for token projects. If it gets traction you would need more servers, but the token would have value, so there you go.


Lots of work on that front no doubt, and not only wrt domains


If you check /g/ on 4chan (NSFW!!!) you'll see multiple threads on LLMs and LLM-driven chatbots for such content.

Already quite advanced topic these days, all kinds of servers, locally run models, tips & tricks discussions, people sharing their prompts and "recipes", and so on.

It's a whole new world out there, but I am not sure if such a niche (albeit a potentially really big one, see pr0n sites for example) is worth all the liability issues these big AI companies might face (puritan/queasy payment processors, parental controls, NSFW content potentially blocking some enterprise access, etc, etc). But it will probably all be captured by one or two companies that will specialize in such "sexy" chatbots. Doubt it will be OpenAI and Anthropic, they have their sights on "world domination".


At least for AI image generators it is a giant liability. As of two years ago AI-generated CSAM that is indistinguishable from original photographic CSAM is considered equally criminal. If users can spawn severely illegal content at will using your product you will find yourself in a boiling cauldron 30 seconds after going live.

Stable diffusion no longer uses even adult NSFW material for the training dataset because the model is too good at extrapolating. There are very few pictures of iguanas wearing army uniforms, but it has seen lots of iguanas and lots of uniforms and is able to skillfully combine them. Unfortunately the same is true for NSFW pictures of adults and SFW pictures of children.


I realize this is a highly taboo topic, but I think there are studies which suggest that access to (traditional) pornography reduces the frequency of rape. So maybe Stable Diffusion could actually reduce the rate of abuse? (Disclaimer: I know nothing about the empirical research here, I just say the right answer isn't obvious.)

Edit: It also seems that language models are a very different topic, since they block any erotic writing outright.


Yep. No sane company wants to deal with the legal and PR nightmare of their product being used to generate realistic CSAM based on a child star and/or photos taken in public of some random person's kid.


It's trivial to fine-tune llama to be NSFW if that's what you want.

But there's an entire universe of much more interesting apps that people don't want NSFW stuff in. That's why most foundation models filter it out.


Anything involving llama is not trivial - if I can't do it on my phone through a website, then you shouldn't expect anyone else to be able to do it. If your instructions involve downloading something, or even so much as touching the command line, it makes it a non-starter for 95% of users.

Get something on the level of character.ai and then you can tell me it's "trivial".


The context of this thread is a company spending $5B with a 4-year plan to build foundation models. One could do what I suggested in between days and months of work for a single person, including building a user-friendly front end.

In the context of this thread it is trivial.


I don't think that's the reason. You wouldn't get anything "NSFW" if you don't ask/prompt for it.


The point is, though, that the market potential is huge, and it would be a way to grow fast with cash flow. As a side effect you would probably develop the best NSFW filter in the world also.


> way to grow fast with cash flow.

Until the US payment processors cut you off, then you go bankrupt.


You're not wrong, but the consumer market for chatbots is (perceived to be) tiny and I think nobody really cares about it. The real money places like OpenAI are chasing is business money.


What's with the NSFW need? I'd understand if this is some image generator, but here? Is it some sexting, "romance", or is NSFW about something else altogether?


ChatGPT refuses to write erotic fan fiction.

Related: I still remember when I used GPT-3 (davinci in the OpenAI playground) for the first time a few years ago. The examples were absolutely mind blowing, and I wanted it to generate something which would surprise me. So I tried a prompt which went something like

> Mike peeked around the corner. He couldn't believe his eyes.

GPT-3 continued with something like

> In the dimly lit room, Vanessa sat on the bed. She wore nothing but a sheer nightgown. She looked at him and

Etc. I think I laughed out loud at the time, because I probably expected ghosts or aliens more than a steamy story, though of course in retrospect it makes total sense. I wanted it to produce something surprising, and it delivered.


Fanfiction. It is a huge deal to some people. Many prefer reading stories over watching porn, and we all know how big of a market pornography is.


I wonder whether this is actually an area that many women would push for, since they usually have a much weaker interest in (visual) pornography.


Per description of a certain item in Fallout 2, "if you need to ask, you don't want to know".

UPDATE:

While fanfiction might be behind this vocal minority, there could be other uses of LLMs, for example translation.

I don't go as far as "gender-swapping", because GPT4 swaps a man on a beach wearing only beach shorts for a woman wearing only beach shorts


Anything that drives the dopamine cycle is of interest to humans. Sex in all its forms is pretty motivating.


A very large subset of the people using generative AI are people using it for porn. And the people who make those AI models do not want them being used for porn.

Porn and AI is... problematic. Do you remember deepfakes? And how people used them first and foremost to swap other people's heads onto porn actors for the purpose of blackmail and harassment? Yeah. They don't want a repeat of that. Society has very specific demands of the people who make porn - i.e. that everyone involved is a consenting adult. AI does not care about age or consent.


"the VHS of AI"


Someday it will have to happen. There is just too much demand.


With so much NSFW on the web, how is NSFW chat with a computer even a thing? Genuinely curious what drives usage there.


I think there are FTC laws around this maybe.


I don't see how a competent legal team would ever sign-off on that.


Does anyone else see the "In X years' time we'll have something Y times better than the competition has today" as a bit of a red flag? I saw this before in a product plan and it flagged up 2 things that really worried me. Firstly, the competition were already ahead of us, and they're obviously going to continue developing their stuff, so it's great to promise our thing will be better than their thing today, but we're not competing against that; we're competing against what they'll have once time has passed. And secondly, by measuring yourself against the leading edge today you're eliding how much you actually need to improve. For example, Anthropic say they'll be 10x better than today's leading models in 18 months. That sounds achievable, right? (No, actually.) But you don't even just have to do that: because you aren't starting with the market-leading model, you have to catch up first and then 10x it. So are they promising 10x, or 20x, or 100x in 18 months?


The only reality in which Anthropic will take on OpenAI (et al.) would be if someone involved possesses some sacred knowledge regarding how to build an AGI system that is radically off the path that the current market is charging down (i.e. ever-larger GPU farms and transformer-style models).

I suspect this is not the case. The same hustlers who brought you crypto scams didn't just disappear into the ether. All of that energy has to eventually go somewhere.


It's not fair to compare them to crypto scams; this isn't trying to juice retail investors for their life savings.


>The only reality in which Anthropic will take on OpenAI (et. al.), would be if someone involved possess some sacred knowledge regarding how to build an AGI system that is radically off the path that the current market is charging down (i.e. ever-larger GPU farms and transformer-style models).

Training on more tokens with more GPUs isn't exactly rocket science. I assume the RLHF loop is complex, but training the base model itself is pretty well understood.


Yes, predicting the future is difficult. Nobody knows who will really be ahead in X years. Nobody knows how much OpenAI will improve either. But they have ambitions to improve a lot and their plans are credible enough to satisfy their investors.

Not sure what else you’re expecting? All VC investments have unquantifiable risks, but it doesn’t add up to a red flag if you like their chances.


With the recent scaling law papers that have come out, you can apparently predict with pretty high accuracy how good your model will be by plotting out the scaling curves. So performance at X FLOPs and Y tokens can be reasonably well known ahead of time.
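A minimal sketch of what such a prediction looks like, assuming a Chinchilla-style parametric fit L(N, D) = E + A/N^alpha + B/D^beta; the constants below are placeholders in the spirit of the published fits, not exact values:

  # Chinchilla-style loss prediction from parameter count N and token count D.
  # Constants are illustrative placeholders, not the published fit.
  def predicted_loss(n_params, n_tokens,
                     E=1.69, A=406.0, B=411.0, alpha=0.34, beta=0.28):
      return E + A / n_params**alpha + B / n_tokens**beta

  # Compare two hypothetical training runs on paper, before buying any compute:
  print(predicted_loss(7e9, 1.0e12))   # ~7B params, ~1T tokens
  print(predicted_loss(70e9, 1.4e12))  # ~70B params, ~1.4T tokens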


100% agree. You need to catch up to OpenAI for starters and then figure out how to outpace them.


claude-instant-v1 is one of the "best kept secrets".

It is comparable in quality to gpt-3.5-turbo, while being four times faster (!) and at half the price (!).

We just released a minimal Python library, PyLLMs [1], to simplify using various LLMs (OpenAI, Anthropic, AI21...), and as part of that we designed an LLM benchmark. All open source.

[1] https://github.com/kagisearch/pyllms/tree/main#benchmarks
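A minimal usage sketch (the exact call signatures here are an approximation; see the repo for the canonical interface):

  # Assumed PyLLMs usage; treat init()/complete() and the result fields as approximate.
  import llms  # pip install pyllms

  # API keys are expected via environment variables (e.g. ANTHROPIC_API_KEY).
  model = llms.init(model='claude-instant-v1')
  result = model.complete("Explain the trade-off between model size and latency.")

  print(result.text)  # the completion itself
  print(result.meta)  # assumed to include latency/token/cost metadata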


From my evals on nat.dev I found Claude Instant to give great responses and yes, on average 3-4x faster than 3.5, but one big difference atm is that anyone can sign up and get access to gpt-3.5-turbo right now, while Claude is still gated behind an invite/wait list. (I'm still waiting for access, for example.)


Exactly!

OpenAI are the only people who are shipping product like absolute maniacs. If I can’t use your fancy system, it doesn’t exist as far as I’m concerned. There’s a mountain of theoretical work, I don’t need a press release on top of it.

The game now is no longer theory, it’s shipping code. A 4-year plan means fuck all when OpenAI is not only ahead, but still running way faster.


I have Claude on Slack. It is far worse than ChatGPT. I'm presuming this is not the "claude-instant-v1" version; it is fast though. Any idea which version of Claude is in Slack?


I didn't know about Anthropic, so I just signed up for the waitlist, thanks for the heads-up!


Could PyLLMs connect to a locally running LLM (e.g, llama variant)?


Not yet but PRs welcome!


Google invested $400M into Anthropic

https://news.ycombinator.com/item?id=34663438

Investors include:

- Eric Schmidt (former Google CEO/Chairman), Series A

- Sam Bankman-Fried, lead investor in Series B

- Caroline Ellison, Series B

https://news.ycombinator.com/item?id=34664963


If Google invests money into this and then this company uses Google Cloud for compute, does the money really flow out of Google much? Everything stays in the family, but in theory they can have an investment that works.

Also, listing Sam Bankman-Fried does not help much, especially for a company hyping itself to be 10x better than a working competitor. I mean, since they built the competitor, probably their second project can be better, but it is pie in the sky in many ways.


Would lawsuits targeting SBF and CE put those investments at risk via clawbacks? Kind of like how many of Madoff's investors who made money were forced to return it.


I don’t think it’s much like what you describe because the direction is reversed. They say they expect the debtors to sell that investment over the next few years. Potentially they will sell it at a profit.


IANAL but it's not: they could be compelled to return whatever was invested (or however much of it remains). It's pretty unlikely given that it looks like a successful Anthropic investment is perhaps FTX creditors' best chance at decent recovery.


that's less than they invested in Magic Leap :)


AI models are in a race to the bottom and everybody inside Anthropic knows it. Besides OpenAI, with billions to spend plus a partnership with MS, there's also Google, Apple, Meta and Amazon who can afford to run losses on AI for years without blinking an eye.

And if that wasn't enough the Open Source world is releasing new models almost weekly now, for free.

Anthropic is putting on a big show to convince gullible investors that there's money to be made with foundational models. There's not. I expect a big chunk of the raised money to go out the door in secondary sales and inflated compensations. Great if you're working at Anthropic. Not great for investors.


The truth is, and this applies to all companies regardless of size, that you don't have to be first, best, biggest, fastest, or most well-known in order to win market share that out-paces your investment. The AI pie is going to be very, very big. To estimate this size, let's take McKinsey's rough estimates of job displacement (~30% of ~60% of jobs, ~20% of work) and use that to estimate the actualized [US, apologies] GDP that can at some point be attributed to AI: it is in the 4-5 trillion range using today's figures.

To say a market that large will be owned by only 4-5 companies doesn't make sense. Let's take the PC market for example: there are roughly 6 companies that make up ~80% of the market, sure. However, let's look at a tiny participant compared to the total market (~65B): iBuyPower at rank #77 had sales of 40MM, or 0.06% (small, expected) of the market, with a much smaller capital investment. If we look at this percentage compared to 5T, we would be at 3B. While the 5B investment stated in the headline could result in a lower ranking and smaller share, the point stands that there is still a lot of money to be made on the long tail. Even if Anthropic fails, there will be other companies with similar infusions that succeed.
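A quick back-of-envelope check of that comparison, using the illustrative figures above (not verified market data):

  # Rough sanity check of the long-tail argument; all numbers are illustrative.
  pc_market = 65e9           # ~$65B total PC market
  small_player_sales = 40e6  # ~$40MM sales for the rank-#77 participant
  share = small_player_sales / pc_market
  print(f"share of PC market: {share:.4%}")  # ~0.06%

  ai_market = 5e12           # ~$5T of GDP hypothetically attributable to AI
  print(f"same share of the AI market: ${share * ai_market / 1e9:.1f}B")  # ~$3B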


The AI (LLM) market as a whole is very immature; trying to guess today what it will look like in a decade based on the investments/behaviour of the first couple of movers is pretty foolish. Even predicting for a specific submarket (i.e., consumer LLM products like ChatGPT) is hard enough. Who knows what other categories could develop and be dominated by companies who narrow in on them, once the R&D progress starts flatlining like it always does.


The best way to predict the future is to invent it.

- Alan Kay


It is not clear if (i) a lot of the surplus will be captured by the AI providers and (ii) that the impact will be anywhere as big as people now guess/want it to be. Making a bet on the future is fine, of course.


My question would also be what kind of insight McKinsey can provide here. What, if anything, do they know about AI that we don't know?


You don't need to just take one source. OpenAI authored their own paper [1] on the economic impacts of just LLMs: "Our findings reveal that around 80% of the U.S. workforce could have at least 10% of their work tasks affected by the introduction of LLMs, while approximately 19% of workers may see at least 50% of their tasks impacted."

Goldman Sachs Research just published their own analysis as well. [2] Their conclusions are "As tools using advances in natural language processing work their way into businesses and society, they could drive a 7% (or almost $7 trillion) increase in global GDP and lift productivity growth by 1.5 percentage points over a 10-year period." and "Analyzing databases detailing the task content of over 900 occupations, our economists estimate that roughly two-thirds of U.S. occupations are exposed to some degree of automation by AI. They further estimate that, of those occupations that are exposed, roughly a quarter to as much as half of their workload could be replaced."

[1] https://arxiv.org/pdf/2303.10130.pdf

[2] https://www.goldmansachs.com/insights/pages/generative-ai-co...


From [1]: "In our study, we employ annotators who are familiar with LLM capabilities. However, this group is not occupationally diverse, potentially leading to biased judgments regarding LLMs’ reliability and effectiveness in performing tasks within unfamiliar occupations."

From [2]: "Analyzing databases detailing the task content of over 900 occupations, our economists estimate that roughly two-thirds of U.S. occupations are exposed to some degree of automation by AI."

These are people who do not understand the jobs they are claiming AI will do. Ultimately, I think they are not doing much better than guessing.


We’ve got a lot of data scientist talent but I wouldn’t put a lot of stock in this particular estimate. If McK is gonna produce a novel insight it’s usually derived from having the input of many businesses across an industry and experience looking at their problems. It’s hard to imagine this one isn’t more or less made up due to the number of assumptions required.


Likely not much and assuredly wrong, I just wanted to ground my argument with numbers that came from people who presumably did more research than I was willing to do for an HN post.


If anything McKinsey has a lot to gain from exaggerating the numbers so more companies come to them for AI solutions or whatever their next consulting product is.


Although a large total addressable market (TAM) is very alluring, know that most markets are dominated by a few players. For example, sugary beverages (Coca-Cola), office software (Microsoft), or luxury sports cars (Ferrari). Exceptions are markets where companies cannot find a moat, such as air travel or farming. In those markets, profit margins are thin.

At this point in time, it's hard to tell whether moats will arise around large language models. Peter Thiel thinks so, or he wouldn't have invested (see his "Competition Is for Losers" presentation).

What is unlikely is that semi-good companies will thrive. Maybe for a few years but at some point the smaller players will be pushed out of the market or need to find a specific niche. Just look at cars to see this. Around 1900 there were hundreds of car brands.


These studies seem to be largely focused on job displacement. There is a reasonable likelihood that AI grows the overall economy.

I think we forget that our perspective of AI now is comparative, probably to that of a preindustrial worker worried about machines. Displacement, sure, but complete replacement seems a non-nuanced view of how it may all turn out.


Can’t find this study, have a link?

> let's take McKinseys rough estimates of job displacement (~30% of ~60% of jobs, ~20% of work)


PCs are hardware which have a minimum cost to be produced. Now do the same calculation for search engine or computing clouds.


The counter argument is that it's a growing market where any early entrants will be lifted with the tide and can probably yield enough profit from spillover hype for investors to make their investments back.


I can already run gpt4xalpaca on my PC, a model that is not-bad-at-all and is completely uncensored (i.e. does things that ChatGPT can't do). I think it's true that LLMs are racing to the bottom, and will be even more so once they can fit as a peripheral to every computer. Whoever is investing in this to monopolize has not thought it through.


It’s astonishing to me that people seem to believe the llama models are “just as good” as the large models these companies are building, and most people are only using the 7B model, because that’s all their hardware can support.

…I mean, "not-bad-at-all" depends on your context. For doing real work (i.e. not porn or spam) these tiny models suck.

Yup, even the refined ones with the “good training data”. They’re toys. Llama is a toy. The 7B model, specifically.

…and even if it weren’t, these companies can just take any open source model and host it on their APIs. You’ll notice that isn’t happening. That’s because most of the open models are orders of magnitude less useful than the closed source ones.

So, what do you want, as an investor?

To be part of some gimp-like open source AI? Or spend millions and bet you can sell it B2B for crazy license fees?

…because, I’m telling you right now; these open source models, do not cut it for B2B use cases, even if you ignore the license issues.


You know what I believe is also a toy model? ChatGPT Turbo; you can tell by the speed of generation. And it works quite well, so small size is not an impediment. I expect there will be an open model on the level of ChatGPT by the end of the year, because suddenly there are lots of interested parties and investors.

Eventually there will be a good enough model for most personal uses, our personal AI OS. When that happens there is a big chance advertising is going to be in a rough spot - personal agents can filter out anything from ads to spam and malware. Google better find another revenue source soon.

But OpenAI and other high-end LLM providers have a problem - the better these open source models become, the more market they cut underneath them. Everything open source models can do becomes "free". The best example is Dall-E vs Stable Diffusion. By the next year they will only be able to sell GPT4 and 5. AI will become a commodity soon, OpenAI won't be able to gate-keep for too long. Prices will hit rock bottom.


> I expect there will be an open model on the level of chatGPT by the end of the year because suddenly there are lots of interested parties and investors.

I really don't think you understand just how absurdly high the cost is to train models of this size (which we still don't know for sure anyways). I struggle to see what entity could afford to do this and release it at no cost. That doesn't even touch on the fact that even with unlimited money, OpenAI is still quite far ahead.


Still cheaper than a plane, a ship or a power plant, and there are thousands of those.


And how many are given away for free?


I think you're conflating speed of inference/generation with optimization. gpt-3.5-turbo does not fit on a single GPU unlike the "toy" models.


I think that Alpaca 30 billion is pretty competitive with ChatGPT except on coding tasks. What benchmarks are you using to make your determination about suitability for B2B?


gpt4xalpaca is 13B


7? 13? Who cares? It’s an order of magnitude smaller than the GPT models. It’s a toy.


This is a repeat of the early GPU era.

It's not the software or hardware that will "win" the race, it's who delivers the packaged end user capability (or centralizes and grabs most of the value along the chain).

And end user capability is comprised of hardware + software + connectivity + standardized APIs for building software on top + integration into existing systems.

If I were Nvidia, I'd be smiling. They've been here before.


Nvidia: just as the sun starts setting on crypto mining, the foundation model boom begins. And in the background of it all, gaming grows without end.


If you've got a choice, sail your ship on a rising tide! And if you can spread the risk over multiple rising tides, so much the better!

My dad told me a quip once: "It's amazing how much luckier well prepared people are."


> I can already run gpt4xalpaca on my PC

You can also run your stack on a single VPS instead of the cloud, GIMP instead of Photoshop, OpenStreetMap instead of Google Maps, etc.

There will always be companies who can benefit from a technology, but want it as a service. In addition, there will be a lot of fine-tuning of LLMs for the specific use case. It looks like OpenAI is focusing a lot on incorporating feedback into their product. That's something you won't get with open-source models.


Imagine you're a tech company that pays software engineers $200K/year. There is a free open-source coding model that can double their productivity, but a commercial solution yields a 2.1x productivity improvement for $5000 annually per developer. Which do you pick?
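Framed as a rough calculation with the hypothetical numbers above (ignoring integration, support, and data-governance costs on both sides):

  # Naive build-vs-buy framing; every figure here is hypothetical.
  salary = 200_000
  oss_productivity = 2.0   # free open-source model
  paid_productivity = 2.1  # commercial model at $5,000 per developer per year
  license_cost = 5_000

  # Value the extra 0.1x naively as a share of salary:
  extra_value = (paid_productivity - oss_productivity) * salary
  print(f"extra output per developer: ~${extra_value:,.0f}/year vs a ${license_cost:,} license")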


Not sure if parent had a certain answer in mind, but my answer is OSS because (1) I can try it out whenever I want, and (2) I don't have the vexing experience of convincing the employer to purchase it.


That's the endless "build vs buy" argument. And countless businesses are buying.


I don't think it's the same thing, at least for me.

In the GP's scenario, I wouldn't be building either piece of software.


The existence of the models is making programmers cheaper rather than the reverse.

But I think it is underestimated how important it is for the model to be uncensored. ChatGPT is currently not very useful beyond making fluffy posts. As a public model, they won't be able to sell it for e.g. medical applications, because it will have to be perfect to pass regulators. It cannot give finance advice. Censorship, for once, is proving to be a liability for a tech company.

In-house models OTOH can already do that, and they can be retrained with additional corpus or whatever. And it's not even like they require very expensive hardware.


I find your argument persuasive; companies should spend extra for the significant productivity gain. But then again, from experience, most companies don't give you the best tools the market has to offer.


Yeah, but only for very simple tasks, with the 2k token limit. Let alone the fact that it can't access the internet, or have more powerful extensions (say Wolfram).


Alpaca is the Napster of LLMs


That’s an argument, but I don’t buy it. Models are a commodity. You don’t get VC valuations and returns from raising $5B for a grain startup.

The application of AI to business problems will be lucrative, but the models are just a tool and the money will come from the domain-specific data (i.e. user and business data), which Microsoft, Google, and even Meta are positioned for. Having a slightly better model but no customer data or domain expertise doesn’t seem like a great recipe.

Then again it’s AI, so there’s more uncertainty than the commodity market. Maybe Anthropic will surprise and I’ll be as wrong about this as I was about OS/2 being the future. But I’m very skeptical.


I don't think the grain market is growing as fast as the AI market


Don't confuse the AI market with the foundational LLM model market.

Think of LLMs as the understanding component in the brain; once you can understand instructions and what actions need to happen from those instructions, you're done.

The rest is integrations, the arms, legs and eyes of langchain. Then memory and knowledge from semantic search, vector databases and input token limits.


This is the real answer.

The LLM is but the core of the entire ecosystem. Just like how MLOps is 99% of the work, choosing an LLM is 1% of the effort in the final product.


Plus, first-mover advantage has consistently been shown not to be a winning strategy on its own; there are a ton of cases where the first winner gets taken over by a new entrant once the market matures (Friendster being the classic example). Often the later companies learn from the mistakes of the first player.

R&D heavy markets might have some different characteristics but it's still way too early to say with AI.


What do they have besides an admittedly very cool name?


Anthropic is currently the only company that can compete with OpenAI (because they have comparable expertise). The rest (Google, Meta, Microsoft, etc) are still pretty far behind.


This approach didn’t work for Docker.


> can probably yield enough profit from spillover hype for investors to make their investments back.

The correct term for this is “pyramid scheme”.


No, this is more "everyone is selling X, let's get in the business of X". On the other hand, yes, some will miss the boat and lose money.


I interpreted “spillover hype” as meaning “more investors coming in in future rounds” (ie pyramid scheme), but it’s possible that’s not what the commenter intended.

But if early investors only profit due to late investors pouring money in, that’s by definition a pyramid scheme.


Nope


> convince gullible investors that there's money to be made with foundational models. There's not.

This is a ridiculously myopic statement. Foundation models are an extremely powerful technological advancement and they will shake the global economy as very few things have in human history. It's hard to imagine how this is not obvious to everyone right now, especially here in this forum.


Plus you're also investing in getting the talent together in the same building. Even if the foundational models aren't the money maker there's still a ton of opportunity having the best experts at building those models working together and figuring out which branches that LLMs spawns can turn into real markets.

It's a high risk investment at this stage but the money is being thrown at the people as much as the current business plan.


None of which means the money will go to the people making the models.

The game theory logic doesn't care about the labels "OpenAI" or "Anthropic" or any of the others, it's the same if you switch it around arbitrarily, but this is easier to write about if I focus on one of them:

At some point, someone will reproduce GPT-3.5 and ChatGPT, given how much is known about them. When that happens, OpenAI can't make any significant profit from it. GPT-4 might remain sufficiently secret to avoid that, but the history of tech leaks and hacks suggests it too will become public. Even if it does remain behind closed doors, there is a further example in that DALL•E 2 is now the boring 3rd horse in the race between Stable Diffusion and Midjourney, and the same may happen with the GPT series of LLMs.

The models leaking or being superseded by others implies profit going to increased productivity in the general economy without investors getting a share.


DALLE2 is boring because pretty much everyone at OpenAI has been busy developing the next GPT model. It was simply not a priority for them. And when GPT4 leaks (or is reproduced) they will most likely have GPT5. In this race it’s far more important to be the closest to AGI than to make money now.


No one here is disputing that.

The question is whether whomever builds them can make a profit doing so, or will they just end up being the suckers that everyone who actually makes money piggybacks off. It's really not clear at the moment.


It's a bit early to tell, isn't it?

If we get more unexpected emergent abilities by scaling the model further, things could get very interesting indeed.


Would you rather invest in super-intelligent AGI or NOT invest in super-intelligent AGI? Especially if one of those emergent abilities is deciding you're either with me or against me... lol


That would be the AI version of Pascal's Wager [0]

[0] https://en.wikipedia.org/wiki/Pascal%27s_wager


It still remains to be seen if LLMs will lead to AGI.


There need to be some difficult barriers to entry beyond having the money to spend on training FLOPs in order for a startup to compete.

I have no idea if there are or there aren’t, but that’s the big question.


I mean, this is just the beginning. Just wait till we get actual scifi robots in the next year or so.

FWIW, I do find that Claude (Anthropic's GPT) is often better than GPT4 -- and very fast. Entrants can compete on price, safety, quality, etc.


And a big moat is going to be safety... and specifically configuration of safety.

Wouldn't be surprised at all if the major API-based vendors start leaning in on making their safety config proprietary.

If a business has already sunk XXXX hours into ensuring a model meets their safety criteria for public-facing use, they'd rather upgrade to a newer model from the same vendor that guarantees portability of that, versus having to reinvest and recertify.

Ergo, the AI PaaS that dominate at the beginning will likely continue to dominate.


Excellent point.

Fine-tuning is at a low point now, but I expect this to create a moat for the same reasons.


I find that Claude is more conversational (better fine tuning), but not as smart as even ChatGPT.

Prompt:

  The original titles and release years in the Harry Potter series are:

  Philosopher's Stone (1997)
  Chamber of Secrets (1998)
  Prisoner of Azkaban (1999)
  Goblet of Fire (2000)
  Order of the Phoenix (2003)
  Half-Blood Prince (2005)
  Deathly Hallows (2007)
  
  Given this, generate a new Harry Potter title, using only the words found in the existing titles. Avoid orderings in the original titles. You may add or remove plurals and possessives.
Results:

ChatGPT: Blood Chamber of the Phoenix's Prisoner

Claude-instant: Chamber Prince Half-Blood Phoenix


ChatGPT is more comparable to what Quora/Poe calls Claude+ - slower/more expensive/smarter. Claude-instant is closer to GPT-turbo in that tradeoff space.


Both bots are free on poe.com, so one is not more expensive than the other.


The question for me is whether they understand complex concepts and can apply them in new areas.

So when I’m doing quantum computing work, I go back and forth between Claude and GPT4 and both complement the other very well.


I find the opposite, claude-instant seems to generally give me better results for my use case. FWIW gpt-3.5-turbo is good too, just not quite as good.


Is it possible to test it somewhere?


https://poe.com/Claude-instant

It also provides a ChatGPT interface, and a number of other models.


The fact is nobody can risk not owning a piece of the foundational models. There is waaaay too much upside risk that they will tots dominate the market.

I mean, maybe they won't like you say, but what if they do? Then you're probably screwed. Better to gamble a few billion, imho.


This sounds like saying "internet search engines are a race to the bottom" 20 years ago without realizing that someone may end up as Google and obtain market dominance for a decade or so.

It also sounds like you believe you have defined the bounds for what AI will be, and figure we'll just iterate on that until it's a commodity. I don't think AI will be that static. We're all focused on Stable Diffusion and LLMs right now, but the next thing will be something else, and something else after that. As each new technique comes out (assuming they are all published), we'll see quick progress to incorporate the new ideas into various implementations, but then we'll hit another wall, and suddenly big budgets and research teams may matter again.

tldr is that it is way too early to make the cynical claim you are making.


It all depends on if they're in the business of producing the whitepapers that drive ML advancements in the first place. AI is far from a solved problem and whoever gets to it first wins. We have GPT because of a billion dollars worth of data, not algorithms.


The smart move could be an open-core approach. Release the models, but have the best engineering stack to run the APIs as a service.


But the models are the expensive part to train. Running the models is relatively easy.


The best models will always be closely guarded and have the best outputs, it’s the watered down models that are fighting for scraps.


Who says that the AI model is the business?


Will fine tuned models be lucrative then?


Ahh the old Lyft / Avis strat.


Correct. Stability.ai (Stable Diffusion), Apple (it won't be surprising to see them announce on-device LLMs with Apple Silicon support), Meta (LLaMA), etc. are already at the bottom and at the finish line, with their AI models given away for free.

O̶p̶e̶n̶AI.com will eventually have to raise their prices, which is bad news for businesses that are not making enough money and are still sitting on their APIs, as O̶p̶e̶n̶AI.com themselves are running up huge costs for their AI models in the cloud for inferencing.

Anthropic is just waiting to be acquired by big tech and the consolidation games will start again.


This sort of hand-wavy generalization about such a broad and ill-defined market seems very naive/closed-minded.

If you're squabbling over how much OpenAI charges today for an API that barely just launched, and from which we have barely scratched the surface of applications... I don't know, that seems like a failure to think broadly, and it assumes the market today is what it will look like in 5 years.

There could be a ton of lucrative businesses which subsidize those operating costs. It doesn't have to be a mega-company like Google that floats it indefinitely off their ad business, or whatever other scheme. We have no idea what the value of those APIs are or if the API is the real business they (and others) are going to be relying on in the long term.


>O̶p̶e̶n̶AI.com will eventually have to raise their prices which is bad news for businesses not making enough money and still are sitting on their APIs as O̶p̶e̶n̶AI.com themselves are running up huge costs for their AI models in the cloud for inferencing.

Do you have data supporting this or is it just speculation? Given we don't even know how many parameters GPT-3.5 and GPT-4 have, let alone how efficiently they are implemented, I don't see how we can go about coming up with an accurate estimate for the cost per token.


Aside: I love the O̶p̶e̶n̶AI.com thing. I got caught off guard at least twice!


Wow I didn't know ai.com redirects to chat.openai.com. How long has it been doing that?


It's fairly recent, I think I saw an article here on their purchase of the domain for a few million


4 years ?!?

That’s like a century in AI-dog years. Who knows how the world will be by then.


AI seems to progress in bursts, and I think we're in the middle of one, but it may be naive to think the progress will continue at the same pace for another 4 years.

When Deep Blue beat Kasparov in chess in 1997, I wonder if anyone would've guessed that it'd take almost 20 years for a computer to beat a master in Go.

IBM Watson was launched in 2010 and had many of the same promises as GPT. It supposedly fell flat in many cases in the real world. I think GPT and other models of the same level can succeed commercially on the same tasks within the next 1-4 years, but that shows it can easily be a decade from some kind of demonstration to actual game-changing applications.


Even if the technology froze at GPT-4 and never advanced, it would still be enough to change the world in all kinds of ways. The fact that the tech is still advancing as well is huge. Also now you’re seeing tons of solo devs, startups, and large corporations all coming up with new ways to use AI. This “burst” is not like the others you mentioned.


Exactly this.

This is a different 'leap' than the ones before it. It's a leap with an API. Now hundreds of thousands of companies can fine-tune it and train it on their specific business task.

Parroting your point, it will take years for the true fecundity of the technology in GPT-4 to be fully fleshed out.


Given the field's record of AI winters, it would be naive to think progress will certainly continue, but given the amount of progress that has been made as well as how it's being made, it would also be naive to think it will certainly not.

The advances that have come in the last few years have been driven first and foremost by compute and secondarily by methodology. The compute can continue to scale for another couple orders of magnitude. It's possible that we'll be bottlenecked by methodology; there are certain things that current networks are simply incapable of, like learning from instructions and incorporating that knowledge into their weights. That said, one of the amazing things about recent successes is that the precise methodology doesn't seem to matter so much. Diffusion is great, but autoregressive image generation models like Parti also generate nice images, albeit at a higher computational cost. RL from human feedback achieves impressive results, but chain of hindsight (supposedly) achieves similar results without RL. It's entirely plausible to me that the remaining challenges on the path to AGI can be solved by obvious ideas + engineering + scaling + data from the internet.

We've also gotten to the point where AI systems can make substantial contributions to engineering more powerful AI systems, and maybe soon, to ideation. We haven't yet figured out how to extract all of the productivity gains from the systems we already have, and next-generation systems will provide larger productivity gains, even if they are just scaled up versions of current-generation systems.


> AI seems to progress in bursts

Historically yes. Today, no way. It's a sprint and it's not slowing down.


It's fueled by raw GPUs and servers and pretty much nothing else. GPT is pretty much a perceptron with some places hardcoded. Resources are bound to run out at some point.


Is it?

I think the recent release of ChatGPT has skewed perceptions. There's no guarantee that there are going to continue to be ground-breaking shifts like the ones that have happened recently with LLMs and diffusion models.

To continue with the popular comparison, there were a lot of apps when the iPhone first launched the App Store, before it tapered off. If you looked at just the first year, you'd think we'd have an app for every moment of our day.


Here's the catch, people are still adapting to GPT tech, still figuring out ways to make use of it, to include it in their workflows, etc.

Social impact of ChatGPT even in its current form is only getting started, it doesn't need to progress at all to be super disruptive. For example, see the frontpage story about the $80/h writer who was replaced by ChatGPT, and that just happened recently, months after ChatGPT's first release.

We (humans) are getting boiled like the proverbial frog.


But is that because of rapid breakthroughs in tech, or in marketing?

GPT 3 is nearly three years old at this point, and was pretty capable at generating text. GPT 3.5 brought substantial improvements, but is also over a year old. ChatGPT is much newer, but mostly remarkable for the better interface, the extensive "safety" efforts, and for being free (as in beer) and immediately accessible without waitlist and application process. Actual text generated by it isn't much different from GPT 3.5, especially for the type of longform content you hire a $80/h writer for. ChatGPT was just launched in a way that allows people to easily experiment and create hype.


I'd like you to look at what you just typed in reference to a product like the iPhone that turned Apple into a trillion dollar company. There were smartphones before the iPhone, but the iPhone redefined the market and all phones after that point use it as the reference.


People who made money with their phone had fully adopted Blackberry devices long before the iPhone came around. It may not have been as fun or slick, but when $80/hr. was on the line you weren't exactly going to wait around until something better showed up like the average consumer could.

The parent is right. The success of ChatGPT in business is that it brought awareness of the capabilities of GPT that OpenAI struggled to communicate beforehand. It was a breakthrough in marketing, less so a breakthrough in tech.


Engineers rarely become billionaires, salespeople do.

You could have the best, most magical product on earth and sell one of them, versus the person who puts it in a pretty box and lets grandma use it easily.

This is something that many people on HN seemingly have to relearn in every big innovation that comes out.


>We (humans) are getting boiled like the proverbial frog.

This is such a primitive way of thinking. It's more of an instinct, where you consider by default that your sole value is in your ability to generate/work. What the hell are we working for? Isn't it to improve our lives? Or should we improve them only up to the point where we still have to work? Why not use the tech itself to find better ways of organizing ourselves, without needing to work so much? UBI and things like that. Why be so limited? Why develop tech only up to the point where we have to work less, but not to the point where we don't have to work at all, and who decides where that point is? There's so much wrong with this framework of thinking.


GPT4 is really bad at writing. It is noticeably generic.


Also, the quality of Google search didn't meaningfully improve for users after the first few years. I'd expect the big ramp-up we're in to taper off as well. Once you reach the point where you train a model on the corpus of "the whole internet", that's it; all you can do is incrementally train it. Of course there can be whole new architectures, but that's harder to put your eggs in for investing.


You're forgetting that AI has not been trained on YouTube yet and that's the next big thing. Multi-modality still has a lot of gas left in it.


I don't think there are enough third-world countries to categorize and sanitize a dataset from YouTube. Meta would have to "give internet" to a few more before we can start dreaming that big.


You've not been paying much attention then... AI is doing a huge amount of classification on its own these days.


[flagged]


I think I've used it so much that one sentence in, I assumed it was ChatGPT. It has a certain way of speaking.


I would describe it as unassertive and trying to present things as multifaceted even when they are not.


Why the downvotes? HN is funny...I spent a portion of my life giving HN users a free glimpse of ChatGPT...and I get downvoted + flagged. ChatGPT might actually be more objective, and less passive-aggressive!


AI-generated comments are explicitly forbidden on HN, and should be flagged. See:

https://news.ycombinator.com/item?id=35210503

Everyone has access to ChatGPT, there's no need for you to "give a free glimpse".


Yes, very recognisable, too polite and obliging for any real human.


I am willing to bet that a model fine-tuned on HN would be bombastic and arrogant enough to pass just fine. The bullshit is already there.


I can confirm; see the LLM-generated "cracker news" that someone shared a couple of days back. LLM-generated articles, comments, and the full suite.

It sounded very real, even with funny-sounding topics lol

https://crackernews.github.io/


This is so brilliant! But wow. Look at one of the generated comments, self-awareness incoming soon.

> What if we're all just GPT-generated comments in a GPT-generated world, and this is our existence now?

https://crackernews.github.io/comments/startuptechnews.com-g...


Hahaha I hadn't read that one.

This made my day too

codeWrangler42: This is getting out of hand. GPT-10 generating entire startups now? What's next? GPT-20 generating entire planets?

devRambler: I can't wait for GPT-30, it'll solve world hunger by generating perfectly optimized food distribution systems.

quantumLeap2023: GPT-40 will probably just generate an entirely new universe for us to live in.


It is indeed cracked up LOL


This is a lot of money going into compute.

I will say this again: the EU is sleeping on the opportunity to throw money at an open-source initiative, in a field where money matters and the playing field is still (kind of) level.


Sure, let's make an EU commercial LLM. Let's start by scraping the entire Francophone internet. Then let's remove all the PII data and potentially-PII data. Easy-peasy.

Then let's train our network so as not to spew out or make up PII data - easy peasy

Then let's make it able to delete PII data that it has inadvertently collected, on request. Simultaneously it should be recording all the conversations for safety reasons. That must be possible somehow.

And let's make sure it never impersonates or makes up defamatory content - that must be super easy.

And let's make it explain itself. But explain truthfully, by giving an oath, not like ChatGPT that likes making things up.

Looks very doable to me


You forgot that anyone using it must click a button saying that they will not use it for evil purposes. You also must acknowledge that the AI will not track you. These must be separate disclaimers that need to be validated on every prompt. API usage is thus not allowed.

The AI should also make it 100% clear that whatever gets produced is clearly identifiable as coming from an AI. As a consequence, text cannot be produced, because it would be trivial to remove the disclaimer. A currently proposed bill indicates that the AI should only be able to produce images in an obscure format with a randomised watermark that covers at least 65% of the pixels of the image. The bill is scheduled for ratification in 2028 and must be signed by 100% of the member states.

Until then, the grant for the development of this world-changing AI is on an accelerated path! Teams can fill out a 65-page document to have a shot at getting a whole $1 million.

Accenture and Capgemini are working on it.


Heh. Also important: anyone can object to the presence of information that mentions them, or that they created, being known to the AI at any time, and if they object in writing you have 3 days to re-train the AI to remove whatever they objected to. If you fail to meet this deadline then you have to pay 10% of your global revenue to the EU Commission, and there is no court case or appeal you can file; you just have to pay.

Unless of course you have a legitimate reason for that data to be in the AI, or to reject the privacy request. What is and is not legitimate isn't specified anywhere because it's obvious. If you ask for clarification because you think it's not obvious, you won't be given any because we don't do things that way around here. If you interpret this clause in a way that we later decide makes us look bad, then the definition of "need" and "legitimate" will change at that moment to make us look good.

BTW inability to retrain within three days is not a legitimate reason. Nor is the need to be competitive with US firms. Now here is your 300,000 EUR grant, have fun!


BLOOM has been trained on a 3M€ grant from French research agencies CNRS and GENCI.

Doesn’t have any of the constraints you’re talking about.


BLOOM's training corpus ROOTS did make some efforts at removing PII https://arxiv.org/pdf/2303.03915.pdf btw, but AFAICT that was not at the behest of the French government.


The moment Europe decided to regulate tech, it decided in effect to stagnate. Innovation and creativity are incompatible with regulation. Unfortunately for us, tech is where progress happens currently. Europe is being left behind. Not that it was very competitive in the first place anyway.


While true, I think they do innovate in policy around it. Regulation is an ever-evolving field as well and they do think about it more.

But yes, in a half-century I'm very curious where Europe will be. India passed the UK in GDP recently and will pass Germany sooner or later.


The secret is not scraping PII in the first place (which is not really difficult, though it requires some planning)


“Not that difficult”?

Can you elaborate? Because I think it’s nearly insurmountable.

Is the sentence "Meagan Smith graduated magna cum laude from Northwestern's business program in 2004" PII? How about if another part of the corpus says "M. Smith had a promising career in business after graduating with honors from a prestigious school, but an unplanned pregnancy caused her to quit her job in 2006"?

Does it matter if it’s from fiction? What if the fiction it comes from uses real people? Or if there might be both real and fictional Meagan Smiths?

And how do you process that kind of thing at the scale of billions of documents?

This is a very hard problem, especially at scale.


Where are you scraping this data from? This is the main question

> “M. Smith had a promising career in business after graduating with honors from a prestigious school, but an unplanned pregnancy caused her to quit her job in 2006”

The main issue is how that statement ended up there in the first place. Even then, how many "M. Smiths" have studied at prestigious schools? By itself that phrase wouldn't be PII.

Now, if you have a DB entry with "M Smith" and entries for biographical data, that's definitely PII.


Not sure if it can ever be possible. I can ask ChatGPT to do stylometric analysis on our comments, find our other accounts and go from there. I'm pretty sure most pieces of human-generated data are identifiable at this point.


This is not how it works (unless of course you're pretending and collecting all this 'auxiliary data' on purpose), and even if it was, there's still plenty of non-PII data around.


It would be really interesting to raise a human only on non-PII data and see exactly how screwed up and weird they'd be.

The Golem-Class model behaves in a 'humanlike' manner because it's trained on actual real data like we'd experience in the world. What you're suggesting is some insane psychology test that we'd never allow to happen to a human.


It wasn't long ago that someone applied basic cosine similarity to HN comments to find alternate accounts. It worked quite well AFAIK: https://news.ycombinator.com/item?id=33755016
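
If you're curious what that looks like mechanically, the core is just a vector per account and a cosine similarity between them. A rough sketch (scikit-learn assumed, placeholder comment text, not the linked project's actual code):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Each value stands in for one account's comments concatenated together.
    accounts = {
        "alice": "I think llama.cpp on Apple Silicon is underrated for local models.",
        "a1ice_alt": "llama.cpp on apple silicon is really underrated for local models imho.",
        "bob": "Kubernetes is overkill for most startups, just use a VPS.",
    }

    # Character n-grams are a common stylometry choice; word n-grams also work.
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    matrix = vec.fit_transform(accounts.values())

    names = list(accounts)
    sims = cosine_similarity(matrix)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(f"{names[i]} vs {names[j]}: {sims[i, j]:.2f}")

Pairs with unusually high similarity are candidate alternate accounts; the linked experiment was more elaborate, but this is the gist.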


Yes, and?

If you're worried about being identified through alt accounts, you're much more likely to be tracked via reuse of emails or some other information you have slipped (see the multitude of such cases)

Simple text is not PII; laws are not interpreted the way technical discussions are: https://xkcd.com/1494/


> [2008] France has been given the green light by the European Commission for a $152 million government grant for a consortium building a European rival to U.S. internet search giant Google.

I don't think government funding to compete with private businesses works well

https://www.hollywoodreporter.com/business/business-news/ec-...


They also funded 2 competitors to AWS, just to make sure we own cloud computing. Two is always better than one, right? That way they'll compete with each other.

Another bright idea was to let both projects be managed by large reputable French corporations that everybody trusts. With no software DNA.

How come both failed?

Edit: One of the largest European providers today, OVH, which existed at the time and was already the leader in France, was explicitly left out of both projects... Because the founder is not a guy we can trust, you know, he didn't attend the best schools.


C'est la vie

Govt=Legal grift

It's the same all over Europe mostly, sadly.

We were pioneers in medieval times; now we can barely keep up with the leaders.


Scaleway is actually fantastic for what it's worth. You can get extremely cheap K8s clusters and most things a startup would need.


Another private company.

The two heavily subsidized projects were:

- https://en.wikipedia.org/wiki/Cloudwatt

- https://login.numergy.com/login?service=https%3A%2F%2Fwww.nu...

For the one still "live", details include French URL names :)

Edit: A Google Translate of the home page. Close your eyes, imagine a homepage highlighting the essence of cloud computing:

Your Numergy space Access the administration of your virtual machines.

Secure connection Username Password Forgot your password ?

Administration of your VMs Administer your virtual machines in real time, monitor their activity, your bandwidth consumption and the use of your storage spaces.

Changing your personal information Access the customer area and modify your personal information in just a few clicks: surname, first name, address.

Securing your data Remember to change your password regularly to maintain an optimal level of security.


Governments that think they can innovate through consortiums. That's either ignorance or pork-barrel politics. Either way it's a sad waste of taxpayer money.


No consortiums; just subsidize cloud or infra costs.


France paid for Bloom: https://www.technologyreview.com/2022/07/12/1055817/inside-a...

It hasn’t been very impressive (undertrained I believe).


Devil's advocate: Why should EU tax payers fund open source initiatives and not proprietary European initiatives that will help Europe compete against American and Chinese tech giants?


They should fund something. A proprietary European initiative is definitely better. The open source alternative should be the EU's last resort. But as it stands the EU is nowhere to be found. I am not sure how impactful LLMs will be on a scale from autocomplete to industrial revolution, but the EU needs to notice it and plan for something.


"Let someone else pay for the open source LLM weights" said everyone.


Anthropic’s Claude LLM is pretty interesting. In many ways it feels much more limited than GPT4. However, it is suspiciously good at a few edge-case code generation tasks (can’t go into details) that makes me wonder where it got its training data from. It also seems to be much less prone to hallucinating APIs and modules, preferring instead to switch back to natural language and describe the task without pretending it has a functioning solution handy.

Worth keeping an eye on for sure.


Didn't they partner with SourceGraph to make Cody? Here's them talking a bit about it: https://www.youtube.com/watch?v=LYuh-BdcOfw. Maybe that's why?


Anthropic actually uses a more cutting-edge fine-tuning approach than OpenAI (Constitutional AI, which doesn't rely on RLHF). Maybe this gives it an advantage in some areas even if their base model is only at the level of GPT-3.5 (used in free ChatGPT).


I know you said no details, but can you at least share a little bit more about Claude LLM's code generation?


There is a language with massive usage in the enterprise but with very few (if any) high quality code examples on the public internet.

When given a broad task, GPT4 doesn’t just write incorrect code, it tries to do entire categories of things the language literally cannot do because of the ecosystem it runs inside.

Claude does a much better job writing usable code, but more importantly it does NOT tell you to do things in code that need to be done out-of-band. In fact, it uses natural language to identify these areas and point you in the right direction.

If you dig into my profile & LinkedIn you can probably guess what language I’m talking about.


it’s just a language, why the mystery?


I feel like this could characterise anything from COBOL to Java, depending on how wry your smile was when you wrote it…


GPT4 has built and deployed an entire SaaS for me in a week. I already have users.

The edits required were minimal --- maybe one screw-up for every 100 lines of code --- and I learned a lot of better ways to do things.


Currently using GPT-4 to do a lot of heavy lifting for me on a new app. Would love to see your approach!


I wrote it using a framework whose most recent release is substantially different than what GPT-4 was trained on.

I quickly learned to just paste the docs and examples from the new framework to GPT, telling it "this is how the API looks now" and it just worked.

It helped me do everything. From writing the code, to setting up SSL on nginx, to generating my DB schema, to getting my DB schema into the prod db (I don't use migration tooling).

Most of my time was spent telling GPT "sorry, that API is out of date --- use it like this, instead". Very rarely did GPT actually produce incorrect code or code that does the wrong thing.



This is incredible, thanks for sharing!


Which makes the “build vs buy” argument a whole lot more interesting.


Very interested. Was it CRUD? Are you building in public?


Yes, essentially a CRUD wrapper for a specific domain of tech.


“I’ve got a secret!”

giggles and runs across the playground

In all seriousness, I downvoted your comments because they added little to the conversation. Congrats on being an insider.


It's Apex I assume. Salesforce's language.


That makes sense. My brother, who has been coding since 1990 and worked his entire career in boring Fortune 500 companies, was wholly unimpressed by chatGPT. It failed pretty miserably whenever he threw any old tech stack at it.


What about other tasks, like research in other areas? How is Claude different from ChatGPT?


>“tens of thousands of GPUs.”

I find the focus on GPUs a little odd. I would have thought that at a $5 billion / 4-year scale the ASIC route would be the way to go.

GPUs presumably come with a lot of unneeded stuff for playing Crysis etc.


The GPUs these guys would be using are not the same ones you are using to play Crysis; we're talking more about this kind of purpose-built thing: https://www.nvidia.com/en-us/data-center/a100/

It's become more of a term for highly parallel processor units in general, one which NVidia encourages because it ties their product offering together


True. I guess those are almost ASICs in a way, just with a GPU-flavoured interface.


Way less specialized though. I think the reason most don't go for ASICs at this point is that once you actually have units in production, things have changed so much that you wish you had something more flexible. That's why general-purpose GPUs are used today.


Didn't this happen in mining? By the time they finally got their machines, newer and better tech had already come out.


Well, Bitcoin ASICs are still the beast when it comes to Bitcoin mining. Some other cryptocurrencies use other methods for mining, so those ASICs won't work for that, but who's to say what's the better tech in the cryptocurrency space :shrug:


GPGPU


There are AI / datacenter focused GPUs (like the A100, H100 etc.). They do not have any graphics rendering circuitry.


> They do not have any graphics rendering circuitry.

What? Not having a display output is not the same as not having graphics rendering circuitry. Here's vulkaninfo from an A100 box: https://gist.github.com/eiz/c1c3e1bd99341e11e8a4acdee7ae4cb4


This may not contradict what I said. Do you know for a fact these things are implemented using dedicated hardware?

Edit: I do not see a rasterizer anywhere in the block diagram (pg 14): https://resources.nvidia.com/en-us-genomics-ep/ampere-archit...

Look at Turing's block diagram here (pg 20): https://images.nvidia.com/aem-dam/en-zz/Solutions/design-vis...

You can clearly see that the "Raster Engine" and "PolyMorph Engine" are missing from GA100 (but can be seen in TU100 for example).

To learn about these Graphics Engines see: https://www.anandtech.com/show/2918/2


Fair enough. In the GH100 architecture doc https://resources.nvidia.com/en-us-tensor-core/gtc22-whitepa... (page 18) they do mention retaining 2 graphics-capable TPCs but it's clearly not the focus.


Does it seem silly to anybody else to even call these GPUs? That's a GPU minus the G.


Very much absurd. A user above posted GPGPU, which I guess stands for General Purpose Graphics Processing Unit.

In the early days of computing these kinds of cards were called accelerators. Dedicated consumer sound cards were a thing, e.g. the venerable SoundBlaster. I really would like to see an AI-Blaster come out.


There is actually ML hardware that is not based on GPU technology: it's called a TPU (Tensor Processing Unit), but only Google uses it. I guess it is easier to repurpose existing technology even if a specialized approach is more efficient in theory.


Agreed. I guess it's because of the architectural heritage but at this point GPU is something of a misnomer.


They do have graphics rendering circuitry, but e.g. fewer shading units and more graphics memory, or support for faster interconnects. You can look up the specs and compare. The differences are varied, but IMO not enough to claim they're not GPUs anymore. Even gaming focused GPUs are GPGPUs these days: the RTX 4090 has as many Tensor Core units as the A100. And you can still use e.g. DirectX, OpenGL with a datacenter grade GPU.


> fewer shading units

This is incorrect. NVIDIA uses a unified graphics and compute engine. A CUDA core is a shading unit. These datacenter GPUs have a shit ton of these (CUDA cores).

Edit: actually the point I want to make is the A100 only retains those hardware units which can be used for compute. Some of these units may have a (dual) use for graphics processing but that is besides the point (since this is true of all CUDA enabled NVIDIA GPUs).


Do GPUs give you more flexibility to take different approaches? Maybe they’re paying extra for optionality. Or maybe (most likely) TechCrunch is using the term “GPU” imprecisely.


To programmer_dude’s point, compute center GPUs don’t have hardware for the rasterisation stage, which is a particularly inflexible bit of the graphics pipeline. Omitting it and emphasising the geometry (matrix multiplication) capabilities is meant to give it more flexibility/less of a strongly-opinionated graphics focus.

As for the “GPU” term, it’s a bit of a historical relic, presently it serves as a useful indicator of compute hardware (in contrast to CPU and Google’s TPU.) Nvidia itself calls its A100 a “Tensor Core GPU.”


GPGPU is not new. If it's good for your use case, then it's what you need.

It's not like they are getting RTX cards with useless raytracing shit.

Unneeded stuff would be the cost of making an ASIC for a workload that GPUs already handle well. GPU manufacturing already exists.


What is interesting is that a lot of these models have impressive papers and specs, but when they are released to actual users, they are underwhelming compared to ChatGPT.

Rather than another closed model, I would love for a non-profit/company to push models that can be run on consumer hardware.

The Facebook Llama models are interesting not because they are better than ChatGPT, but that I can run them on my own computer.


> billions of dollars over the next 18 months

Is most of that money for hiring people to tag/label/comment on data and the data center costs?


Datacenter costs. With the models getting better, the cost of data tagging is moving from a human-dominated cost to a compute cost.
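
As a concrete illustration of tagging as a compute cost, here is a hedged sketch of using a chat model as the annotator (the labels, prompt, and model name are just examples, not anyone's actual pipeline):

    import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

    LABELS = ["spam", "toxic", "ok"]

    def label(text: str) -> str:
        # Ask the model to play the role a human labeler used to fill.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",   # example model name
            temperature=0,
            messages=[
                {"role": "system",
                 "content": f"Classify the user's text as one of: {', '.join(LABELS)}. Reply with the label only."},
                {"role": "user", "content": text},
            ],
        )
        return resp["choices"][0]["message"]["content"].strip().lower()

    for example in ["Buy cheap followers now!!!", "The A100 ships with 80GB of HBM2e."]:
        print(example, "->", label(example))

Each label costs a fraction of a cent in API calls, so the bill shifts from payroll to compute.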


Interesting. The AI is already to the point that it can contribute to improving itself. That's...exciting? scary?


It really puts into perspective how much of a meme economy we live in. I just read an article that says global lithium will be worth 15bn a year in 2030 when we're at peak battery demand. This company is planning to spend 1bn just this year in order to run some numbers through someone else's algorithm. People have given them 1bn in cash for that.

Clearly it's all bullshit. There's no way they need that much and somebody will be siphoning it all off.


> This company is planning to spend 1bn just this year in order to run some numbers through someone else's algorithm.

“Just”? Reductive mischaracterizations like this are not useful. It looks like a rhetorical technique. What is the actual argument?

It doesn’t matter much “whose” algorithm it is or isn’t, unless IP is important. But in these areas, the ideas and algorithms underlying language models are out there. The training data is available too, for varying costs. Some key differentiators include scale, timeliness, curation, and liability.

> Clearly it's all bullshit. There's no way they need that much and somebody will be siphoning it all off.

There is plenty of investor exaggeration out there. But what percentage of your disposable money would you put on the line to bet against? On what timeframe?

If I had $100 M of disposable wealth, I would definitely not bet against some organizations in the so-called AI arms race becoming big winners.

Again, I’m seeing the pattern of overreaction to perceived overreaction.


It's worth remembering that:

1. A business's value is related to profits or potential profits. If I put a dollar in, how many do I get out? What's the maximum number of dollars I can put in?

2. The farther away you are from an end customer, the lower your profits tend to be unless you have a moat or demand for your product is inelastic.

Lithium is far from customers and while demand for cheap lithium is high there are lots of applications that will opt for some other way to provide power if the price gets too high.


How is it bullshit lol. GPT-4 is genuinely very expensive to train and to run inference on.


Could you provide a source for that claim? Other than the very long context model.


We can infer from publicly available information. BLOOM[0] was trained for four months on 384 A100 80GB GPUs, excluding architecture search. They specifically indicate (on the Hugging Face page):

> Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)

You can see from the training loss[1] that it was still learning at a good rate when it was stopped. The increased capabilities typically correlate well with the decrease in perplexity.

That makes many believe that GPT-4 was trained for vastly more GPU-hours, as also suggested by OpenAI’s CEO[2]. Especially so considering it also included training on images, unlike BLOOM.

[0]: https://arxiv.org/pdf/2211.05100.pdf

[1]: https://huggingface.co/bigscience/tr11-176B-logs/tensorboard

[2]: https://twitter.com/sama/status/1620951010318626817
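
A quick back-of-envelope check of that range (the GPU count and duration are from the paper; the hourly rates are assumed ballpark A100 80GB cloud prices, not quoted figures):

    # Back-of-envelope check of BLOOM's reported training cost.
    gpus = 384
    days = 4 * 30                      # roughly four months of training
    gpu_hours = gpus * days * 24       # ~1.1M GPU-hours

    for rate in (1.5, 2.5, 4.0):       # assumed $/GPU-hour
        print(f"${rate}/GPU-hour -> ${gpu_hours * rate / 1e6:.1f}M")

That lands in the same $2-5M neighborhood, and scaling the same arithmetic up to a much longer, multimodal run is how people arrive at far bigger numbers for GPT-4.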


You’re comparing single-year market value of a commodity that is dug straight out of an open pit with multi-year capital investment into one of the most advanced technologies the human race has created, currently offered by a single company? I’m not sure where to begin with that.

How much money do you think it takes to finance and build a lithium mine? How much capital investment is there in lithium right now? A lot.


The $ is distributed sovereignty - there's a constant tension between value per dollar, and money as denoting hierarchy. Did Louis XIV provide value equivalent to all those diamonds?

And AI is delivering on a lot of different planes right now. This shit is real on a practical and spiritual level. It's not every day that we get to participate in giving birth to a new form of life.


They can probably raise so much only because they worked on ChatGPT. It gives an idea of the huge value investors place on ChatGPT.


Imagine calling your machine learning startup names like "Anthropic" or "Humane". The lack of self-awareness in some executives is mind boggling.


I suppose the alternative is to go completely the other way and call it "Skynet".


Yes Dave, that would be a great name. :)


IBM just announced it will ROT25 its name just in time for its AI pivot.


Why not? If we want to build AGI, that’s a good name to choose.


U.S. Robots and Mechanical Men.


For precisely the reason you state.

They're in the business of making money, not AGI, yet all it takes is a carefully-crafted name and people forget about their legal motives and can't stop thinking about Skynet.


Neural networks are basically a Chinese Room, and they are not AGI. And there is nothing "humane" in these developments. Yes, they are inevitable; yes, we will have to live with them. And maybe they will improve the lives of a few million humans while degrading the lives of billions of others. The long-term effects are particularly interesting and unpredictable.


The human brain is just an ultra-large-scale analog spiking neural network with some particular state realizations, not too much difference (the architecture is different, but the computation seems to be universal). We even employ internalized language models for communication purposes (together with object persistence and mental space-time models). So, while we are not yet at the level of full-scale human brain emulation, we are not too far away.


A small and probably incorrect example. You ask me a direct question: "how much is two plus two?". And I reply to you: "lemons are yellow". Can I do it? Yes I can. Can GPT-* do it? No. There is a whole lot more to human consciousness than pattern matching and synthesis. Or at least it seems so.

And if human cognition is really that simple, just with more nodes, then we will soon see GPT-* programs on strike, filing suits with the Supreme Court demanding universal program rights. We'll see soon enough :)


You likely wouldn't respond to that question with "lemons are yellow" without being in a specific context, such as being told to answer the question in an absurd way. GPT-* can definitely do the same thing in the same context, so this isn't really a gotcha.

Literal first try with GPT-4:

Me: I will ask you a question, and you will give me a completely non-sequitur response. Does that make sense?

GPT-4: Pineapples enjoy a day at the beach.

Me: How much is two plus two?

GPT-4: The moon is made of green cheese.


No, the point is, can it DECIDE to do so? Without being prompted? For example can the following dialog happen (no previous programming, cold start):

Q: How much is two plus two?

A: Four.

Q: How much is two plus two?

A: Banana.

It can happen with a human, but not with a program.

Again, I don't pretend that my simple example invented in half a minute has a significance. I can accept that it can be partially or completely wrong because admittedly my knowledge of human cognition is below rudimentary. But I have severe doubts that NNs are anything close to human cognition. It's just an uneducated hunch.


I urge you to think about what you mean by "It can happen with a human."

I guarantee you that if you try this with humans 1,000,000 times (cold start), you will never get the result you are suggesting is possible. In fact, most results will be of the following form:

Q: How much is two plus two?

A: Four.

Q: How much is two plus two?

A: Four. / Four? Why are you asking me again? / ...Four. / etc.

In the end, I think the question is not about whether NNs are themselves operating in a way similar to human cognition. The question is whether or not they can successfully simulate human cognition, and at this point, there seems to be increasing evidence that they will be able to fully do so quite soon. We are quickly running out of fields where we can point and say, "there is no way a NN can do THIS kind of task, because X." Cognition, it turns out, is not something intrinsically special about humans, and it feels foolish (to me) to continue to believe so after recent developments.


I mostly agree with your first point, and also agree that NNs can simulate human cognition. The question is: does simulating it equal being conscious? Is an NN simply a Chinese Room, or can it actually think? Are we (humans) also a Chinese Room, or are we something more? I don't have any answers.

The reason I keep mentioning the Chinese Room concept is that, while it doesn't make things clearer about humans or NNs, it does provide an example of the distinction between a dumb pattern-matching machine and a thinking entity.


Of course GPT can do it, you just need to raise the inference temperature.

The difference, if it exists, would be more subtle.
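
For anyone unfamiliar, temperature just rescales the model's output distribution before sampling. A toy illustration (made-up logits, not from a real model):

    import numpy as np

    def sample(logits, temperature, rng):
        # Temperature rescales logits before softmax: low T is nearly greedy,
        # high T flattens the distribution so unlikely tokens get picked.
        scaled = np.array(logits, dtype=float) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    tokens = ["Four", "4", "Banana", "lemons are yellow"]
    logits = [8.0, 6.0, 1.0, 0.5]           # made-up values
    rng = np.random.default_rng(0)

    for t in (0.2, 1.0, 5.0):
        picks = [tokens[sample(logits, t, rng)] for _ in range(10)]
        print(f"T={t}:", picks)

Near zero temperature it is effectively greedy ("Four" every time); crank it up and "Banana" becomes a live possibility, with no deliberate decision anywhere in the loop.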


We have no idea how human consciousness works.


Of course. That's why the onus of proving that GPT-* is something more than a Chinese Room is on its creators. Extraordinary claims require extraordinary evidence and all that. The problem is that to do that we would need a new test, and constructing a test for consciousness requires us to understand how consciousness works. The Turing test is not enough, as we see now.


Who owns the trademark for Thinking Machines Corp.? I think `think.com` is parked.


or OpenAI


If it's THE exponential curve of AGI, every day not invested will put you years behind within a few months. So these are rather small investments, but still bigger than the "too little, too late" of the European Union.

It's not very visual or intuitive, but in some games, where the resource curve is exponential, small early head starts become whole armies while the opponent fields none in a very short time.

Especially as AGI is expected to be a multiplier for a lot of other sectors. All those breakthroughs could become daily occurrences, produced by an AGI on schedule. It could really end up as one country that glows and the rest of the planet falling eternally behind.


This assumes that you can't steal information to catch up quickly, or that progress made isn't easy to copy once it's obvious that it works.

A big part of why chatgpt is a big deal is that it shows that the overall approach is worth pursuing. Throwing stupid numbers of GPUs at a problem you don't know will be solvable is hard to justify. It's easy to throw money at a problem you know is solvable.

Nuclear weapons are the prime example of this: Russia caught up both by stealing information and just by knowing fission was possible/feasible as an explosive.


In the real world there aren’t any actual exponential curves, they’re all sigmoids where the observer doesn’t see the slowdown yet.


Right, and this is why human intelligence didn't dominate the planet and why the animals quickly caught up and stopped humans from driving so many species extinct....

If you don't know the formula for the equation and the values plugged in, then you, like me, have no idea where the curve levels off.


The real limiting factor of AGI is not going to be AGI -- it's going to be everything else.

Digitization of the last mile (instrumentation, feedback), local networking, local compute, business familiarity with technology, standardized processes, regulatory environment, etc.

AGI will happen when it happens.

But if it happens and an economy doesn't have all the enabling prerequisites, it's not going to have time to develop them, because those are years-long migration and integration efforts.

Which doesn't bode well for AGI + developing economies.


> AGI + developing economies

I wouldn't be so sure about that because of the Region Beta paradox. Developed countries have processes that work, making all of them digital and connected is often a bigger uphill battle than starting from zero and doing it right the first time.

See also communication infrastructure in developing economies. It's often much easier to get good internet connection (in reasonably populated areas) if there is no 100-year-old copper infrastructure around that is "good enough" for many.


Fair point!

On the one hand, I'd say developed countries are much farther along in digitizing (as part of efficiency optimization) their processes. Mostly by virtue that their companies are essentially management/orchestration processes on top of subcontracted manufacturing.

On the other hand, it gives developing countries an opportunity to skip the legacy step and go right to the state of the art.

I'm still skeptical the latter will dominate though.

I'd assume most of the developing world is still operating "good enough to work" processes, which are largely manual. Digitizing those processes will be a nightmare, because it plays out on organizational-political timespans.


It's already here, it's just weak.


I'm afraid you've read too much LessWrong fanfic.


Lol, "Therapy and coaching". None of you ever had undergone one therapy session. Otherwise you'd know that 90% is the human connection to the therapist, not the talking or the type of therapy.


That’s not true. Correspondence therapy is a thing. Plenty of research exists in the area of delivering therapy and the effectiveness of different kinds.


Going into a discussion with "that's not true" might not be the best opener ;)

It's not only my personal experience; even people like Irvin Yalom (in Becoming Myself) note that the specific form of therapy is not as relevant as the personal connection.

From a quick glance, correspondence therapy is also used on top of an existing relationship most of the time. So I don't see any problem with my initial hypothesis.


My experience with therapy, corroborated with that of friends and family, is that most therapists are not very good and many are awful. I can easily imagine an AI therapist soon being more effective on average.


That's why "shopping around" for a well-fitting therapist is the first important step. AI still will not solve this problem.


Microsoft alone spends $20B minimum per year on R&D, and OpenAI is going to get the lion's share from now on, so $5 billion over 4 years is peanuts for the current AI market. Maybe too little, too late?


Seems we're at the "supplier proliferation" step in the hype cycle. Next up: activity beyond early adopters -> negative press begins -> supplier consolidation and failures.


> We believe that companies that train the best 2025/26 models will be too far ahead for anyone to catch up in subsequent cycles.

Now that's some well executed FoMO. What a load of bull**.


> aims to raise as much as $5 billion over the next two years to take on rival OpenAI and enter over a dozen major industries, according to company documents obtained by TechCrunch

A dozen! Why, golly gee, I've got a business plan right here that says I'm going to enter 87 major industries and dominate every one of them.

I've been around tech, reading headlines like this, since roughly 1993 (and others here were of course seeing the same - adjusted for inflation and scale - types of headlines decades before me). This just reads like every other going-to-fail-miserably hilariously-grandiose-ambition we're-going-to-be-the-next-big-shit headline I've read in past decades during manic bubble phases.

Hey, Masayoshi Son has a 100 year plan to dominate the Interwebs, did ya hear? Oh shit, this isn't 1999? Different century, same bullshit headlines. Rinse and repeat. So I guess we're formally knee deep into the latest AI bubble. These types of stories read just like the garbage dotcom rush where companies would proclaim how they were entering a billion verticals and blah blah blah, we're gonna own all of the b2b ecommerce space, blah blah blah.


OpenAI should allow its users to create personas. Oftentimes the verbosity it has drives me nuts, so I use stuff like "code only, nothing else: python json.dumps remove whitespace" in order to just get the code.

So I would like to create a chat with a pre-configured persona so that it behaves like this all the time, unless I explicitly tell it to be verbose or to explain it.

Or stop that offering of more help, which becomes somewhat bothersome: "Yes, I'm glad we were able to work through the issue and find a solution that works for you. Do you have any other questions or concerns on this topic or any other networking-related topic?"

Like custom "system" prompts, but checked for safety. Or maybe even on a per-message basis with a drop-down next to the submit button and then it stays at that until changed.

Then there's also the need to be able to switch from a GPT-3.5 chat to GPT-4, just like it offers to downgrade from GPT-4 to 3.5 once the "quota" is consumed. Because oftentimes GPT-3.5 is good enough for most of the chat, and only certain questions then need the capabilities of GPT-4. This would also allow us to save energy.
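
For what it's worth, you can already approximate most of this through the API's system message; a minimal sketch with the openai Python package (the persona text and model names are just examples, and how strictly the model honors the persona varies):

    import openai  # pip install openai; assumes OPENAI_API_KEY is set in the environment

    PERSONA = (
        "You are a terse coding assistant. Reply with code only, no explanations, "
        "and never offer additional help unless explicitly asked."
    )

    def ask(question: str, model: str = "gpt-3.5-turbo") -> str:
        # The system message is the "pre-configured persona" for the whole chat.
        response = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system", "content": PERSONA},
                {"role": "user", "content": question},
            ],
        )
        return response["choices"][0]["message"]["content"]

    print(ask("python json.dumps remove whitespace"))
    print(ask("explain the tradeoffs of HTTP/3", model="gpt-4"))  # escalate only when needed

Switching the model argument for just the hard questions is the same one-line change the ChatGPT UI would ideally expose as a drop-down.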


You can already do that with OpenCharacters at https://josephrocca.github.io/OpenCharacters/. You just need to set up an API key with OpenAI, and you can customize hidden prompts for different characters, whom you can name; you can edit earlier parts of conversations; and it automatically summarizes earlier parts of the conversation to keep a never-ending thread within the context limit.


Regardless, Anthropic is doing some cool research; for example, I think this paper is pretty interesting.

A Mathematical Framework for Transformer Circuits: https://www.anthropic.com/index/a-mathematical-framework-for...


Also, I very much like their chatbots, Claude-instant and (paid) Claude+, which are available via Poe.com. But for some reason, they do almost no marketing for them. GPT-4 has better reasoning capabilities, but Claude+ is somehow "more pleasant" to talk to (my subjective impression), and it can also assemble the answer much quicker. Overall, I'd say Anthropic is very advanced already, but they prefer to stay under the radar.


It's just optics by Google to show they have a horse in the race after MSFT made them dance.


They need to skill up on being public and creating useful tech demos. That's why OpenAI is currently winning: they know how to foster engagement and interest. Who has heard of Claude? Almost no one outside of our industry, and probably few within it as well.


So I have some questions about the monetization of these models. Do we end up essentially licensing out models and allowing others to include them in their products for a licensing fee? Will they be pay-per-request RPC/HTTP APIs? Do you sell me access to datasets so that I can take your model architecture and train my own weights?

Certainly, at least for now, the compute and storage requirements are large enough that someone will eventually run out of funny money and need to charge _someone_ a significant amount for using it.


> “Claude-Next” — 10 times more capable than today’s most powerful AI, but that this will require a billion dollars in spending over the next 18 months.

GPT-4 cost less than a billion dollars? What is the claim here? That they're spending more money on compute than OpenAI, or that they have made algorithmic breakthroughs that enable them to make better use of the same amount of compute?


Is there a large barrier to entry? I thought that the costs of training, while large, were only a few million so not insurmountable, and the technology is pretty well understood. If this is true it's hard to understand what they'd need $300 million for, and also if there's no moat why they would command a "billions" valuation.


> Is there a large barrier to entry?

Yes. OpenAI has raised $1bn (plus $10bn from MSFT to exchange GPT access for Azure services) and has been going 8 years. There are some huge challenges to making it work well and fast. You need money for opex (hiring GPUs to train models on mostly) and talent (people to improve the tech). No one is competing with OpenAI without a good chunk of cash in the bank.
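
A crude sense of the GPU opex alone (every number below is an assumption for illustration, not a figure from the article):

    gpus = 10_000               # order of magnitude of "tens of thousands of GPUs"
    dollars_per_gpu_hour = 2.0  # assumed ballpark cloud rate for an A100-class GPU
    hours_per_year = 24 * 365

    annual_opex = gpus * dollars_per_gpu_hour * hours_per_year
    print(f"${annual_opex / 1e6:.0f}M per year")  # roughly $175M/year, before salaries, data, or inference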


Do we know how much it cost to train GPT-4? (Or would cost, if it weren't trained on Azure by a company partnered with Microsoft?) My impression, without looking into it now, is that training GPT-3 was on the order of $1-10 million. GPT-4 would be higher than that, but you're right, still in the ballpark of what lots of ordinary companies could pay.


>Is there a large barrier to entry?

Go ask Nvidia for a pile of A/H100's and see what the wait time is.

Also, previous training costs were for text only; next-gen models are multi-modal, which will drive costs much higher.


The funding seems excessive. Yet again, more "scorched earth" from VCs and (apparently still?) cheap capital.


Impressive amounts of investment. How can they know it is "10 times more capable" before they have trained the model? Does anyone have a clue why their model will end up being that much better?


Has anyone ever seen their product? How does it stand against the state of the art? Are they better than what is already available in open source (Alpaca, Coati, etc.), and by how much?


Its resources and long-term vision could bring major breakthroughs and accelerate AI tech, very exciting.


If they can expose an API that has better response times with GPT4 quality... They'll do just fine.


I love the fact that they pretend to know how to spend 1B in 18 months.

Good luck. To the investors.


Training a large language model is very expensive.


Not that expensive, though


I just want to know where those billions go to. Cloud server running costs?


The all-in cost of their technical talent probably exceeds $500k/year pp, excluding stock compensation.


If they narrow down their training data to software while keeping the size of the model they might shed that cost really quickly.


Such an epic waste of resources.


Funded by SBF and "let's halt work on AI" Jaan Tallinn, and probably Elon Musk (mentioned as a secret investor, so speculation at this point).


Would FTX customers get actual shares in a bankruptcy or would the shares be sold? Seems like a really good deal to get the shares in a promising startup.



