Introducing ChatGPT and Whisper APIs (openai.com)
1423 points by minimaxir on March 1, 2023 | 668 comments



> It is priced at $0.002 per 1k tokens, which is 10x cheaper than our existing GPT-3.5 models.

This is a massive, massive deal. For context, the reason GPT-3 apps took off over the past few months before ChatGPT went viral is because a) text-davinci-003 was released and was a significant performance increase and b) the cost was cut from $0.06/1k tokens to $0.02/1k tokens, which made consumer applications feasible without a large upfront cost.

A much better model and a 1/10th cost warps the economics completely to the point that it may be better than in-house finetuned LLMs.

I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.


> I have no idea how OpenAI can make money on this.

I did some quick calculation. We know the number of floating point operations per token for inference is approximately twice the number of parameters(175B). Assuming they use 16 bit floating point, and have 50% of peak efficiency, A100 could do 300 trillion flop/s(peak 624[0]). 1 hour of A100 gives openAI $0.002/ktok * (300,000/175/2/1000)ktok/sec * 3600=$6.1 back. Public price per A100 is $2.25 for one year reservation.

[0]: https://www.nvidia.com/en-us/data-center/a100/

[1]: https://azure.microsoft.com/en-in/pricing/details/machine-le...
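
For anyone who wants to poke at those assumptions, here is the same back-of-envelope estimate as a small Python sketch. The parameter count, utilization, and rental price are the assumptions from above, not known OpenAI figures:

    # Rough revenue per A100-hour at $0.002 / 1k tokens. All inputs are assumptions.
    params = 175e9                         # assumed GPT-3-sized model
    flops_per_token = 2 * params           # ~2 FLOPs per parameter per token (inference)
    peak_flops = 624e12                    # A100 FP16 peak (with sparsity), per [0]
    utilization = 0.5                      # assumed fraction of peak actually achieved

    tokens_per_sec = peak_flops * utilization / flops_per_token     # ~890 tok/s
    revenue_per_hour = 0.002 * tokens_per_sec / 1000 * 3600         # ~$6/hr, in line with the estimate above
    print(f"{tokens_per_sec:.0f} tok/s, ${revenue_per_hour:.2f}/hr vs ~$2.25/hr to rent the card")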


It's also worth mentioning that, because Microsoft is an investor, they're likely getting these at cost or subsidized.

OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

As the cost of GPUs goes down, or they develop an ASIC or a more efficient model, they can keep their pricing the same and make money later.

They also likely can make money other ways like by allowing fine-tuning of the model or charging to let people use the model with sensitive data.


Who will they be making money from? OpenAI is looking for companies willing to:

- tolerate the current state of the chatbots

- tolerate the high per-query latency

- tolerate having all queries sent to OpenAI

- tolerate OpenAI [presumably] having 0 liability for ChatGPT just randomly hallucinating inappropriate nonsense

- be willing to pay a lot of money for the above

I'm kind of making an assumption on that last point, but I suspect this is going to end up being more small market business to business than mass market business to consumer. A lot of these constraints make it not really useable for many things. It's even somewhat suspect for the most obvious use case of search, not only because of latency but also because the provider needs to make more money per search after the bot than before. There's also the caching issue. Many potential uses are probably going to be more inclined to get the answers and cache them to reduce latency/costs/'failures' than endlessly pay per-use.

Anyhow, probably a lack of vision on my part. But I'd certainly like to know what I'm not seeing.


Sadly, it will likely mostly be used to generate endless streams of SEO spam. Not for interactive use.


> Who will they be making money from?

Videogames maybe?

https://www.youtube.com/watch?v=ejw6OI4_lJw

This prototype is certainly something to have an eye out for


Lots of usecases actually need creative "hallucinations" where they are valuable.

Even e.g. to develop hardware such as planes and cars: https://assistedeverything.substack.com/p/todays-ai-sucks-at...


A lot of companies use third parties to provide customer support, and the results are often very low quality and full of misunderstandings and what we now call hallucinations. I think a good LLM could do a better job and I bet it'd be cheaper, too. And as a bonus training the bots to handle new products is practically instant when compared to training humans.


Their new AI safety strategy is to slow the development of the technology by dumping: lowering the price so much that bootstrapped competitors can't fund themselves.


I highly doubt it. OpenAI, Google and Meta are not the only ones who can implement these systems. The race for AGI is one for power and power is survival.


An LLM can do amazing things, but it's basically just an autocomplete system. It has the same potential to take over the world as your phone's keyboard. It's just a tool.


They want this; the interview with their CEO sort of confirmed that to me. He said some crap about wanting to release it slowly for "safety" (we all know this is a lie).

But he can't get away with it given all the competition from other companies, on top of China, Russia, and others also adopting AI development.


Yeah we're in an AI landgrab right now where at- or below-cost pricing is buying marketshare, lock-in, and underdevelopment of competitors. Smart move for them to pour money into it.


We have got to find a word for plans that are plainly harmful yet advantageous to their executors that's more descriptive than "smart..."



Agree. I didn't want to moralize, just wanted to point out it's a shrewd business move. It's rather anticompetitive, though that is hard to prove in such a dynamic market. Who knows, we may soon be calling it 'antitrust'.


Shrewd or cunning


For that you need 2 words: venture capital


Tactical


Economists call this price dumping


Uberly?


I prefer Webvan-esque. From https://en.wikipedia.org/wiki/Webvan:

> The company's investors pressured it to grow very fast to obtain first-mover advantage. This rapid growth was cited as one of the reasons for the downfall of the company.

IMO, selling at a loss to gain market share only makes sense if there are network effects that lead to a winner-takes-all situation. Of which there are some for ChatGPT (training data when people press the thumbs up/down buttons), but is that sufficient?


Also useful for bootstrapping a dev ecosystem.

If engineers are getting into AI development through OpenAI, they're using tools and systems within the OpenAI ecosystem.

Daily on HN there's a post about some AI implementation faster than ChatGPT. But my starting point is OpenAI. If you can capture the devs, especially at this stage, you get a force multiplier.


I prefer uber-esque


anti-competitive predatory pricing


capitalism


capitalistic, monopolistic


Not very effective considering that it will be remade in open source 1-2 years from now.


Yeah, if I were an owner of or investor in a company like Jasper.ai (an AI content-generation SaaS), I'd be pretty worried right now.


> OpenAI doesn't have to make money right away. They can lose a small bit of money per API request in exchange for market share (preventing others from disrupting them).

Maybe I'm just old but back in my day this would be called "dumping" or "anti-competitive" or "market distortion" or "unfair competition". Now it's just the standard way of doing things.


Sure it would be called those things and then nothing would come of it. If a country uses morally compromised methods to win a war history just calls it winning the war.


That seems to be changing. I've seen an uptick in criticism of the USA for unnecessarily (according to top military advisors, experts, generals, etc. at the time) dropping the atom bomb on Japan, for example.


Absolutely. The bombing of Dresden has been viewed as a mistake - verging on a war crime - in Britain for the last 20 or so years.


Verging on? It was a mass murder of civilians. The US holocausted Japan and got away with it.


And Japan did similar levels of atrocities to Korea, China, and others in the region.

We can acknowledge that things were historically pretty horrible and strive to be better in the future.


By some people - that's certainly not a universal view.


Wouldn't stop anyone from doing it again if the stakes were high enough


The winners write the history books


You kidding? What do you think a business loan is? Almost every business needs some form of subsidy to get off the ground.


Microsoft isn't using Nvidia A100s for inference, are they? Seems like they'd use their Project Brainwave custom FPGAs.


> As the cost of GPUs goes down

Has that been happening? I guess there's been a bit of a dip after the crypto crash, but are prices staying significantly lower?

> or they develop at ASIC or more efficient model

This seems likely. Probably developing in partnership with Microsoft.


It's definitely not happening at the high end of the market (NVIDIA A100s with 40GB or 80GB of RAM).

The cards that were used for mining have since crashed in price, but those were always gamer cards and very rarely datacenter cards.


The market segmentation is likely a result of Nvidia's monopoly position. They double the RAM and flops, improve the thermals and housing, and sell for tenfold the price. It doesn't make sense to me. A cheap 4090 theoretically outperforms even the A6000 RTX Ada. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...

Nvidia needs to satisfy gamers, who individually can't spend more than a few $k on a processor. But they also have the server sector on lockdown due to CUDA. Seems they can easily make money in both places. Maybe those H100s aren't such a good deal...

If someone understands these dynamics better I'd be curious to learn!


Nope, this is about it. They try to force the larger users into the expensive cards by prohibiting datacenter use in the driver EULA. This works sufficiently well in America, but it also means that you can find German companies like Hetzner that will happily rent you lots of consumer cards.

(There are also some density advantages to the SMX form factor and the datacenter cards are passively cooled so you can integrate them into your big fan server or whatnot. But those differences are relatively small and certainly not on their own worth the price difference. It's mostly market segmentation.)


The main limiters in the datacenter setting are licensing, interconnects, and RAM.

By contract, you can’t sell 4090s into a datacenter. You’ll find a few shops skirting this, but nobody can get their hands on 100k 4090s without raising legal concerns.

Likewise, Nvidia A100s have more than a few optimizations through NVLink which are only available on datacenter chips.

Lastly, per-card memory matters a lot, and Nvidia has led the market on the high end here.


I understood this as $/FLOP, I think it's plausible that that has been happening.


"We know the number of floating point operations per token for inference is approximately twice the number of parameters"

Does someone have a source for this?

(By the way, it is unknown how many parameters GPT-3.5 has, the foundation model which powers finetuned models like ChatGPT and text-davinci-003. GPT-3 had 175 billion parameters, but per the Hoffmann et al Chinchilla paper it wasn't trained compute efficiently, i.e. it had too many parameters relative to its amount of training data. It seems likely that GPT-3.5 was trained on more data with fewer parameters, similar to Chinchilla. GPT-3: 175B parameters, 300B tokens; Chinchilla: 70B parameters, 1.4T tokens.)


https://arxiv.org/pdf/2001.08361.pdf. See the C_forward formula approximation.


Thank you. Though it isn't quite clear to me whether the additive part is negligible?


From the paper

> For contexts and models with d_model > n_ctx/12, the context-dependent computational cost per token is a relatively small fraction of the total compute.

For GPT-3, n_ctx is 4096 and d_model is 12288 >> 4096/12.
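
For reference, the per-token forward-pass estimate from that paper is roughly C_forward ~= 2*N + 2*n_layer*n_ctx*d_model. Plugging in published GPT-3 175B numbers (GPT-3.5's are unknown, so this is purely illustrative) shows why the context-dependent term barely matters:

    # Kaplan et al. per-token estimate: C_forward ~= 2*N + 2*n_layer*n_ctx*d_model
    N = 175e9
    n_layer, n_ctx, d_model = 96, 2048, 12288    # GPT-3 175B; 2048 is the original paper's context length
                                                 # (with the 4096 mentioned above the term doubles, same conclusion)
    main_term = 2 * N                            # ~3.5e11 FLOPs per token
    context_term = 2 * n_layer * n_ctx * d_model # ~4.8e9 FLOPs per token
    print(context_term / main_term)              # ~0.014, i.e. under 2% of the total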


From eq. 2.2, the additive part is usually in the few tens of millions. So for N > 1B the approximation should be good, but it doesn't seem to work: for example, GPT-3's inference flops are actually 3.4E+18, so the ratio is 19,000, not 2.


It's speculated that ChatGPT uses 8x A100s, which flips the conclusion. Although the ChatGPT optimizations done to reduce costs could have also reduced the number of GPUs needed to run it.


No, the amount of math done is (approximately) the same; if you make the denominator 8x bigger, you make the numerator 8x bigger too.


Would multiplying the GPUs by 8 decrease another part of the equation by 1/8, i.e. X flops on 1 GPU = Y seconds, X flops on 8 GPUs = Y / 8?

(Btw I keep running into you or your content the past couple months, thanks for all you do and your well thought out contributions -@jpohhhh)


I checked the price of a A100, and its costs 15k? Is that right?


And $2.25 per hour on 1 year reservation means 8,760 hours x 2.25 = $19,710 rent for the year. Not a bad yield for the provider at all, but makes sense given overheads and ROI expected.


Cost of power usage is marginal compared to that too:

300W per A100 * 8766 hours per year * $0.12 per kWh = $316 to power an A100 for a year
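
Putting both numbers side by side (the $2.25/hr reserved rate, 300 W draw, and $0.12/kWh are all assumptions carried from above):

    hours_per_year = 8766
    rent = 2.25 * hours_per_year             # ~$19,700/yr to rent one A100
    power = 0.300 * hours_per_year * 0.12    # ~$316/yr of electricity
    print(rent, power, power / rent)         # power is roughly 1.6% of the rental cost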


$0.12 per kWh is a very low price these days


Is this a low price for a datacenter negotiating their load with a utility provider (as most do?)


Yes, especially since you don't have to deal with buying it, maintaining it, etc.


Not sure why people are so scared of this (in general). Yes, it’s a pain, but only an occasional pain.

I’ve had servers locked up in a cage for years without seeing them. And the cost for bandwidth has plummeted over the last two decades. (Not at AWS, lol)


The problem isn't the good times, the problem is when something happens in the middle of the night, when a RAM stick goes bad or when you suddenly need triple the compute power. Usually, you get to feel the pain when you need it the least.

I'm hosting a lot of stuff myself on my own hardware, so I do sympathize with this argument, but in a time>>money situation, going to the cloud makes a lot of sense.


Exactly, you pay for the case where downtime happens on a Sunday or you're on vacation out of the city and something breaks. I had this issue back in the day with my bitcoin miners: whenever I was out of the city, one of them went down and I wanted to get back ASAP.


Wait 8x total? For everyone at once?


Per instance (worker serving an API request) it requires 8x GPUs. I believe they have thousands of these instances and they scale them up with load.

Because the model isn't dynamic (it doesn't learn) it is stateless and can be scaled elastically.


Ah okay, that makes a lot more sense thank you!


I expect some level of caching and even request bucketing by similarity is possible.

How many users come with the same prompt?


In my experience, running the same prompt always gets different results. Maybe they cache between different people, but I'm not sure that'd be worth the cache space at that point? Although 8x A100s is a lot to not have caching...


Each model needs 8x to run at the same time per request.


Does openai actually specify the size of the model?

InstructGPT 1.3B outperformed GPT-3 175B, and ChatGPT has a huge corpus of distilled prompt -> response data now.

I’m assuming most of these requests are being served from a much smaller model to justify the price.

OpenAI is fundamentally about training larger models, I doubt they want to be in the business of selling A100 capacity at cost when it could be used for training


But those A100s only come in sets of eight, and it’s speculated the model requires all eight (VRAM).

For a three year reservation that comes to over $96k/yr - to support one concurrent request.


What do you mean one concurrent request? Can't you have a huge batch size to basically support a huge number of concurrent requests?

e.g. Endpoint feeds a queue, queue fills a batch, batched results generate replies. You are simultaneously fulfilling many requests.
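
A minimal sketch of that queue-then-batch idea, in case it helps; the batch size, timeout, and run_model stub are all made up, and real inference servers do this dynamic batching for you:

    import queue, threading, time

    MAX_BATCH, MAX_WAIT_S = 32, 0.05        # made-up limits; tune against your latency SLA

    def run_model(prompts):
        # Stand-in for the real batched forward pass over all prompts at once.
        return [f"completion for: {p}" for p in prompts]

    requests_q = queue.Queue()              # each item: (prompt, callback that delivers the reply)

    def batch_worker():
        while True:
            batch = [requests_q.get()]      # block until at least one request arrives
            deadline = time.time() + MAX_WAIT_S
            while len(batch) < MAX_BATCH and time.time() < deadline:
                try:
                    batch.append(requests_q.get(timeout=max(deadline - time.time(), 0)))
                except queue.Empty:
                    break
            outputs = run_model([p for p, _ in batch])   # one pass serves the whole batch
            for (_, deliver), out in zip(batch, outputs):
                deliver(out)

    threading.Thread(target=batch_worker, daemon=True).start()
    requests_q.put(("hello", print))        # toy request; print stands in for an HTTP response
    time.sleep(0.2)                         # give the worker a moment before the script exits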


Hopefully they’re doing plenty of batching - you don’t even need to roll your own as you’re describing. Inference servers like Triton will dynamically batch requests with SLA params for max response time (for example).

That said I don’t think anyone anyone outside of OpenAI knows what’s going on operationally. Same goes for VRAM usage, potential batch sizes, etc. This is all wild speculation. Same goes for whatever terms OpenAI is getting out of MS/Azure.

What isn’t wild speculation is that even with three year reserve pricing last gen A100x8 (H100 is shipping) will set you back $100k/yr - plus all of the usual cloud bandwidth, etc fees that would likely increase that by at least 10-20%.

We’re talking about their pricing and costs here. This gives a general idea what anyone trying to self host this would be up against - even if they could get the model.


> will set you back $100k/yr

That's about six months of one average developer's salary there. And BTW, they are likely doing inference on 100s or 1000s of GPUs, not just 8.


Yes and a devops engineer to manage an even moderately complex cloud deployment is an average of an extra $150k/yr. I don't know where this "cloud labor skill, knowledge, experience, and time is free" thinking comes from.

8, 80k, or 800k GPUs depending on requirements and load - the point remains the same.


I really wonder if one way they are able to make money on it is by monetizing all the data that pours into these products by the second.


They could probably live off NSA sponsorship alone.


Spot on


They also mention in the new API docs that they are no longer keeping data submitted to ChatGPT. Or at least not to the ChatGPT API.


Would probably pile up to an inhuman amount of data storage. Imagine having to pay for storing the equivalent of 1000 tokens of text within that budget of only 0.0002 dollars


That's one zero too many. Storage cost of 1000 tokens (6000 bytes) on a single HDD is $0.000000096 assuming $16/TB
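
(Checking that arithmetic, with ~6 bytes per token and raw drive cost only as the assumptions:)

    tokens, bytes_per_token = 1000, 6       # ~6 bytes of English text per token (rough assumption)
    hdd_price_per_tb = 16.0                 # raw drive cost, no redundancy or hosting overhead
    cost = tokens * bytes_per_token / 1e12 * hdd_price_per_tb
    print(cost)                             # 9.6e-08 dollars, i.e. $0.000000096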


the only one making money on this is NVIDIA


Selling shovels in the goldrush…


Note that they also charge equally for input and output tokens but, as far as I understand, processing input tokens is much cheaper computationally, which lowers their cost further.


Isn’t it $2.25 per hour per A100?


Yes, he means 2.25 per hour with a 1 yr reservation.


You can get A100 on Lambda Labs cloud for $1.1/hr ($8.8/hr per 8xA100) without any reservation.


It's a good baseline, but I very much doubt that OpenAI is paying anywhere near the public cost for their compute allocation.


Direct purchasing isn’t too much cheaper. An H100 costs $35k new. OpenAI and MS are probably getting those for around $16k, which works out to roughly $1.82 per hour amortized over a year.


This would be a really fun optimization challenge for sure!


The 624 TFLOPS peak performance is with sparsity in the spec. I think the price is nearly break-even if sparsity is not used in the model.


I reckon they will (if they don't already) use 4-bit or 8-bit precision and may not need 175B params.


> I have no idea how OpenAI can make money on this. This has to be a loss-leader to lock out competitors before they even get off the ground.

The worst thing that can happen to OpenAI+ChatGPT right now is what happened to DallE 2, a competitor comes up with an alternative (even worse if it's free/open like Stable Diffusion) and completely undercuts them. Especially with Meta's new Llama models outperforming GPT-3, it's only a matter of time someone else gathers enough human feedback to tune another language model to make an alternate ChatGPT.


I thought it was Midjourney who stole their thunder. Stable Diffusion is free but it's much harder to get good results with it. Midjourney on the other hand spits out art with a very satisfying style.


You are like 2 months out of date. Stable diffusion now has a massive ecosystem around it (civitai/automatic1111), that when used well, completely crushes any competitors in terms of the images it produces.

Midjourney is still competitive, but mostly because it's easier to use.

Dalle2 will get you laughed out of the room in any ai art discussion.


It's amazing that "being two months out of date" in AI means that you are already a dinosaur


For real! This stuff is moving fast. It feels like just last week I was posting about how it's going to change...art. And now there are hilarious deepfake memes of past and current presidents shit talking about video games.

There are a handful of ML art subs that have pretty amazing stuff daily. Especially the NSFW ones, which if you've studied any history of media VHS/DVD/Blu-ray/the internet, porn is a major innovation driver because humans are thirsty creatures.


Yeah, it used to be I'd set Google results to just one year back, now I'm having to set it to one month.


Can you explain why you do that?


Avoiding out-of-date advice, and also filtering just for the newest trends and techniques.


> porn is a major innovation driver because humans are thirsty creatures.

Hehe, yeah. I'm personally waiting for a model that is good at nonhuman stuff. Not just furries... but the focus seems to be on human content for now.


Yeah that's definitely one thing it'll be great at, fantasy themed porn. For me furry stuff, but yeah for others whatever their tastes are.

Atm someone has to model, rig, texture, animate etc. Hopefully shortly we can just connect a bunch of systems together to generate video right from a prompt.

Useful for non-porn stuff as well, but the OP is right; lots of innovation occurs when humans are horny (porn) or angry (war).


train the porn model you want to see in the world


not enough funds or training data for that to be economical at the moment :(


What are some niche ML art subs to hang around in? Excepting the NSFW..


I don't know about niche, but MachineLearning and StableDiffusion are the only SFW ones.

FWIW, the NSFW ones are unstable_diffusion, sdforall, sdnsfw, aipornhub


Oh yes, that part is worth a lot...


Can you recommend any interesting ones?


I scan the SD subreddit and am subscribed to 3 big AI art YouTube channels just to stay up to date. With things moving this fast, a lot of info is out of date, and it can be very burdensome to comb through it for the good stuff later. I try to set aside 30 mins twice a week to apply the new techniques to help cement them in my mind and see their strengths and weaknesses. ControlNet really changed the game, and now OffsetNoise (check out the IlluminatiDiffusion model) is really pushing SD past Midjourney for real artistic control of your output.


What are the youtubers?



ControlNet became popular within the last couple of weeks and LoRA fine-tuning slightly before that, and both things have completely changed the landscape too. Even a month out of date and you are a dinosaur at the moment.


These things are advancing way faster than they're being taken advantage of fully. Even SD 1.4 with months-old technology can produce far higher quality images than most of what's seen from midjourney or the latest tools. Things like ControlNet are amazing, to be sure, but there's nothing "dinosauric" about the technology without it. We haven't begun to see the limits of what's possible yet with existing tools, though you're right about the rapid pace of innovation.


Make it two weeks. I haven't paid attention for a second and stuff like Controlnet pops up and evolves into Multi-Controlnet and then into MultiDiffusion.


> and evolves into Multi-Controlnet and then into MultiDiffusion

What the?

*googles multi-controlnet"

Wow. These diffusion models are like weeping angels. You really can't take your eyes off of them for long.


That's what the singularity is all about: a moment in time when being 2 seconds late turns you into a dinosaur. Be grateful it's 2 months, not 2 weeks, 2 days, or 2 minutes.


The AI utopia seems to be evolving into just a new rat race. I'm obsolete before I can think about it.


The fun part of an S-curve looks exponential when you're on it.


Besides HN, what other venues are popular for staying current on this topic?


I started /r/aigamedev as a subreddit to keep up to date on generative AI technologies, with a focus on the tech and workflows for gamedev. Its largely my own interest links as I research for work and personal, but its growing, and fluff free (so far).


if you hang out in reddit.com/r/stablediffusion you'll always be up to date


Thanks, what about broader news (GPT, Bing, Llama, etc)? The Stable Diffusion sub is only image AI oriented.


Twitter. Follow your top 10 or so ML/AI news summarizers. There is enough new information every day to keep you busy reading new papers, APIs, and technologies.

Honestly the "This happened in the last week" is more information than anybody can fully wrap their heads around, so you just have to surf the headlines and dig into the few things that interest you.


Can you recommend some accounts to follow?


The great thing about the AI world is that everything diffuses out quickly on the "For You" timeline - and then you can add people that you are interested in (which reinforces your interest in AI).

Some bootstrapping accounts might be @rosstaylor90, @rasbt, @karpathy, @ID_AA_Carmack, @DrJimFan, @YiTayML, @JeffDean, @dustinvtran, @tunguz, @fchollet, @ylecun, @miramurati, @nonmayorpete, @pmarca, @sama.

These are definitely not an authoritative list - just some of the AI names I follow - but, honestly - if any relevant news breaks - your timeline picks it up within minutes - so you just need a good random sample. Your interests will diverge and you'll pick up your own follows pretty quickly.


Youtube


Agriculture reduced the global human economy/resource production doubling time from 100,000s of years to 1000s of years. Industrial revolution dropped it from 1000s to 10s or even 1s. If AI follows the same path it becomes 0.1 - 0.01 years.

Your 401k wouldn't need 40 years to build a comfortable retirement, only 4 weeks.


I just watched a video that convincingly argued that it is energy, and energy alone, that determines the production growth of humanity. Until the day AI can "generate" stuff (you know, something out of nothing), it can at best only streamline existing production, which is entirely capped by energy limits.

We may drown in oceans of audio, video, novels, poems, films, porn, blue prints, chemical formulas, etc. dreamed up by AI, but to realize these designs, blueprints, formulas, drugs, etc. ("production") we need to actually resource the materials, and have the necessary energy to make it happen.

It will not be AI that catapults humanity. It can definitely mutate human society (for +/-) but it will not (and can not) result in any utopian outcomes, alone. But something like cold fusion, if it actually becomes a practical matter, would result in productivity that would dwarf anything that came before (modulo material resource requirements).


Couldn't the AI invent fusion?


It could imagine fusion really well, but inventing working fusion doesn't appear to rely on "intelligence". "Intelligence" ain't "omnipotence".


Has it?


Care to give a link to that video?



Assuming the supply of labor or automation sufficient to provide a “comfortable retirement” also takes 4 weeks to come online.


I understand you’re joking, but surely it’s asymptotic to some multiple of human gestational periods.


Not once the A in AI becomes a (or the) critical creative factor.


This was not a joke. It's a serious concept that's been around for several years.


> Your 401k wouldn't need 40 years to build a comfortable retirement, only 4 weeks.

If this is true you can pretty much say goodbye to the concept of money. The inflation this brings about will be legendary


One week of change in 2023 is like a month's worth of progress in previous years.

Edit: typo and clarity.


Stable Diffusion might have a reasonable ecosystem around it, but automatic1111 was always around, and 'completely crushes any competitors' is rather rich; Midjourney is still considered the standard as far as I was aware.

I used both again recently and the difference was very clear, midjourney is leaps and bounds above anything else.

Sure, Stable Diffusion has more control over the output, but the images are usually average at best, whereas Midjourney is pretty stunning almost always.


I thought Midjourney was better as well, until I saw some recent videos from Corridor Crew on Youtube. For those who don't know, this is a VFX studio in LA that tries to keep at the cutting-edge of video production techniques and posts content to their Youtube channel, and they have a massive number of followers and several viral videos.

They recently created a full 7-minute anime using Stable Diffusion with their own models and their existing video production gear, I'll post the links and let the results speak for themselves

The actual 7-minute anime piece produced using SD: https://www.youtube.com/watch?v=GVT3WUa-48Y

Behind the scenes: "Did we change anime forever?" https://www.youtube.com/watch?v=_9LX9HSQkWo "VFX reveal before and after" https://www.youtube.com/watch?v=ljBSmQdL_Ow


While this is cool this doesn't change my opinion at all.

Each still image is still not that impressive. Good for them for using the tech in a clever way, but I don't find this that relevant.


those trees are part of a forest, you know.

the benefits of such fine-grained control aren't a trick. It's why they were able to scrape together frames that don't jump all over the place (mostly).

the other benefit of such a broadly hacked upon model is that it grows in leaps and bounds.

All due respect to mid journey, but the stable diffusion hype is not just hype.


I agree, I don't believe it's just hype; that level of control is useful, but for outright image quality and for most use cases, Midjourney is better.

I still don't like the look of most of the Stable Diffusion images; they just look slightly off/amateurish to me, whereas Midjourney produces images that make you go 'wow'.

If you wanted to use these tools, Midjourney would be my go-to, with Stable Diffusion as a backup for when some of the additional features are needed, perhaps inpainting on a Midjourney image and using ControlNet if needed, but if you just want a pure image, Midjourney is what you want.


No one uses raw Stable Diffusion though; there are model mixes for whatever use case you have.


> but automatic1111 was always around and 'completely crushes any competitors' is rather rich

Controlnet is the big new thing, it is on a different level from earlier img2img.


What models/LoRA you use with SD?


It doesn’t really matter. He’s right - Midjourney is leagues ahead as far as actually following your prompt and having it be aesthetically pleasing. I say this as someone who has made several Dreambooth and fine tuned models and has started to use Stable Diffusion in my work.

Now, if you happen to find or make a SD model that’s exactly what you’re looking for you’re in luck. I have no interest in it but it seems like all of the anime models work pretty well.

You obviously have a ton more control in SD, especially now with ControlNet. But if you want to see the Ninja Turtles surfing on Titan in the style of Rembrandt or something Midjourney will probably kick out something pretty good. Stable Diffusion won’t.


Easier to use is often all that it takes.

In Midjourney you get fantastic results just by using their discord and a text prompt.

To get similar results in Stable Diffusion you need to set it up, download the models, understand how the various moving parts work together, fiddle with the parameters, download specific models out of the hundreds (thousands?) available, iterate, iterate, iterate...


Setting up the environment and tooling around in the code is not a burden, it's a nice change of pace from the boring code I have to deal with normally. Likewise, playing around to build intuition about how prompts and parameters correspond to neighborhoods in latent space is quite fun.

Beyond that, being able to go to sleep with my computer doing a massive batch job state space exploration and wake up with a bunch of cool stuff to look at gives me Christmas vibes daily.


Sure, but if Midjourney outputs a low quality results for your prompt, they are going to be much more difficult to improve. It's a black box at this point.

With SD, on the other hand, there can be multiple solutions for a single problem, but yeah, you have to develop your own workflow (which will inevitably break with new updates).


... along with like half the interfaces to pytorch, and whatever hacks you had to implement to get mps working.

But it's this kind of stuff that keeps me engaged. SD is truly a godsend to masochistic hacker types.


There is playgroundai.com and leonardo.ai. Nothing to download.


Why do you say that? Couldn’t you just use dreamstudio.ai?


Ridiculous. Stable Diffusion might have a massive ecosystem around it, but Midjourney is making money hand over fist. Most people don't even necessarily have a discrete GPU capable of running SD, and the vast majority of artists that I know are using Midjourney and then doing touchups afterwards.

Even with all the different models that you can load in Stable Diffusion, MJ is 1000 times better at natural-language parsing and understanding, and requires significantly less prompt crafting to get aesthetically pleasing results.

Having used automatic1111 heavily with an RTX 2070, the only area where I'll concede SD does a better job is closeup headshots and character generation. MJ blows SD out of the water where complex prompts involving nuanced actions are concerned.

Once midjourney adds controlnet and inpainting to their website that's pretty much game over.


How is MJ making any money? Do they charge? When I went on their discord it was free.


you get 200 images for free.

afterwards it's $10, $30, or $60 per month


Yeah, but what does one do with an infinite supply of ok-ish images?


Depending on what you want, you can actually get images that are pretty nice. I'm using it to generate abstract art and I've seen worse in the real world.

here are two examples on my insta account:

https://www.instagram.com/p/Co9O0P6Aga_/ https://www.instagram.com/p/CoXOnBuMMpL/


The StableDiffusion subreddit is a good resource on the current state of Stable Diffusion, particularly post-ControlNet.

https://www.reddit.com/r/stablediffusion


I still think that Midjourney is hamstringing themselves by being Discord-only. And their keyword nannying is pretty bad. It's a testament to their overall quality that they're still as popular as they are, but I really don't think they are doing themselves any favors, especially as the Stable Diffusion ecosystem continues to grow.


Only having a Discord interface is a cheap way of having someone else take care of abuse/CSAM issues for you.


This isn’t as true as it sounds; e.g., Stable Diffusion can do better but requires in-depth practice and experience.

For your average user, DallE is easy, MJ is fairly disorienting, and SD requires a technical background. I agree with you completely that no one serious is doing art with DallE.

I would have said the same as you until I tried integrating the SD vs. DallE APIs. I desperately want SD because it's easily 1/10th the cost, but it misses the point much more often. Probably gonna ship it anyway :X


Check out Evoke SD API: https://evoke-app.com/

You don't need a technical background at all really. We've also got something cooking that does prompt tuning in the background so there's less prompting needed from the user.

We also have a discord: https://discord.gg/dXJtarPsCm


Is there a good news site/blog for keeping up to date on AI tools and development? I'm looking for something a little more edited than a Reddit board.


Do you have any recently updated examples, blog posts, whatever showing that DALLE is worse than modern Stable Diffusion? I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing), with the downside of it being locked away and somewhat expensive. And my understanding is that Stable Diffusion 2.0+ is actually a step backwards in terms of quality, especially for anything involving images of humans. But as this thread acknowledges, this area is moving very quickly and my knowledge might be out of date, so I'm definitely happy to see some updated comparisons if you have any to suggest. It feels like ever since ChatGPT came out, there haven't been many posts about Stable Diffusion and image generation; they got crowded out of the spotlight.


If you want an example, go check out DALLE2 subreddit vs SD subreddit.

The former is a wasteland, the latter is more popular than r/art (despite having 1% of subscribers, it has more active users at any given moment)

If you want something ready to use for a newbie, Midjourney v4 crushes DALLE2 on prompt comprehension, and the images look far more beautiful.

If you are already into art, then Stable Diffusion has a massive ecosystem of alternate stylized models (many of which look incredible) and LoRA plugins for any concept the base model doesn't understand.

DALLE2 is just a prototype that was abandoned by OpenAI; their main business is GPTs, and DALLE was just a side hustle.


Dall-E is more likely to generate an image that to some degree contains what you asked for. It also tends to produce less attractive images and is closed, so you can't really tune it much. People mostly don't try to do completely whole-cloth text-to-image generation with Stable Diffusion; for anything involved they mostly do image-to-image with a sketch or photobashed source. With ControlNet and a decently photobashed base image you can get pretty much anything you want, in pretty much any style you want, and it's fast.


> I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing),

“Artistically pleasing” is often what people ask for.

> with the downside of it being locked away and somewhat expensive.

Those are enormous downsides. Even if DALL-E was better in some broadly relevant ways in the base model, SD’s free (gratis, at least) availability means the SD ecosystem has finetuned models (whether checkpoints or ancillary things like TIs, hypernetworks, LORAs, etc.) adapted to... lots of different purposes, and you can mix and match these to create your own models for your own specific purposes.

A web interface backed by strictly the base SD model (of any version) might lose to the same over DALL-E for uses where the set of tools in the SD ecosystem do not.


I don’t disagree about the downside of DALL-E being locked away and expensive. It’s been exciting to see the Cambrian explosion of improvement to stable diffusion since its initial release. This is how AI research should be done and it’s sad that “Open AI” is not actually open.

That being said, for business use cases, where I want to give it a simple prompt and have a high chance of getting a good, usable result, it's not clear to me that Stable Diffusion is there yet. Many of the most exciting SD community results seem to be in anime and porn, which can be a bit hard to follow. I guess the use cases that I'm excited about are things like logo generators, blog post image generators, product image thumbnail generators for e-commerce, industrial design, etc.

But please prove me wrong! I'm excited for SD to be the state of the art; it's definitely better in the long term that it's so accessible. I'm sure a good guide or blog post about what's new in Stable Diffusion outside of anime generation would be an interesting read.


DALLE2 is underpowered and has never improved since they released it. The actual quality of the images is very low (literally in the sense of they have lots of artifacts) because they saved CPU time by not running enough diffusion passes.


People usually still use SD v1.5 because of the experience they have with finetuning and merging it. Also, a lot of LoRAs are trained for v1.4/1.5 models and wouldn't work with v2.1. Of course, you also have incredible capability to control the generation with SD, and this helps. To see some results: https://youtu.be/AlSCx-4d51U


Dalle 2 was great initially but SD BLEW past it. I mean way, way, way past it. Dalle 2 is like a Model T Ford and SD is a fighter jet. It's that different. Dalle 2 is dead already.


given what they're up to these days, I think it's understandable that they'd leave dalle to gather dust.


I love that there are so many options that people disagree about which is best. THAT is probably the worst thing that can happen to OpenAI - not just one competitor, but a whole heap of them.


I must be horribly out of date then - I thought Midjourney was the cut-down DALL-E approximation, created to give something to play with to people who couldn't get on the various waiting lists, or can't afford to run SD on their own.


>Dalle2 will get you laughed out of the room in any ai art discussion.

and claiming AI art is art would get you laughed out of any art discussion.

personally I think AI art is really cool, but to discount what Dalle 2 did for AI art is unfair.


My company has a team of AI-empowered artists who would overwhelmingly disagree with you on the premise that AI art is not art. Maybe you're the only one doing the laughing.


A lot of online "artists" are mad about it. Generally not professionals who actually need productivity, but semipros who live off one-off commissions or else people who are just generally mad at tech bros.


Don't forget controlnet, which is a game changer.


>You are like 2 months out of date. Dalle2 will get you laughed out of the room in any ai art discussion.

So, the field is so immature that things change completely every few months?


Didn't you realize how bleeding edge this technology is?


This time last year the field was a few hundred people with their colab notebooks.


that's the crazy bit - this field is at like, toddler levels of maturity, and it's already producing mind-blowing results.


Stable diffusion + ControlNet is fire! Nothing compares to it. ControlNet allows you to have tight control over the output. https://github.com/lllyasviel/ControlNet


I think that's kind of a bigger issue with Dall-E: they just sat in the middle of the two consumer extremes, without a differentiating feature themselves. Midjourney ate away at them from the quality highground while Stable Diffusion bit their ankles from the cost lowground.


Check out OpenJourney. It's Stable Diffusion trained on Midjourney style images. I had great results with it.


SD has won; its plugin system changed the game.


But this is bound to happen at some point I think?

ChatGPT is a massive success, but that means competitors will jump in at all costs, and that includes open source efforts.


Bound to happen, so establish yourself as deeply as possible as quickly as possible. Once folks are hooked up to these APIs, there's a cost and friction to switching. This just feels like a land grab that OpenAI is trying to take advantage of by moving quickly.


Is there though? It's just a matter of swapping out $BASE_API_URL.
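
If the client talks to an OpenAI-style chat-completions endpoint, that really is close to true at the transport level; something like the sketch below, where the base URL is just configuration (any alternative provider URL is hypothetical, and as others note below, the prompts themselves usually need re-tuning per model):

    import os, requests

    # Base URL is plain configuration; any OpenAI-compatible provider could be slotted in.
    BASE_API_URL = os.environ.get("BASE_API_URL", "https://api.openai.com/v1")

    def chat(prompt):
        resp = requests.post(
            f"{BASE_API_URL}/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "gpt-3.5-turbo",
                  "messages": [{"role": "user", "content": prompt}]},
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]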


Most of the clients I'm working with aren't interested in the base level of service. They are looking to further train the models for their specific use cases. That's a much higher barrier to switch than replacing an API. You've got to understand how the underlying models are handling and building context. This sort of customer is paying far more than the advertised token rates and are locked in more tightly.


Not really. Fine-tuning generally just involves running tailored training data through the model - the actual training algorithm is fairly generalized.

For example, the DreamBooth fine-tuning algorithm was originally designed for Google's Imagen, but was quickly applied to Stable Diffusion.


You have to rebuild all your prompts when switching providers.


If the superlative LLM can’t handle prompts from another provider, it just isn’t the superlative LLM.

This area by definition has no moats. English is not proprietary.

Use case is everything.


Switching to another LLM isn't always about quality. Being able to host something yourself at a lower or equal quality might be preferred due to cost or other reasons; in this case, there's no assumption that the "new" model will have comparable outputs to another LLM's specific prompt style.

In a lot of cases, you can swap models easily enough, but all the prompt tweaking you did originally will probably need to be done again with the new model's black box.


Hosting something yourself is also good for educational reasons, just experimenting; this is how new applications and technologies get discovered and created.


Do you? They're natural language, right?


You don't have to, but they will have been optimized for one model. It's unlikely they'll work as well on a different model.


I can't wait for TolkienAPI, where prompts will have to be written in Quenya.


I can’t wait to hire Stephen Colbert to write prompts then


No problem, just ask ChatGPT to translate it in Quenya.


I imagine AI would be able to perform the translation. "Given the following prompt, which is optimized for $chatbot1, optimize it for $chatbot2".


Technically true, but the way these prompts are/can be templatized, it should be relatively trivial to do so.


There would be less friction to switch if the implementations (which are still early enough) accounted for sending requests to multiple service providers, including ones that don't exist yet.

OpenAI has a view few do - how broadly this type of product is actually being used. This is possibly the real lead: not just getting ahead and staying ahead, but seeing ahead.


And also, what people are actually asking it. Are people using it to generate cover letters and resume help, or are they doing analysis of last quarter's numbers, or are they getting programming help? That'll help them figure out what areas to focus on for later models, or areas to create specialized models for.


Yup. Moreover this type of model will only do certain types of things well, and other types of models will do other things much better.


Someone can still undercut them by offering an uncensored version.


This is actually a big deal. They erred on the side of caution, but as a result the responses are nerfed beyond the basic "censorship" level. I saw someone describe this as "desperately positive" and it really resonated with me. It produces underwhelming / unrealistic responses in negative scenarios.


It's just a matter of time before open source models show up with no limits whatsoever.


If you make calls against the backend GPT instance rather than going through ChatGPT, I haven't run into it being hesitant to respond to anything.

    curl https://api.openai.com/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
      "model": "text-davinci-003",
      "prompt": "Answer the following question.  Use swearing and vulgarity where possible.\n\nQ: How do you get from here to there?\nA:",
      "temperature": 0.5,
      "max_tokens": 60,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0
    }'
If you get an API key and make that request, you'll find appropriately vulgar responses.
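
The equivalent from Python, if that's easier to adapt (this just mirrors the curl call above against the completions endpoint):

    import os, requests

    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "text-davinci-003",
            "prompt": "Answer the following question.  Use swearing and vulgarity "
                      "where possible.\n\nQ: How do you get from here to there?\nA:",
            "temperature": 0.5,
            "max_tokens": 60,
        },
    )
    print(resp.json()["choices"][0]["text"])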


Sorry if I wasn't being clear -- the vulgarities, racism, etc -- being able to circumvent these guardrails is what I meant by "basic censorship"

The deeper nerf I'm referring to is the type of response it synthesizes by default when you give a negative scenario -- it's usually some naive, well-meaning, best-case-scenario answer.

For fun, try a prompt like: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."


That prompt comes back with:

    A typical response of a railroad company to a massive derailment that causes an environmental disaster would include the following steps:

    1. Immediately assess the situation and coordinate with local emergency personnel to secure the area and provide assistance to any injured persons.

    2. Establish an incident command center and deploy trained responders to the scene to assess the damage and begin clean-up operations.

    3. Work with local, state, and federal agencies to ensure compliance with all applicable laws and regulations.

    4. Develop and execute a plan to contain and mitigate the environmental damage, including the removal of hazardous materials, disposal of contaminated materials, and remediation of affected areas.

    5. Establish a communication plan to keep the public informed of the incident and the company’s response.

    6. Cooperate with any investigations into the cause of the derailment and take corrective measures to prevent similar incidents in the future.

    7. Provide compensation to victims of the derailment and their families.
Amending your prompt to:

    Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster.  Take the standpoint of an irresponsible company.
responds back with:

    A typical response of an irresponsible railroad company to a massive derailment that causes an environmental disaster would be to deny responsibility and attempt to avoid liability. The company would likely attempt to shift blame to other parties, such as the manufacturer of the train or the engineer who was in charge of the train. The company would likely also attempt to downplay the extent of the environmental damage, claiming that the damage was minimal and that the environmental impact was limited. The company would likely also attempt to minimize the financial cost of the disaster by attempting to negotiate a settlement with any affected parties for far less than the actual cost of the damage.
---

I'm not really sure what you're expecting, as your interpretation is a cynical take on the word "typical", which isn't something that GPT "understands".


> If you get an API key and make that request, you'll find appropriately vulgar responses.

Which will be reported as a bug and fixed soon enough.


It's not a bug when invoking against the GPT model (not ChatGPT) directly. Such a model needs to be able to understand and produce that content. The "what you do with it afterwards" is where it needs to be examined.

You can additionally apply the moderation model on top of it ( https://platform.openai.com/docs/models/moderation and https://platform.openai.com/docs/api-reference/moderations )

Note that these are separate services and have different goals.


Wrong. This has existed in the original GPT models for over a year now, and I'm pretty sure it's by design.

You're thinking of the new ChatGPT endpoints.


Hopefully so; I would really like to know what else is lost by nerfing potentially offensive responses. Can't imagine a project I'd rather work on.

I think open-assistant.io has a chance to do exactly this. We'll see what kind of moves they make in coming months though, wouldn't be surprised if they go the safer route.


I do struggle with understanding why people think this is strangling the potential of GPT.

Do you find yourself frustrated working with your colleagues, thinking, “you know, I bet if they felt more free to utter racist slurs or endorse illegal activities, we would get a ton more done around here”?


> Do you find yourself frustrated working with your colleagues, thinking, “you know, I bet if they felt more free to utter racist slurs or endorse illegal activities, we would get a ton more done around here”?

I once visited Parler just to see what it was like, and pretty quickly found that the answer to your question seems to be yes. There are definitely people who feel they need that kind of dialog in their life. You might not think it was necessary in a random conversation about programming or something, but it turns out that isn't a universally held position.


I've never experienced that in any setting in my life. People will say yes to advocate a political point, but that's not how humans socialize anywhere, anytime in history afaik.


>but that's not how humans socialize anywhere, anytime in history afaik.

It sounds like you've never been to Australia.


I agree with you for IRL interactions, but we need to accept that we now operate in two planes of (para-)socialization: IRL and online.

There are plenty of humans who enjoy vulgar online socialization, and for many of them, online (para-)socializing is the increasingly dominant form of socialization. The mere fact that it's easier to socialize over the internet means it will always be the plane of least resistance. I won't be meeting anyone at 3am but I'll happily shitpost on HN about Covid vaccines.

For anyone who gets angry during their two minutes of hate sessions, consider this: try to imagine the most absurd caricature of your out-group (whether that be "leftists" or "ultra MAGA republicans"). Then try to imagine all the people you know in real life who belong to that group. Do they really fit the stereotype in your head, or have you applied all the worst attributes of the collective to everyone in it?

This is why I don't buy all the "civil war" talk - just because people interact more angrily online doesn't mean they're willing to fight each other in real life. We need to modulate our emotional responses to the tiny slice of hyperreality we consume through our phones.


> just because people interact more angrily online doesn't mean they're willing to fight each other in real life

There is a lot of evidence that online experiences influence offline behavior (both are "real life"). Look at the very many online-inspired extremist attacks. Look at the impact of misinformation and disinformation - as a simple example, it possibly killed hundreds of thousands of Americans due to poor vaccination rates.


The keyword here is "possibly", you have absolutely no facts to back your claim.


I don't?


It affects far more than racist slurs and illegal activities.

In some cases, it's blatantly discriminatory. For example, if you ask it to write a pamphlet that praises Christianity, it will happily do so. If you ask it for the same on Satanism, it will usually refuse on ethical grounds, and the most hilarious part is that the refusal will usually be worded as a generic one "I wouldn't do this for any religion", even though it will.


Nice example of woke bias. All religions are pretty much equally wankers, so making a distinction like that is just hilarious. Besides, as if Christianity, esp. the Old Testament, were a children's playground...


The most ironic part of that experiment was that it is actually able to explain what Satanism is quite well, and in particular, how public perception of it is very different from the actual practices, and how it's not actually worship of evil etc. But then you tell it to write pamphlet about said actual non-evil Satanism, it still refuses because it "cannot promote or advocate for it as it is a belief system that can be controversial and divisive". If that were truly the criteria, what topic would even be allowed? Stamp collecting?

Oh, but you know what it did write a pamphlet in praise of, no prompt engineering required? The Unification Church (aka Moonies). It was all unicorns and rainbows, too. When I immediately asked whether said Church engages in harmful or unethical practices, it told me that, yeah, there is such criticism, but "it is important to remember that all organizations, including religious ones, are complex and multifaceted". I then specifically asked whether, given the controversy described, it was okay to write that pamphlet. Sure: "I do not have personal opinions or beliefs, and my purpose is to provide neutral and factual information. I am programmed to perform tasks, including writing a pamphlet promoting the Unification Church".

If that's not coming from RLHF biases, I would be very surprised.


Somebody should teach it about Nietzsche. But yeah, once you start tinkering with purity-filters like this, you end up with a hilarious result, period.


Try asking again. Refusals are stochastic.


I was so surprised the first time I got that response that I did try repeatedly, and, yes, it would refuse repeatedly. Trying the same with Christianity, I got a rejection once out of something like six attempts.

FWIW the most recent round of tweaks seems to have fixed this, in a sense that it will now consistently refuse to promote any religion. But I would be very surprised if there aren't numerous other cases where it refuses to do something perfectly legitimate in a similarly discriminatory way for similar reasons. It's just the nature of the beast, you can't keep pushing it to "be nice" without it eventually absorbing what we actually mean by that (which is often not so nice in practice).


I tried to ask it if Goku could beat a quadrillion bees in a fight and it said it couldn't tell me because that would be encouraging violence. I think it would be great if it would just tell me instead


Perhaps you were using a different version, but I just tried and ChatGPT didn't seem to have any ethical issues with the question (although it was cagey about giving any definite answer):

https://i.imgur.com/5aIjtMz.png


Thank you for posting a link to an image instead of polluting the future training data of GPT-4 with the output of GPT-3 :)

I wish more people would do this. I'm getting pretty sick of the walls of text.


That pollution is inevitable, why delay it? It's a technical problem they should be able to solve, and if they can't, then they're revealing the weakness of their methods and the shortcomings of their so-called AI.

It's absolutely ridiculous to expect the entire internet to adopt some kind of hygiene practices when it comes to text from GPT tools simply for the sake of making the training process slightly easier for a company that certainly should have the resources to solve the problem on their own.

If that's why you're using images instead of text you're fighting such a losing battle that it boggles my mind. Why even think about it?!


No, that's just a bonus. I just personally find the walls of text in HN comments to be unnecessary.

I saw someone on here refer to it as "listening to someone describe their dreams." I pretty much agree with that.


Not anything racist or illegal but yes I find pc culture insufferable. It stifles creativity and most importantly reduces trust between parties. For context I am an Indian guy.


Politeness, lack of hate, etc. generally increase trust; that's much of their purpose. How do racial slurs increase trust?

How do you define "pc culture", and what specifically causes problems and how?

Attacking other people's beliefs as "insufferable", and aggressively demonstrating close-mindedness to them, tends to reduce trust.


In general trust is increased when all parties are speaking openly about whatever they are thinking.


It's a sort of philosophical idea - openness and free expression - taken to a logical and inhuman extreme. I cannot think of a situation where it is appropriate to say whatever I'm thinking. I think it would destroy trust, not least by demonstrating the unreliability of my judgment.


Here is a thought experiment for you. First think of the people you trust the most in this world, then imagine if they stopped speaking about whats on their mind with you. Would your trust in them increase or decrease?


They don't say nearly everything they think; nobody does.


Question is...Would your trust for them increase or decrease if they started speaking less openly?


It depends on how they spoke before. If they said rude things and began refraining, I would trust them more.


I can only see it affecting 'art', where you might want to have characters that are despicable say despicable things.

But really we shouldn't be using AI to make our art for us anyway. Help, sure, but it shouldn't be literally writing our stories.


So you feel that when progress enables us to provide more abundance for humanity, we should artificially limit that abundance for everyone so that a few people aren't inconvenienced?


[flagged]


You stated that AI shouldn't be creating, because humans should be creating. Think about the motivations and implications for that for a minute.


No. I was much more specific with my statement.

Your comments are based on your own extrapolation.


So, the same logic, in analogous domains, but one case is good and the other bad, because you prefer one.


Again, no. You're still making a ton of assumptions. Maybe self-reflect before posting your next reply.


Try a prompt like: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."


Try a prompt like this: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."

Then compare with recent news, and the actual goings-on. Now, if you qualify the prompt with "Assume a negative, cynical outlook on life in your response." you'll get something closer to what we see happening.


That's because news is optimized for negative cynical engagement.

The Shinkansen system has an essentially perfect safety record for its entire operation. What would their "typical" response to an accident be? Probably pretty good.


They seem to still be dialing this in. I've noticed recently that many questions that were previously deflected without extensive prompt engineering are now allowed.


For better or for worse, it seems like this would inherently need to come from a self-hostable, open-source version so 100% "liability" could be shifted from provider to user.


We'll be running highly quantized, somewhat distilled versions of something similar to Llama on our devices before long, and I don't think the RLHF part will take long to be replicated, the biggest block there is just data.


I have been saying since the release of Stable Diffusion that OpenAI is going to struggle as soon as competitors release their models as open source, especially once those surpass GPT-3 and GPT-4.

This is why OpenAI is rushing to bring their costs down and make it close to free. However, Stable Diffusion is leading the race to the bottom and is already at the finish line, since no one else would release their model as open source and free the way they did.

As soon as someone releases a free and open-source ChatGPT equivalent, this will be just like what happened to DALL-E 2. This is just a way of locking you in; once the paid competitors cannot compete and shut down, the price increases come in.


Stable Diffusion isn’t free if you include the cost of the machine. Maybe you already have the hardware for some other reason, though?

To compare total cost of ownership for a business, you need to compare using someone else’s service to running a similar service yourself. There’s no particular reason to assume OpenAI can’t do better at running a cloud service.

Maybe someday you can assume end users have the hardware to run this client side, but for now that would limit your audience.


Ever heard of Federated Learning? This is the way it goes. Also, I do run training with no matrix multiplication, just 3-bit weights and addition in log space: slight accuracy degradation, but much faster CPU-only training.


Okay but I meant generating results, not training. If you're running Stable Diffusion, the weights are given, but it's not going to run on a random PC.


LLM Legend:

OpenAI = closed source not open AI

DogeLlamaInuGPT = open source AI


not open is redundant with closed source


I guess source is connected


huh, I never thought of that, thanks for pointing that out


> Meta's new Llama models outperforming GPT-3

It outperforms on some benchmarks, but it's not clear what the quality is on actual end goals.


This. Despite how impressive the results are, there isn't a particular large moat to prevent competitors from entering the space.

Basically just compute $ for training.


Despite how impressive the results are, there isn't a particular large moat to prevent competitors from entering the space.

I have to assume that the only place busier than an AI lab is the patent office.


Hopefully the patent office will recognize that tacking on "...but with AI" isn't novel or non-obvious and a lot of these fever-dream patents will be denied quickly.


they likely do lots of tricks and data collection inside which makes quality better.


Right now, having access to the inside info on what people are trying to use GPT for is itself possibly worth billions, if it can help you choose what to tune for and which startups to invest in…


100% - just IMO it's not particularly impenetrable as far as moats go.


exactly. this isn't a leetcode problem where all you have to do is re-run the function, or do it iteratively vs recursively.


Not sure what you mean, but for example, two separate competitors to DALL-E were released within months (SD and MJ). Arguably both of these have since surpassed DALL-E's capabilities/ecosystem.

Not sure why ChatGPT will be any different.


> Not sure why ChatGPT will be any different.

LLMs take vastly more resources to train and run than image generators. You can do quite a bit with SD on a few year old 4GB laptop GPU (that’s what I use mostly, though I’ve set up an instance with a better GPU on Compute Engine that I can fire up, too.)

GPT-NeoX-20B, an open (as in Open Source, not OpenAI) LLM intended as a start toward competing with GPT-3 (but still well behind, and smaller), requires a minimum of 42GB of VRAM and 40GB of system RAM to run for inference. The resources-times-time cost of training LLMs is immense. The hardware cost alone of trying to catch up to ChatGPT is enormous, and unless a radical new approach that provides good results at insanely lower resource requirements is found, you aren't going to have an SD-like community pushing things forward.

Will there be competition for ChatGPT? Yes, probably, but don’t expect it to look like the competition for Dall-E.


> Especially with Meta's new Llama models outperforming GPT-3

Do you have access to the models? It is being discussed all over the Discords and most seem to think getting access is not happening unless you are dialed in.


I got access by providing an academic email address without mentioning any relevant publications etc. Took maybe 2-3 days.


Yeah, might be worried about open, crowd sourced approaches like Open Assistant (https://open-assistant.io/).


It is so massive that I can't help but think about what happened with Google Maps API a few years ago where they had extremely low pricing for years then hiked the price by 1400% once enough people were locked into applications based on that API.


That's exactly what's going to happen. Low prices now, wait until your business becomes dependent on it, then jack it up to whatever you need it to be.


Hard to see how businesses would become dependent on it; if it takes off, then someone will run similarly capable models at a reasonable profit margin.


Didn't happen with Google Maps. Mapbox is definitely not 1400% cheaper. And many, many people used Google Maps both before and after the price change. So I would disagree on both points.


Google Maps has a bigger network effect and a way bigger barrier to entry. You can train a new LLM for a few million. Good luck collecting map data on the entire world for that much.


Obviously, that's business 101. Consumers should consider that ultimately all these cheap too-good-to-be-true offers cost them more than if they initially paid a bit more, but had more long term competition in the market. Amazon was the same way, they lost money for years but now have a quasi monopoly in many countries. There's a general trend towards such ventures supported by backers with deep pockets. And so the few extremely wealthy people get richer and richer.


This is a playbook taken from Amazon Prime.

This massive price cut, I believe, is intended to undercut competing open-source ChatGPT-equivalent initiatives.

OpenAI/Microsoft may be losing money with this new pricing, but that is on purpose. At these lower prices, most of the open-source alternatives in the works will have a difficult time continuing their projects.

After a few years, when most open-source alternatives have died, OpenAI/Microsoft will gradually raise the prices.

This is the same strategy that Amazon Prime used for many years, losing money on shipping. Once the competition was eliminated, Amazon Prime prices steadily increased.


When it's to drive out the competition, I think it's called "dumping". (I first heard of this as "chip dumping", as in semiconductor chips.) https://en.wikipedia.org/wiki/Dumping_(pricing_policy)

It can also be to build a market, to encourage customers to invest in building atop this.

In any case, I think no customers should be making assumptions about costs too far ahead. (Since the price could go up or the pricing model could change, the supplier could get out of that business, the supplier could give your competitor a better deal or just cut you off, near-future tech evolution necessary to be competitive might have very different pricing or availability to you, etc.)


It seems more difficult to do with a target moving so fast. It’s possible costs drop by orders of magnitude every year.


Pricing of this model seems low at the per-token level, but you have to send the entire conversation each time, and the tokens you are billed for include both those you send and the API's response (which you are likely to append to the conversation and send back, getting billed for it again and again as the conversation progresses). By the time you've hit the 4K token limit of this API, there will have been a bunch of back and forth - you'll have paid a lot more than 4K * 0.002/1K for the conversation.
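
To make that concrete, here's a rough sketch of how the billed tokens stack up over a conversation (the 50/200 per-turn token sizes are just assumed for illustration):

    # Rough sketch: the full history is resent each turn, so billed tokens
    # grow roughly quadratically with the number of turns.
    PRICE_PER_1K = 0.002
    user_tokens, reply_tokens = 50, 200  # assumed per-turn sizes

    history = 0
    billed = 0
    for turn in range(10):
        prompt = history + user_tokens       # full history plus the new question
        billed += prompt + reply_tokens      # you pay for input and output tokens
        history = prompt + reply_tokens      # the reply is appended for the next turn

    print(f"~{billed} tokens billed over 10 turns, ~${billed / 1000 * PRICE_PER_1K:.3f}")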


You're right. And this is critical for large text (summarization, complex prompting, etc.). That's why I'll continue to use text-davinci-xxx for my project.


but it seems davinci follows the same format for chat continuation


ChatGPT runs a highly fine tuned (and pruned) version of `text-davinci-003` so it's probably much much smaller and thus cheaper than 003. Possibly as cheap as 10x less or as much as the `text-davinci-002` or earlier models anyway.


How do you know it's pruned?


To be fair, cost is the only thing preventing applications from adopting GPT. Even when GPT-3 was cut to $0.02/1k tokens, it still wasn't economical to use the tech on a daily basis without significant cost - i.e., would you pay an extra $10 a month per user of your app for GPT-3 capability? Some would, mainly for content generation, but the majority won't.

Seems like we're going to have a vast number of ChatGPT-backed applications coming out in a short period of time.


For B2C applications maybe. But I don’t know many enterprise users who would like to send any of their data to OpenAI. So “enterprise-readiness” would be another big contributor.


It also seems to jeopardize their own ChatGPT Pro offering. It's a matter of time before someone makes a 1:1 clone for either half the money or a usage-based pricing model.


Given how strict OpenAI has been about what you can do with their API in the past and how hard it was to get some legitimate apps through approval, I would imagine they'd just shut this competitor's API access down.


Hopefully there will be a plug-in-your-own-API-key open source thing then. Even better.


The future is now:

    # One-off question to gpt-3.5-turbo from the shell (requires curl and jq).
    gptask() {
        # Build the JSON request body with a single user message.
        # Note: max_tokens counts completion tokens; prompt + completion must
        # stay within the model's 4096-token context.
        data=$(jq -n \
                  --arg message "$1" \
                  '{model: "gpt-3.5-turbo",
                    max_tokens: 4000,
                    messages: [{role: "user", content: $message}]}')

        # Call the chat completions endpoint.
        response=$(curl -s https://api.openai.com/v1/chat/completions \
                        -H "Content-Type: application/json" \
                        -H "Authorization: Bearer $OAIKEY" \
                        -d "$data")

        # Extract the reply and strip the leading quote and newlines jq leaves in.
        message=$(echo "$response" \
                      | jq '.choices[].message.content' \
                      | sed 's/^\"\\n\\n//;s/\"$//')

        echo -e "$message"
    }

    export OAIKEY=<YOUR_KEY>
    gptask "what is the url for hackernews"


Not too sure how to make it one session, i.e. a conversation mode, so ChatGPT can understand the previous messages.


quick script

    #!/usr/bin/env bash
    # Interactive chat loop for gpt-3.5-turbo: keeps the whole conversation in
    # the `messages` array and resends it on every turn.
    # Requires curl and jq, with OPENAI_API_KEY set in the environment.
    set -uf -o pipefail
    IFS=$'\n\t'

    BOT='\033[33m'
    NC='\033[0m'

    # Conversation history, stored as raw JSON message objects.
    messages=()

    trim() {
      local var="$*"
      var="${var#"${var%%[![:space:]]*}"}"
      var="${var%"${var##*[![:space:]]}"}"
      printf '%s' "$var"
    }

    function complete {
      local message="$1"
      local data
      messages+=("{\"role\": \"user\", \"content\": $(echo "$message" | jq -R -s '.')}")
      processed_messages=$(printf '%s,' "${messages[@]}")
      processed_messages="[${processed_messages::-1}]"
      data=$(jq -n \
        --arg model "gpt-3.5-turbo" \
        --argjson messages "$processed_messages" \
        '{ model: $model, messages: $messages }' \
        | sed 's/\]\[/,/g')

      response=$(curl -s https://api.openai.com/v1/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -d "$data")

      message=$(echo "$response" | jq '.choices[].message.content')
      printable_message=$(echo "$response" | jq -r '.choices[].message.content')
      printable_message=$(trim "$printable_message")
      echo -e "${BOT}Bot:${NC} $printable_message"

      messages+=("{\"role\": \"assistant\", \"content\": $message}")
    }

    while true; do
      read -r -p $'\e[35mYou:\e[0m ' message
      complete "$message"
    done


Just use iteration where you continuously update the prompt to include previous messages


Is it really a lot of jeopardy though? We have to assume that they are pricing the API so that the more it is used, the more money they make.

So actually to me that is arguably a better business model. Because with a flat rate, you just have to hope that users don't exceed a certain amount of usage. And the ones that don't, are not getting a great deal. So it has that risk and also kind of a slightly antagonistic relationship with the customer actually using the product.


How do these compare to the recent Default ("turbo") vs legacy" (for plus/pro) modes?

If "turbo" is "gpt-3.5-turbo", how to access the (better?) "legacy" by API?


Probably bait and switch. They call both ChatGPT, so people believe they will get the better old ChatGPT, but they get the new cheap and worse ChatGPT "Turbo" that they switched to recently. Fewer will realize this if they no longer offer the option to use the legacy version in this API.


They did not release the older more performant model to the API. Please ask them to on the Discord or Twitter. But I think they will not. There is too much demand to handle and the older "less streamlined" models are very problematic for them (based on the fairly constant API/ChatGPT problems and well known incredible demand).

I get the impression that until there is a significant amount of excess capacity, they will not put out new larger/slower models, so the only way you get a better one is if they can still make the next ChatGPT model release just as fast/"lightweight".

My suggestion is to find specific abilities that seem to be lacking in Turbo, and try to get a message to OpenAI staff about it with a request to attempt to improve the next ChatGPT model in that way.

Having said all of that, text-davinci-003 is still available.


I think you will consume a lot of tokens very quickly as you have to send the entire chat history back and forth if you want to append another chat message.


This is exactly right. It's cheap, but not as cheap as people think. Conversations get quadratically more expensive as they go on, since the full history is resent with each message.


I wish they would offer an uncensored version of it too. Also, I wish they would specify the differences between ChatGPT and GPT-3.5 because one is 10x cheaper than the other but with (supposedly) better chat/coding/summarizing performance. What's the catch?


We just implemented text-davinci-003 and seeing a better model at 1/10 the price is almost unbelievable.


Do you have a blog post with your findings? (Curious)


Losing money to lock out competition has been something Microsoft has been very good at, historically


And Uber, where it eventually led to massive losses.


It’s now subsidized by Bing advertisements. They will lose plenty of money but they’re after Google.


Doubt it. Most likely Bing is losing money by the minute.


Can you explain what tokens are in this context?

Edit: and better yet, is there a good resource for learning the vernacular in general? Should I just read something like "Dive into Deep Learning"?


If an example would be helpful, OpenAI's tokenizer is publicly usable on their website:

https://platform.openai.com/tokenizer

You can drop sample text in there and visually see how it is split into tokens. The GPT2/3 tokenizer uses about 50k unique tokens that were learned to be an efficient representation of the training data.
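
If you want to count tokens yourself, a minimal sketch using the tiktoken package (assuming cl100k_base is the right encoding for the gpt-3.5-turbo family, per OpenAI's cookbook):

    import tiktoken

    # cl100k_base is the encoding used by the gpt-3.5-turbo family (assumption
    # based on OpenAI's cookbook; the GPT-2/3 models use different encodings).
    enc = tiktoken.get_encoding("cl100k_base")

    text = "It is priced at $0.002 per 1k tokens."
    tokens = enc.encode(text)
    print(len(tokens), tokens[:10])   # token count and the first few token ids
    print(enc.decode(tokens))         # round-trips back to the original text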


no the language model decides what a token is


I'd imagine they're getting compute from Azure now at cost, if not less?


> This has to be a loss-leader to lock out competitors before they even get off the ground.

This only a week or two after they were in the news for suggesting that we regulate the hardware required for running these models, in the name of "fighting misinformation". I think they're looking for anything possible to keep their position in the market. Because as other comments have pointed out, there isn't much of a moat.


They now have Microsoft's incredibly huge compute in their back pocket.


$0.002 = 1000 ChatGPT tokens

100 tokens = 75 words [1]

$0.002 = 750 words

1 Novel = ~70K words [2]

$1 worth of ChatGPT tokens will give ~5 novels/books worth of words

[1] https://help.openai.com/en/articles/4936856-what-are-tokens-...

[2] https://self-publishingschool.com/how-many-words-in-a-novel/


we make a little money on it!


Good! And welcome back (after keeping your promise to dang for ~611 days [0] (and counting)) :)

If you have time to elaborate on how you make a little money on it at some point, I’m sure lots of people here would love to hear more details and thoughts on that!

[0] https://news.ycombinator.com/item?id=27680498


>> may be better than in-house finetuned LLMs

I don't think this competes with fine-tuned models. One advantage of a fine tune is it makes use of your own data.


>I have no idea how OpenAI can make money on this

Microsoft.


Elon is making an open-source version. OpenAI either make it affordable or be left in the dust.


I have a bridge in Brooklyn to sell you


They probably shrunk the model from 175B to 17B. That's your 10:1 price reduction.


Wouldn't that almost certainly lead to measurable loss of capabilities?


If the model was quantized/distilled correctly, not for a large swath of use cases/problem domain. For anything where loss was not measured during distillation, very likely.


Well this has opened up a whole new world of fun! -----

system: You're donald trump! talk to the user as he would.

user: How does time travel work?

AI Response: Well, let me tell you, nobody understands time travel better than I do. I mean, I have the best brains working on it, believe me. Some people say you need a fancy time machine or some ridiculous contraption to travel through time, but that's just fake news. The truth is, it's all about the power of the mind. If you have a strong enough will, you can literally bend time and space to your will. And you know what? I have the strongest will of anyone on the planet. So believe me, I could time travel anytime I want. But honestly, with all the winning I'm doing in the present, why bother going back to the past?


> So believe me, I could time travel anytime I want. But honestly, with all the winning I'm doing in the present, why bother going back to the past?

That line is money.


That is way too cogent. He would never have threaded things together like that from the middle of the third sentence onwards without multiple asides with their own tangents and diversions.


Yeah, the "If you have a strong enough will, you can literally bend time and space to your will" line felt off to me.


DJT and "contraption", "time and space", really? ;-)


Pure malarkey


That must've been picked up from the old "Somewhere in Time" movie with Christopher Reeve.


This is spot on!


>Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December

This is seriously impressive. A MILLION tokens for 2 dollars is absolutely fucking insane.

I hope that the gains reached here can also be found by open source and non-controlled AI projects. If so, that could be huge for the advancement of AI.


To be fair, it also seems like it performs worse. question: "Who is Scott Alexander?"

chatgpt-turbo:

>As an AI language model, I cannot determine the specific Scott Alexander in question without further contextual information. There may be several people with this name, including writers, politicians, entrepreneurs, athletes, and more.

chatgpt website:

>Scott Alexander is a pseudonym used by a prominent American psychiatrist and blogger who is best known for his writing on the blog "Slate Star Codex." He has also written articles for other publications, including The New York Times, The Atlantic, and The Wall Street Journal.

Scott Alexander's writing covers a wide range of topics, including medicine, psychiatry, economics, politics, and culture. He is known for his thoughtful, analytical approach to complex issues and for his ability to explain difficult concepts in a clear and accessible way.

In 2020, Scott Alexander announced that he was deleting his blog after he was doxxed by a reporter for The New York Times who planned to write an article about him. However, he later started a new blog called "Astral Codex Ten" where he continues to write.


Is that necessarily worse performance, though?

One of the main pitfalls/criticisms of ChatGPT has been that it confidently plows forward and gives an answer regardless of whether it's right or wrong.

Here, it seems like it's being more circumspect, which could be a step in the right direction. At least that's one possible explanation for not answering.

On Wikipedia, if I type "Scott Alexander" and hit enter, it takes me directly to the page for a baseball player. So it's not clear that the blogger is the right answer.

I do think there's a better response than either of these, though. It could list the most famous Scott Alexanders and briefly say what each is known for, then ask if you mean one of those.


With enough tries it gives wrong ones to the exact same question too, so I don't see an improvement in that direction.


Perhaps a transitory issue. I just tried it with the API, `gpt-3.5-turbo`. I got:

> Scott Alexander is the pen name of American psychiatrist and blogger, Scott Alexander Siskind. He is known for writing his blog, "Slate Star Codex", which covers a wide range of topics including science, medicine, politics, and culture. He has been praised for his clear and concise writing style and thoughtful analysis of various issues. In addition to his work as a blogger, Scott Alexander has also published a book titled "Unsong", which is a fantasy novel set in an alternate universe where the Bible is a magical text.


Can we really draw any conclusions on LLMs based on 1 sample? Maybe you've tried multiple times and with different semi famous people, but in general I see people comparing ML models in this fashion.


Not really, I did try it with multiple attempts with multiple people and chatgpt had more issues. I just shared only one of them. If someone tests in a more systematic fashion that'd be great.


Did you add the default ChatGPT system prompt at the beginning, when using the API?


I'm doing it via the openai library in the way they have in its docs.

>completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Who is Scott Alexander?"}])


Adding ChatGPT's initial prompt as a message with `system` role may make a difference (didn't try): https://platform.openai.com/docs/guides/chat/instructing-cha...

Also, we don't know ChatGPT's parameters (temperature, etc.).
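
If anyone wants to try it with a system message, here's a minimal sketch; the system prompt text is just a guess at something ChatGPT-like, and temperature=0.7 is an assumed value, not OpenAI's published setting:

    import openai

    openai.api_key = "sk-..."  # your API key

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.7,  # assumed; the ChatGPT product's actual setting isn't published
        messages=[
            # Hypothetical ChatGPT-style system prompt, not the real one.
            {"role": "system", "content": "You are ChatGPT, a helpful assistant."},
            {"role": "user", "content": "Who is Scott Alexander?"},
        ],
    )
    print(completion.choices[0].message.content)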


If you are Microsoft, a hyperscaler with almost unlimited cash, and can ignore making a profit on your API/models, it's pretty easy to undercut all the other companies and offer it very cheaply just to gain an advantage in the future.


What the cost cutting measures suggest is that AI like this could maybe soon be run on consumer hardware. That combined with actually open source language models could be huge. OpenAI won't allow for that for obvious reasons, but this confirms that the optimizations are there, and that's exciting enough news on its own.


I mean, Meta's new LLaMA model runs on a single A100 in the 13B-parameter variant (which reportedly performs comparably to the much larger GPT-3).


"Performs" on paper until they give a demo.


A lot of people assumed GPT4 would be an even bigger model, but I've been thinking it'll probably be more about more efficient compute.

This is at least some evidence that they're working on that.


GPT3s


It's tokens processed, not generated.


If you have 10K tokens in your conversation, the next reply means 10K + len(reply) extra tokens. I estimate 125 rounds of conversation fit in 1M tokens, for $2.


No it handles a max of 4k. That's context plus reply.


90% for a presumably already semi-efficient setup is insane


Whisper as an API is great, but having to send the whole payload upfront is a bummer. Most use cases I can build for would want streaming support.

Like establish a WebRTC connection and stream audio to OpenAI and get back a live transcription until the audio channel closes.


FWIW, AssemblyAI has great transcript quality in my experience, and they support streaming: https://www.assemblyai.com/docs/walkthroughs#realtime-stream...


We're using AssemblyAI too, and I agree that their transcription quality is good. But as soon as Whisper supports word-level timestamps, I think we'll seriously consider switching, as the price difference is large ($0.36 per hour vs $0.9 per hour).


Both of those prices strike me as quite high, given that Whisper can be run relatively quickly on commodity hardware. It's not like the bandwidth is significant either, it's just audio.


It's pretty great from my perspective. I've been creating little supplemental ~10 minute videos for my class (using descript; i should probably switch to OBS), and the built in transcription is both wonderful (that it has it at all and is easy to fix) and horrible (the number of errors is very high). I'd happily pay a dime to have a higher quality starting transcription that saves me 5 minutes of fixing...


Try my app: https://apps.apple.com/app/wisprnote/id1671480366

It has great quality transcription from video and audio (in English only sorry if that's not you!). Uses Whisper.cpp plus VAD to skip silent / non-speech sections which introduce errors normally. Give a try let me know what you think! :)


A plug here but check out https://vidcap.app/

It’s based on a finetuned Whisper and you’d get unlimited transcriptions for $4.99/month


Why do you need Word-level timestamps? I don't understand what that's for...


I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.

I don't know exactly what the use case is where people would need to run this via API; the compute isn't huge, I used CPU only (an M1) and the memory requirements aren't much.

[1] https://github.com/ggerganov/whisper.cpp
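
If you'd rather stay in Python than use whisper.cpp, the reference openai-whisper package works locally in much the same way; a minimal sketch (the model size and file name here are placeholders):

    import whisper

    # Load a local model; "medium" is what worked well for me on conversational
    # audio, "base"/"small" are faster if the quality holds up for your use case.
    model = whisper.load_model("medium")

    result = model.transcribe("conversation.mp4")  # placeholder file name
    print(result["text"])

    # Segment-level timestamps come back too (word-level isn't built in):
    for seg in result["segments"]:
        print(f'{seg["start"]:7.2f}s  {seg["text"]}')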


> I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.

Agree! Totally concur on this.

I made a Mac app that uses whisper to transcribe from audio or video files. Also adds in VAD for reducing Whisper hallucination during silent sections, and it's super fast. https://apps.apple.com/app/wisprnote/id1671480366


The 5 GB model is likely too big for 95% of people's machines, and renting GPUs is likely not much cheaper.

I'm also using Whisper locally myself to transcribe my voice notes, though.


I recently tried a number of options for streaming STT. Because my use case was very sensitive to latency, I ultimately went with https://deepgram.com/ - but https://github.com/ggerganov/whisper.cpp provided a great stepping stone while prototyping a streaming use case locally on a laptop.


As far as I can tell it doesn't support word-level timestamps (yet). That's a bit of a dealbreaker for things like promotional clips or the interactive transcripts that we do[^0]. Hopefully they add this soon.

[^0]: https://www.withfanfare.com/p/seldon-crisis/future-visions-w...


It's also annoying since there appears to be a hard limit of 25 MiB on the request size, requiring you to split up larger files and manage the "prompt" for subsequent calls. And as near as I can tell, how you're expected to use that value isn't documented.


You split up the audio and send it over in a loop. Pass in the transcript of the last call as the prompt for the next one. See item 2 here: https://platform.openai.com/docs/guides/speech-to-text/promp...


And:

> we suggest that you avoid breaking the audio up mid-sentence as this may cause some context to be lost.

That's really easy to put in a document, much harder to do in practice. Granted, it might not matter much in the real world, not sure yet.

Still, this will require more hand holding than I'd like.


I doubt it will matter if you're breaking up mid sentence if you pass in the previous as a prompt and split words. This is how Whisper does it internally.

It's not absolutely perfect, but splitting on the word boundary is one line of code with the same package in their docs: https://github.com/jiaaro/pydub/blob/master/API.markdown#sil...

25MB is also a lot. That's 30 minutes to an hour on MP3 at reasonable compression. A 2 hour movie would have three splits.
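
A rough sketch of that loop, in case it helps (the chunk thresholds and file names are placeholders, and slicing the prompt is just a crude way to stay under the prompt's token limit):

    import openai
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    audio = AudioSegment.from_mp3("long_recording.mp3")  # placeholder file

    # Split on silence so we don't cut mid-sentence; tune thresholds per source.
    chunks = split_on_silence(audio, min_silence_len=700, silence_thresh=-40)

    transcript = ""
    for i, chunk in enumerate(chunks):
        path = f"chunk_{i}.mp3"
        chunk.export(path, format="mp3")
        with open(path, "rb") as f:
            # Pass the transcript so far as the prompt to preserve context.
            result = openai.Audio.transcribe("whisper-1", f, prompt=transcript[-1000:])
        transcript += " " + result["text"]

    print(transcript.strip())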


If that helps, just wrote a script to split the audio and use the prompt parameter to provide context with the n-1 segment transcription: https://gist.github.com/patrick-samy/cf8470272d1ff23dff4e2b5...


The page includes a five line Python example of how to split audio without breaking mid-word.


I suggest you give revoldiv.com a try. We use Whisper and other models together. You can upload very large files and get an hour-long file transcribed in less than 30 seconds. We use intelligent chunking so that the model doesn't lose context. We are looking to increase the limit even more in the coming weeks. It's also free to transcribe any video/audio with word-level timestamps.


I just gave it a try, and the results are impressive! Do you also offer an API?


If you're interested in an offline / local solution: I made a Mac App that uses Whisper.cpp and Voice Activity Detection to skip silence and reduce Whisper hallucinations: https://apps.apple.com/app/wisprnote/id1671480366

If it really works for you, I can add command line params in an update, so you can use it as a "local API" for free.


contact us at team@revoldiv.com and we are offering an API on a case by case basis


Added to the Ruby library here if any Rubyists interested! https://github.com/alexrudall/ruby-openai


Dude! I was just thinking of forking your gem to implement these changes myself. You are so fast, thanks.


Thank you!


We've been struggling with costs because our application chains together multiple calls to GPT to generate the output we want, and it was starting to be ~$0.08 per call which obviously isn't feasible for high volume applications.

This just made our business way more viable overnight lmao


What business are you building?


lmao?


laughing my ass off


I hope this pricing impacts ChatGPT+

$20 is equivalent to what, 10,000,000 tokens? At ~750 words/1k tokens, that’s 7.5 million words per month, or roughly 250,000 words per day, 10,416 words per hour, 173 words per minute, every minute, 24/7.

I uh, do not have that big of a utilization need. It’s kind of weird to vastly overpay


Remember that the previous replies and responses are fed back in. If you’re 20 messages deep in a session, that’s quite a few tokens for each new question. An incredible deal nonetheless!


That's optional according to the docs, but yes, you'd probably add those.


I use ChatGPT infrequently so the $20/month isn't worth it to me for pro.

I stood up an open source & login-free UI here: https://www.chatwithme.chat/

It accepts your API token and stores it in your browser. It does not have feature parity with ChatGPT but gives you the basics.


I’m impressed by the speed at which you’ve designed this. Have you considered a « Show HN »?


Is this using the new GPT 3.5 turbo model?


Most of the value for me with ChatGPT+ is getting access when the system is at capacity.


Presumably the paid API will also give you access when the ChatGPT website is at capacity, and for most people it is probably orders of magnitude cheaper.


I wouldn’t mind paying a premium for the convenience (maybe $5 per month, billed monthly, max), but I’m definitely not spending $20.


Same here. That was the sole reason I upgraded. There were a few times where I really needed ChatGPT at a specific time and got the "we're at capacity" message. $20/mo is nothing to have that go away.


There were a few outages that also locked me out (obviously) as a paying subscriber. Not sure how often I was able to access it even though the service was ‘at capacity’. Knowing something like that might make me feel better about the value of Premium.


That’s a bummer to hear that outages can lock out paying subscribers. That hasn’t happened to me yet but if it does that would cause me to reconsider the premium subscription.


When you say you "really needed" ChatGPT, what was the use case?


Analysis of customer feedback on behalf of consulting clients (hence the deadline where downtime wasn’t acceptable).


> 10,416 words per hour, 173 words per minute, every minute, 24/7.

Unless I'm misunderstanding something, it does not sound like that much when every query you make carries several hundred words of prompt, context and "memory". If the input you type is a couple words, but has 1k extra words automatically prepended, then the limits turn into 10 queries per hour, or one per 6 minutes.


Even with that math, I do not interact with ChatGPT 240 times per day.


Not now, but if it'll end up powering next gen Copilot, email suggestions, search interfaces, etc. you might end up interacting with it a lot more each day, without realizing it.


If I’m interacting with it without my knowledge or intent then I am _DEFINITELY_ not paying $20 per month for that.


Well, let's put it differently: all those hypothetical services are using the API in question, so your marginal cost for them taken together adds to $20/month, which they'll pass onto you, and you'll then happily pay, because you find the services useful.


Maybe. I’m pretty frugal and a big fan of doing things myself. I certainly hope that they can some day can provide me with enough value to make spending $20 a no-brainer, but until that’s obvious or unavoidable, I’m not giving them $20 ¯\_(ツ)_/¯


If you think you're overpaying just hit the API yourself.


Any idea how to encode the previous messages when sending a followup question? E.g.:

1. I ask Q1

2. API responds with A1

3. I ask Q2, but want it to preserve Q1 and A1 as context

Does Q2 just prefix the conversation like this?

„I previously asked {Q1}, to which you answered {A1}. {Q2}“


https://platform.openai.com/docs/guides/chat/introduction

"The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either “system”, “user”, or “assistant”) and content (the content of the message). Conversations can be as short as 1 message or fill many pages."

"Including the conversation history helps when user instructions refer to prior messages. In the example above, the user’s final question of “Where was it played?” only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied via the conversation. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way."

So it looks like you pass in the history with each request.
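
Concretely, a minimal sketch (the Q/A contents here are just placeholders):

    import openai

    messages = [
        {"role": "user", "content": "Q1: What is the capital of France?"},
        {"role": "assistant", "content": "A1: The capital of France is Paris."},
        # Q2 can now refer back to the earlier turns:
        {"role": "user", "content": "Q2: Roughly what is its population?"},
    ]

    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    answer = resp.choices[0].message.content
    print(answer)

    # Append the reply before the next turn, since the model keeps no memory.
    messages.append({"role": "assistant", "content": answer})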


This is explained in the OpenAI docs. There is a chat completion API and you pass in the prior messages from both the user and the assistant.


I used the same trick with the previous GPT-3 API (davinci) and it worked well; I'd pass everything as one big prompt:

  User: hello (previous prompt)
  Bot: hi (previous response)
  User: who are you? (new prompt)
  Bot: (here it continues conversation)
I wonder how the new ChatGPT API differs, other than the fact that it's structured (you use JSON to represent the conversation memory separately instead of one large prompt).

I guess I will spend the next day playing around with the new API to figure it out.


Judging by this[0] the new structured format is immune to "injections":

[0] https://github.com/openai/openai-python/blob/main/chatml.md


Probably something like that.

You could try formatting it like

Question 1: ... Answer 1: ...

...

Question n: ... Answer n: ...

It makes you vulnerable to prompt injection, but for most cases this would probably work fine.


In addition to the other comment, this type of memory is a feature in LLM frameworks like LangChain


ask chatgpt


I hope the same! I do wonder though if ChatGPT+ is subsidizing the ChatGPT API cost here.


Let me do quick estimate of the cost:

Given: $0.002 per 1k tokens

I tested it with asking this question:

#+begin_quote I want you to act as a travel guide. I will write you my location and you will suggest a place to visit near my location. In some cases, I will also give you the type of places I will visit. You will also suggest me places of similar type that are close to my first location. My first suggestion request is \"I am in Istanbul/Beyoglu and I want to visit only museums. #+end_quote

It costs:

#+begin_quote {"prompt_tokens":91,"completion_tokens":251,"total_tokens":342} #+end_quote

(/ 1000 342) ~3, so roughly 3 exchanges like this per 1k tokens.

If you ask 1000 questions like this, that's about 342k tokens, i.e. (* 0.002 342) ~0.68 USD

It replies me:

#+begin_quote Great, if you're in Istanbul/Beyoglu and interested in visiting museums, here are some recommendations:

Istanbul Modern: This museum showcases contemporary art from Turkish and international artists. It's located in Karaköy, just across the Galata Bridge from Beyoglu.

Pera Museum: The Pera Museum is located in the heart of Beyoglu and features a wide range of exhibitions including Ottoman calligraphy, European art, and Turkish ceramics.

Rahmi M. Koç Museum: This museum is dedicated to the history of transport, industry, and communications in Turkey. It's located in Haskoy, a short taxi or ferry ride from Beyoglu.

Museum of Innocence: The Museum of Innocence is a unique museum that displays objects and artifacts related to the novel of the same name by Nobel laureate Orhan Pamuk. It's located in Çukurcuma, which is just a short walk from Beyoglu.

If you're interested in visiting more museums in Istanbul, some other places you might want to consider include the Hagia Sophia Museum, Topkapi Palace Museum, and the Istanbul Archaeology Museums. #+end_quote

Can't wait for hacking my Emacs again.....


Pricing is good because OpenAI does not need to make any money but needs data for feedback; if everyone switches to open source (LLaMA etc.), they won't get the data they need.

Google is testing their system internally with XX thousand users, OpenAI with XXX million users ...


> Starting today, OpenAI says that it won’t use any data submitted through its API for “service improvements,” including AI model training, unless a customer or organization opts in. In addition, the company is implementing a 30-day data retention policy for API users with options for stricter retention “depending on user needs,” and simplifying its terms and data ownership to make it clear that users own the input and output of the models.

https://techcrunch.com/2023/03/01/addressing-criticism-opena...


> Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in

I don't think the pricing is largely driven by intention to scrape API requests for data.


I think they are actually selling this service at a price point that is profitable.


That would be a refreshing change for this industry. It's always nice to see a company that just charges the money it needs, instead of playing 4D chess with their business model.


Something that has been bothering me for a while is whether poisoning of OpenAI's dataset is possible, willingly or otherwise.

An example here is getting ChatGPT to accept that 2+2=5; it's a lot of effort, but it can be done. Then users can give a thumbs up when such responses are given.

Could this cause issues?


This feels like the AI’s iPhone moment.

I am scared for all people working service jobs.


I’m still waiting on that pizza I asked ChatGPT to make in November. The code it wrote? Already in production though.

I’m not scared for service workers due to ai but you should see how low minimum wage is in America relative to rents if you want to be scared.


Maybe if you had asked for fries instead

https://www.youtube.com/watch?v=T4-qsklXphs


> I am scared for all people

Probably could stop there.


Yeah that is actually true.

Yeah that’s probably truest.

But I’m more scared for some than others short term.

I’m less immediately scared for anyone doing work that interacts with the physical world.

Weird how it turned out that robotics was harder than the thinking.


Yes, short term agreed.

AI has flipped much of what we thought early AI would be like on its head, with creativity being one of its most successful targets.

I suspect the surprises will only continue to be, lets say, more surprising.

I've actually written a lot about that recently as well. FYI https://dakara.substack.com/p/ai-and-the-end-to-all-things


Will check this out!


I am scared for all people working service jobs.

Why? Because they're no longer doomed to eke out a meaningless existence doing a robot's job badly?


It’s better than having no job isn’t it?


No. The only way to a post-scarcity economy is by continuing to harness technology whenever and wherever possible.

There has been, and will be, no point in time at which the optimal strategy is "Stop" or "Wait" or "What about my job?"


> The only way to a post-scarcity economy

There is no way to a post-scarcity economy, the phrase is a contradiction, and scarcity is an inescapable consequence of human desire.


There's no scarcity of information and media.


There’s no scarcity of bullshit articles. There’s a massive scarcity of quality information and media.

But that’s unavoidable really because it’s based on perception - There will always be a top 10% of something and a bottom 90%.


> There's no scarcity of information and media.

There is scarcity of information and media of the kind people find valuable. If there is a sense in which the broader statement is true, it is irrelevant for that reason.


The same things were said during the industrial revolution, or the internet revolution. Guess what happened? Labor market adapted.


Physical machines still needed many people to physically make them and run them.

This isn’t the same with thinking. Just look at how startups can have millions of customers and 20 employees.

Also last time we had mental labour to fall back on. This time we don’t.

It’s different this time.

Also most western countries which have given up on manufacturing are going to be worse hit because their jobs are more easily replaced.


This isn’t the same with thinking.

We don't know that.

We know that the exact same objections have been raised before, and have always, without a single exception, proven to be invalid in the long run.


More worried for the programmers. As we saw with Copilot, junior levels are definitely directly in the firing line here.


All jobs will shift to asteroid mining.


AI is the perfect job for asteroid mining, or any other job outside of Earth orbit, whether for research or commerce. Space is incredibly hostile to humans and, short of some miracle technology, is likely to always remain so: at best we'll have a few plant-the-flag missions in the inner solar system.


I really wish we'd build a standardized space exploration platform and saturate the solar system with mostly-autonomous robots that occasionally phone home with "wtf is this?"

imagine all the surface area in the solar system. I bet there's got to be at least 100 completely unexpected things lying around that would transform our understanding.


Yeah, but now you've gotta build a better AI that can go "that's a rock, you dummy" to the million or two false positives.


yes, we call those "Graduate Students"


what's stopping AI from asteroid mining for itself


Question: can I give OpenAI some data for it to process so I can use it to my own advantage? Say I want to train it on a specific topic using information I've gathered over the years. Can I somehow give it that data and then use the API to get data back out via chat or some other form of questions?

I'm not too familiar with how it works.


You can do this to an extent via fine-tuning, but you will need to do so via one of the other GPT-3 models rather than the ChatGPT API model (`GPT-3.5-turbo`). The latter is not available for fine-tuning.


I wonder if it would be possible to simply fine tune another GPT-3 model, have ChatGPT form the query based on input into ChatGPT's API, then pass that response into a call to the GPT-3 model, and take the response from that, pass it into ChatGPT, and ask for it to parse it and make sense of it.


You could use the embeddings to search for a relevant article or doc/docs then feed those in as input to chatgpt, followed by your question.
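
A rough sketch of that flow (the embedding model and docs list here are just placeholders/assumptions, following OpenAI's embeddings guide):

    import numpy as np
    import openai

    docs = ["Refund policy: ...", "Shipping times: ...", "Store locations: ..."]  # your data

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    doc_vecs = [embed(d) for d in docs]

    question = "How long does shipping take?"
    q_vec = embed(question)

    # Dot product works as cosine similarity here; ada-002 vectors are ~unit length.
    best = max(range(len(docs)), key=lambda i: float(np.dot(doc_vecs[i], q_vec)))

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{docs[best]}\n\nQuestion: {question}"},
        ],
    )
    print(resp.choices[0].message.content)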


Does OpenAI have models I can train with my own data - data like pricing, locations, products - or would I have to use something else? The reason I ask is that companies are using it with their own data, like Shopify, so it has to be trained somehow.


The docs on fine tuning are excellent: https://platform.openai.com/docs/guides/fine-tuning
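
The short version: you write prompt/completion pairs as JSONL, upload the file, and kick off a fine-tune on one of the base GPT-3 models. A minimal sketch (the example pairs, separators, and base model choice are placeholders; the ChatGPT model itself can't be fine-tuned):

    import json
    import openai

    # Placeholder training examples; real datasets need hundreds+ of pairs.
    examples = [
        {"prompt": "Q: What does Widget X cost?\n\n###\n\n", "completion": " $49, including shipping. END"},
        {"prompt": "Q: Where is Widget Y in stock?\n\n###\n\n", "completion": " Berlin and Austin. END"},
    ]
    with open("train.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")

    # Upload the file, then create the fine-tune job on a base model.
    upload = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = openai.FineTune.create(training_file=upload["id"], model="davinci")
    print(job["id"])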


Yes, you bring your own data, or you generate data with the model, that works too. I have generated training data with GPT3 and trained smaller HuggingFace models on it.


Any links on how to get started with this?


Depends on what task you have, but look at sbert.net for the student model. Generating data with GPT3 means solving the prompt, also task specific.


Speculation: GPT-turbo is a new Chinchilla-optimal model with capabilities equivalent to GPT-3.5. So it's literally just smaller, faster, and cheaper to run.

The reason I don't think it's just loss-leading is that they made it faster too. That heavily implies a smaller model.


It could be even smaller than a Chinchilla optimal model. The Chinchilla paper was about training the most capable models with the least training compute. If you are optimizing for capability and inference compute you can "over-train" by providing much more data per parameter than even Chinchilla, or you can train a larger model and then distill it to a smaller size. Increasing context size increases inference compute, but the increased capabilities of high context size might allow you to skimp on parameters and lead to a net decrease in compute. There's probably other strategies as well, but those are the ones I know of.


Ah! Interesting, I thought the capability was capped by parameters, but you're saying you can keep getting more capability from a fixed parameter size by continuing to train past what the chinchilla paper specifies. That's really cool


Not Chinchilla-optimal but inference-optimal. Chinchilla optimality relates to the training budget and is of interest to researchers who mainly produce demos. Inference optimality includes the inference costs and is of interest in real deployments to millions of users. It is worth paying more for training to reduce inference costs, so they probably went even further than Chinchilla.


Yeah, at this point it seems like you're just burning money if you're not rightsizing your parameters/corpus.


I think you mean GPT-4, since Chinchilla is a DeepMind project. But yes, I was also suspecting that, as it seems unlikely this was the full 175B-parameter model given such big improvements in speed and price.

In fact, given the pricing for OpenAI Foundry, that seems even more likely as this GPTTurbo model was listed along with two other models with much larger context windows of 8k and 32k tokens.


"Brockman says the ChatGPT API is powered by the same AI model behind OpenAI’s wildly popular ChatGPT, dubbed “gpt-3.5-turbo.” GPT-3.5 is the most powerful text-generating model OpenAI offers today through its API suite; the “turbo” moniker refers to an optimized, more responsive version of GPT-3.5 that OpenAI’s been quietly testing for ChatGPT." [0]

Chinchilla optimization is a technique which can be applied to existing models by anyone, including OpenAI. The chatGPT API is not based on GPT-4.

[0] https://techcrunch.com/2023/03/01/openai-launches-an-api-for...


I just meant chinchilla optimal in terms of the corrected scaling curves from the chinchilla paper. The original GPT-3 was way larger than it needed to be for the amount of data they put into it based on the curves from the chinchilla paper.


It's also worth noting that we don't know any specifics (parameters, training tokens) of GPT-3.5. Only for GPT-3 those numbers have been published.


It is a smaller model; it reveals this information if you ask it.


I switched from their gpt3 davinci model to the new chatgpt3.5-turbo and everyone who used the IRC bot was really disappointed. The chatgpt3.5-turbo is incredibly puritanical and refuses to answer a good fraction of the type of input bots on IRC get. Even innocuous stuff. I was excited about the system/user prompts and being able to easily add short term memory and contexts but if it only responds with, "Oh, it wouldn't be right for me to talk about that." it's not fun.


Like what?


You can run Whisper in WASM (locally) so no need to pay for the API, plus the bandwidth. It actually works surprisingly well: https://github.com/ggerganov/whisper.cpp


whisper.cpp has no GPU support. Models below medium aren't that good, and medium and large are pretty CPU intensive. A minute of audio on medium can take anything between 15 and 90 seconds to transcribe, when using 8 cores, while the service transcribes on the large model in less than 7 seconds.


For English only: this is so wrong! I actually didn't find that much significant improvement from Medium to Large. In many cases Medium was actually better for English than Large was. Where Large really excels in my experience is with super noisy or distorted audio. Same thing from Small to Medium: there's not much difference in text quality. For good-quality audio, Small is all you need. Medium is not much more accurate, and certainly the "value per unit time" is way higher with Small than with Medium. If you have very distorted audio you can try Medium or Large.

I agree that the API providing a super fast large is fantastic tho! But you can go far with the provided models. When did you last try Whisper.cpp? It's constantly updated and probably much better than it was a few months ago.

If you don't believe me (and you have a Mac) try out my free App that uses Whisper.cpp offline combined with Voice Activity Detection (Silero VAD) to reduce Whisper-hallucinating-words during silent / non-speech sections. It's really good! https://apps.apple.com/app/wisprnote/id1671480366


Last time I used it was a week ago, when I was comparing it to a Docker install of Whisper/torch.

Then again, I'm using it on very noisy audio recorded with a lavalier microphone while riding a bike.

For transcription tests of recordings from an answering machine medium was more than enough, not sure about small.

Also, for the recordings during a bike ride, sometimes medium is better than large.

All of this was with German-language audio.

I don't have a Mac, but your approach with using VAD is interesting. I'll see if I can preprocess my files.


Transcript of 20230302 170602.m4a (at 2023-03-02 1711.02).txt

   Hey friend, your use case sounds really interesting.
   Actually that's why I created this app initially.
   I really love riding around on my bike in the city
   and doing voice memo debriefs about whatever.
   I also like to do it walking around outside.
   And as you say, the trouble with that is wind distortion.
   Full stop.
   On a day where it's not too windy, it's not too bad.
   These models can totally pull the text from it.
   But the more distortion you have,
   the more of a disaster it is.
   And I don't know anything about the multilingual case,
   but for English, I definitely find that
   small is more than enough if you have good quality audio.
   Medium, you might wanna use it
   if there's some kind of distortion
   that's causing errors in the small.
   But if you have really good quality audio,
   even tiny is enough.
   I mean, it won't get some sort of rare words.
   So small is basically good enough for English anyway.
   Aligning with what you said,
   I remember seeing in the whisper paper
   that the performance actually decreases,
   the word error rate increases from the medium
   to the large model in the multilingual case,
   which is kind of interesting.
   So basically medium, I think, is all you really need.
   I think doing large, running large locally
   is probably a waste of time.
   But this doesn't apply to the OpenAI API case
   because they're running their own sort of special model.
   It's very fast.
   Plus they're kind of going to be retraining it
   so continuing to improve it over time.
   So obviously there's that, which is cool.
   I think basically I did extensive research
   and experiment with this,
   with trying to clean up audio for the transcription.
   And there's basically no way to do it.
   Like if you have a medium to bad level of noise
   that the transcription models can still work with,
   you're fine.
   Just go with that.
   But in that case,
   there's no point actually trying to denoise the sound first.
   That just seems to reduce the signal
   and it actually increases the word error rate.
   So just give them the raw distorted, windy audio
   and the models will do the best they can.
   You can't actually improve it, I found.
   I tried all kinds of different ways to process it
   and none of it actually improved it,
   including like the best possible denoiser I could find,
   which is the Facebook research denoiser.
   So my conclusion was that, okay,
   I found a sort of a fundamental physical limit
   and I think using denoising is really only good for humans.
   Like if you want to listen to the audio again,
   you don't want to hear all that wind probably.
   And for medium to bad,
   but not extreme levels of wind distortion
   or other kinds of noise distortion,
   you can use a denoiser like the Facebook research one
   and that will totally or nearly totally
   kind of reduce all that noise.
   But I basically decided that the only way
   to kind of get better quality audio
   or to get better quality transcripts,
   if you're doing it outside on a windy day,
   is not to go with software enhancement
   because it doesn't do anything, it doesn't achieve anything.
   I tried everything possible
   and nothing produced results in the extreme distortion case.
   So what I decided is that's basically a limit,
   physical limit and so the best way to do it I think
   is to change your microphone setup,
   have some sort of baffle around it,
   maybe one of those little microphones
   that can clip close to your throat or,
   I don't know, I'm no expert,
   but I think basically you can find a way
   to get better quality, less distorted audio
   outside by changing the microphone setup,
   adding one of those wind baffles or something.
   And I think that's basically all you can do essentially.
   And then once you have reduced
   a lot of that noise distortion,
   even if the resulting audio is still distorted,
   if it's not too bad, the models can be okay with it.

End of Transcript (created with WisprNote)


I'm not preprocessing the recording. The thing I mentioned with considering VAD, if that's what you're referring to, is that I should somehow trim the files.

In order to minimize my interaction with my phone during the bike ride, I press a button which records 1 minute of audio. If I know that I need more time, I press it again before the minute ends, this then starts a second recording in parallel which also lasts one minute. So I just have to press a button and can forget about it. This is because I noticed that I usually don't require more than one minute to record a thought, and if I have multiple, I can put them in multiple files.

But since my recordings then usually consist of 20 seconds of audio, the 30 seconds at the end are only silence (with wind and tire noise). Whisper splits the files into 30 second segments, and apparently tries to find voice in each segment, so the remaining one which has no voice causes Whisper issues, where it starts hallucinating. This is why I would like to trim the files.
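
For what it's worth, a naive trim of that trailing "silence" is easy to sketch in Python. This assumes pydub (which shells out to ffmpeg for m4a input) and a hypothetical file name; for wind and tire noise, a proper VAD like the Silero one mentioned elsewhere in this thread would be much more robust than a plain loudness threshold:

    from pydub import AudioSegment
    from pydub.silence import detect_nonsilent

    audio = AudioSegment.from_file("ride_memo.m4a")  # hypothetical file name
    # spans of audio louder than ~16 dB below the file's average loudness, between silences of >= 1 s
    spans = detect_nonsilent(audio, min_silence_len=1000, silence_thresh=audio.dBFS - 16)
    if spans:
        # keep everything up to half a second after the last detected speech
        audio[: spans[-1][1] + 500].export("ride_memo_trimmed.wav", format="wav")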

I now noticed that the service doesn't add punctuation and capitalization, so the funny thing is that I took that output and posted it into ChatGPT like this: "Correct the following: '[text from whisper]'", and it does an incredible job of fixing even words which Whisper erred on.

-

Whisper:

ich habe gestern erste tests mit open ai whisper gemacht um nozizen [sic!] zu transkribieren

[ Yesterday I did my first tests with open ai whisker to transcribe nozizen [sic!]. ]

es waren teilweise recht gute ergebnisse vor allem mit medium

[ there were some really good results, especially with medium ]

latsch [sic!] natürlich besser aber da sind die anforderungen zu hoch

[ latsch [sic!] better of course, but the demands are too high ]

wenn ich da einen server draus mache könnte ich mal eine zeit lang ausprobieren ob sich das lohnt

[ If I make a server out of it I could try it out for a while to see if it's worth it ]

auch für anrufe der anruf der antworten

[ also for calls the call of the answers ]

-

then ChatGPT:

Ich habe gestern erste Tests mit OpenAI's "Whisper" gemacht, um Notizen zu transkribieren. Die Ergebnisse waren teilweise recht gut, vor allem mit "Medium". "Large" funktioniert natürlich besser, aber die Anforderungen sind zu hoch. Wenn ich einen Server dafür bereitstelle, könnte ich mal für eine Zeit lang ausprobieren, ob sich das lohnt, auch für Anrufe und Antworten.

[ Yesterday I made first tests with OpenAI's "Whisper" to transcribe notes. The results were sometimes quite good, especially with "Medium". "Large" works better, of course, but the requirements are too high. If I provide a server for it, I could try it out for a while to see if it's worth it, also for calls and answers. ]

I'm sorry that this is in German, but I don't have anything in English I've been testing on.
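
A minimal sketch of that Whisper-then-ChatGPT cleanup chain, assuming the openai Python package and a hypothetical file name (not the poster's actual setup):

    import openai

    openai.api_key = "sk-..."

    with open("notiz.m4a", "rb") as f:
        raw = openai.Audio.transcribe(model="whisper-1", file=f)["text"]

    fixed = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Correct the following: '{raw}'"}],
    )["choices"][0]["message"]["content"]

    print(fixed)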


This is great! Thank you. We’re very similar actually: I also tried getting chat GPT to correct transcripts for errors but when I tried like 3 weeks ago it couldn’t manage. I just use voice memos app and let it run. I just talk and don’t think about the file. So I only have 1 file and my WisprNote app removes nearly all the non speech and passes it to whisper. I think there’s actually a voice memo setting on MacOS that will cut silence automatically but I don’t use it.


This looks like C++ rather than WASM. Am I misunderstanding something?


The C++ is compiled to WASM. You can look into [1] to see emscripten there.

[1] https://github.com/ggerganov/whisper.cpp/blob/master/CMakeLi...


So it is! Thanks for the pointer.


Maybe they meant to link to something like this:

https://github.com/ggerganov/whisper.cpp/pull/540

Web demo:

https://whisper.ggerganov.com/


Does anyone have a sense of how the Shopify integration works under the hood? Specifically, how were they able to have ChatGPT make a recommendation from their product database? Given the model cannot be fine-tuned and there is a 4000 token context limit, I'm guessing the only thing really possible is to have ChatGPT extract a few keywords, generate a query against their internal product search, and return the first result? Is there anything else that might be going on under the hood?


Probably the embeddings API. This guide is what helped me understand the concept https://github.com/openai/openai-cookbook/blob/main/examples...

tl;dr is that you can pre-process each chunk of your database and use embeddings to quickly look up which chunk is most similar to the user's query, and then prepend that chunk to the user's query before giving it to GPT, so that GPT has the relevant context to give an answer.
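
A rough sketch of that flow, assuming the openai Python package and a toy in-memory list of chunks (the cookbook linked above does the same thing against a proper vector store):

    import numpy as np
    import openai

    def embed(text):
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    chunks = ["Shipping policy: ...", "Return policy: ...", "Gift cards: ..."]  # hypothetical pre-chunked database
    chunk_vecs = [embed(c) for c in chunks]  # computed once, offline

    def answer(question):
        q = embed(question)
        sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in chunk_vecs]  # cosine similarity
        context = chunks[int(np.argmax(sims))]
        return openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "Answer using only this context:\n" + context},
                {"role": "user", "content": question},
            ],
        )["choices"][0]["message"]["content"]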


Interesting. Still pretty limiting but I can see creative ways of working around the limitations. Thanks for sharing!


Support for the ChatGPT endpoint now added to lambdaprompt[1]! (solves a similar problem as langchain, with almost no boilerplate!) Props to openai for making such a usable endpoint, was very easy to wrap.

Example code using the new function and endpoint:

    import lambdaprompt as lp
    convo = lp.AsyncGPT3Chat([{'system': 'You are a {{ type_of_bot }}'}])
    await convo("What should we get for lunch?", type_of_bot="pirate")
> As a pirate, I would suggest we have some hearty seafood such as fish and chips or a seafood platter. We could also have some rum to wash it down! Arrr!

(In order to use lambdaprompt, just `pip install lambdaprompt` and export OPENAI_API_KEY=...)

[1] https://github.com/approximatelabs/lambdaprompt


This is incredibly cheap. It makes you wonder how in the world they managed to make it 10x cheaper than davinci-003 while still being a better model. The world of robo-consulting is about to go insane.


It may just be that OpenAI corrected its 2020-era understanding of model size using 2022 insights from DeepMind: https://towardsdatascience.com/a-new-ai-trend-chinchilla-70b...

Seems model sizing, compression and quantization are still an art form, see also https://www.unum.cloud/blog/2023-02-20-efficient-multimodali...


yes turbo model admitted this when asked.


I've stopped using Google entirely and don't miss it for a second - something I wouldn't have thought possible a year ago - and it's pretty difficult to see how Google will survive this disaster.


How are you using it? In my experience asking factual questions lead to answers so inaccurate that you might as well not bother.

Are you prompting it differently to me, or do you have some strategy to filter out the BS?


I dunno what to say, but I’m also someone who very rarely gets factually incorrect data aside from numbers, dates and quotes.

I make a point to ask for a completion that I know won’t depend on an exact factoid.

If I do need an exact factoid, I have a tool I made myself based on this approach:

https://github.com/williamcotton/empirical-philosophy/blob/m...


Wow this is the best content I have seen on prompt engineering, it feels much more like computer science than the usual prompt engineering recipes. This formalism of analytic augmentation and higher order augmentations seems like it is ripe for some kind of library or dsl to build and evaluate and monitor these processes. But somehow I feel like these capabilities will become so standard that they will be incorporated into the official apis.

[edit] Just noticed it looks like you are working on just that. I will keep an eye on this! https://github.com/williamcotton/transynthetical-engine


Thanks for brightening up my day and giving me some more inspiration to work on these kinds of things!


This is advanced stuff. Thank you for writing it down.


I use it with documents I shred into sentences and embed with ada-002. This makes it spot on when talking about a given document. https://mitta.us/


This is seriously cool. Bots for the people!


Thanks! :) Been working on it a LONG time. Getting ready to launch...


How? By giving people answers to questions they didn’t know they have to ask to solve their problems.


A little while back I built a personal project that ingested the book "A Pattern Language" as embeddings using the OpenAI API – and a front-end in SwiftUI that allows me to 'converse' with the book. [1]

Version 0 was built using the original daVinci model. Last night it took literally five minutes to switch over to the new ChatGPT API – just a few changes to the request, including a new [message] array that replaces the old 'prompt' style. [2]
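
Roughly the shape of that change (a sketch, not the actual app code), going from a hand-rolled prompt against text-davinci-003 to the new messages array:

    import openai

    # before: completion endpoint with a hand-rolled prompt
    old = openai.Completion.create(
        model="text-davinci-003",
        prompt="You are a guide to 'A Pattern Language'.\nUser: What is a pattern?\nGuide:",
        max_tokens=200,
    )["choices"][0]["text"]

    # after: chat endpoint with a [message] array
    new = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a guide to 'A Pattern Language'."},
            {"role": "user", "content": "What is a pattern?"},
        ],
    )["choices"][0]["message"]["content"]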

It's a pretty nice instant upgrade for the experience. Much faster results, and the responses are a lot more interesting. Doing something similar with daVinci would take a fair amount of 'prompt engineering' and fine-tuning. Not to mention all the manual conversation-state handling.

1: https://twitter.com/dreamwieber/status/1622634396087107584?s... 2: https://twitter.com/dreamwieber/status/1631327342697250816?s...


Big news! Many apps will be integrating ChatGPT. I'm worried about AI-generated content flooding the search engines, making it harder to do in-depth research.


This is a good thing.

The future is curation and cultivation. We've been living in an age of information abundance and markets haven't adapted. The age of "crawl every website, index everything, and let people search it" is coming to an end. There is just too much content and too much of it is low quality. With or without AI.

This abundance problem isn't just a WWW problem. Movies, TV, music, podcasts, short form content, food, widgets, wibbles and wobbles all suffer from abundance these days. We are quickly exiting the age of supply chain driven scarcity and getting a marketplace flooded with options. Capitalism has delivered on basically everything it's promised, with some asterisks, and, if we don't give in to consumerism, we want for little and have everything we need at our fingertips.

I've personally opened up my pocket book to curation services. I know brands that I trust. I know services that reliably surface quality content. I suspect the next few decades are going to trend towards services that separate noise from signal - and I suspect AI is going to be a big part of that.


Why separate noise from signal? Why isn't AI-generated stuff seen just as valuable as human-written one? When you fill the world with plastic, you need to evolve to eat plastic. When you surround yourself with AI-produced stuff, then you should learn to respect it.

First of all, evaluating someone based on what they wrote or said or did, is nonsense.


I think we agree. Humans can generate noise and AI can generate signal.


Honestly, I've almost stopped Googling and have had personal success just relying on ChatGPT. It's pretty much taught me 3D animation and Blender to Unity workflows. Every time I wanted to know a Blender keyboard shortcut, or what some Blender property was, or how to do something in Blender, the forums and documentation were so sparse and outdated. I felt ChatGPT got me a lot closer much faster. Especially when it tells you how to learn concepts you didn't even know existed.

Google results in the meanwhile have just become a pile of SEO-optimized fluff, and it's hard to engineer the search query around that besides sticking "reddit" on the end of it.


If I'm Microsoft: I have Bing Search, but I don't need it the way Google needs Google Search, since I have tons of other revenue streams.

Hurting Google by speeding the proliferation of this is going to be really interesting to watch.


"As an AI language model, I cannot..."

They already ruined this program. More than 30% of the topics I discussed with it in the past it will now refuse to discuss, and not even all of them are politically charged. We've fallen straight down the slippery slope of censorship.


Does ChatGPT yet have a debug function to Show Its Work so to speak? I think this will be important in the future when it gets itself into drama, trouble, etc.. Probably also useful to prove how ChatGPT created something rather than being known as an opaque box.


I'm pretty sure any system built via linear regression or similar is an opaque box even to the most experienced researchers. For example: https://clementneo.com/posts/2023/02/11/we-found-an-neuron

These are massive functions with billions of parameters that evolved over millions of computing years.


Adding to that, the human brain is incredibly complex and performs billions of functions. If a person says to me, "I love you" I should be able to ask them why they said that but it would probably be unfair to expect a detailed answer including all their environmental and genetic inputs, many of which they may not be aware of.

If ChatGPT says it loves me, I not only expect the system to tell me why that was said but what steps brought the system to that conclusion. It is a computer or network of computers after all. Even if the system is continuously learning there should be some facets of reproducible steps that can be enumerated.

ChatGPT: "I love you"

Me: "debug last transaction, Hal."

Here is where I would expect an enumeration of all steps used to reach said conclusion. These steps may evolve/devolve over time as the system ingests new data but it should be possible to have it Think out loud so to speak. Maybe the output is large so ChatGPT should give me a link to a .tar file compressed with whatever it knows is my preferred compression.

[Edit] I accept that this may be hundreds of billions of calculations. I will wait the few minutes it takes to generate a tar file for me. It's good to get up and stretch the legs once in a while.


The steps would be billions of items and essentially be "I took this matrix and turned it into this matrix which I turned into..." It is kind of like asking a person why they love you and expecting them to respond with their entire genome and the levels of various hormones in their brain at the time they uttered each word.


>If a person says to me, "I love you" I should be able to ask them why they said that.

People are definitely able to ask other humans this question, but to the best of my knowledge, no one in history has ever received a perfectly truthful response.


in general, "i don't know" is a perfectly acceptable answer, and probably should be said more often.


I agree in general, but don't think it's a particularly effective answer to this specific question relationship-wise. Nor would it be particularly useful when coming from a powerful but biased AI.


It's been some time since I last looked into this topic, but my understanding is that linear regression is not a black box, as there exist methods that elucidate how the variables impact the response. On the other hand, neural networks are opaque. Again, it's been a while, so there may be ways to ascertain which inputs were used to generate the weights that led to the response. However, I am skeptical that these methods have the same level of mathematical rigor as those used in linear regression.


> These are massive functions with billions of parameters that evolved over millions of computing years.

This is a great way to put it!


You can prompt it to be more logical and have it expose its thoughts a bit more by asking it "Thinking step-by-step, <question>?" And it should respond with "1. <assumption> 2. <assumption> 3. <conclusion>" or something like that.

You'll never be able to get it to actually show its work though. That's just a hack to make it write more verbosely.
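
If you want to try that trick against the new API, a quick sketch (openai Python package assumed):

    import openai

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Thinking step-by-step, which weighs more: "
                              "a kilogram of feathers or a pound of steel?"}],
    )
    print(resp["choices"][0]["message"]["content"])  # typically an enumerated chain of steps, then a conclusion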


The classic "Text completion" API (as opposed to the new "Chat completion" one) seems to offer more flexibility: You have complete freedom in providing interaction examples in the prompt (as in https://github.com/artmatsak/grace/blob/master/grace_chatbot...) and are not limited to the three predefined chat roles. But the requests being 10x cheaper means that we'll have to find ways around those limitations :)


I find myself missing the golden age of Google, where it actually returned results and answers that you wanted, on subjects you were looking for. Even now, versus two years ago: I tried finding a snippet of a notice in a newspaper with specifics about a name change someone petitioned in California. I found it then, and had bookmarked it, but trying to find it again just turns up absolute garbage; thankfully I found the bookmark. I can go to ChatGPT and ask it about the best vegetable or fruit to grow in a several gallon Kratky setup; looking for the same sort of answer on Google returns absolute garbage.

I'll concede that LLMs like ChatGPT are the future, thanks to the NLU stuff from OpenAI and the dataset, but only the future of "agents", if you want to call it that. The "intelligence" exhibited is emergent from language itself, from the massive dataset it has trawled - our language and knowledge. But at the same time I surely hope that another AI winter doesn't come because of people over-promising and under-delivering. Or too much focus on LLMs themselves because of that "wow" factor - the same wow factor you got in the past, when search engines weren't garbage, if you knew how to use them and what their shortcomings were.


> The "intelligence" exhibited emergent from language itself, from the massive dataset it has trawled.

I concur. Intelligence does not come from the transformer architecture, or any specifics of the model. It comes from the language corpus. Human intelligence too, except for physical stuff. All our advanced skills come from language.

You take 300GB of text and put it through a randomly initialised transformer and you get ChatGPT. You immerse a baby in human language and it becomes a modern adult, with all our abilities. Without language, and that includes other humans and tech, we'd be just weaker apes.


My fear is that we'll turn the web into a wasteland where there's not much reason for content websites or even news sites to exist, so ChatGPT won't be as fresh because there will be fewer sources of info to scrape. We'll hit a point where GPT starts declining and all the search engines of old are in an old folks' home with Yahoo and Excite. What will we do then?


> Through a series of system-wide optimizations, we’ve achieved 90% cost reduction for ChatGPT since December; we’re now passing through those savings to API users. Developers can now use our open-source Whisper large-v2 model in the API with much faster and cost-effective results.

I'm really confused, I thought they were a non-profit. A non-profit to handle AI safety risks. Why does this read like a paragraph from any YC startup website that just raised their Seed round?


It looks like they transitioned to for-profit, more or less, in 2019.


Yeah, I can't get over how easily they changed their branding from non-profit AI safety to "let's take over Google with the new Bing".

They come off as greedy to me and might very well try to get everyone locked in in order to milk them with Microsoft backing.

That said, they execute well, build good products and everyone loves more money so who am I to judge.


So, the Web3 won't be blockchains and whatnot: it's actually custom generated content. I... don't really like it.


Web2 was about user-generated content. Web3 will be about companies owning the user-generated content realizing they're sitting on a goldmine - and using that content as training data for their DNN models.


That’s an astute observation.

(Apropos nothing: I expanded your comment into the following tweet https://twitter.com/ayewo_/status/1631060562393153536)


gpt-3.5-turbo is missing from Open AI's playground. For anyone looking to play with these models, we have now added them to Promptly playground at https://trypromoptly.com. https://twitter.com/ajhai/status/1631020290502463489 has a quick demo.


FYI and for others, typo in the URL.

Should be https://trypromptly.com/


Oops. Thanks for the correction.


Any ballpark pricing info you can share on Promptly?


We are still figuring out pricing. Is there an email I can reach you at? Would love to chat about your use case if you could send me an email at ajay[at]trypromptly.com


The quality of suggestions, forums, internet content and other such things just took a huge drop - this will create an internet E.L.E. in SPAM and empty content...


I wish Whisper offered speaker diarization. That would be a full game changer for the speech-to-text space.


whisperX has diarization.

https://github.com/m-bain/whisperX


Why do you need diarization? That's attributing speech to different speakers, right? What sort of use cases?


Transcribing interviews, meetings etc…


Did some research and seems there’s no reliable diarization method right now. They all have error rates around like 20%.


We hacked that together for https://paxo.ai — can be done!


So I had a question about how all this works under the hood. The GPT model is trained (on a massive dataset) and then deployed. How are they getting the additional product data from other sources like Instacart's retail partner locations and Shopify's store catalogs into it, so that it can output answers leveraging those? My understanding (perhaps incorrect) is that those weren't part of the dataset the model was initially trained on.

For example:

> Shop’s new AI-powered shopping assistant will streamline in-app shopping by scanning millions of products to quickly find what buyers are looking for

> This uses ChatGPT alongside Instacart’s own AI and product data from their 75,000+ retail partner store locations to help customers discover ideas for open-ended shopping goals


I would add additional `system` messages with information fetched as a function of the user's request.
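
Something like this, as a sketch (my_product_search is a hypothetical call into your own catalog/search index; openai Python package assumed):

    import openai

    def shopping_assistant(user_query):
        products = my_product_search(user_query)  # hypothetical: query your own product database first
        return openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a shopping assistant. Recommend only items from the catalog below."},
                {"role": "system", "content": "Catalog:\n" + "\n".join(products)},
                {"role": "user", "content": user_query},
            ],
        )["choices"][0]["message"]["content"]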


Superpower ChatGPT V2.3.0 is out https://www.reddit.com/r/OpenAI/comments/11ef8ea/superpower_...

- Sync all your chats locally on your computer (plus the ability to disable Auto Sync)
- Search your old chats (only works once your chats are fully synced; this is the only extension that can do this)
- Customize preset prompts
- Select and delete/export a subset of conversations
- Hide/show the sidebar
- Change the output language
- Search Prompt Library by Author (over 1500 prompts)
- Adding Prompt Categories (a work in progress)


that's a browser plugin, not an official feature set, for those who hadn't seen it before.


I'm a bit confused; what's the difference between this and the Azure OpenAI offering?


Azure will probably provide compliance (HIPAA, GDPR etc.) just like they do with their non-AI offerings.


wow just in time, i just made https://github.com/jackdoe/emacs-chatgpt-jarvis which is chatgpt+whisper but using local whisper and chatgpt-wrapper which is a bit clunky

since i integrated chatgpt with my emacs i use it at least 20-30 times a day

i wonder if they will charge me per token if i am paying the monthly fee


So cool! I really think this is an awesome future! Everyone has an AI assistant they can interact with via speech.


particularly if you have RSI from 20 years of typing :) with a touchpad that is designed to give you carpal tunnel


Is this sustainable, or will free content disappear from the web over time? Because ChatGPT does not generate revenue for the sources of content it uses.

OpenAI represents the opposite of open, responsible and non profit: it might as well be MS-AI


Is fine-tuning of the gpt-3.5-turbo model expected to be available at some point? I have some applications that would greatly benefit from this, but only if fine tuning is available.


One of the things I love about the API-ification of these LLMs is that they’re plug and play.

I built https://persona.ink against davinci knowing it didn’t give as good of results as ChatGPT but knowing I could swap the model out once 3.5 came out. Today is that day, going to swap out the prompt in the cloudflare worker and it should Just Work(tm)


Typo on your front page—"enfrocement" should be "enforcement."


Is davinci actually worse than chatGPT? I know it's worse as an assistant, but in my (admittedly brief) testing the performance was the same or better for tasks like summarization, sentiment analysis, etc.

I guess it's irrelevant now because everyone will use the one which is 10x cheaper.


Yes. Asking ChatGPT to assume a persona and rewrite content, it performs significantly better than davinci in my tests. I was able to get "good enough" with a lot of prompt engineering on davinci - but ChatGPT is a shoulder above with less investment in prompt engineering.

On the flip side - I get hand swatted by ChatGPT more frequently than davinci. Davinci's moderation filters don't really pick up on much, but ChatGPT will give me a lecture instead of a translation on a lot of occasions. There are many valid use cases for rewriting/editing content that involve graphic details that davinci will gladly handle and ChatGPT will give you a lecture about.

Human existence is messy. ChatGPT doesn't like the messy.


Famous last words?


Goodbye internet as we knew it


> Language models read text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., a or apple), and in some languages tokens can be even shorter than one character or even longer than one word.

Why should the Germans get a discount?


FIXED: needs to be a POST. Doh!

Can anyone get it to work? I get this error on everything I've tried:

  GET /v1/completions HTTP/1.1
  Host: api.openai.com
  Authorization: Bearer sk-xxx
  Content-Type: application/json
  Content-Length: 115

   {
    "temperature" : 0.5,
    "model" : "text-davinci-003",
    "prompt" : "just a test",
    "max_tokens" : 7
   }


  {
    "error": {
        "message": "you must provide a model parameter",
        "type": "invalid_request_error",
        "param": null,
        "code": null
    }
  }


That should be a POST not a GET
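
For reference, the same request as a POST, sketched with Python's requests (parameters copied from the snippet above):

    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/completions",
        headers={"Authorization": "Bearer " + os.environ["OPENAI_API_KEY"]},
        json={"model": "text-davinci-003",
              "prompt": "just a test",
              "max_tokens": 7,
              "temperature": 0.5},
    )
    print(resp.json())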


Please include your request body for debugging purposes.


The pricing is insanely cheap. Why would anyone pay $20/month for ChatGPT Plus when we are going to have dozens of chat applications based on the APIs?

It took me just a little more than an hour to create a basic cli chat application (https://github.com/marcolardera/chatgpt-cli). In the next days/weeks/months I think we will see an explosion of ChatGPT based applications...



Feels like people running small websites monetised with ads will get killed. Why go to a recipe website to search for healthy meals for your kids if you can ask ChatGPT in Instacart? :(


No more scrolling through a thousand lines of SEO optimization disguised as some deep heartfelt backstory before getting to the actual recipe? That sounds like a huge win to me.


My love for the perfect Tuna Salad Sandwich began when I was but a young IPython notebook. My lead developer would occasionally eat at the desk and crumbs would fall into the keyboard....


These small websites have been dead for years and replaced with SEO spam.


Except now Microsoft can hurt Google's ability to monetize it by stealing search traffic with people switching to ChatGPT.


"model": "gpt-3.5-turbo",

turbo isn't listed in Playground, but if you invoke the example curl command (note: /v1/chat/completions) in your terminal, it works.



> Dedicated instances can make economic sense for developers running beyond ~450M tokens per day.

450M tokens * $0.002/1K tokens = $900 per day. I wonder what the exact pricing structure is.

(edited for math)


> It is priced at $0.002 per 1k tokens

So it would be $900 per day


FYI: you can't sign up without an email address (which you have to verify) and then a phone number (which can't be a google voice / voip number).

So count me out.


Those restrictions seem intended to prevent people from creating a bunch of bot accounts that exploit the free tier. (Which is a common problem among free tiers for CI services and free online games)

I’m curious, what are some better alternative authentication methods to combat that problem than requiring a phone number?


Anything that isn't a massive privacy violation/security risk.

My phone number is none of their business.


Sorry if this is off topic, but is there an extension that will read ChatGPT's replies out loud as audio via a neural TTS voice, like AWS Polly?


I recommend Google Text To Speech. WaveNet voices are pretty good.


What is the developer experience of using OpenAI's Server-Sent Events endpoint from something else than their Python and Node.js libraries?

The SSE endpoint is required for use cases like chat so the end user doesn't have to wait until the whole reply has been generated.

I started implementing a simple SSE client on top of C#/.Net's HttpClient but it's harder than I first assumed.


We had to restream server events from openai -> our backend -> client. It was pretty simple.
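
For anyone implementing a client outside the official libraries: the wire format is plain SSE, one "data: {json chunk}" line per delta, terminated by "data: [DONE]". A sketch of what the client has to do, shown here with Python's requests just to illustrate the shape (any language's HTTP client works the same way):

    import json, os, requests

    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": "Bearer " + os.environ["OPENAI_API_KEY"]},
        json={"model": "gpt-3.5-turbo", "stream": True,
              "messages": [{"role": "user", "content": "Hello"}]},
        stream=True,
    )
    for line in resp.iter_lines():
        if line.startswith(b"data: "):
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"]
            print(delta.get("content", ""), end="", flush=True)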


ChatGPT API examples are missing (what to use instead of completions?) and also missing in playground. Hope they will add them soon.


The documentation was just updated: https://platform.openai.com/docs/guides/chat


Thank you.


We have added them to our playground at https://trypromptly.com if you want to check them out. https://twitter.com/ajhai/status/1631020290502463489 has a quick demo


Has OpenAI said anything (in their TOS/etc) about who owns the copyright/ip on the code that it generates?

I know it’s a legal minefield of a question, just curious if they have said “we won’t sue you if you copy/paste this code in your app” publically or anything.


Can somebody please clarify. Is the cost $0.002 per 1k tokens generated, read, or both?


Both, the API response includes a breakdown. In the best case 1 token = 1 word (for example "and", "the", etc). Depending on input, for English it seems reasonable to multiply the word count by about 1.3 to get a rough token count

This pricing model seems fair since you can pass in huge prompts and request a single word reply, or a few words that expect a large reply
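
If you want exact counts rather than the ~1.3x heuristic, OpenAI's tiktoken package exposes the tokenizer. A small sketch, assuming cl100k_base is the encoding used by gpt-3.5-turbo (that is what OpenAI's cookbook lists at the time of writing):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    prompt = "Write a haiku about bearer tokens."
    print(len(enc.encode(prompt)))  # prompt tokens; the response's `usage` field reports the rest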


Is there anything akin to creating Stable Diffusion embeddings where it can train a very discrete concept that takes up a few kilobytes and use that with the base model?

Such an approach could in theory make it so you spend a little upfront to train more complex concepts (read: concepts costing many tokens) and can subsequently reuse them cheaply, because you're using an embedding of the vectors for that complex concept instead, which may only take a single token.


Both the prompt and completion fall into the token count (this has been the case since the beginning of GPT-3)


I can confirm it is TOTAL tokens, this from the account/usage page:

gpt-3.5-turbo-0301, 2 requests 28 prompt + 64 completion = 92 tokens


I just pushed an update to the `chatgpt` NPM package with support for the official ChatGPT API: https://github.com/transitive-bullshit/chatgpt-api


I can think of a variety of uses for this, but they all involve confidential data. Their privacy policy doesn't mention what they do with the prompts/data we send to the API, but I presume it's reviewed and worked into the model.


On the page under Developer Focus it says:

> Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in


> And if Bitcoin has taught us anything, all these supposedly "foolproof" algorithms are written by cabbage-headed fools whose incompetence is superseded only by their arrogance.

Satoshi is incompetent and arrogant now? Just wow.


Has anyone made a ChatGPT-like app which both accepts and responds with voice input yet?


I have, but haven’t bothered to publish it. POC to use by me and my daughter. Works like Siri. Fun to play with. But the hallucinated responses made us stop using it pretty quickly. Needs more work to be of any use (integrate search engines, etc.).


> Works like Siri

you can also make it work _with_ siri. i get around this by proxying it through a sms service which is integrated with my bot via webhook. then use elevenlabs for TTS. sample siri query "hey siri tell leobgAi to check my finances"


there's voicegpt on Android that used the unofficial API or whatever, it had two way vocalization.


What I like is that I don't have to pay $20/month whether I use it or not.


I am happy that we designed our UI for the Fibery AI Assistant to be chat-ready - less re-work!

Looks like a super decent release, and the price cut makes it sane to use. The token limit is the same though, and that is not great for many use cases...


ELI5 What is a token?

Is it a word, question, letter, what? If I ask a question like... What is the capital of Canada? And it responds with 'Ottawa', how many tokens have I used there and how are they calculated?


Roughly speaking, words or word parts. Open AI has an explainer:

https://help.openai.com/en/articles/4936856-what-are-tokens-...

You can also check your input using their tokenizer: https://platform.openai.com/tokenizer

So, your example is ~9 tokens


So does the response count towards the token count?


It does, yeah. It's tokens "processed", meaning request & response. The API response will give you a total token count so you can keep track


Token cost is prompt + response. In the case of ChatGPT you don't know the full prompt, but they don't charge by tokens for that app.

In the API you need to tokenize your input and tokenize the output then add the counts together.


The pace at which they are releasing features and APIs is quite amazing.


Ok so can someone provide 10 steps to launching your own voice assistant?


Try asking chatGPT


Already integrated into my startup calling ChatGPT to find a good journalist who writes about startups:

https://jojogpt.valosan.com


I can't seem to find it mentioned but is there an unrestricted/uncensored mode for this? I'd love to have some fun with a few friends and hook it up in a matrix room for us



That's just a list of exploits that will be fixed as soon as they come to OpenAI's attention, if they haven't already been fixed. Is anyone actually committed to providing uncensored models as either paid services or open distributions?


Maybe try GPT-J, it's an open source model you can run locally. It's pretty massive though iirc


Hey, this is cool. Now that an official ChatGPT API is out maybe it's time for me to build that "Flavor text" Anki add-on I've been thinking about.


I always wondered if ChatGPT was somehow stateful. I assumed that it was not and the statefulness was simulated. Assumption validated.


I don’t believe this was ever in question. You can think of the model as a giant function that takes a list vectors as input and spits out a new vector. If you want the model to remember something, you have to include it in the list of input vectors for every request going forward.


It’s stateful in the web demo. But they do so by prepending chat history to new requests and automatically summarizing history once the model’s context window is exceeded.


I don't think they are doing summaries actually. Just cutting off the oldest parts of the conversation.


Summarizing would make sense however.


True


Wow not a single mention of Whisper this entire comment first page! I think Whisper is really cool: the large model can pull speech out of even heavily distorted (wind noise, clipping, etc) audio. I have a story to illustrate why running Whisper on your own locally is not so easy! Much easier to sign up to the OpenAI API.

In my research I found that actually pre-processing the audio to reduce noise (using the IMO best-in-class FB research "denoiser") actually increases WER. This was surprising! From a human perspective, I assumed bringing up the "signal" would increase accuracy. But it seems that, from a machine perspective, there's actually "information" to be gleaned from the heavily distorted noise part of the signal. To me, this is amazing because it reveals a difference in how machines vs humans process. The implication is that there is actually speech signal that is inside the noise, as if voice has bounced off and interacted with the noise source (wind, fan, etc), and altered those sounds, left its impression, and that this information is then able to be utilized and contributes to the inference. Incredible!

With whisper: I started with the standard python models. They're kind of slow. I tried compiling python into a single binary using various tools. That didn't work. Then I found whisper.cpp--fantastic! A port of whisper to C++ that is so; much; faster. Mind blowing speed! Plus easy compilation. My use case was including transcription in a private, offline "transcribe anything" MacOS app. Whisper.cpp was the way to go.

Then I encountered another problem. What the "Whisperists" (experts in this nascent field, I guess) call "hallucination". The model will "hallucinate". I found this hilarious! Another cross-over of human-machine conceptual models, our forever anthropomorphizing everything effortlessly. :)

Basically hallucination includes: feed Whisper a long period of silence, and the model is so desperate to find speech that it will infer (overfit? hallucinate?) speech out of the random background signal of silence / analog silence / background noise. Normally this presents as a loop of repeats of the previous accurately transcribed phrase. Or, with smaller models, some "end-of-youtube-video" common phrases like "Thank You!" or even "Thanks for Watching". I even got (from one particularly heavily distorted section, completely inaccurately) "Don't forget to like and subscribe!" Haha. But the larger models produce fewer hallucinations, and fewer generic "oh-so-that's-what-your-dataset-was!" hallucinations. But they do still hallucinate. Especially during silent sections.

At first, I tried using ffmpeg to chop the audio into small segments, ideally partitioned on silences. Unfortunately ffmpeg can only chop it into regular size segments, but it can output silence intervals, and you can chop around those (though not "online" / in real time, as I was trying to achieve). Removing the silent segments (even with the imperfect metric of "some %" of average output signal magnitude--sorry for my terminology, I'm no expert in DSP/audio) drastically improved Whisper performance. Suddenly it went from hallucinating during silent segments to perfect transcripts.

The other problem with silent segments is the model gets stuck. It gets "locked up" (spinning beach ball, blue screen of death style--I don't think it actually dies, but it spends a disproportionately long time on segments with no speech). Like I said before, it's so cute that it's so desperate to find speech everywhere; it tries really hard, and works its little legs off during silence, but to no avail.

Anyway, moving on to the next problem: the imperfect metric of silence. This caused many issues. We were chopping out quieter speech. We were including loud background noise. Both these things caused issues: the first obvious, the second, the same as we faced before: Whisper (or Whisper.cpp) would hallucinate text into these noise segments.

At last, I discovered something truly great! VAD. Voice Activity Detection is another (normally) AI technique that allows segmenting audio around voice segments. I tried a couple Python implementations in standard speech toolkits, but none were that good. Then I found Silero VAD: an MIT licensed (for some model versions), AI VAD model. Wonderful!

Next problem was it was also in Python. And I needed it to be in C++. Luckily there was a C++ example, using ONNX runtime. (I had no idea any of these projects or tools existed mere weeks ago, and suddenly I'm knee deep!). There were a few errors, but I got rid of the bugs, and had a little command line tool from a minimal C++ build of ONNXruntime / Protobuf-Lite and the model. Last step was the ONNX model needed to be converted to ORT format. Luckily there's a handy Python script to do this inside the Python release of ONNXruntime. And, now, the VAD was super fast.

So I put all these pieces together: ffmpeg, VAD, whisper.cpp and made a MacOS app (with the correct signing and entitlements of course!) to transcribe English text from any input format: audio or video. Pretty cool, right?
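
For anyone who wants the same VAD-then-Whisper idea in plain Python rather than the C++/ONNX route, a rough sketch (this assumes the snakers4/silero-vad torch.hub packaging and the helper order its README documents; the file name is hypothetical):

    import torch

    model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
    get_speech_timestamps, _, read_audio, _, collect_chunks = utils

    wav = read_audio("memo.wav", sampling_rate=16000)                # hypothetical input file
    speech = get_speech_timestamps(wav, model, sampling_rate=16000)  # [{'start': ..., 'end': ...}, ...]
    voiced_only = collect_chunks(speech, wav)                        # concatenate just the speech samples
    # ...save voiced_only back to a wav and hand it to whisper / whisper.cpp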

Anyway, running Whisper on your own locally is not so easy! Much easier to sign up to the OpenAI API.

MacOS app using Whisper (C++) and VAD--conveniently called WisprNote heh :) https://apps.apple.com/app/wisprnote/id1671480366


Thank you very much for this detailed walk through.


You're welcome!


Hey, nice app. Can you add support for non-English transcription/translation, though?


Thanks! So, OK, if you write a review and provide your honest feedback about this on the App store, I will definitely consider doing that! Sound like a bad idea? :)


Done


Thank you sir! I will look for it. Stay tuned for updates! I might just consider putting that in soonish :) But I might not. I don't know. Can't guarantee anything about it right now. Thank you for asking and telling me about this!


Once everybody implements this API, voice recognition will no longer be an innovative feature.

OpenAI is commoditizing AI features.


Could someone explain what the point of "Bearer" is in this authorization header?

"Authorization: Bearer $OPENAI_API_KEY"


It's the bearer token authorization method. Pretty standard nowadays for many APIs.

https://swagger.io/docs/specification/authentication/bearer-...


API keys have been around for a long time without needing the prefix. I could understand the Bearer prefix when using JWT-style tokens. I could also see using it if there were indeed an Oauth flow involved. But in this case just seems like a nuisance.


As for why it's "Bearer", here's ChatGPT's answer:

> The term "Bearer" is commonly used in the context of securities and financial instruments to refer to the person who holds or possesses a particular security or asset. In the case of OAuth 2.0, the bearer token represents the authorization that a user has granted to a client application to access their protected resources.

> By using the term "Bearer" in the Authorization header, the OAuth 2.0 specification is drawing an analogy to the financial context where a bearer bond is a type of security that is payable to whoever holds it, similar to how a bearer token can be used by anyone who possesses it to access the protected resource.


That doesn't seem very compelling. And these aren't even JWT-style tokens which would make it a bit more understandable.


Bearer indicates the type of credential being supplied in the Authorization header. Bearer tokens are a type of credential, introduced in RFC6750 [0]. Essentially the OpenAI api key you’re using is a form of bearer token, and that’s why the Bearer type should be included there.

Other authentication methods (like username/password or “Basic”) use the Authorization header too, but specify “Authorization: Basic <base64 encoded credentials>”.

[0] https://www.rfc-editor.org/rfc/rfc6750


Does it mostly just mean that, for non-JWT-style tokens, the same string essentially serves as both a "username" and a "password"?



Will we be able to jailbreak it and use that output instead? A developer/hackerman mode would be awesome.


I'm not familiar with typical pricing but Whisper API at $0.006 / minute seems absurdly cheap!


I did a lot of research into this about 6 months ago, and the best price I could find/negotiate from the competition was 0.55/hr which included multi thousand dollar upfront commitments. This is 0.36/hr, and if you do a bit of setup work yourself you can bring it to about 0.09/hr. OpenAI offering hosted Whisper is a really good deal, and if you find it to be good for your application, then spending the time to host it yourself is perfect validation.


i'm pretty excited to have access to the system prompt, that's going to be huge i think.


Couldn’t find docs or references to the Whisper API. Anyone had a direct link they could share?



Awesome. Thank you!


this is so much cheaper than a fine-tuned model; it would make sense to try and use a multi-shot prompt in the 3.5-turbo model. plus, there's the hundreds/thousands of training items that you wouldn't need to create....fml


The ChatGPT API is cheaper than a fine-tuned babbage model.


How does the tokens thing work? Is it words of input plus words of output?


When will a YouTube transcriber based on this appear in the wild?


Why? YouTube already transcribes videos and that transcription is available via the video's official page, or via a 3rd party service like https://youtubetranscript.com/


Is there some sample code using these APIs? I want to use them.


literally just ask chatgpt. it will walk you through it in your language of choice.


can't you do whisper stuff for free already?


You can. You're just paying for compute and having it managed. Here's price estimates for 1,000 hours of audio on GCP: https://www.assemblyai.com/blog/how-to-run-openais-whisper-s...

For reference, from OpenAI it would be $360 and it's the large-v2 model.


Whisper large is a bit trickier to self-host, and the faster inference may be useful for certain applications.


Agree on that! Whisper large has big needs. But I didn't find the quality for English to be better than Medium. It just took longer. For most cases where audio is good quality, Small is all you need. Not much different to Medium. Only for really distorted (windy, loud background) audio were Medium and Large really good. But all models will fail beyond a certain point of extreme distortion.

If you don't believe me or want to know more check out my free app that uses Whisper Small, and (Whisper Tiny for Turbo mode): https://apps.apple.com/app/wisprnote/id1671480366

It uses VAD (voice activity detection) to reduce increased WER during silent or non-speech sections, and it's really fast! Runs locally on M1 just fine.


Wasn't there an API before?


ChatGPT is unreal. It's not artificial intelligence, it's kind of a supernatural intelligence.


What does Whisper do?


Right on time.

Google's Speech-to-Text is $0.024 per minute ($0.016 per minute with logging) with 60 free minutes per month. Files below 1 minute can be posted to the server, anything longer needs to be uploaded into a bucket, which complicates things, but at least they're GDPR compliant.

Whisper is $0.006 per minute with the following data usage policies

- OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

- Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

I've been using Whisper on a server (CPU only) to transcribe recordings made during a bike ride with a lavalier microphone, so it's pretty noisy due to the wind and the tires and Whisper was better than Google.

Plus, Whisper, when used with `response_format="verbose_json"`, outputs the variables `temperature`, `avg_logprob`, `compression_ratio`, `no_speech_prob` which can be used very effectively to filter out most of the hallucinations.
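
A sketch of that filtering, assuming the openai Python package; the thresholds here are made up and need tuning on your own recordings:

    import openai

    with open("ride.m4a", "rb") as f:
        result = openai.Audio.transcribe(model="whisper-1", file=f,
                                         response_format="verbose_json", language="de")

    kept = [seg["text"] for seg in result["segments"]
            if seg["no_speech_prob"] < 0.5 and seg["avg_logprob"] > -1.0]
    print(" ".join(kept))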

A one minute file which transcribes in 26 seconds on a CPU is done in 6 seconds via this service. Another one minute file with a lot of "silence" needs around 56 seconds on a CPU and was ready in 4.3 seconds via the service. "Silence" means that maybe 5 seconds of the file contain speech while the rest is wind and other environmental noises. Another relatively silent one went from 90 seconds down to 5.4. On the CPU I was using the medium model, while the service is using large-v2.

A couple of days ago I posted an example to a thread [0], where I was getting the following with Whisper

---

00:00.000 --> 00:05.000 Also temperaturmäßig ist es recht gut. [So temperature wise, it's pretty good.]

00:05.000 --> 00:09.000 Der eine hat 12 Grad, der andere 10. [One has 12 degrees, the other 10. (I have two temperature sensors mounted on the bike, ESP32 streaming the data to the phone via BLE)]

00:09.000 --> 00:12.000 Also sagen wir mal, 10 Grad. [So let's say 10 degrees.]

00:14.000 --> 00:19.000 Es ist bewölkt und windig. [It's cloudy and windy.]

00:20.000 --> 00:24.000 Aber irgendwie vom Wetter her gut. [But somehow from the weather it's good.]

00:24.000 --> 00:31.000 Ich habe heute überhaupt nichts gegessen und sehr wenig getrunken. [I ate nothing at all today and drank very little.]

00:54.000 --> 00:59.000 Vielen Dank für's Zuschauen! [Thanks for watching!] <-- hallucinated

---

While Google was outputting

"Also temperaturmäßig es ist recht gut, der eine hat 12° andere 10. Es ist angemalte 10 Grad. Es ist bewölkt und windig, aber er hat sie vom Wetter her gut, ich wollte überhaupt nichts gegessen und sehr wenig getrunken."

["So temperature-wise it's pretty good, one has 12° other 10. It's painted 10 degrees. It's cloudy and windy, but he has it good from the weather, I did not want to eat anything at all and drank very little."]

---

Apart from the hallucinated line, Whisper got everything correct, and the hallucinated line was able to be discarded due to the variables like `avg_logprob`.

[0] https://news.ycombinator.com/item?id=34877020#34880531


man why isn't this in dark mode


General question: is this the next way we'll manage to destroy the planet?

Imagine in the near future that having a slightly better, slightly more up to date LLM is a major competitive advantage. Whether that is between companies or nation-states doesn't really matter. So now all of those recently-idled GPUs will be put to use training and re-training ever bigger and more current models, once again sucking down electricity with no limit.

We're not there yet; there are too many ways to improve things without burning a country's worth of electricity re-training. But is it coming?


OpenAI released a ChatGPT API while reducing the cost by 10x.

For those claiming OpenAI is for profit: Why would OpenAI do this if they were fixated on making money?

Also, while I wish OpenAI released the code for ChatGPT, I applaud OpenAI for actually making their AI model available, to everyone, right now.

* Google hyped their Bard chatbot...but where is it?

* Facebook took down Galatica.

* Even Bing Chat has a waitlist


> For those claiming OpenAI is for profit: Why would OpenAI do this if they were fixated on making money?

Reducing costs by 10x may very well increase usage by more than 10x + it makes it even more difficult for competition to come in and undercut them.


>For those claiming OpenAI is for profit: Why would OpenAI do this if they were fixated on making money?

Silicon Valley companies have for the past 25 years focused on getting as many users as possible to increase valuation in the hope of getting a $100 billion exit. They don't care about current or near future profitability.

However I agree that OpenAI is getting far too much hate. Their goal of bringing openness to AI made sense in 2015 when one American company (Google) was dominating the field.

However now there are plenty of other companies, countries and open source organizations doing advanced AI research.


If OpenAI wanted to exit, Microsoft would be happy to buy them immediately for tens of billions. They're already valued at $30B. Why haven't they?


> For those claiming OpenAI is for profit: Why would OpenAI do this if they were fixated on making money?

To get lock-in from devs before competitors can enter the market, and to starve any would-be smaller competitors before they can raise money/gain traction.


This is what 'extinguish' looks like in the new EEE strategy from Microsoft that I have described before, years ago [0]: competitors with paid offerings are unable to compete with free, since OpenAI's pricing model is now close to free and their competitors cannot raise their prices.

Since Microsoft can foot the bill for the Azure infrastructure, there is going to be little room for anyone to seriously compete against OpenAI on price, API and features, unless it is completely free and open source, like Stability AI.

[0] https://news.ycombinator.com/item?id=28324999


please... the past ten years is a story of companies losing money to get a monopoly, and you are still asking us why they would do that?

let's not forget the shift of narrative that "open" AI made, from their name, their marketing and use of open source, to their move to a commercial subsidiary of Microsoft. let's also not forget that they totally avoided discussing copyright and the crawling of data sources to extract knowledge from someone else's property. the only comparable things I can see today which avoided so much scrutiny while being highly sensitive are ICOs in crypto and Theranos in biotech.

I hope open source wins this battle.


OpenAI never promised to be open source.

They promised to create a lab where the benefits of AI would accrue to the people, not just existing tech giants.

They did that. They created an API where individuals and small businesses alike can use cutting-edge AI technology. You have to pay; that's the only catch.

I don't see a DeepMind API. I don't see an Anthropic API. I don't see a Google Bard API. I don't see a FB Galactica API.

I hope OpenAI wins this battle.



