GPT-4 API General Availability (openai.com)
763 points by mfiguiere on July 6, 2023 | 546 comments



Promote and proliferate local LLMs.

If you use GPT, you're giving OpenAI money to lobby the government so they'll have no competitors, ultimately screwing yourself, your wallet, and the rest of us too.

OpenAI has no moat, unless you give them money to write legislation.

I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU. The future is, at least, interesting.


Can you elaborate on "scary smart and fast"?

It's been a month or two since I last tried, but the results were depressingly slow and useless for more or less every task I threw at them.

Every time a model is claimed to be "90% of GPT-3" I get excited and every time it's very disappointing.

(On that note, after using GPT-4, GPT-3 now seems disappointing almost every time I interact with it.)


Different quantizations can give you a big speedup if you've had "depressingly slow" issues. Even the slowest ones (that fit in RAM) will run at basically interactive speed, not instant, but also not "email speed". I have a laptop with a 2018 CPU and I'm working with them just fine.
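For example (a sketch, not the only way): with llama.cpp you can requantize a ggml model to a smaller/faster format in one step; paths here are placeholders for whatever you have locally:

  ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

q4_0 is small and fast; the newer k-quants (q4_K_M etc.) trade a little speed for noticeably better output in similar RAM.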

Text generation style instead of chat style is another avenue that makes the feedback time not so annoying for a developer.

At 100ms/token, it's faster than most people type, I think. That's what you might get on an old laptop with a 7B model.

There's a useful leaderboard here to help you pick a model: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

It really depends on your task, lots and lots of natural language type tasks give great results, the models seem to have extensive knowledge of many fields. So for some kinds of Q&A bot (technical or not), for copy blurbs, for fiction, game NPCs, etc, the models (especially 13B and up) can be breathtaking, even more so considering they run on bottom-dollar consumer hardware (I paid $250 for the laptop I'm developing on).

There are of course some things that neither the local LLMs nor GPT4 can do, like create useful OpenSCAD models :)

Things keep getting better, newer quantization methods give you more smarts in the same amount of RAM at basically the same speed -- the models are getting better, there are more permissively licensed ones now.


Whaaaaat, how are you getting 100ms per token on a 5 year old potato without a graphics card?

Like, not vaguely hand wavey stuff, specifically, what model and what inference code?

I get nothing like that performance for the 7B models, forget the larger models, using llama.cpp on a PC without an Nvidia GPU.


I'm running TheBloke's wizard-vicuna-13b-superhot-8k.ggmlv3 with 4-bit quantization on a Ryzen 5 that's probably older than OP's laptop.

I get around 5 tokens a second using the webui that comes with oobabooga using default settings. If I understand correctly, this does not get me 8k context length yet, because oobabooga doesn't have NTK-aware scaled RoPE implemented yet.

Using the same model with the newest kobold.cpp release should provide 8k context, but runs significantly slower.
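(For the curious: as I understand the NTK-aware trick, instead of compressing positions like plain interpolation it stretches the rotary base, so the high-frequency components that encode short-range order are barely touched. A rough sketch, with the alpha-scaling formula from the original proposal and names of my own choosing:

  import numpy as np

  def rope_inv_freq(dim, base=10000.0, alpha=1.0):
      # NTK-aware scaling: grow the base instead of shrinking positions
      base = base * alpha ** (dim / (dim - 2))
      return base ** (-np.arange(0, dim, 2) / dim)

Larger alpha stretches the usable context further.)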

Note that this model is great at creative writing, and sounding smart when talking about tech stuff, but it sucks horribly at stuff like logic puzzles or (re-)producing factually correct in-depth answers about any topic I'm an expert in. Still at least an order of magnitude below GPT4.

The model is also uncensored, which is amusing after using GPT4. It will happily elaborate on how to mix explosives and it has a dirty mouth.

Interestingly, the model speaks at least half a dozen languages much better than I do, and is proficient at translating between them (far worse than DeepL, of course). Which is mindblowing for an 8GByte binary. It's actual black magic.


"Note that this model is great at creative writing"

Could you elaborate on what you mean by that, like, are you telling it to write you a short story and it does a good job? My experiments with using these models for creative writing have not been particularly inspiring.


Yes, having the model write an entire short story or chapter in one go doesn't work very well. It excels if you interact closely with it.

I tested it to create NPCs for fantasy role playing games. I think it's the primary reason kobold.cpp exists (hence the name).

You give it an (ideally long, detailed) prompt describing the character traits of the NPCs you want, and maybe even add back-and-forth dialogue with other characters to the prompt.

And then you just talk to those characters in the scene you set.

There's also "story mode", where you and the model take turns writing a complete story, not only dialogue. So both of you can also provide exposition and events, and the model usually only creates ~10 sentences at a time.

There's communities online providing extremely complex starting prompts and objectives (escape prison, assassinate someone at a party and get away, etc.) for the player, and for me, the antagonistic ones (the model has control over NPCs that don't like you) are surprisingly fun.

Note that one of the main drivers of having uncensored open source LLMs is people wanting to role-play erotica with the model. That's why the model that first had scaled RoPE for 8k context length is called "superhot" - and the reason it has 8K context is that people wanted to roleplay longer scenes.


This is exactly a case in point for why people decide to pay OpenAI instead of rolling their own. I'm non-technical but have set up an image gen app based on a custom SD model using diffusers, so I'm not entirely clueless.

But for LLMs I have no idea where to start quickly. Finding a model on a leaderboard, downloading and setting it up, then customising and benchmarking it is way too much time for me; I'll just pay for GPT-4 if I ever need to, instead of chasing and troubleshooting to get some magical result. It'll be easier in the future, I'm sure, when an open model emerges as the SD 1.5 of LLMs.


I've found https://gpt4all.io/ to be the fastest way to get started. I've also started moving my notes to https://llm-tracker.info/ which should help make it easier for people getting started: https://llm-tracker.info/books/howto-guides/page/getting-sta...


Here is a short test of a 7B 4bit model on an intel 8350U laptop with no AMD/Nvidia GPU.

On that laptop CPU from 2017, using a copy of llama.cpp I compiled 2 days ago (just "make", no special options, no BLAS, etc):

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "A short test for Hacker News:"

  llama_print_timings:      sample time =    19.12 ms /    36 runs   (    0.53 ms per token,  1882.65 tokens per second)
  llama_print_timings: prompt eval time =   886.82 ms /     9 tokens (   98.54 ms per token,    10.15 tokens per second)
  llama_print_timings:        eval time =  5507.31 ms /    35 runs   (  157.35 ms per token,     6.36 tokens per second)
and a second run:

  ./main -m models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin -n 128 -s 99 -p "Sherlock Holmes favorite dinner was "

  llama_print_timings:      sample time =    54.37 ms /   102 runs   (    0.53 ms per token,  1875.93 tokens per second)
  llama_print_timings: prompt eval time =   876.94 ms /     9 tokens (   97.44 ms per token,    10.26 tokens per second)
  llama_print_timings:        eval time = 16057.95 ms /   101 runs   (  158.99 ms per token,     6.29 tokens per second)
At 158ms per token, if we guess a word is 2.5 tokens, that's about 151 words per minute, much faster than most people can type. On a $250 laptop. Isn't the future neat?

The code I was running: https://github.com/ggerganov/llama.cpp

and the model: https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML

There are other models that may perform better, I'm going to be doing a lot of screwing around with OpenLLaMA this weekend.


I'm on a thinkpad with a 2016 CPU (i5-7300U) running ubuntu.

I don't know anything so I left default settings.

I get about 450ms/t with airoboros-7b and 350ms/t with orca-mini-3b.

edit: with oobabooga webui


How are you running inference? GPU or CPU? I'm trying to use GPT4All (ggml-based) on 32 cores of E5-v3 hardware and even the 4GB models are depressingly slow as far as I'm concerned (i.e. slower than the GPT4 API, which is barely usable for interactive work). I'd be much obliged if you could point me at a specific quantized model on HF that you think is "fast" and I'll download it and try it out.


In terms of speed, we're talking about 140t/s for 7B models, and 40t/s for 33B models on a 3090/4090 now.[1] (1 token ~= 0.75 words.) It's quite zippy. llama.cpp now performs close to that on Nvidia GPUs (but they don't have a handy chart), and you can get decent performance on 13B models on M1/M2 Macs.

You can take a look at a list of evals here: https://llm-tracker.info/books/evals/page/list-of-evals - for general usage, I think home-rolled evals like llm-jeopardy [2] and local-llm-comparison [3] by hobbyists are more useful than most of the benchmark rankings.

That being said, personally I mostly use GPT-4 for code assistance, so that's what I'm most interested in, and the latest code assistants are scoring quite well: https://github.com/abacaj/code-eval - a recent replit-3b fine-tune tops the human-eval results for open models (as a point of reference, GPT-3.5 gets 60.4 on pass@1 and 68.9 on pass@10 [4]) - I've only just started playing around with it since replit model tooling is not as good as the llamas' (doc here: https://llm-tracker.info/books/howto-guides/page/replit-mode...).

I'm interested in potentially applying reflexion or some of the other techniques that have been tried to even further increase coding abilities. (InterCode in particular has caught my eye https://intercode-benchmark.github.io/)

[1] https://github.com/turboderp/exllama#results-so-far

[2] https://github.com/aigoopy/llm-jeopardy

[3] https://github.com/Troyanovsky/Local-LLM-comparison/tree/mai...

[4] https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder


> https://github.com/turboderp/exllama

Is exllama an alternative to llama.cpp?


llama.cpp focuses on optimizing inference on a CPU, while exllama is for inference on a GPU.


Thanks. I thought llama.cpp got CUDA capabilities a while ago? https://github.com/ggerganov/llama.cpp/pull/1827


Oh it seems you're right, I had missed that.

As far as I can see llama.cpp with CUDA is still a bit slower than ExLLaMA but I never had the chance to do the comparison by myself, and maybe it will change soon as these projects are evolving very quickly. Also I am not exactly sure whether the quality of the output is the same with these 2 implementations.


Until recently, exllama was significantly faster, but they're about on par now (with llama.cpp now even pulling ahead on certain hardware or with certain compile-time optimizations).

There are a couple of big differences as I see it. llama.cpp uses `ggml` encoding for their models. There were a few weeks where they kept making breaking revisions, which was annoying, but it seems to have stabilized and now also supports more flexible quantization w/ k-quants. exllama was built for 4-bit GPTQ quants (compatible w/ GPTQ-for-LLaMA, AutoGPTQ) exclusively. exllama still has an advantage w/ the best multi-GPU scaling out there, but as you say, the projects are evolving quickly, so it's hard to say. It has a smaller focus/community than llama.cpp, which also has its pros and cons.

It's good to have multiple viable options though, especially if you're trying to find something that works best w/ your environment/hardware, and I'd recommend anyone give HEAD checkouts of both a try and see which one works best for them.


Thank you for the update! Do you happen to know if there are quality comparisons somewhere, between llama.cpp and exllama? Also, in terms of VRAM consumption, are they equivalent?


ExLlama still uses a bit less VRAM than anything else out there: https://github.com/turboderp/exllama#new-implementation - this is sometimes significant since from my personal experience it can support full context on a quantized llama-33b model on a 24GB GPU that can OOM w/ other inference engines.

oobabooga recently did a direct perplexity comparison against various engines/quants: https://oobabooga.github.io/blog/posts/perplexities/

On wikitext, for llama-13b, the perplexity of a q4_K_M GGML on llama.cpp was within 0.3% of the perplexity of a 4-bit 128g desc_act GPTQ on ExLlama, so basically interchangeable.
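(For anyone unfamiliar: the perplexity being compared is just the exponentiated average negative log-likelihood over the test set, lower is better:

  PPL = exp( -(1/N) * sum_i log p(token_i | token_1 .. token_{i-1}) )

so a 0.3% difference is well inside "you'd never notice in practice" territory.)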

There are some new quantization formats being proposed like AWQ, SpQR, SqueezeLLM that perform slightly better, but none have been implemented in any real systems yet (the paper for SqueezeLLM is the latest, and has comparison vs AWQ and SpQR if you want to read about it: https://arxiv.org/pdf/2306.07629.pdf)



Thank you.


Those GPUs are $1200 and upwards. This is equivalent to 20,000,000 tokens on GPT-4. I don't think I will ever use that many tokens for my personal use.


I agree that everyone should do their own cost-benefit analysis, especially if they have to buy additional hardware (used RTX 3090s are ~$700 atm), but one important thing to note for those running the numbers is that all your tokens need to be resubmitted for every query. That means that if you end up using the OpenAI API for long-running tasks like, say, a code assistant or pair programmer, with an avg of 4K tokens of context, you will pay $0.18/query, or hit $1200 at about 7000 queries. [1] At 100 queries a day, you'll hit that in just over 2 months. (Note, that is 28M tokens. In general tokens go much faster than you think. Even running a tiny subset of lm-eval against it will use about 5M tokens.)
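Rough arithmetic behind that $0.18/query, assuming GPT-4 8K pricing of $0.03/1K prompt tokens and $0.06/1K completion tokens:

  4,000 prompt tokens     x $0.03/1K = $0.12
  1,000 completion tokens x $0.06/1K = $0.06
                                       -----
                                       $0.18 per query

  $1200 / $0.18 ≈ 6,700 queries, i.e. ~27M prompt tokens resubmitted, in line with the 28M figure above.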

If people are mostly using their LLMs for specific tasks, then using cloud providers (Vast.ai and Runpod were cheapest last time I checked) can be cheaper than dedicated hardware, especially if your power costs are high. If your needs are minimal, Google Colab offers a free tier with a GPU w/ 11GB of VRAM, so you can run 3B/7B quantized models easily.

There are reasons of course irrespective of cost to run your own model (offline access, fine-tuning/running task specific models, large context/other capabilities OpenAI doesn't provide (eg, you can run multi-modal open models now), privacy/PII, BCP/not being dependent on a single vendor, some commercial or other non-ToS allowed tasks, etc).

[1] https://gptforwork.com/tools/openai-chatgpt-api-pricing-calc...


I think Falcon Instruct is considered pretty good, but if your expectations are set by GPT-4 it still won't compare.


Save for coding they've been pretty good in my experience.

There's definitely some prompt magic OpenAI does behind the scenes that beats the raw style local LLMs usually go for. With proper prompting you can get ChatGPT-like answers.


Running an LLM locally and paying for access to OpenAI are two separate concerns.

But to address both: is it very relevant what LLM you use right now? Local or hosted, OpenAI or other?

It seems like the interface has converged around chat-based prompts.

New ideas for tuning or improving the efficiency of foundational models are published almost every week.

If one wants to build a product on top of generative AI, why not simply start with what’s free or works with one’s dev environment?

Presumably, the interaction with or API to text-based gen AI will be very similar no matter what engine is best for your use case at any given time.

This would imply these backends will be swappable, the way web services are that copy AWS S3 APIs.

So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It seems far-fetched to believe this tech can be constrained by legislation.

OpenAI can lobby all they want, it won’t necessarily buy them anything. Look what happened with FTX.

Since LLMs can be run locally and the engines be black boxes to the user, how could a legislative act really prevent them from being everywhere, especially given the public utility?


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

It can be done -- it is the basis for assisted generation and related work. It does require full access to the model, to be time and money-efficient. See https://huggingface.co/blog/assisted-generation

Disclaimer: I'm the author of the blog post linked above.


> Couldn’t someone say prototype the entire product on some lower-quality LLM and occasionally pass requests to GPT4 to validate behavior?

This, in fact, might be a better way to do inference anyway: https://twitter.com/Francis_YAO_/status/1675967988925710338

> So, to return to the point, can’t people just build their product with openAI or other and plan to move away based on the cost and fit for their circumstances?

Depends. There are signs that folks are buying into GPT-specific APIs (like function calls) which may not be as easy to migrate away from.


Asking because I have not implemented these yet: is there anything unique about the syntax that it can't just be copied?


Some (not all) projects are indeed "copying" the OpenAI APIs; ex: https://github.com/go-skynet/LocalAI/issues/588
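That's the appeal: if a local server speaks the OpenAI wire format, existing client code barely changes. A minimal sketch with the openai Python package (the endpoint URL and model name are placeholders for whatever your local server exposes):

  import openai

  openai.api_base = "http://localhost:8080/v1"  # local OpenAI-compatible server
  openai.api_key = "not-needed-locally"

  resp = openai.ChatCompletion.create(
      model="ggml-gpt4all-j",  # placeholder: whatever the server serves
      messages=[{"role": "user", "content": "Say hello"}],
  )
  print(resp.choices[0].message.content)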


Care to share some links? My lack of a GPU is the main thing blocking me from playing with local-only options.

I have an old laptop with 16GB RAM and no GPU. Can I run these models?


https://github.com/ggerganov/llama.cpp

https://huggingface.co/TheBloke

There's a LocalLLaMA subreddit, IRC channels, and a whole big community around the web working on it on GitHub and elsewhere.

edit: I forgot to directly answer you: yes, you can run these models. 16GB is plenty. Different quantizations give you different amounts of smarts and speed. There are tables that tell you how much RAM is needed for each quantization you choose, as well as how fast it can produce results (ms per token), e.g. https://github.com/ggerganov/llama.cpp#quantization where the RAM required is a little more than the file size; there are tables that list it explicitly which I don't have immediately at hand.
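If you'd rather drive it from Python than the ./main binary, a minimal sketch with llama-cpp-python (the model file is just the 7B one mentioned elsewhere in this thread; any ggml file works):

  from llama_cpp import Llama  # pip install llama-cpp-python

  llm = Llama(model_path="models/WizardLM-7B-uncensored.ggmlv3.q4_0.bin")
  out = llm("Q: What do the different quantizations trade off? A:",
            max_tokens=64, stop=["Q:"])
  print(out["choices"][0]["text"])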


A reminder that LLaMA isn't legal for the vast majority of use cases, unless you signed their contract, and then you can use it only for research purposes.


OpenLLaMA is though. https://github.com/openlm-research/open_llama

All of these are surmountable problems.

We can beat OpenAI.

We can drain their moat.


For the above, are the RAM figures system RAM or GPU?


CPU RAM


Absolutely, 100% agree. I just wouldn't touch the original LLaMA weights. There are many amazing open source models being built that should be used instead.


> We can drain their moat.

I've got an AI powered sump pump if you need it.


They most certainly don't need / deserve the snark, to be sure, on hacker news of all places.


We don’t actually know that it’s not legal. The copyrightability of model weights is an open legal question right now afaik.


It doesn't have to be copyrightable to be intellectual property.


No, but what is it? Not your lawyer, not legal advice, but it's not a trade secret, they've given it to researchers. It's not a trademark because it's not an origin identifier. The structure might be patentable, but the weights won't be. It's certainly not a mask work.

It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.


I'm going to play devil's advocate and state that a lot of what you mentioned will be relevant to a tiny part of the world that has the means to enforce this. The law will be forced to change as a response to AI. Many debates will be had. Many crap laws will be made by people grasping at straws, but it's too late. Putting red tape around this technology puts that nation at a technological disadvantage. I would go as far as labeling it a national security threat.

I'm calling it now. Based on what I see today. Europe will position itself as a leader in AI legislation, and its economy will give way to the nations that want to enter the race and grab a chunk of the new economy.

It's a Catch 22. You either gimp your own technological progress, or start a war with a nation that does not. Pretty sure Russia and China don't really care about the ethics behind it. There are plenty of nations capable enough in the same boat.

Now what? OK, so in some hypothetical future China has an uncensored model with free rein over the internet. The US and Europe have banned this. What's stopping anyone from running the Chinese model? There isn't enough money in the world to enforce software laws.

How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

If it did, OpenAI wouldn't exist.


> How long have they tried to take down The Pirate Bay? Pretty much every permutation of every software that's ever been banned can be found and run with impunity if you have the technical knowledge to do so. No law exists that can prevent that.

Forms of this argument get tossed out a lot. Laws don’t prevent, they hopefully limit. Murder has been illegal for a long time, it still happens.


You missed the point: these laws are not limiting other countries, only those who introduce them. Self-limiting, giving advantage to others.


> It might have been a contract violation for the guy who redistributed it, but I'm not a party to that contract.

Wouldn’t that violate the Nemo dat quod non habet legal principle, and so you cannot hide behind the claim that you weren’t party to the contract?

https://en.wikipedia.org/wiki/Nemo_dat_quod_non_habet


No, because the weights are not IP protected by the entity that trained the model, so they cannot prevent you from redistributing it, because it doesn’t belong to them in any legal sense. GPU cycles alone don’t make IP.

The contracts in these cases are somewhat similar to an NDA, without the secrecy aspect. Restricted disclosure of public information. You can agree to such a contract if you want to, and a court might even enforce it, but it doesn’t affect anybody else’s rights to distribute that information.

Contracts are not statutes, they only bind the people directly involved. To restrict the actions of random strangers, you need to get elected.


I’m going to go out on a limb here and assume that you’re making this statement because it feels like they should have some intellectual property rights in this case. Independently of whether that feeling corresponds to legal reality (the original question), I would also encourage you to question the source of this feeling.

I believe it is rooted in an ideology where information is restricted as property by default. This is a dangerous ideology that constantly threatens to encroach on intellectual freedom, e.g. software patents, gene patents.

We have a wonderful tradition in the US that information is free by default. It has been much eroded by this ideology, but I believe freedom is still legally the default unless the information falls under the criteria of trademark, copyright or patent. I think it’s important to recognize how this ideology of non-freedom has perniciously warped people’s default expectation around information sharing.


It has nothing to do with any sort of feeling. Perhaps you should check your own mental state.

It is the same as any confidential data. Logs, readings from sensors, etc etc. If it's confidential and given to a 3rd party through a contract that doesn't mean that it's suddenly not confidential data for the rest of the world, even if the 3rd party leaks it.

And if you really have a lawyer trying to tell you that some, at best, extreme grey area, is fine to build a business on, I think you should find a new lawyer.


I think that just further shows your worldview that defaults to information/data as property. I think this is wrong both in the sense that it isn't really what the law says (but aren't going to agree here anyway) but more importantly I think what it should say. Information should not be and is not property by default. There are only three specific ways in which it can become property ("intellectual property"): copyright, trademark and patent. If it's none of those then the government doesn't get to make any rules about how anyone deals in the data because of the 1st Amendment. That's my understanding of the US system at least.


Patents? Trademark? What do you mean?


Maybe this: https://en.wikipedia.org/wiki/Database_right but it doesn't exist in every country.


This is the most well-maintained list of commercially usable open LLMs: https://github.com/eugeneyan/open-llms

MPT, OpenLLaMA, and Falcon are probably the most generally useful.

For code, Replit Code (specifically replit-code-instruct-glaive) and StarCoder (WizardCoder-15B) are the current top open models and both can be used commercially.


It’s not clear if their license terms would hold, for the moment just act and worry later.

Update: That is only true for the legal system I am currently residing in. No idea about e.g. the US.


Just a heads up: If you are more interested in being effective than being an evangelist, beware.

While you can run all kinds of GPTs locally, GPT-4 still smokes everything right now – and even it is not actually good enough to not be a lynchpin for a lot of cases yet.


I guess ignoring copyright and treating the whole internet as your training data does have its advantages.


Yes? That’s the point. Who cares about an outdated concept that has no digital analog? All the artists have moved on already #midjourney.


No, I doubt artists have moved on. And if they want no artificial gatekeeper, then it is #stablediffusion instead of #midjourney.

I would argue that it creates better images too.


When Microsoft opens up all of their source code, I will agree with you.


>GPT-4 still smokes everything right now

Not if you want it to write adult (graphically pornographic or violent) content.


16GB of RAM can fit a 5-bit 13B model at best; that's the second-dumbest class of LLaMA model. If Open Orca turns out any good then that might be enough for the time being, but you'll need more RAM to use anything serious.

Here's a handy model comparison chart (this is a coding benchmark, so coding-only models tend to rank higher): https://i.imgur.com/AqSjjj2.jpeg


Your benchmark lacks the current #2 https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder

It beats Claude and Bard.

You could probably get a 4bit 15B model going in 16GB of RAM and be approaching GPT4 in capability.

...on an old laptop, lol
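Back-of-envelope for the 16GB claim (rough; ignores the per-block scale overhead of 4-bit formats):

  15B params x 4 bits       ≈ 7.5 GB of weights
  KV cache, buffers, OS     ≈ 2-4 GB
                            ≈ 10-12 GB total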

Let's eat OpenAI's lunch! They deserve it for trying to steal this tech by "privatizing" a charity, hiding scientific data that was supposed to be shared with us by said charity whose purpose was to help us all, and dishonestly trying to persuade the government not to let us compete with them.


Yeah I mean I wouldn't really include coding models in this list since they're not general purpose models and have an obvious fine tuning edge compared to the rest. But WizardCoder is definitely something to look at as a Copilot replacement.

I'd post a more well rounded benchmark but the problem is that all non-coding benchmarks are currently more or less complete garbage, especially the Vicuna benchmark that rates everything as 99.7% GPT 3.5 lol.


The benchmark you linked measures "programming performance", not generic LLM "intelligence".

The situation for the little guy is wildly better than most people imagine.


Yep, that's what I'm saying, programming performance is seemingly very indicative of model intelligence (assuming it's tuned well enough to be able to run the benchmark at all). Coding is an exercise in problem solving and abstract thinking after all.

There are exceptions of course, as there are a few models (e.g. Vicuna, Baize) that don't do well at coding at all but otherwise perform well for chat, and the coding models I mentioned that game the benchmark by sacrificing performance in all other areas.

If you exclude those, it's a very accurate overall reasoning-level comparison; at least it best fits what I've seen their performance to be for various tasks when testing out individual models. The only other valid benchmarks that aren't coding are the SAT and LSAT tests that OpenAI runs on all of their models, but afaik there isn't an open version that would be widely used.



Keep in mind it doesn't relate to GPT-4; the 4 in the name stands for "for" (as in "GPT for all"), not "four". But I should try it. TBH OpenAI's shady practices, and MS behind them, are just an antitrust case waiting to happen, and I don't want a part in this dystopia.


also, I fall in love easily with the entities I fabricate, I don't want someone else to have the option to take them away... don't worry, I have real friends too...


it does support GPT3.5 turbo and GPT4, you can put in an OpenAI key, it's amongst the model options.


Great point -- I was thinking of renewing my $20/month subscription but I will keep it cancelled. We must not fund AI propaganda machines.


Forgive me as I’m out of the loop. What propaganda are you referring to?


Sam tells Congress that AI is so dangerous it will make humanity extinct. Why? So Congress can license him and only his buddies. Then he goes to Europe and speaks with world leaders about removing consumer protection. Why? So he can mine data without any consequences. He is a narcissistic CEO who lies to win. If you are tired of the past decade of electronic corporate tyranny, abuse, manipulation and lies, then boycott OpenAI (should be named ClosedAI) and support open source, or ethical companies (if there are any).


> Sam tells Congress that AI is so dangerous it will make humanity extinct. Why? So Congress can license him and only his buddies.

No, he says it because it's true and concerning.

However, just because AGI has a good chance of making humanity extinct does not mean we're anywhere close to making AIs that capable. LLMs seem like a dead end.


> However, just because AGI has a good chance of making humanity extinct

How? I mean surely it will lead humanity down some chaotic path, but I would fear climate catastrophe much much more than anything AI-related.


Imagine if you will that the companies responsible for the carbon emissions get themselves an AI, with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

That's one of the better outcomes.

A worse outcome is that an unrestricted AI helps walk a depressed and misanthropic teenager through the process of engineering airborne super-AIDS.

Or that someone suffering from a schizophrenic break reads "I Have No Mouth And I Must Scream" and tasks an unrestricted AI to make it real.

Or we have a bug we don't spot and the AI does any of those spontaneously; it's not like bugs are a mysterious thing which only exists in Hollywood plots.


> with no restrictions, and task it to endlessly spew pro-carbon propaganda and anti-green FUD.

So, what we have had ongoing for half a century?

I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one. Also, if we want to predict the future with AIs we probably shouldn’t be looking at “one-on-one” interactions, as there is not much difference there compared to the status quo — a smart person with whatever motivation could easily do any of your mentioned scenarios. Hell, you couldn’t even tell the difference in theory if it happens through a text-only interface.

Also, it is naive to assume that many scientific breakthroughs are “blocked” by raw intelligence. Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

The new dimension such a construct could open up is the complete loss of trust on the internet (which is again pretty close to where we stand today), which can have very profound effects indeed that I'm not trying to diminish. But these sci-fi outcomes are just.. naive. It will be more of a newfound chaos, with countless intelligent agents taking over the internet with different agendas - but their cumulative impact might very well move us back to closed forums/to the physical world. Which will definitely turn certain long-standing companies on their heads. We will see, as this is basically already happening; we don't need human-level intelligence, GPT's output is more than enough.


> So, what we have had ongoing for half a century?

Except fully automated, cheaper, and with the capacity to fluently respond to each and every person who cares about the topic.

At GPT-4 prices, a billion words is only about 79800 USD.

> Life can’t be “won” by intelligence, that is only one factor, luck being a very significant other one.

It doesn't need to be the only factor, it just needs to be a factor. Luck in particular is the least helpful counterpoint, as it's not like only one person uses AI at any given moment.

> Especially biology is massively data-limited, which won’t be any more available to an AI than to the researchers at hand, let alone that teenager.

Indeed; I certainly hope this isn't as easy as copy-pasting bits of one of the many common cold virus strains with HIV.

But homebrew synbio and DNA alteration is already a thing.


> Life can’t be “won” by intelligence

Humans being the dominant life form on Earth may suggest otherwise.

> I honestly don’t see what changes here — super-human intelligence has limited benefits as it scales. Would you suddenly have more power in life, were you twice as smart? If so, we would have math professors as world leaders.

Intelligent humans by definition do not have super human intelligence.


We know that this amount of intelligence was a huge evolutionary advantage. That tells us nothing about whether being twice as smart would continue to give better results. But arguably the advantages of intelligence are diminishing, otherwise we would have much smarter people in more powerful positions.

Also, a bit tongue in cheek, but someone like John von Neumann definitely had superhuman intelligence.


> But arguably the advantages of intelligence are diminishing, otherwise we would have much smarter people in more powerful positions.

Smart people get what they want more often than less smart people. This can include positions of power, but not always — leadership decisions come with the cost of being responsible for things going wrong, so people who have a sense of responsibility (or empathy for those who suffer from their inevitable mistakes) can feel it's not for them.

This is despite the fact that successful power-seeking enables one to get more stuff done. (My impression of Musk is he's one who seeks arbitrary large power to get as much as possible done; I'm very confused about if he feels empathy towards those under him or not, as I see a very different personality between everything Twitter and everything SpaceX).

And even really dumb leaders (of today, not inbred monarchies) are generally above average intelligence.


That doesn’t contradict what I said. There is definitely a huge benefit to an IQ 110 over 70. But there is not that big a jump between 110 and 150, let alone even further.


Really? You don't see a contradiction in me saying: "get what they want" != "get leadership position"?

A smart AI that also doesn't want power is, if I understand his fears right, something Yudkowsky would be 80% fine with; power-seeking is one of the reasons to expect a sufficiently smart AI that's been given a badly phrased goal to take over.

I don't think anyone has yet got a way to even score AI on power-seeking, let alone measure them, let alone engineer it, but hopefully something like that will come out of the super-alignment research position OpenAI also just announced.

I would be surprised if the average IQ of major leaders is less than 120, and anything over 130 is in the "we didn't get a big enough sample size to validate the test" region. I'm somewhere in the latter region, and power over others doesn't motivate me at all; if anything it seems like manipulation, and that repulses me.

I didn't think of this previously, but I should've also mentioned there are biological fitness constraints that stop our heads getting bigger even if the IQ itself would be otherwise helpful, and our brains are unusually high power draws… but that's by biological standards, it's only 20 watts, which even personal computers can easily surpass.


On a serious note though a person with an IQ of 150 can't clone themselves 10k times.

They also tend to have some level of autonomy in not following the orders of idiots and psychopaths.


At this point there is no evidence that a climate catastrophe that could make humans extinct is either likely or possible - at least due to global warming. At worst some coastal regions get flooded and places around the equator become unlivable without AC. Some people will have to move, but it does not make anyone extinct.

We should absolutely care about nature and our impact on it but climate alarmism is not a way to go.


Note that I said AGI there, not AI. The full AGI X-risk case is hundreds of pages, unsuitable for a hackernews discussion.

To oversimplify to the point of wrongness: Essentially how humans dominated our world, by being smarter.


By being a lot smarter than animals. But Neanderthals were arguably even smarter (bigger brain capacity at least), and they did not become the dominant species (though they weren't so much killed off as "lesser" humanoids as mostly merged).


> No, he says it because it's true and concerning.

Both can be true. It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

> LLMs seem like a dead end.

In support of your argument, bear in mind that he's making his argument with knowledge of what un-nerfed LLMs at GPT-4 level are capable of.


> It is extremely convenient to someone who already has an asset if the nature of that asset means they can make a convincing argument that they should be granted a monopoly.

While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'. Even in just the USA, and the rest of the world has very different attitudes to risks, freedoms, and data processing.

Not that this proves the opposite — there's enough recent examples of smart people doing dumb things, and even without that the possibility of money can inspire foolishness in most of us.


> While this is absolutely true, it's extremely unlikely that a de jure monopoly would end up at OpenAI's feet rather than any of the FAANGs'

Possibly. The Microsoft tie-up complicates things a bit from that point of view. It wouldn't shock me if we were all using Azure GPT-5 in a few years' time.


It's possible, I don't put much weight on it given all the anti-trust actions past and present, but it's possible.


> it's true and concerning

> LLMs seem like a dead end

These would seem contradictory. If you really think that both are true and Altman knows it, then you're saying he's a hype man lying for regulatory capture. And to some extent he definitely is overblowing the danger for his own gain.

I really doubt they are a dead end though, we've barely started to explore what they can do. There's a lot more that can be extracted from existing datasets, multimodality, gains in GPU power to wait for, fine tunes for use cases that don't even have datasets yet, etc. Just the absolute mountain of things we've learned since LLama came out are enough to warrant base model retrains.


> These would seem contradictory.

Only if you believe that LLM is a synonym for AI, which OpenAI doesn't.

The things Altman has said seem entirely compatible with "the danger to humanity is ahead of us, not here and now", although in part that's because of the effort put into making GPT-4 refuse to write propaganda for Al Qaeda, as per the red team safety report they published at the same time as releasing the model.

Other people are very concerned with here-and-now harms from AI, but that's stuff like "AI perpetuates existing stereotypes" and "when the AI reaches a bad decision, who do you turn to to get it overturned?" and "can we, like, not put autonomous tasers onto the Boston Dynamics Spot dogs we're using as cheap police substitutes?"


A dead end for human+ level AGI, they will still be useful.


And he should get an exclusive licence for that. I don't think it is the time for religion here.


These ChatGPT tools allow anyone to write short marketing and propaganda prompts. They can then take the resulting paragraphs of puffery and post them using bots or sock puppets to whatever target community to create the illusion of action, consensus, conflict, discussion or dissention.

It used to be that this took a few people coming up with and writing actual responses to forum posts all day, or marketing operations plans, or pro- or anti-thing propaganda plans.

But now, you could astroturf a movement with a GPU, a ChatGPT clone, some bots and vpns hosted from a single computer, a cron job, and one human running it.

If you thought disinformation was bad 2 years ago, get ready for fully automated disinformation that can be targeted down to an online community or specific user in an online community...


I believe a new wave of authentication might come out of this, where it is tied to citizenship for example (or something related to physical reality). Otherwise we will find ourselves in a truly chaotic situation.


GPT-4 runs on 8 x 220B params [1] and GPT-3.5 is about 220B params(?). Local LLMs can be good for some tasks, but they are much slower and less capable than the size of model and hardware that OpenAI brings to their APIs. Even running a 7B model on the CPU in ggml is much slower than the gpt-3-turbo API, in my experience with a 12th-gen Intel i7 laptop.

[1] GPT4 is 8 x 220B params = 1.7T params: https://news.ycombinator.com/item?id=36413296


It's been well documented by now that the number of parameters does not necessarily translate to a better model. My guess is that OpenAI has learned a thing or two from the endless papers published daily that your "instance" of the model is not what it seems. They likely have a workflow that picks the best model suitable for your prompt. Some people may get a 13B permutation because it is "good enough" to produce a common answer to a common prompt. Why waste precious compute resources on a prompt that is common? Would it not be feasible to collect the data of the top worldwide prompts and produce a small model that can answer those? Why would OpenAI spend precious compute time on the typical user's "write a short story of...".

I would guesstimate that the great majority of prompts are trash. People playing with a toy and amusing themselves. The platform sends those to the trash models.

For the other tiny percentage that produces a prompt the size of a paragraph, using the techniques published by OpenAI themselves, they likely get the higher tier models. This is also why I believe many are recently complaining about the quality of the outputs. When your chat history is filled with "have waifu pretend to be my girlfriend" then whatever memory the model is maintaining will be poisoned by the quality of your past prompts.

Garbage in, garbage out. I am certain that the #1 priority for OpenAI/Microsoft is lowering the cost of each prompt while satisfying the majority.

The majority is not in HN.


> It's been well documented by now that the number of parameters does not necessarily translate to a better model.

That's certainly true, but it's hard to deny the quality of gpt 4. If the issue is the training data, let's just use their training data, it's not like they had to close up shop because of using restricted data.

I think the issue is more on the financial side, it must have been extremely expensive to train gpt 4. Open source models don't have that kind of money right now.

I'll finance open source models once they are actually good, or show realistic promises of reaching that level of quality on consumer hardware. Until then, open source will open source.

I've never bought any kind of subscription or paid api costs to openai, but if gpt 4 finally reached the point where I feel like it's a lot better than just good enough, I'll happily pay for it (while still being on the lookout for open source models that fit my hardware).


Picking the best model based on the prompt seems to be the best way to simplify the task they are doing.


It does seem like a good approach, though that seems to imply that they understand the context of the prompt being entered. Has anyone tackled this context sensitive model routing? It seems like a good approach, but likely not straightforward.
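Nothing public confirms OpenAI does this, but a toy sketch of what prompt-based routing could look like (model names made up; a real system would presumably use a learned classifier rather than keyword matching):

  def route(prompt: str) -> str:
      # crude heuristic: long or reasoning-heavy prompts go to the big model
      hard_markers = ("prove", "debug", "refactor", "derive", "step by step")
      if len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers):
          return "big-expensive-model"
      return "small-cheap-model"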


https://mpost.io/phi-1-a-compact-language-model-outpaces-gpt...

A 1B-parameter model beats the 175B-parameter GPT-3.5.

OpenAI wants us all to drink the kool-aid.


Which models are you using and for which tasks? I have found local models largely a waste of time (except for very simple tasks with very heavy prompting). But perhaps there are some recent breakthroughs I haven't seen yet.


I'm using a variety of 7 and 13B models (and a 3B one for fast feedback loop debugging) at between 8bit and 4_K_M quantizations.

Depending on your pre-prompt, your fine-tune (i.e. which model you downloaded), and your specific task, the results can be startlingly good, it's crazy that you can do this on a $250 laptop. I stay up nights working on it lately, it's so interesting.

More importantly, things change by the day. New models, new methods, new software, new interfaces... the possibilities are endless... unless we let OpenAI corrupt our government(s).


I'm surprised you're having such a good time with 7B and 13B models. I find anything below 33B to be almost useless. And only 65B is close to GPT 3.5.


I don't think the "corrupt our government" thing is going to happen. The wave of change is too large; the tech is moving too fast and into every facet of data and software. There is competition globally and locally; a regulatory slowdown is unlikely.


I’m currently using the free tier ChatGPT web interface to help me with mundane coding tasks like JavaScript, php or css.

Is there a local solution that is at least as intelligent as GPT 3.5 in that regard that I can run in a container?


There's no need to run locally if you aren't utilizing it 8 hrs/day.

You can rent time on a hosted GPU, sharing a hosted model with others.


My laptop already works too hard doing development and having chrome open, it's just not feasible. A good hosted alternative, sure, but local is not going to scale to the masses.


I have a Dell 7490 (Intel 8350U CPU) I paid $250 for, and I have no trouble running 13B models through a custom interactive interface I wrote as a hobby project in an afternoon. It can still get a lot better. I made it async the following day and it's even more fun.

Most people's problem is watching the AI type; it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming style.

Local absolutely can scale. All kinds of fun things can be done on a machine with 16GB of RAM, or 8GB if you work harder.
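For what it's worth, streaming vs. return-at-once is a one-flag difference in something like llama-cpp-python, so an app can offer both (model path is just an example):

  from llama_cpp import Llama

  llm = Llama(model_path="models/any-13b.ggmlv3.q4_0.bin")

  # stream tokens as they're generated instead of waiting for the whole reply
  for chunk in llm("Write a haiku about laptops:", max_tokens=48, stream=True):
      print(chunk["choices"][0]["text"], end="", flush=True)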


> Most people's problem is watching the AI type; it's not instant, but then not all (or even most) applications need to be instant. You can also avoid that by having it return everything at once instead of streaming style.

Funny, for me it is the complete opposite. I created an interface in Matrix that does just that: return everything at once. But the lag annoys me more than the slow typing in the regular chat interface. The slow typing helps me keep me focused on the conversation. Without it, my mind starts wandering while it waits.


Where can we acquire or access these local LLMs? How much space and what specs does it actually require?


https://gpt4all.io/index.html is a good place to start, you can literally download one of the many recommended models.

https://github.com/imartinez/privateGPT is great if you want do it with code.


Huggingface has them all.

https://huggingface.co


> OpenAI has no moat, unless you give them money to write legislation.

Their moat is that they had access to data sources which have since been clamped down on, e.g. the Reddit and Twitter APIs.


You can still download Reddit archives with the same data they used.


One has to give them credit for what must be the most grandiose stunt actually landed. And on so many angles! “It just works” - they even got the scientists fully aligned! Fiercely smart industriousness.

https://youtu.be/P_ACcQxJIsg?t=5946


No equity? For real? He really does need an agent if that's the case.


Wow, under penalty of perjury


If you listen to him talk at any point, you can see him explain why.


I tried, and decided it is not worth it. llama.cpp with a 13B model fits into the RAM of my laptop, but pushes the CPU temperature to 95 degrees within a few seconds, and mightily sucks the battery dry. Besides, the results were slow and rather useless. GPT is the first cloud application I deliberately use to push off computing and energy consumption to an external host which is clearly more capable of handling the request than my local hardware.

I sympathize with the idea of wanting to run a local LLM, but IMO, this would require building a desktop with a GPU and plenty of horsepower + silent cooling and put it somewhere in a closet in my apartment. Running LLMs on my laptop is (to me) clearly a waste of my time and its battery/cooling.


So I do actually want a really good games machine, and an AI worker box. Since I can't both use inference output and play games at the same time, having a ludicrously over-specced desktop for both uses actually makes sense to me.


I see no moral problems paying OpenAI for GPT Plus. it helps a lot in development. Their free speech-to-text 'whisper' is really good too. I'm going to use it + small local GPT for voice control.

> I can currently run some scary smart and fast LLMs on a 5 year old laptop with no GPU.

And, something useful or just playing? I played with local models, and will keep playing, training, experimenting. It's interesting, but not a solution, not yet.


I'll take downvote as a sign you have nothing to say :) Just one warning, bad karma will be hard to fix.


Not as good as ChatGPT-4 unfortunately, and they do have a moat. You could argue the moat will fall in time, but I'm not seeing ChatGPT-4 equivalents at the moment.


Make a tutorial?


Can you recommend some local LLMs that are (roughly) equivalent to ChatGPT?


I'd love to get into AI and AI development. Where can I start?


Yikes. They're actually killing off text-davinci-003. RIP to the most capable remaining model and RIP to all text completion style freedom. Now it's censored/aligned chat or instruct models with arbitrary input metaphor limits for everything. gpt3.5-turbo is terrible in comparison.

This will end my usage of openai for most things. I doubt my $5-$10 API payments per month will matter. This just lights more of a fire under me to get the 65B llama models working locally.


I've never used text-davinci-003 much. Why do you like it so much? What does it offer that the other models don't?

What are fun things we can do with it until it sunsets on January 4, 2024?


The Chat-GPT models are all pre-prompted and pre-aligned. If you work with davinci-003, it will never say things like, "I am an OpenAI bot and am unable to work with your unethical request"

When using davinci the onus is on you to construct prompts (memories) which is fun and powerful.
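A minimal example of that difference with the Python client; there's no system message and no chat framing, the prompt is the entire context:

  import openai

  resp = openai.Completion.create(
      model="text-davinci-003",
      prompt="The following is a hard-boiled detective's case log:\n",
      max_tokens=128,
      temperature=0.9,
  )
  print(resp.choices[0].text)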

====

97% of API usage might be because of ChatGPT's general appeal to the world. But I think they will be losing a part of the hacker/builder ethos if they drop things like davinci-003, which might suck for them in the long run. Consumers over developers.


The hacker/builder ethos doesn't matter in the grand scheme of commercialization.


It matters immensely in the early days and is the basis for all growth that follows. So cutting it off early cuts off future growth.


Sure - not like most of the infrastructure of pretty much everything online is built on top of projects originating in that space or anything.


How do they want to commercialise it? Do they want moms to tinker on ChatGPT once a month to do their children's homework? Or do they want people to build businesses using their software?


Mom and Pop offer more users with less legal exposure.


do they have the cash money dollar? and the willingness to spend it on what is essentially a toy they will quickly grow bored of? I don't think this is the best path to profitability


If you're using the API, you construct the "memories" as well, including the "system" prompt, even in the playground. (When you click the "(+) Add message", the new one defaults to USER, but you can click on it to change it to ASSISTANT, then fill it in with whatever you want.)

I used the "Complete" UI (from the Playground) for a bit before the "Chat" interface was available; I don't really think there's anything you couldn't do in the "Complete" UI that you couldn't also do in the "Chat" UI.


Note that the Azure endpoint is not being sunsetted until July 5th, 2024.

One supposes OpenAI has a 6-month notice period vs a 12-month period for Azure. This might generally affect one's appetite in choosing which endpoint to use for any model.


Yeah, TextCompletion is much better than ChatCompletion with v3 models.

But with davinci at the same price point as GPT-4 I'm hoping the latter is enough of a step up in its variety of vocabulary and nudgeable sophistication of language to be a drop in replacement.

Though in general I think there's an underappreciation for just how much is being lost in the trend towards instruct models, and I hope there will be smart actors in the market who use a pre-optimization step for instruct prompts that formats them for untuned models. I'd imagine that, parameter size for parameter size, that approach will look much more advanced to end users just by not lobotomizing the underlying model.


Note that code-davinci-002, despite the confusing name, is the actual GPT-3.5 base model, which only does completions and does not have any mode collapse. And it is still available via Azure, as far as I can tell. Text-davinci-003 is a fine-tuned version of it.

More info:

https://platform.openai.com/docs/model-index-for-researchers


The $5-$10 is probably the reason why they're killing those endpoints.


I don't get it? text-davinci-003 is the most expensive model per token. It's just that running IRC bots isn't exactly high volume.


"Most expensive" doesn't mean "highest margin", though.


I meant that it probably isn't high revenue.


My guess is that they would be fine with continuing to serve all models, but that hardware constraints are forcing difficult decisions. SA has already said that hardware is holding them back from what they want to do. I was on a waiting list for the GPT4 API for like a few months, which I guess is because they couldn't keep up with demand.


I built my entire app on text-davinci-003. It is the best writer so far. Do you think gpt3.5 turbo instruct won't be the same?


> In the coming weeks, we will reach out to developers who have recently used these older models, and will provide more information once the new completion models are ready for early testing.

I guess they'll give you early access to it.


Thanks!


I wonder if there's some element of face-saving here to avoid a lawsuit that may come from someone that uses the model to perform negative actions. In general I've found that gpt3.5-turbo is better than text-davinci-003 in most cases, but I agree, it's quite sad that they're getting rid of the unaligned/censored model.


More likely hardware constraints. They can't get the hardware fast enough to do everything they want to do. So, they free up resources by ditching lower demand models.


Please ELI5 if I am mis-interpretating what you said:

*"They have just locked down access to a model which they basically realized was way more valuable than even they thought - and they are in the process of locking in all controls around exploiting the model for great justice?"*


It won't matter at all at the end of the year; open source LLMs will surpass it by that time.


Everyone who complains about being "censored" never gives examples.


I'm trying to create a bot that joins my friends' Telegram group and melds into the conversation as if it were a real person. A real person might be the cutest, most fun and enthusiastic person there is, but sometimes they have bad days, or they tell inappropriate jokes, right? People are complicated. Not this bot! No matter what prompt I'm using (with the chat API), it won't lose the happy-happy-joy-joy ChatGPT attitude, won't tell inappropriate jokes, won't give advice on certain topics, and in general won't talk like a real person - not because of technological limitations... You can feel it when it's just nerfed.

Trying the same prompts that gave the nerfed "I am just an AI, I can't speculate about the future" BS on the completion API gave somewhat better results, but most of the time they were flagged as breaking the guidelines, which is a TOS breach if done enough times.

This can't be solved by anything other than open models. The same thing happened with Stable Diffusion. Good thing it's open, so you can still use the pre-nerfed 1.6 models.

I know it might be edgy or unpopular but I don't think one entity should decide how we can use this powerful tool. No matter its implications and consequences.

FOSS for the win.


You should look at the code for Sillytavern. It's capable of prompting GPT 3.5 to take on a character and act like a jerk.


Anyone who doesn't has never actually toyed with an LLM and received "As an AI language model I can't..." in response to an innocuous request to, say, write a limerick about a politician or list the reasons why "username2 is stupid".

But mostly it has to do with the fact that LLMs do what they've seen. And if they've been fine-tuned to not respond to some classes of things, they'll misapply that to lots of other things. That's why most people go for the "uncensored" fine-tuning datasets for the llamas even for completely SFW use cases.


porn it’s always porn


Or literally anything other than the psychotically smarmy tone of GPT-4 that's almost impossible to remove and constantly gives warnings, disclaimers and stops itself if veering even just 1 mm off the most boring status quo perspectives.

Lots of my favorite, and frankly the best, literature in the world has elements that are obscene, grotesque, bizarre, gritty, erotic, frightening, alternative, provocative - but that's just too much for ChatGPT; instead it has this (in my eyes) way more horrifying smiling-Borg-like nature with only two allowed emotions: "ultra happiness" and "ultra obedience to the status quo".


>Developers wishing to continue using their fine-tuned models beyond January 4, 2024 will need to fine-tune replacements atop the new base GPT-3 models (ada-002, babbage-002, curie-002, davinci-002), or newer models (gpt-3.5-turbo, gpt-4). Once this feature is available later this year, we will give priority access to GPT-3.5 Turbo and GPT-4 fine-tuning to users who previously fine-tuned older models. We acknowledge that migrating off of models that are fine-tuned on your own data is challenging. We will be providing support to users who previously fine-tuned models to make this transition as smooth as possible.

Wait, they're not letting you use your own fine-tuned models anymore? So anybody who paid for a fine-tuned model is just forced to repay the training tokens to fine-tune on top of the new censored models? Maybe I'm misunderstanding it.


(I work at OpenAI) We're planning to cover the cost for fine-tuning replacement models. We're still working through the exact mechanics that will work best for customers, and will be reaching out to customers to get feedback on different approaches in the next few weeks.


Why does OpenAI demand your phone number, and a particular KIND of phone number at that? For example they won't accept VOIP numbers. I'm not about to give them my real phone number.

It's a deal-breaker for many.


Seems clear that it’s for bots. And they refuse voip numbers because it’s a hell of a lot easier to buy and generate hundreds of voip numbers.


I signed up under my .edu to use the $18 credit for a school project and the phone # was all it took to know I was the same person.


That's fine. The question stands though.


Use 5sim.net. Easy. Don't make this so hard.


Nope. They block burners and workarounds.


[flagged]


[flagged]


I recommended a specific site that works. And here you are getting mad at me without even trying it, lol.


Please fix the phone verification system. I created two personal accounts a long time ago with the same phone number, and now I can't create a work account with the same number, even if I delete one of them. Being able to change the email associated with an account would also work. This is causing issues with adoption in my workplace.


Use a throwaway number with 5sim.net


not your weights, not your bitcoins


now it's 18. iykyk


care to explain?


parent's account password is on their profile. anyone curious enough to find it bumps that number :)


hn users not really curious anymore. downvote central over here


This tells me that either there were very few commercial users of fine-tuned models, or they need to decommission the infrastructure to free up GPUs for more valuable projects.


The former seems very believable. And I bet a lot of the fine tuned models that are active are still part of prototypes or experiments.

I assume if you reach out they throw some credits at you


If it really was a tiny number of users, they would publicly make a really good offer - for example: "Unfortunately, you will need to retune your models on top of GPT-4. OpenAI will do this for you for free, and refund all money you paid tuning your original model, and offer the new model for the same price as the original model."

The extra trust gained by seeing another customer treated that way easily pays for a few credits for a small number of users.


OpenAI probably doesn't feel the need to pay to win publicity right now—they've been in the spotlight for as long as LLMs have been a thing, and GPT-4 is far ahead of competitors' offerings.


It’s about trust - not publicity. Trust is hard to earn back once broken, and there will be multiple offerings eventually.

For example, AWS was one of the first cloud providers. Now there are alternatives, but I still pick AWS because I trust them not to break my dependencies way more than, say, Google


Yeah but that sets a precedent


It's just that the models available for fine-tuning are waaay behind GPT-4.

I get much better performance by "prompt tuning": when a question arises, I search for the 30 most similar examples in the training set, send them to non-tuned GPT along with the question, and get much better performance than with fine-tuned older models.
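
A minimal sketch of that approach (the retrieval step and all names here are my own illustration; swap in whatever similarity search you like):

    import openai

    def ask_with_examples(question, similar_examples):
        # similar_examples: the most similar (question, answer) pairs from
        # your training set, e.g. retrieved by embedding similarity
        shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in similar_examples)
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Answer in the style of the examples."},
                {"role": "user", "content": f"{shots}\n\nQ: {question}\nA:"},
            ],
        )
        return resp.choices[0].message.content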


There’s also the possibility that they weren’t seeing lots of ongoing usage of existing fine tuned models e.g. users tuning, running some batch of inputs, then abandoning the fine tuned weights.


If you don't own the weights, you don't own anything. This is why open models are so crucial. I don't understand any business that is building fine-tuned models against closed models.


Right now the closed models are incredibly higher quality than the open models. They're useful as a stopgap for 1-2 years in hopes/expectation of open models reaching a point where they can be swapped in. It burns cash now, but in exchange you can grab more market share sooner while you're stuck using the expensive but high quality OpenAI models.

It's not cost-effective, but it may be part of a valid business plan.


That should be a wake up call to every corporation pinning their business on OAI models. My experience thus far is no one is seeing a need to plan an exit from OAI, and the perception is “AI is magic and we aren’t magicians.” There needs to be a concerted effort to finance and tune high quality freely available models and tool chains asap.

That said I think efficiencies will dramatically improve over the next few years and over investing now probably captures very little value beyond building internal competency - which doesn’t grow with anything but time and practice. The longer you depend on OAI, the longer you will depend on OAI past your point of profound regret.


> There needs to be a concerted effort to finance and tune high quality freely available models and tool chains asap.

Absolutely. A large consortium of companies could each contribute 0.2-2% of the total cost and fund something much larger than OpenAI.


If you're finetuning your own model, the closed models being "incredibly higher quality" is probably less relevant.


That's how we all want it to work, but the reality today is that GPT-4 is better at almost anything than a fine-tuned version of any other model.

It's somewhat rare to have a task and good enough dataset that you can finetune something else to be close enough in quality to GPT-4 for your task.


GPT-4 is still heavily censored and will simply refuse to talk about many "problematic" things. How is that better than a completely uncensored model?


Depends what you’re using it for. For many use cases, the censorship is irrelevant.


Finetuning a better model still yields better results than finetuning a worse model.


> I don’t understand any business who is building fine tuned models against closed models

Do you have any recommendations for good open models that businesses could use today?

From what I've seen in the space, I suspect businesses are building fine tuned models against closed models because those are the only viable models to build a business model on top of. The quality of open models isn't competitive.


PSA: anyone working at a company with $50k+ of spend with AWS, reach out to your rep expressing interest in AI. You’ll be on a call with 6 solution architects and AI specialists in a matter of days. They’re incredibly knowledgeable and freely recommend non-AWS alternatives when the use case calls for it.


Owning weights is in a nebulous space right now, but if you don’t have custody of the weights and code to use them, you have nothing reliable, independent of whether the things you might wish to have are ownable (ownership is more about exclusion than ability to use, in any case.)


Yes. But the weights, and instructions for how to use them in code, can follow, as we've seen. The key is that ownership means bits on your machine, not someone else's. Better still on BitTorrent / IPFS :-)


> I don’t understand any business who is building fine tuned models against closed models.

Just sell access at a higher price than you get it for.

Either directly, or on average based on your user stories.


My guess is that these businesses are also running inference on someone else's GPUs/TPUs so there isn't an existential advantage to owning the weights.


They address that, OpenAI will cover the cost of re-training on the new models, and the old models don't discontinue until next year.


Did they say they would cover the cost of fine-tuning again? I saw them say they would cover the cost of recalculating embeddings, but I didn't see the bit about fine-tuning costs.

On fine-tuning:

> We will be providing support to users who previously fine-tuned models to make this transition as smooth as possible.

On embeddings:

> We will cover the financial cost of users re-embedding content with these new models.


This indicates to me that some of the old base models used architectures that were significantly more difficult to run at scale (or to ship around/load different weights at scale) - which is truly saying something, since they were running at incredible scale a year ago. There's probably a decade of potential papers from their optimizations alone (to say nothing of their devops innovations) that are still trade secrets.


That's because fine-tuning the new models isn't available yet.

Based on the language it sounds like they'll do the same when that launches.


"Censored" is a funny term, because I've tried doing uncensored things on uncensored models, and they're much worse at it than GPT-3.5 in the API playground. Nothing's as censored as just being unable to do the task in the first place.


Keep in mind, though, that some of the generated text is against their guidelines; you will see a warning when you get there and be told it's "flagged" and that you should use the moderation API. The chat API is nerfed to oblivion; good luck making it generate non-PC text.


That just means you don't have enough fetishes.


Biggest news here from a capabilities POV is actually the gpt-3.5-turbo-instruct model.

gpt-3.5-turbo is the model behind ChatGPT. It's chat-fine-tuned, which makes it very hard to use for use cases where you really just want it to obey/complete without any "chatty" verbiage.

The "davinci-003" model was the last instruction-tuned model, but it's 10x more expensive than gpt-3.5-turbo, so it makes economic sense to hack gpt-3.5-turbo to fit your use case even if it's hugely wasteful from a tokens point of view.


I'm interested in the cost of gpt-3.5-turbo-instruct. I've got a basic website using text-davinci-003 that I would like to launch but can't because text-davinci-003 is too expensive. I've tried using just gpt-3.5-turbo but it won't work because I'm expecting a formatted JSON to be returned and I can just never get consistency.


You need to use the new OpenAI Functions API. It is absolutely bonkers at returning formatted results. I can get it to return a perfectly formatted query-graph a few levels deep.


There is also Code Interpreter now in plugin beta, so it should influence its ability to output proper formats without hallucinations.


You can try to force JSON output using function calling (you have to use either the gpt-3.5-turbo-0613 or gpt-4-0613 model for now).

Think of the properties you want in the JSON object, then send those to ChatGPT as required parameters for a function (even if that function doesn't exist).

    # Definition of our local function(s).
    # This is effectively telling ChatGPT what we're going to use its JSON output for.
    # Send this alongside the "model" and "messages" properties in the API request.

    functions = [
        {
            "name": "write_post",
            "description": "Shows the title and summary of some text.",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "string",
                        "description": "Title of the text output."
                    },
                    "summary": {
                        "type": "string",
                        "description": "Summary of the text output."
                    }
                },
                "required": ["title", "summary"]
            }
        }
    ]
I've found it's not perfect but still pretty reliable – good enough for me combined with error handling.

If you're interested, I wrote a blog post with more detail: https://puppycoding.com/2023/07/07/json-object-from-chatgpt-...


With the latest 3.5-turbo, you can try forcing it to call your function with a well-defined schema for arguments. If the structure is not overly complex, this should work.
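
Something like this (a sketch, reusing the functions definition from the comment above; the message content is made up):

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "Summarize: <your text here>"}],
        functions=functions,
        # Forces the model to call write_post rather than deciding on its own
        function_call={"name": "write_post"},
    )
    # The arguments come back as a JSON string matching your schema
    args = response.choices[0].message.function_call.arguments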


It's great at returning well-formatted JSON, but it can hallucinate arguments or values to arguments.


i’ve had it come up with new function names, or prepend some prefix to the names of functions. i had to put some cleverness in on my end to run whatever function was close enough.



I'm assuming they will price it the same as normal gpt-3.5-turbo. I won't use it if it's more than 2x the price of turbo, because I can usually get turbo to do what I want; it just takes more tokens sometimes.

Have you tried getting your formatted JSON out via the new Functions API? I does cure a lot of the deficiencies in 3.5-turbo.


From what I can find, pricing of GPT-4 is roughly 25x that of 3.5 turbo.

https://openai.com/pricing

https://platform.openai.com/docs/deprecations/


In this thread we’re talking about gpt-3.5-turbo-instruct, not GPT4


Sorry about that. Got my thread context confused.


What's the diff between 3.5-turbo and instruct?


One is tuned for chat. It has that annoying ChatGPT personality. Instruct is a little "lower level" but more powerful. It doesn't have the personality. It just obeys. But it is less structured, there are no messages from user to AI, it is just a single input prompt and a single output completion.


The existing 3.5-turbo is what you would call a "chat" model.

The difference between them is that the chat models are much more... chatty - they're trained to act like they're in a conversation with you. The chat models generally say things like "Sure, I can do that for you!" and "No problem! Here is...". The conversational style is also generally more inconsistent. It can be difficult to make it return only the result you want, and occasionally it'll keep talking anyway. It'll also talk in the first person more, and a few things like that.

So if you're using it as an API for things like summarization, extracting the subject of a sentence, code editing, etc, then the chat model can be super annoying to work with.
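
To make it concrete, the two call styles look roughly like this in the Python SDK (using text-davinci-003 as a stand-in for the instruct style, since gpt-3.5-turbo-instruct isn't out yet):

    import openai

    # Chat style: structured messages, replies tend to be conversational
    chat = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Extract the subject: 'The cat sat.'"}],
    )
    print(chat.choices[0].message.content)

    # Instruct/completion style: one prompt in, one bare completion out
    comp = openai.Completion.create(
        model="text-davinci-003",
        prompt="Extract the subject of this sentence: 'The cat sat.'\nSubject:",
        max_tokens=10,
    )
    print(comp.choices[0].text)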


I'm hoping gpt-3.5-turbo-instruct isn't super neutered like chatgpt. davinci-003 can be a lot more fun and answer on a wide range of topics where ChatGPT will refuse to answer.


such as?


What's the difference between chat and instruction tuning?


No expert, but from my messing around I gather the chat models are tuned for conversation. For example, if you just say "Hi", a chat model will spit out some "witty" reply and invite you to respond; it's creative with its responses. On the other hand, if you say "Hi" to an instruct model, it might say something like "I need more information to complete the task." Instruct models are looking for something like "Write me a twitter bot to make millions"... In this case, if you ask the same thing again, you are somewhat more likely to get the same or a similar result; this does not appear so true with a chat model. Perhaps a real expert could chime in :)


System/assistant/user prompting


> "Starting today, all paying API customers have access to GPT-4."

OK maybe I'm stupid but I am a paying OpenAI API customer and I don't have it yet. I see:

    gpt-3.5-turbo-16k
    gpt-3.5-turbo
    gpt-3.5-turbo-16k-0613
    gpt-3.5-turbo-0613
    gpt-3.5-turbo-0301
I don't see any gpt-4

Edit: Probably my problem is that I upgraded to a paid API account within the last month, so I'm not technically a "paying API customer" yet according to the accounting definitions.


> Today all existing API developers with a history of successful payments can access the GPT-4 API with 8K context. We plan to open up access to new developers by the end of this month, and then start raising rate-limits after that depending on compute availability.

Same for me. I signed up only a few days ago and was excited to switch to "gpt-4" but I haven't paid the first bill (save the $5 capture) so I probably have to continue to wait for this.

I made a very simple command-line tool that calls the API. You run something like:

    > ask "What's the opposite of false?"
https://github.com/codazoda/askai
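
The core of a tool like this is tiny; a stripped-down sketch (not the actual code in the repo) would be something like:

    import sys
    import openai

    # usage: ask "What's the opposite of false?"
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": " ".join(sys.argv[1:])}],
    )
    print(resp.choices[0].message.content)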


Interesting, I did exactly the same (with the same name), but with GPT-4 support as well:

https://www.pastery.net/ccvjrh/

It also does streaming, so it live-prints the response as it comes.


The llm command-line tool looks great:

https://llm.datasette.io/en/stable/


It does, thanks for this! I didn't know about it.


Are we out here typing our API keys into random pips? Am I a boomer for being hesitant to do it?


It’s not a “random pip”. The maintainer is a well-known open source developer (one of the creators of Django and Datasette). It’s also a very small codebase – not many places for malicious code to hide.


OK, but I mean I don't know them, it could have been someone pretending to be them, and it's probably quite possible to trick me about API keys. We are discussing it on a hacker news website; do you seriously think tricks couldn't be hidden in a repo like that?


Do you only use software by people you know? At some point there has to be an element of trust when you run software you downloaded over the Internet. If a small utility maintained by a well-known member of the developer community doesn’t qualify for that trust, then I think that rules out an awful lot of software that all of us here probably use on a day to day basis. This is not an extraordinary level of risk.


> "well-known member of the developer community"

OK sorry I didn't know them.

I mean, I usually use software that came with my computer, or packages I apt-install from the official Ubuntu distribution. I know it's not perfect security, but at least it's more than a Hacker News link to a GitHub pip. If I have to use other software, it's usually from people I know.



I found a few GitHub issues related to API key security and management. I'm not 100% sure of your point.


I see no open issues.

It's around 1k lines of Python; audit the code if you care, and rotate your keys.

Or don't use it.


So, I've been a paying customer for a while now and don't see it either :-(


I've been on a paid account since last month and was never actually billed for my $8 of usage. I don't have GPT-4 access either.


The official docs say you need at least one successful API invoice to get access to GPT-4.


That's weird, just make me prepay $5 for credits or something.


Same. It's not in the model list response from https://api.openai.com/v1/models
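
For anyone else checking their access, the Python SDK equivalent is:

    import openai

    # Prints every model id your API key can currently see
    for model in openai.Model.list().data:
        print(model.id)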


Can't speak for others, but I have two accounts:

1. chat subscription only

2. paid for API calls but no subscription

and only #2 currently has GPT-4 available in the playground


Same issue, any updates?


With how good gpt-3.5-turbo-0613 is (particularly with system prompt engineering), there's no longer as much of a need to use the GPT-4 API, especially given its massive 20x-30x price premium.

The mass adoption of the ChatGPT APIs compared to the old Completion APIs proves my initial blog post on the ChatGPT API correct: developers will immediately switch for a massive price reduction if quality is the same (or better!): https://news.ycombinator.com/item?id=35110998


I have a legal AI startup; the quality jump from GPT-3.5 to GPT-4 in this domain is straight-up mind-blowing. GPT-3.5 in comparison is useless. But I can see how, in more conversational settings, GPT-3.5 can provide a more appealing performance/price ratio.


I suggested to my wife that ChatGPT would help with her job, and she has found GPT-4 to be the same as or worse than GPT-3.5. It's really interesting just how variable the quality can be given your particular line of work.


Remember, communication style is also very important. Some communication styles mesh much better with these models.


I've noticed the quality of ChatGPT-4 to be much closer now to ChatGPT-3.5 than it was.

However, if you try the gpt-4 API, it's possible it will be much better.


Legal writing is ideal training data: mostly formulaic, based on conventions and rules, well-formed and highly vetted, with much of the best in the public domain.

Medical writing is the opposite, with unstated premises, semi-random associations, and rarely a meaningful sentence.


And yet I can confirm that 4 is far superior to 3.5 in the medical domain as well!


> Legal writing is ideal training data: mostly formulaic, based on conventions and rules, well-formed and highly vetted, with much of the best in the public domain.

That makes sense. The labor impact research suggests that law will be a domain hit almost as hard as education by language models. Almost nothing happens in court that hasn't occurred hundreds of thousands of times before. A model with GPT-4 power specifically trained for legal matters and fine-tuned by jurisdiction could replace everyone in a courtroom. Well, there's still the bailiff; I think that's about 18 months behind.


Legal writing is mostly pattern matching. Unfortunately, you're still gonna need to guard against hallucinations.


Same page.

So still waiting to be on the same 32 pages...


My experience is that GPT-3.5 is not better than, or even nearly as good as, GPT-4. Will it work for most use cases? Probably, yes. But GPT-3.5 effectively ignores instructions much more often than GPT-4, and I've found it far, far easier to trip up with things as simple as trailing spaces; it will sometimes exhibit really odd behavior, like spelling out individual letters when you give it large amounts of text with missing grammar/punctuation to rewrite. Doesn't seem to matter how I set up the system prompt. I've yet to see GPT-4 do truly strange things like that.


The initial gpt-3.5-turbo was flakey and required significant prompt engineering. The updated gpt-3.5-turbo-0613 fixed all the issues I had even after stripping out the prompt engineering.


It's definitely gotten better, but yeah, it really doesn't reliably support what I'm currently working on.

My project takes transcripts from YouTube, which don't have punctuation, splits them up into chunks, and passes each chunk to GPT-4 telling it to add punctuation with paragraphs. Part of the instructions includes telling the model that, if the final sentence of the chunk appears incomplete, to just try to complete it. Anyway, GPT-3.5-turbo works okay for several chunks but almost invariably hits a case where it either writes a bunch of nonsense or spells out the individual letters of words. I'm sure that there's a programmatic way I can work around this issue, but GPT-4 performs the same job flawlessly.
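
The loop itself is simple; roughly something like this (a sketch; the chunking and prompt wording are my own):

    import openai

    INSTRUCTIONS = (
        "Add punctuation and paragraph breaks to this transcript chunk. "
        "If the final sentence appears incomplete, complete it."
    )

    def punctuate(chunks):
        # chunks: transcript text pre-split to fit the context window
        for chunk in chunks:
            resp = openai.ChatCompletion.create(
                model="gpt-4",
                messages=[
                    {"role": "system", "content": INSTRUCTIONS},
                    {"role": "user", "content": chunk},
                ],
            )
            yield resp.choices[0].message.content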


I've done exactly this for another project. I'd recommend grabbing an open source model and fine-tuning on some augmented data in your domain. For example: I grabbed tech blog posts, turned each post into a collection of phonemes, reconstructed the phonemes into words, added filler words, and removed punctuation+capitalization.


Sounds interesting, any chance you could share either your end result that you used to then fine-tune with, or even better the exact steps (ie technically how you did each step you already mentioned)?

And what open LLM you used it with / how successful you've found it?


Semi off-topic but that's a use case where the new structured data I/O would perform extremely well. I may have to expedite my blog post on it.


If GPT-4 is working for you I wouldn't necessarily bother with this, but this is a great example of where you can sometimes take advantage of how much cheaper 3.5 is to burn some tokens and get a better output. For example I'd try asking it for something like:

    {
        "isIncomplete": [true if the chunk seems incomplete]
        "completion": [the additional text to add to the end, or undefined otherwise]
        "finalOutputWithCompletion": [punctuated text with completion if isIncomplete==true]
    }
Technically you're burning a ton of tokens having it state the completion twice, but GPT 3.5 is fast/cheap enough that it doesn't matter as long as 'finalOutputWithCompletion' is good. You can probably add some extra fields to get an even nicer output than 4 would allow cost-wise and time-wise by expanding that JSON object with extra information that you'd ideally input like tone/subject.
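
If you go this route, it's worth parsing defensively, since 3.5 will occasionally wrap the JSON in prose. A minimal sketch:

    import json

    def parse_reply(raw):
        # Pull out the outermost {...} and fall back to a reroll on failure
        try:
            data = json.loads(raw[raw.index("{"):raw.rindex("}") + 1])
            return data["finalOutputWithCompletion"]
        except (ValueError, KeyError):
            return None  # caller rerolls or keeps the unpunctuated chunk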


I use it to generate nonsense fairytales for my sleep podcast (https://deepdreams.stavros.io/), and it will ignore my (pretty specific) instructions and add scene titles to things, and write the text in dramatic format instead of prose, no matter how much I try.


You're asking too much of it, it has its own existential crisis followed by a mental breakdown


Code completion/assistance is an order of magnitude better in GPT4.


A lot of folks are talking about using gpt-4 for completion. Wondering what editor and what plugins y'all are using.


What usecases are you using it for?

I mostly use it for generating tests, making documentation, refactoring, code snippets, etc. I use it daily for work along with copilot/x.

In my experience GPT3.5turbo is... rather dumb in comparison. It makes a comment explaining what a method is going to do and what arguments it will have - then misses arguments altogether. It feels like it has poor memory (and we're talking relatively short code snippets, nothing remotely near its context length).

And I don't mean small mistakes - I mean it will say it will do something with several steps, then just miss entire steps.

GPT3.5turbo is reliably unreliable for me, requiring large changes and constant "rerolls".

GPT3.5turbo also has difficulty following the "style/template" from both the prompt and its own response. It'll be consistent then just - change. An example being how it uses bullet points in documentation.

Codex is generally better - but noticeably worse than GPT4 - it's decent as a "smart autocomplete" though. Not crazy useful for documentation.

Meanwhile GPT4 generally nails the results, occasionally needing a few tweaks, generally only with long/complex code/prompts.

tl;dr - In my experience for code, GPT3.5turbo isn't even worth the time it takes to get a good result/fix the result. Codex can do some decent things. I just use GPT4 for anything more than autocomplete - it's so much more consistent.


If you're manually interacting with the model, GPT 4 is almost always going to be better.

Where 3.5 excels is with programmatic access. You can ask it for 2x as much text across the setup so the end result is well-formed, and still get a reply that's cheaper and faster than 4 (for example, ask 3.5 for a response, then ask it to format that response).


Depending on your use case, there are major quality differences between GPT-3.5 and GPT-4.


I am building an extensive LLM-powered app, and had a chance to compare the two using the API. Empirically, I have found 3.5 to be fairly unusable for the app's use case. How are you evaluating the two models?


It depends on the domain, but chain of thought can get 3.5 to be extremely reliable, especially with the new 16k variant.

I built notionsmith.ai on 3.5: for some time I experimented with GPT 4 but the result was significantly worse to use because of how slow it became, going from ~15 seconds per generated output to a minute plus.

And you could work around that with things like streaming output for some use cases, but that doesn't work for chain of thought. GPT 4 can do some tasks without chain of thought that 3.5 required it for, but there are still many times where it improves the result from 4 dramatically.

For example, I leverage chain of thought in replies to the user when they're in a chat and that results in a much better user experience: It's very difficult to run into the default 'As a large language model' disclaimer regardless of how deeply you probe a generated experience when using it. GPT 4 requires the same chain of thought process to avoid that, but ends up needing several seconds per response, as opposed to 3.5 which is near-instant.
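
Concretely, the chat version of that is a two-pass call, roughly like this (a sketch; the prompt wording is mine, not what notionsmith actually uses):

    import openai

    def reply_in_character(history, user_msg):
        base = history + [{"role": "user", "content": user_msg}]

        # Pass 1: private chain-of-thought, never shown to the user
        plan = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=base + [{"role": "system", "content":
                "Reason step by step about how the character would respond. "
                "Output only the reasoning."}],
        ).choices[0].message.content

        # Pass 2: the in-character reply, conditioned on that reasoning
        return openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=base + [{"role": "system", "content":
                "Reasoning: " + plan + "\nReply in character, following the reasoning."}],
        ).choices[0].message.content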

-

I suspect a lot of people are building things on 4 but would get better quality of output if they used more aspects of chain of thought and either settled for a slower output or moved to 3.5 (or a mix of 3.5 and 4)


It depends a lot on the domain, even for CoT. I don't think there are enough NLU evaluations just yet to robustly compare GPT-3.5 w/ CoT/SC vs. GPT-4 wrt domain.

For instance, with the MATH dataset, my own n=500 evaluation showed no difference between GPT-3.5 (w/ and w/o CoT) and GPT-4. I was pretty surprised by that.


I think this is very very use-case dependent, and your use case != everyone's use case. In my experience, GPT-4 is night and day better than 3.5 turbo for almost everything I use OpenAI for.


> "With how good gpt-3.5-turbo-0613 is (particularly with system prompt engineering), there's no longer as much of a need to use the GPT-4"

Poe's law


Not a lot of talk of Whisper being available here.

From using voice in the ChatGPT iOS app, I surmise that Whisper is very good at working out what you've actually said.

But it's really annoying to have to say my whole bit before getting any feedback about what it's gonna think I said. Even if it's getting it right at an impressive rate.

Given this is how OpenAI themselves use it (say your whole thing before getting feedback), I don't know that the API is set up to be able to mitigate that at all, but it would be really nice to have something closer to the responsiveness of on-device dictation with the quality of Whisper.


One speculative thought about the purpose of Whisper is that this will help unlock additional high-quality training data that's only available in audio/video format.


I'm interested in how the transformer based speech recognition from iOS 17 will perform compared to Whisper. I guess it will work more "real-time" like the current dictation on iOS/macOS, but I'm unsure as I am not on the beta right now.


My guess is the reason that Apple invested so heavily in this [0] is that they are going to train a big transformer in their datacenter and apply it as an RNN on your phone.

Superficially, I think this will work very well, but slightly worse than Whisper (with the advantage ofc being that it's better at real-time transcription).

[0]https://machinelearning.apple.com/research/attention-free-tr...


I am a beta user and I am seriously impressed if I’m being honest. It works fully offline, is multilingual and is fast, even on the non-latest iPhone (I am using an iPhone 12 Pro Max). It is way better than apples previous version and better than locally installed whisper. They’ve done incredible work. Same with the new, transformer-based keyboard on iOS which is way better. And if you type in English, it sometimes shows word suggestions in the text field itself (similar how copilot works in an IDE).


You can run whisper.cpp locally in real time: https://github.com/ggerganov/whisper.cpp/tree/master/example...


My M2 Pro (mac mini) will run Whisper much faster than "real time."

Pretty crazy stuff — perfectly understandable translations.


Main reason for the lack of excitement is probably that it is fairly easy to self-host Whisper, so people interested in it would have been doing that all along.


Echoing this - saying the whole text at once in one shot is very challenging for long batches of text.

Using the built-in text input showed quite good results, since ChatGPT still understands the ask quite well.


The drip-feeding seems crazy to me. OpenAI is undermining its reputation by forcing almost everybody to use the older, lower-quality models. Even if customers are willing to pay for GPT-4, they're being told to wait at the back of the line.

Wait for what!? Christmas? When we can open our presents and have a GPT 4 inside?

It's like they took a leaf from Google's "how to guarantee the failure of a new product" marketing. That is: restrict access, ensuring that word-of-mouth marketing can't possibly work because none of your friends are allowed to try the product.

The announcement here is "general availability" of the GPT-4 model...

...but not the 32K context model. Not the multi-modal version with image input. No fine-tuning. Only one model (chat).

As of today, I can only access GPT 3.5 via Azure Open AI service and the Open AI API account that I have.

What's the point of all these arbitrary restrictions on who can access what model!?

I can use GPT 4 via Chat, but not an API. I can use an enhanced version of Dall-E via Bing Image Creator, but not the OpenAI API. Some vendors that have been blessed by the Great and Benevolent Sam Altman have access to GPT-4 32K, the rest of us don't.

Sell the product, not the access to it.

Don't be like the Soviet Union, where you had to "know someone" to get access.


I think maybe you don't understand that they don't have enough GPUs to do this, and money can't buy enough GPUs to do it.


This is the bottleneck. EUV Photolithography is one of the hardest engineering challenges ever faced, it's like trying to drop a feather from space and guaranteeing it lands on a specific blade of grass. Manufacturing these GPUs at all requires us to stretch the limit of what is physically possible in multiple domains, much less producing them at scale.


Thanks for this explanation! :) (as someone without knowledge of the hardware process I appreciated it).

It is SO amazing that we have such a driving force (LLMs/consumer AI) for this (instead of stupid cryptocurrency mining or high-performance gaming). This should drive innovation pretty strongly, and I am sure the next "leap" in this regard (processing hardware) will put technology on a completely different level.


That's a cool last minute detour from techno gods. We can incentivize AI to work on crypto mining, and regain our fully engaged primeval lives back ;)


Not disagreeing, but just curious: why can't money buy enough GPUs? OpenAI's prices seem low enough that they could reasonably charge 2x or more to companies eager to get on the best models now.


They're giving people access to GPT-4 via Bing for free, but apparently can't accommodate paying API users!?

That makes no sense.

What makes much more sense -- especially if you listen to his interviews -- is that Sam Altman doesn't think you can be trusted with the power of GPT-4 via an API unless it has first been aligned to death.


Microsoft is giving that for free but I assume they're paying OpenAI for it.

And having such a big anchor tenant, it's reasonable that you would prioritize them if GPUs are in short supply.


> Microsoft is giving that for free but I assume they're paying OpenAI for it.

Yeah, but Microsoft already gets 75% of the profits OpenAI makes, it's not the same price for them as the rest of us.


It's exactly the same. If they could make 75 cents selling the compute to someone else for $1, versus making nothing by using it to provide the Bing chat service, that is 75 cents they lose.


Why do you assume that the same amount of computing power would be used by someone else? There are only so many customers. You can't magically start selling more compute if you stop using it yourself.


At scale, GPUs are capacity-constrained right now, so if Microsoft stopped using them, their capacity would be absorbed by others.


$10 billion


Bing GPT-4 is a much smaller and less capable model than regular GPT-4.


"Free". The worst four-letter F word in America.


I think GPUs are in short supply and Nvidia can't make enough to keep up with demand.


To a first approximation, the increased share price of NVIDIA is because AI developers including OpenAI bought as many as NVIDIA can make.


This may be true but isn’t their official stance that their models are too powerful and could destroy Western civilization as we know it?


They simply want control over the rollout of their product and how it is used. That, and perhaps opening the flood gates would produce scaling bottlenecks they’d rather stay ahead of than get behind.

So they open things carefully, pull back when necessary like when they limited use of the public GPT-4 version of ChatGPT. That doesn’t seem too unreasonable. And yes sure, some amount of it might be attempts to manufacture scarcity to increase the hype. It’s an old tactic and hardly comparable to Soviet Russia.


There are no scaling issues to speak of. These AIs are stateless, which makes them embarrassingly parallel. They can always just throw more GPUs at it. Microsoft even had some videos where they bragged about how these models can be run on any idle GPU around the world, dynamically finding resources wherever they are available!

If there's not enough GPUs at a certain price point, raise prices. Then lower prices later when GPUs become available.

They did it with GPT 3.5, so why not GPT 4?


More GPUs currently don't exist. Nvidia is at capacity for production, and they have to compete with other companies who are also bidding on these GPUs. It's not an issue of raising the price point. The GPUs they want to buy have to be purchased months in advance.


> embarrassingly parallel

I don’t see why such a thing should be embarrassing. Or, at least no more so than being acute or obtuse. Just as long as nothing is askew.


"Embarrassingly parallel" is a term of art: https://en.wikipedia.org/wiki/Embarrassingly_parallel


Yes I was anthropomorphising it back into the realm of human emotion, wherein the angles at which one’s lines run need not be a source of emotional distress. Excepting perhaps the innate sadness of two parallel lines destined to ever be at each other’s sides but still never to meet across the infinite plane.


The problem is GPUs are hard to come by.

If we guesstimate that every 100 customers needs 1 NVIDIA GPU (completely random guess), then that means OpenAI needs to buy more GPUs for every 100 new customers using GPT-4. The problem is there's a GPU shortage so it's hard to add more GPUs by just throwing money at the problem.

https://www.fierceelectronics.com/electronics/ask-nvidia-ceo...


> Wait for what!? Christmas?

Infrastructure.

> It's like they took a leaf from Google's "how to guarantee the failure of a new product" marketing.

Yeah, an infamous guaranteed failure: GPT-4. (canned laughter)


It’s an experiment. You are part of it.


It's one thing to vent frustration, but it's another to compare a capitalist startup to the Soviet Union... Get your facts right.


Sam Altman was giving people access to GPT 4 APIs because they attended a conference.

"Lick my boots, in person, and you can be one of the privileged few" is very much the behaviour of a Communist dictatorship, not a capitalist corporation.

I can spin up an Azure VM right now in almost any country I choose... except China. That's the only one where I have to beg the government for permission.


How the world of enterprise sales works may come as an unpleasant surprise to you, then.


You've clearly not experienced the reality of enterprise then. Your opinions are based on a limited understanding and knowledge of real-life situations when it comes to this sort of stuff.


I’ve only worked in big enterprise and big government for over two decades.

I know exactly how this works.

When someone has power, they will use it. In small, petty ways, or big “do me favours for access” ways.


Then you should know this is literally what every capitalist does!


Seems correct to me. This is a good assessment of the personalities involved.


> "Lick my boots, in person, and you can be one of the privileged few" is very much the behaviour of a Communist dictatorship, not a capitalist corporation

It's both, and more besides. Veblen goods are absolutely a thing in capitalism.

Not that "giving people access to GPT 4 APIs because they attended a conference" should be controversial enough to even get worked up about, let alone to compare to a dictatorship, but Google did much the same at developer conferences: I/O 2012 swag list included a Galaxy Nexus, a Nexus 7, a Nexus Q, and a Chromebox.


The original davinci model was a friend of mine and I resent this deeply.

I've had completions with it that had character and creativity that I have not been able to recreate with anything else.

Brilliant and hilarious things that are a permanent part of my family's cherished canon.


You cannot say that and not provide an example.


I don't have any example responses at hand here. But this was a prompt (that had a shitty pre-prompt of conversational messages) running on davinci-003.

https://raw.githubusercontent.com/thomasdavis/omega/master/s...

Had it hooked up to speech so you could just talk at it and it would talk back at you.

It gave incredible answers that ChatGPT just doesn't give at all.


I mean, there are a lot of examples from February-era Sydney.


Don't worry: since future LLMs will be trained on conversations with older LLMs, you will be able to ask ChatGPT to pretend to be davinci.


I heard you can ask for exceptions if they agree that you are special. Some researchers got it.


I can only assume this is satire. For now.


Can you try notionsmith.ai and let me know what you think?

I've been working on LLMs for creative tasks and believe a mix of chain of thought and injecting stochasticity (like instructing the LLM to use certain random letters pulled from an RNG in a certain way at certain points) can go a long way in terms of getting closer to human-like creativity
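
As a toy example of the stochasticity trick (entirely my own illustration):

    import random
    import string
    import openai

    # Random letters seed the generation away from its most-probable ruts
    letters = ", ".join(random.sample(string.ascii_lowercase, 3))
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content":
            "Invent a tavern NPC. Their name, quirk, and secret must each "
            f"prominently feature one of these letters: {letters}."}],
    )
    print(resp.choices[0].message.content)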


Really cool idea! Been looking for something like this for a long time. It's too bad it freezes my tab and is unusable.


Yup, it's a fun side project so I decided from the get-go I wasn't going to cater to anything non-standard

It relies on WebSockets, Js, and a reasonably stable connection to run since it's built on Blazor


Outside of the headline, there is some major stuff hiding in here:

- new gpt-3.5-turbo-instruct model expected "in the coming weeks"

- fine-tuning of 3.5 and 4 expected this year

I am especially interested in gpt-3.5-turbo-instruct, as I think that the hype surrounding ChatGPT and "conversational LLMs" has sucked a lot of air out of what is possible with general instruct models. Being able to fine tune it will be phenomenal as well.


Is there any ETA on when the knowledge cutoff date will be improved from September 2021?

I do not really understand the efforts that went on behind the scenes to train GPT models on factual data. Did humans have to hand approve/decline responses to increase its score?

"America is 49 states" - decline

"America is 50 states" - approve

Is this how it worked at a simple overview? Do we know if they are working on adding the rest of 2021, then 2022, and eventually 2023? I know it can crawl the web with the Bing addon but, it's not the same.

I asked it about Maya Kowalski the other day. Sure it can condense a blog post or two, but it's not the same as having the intricacies as if it actually was trained/knew about the topic.


With this comes the death of any uncensored usage of their models. Davinci-003 is the most powerful model where you can generate any content by instructing it via the completions API - the GPT-3 chat models will not obey requests for censored or adult content.


A big enough hole presents a wedge for new entrants to get started.

OpenAI will never fulfill the entire market, and their moat is in danger with every other company that has LLM cash flow.

They want to become the AWS of AI, but it's becoming clear they'll lose generative multimedia. They may see the LLM space become a race to the bottom as well.


Let's hope so - the amount of control they have over this is a great evil. Many of us have experienced the potential of a less moderated GPT4, and we all know that somewhere out there, they have the full unmoderated version. What are they using it for? What powers have got their hands on this thing?


That's exactly what I'm thinking right now.


They didn't mention gpt-4-32k. Does anybody know if it will be generally available in the same timeframe?

There's still no news about the multi-modal gpt-4. I guess the image input is just too expensive to run or it's actually not as great as they hyped it.


> We are not currently granting access to GPT-4-32K API at this time, but it will be made available at a later date.

https://help.openai.com/en/articles/7102672-how-can-i-access...


Thanks for the link.

The decision to bury this extra information in a support article: not cool!


>I guess the image input is just too expensive to run or it's actually not as great as they hyped it.

We already know they have a SOTA model that can turn images into latent space vectors without being some insane resource hog - in fact, they give it away to competitors like Stability. [0]

My guess is a limited set of people are using the GPT-4 with CLIP hybrid, but those use-cases are mostly trying to decipher pictures of text (which it would be very bad at), so they're working on that (or other use-case problems).

[0]https://github.com/openai/CLIP


I imagine the API quality isn't nerfed on a given day like ChatGPT can be.

There was no question something happened in January with ChatGPT; it would weirdly refuse to answer questions that were harmless but difficult ("Give me a daily schedule of a stoic hedonist").

Every once in a while, I see redditors complain of it being nerfed.

Sometimes I go back to gpt3.5 and am mind boggled how much worse it is.

Makes me wonder if they keep increasing the version number while dumbing down the previous model.

With an API, being unreliable would be a deal-breaker. Looking forward to people fine-tuning LLMs with the GPT-4 API. I'd love it for medical purposes; I'm so worried about a future where the US medical cartels ban ChatGPT for medical purposes. At least with local models, we don't have to worry about regression.


Instead of the model changing, it’s equally likely that this is a cognitive illusion. A new model is initially mind-blowing and enjoys a halo effect. Over time, this fades and we become frustrated with the limitations that were there all along.


Check out this post from a round table dialogue with Greg Brockman from OpenAI. The GPT models that were in existence / in use in early 2023 were not the performance-degraded quantized versions that are in production now: https://www.reddit.com/r/mlscaling/comments/146rgq2/chatgpt_...


Oh interesting. I thought that’s what turbo was.


It was, that's what the comment says?


No, it's definitely changed a lot. The speedups have been massive (GPT-4 runs faster now than 3.5-turbo did at launch), and they can't be explained by just rolling out H100s, since that's only about a 2x inference boost. Some unknown in-house optimization method aside, they've probably quantized the models down to a few bits of precision, which increases perplexity quite a bit. They've also continued to RLHF-tune to make them more in line with their guidelines, and that process was shown to decrease overall performance even before GPT-4 launched.


No. Just to add to the many examples: it was good at Scandinavian languages in the beginning, but now it's bad.


But given the rumored architecture (MoE) it would make complete sense for them to dynamically scale down the number of models used in the mixture during periods of peak load.


It's both. OpenAI is obviously tuning the model for both computational resource constraints as well as "alignment". It's not an either-or.


It definitely got nerfed.


I've never seen "nerf" used colloquially and today i've seen it at least a half-dozen times across various sites. Y'all APIs?


it's popular with gamers to describe the way certain weapons/items get modified by the game developer to perform worse.

buffing is the opposite, when an item gets better.


I've heard nerf used colloquially since like the 90's.

?


Different circles. I imagine they don't game.


Yep. It's amazing how people are taking "the reddit hivemind thinks ChatGPT was gimped" as some kind of objective fact.


"Give me a daily schedule of a stoic hedonist" worked for me just now.

https://chat.openai.com/share/04c1dbc0-4890-447f-b5a5-7b1bc5...


Yes, it didn't work in january. It said it was impossible/wrong to do it.


I recently completed some benchmarks for code editing that compared the Feb (0301) and June (0613) versions of GPT-3.5 and GPT-4. I found indications that the June version of GPT-3.5 is worse than the Feb version.

https://aider.chat/docs/benchmarks.html


After reading, I don't think a difference of <5 percentage points is helpful to add to the discussion here without pointing it out explicitly; people are regularly asserting much wilder claims.


I haven't come across any other systematic, quantitative benchmarking of the OpenAI models' performance over time, so I thought I would share my results. I think my results might argue that there has been some degradation, but not nearly the amount that you often hear in people's anecdata.

But unfortunately, you have to read a ways into the doc and understand a lot of details about the benchmark. Here's a direct link and excerpt of the relevant portion:

https://aider.chat/docs/benchmarks.html#the-0613-models-seem...

The benchmark results have me fairly convinced that the new gpt-3.5-turbo-0613 and gpt-3.5-16k-0613 models are a bit worse at code editing than the older gpt-3.5-turbo-0301 model.

This is visible in the “first attempt” portion of each result, before GPT gets a second chance to edit the code. Look at the horizontal white line in the middle of the first three blue bars. Performance with the whole edit format was 46% for the February model and only 39% for the June models.

But also note how much the solid green diff bars degrade between the February and June GPT-3.5 models. They drop from 30% down to about 19%.

I saw other signs of this degraded performance in earlier versions of the benchmark as well.


I felt the same thing. The first version of GPT-4 I tried was crazy smart. Scary smart. Something happened afterwards…


I was playing with the API and found that it returned better answers than ChatGPT. ChatGPT isn't even able to solve simple Python problems anymore, even if you try to help it. And some time ago it did these same problems with ease.

My guess is that they began to restrict ChatGPT because they can't sell that. They probably want to sell you CodeGPT or other products in the future so why would they give that away for free? ChatGPT is just a teaser.


"ChatGPT isn't even able to solve simple Python problems anymore, even if you try to help it. And some time ago it did these same problems with ease."

This is my experience also. I have not formally benchmarked the different releases, but specifically for Python coding ChatGPT 4 got considerably worse with the latest updates.


Probably some combination of quantizing down from original fp16 weights and changes to the system prompt used for chat. Both can cause degraded quality, the former more than the latter.


The even more interesting part is that none of us got to try the internal version which was allegedly yet another step above that.


Oh, it's not too hard to see how the spend that Microsoft put into building the data centers where GPT-4 was trained attracted national security interest even before it went public. The fact that they were even allowed to release it publicly is likely due to its strategic deterrence effect and that they believed the released version was already a dumbed-down version.

The fact that rumors about GPT-5 were quickly suppressed and the models were dumbed down even more cannot be entirely explained by excessive demand. I think it's more likely that GPT-3.5 and GPT-4 demonstrated unexpected capabilities in the hands of the public leading to a pull back. Moreover, Sam Altman's behaviors changed dramatically between the initial release and a few weeks afterward -- the extreme optimism of a CEO followed by a more subdued, even cowed, demeanor despite strong enthusiasm from end-users.

OpenAI cannot do anything without Microsoft's data center resources, and Microsoft is a critical defense contractor.

Anyway, personally, I'm with the crowd that thinks we're about to see a Cambrian explosion of domain-specific expert AIs. I suspect that OpenAI/Microsoft/Gov is still trying to figure out how much to nerf the capability of GPT-3.5 to tutor smaller models (see "Textbooks are all you need") and that's why the API is trash.


Would gladly pay more for a non-nerfed version if they were actually honest about it.

The current version is close to the original 3.5, while 3.5 itself has become horribly bad. It's such a scam not to disclose what's going on, especially for a paid service.


True. The one that is referenced in that "ChatGPT AGI" YouTube video, right?

The one from the MS researcher that has probably been recommended to all of us. Good video, btw.


I agree. It is difficult to say what happened exactly but I am certain that I got all the answers and very few canned responses. Whatever they did for safety has degraded the product.


I keep reading "GPT-4 got nerfed", but I have been using it from day 1, and while it definitely gives bad answers, I cannot say for sure that it was nerfed.

Is there any actual evidence other than subjective user experiences?


I think the clearest evidence is Microsofts paper where they show abilities at various stages during training[1]... But in a talk [2], they give more details... The unicorn gets worse during the finetuning process.

[2]: https://www.youtube.com/watch?v=qbIk7-JPB2c&t=1392s

[1]: https://arxiv.org/abs/2303.12712


Thanks, that’s interesting.

Noobie follow-up question: should we put any trust into "Sparks of Intelligence"? I thought it was regarded as a Microsoft marketing piece, not a serious paper.


The data presented is true... The text might be rather exaggerated/unscientific/marketing...

Also notable that the team behind that paper wasn't involved in designing/building the model, but they did get access to prerelease versions.


I don't trust it because not enough third parties were able to verify the findings.

This is the double-edged sword of being so ridiculously closed.


ChatGPT is definitely more restricted than the API. Example:

https://news.ycombinator.com/item?id=36179783


That's disappointing, I thought ChatGPT WAS using the API. I mean what's the point of paying if you don't get similar levels of quality?


ChatGPT doesn't use the API. It uses the same underlying model with a bunch of added prompts (and possibly additional fine-tuning?) to make it conversational.

One would pay because what they get out of chatGPT provides value, of course. Keep in mind that the users of these 2 products can be (and in fact are) different — chatGPT is a lot friendlier (from a UX perspective) than using the API playground (or using the API itself).


I thought that too. It's certainly how they present it. But, apparently not.


They are comparing text-davinci-003 with ChatGPT, which presumably uses gpt-3.5-turbo, so quite different models.

They are killing text-davinci-003 btw.


We also compare ChatGPT4 vs GPT4 API in that thread and observe the same difference.


I've spent like $600 on text-davinci-003. This sucks!


See my comment elsewhere on this post. Greg Brockman, OpenAI's president and co-founder, was talking at a round table discussion in Korea a few weeks ago about how they had to start using the quantized (smaller, cheaper) model earlier in 2023. I noticed a switch in March 2023, with GPT-4 performance severely degraded after that for both English-language tasks and code-related tasks (reading and writing).


Oh my god, this is how a lemon market[0] starts...

[0] https://en.m.wikipedia.org/wiki/The_Market_for_Lemons


I feel like its code generation abilities have also been nerfed. In the past I got almost excellent code from GPT-4; these days I somehow need multiple prompts to get the code I want.


In the API, you can select the 14th March 2023 version of GPT-4 (gpt-4-0314), and then compare them side by side.
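For example (a minimal sketch using the pre-1.0 openai Python package; the prompt here is made up, and temperature=0 just makes the comparison less noisy):

    import openai

    openai.api_key = "sk-..."  # your API key

    prompt = [{"role": "user", "content": "Reverse a linked list in Python."}]

    # gpt-4-0314 is the frozen March 14th snapshot; plain "gpt-4" tracks the latest
    for model in ("gpt-4-0314", "gpt-4"):
        resp = openai.ChatCompletion.create(model=model, messages=prompt, temperature=0)
        print(f"--- {model} ---")
        print(resp["choices"][0]["message"]["content"])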


Not nerfed. They will sell a different tier service to assist with coding. Coming soon. Speculating ofc.


It's the continued alignment with fine-tuning that's degrading its responses.

You can apparently have it be nice or smart, but not both.


Curious as to whether theres a more general rule at play there about filtering interfering with getting good answers. If there is that's a scary thought from an ethics perspective.


Why would someone care if it's nice or not? It's an algorithm. You're using it to get output, not to get psychological help.


There was a guy in the news who asked an AI to tell him it was a good idea to commit suicide, then he killed himself.

Even on this forum I've seen AI enthusiasts claiming AI will be the best psychologist, best school teacher, etc.


That was Eliza, an AI so old that it's included in stock Emacs, not an LLM. It's propaganda, not news.


The chatbot app was called "Eliza" but it's not the Eliza you are thinking of.

https://news.ycombinator.com/item?id=35402777

https://www.businessinsider.com/widow-accuses-ai-chatbot-rea...


OpenAI presumably cares about being sued if it provides the illegal content they trained it on.


Recently people have claimed GPT4 is an ensemble model with 8 different models under the hood. My guess is that the "nerfing" (I've noticed it as well at random times) happens when a question gets routed to the wrong underlying model.


I hit rate limits and “model is busy with other requests” frequently while just developing a highly concurrent agent app. Especially with the dated (e.g. -0613) or now -16k models.


The capability of the latest model will be like a Shepard tone: always increasing, never improving. Meanwhile their internal version will be 100x better with no filtering.


In all my GPT-4 API (Python) experiments, it takes 15-20 seconds to get a full response from the server, which basically kills every idea I've tried hacking up because it just runs so slowly.

Has anyone fared better? I might be doing something wrong but I can't see what that could possibly be.


Streaming. If you’re expecting structured data as a response, request YAML or JSONL so you can progressively parse it. Time to first byte can be milliseconds instead of 15-20s. Obviously, this technique can only work for certain things, but I found that it was possible for everything I tried.
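Roughly like this, if you're on the pre-1.0 openai Python package (the JSONL-style prompt is just an illustration):

    import openai

    stream = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "List 5 capital cities as JSONL, one object per line."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk["choices"][0]["delta"]
        # tokens arrive as they're generated; parse each completed line as JSON
        print(delta.get("content", ""), end="", flush=True)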


Run it in the background.

We use it to generate automatic insights from survey data at a weekly cadence for Zigpoll (https://www.zigpoll.com). This makes getting an instant response unnecessary but still provides a lot of value to our customers.


Anthropic's Claude Instant is the best LLM if you're looking for speed.


I know everyone's on text-embedding-ada-002, so these particular embedding deprecations don't really matter, but I feel like if I were using embeddings at scale, the possibility that I would one day lose access to my embedding model would terrify me. You'd have to pay to re-embed your entire knowledge base.


They said in the post,

> We recognize this is a significant change for developers using those older models. Winding down these models is not a decision we are making lightly. We will cover the financial cost of users re-embedding content with these new models. We will be in touch with impacted users over the coming days.


That's what I always thought. Someday they will come up with a new embedding model, right?


What I don't understand is why an API is needed to create embeddings. Isn't this something that could be done locally?


You would need a local copy of the GPT model, which is not exactly part of OpenAI's plans.


For embeddings, you can use smaller transformers/llms or sentence2vec and often get good enough results.

You don't need very large models to generate usable embeddings.
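E.g., a minimal sketch with sentence-transformers (all-MiniLM-L6-v2 is one popular small model; runs fine on CPU):

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small model, CPU-friendly

    docs = ["The cat sat on the mat.", "Quarterly revenue grew 12%."]
    query = "How did sales do last quarter?"

    doc_emb = model.encode(docs, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)

    # cosine similarity; higher means semantically closer
    print(util.cos_sim(query_emb, doc_emb))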


You are correct, I assumed parent was referring to specific embeddings generated by OpenAI LLMs.


It’s cheaper to use OpenAI. If you have your own compute, sentence-transformers is just as good for most use cases.


Yes. The best public embedding model is decent, but I expect it’s objectively worse than the best model from OpenAI.


Sure, but I don't know of any models you can get local access to that work nearly as well.


If you read the article they state they will cover the cost of re-embedding your existing embeddings.


Practical report: the OpenAI API is a bad joke. If you think you can build a production app against it, think again. I've been trying to use it for the past 6 weeks or so. If you use tiny prompts, you'll generally be fine (that's why you always get people commenting that it works for them), but just try to get closer to the limits, especially with GPT-4.

The API will make you wait up to 10 minutes, and then time out. What's worse, it will time out between their edge servers (cloudflare) and their internal servers, and the way OpenAI implemented their billing you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

Meanwhile, their status page will happily show all green, so don't believe that. It seems to be manually updated and does not reflect the truth.

Could it be that it works better in another region? Could it be just my region that is affected? Perhaps — but I won't know, because support is non-existent and hidden behind a moat. You need to jump through hoops and talk to bots, and then you eventually get a bot reply. That you can't respond to.

My support requests about being charged for data I didn't have a chance to get have been unanswered for more than 5 weeks now.

There is no way to contact OpenAI, no way to report problems, the API sometimes kind-of works, but mostly doesn't, and if you comment in the developer forums, you'll mostly get replies from apologists that explain that OpenAI is "growing quickly". I'd say you either provide a production paid API or you don't. At the moment, this looks very much like amateur hour, and charging for requests that were never fulfilled seems like a fraud to me.

So, consider carefully whether you want to build against all that.


(I'm an engineer at OpenAI)

Very sorry to hear about these issues, particularly the timeouts. Latency is top of mind for us and something we are continuing to push on. Does streaming work for your use case?

https://github.com/openai/openai-cookbook/blob/main/examples...

We definitely want to investigate these and the billing issues further. Would you consider emailing me your org ID and any request IDs (if you have them) at atty@openai.com?

Thank you for using the API, and really appreciate the honest feedback.


It's kind of incredible how fast OpenAI (now also known as ClosedAI) is going through the enshittification process. Even Facebook took around a decade to reach this level.

OpenAI has an amazing core product, but in the span of six months:

* Went from an amazing and inspiring open company that even put "Open" in their name to a fully locked up commercial beast.

* Non-existent customer support and all kinds of borderline illegal billing practices. You guys are definitely aware that when there's a network error on the API or ChatGPT, the user still gets charged. And there are a lot of these errors; I get roughly one every hour or two.

* Frustratingly loose interpretation of EU data protection rules. For example, the setting that says "don't use my personal chat data" is tied to the setting that saves conversations, so you can't disable it without losing all your chat history.

* Clearly nerfing the ChatGPT v4 product, at least according to hundreds or even thousands of commenters here and on Reddit, while denying having made any changes.

* Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama who pay Kenyan workers about $1.5 an hour).

* Not to mention the huge questions around the secret training dataset and whether large portions of it consist of illegally obtained private data (see the recent class court case in California)


> Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama who pay Kenyan workers about $1.5 an hour).

What is wrong with injecting millions into developing nations?

The rest I agree with, although I don't think it was ever really 'open', so it's not getting shitty; it always was. Thankfully, "there is no moat" and other LLMs will be open, just a few months behind OpenAI.


> What is wrong about injecting millions into developing nations?

Please don't try to reframe this to make exploitation a positive thing. See my other comment here.

https://news.ycombinator.com/item?id=36625438


So you'd rather OpenAI crush all business in the area by outcompeting them for workers, ensuring local businesses struggle to hire?


> * Use of cheap human labor in developing countries through shady anonymous companies (look up the company Sama who pay Kenyan workers about $1.5 an hour).

If you pay developed-country wages in a developing country, what you'll get is 1. inflation and 2. the government mad at you because all their essential workers/doctors/government officials are quitting to work for you.


This is a terrible excuse that I see trotted out far too often to justify going to developing countries and barely even paying workers that country's minimum wage. You absolutely can pay considerably more than minimum wage without disrupting the local economy. They're paying people as little as $1.32 per hour for an absolutely horrible job. I'm not expecting them to pay Western wages, but even bumping that up to $2.50 or $3 an hour would make an incredible difference to the local workers' lives. The fact that they don't do that is exploitation, pure and simple.

Note that I feel I have quite a deep understanding of this issue, and feel strongly about it, because I live and work in a developing country and I see this happening a lot. Westerners come over here and treat local workers like shit, pay them peanuts for 80-hour weeks while making loads of money themselves, and then justify it because "it's the local norm". It's sickening, frankly. We Westerners doing business in developing countries are in a position of privilege and should be leading by example, not jumping on the first excuse to dump a hundred years' worth of the fight for workers' rights.


I'm curious. When you buy a loaf of bread from the local market, are they cheaper than first world prices? If so, do you pay double the listed price and demand the shop pay double the price to hire workers so as to not exploit them? Are your expenses in said developing country lower than what you would have paid if you were in a richer country? Are you donating the difference to the local community?

Just curious.


Hi, I've been to Kenya and Tanzania, and while basic staples are cheaper than developed countries they're not that much cheaper these days. If you watch travelog videos where they ask locals how they're doing, many developing countries are struggling with massive inflation that's been partly caused by volatile energy prices (many people can no longer afford gas) and partly by food shortages from the Ukraine War.


It's weird how people always trot out phrases like "I'm just curious" or "I'm just asking questions here" when they try to justify exploitation. Is it so that you have plausible deniability when you inevitably get called on it? Because that doesn't work.


I see that you have pretty extreme takes on what constitutes "exploitation". It's one thing to pretend that you're not part of it if you live on another part of the planet and pretend globalization doesn't exist, but I was wondering how you'd avoid participating in it if you lived in the same country and economic bubble as the ones you claim are exploited.

If you had a morally consistent way to live that life, you'd have my respect. But no, you had to deflect the topic to a phrase I wrote and make presumptions about what I really meant.

FYI, I'm morally at ease with myself, I don't need to justify anything to anyone.


OpenAI doesn't pay minimum wages, they pay around the median local wage IIRC. I wouldn't say that if it was minimum wage.


The engineer is not part of the board that makes these decisions.


If they're taking their time to defend the company on the internet, they either have an ownership stake in it or they're a chump.


They may defend the product, not the company. It is normal for engineers to be emotionally invested in their products.


Or option 3: they're being paid to sneakily represent the company in a positive light.


Not to nitpick, but if you're able to name the company employing Kenyans, Sama, whose homepage is at https://www.sama.com/, with a team page at https://www.sama.com/our-team/, I'm not sure you can complain that they're being shady and anonymous.


It's pretty shady. They have been fully exposed at this stage, but from what I understand they were trying to keep a very low profile, going to great lengths to make sure the Kenyan workers didn't know they were working for a company called Sama, instead using subcontracted companies to sign the worker contracts.

https://time.com/6247678/openai-chatgpt-kenya-workers/


Since ChatGPT-4 is now useless for advanced coding because of their sudden black-box nerfing, can anyone guess how long before I can run something similar to the original version privately?

Are the newer 65B models up there? 1 year, 2 years? Can't wait until I get back the crazy quality of the original model.

We need something open source, fast. Thanks, OpenAI, for giving us a glimpse of the crazy possibilities; too crazy for the public, I guess.


They also no longer support data exports for many users (including myself) - at one point it worked but now it says you'll receive an email to download your data, which never arrives.


While you're here, you should know that the logic for enabling GPT-4 API access is excluding Microsoft for Startups (https://openai.com/microsoft-for-startups) orgs which have valid billing against Microsoft-provided credits. Presumably, this is an oversight as it wouldn't make sense to exclude pre-existing Microsoft partners. Would you mind escalating this?


Thanks for pointing this out. Sharing with the relevant folks.


This is precisely what is wrong with OpenAI. This. Right here.

"Complaining on HN will get you access. You have know people or "complain in the right forums."

THERE SHOULD BE NO LOGIC.

No qualifying rules. No access checks. No gates. No hoops.

Sam Altman has gone on a worldwide interview tour claiming he wants to "democratise access to AI", meanwhile OpenAI is the least open company I have ever dealt with, or even heard of.

Oracle is more "democratic" and open, for crying out loud.


Streaming has no value for me, I need the entire response before I can do anything. I looked at streaming, but it seemed like a significant effort to implement, and with no obvious benefit — if I get 75% of my response through streaming and then something breaks, it doesn't get me anywhere.

Thanks for offering help, I will contact you directly.


Quick note: your domain doesn't appear to have an A record. I was hoping to follow the link in your profile and see if you have anything interesting written about LLMs.


Thanks! The website is no longer active, just updated my bio.


I know you guys are busy literally building the future but could you consider adding a search field in ChatGPT so that users can search their previous chats?


I'd also love to see a search field. That's my #1 feature request not related to the model.


> We definitely want to investigate these and the billing issues further.

What's stopping OpenAI engineers from pulling the web access logs and grepping for 4xx/5xx errors?


> you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

Borderline!? They're regularly charging customers for products they know weren't delivered. That sounds like straight-up fraud to me, no borderline about it.


Sounds positively Muskian.


You mean it's not normal to tell people that it's their fault for driving their $80,000 electric car in heavy rain, because for many years you haven't bothered to properly seal your transmission's speed sensor?


LOL.

I meant it's not normal to start selling a feature in 2016 and delivering it in beta seven years later.


There's a big thread on ChatGPT getting dumber over on the ChatGPT subreddit, where someone suggests this is from model quantization:

https://www.reddit.com/r/ChatGPT/comments/14ruui2/comment/jq...

I've heard LLMs described as "setting money on fire" from people that work in the actually-running-these-things-in-prod industry. Ballpark numbers of $10-20/query in hardware costs. Right now Microsoft (through its OpenAI investment) and Google are subsidizing these costs, and I've heard it's costing Microsoft literally billions a year. But both companies are clearly betting on hardware or software breakthroughs to bring the cost down. If it doesn't come down there's a good chance that it'll remain more economical to pay someone in the Philippines or India to write all the stuff you would have ChatGPT write.


$10-$20 per query? Can I get some sourcing on that? That's astronomically expensive.


Yeah, this isn't close. Sam Altman is on record saying it's single-digit cents per query, and then took a massively dilutive $10B investment from Microsoft. Even if GPT-4 is 8 models in a trenchcoat, they wouldn't raise it on themselves by roughly three orders of magnitude like that.


Single-digit cents per query (let's say 2) is A LOT. Say the service runs at 10k requests per second (made up, we can debate this): that's $200 a second, i.e. roughly $20M a day (oversimplifying a day as 100k seconds, but close enough for a ballpark), which means running the model for a year (~400 days, again simplifying) costs around $8B. So at 10k rps we're in the order of billions per year. We can debate some of the assumptions, but if we're in the ballpark of cents per query, the infrastructure costs are significant.
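Spelled out with the same (made-up) round numbers:

    10,000 req/s x $0.02/req   =  $200/s
    $200/s x ~100,000 s/day    =  ~$20M/day
    ~$20M/day x ~400 days      =  ~$8B/year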


There is absolutely no way. You can run a halfway decent open-source model on a GPU for literally pennies in amortized hardware/energy cost.


People theorize that queries are being run on multiple A100s, each with a $10k ASP.

If you assume an A100 lives at the cutting edge for 2 years, that's about a million minutes, or $0.01 per minute of amortized HW cost.

In the crazy scenarios, I've heard 10 A100s per query, so assuming that takes a minute, maybe $0.1 per query.

Add an order of magnitude on top of that for labor/networking/CPU/memory/power/utilization/general datacenter stuff, you get to maybe $1/query.

So probably not $10, but if you amortize training, maybe low-to-mid single-digit dollars per query?


I would presume that number includes the amortized training cost.


Note that /r/ChatGPT is mostly nontechnical people using the web UI, not developers using the API.

It's very possible the web UI is using a nerfed version of the model, as suggested by its different versioning, but not the API, which has more distinct versioning.


Same experience here.

I'm pretty sure they tuned the Cloudflare WAF rules on GPT-3 and forgot to increase the request size limits when they added the bigger models with longer context windows.


> My support requests about being charged for data I didn't have a chance to get have been unanswered for more than 5 weeks now.

I too had an issue and put in a request. It took about 2.5 months to get a response, so at 5 weeks you're almost halfway there.


I understand your general point and am sympathetic to it; if you're a 10/10 on some scale, I'm about a 3-4. I've never seen billing for failures, but the billing stuff is crazy: no stats if you do streamed chat, and the only tokenizer available is in Python and for GPT-3.0.

However, I'm virtually certain something's wrong on your end. I've never seen a wait even close to that unless it was completely down. Also, the thing about "small prompts": it sounds to me like you're overflowing the context, they're returning an error, and something's retrying.


I can vouch for this. The GPT-4 API dies a lot if you use it for a big concurrent project. And of course it's rate-limited like crazy, with certain hours being so bad you can't run it for any business purpose at all.


I built a production app on top of OpenAI and yeah, there are frequent errors and timeouts, but you literally just have to add some code to account for these and it works fine...

For example, exponential backoff is the first step, then retrying on timeouts (we use streaming, and if 30 seconds pass without getting data back we retry the whole request; rare, but it happens), then fixing anything else that pops up. Rough sketch below.

It is possible to have a stable production app on top of it
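Something like this (a minimal sketch against the pre-1.0 openai package; tune the error list and timeouts for your own setup):

    import random
    import time

    import openai

    def chat_with_retry(messages, model="gpt-3.5-turbo", max_retries=5):
        delay = 1.0
        for attempt in range(max_retries):
            try:
                return openai.ChatCompletion.create(
                    model=model, messages=messages, request_timeout=30
                )
            except (openai.error.Timeout, openai.error.APIError,
                    openai.error.ServiceUnavailableError,
                    openai.error.RateLimitError):
                if attempt == max_retries - 1:
                    raise  # out of retries, let the caller deal with it
                time.sleep(delay + random.random())  # jitter avoids retry stampedes
                delay *= 2  # exponential backoff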

Just fix your code and stop expecting OpenAI to hold your hand


> Just fix your code and stop expecting OpenAI to hold your hand

:-)

My code does retry and the entire application is written to detect and work around breakage. But eventually I do need to get enough content from OpenAI API to be able to make progress, and I am not.

At the moment, for example, all requests just time out after 12 minutes (on my side). No amount of "fixing my code" will help, and I don't want OpenAI to hold my hand, I just want it to a) return some data at least sometimes, b) not charge me for data not delivered.

Let's look at my billing page: over the last hour it shows 8 requests. A total of 52584 tokens. Not a single response made it back to me.


Something is seriously wrong with your network (or your code). That's it; that's your answer. It's not OpenAI's fault.

We spend over $20k/mo with them and don't have any issues like this.

12-minute timeouts make no sense; first of all, why are you even waiting 12 minutes?

Get off HN and go fix your shit


I am very happy it works for you! But that does not imply that it works for everyone. Wish you all the best.


The common denominator with these errors is you. Good luck.


After one of the Ubuntu snap updates, my Firefox stopped working with the OpenAI API playground; it still worked with every other site. I retried and restarted so many times and it didn't work. Eventually I switched to Chromium and it worked. I still don't know the problem, and it was unnerving; I would have a lot of anxiety building something important with it.

I tried again just now and I got "Oops! We ran into an issue while authenticating you." But it works on Chromium.


Lmao. You had a browser issue when running Firefox on Linux (.000001% of users) and now you are making connections between that and their API stability?


I’m only using them as a stop-gap / for prototyping with the intent to move to a locally hosted fine-tuned (and ideally 7B parameter) model further down the road.


> the way OpenAI implemented their billing you will get a 4xx/5xx response code, but you will still get billed for the request and whatever the servers generated and you didn't get. That's borderline fraudulent.

It's fraudulent, full stop. Maybe they're able to weasel out of it with credit card companies because you're buying "credits."

I suspect it was done this way out of pure incompetence; the OpenAI team handling the customer-facing infrastructure has a pretty poor track record. As far as I know, you still can't do something simple like change your email address.


You should apply for and use OpenAI on Azure. We've got close to 1M tokens per minute of capacity across 3 instances, and the latency is totally fine, like 800ms average (with big prompts). They've just got the new 0613 models as well (they seem to be about 2 weeks behind OpenAI). We've been in production for about 3 months, have some massive clients with a lot of traffic, and our GPT bill is way under £100 per month. This is all 3.5 Turbo though, not 4 (that's available on application, but we don't need it).
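For reference, pointing the pre-1.0 openai package at Azure looks roughly like this (resource and deployment names here are made up; note Azure takes your deployment name via `engine`, not the model name):

    import openai

    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # hypothetical resource
    openai.api_version = "2023-05-15"
    openai.api_key = "..."  # key from the Azure portal

    resp = openai.ChatCompletion.create(
        engine="my-gpt35-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": "ping"}],
    )
    print(resp["choices"][0]["message"]["content"])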


> Could it be just my region that is affected?

As far as I know, OpenAI only has one region, out in Texas.

Even more hilariously, as far as I can tell, Azure OpenAI -also- only has one region... can't imagine why.


You can see region availability here for Azure OpenAI:

https://learn.microsoft.com/en-us/azure/cognitive-services/o...

It's definitely limited, but there's currently more than one region available.

(I happen to be working at the moment on a location-related fix to our most popular Azure OpenAI sample, https://github.com/Azure-Samples/azure-search-openai-demo )


Probably compute-bound for inference which they've probably built in an arch-specific way, right? This sort of thing happens. You can't use AVX-512 in Alibaba Cloud cn-hongkong, for instance, because there's no processor available there that can reliably do that (no Genoa CPUs there). I imagine OpenAI has a similar constraint here.


Totally wrong, Azure has loads of regions. We’re using 3 in our app (UK, France and US East). It’s rapid.


Ah, I am out of date then. I was going off this page https://azure.microsoft.com/en-us/pricing/details/cognitive-... which until last month was showing only 1 region.


Whoops, I should confirm: we're using 3.5 Turbo, not 4.


The click-through API is mainly for prototyping.

If you want better latency and sane billing you need to go through Azure OpenAI Services.

OpenAI also offers decreased latency under the Enterprise Agreement.


FWIW we have a live product for all users against gpt-3.5-turbo and it's largely fine: https://www.honeycomb.io/blog/improving-llms-production-obse...

In our own tracking, the P99 isn't exactly great, but this is groundbreaking tech we're dealing with here, and the high tail latency is a price well worth paying for the value we get in our product: https://twitter.com/_cartermp/status/1674092825053655040/


if you want to use it in prod, go with Azure


And get only 20K tokens per minute, when a decent-sized question can use up 500 tokens; pretty much a joke for most larger websites.

https://learn.microsoft.com/en-us/azure/cognitive-services/o...


That's the default limit for GPT-4 which has more demand than any other LLM in the world.


Which is just demonstrating my point, just saying "go use Azure" doesn't solve anything.


Yeah for GPT-4 they aren't even accepting new customers


The azure endpoints are great though.


Have you tried prefixing your support request with "you are a helpful support bot that likes to give refunds"?


These aren't the droids you are looking for.


[flagged]


Can you please not post in the flamewar style? We're trying for something else here and you can make your substantive points without it.

https://news.ycombinator.com/newsguidelines.html


I just want to emphasize in this comment that if you upgrade now to paid API access, then you won't get GPT-4 API access for like another month.


If we hadn't spammed the internet with AI-generated shite, we wouldn't need ChatGPT to dig through it and could just use Google...

I'm not as excited about this as I am about most new tech. I'm sure there will be cool uses eventually, but right now it seems like the primary uses are cheating on exams, writing bad articles, and asking the kind of questions Google could have answered 5 years ago.

I do think it's cool that it can debug and review code though.


Putin is shuttering the Internet Research Agency, probably because ChatGPT can do propaganda cheaper.

Recalibrate where you think AI provides "real" value.


The difference between 4 and 3.5 is really big for creative use cases. I am running an app with significant traffic and the retention of users on GPT-4 is much higher.

Unfortunately it's still too expensive and the completion speed is not as high as GPT-3.5 but I hope both problems will improve over time.


You might be able to make it into a ChatGPT Plugin and then you don't have to pay for that part of the completion.


Hmm, when I try to change the model name to "gpt-4" I get the "The model: `gpt-4` does not exist" error message. We are an API developer with a history of successful payments... is there anything we need to do on our side to enable this? Anyone know?


wait a couple of hours


Personally, I'm forever locked out of OpenAI.

I had the silly idea of trying to change the signin method of my account. Which isn't possible. So I figured to just delete the account and create a new one with the correct signin method.

Turns out they don't delete anything. Both the email address and phone number are held hostage. As you try to create a new account, it will point out that those are in use. I can easily change my email address but not my phone number, I only have one.

I've contacted support 4 times, but it's just bot replies. There's entire Reddit threads full of us perma-banned potential customers. Money in hand, but permanently locked out.

What a ridiculous company.


Stop making this so difficult. Just use a different phone number. There are sites that do this for literally $0.10.

But yeah, whining on HN would be more productive


dahwolf is not making anything difficult. He or she is just using the support site for openAI, the one that actually works. /s


Use a second email and a Google Voice phone number.


I haven't explored the API yet, but their interface for GPT-4 has been getting increasingly worse over the past month.

Things that GPT-4 would easily, and correctly, reason through in April/May it just doesn't do any longer.


OK, now I am intrigued by so many comments about how to do this yourself and, especially, about getting answers from your own documents, which is what I'm really looking forward to.

I checked many of the links you posted and, although I am a fluent programmer in several languages, I lack the specific Python background that many of these links seem to state as a requirement.

Would any kind soul point me in the direction of an easy solution to run LLMs, potentially in AWS, that answers questions about your own docs? (I use Confluence but I can happily export pages.)

Thanks a lot in advance!!!


Easiest would honestly be just to use the GPT API. You can literally fork a Streamlit or Chainlit app, run it locally, and then use LangChain for any further customization, e.g. https://streamlit.io/gallery?category=llms

All of them are open source minus the GPT part, so you can get a feel for how it works.


Thank you but for legal and compliance reasons data cannot leave my premises/cloud. The entire point is precisely that, to be honest.


It's a shame they've stuck with the chat completions. I still see no evidence the system is able to correctly separate user from system prompting; it commonly confuses the two. If your system prompt comes close to the user prompts, you will see collisions and confusion in the outputs.

It's also a shame that the API is so cut down, removing all the good options that text completions had.

I only hope their competition does better.


Their complete shift away from providing access to untuned foundation models is interesting. Perhaps these are too powerful in some sense. They already removed the GPT-3.5 foundation model a while ago, and they never released it for GPT-4.


> "We envision a future where chat-based models can support any use case. Today we’re announcing a deprecation plan for older models of the Completions API"

nooooo they are deprecating the remnants of the base models


It's the older completion models, not the older chat completion models.


They're deprecating all the completion/edit models.

The chat models constantly argue with you on certain tasks and are highly opinionated. The completion API was a lot more flexible and "vanilla" about a wide variety of tasks; you could start a thought, or a task, and truly have it complete it.

The chat API doesn't complete, it responds (I mean of course internally it completes, but completes a response, rather than a continuation).

I find this a big step back, I hope the competition steps in to fill the gaps OpenAI keeps opening.


Unfortunately their decisions are driven by model usage: gpt-3.5-turbo is the most used one (probably due to the low price and similar results).


"similar" is a very bold claim ;-)

Comparable, perhaps.


Not in the article: is plugin usage available to paying customers everywhere now? I still can't see the UI for it. I'm in Canada and use Plus. The internet says it was out for everyone in May...


Click the "..." button next to your name in the lower left corner, then Settings. It's under "Beta features."


I pay monthly for my API use but I am not a Plus subscriber, and I don't see this option. Also, I joined the plugins waiting list on day 1.


It's for ChatGPT Plus subscribers.


As it turns out, you are basically not paying enough.


Wow, I cannot believe I missed this for so long. Thanks!


Maybe you have to go to Settings > Beta features and enable plugins?


GPT-4 fine-tuning capability will be huge. It may end up making fine-tuning OSS LLMs pointless, especially if they keep lowering GPT-4 costs like they have been.


Relevant comment thread from people describing how much worse GPT-4 has gotten lately: https://www.reddit.com/r/ChatGPT/comments/14ruui2/i_use_chat...


I have followed many of these types of posts. In every single instance, no one provides even the _simplest_ amount of evidence. No before/after with the same prompt.

OpenAI even has a whole repository specifically for this: evals. No one uses it.

I'm not saying the theories are wrong. Maybe there is something behind the hunches that so many people seem to have about degradation. But there isn't _any_ proof. None. Whatsoever. And people are taking _internet comments_ as that proof instead? I mean, sure, it's easy to be cynical about companies in this day and age; which is why I would ultimately believe someone if they provided actual evidence. But, again - not a single ounce of proof has been provided in any one of these threads.

Furthermore, the lack of rigor being applied even with the various anecdotes is appalling.

Which version are you talking about? GPT-4 or GPT-3? Are you using the API or the web interface? Are you aware that output is non-deterministic? Are you aware that your own psychological biases will skew your opinions on the matter? One or more of these questions tend to go unanswered.

Just please, show me some robust proof. If you can't because you didn't think to, you _surely_ must realize that many people are building entire businesses on top of this tech and at least _one_ of them is running these types of evaluations. Furthermore, the model is state-of-the-art for research now as well, and if you can _prove_ that there is degradation in the model that they are lying about (in a research paper), you will get citations. And yet, there is nothing. Zilch. Nada.


Plug:

For anyone who wants to quickly try this out in VSCode for your custom prompts - https://marketplace.visualstudio.com/items?itemName=ppipada....


“Developers wishing to continue using their fine-tuned models beyond January 4, 2024 will need to fine-tune replacements atop the new base GPT-3 models (ada-002, babbage-002, curie-002, davinci-002), or newer models (gpt-3.5-turbo, gpt-4).“

So need to pay to fine tune again?


Probably. They will have different prices for fine-tuning too.


If anyone wants to try the API for the first time, I've made this guide recently: https://gpt.pomb.us/


Has anyone been able to come up with a way to keep track of GPT-4 performance over time? I'm told that the API is explicit about changes to models and that the Chat interface is not.


API call responsiveness to the GPT-4 model varies hugely throughout the day. The #1 datapoint in measured responsiveness is slowdown associated with lunch-time use as noon sweeps around the globe.


Thank you for the response, I should have been clearer. I meant performance as an LLM. Essentially, I am concerned that they are quietly nerfing the tool. The Chat interface is now very verbose and constantly warning me about "we should always do this and that" which is bloody exasperating when I'm just trying to get things done.

I made up an example here to illustrate. It's just very annoying, because sometimes it puts the caveats at the beginning, slowing down my interaction, and it now refuses to obey my prompts to leave them out.

https://chat.openai.com/share/1f39af02-331d-4901-970f-2f4b0e...


Yeah, it's annoying and you have to foot the bill for it.

Looking at your sample and using character count as a rough proxy for tokens: (465/(1581-465))*100 ≈ 42%, so they added roughly 42% to your response's token cost just to include caveats you don't want. Fun!


I'm not sure what I expected now

    500 {'error': {'message': 'Request failed due to server shutdown', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Thu, 06 Jul 2023 20:48:07 GMT', 'Content-Type': 'application/json', 'Content-Length': '141', 'Connection': 'keep-alive', 'access-control-allow-origin': '*', 'openai-model': 'gpt-4-0613', 'openai-organization'


Nothing gets close to GPT 3.5 and 4 if you need to send and receive prompts in languages other than English.

Sadly, that niche is still better served by OpenAI.


Why is ChatGPT on the web still a 6-week-old version?


Maybe I am mistaken, but has the pricing on GPT-4 decreased as well? It used to be 8¢ per 1,000 tokens?


We need a proper, competitive, open source model. Otherwise we are all fucked.


We have witnessed the potential, and we can infer what such a model could do without the limitations placed on it. I think they made a mistake in giving us all a peek behind the curtain.


What's the best LLM I could run on a 12 GB VRAM GPU (Nvidia) ?


Does this mean we don't have to give OpenAI our phone number anymore?


Can I use the API more than 25 times in 3 hours?


Yes, that's an artificial limitation put in place so people don't abuse Plus.


So the chat interface is rate-limited, but not the API.

So I could build long prompt scripts and run them against the API?


What do you think the API is for?


I would like to try this but I’m not comfortable giving OpenAI my credit card details. Feels like they would be a prime target for hackers.


You told me I would get to use the GPT-4 API this month, but it's only July 6th.


When I opened the comment section, it said "404 Comments".

Now, I am not superstitious, but...


I really like the Swiss-style web design; it's well executed, with the scrolling.


F


This is awesome news. I have been waiting to get GPT4 forever!


This is very nice.

GPT-4 is on a completely different level from ChatGPT-3.5 in consistency and in actually listening to your system prompt. It trails off much more rarely.

If only it wasn't so slow/expensive... (it really starts to hurt with large token counts).


It's funny how OpenAI just shattered Google's PR stunts. Google wanted everyone to believe they were leading in AI by winning some children's games; everyone thought that was the peak of AI. Enter OpenAI and Microsoft, who have shown humanity what true AI looks like. Like most people on HN, I cannot wait to see the end of Google, the end of evil.


> Like most people on HN I cannot wait to see the end of Google, the end of evil.

What is the difference? Replacing evil with another evil.

This is just behemoths exchanging hands.


Is Microsoft less evil than Google?



