We are beginning to roll out new voice and image capabilities in ChatGPT (openai.com)
1149 points by ladino on Sept 25, 2023 | 877 comments



Voice has the potential to be awesome. This demo is really underwhelming to me because of the multi-second latency between the query and response, just like every other lame voice assistant. It doesn't have to be this way! I have a local demo using Llama 2 that responds in about half a second and it feels like talking to an actual person instead of like Siri or something.

I really should package it up so people can try it. The one problem that makes it a little unnatural is that determining when the user is done talking is tough. What's needed is a speech conversation turn-taking dataset and model; that's missing from off-the-shelf speech recognition systems. But it should be trivial for a company like OpenAI to build. That's what I'd work on right now if I were there, because truly natural voice conversations are going to unlock a whole new set of users and use cases for these models.
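For context, the usual baseline here is plain voice-activity detection with a silence timeout, which is exactly what falls short of natural turn-taking. A rough sketch of that baseline using webrtcvad (the frame size, aggressiveness, and 700 ms cutoff are illustrative guesses, not anyone's production settings):

```python
# Naive end-of-turn detection: assume the user is done after N ms of silence.
# A real turn-taking model would use prosody, semantics, etc., not just silence.
import webrtcvad

SAMPLE_RATE = 16000      # webrtcvad supports 8/16/32/48 kHz
FRAME_MS = 30            # webrtcvad accepts 10/20/30 ms frames
END_OF_TURN_MS = 700     # illustrative silence threshold, not a tuned value

def end_of_turn(frames):
    """frames: iterable of 30 ms chunks of 16-bit mono PCM. Yields True once
    enough trailing silence has accumulated to call the turn finished."""
    vad = webrtcvad.Vad(2)   # aggressiveness 0-3
    silence_ms = 0
    for frame in frames:
        if vad.is_speech(frame, SAMPLE_RATE):
            silence_ms = 0
        else:
            silence_ms += FRAME_MS
        yield silence_ms >= END_OF_TURN_MS
```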


Completely agree, latency is key for unlocking great voice experiences. Here's a quick demo I'm working on for voice ordering https://youtu.be/WfvLIEHwiyo

Total end-to-end latency is a few hundred milliseconds: starting from speech to text, to the LLM, then to a POS to validate the SKU (no hallucinations are possible!), and finally back to generated speech. The latency is starting to feel really natural. Building out a general system to achieve this low latency will, I think, end up being a big unlock for diverse applications.
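The SKU validation step is conceptually simple: whatever items the LLM proposes get checked against the POS catalog before anything is confirmed, so only real menu items can make it into the order. A toy sketch of that idea (the catalog contents and function names are made up, not a real integration):

```python
# Toy sketch: constrain the LLM's proposed order to real SKUs from the POS catalog.
MENU = {
    "latte_12oz": "12 oz latte",
    "latte_16oz": "16 oz latte",
    "fries_lg": "large fries",
}

def validate_order(proposed_skus: list[str]) -> tuple[list[str], list[str]]:
    """Split the LLM's proposed SKUs into confirmed items and unknowns to re-ask about."""
    confirmed = [sku for sku in proposed_skus if sku in MENU]
    unknown = [sku for sku in proposed_skus if sku not in MENU]
    return confirmed, unknown
```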


Since this is getting a bit of interest, here's one more demo of this https://youtu.be/cvKUa5JpRp4 This demo shows even lower latency, plus the ability to handle very large menus with lots of complicated sub-options (this restaurant has over a billion option combinations to order a coffee). The latency is negative in some places, meaning the system finishes predicting before I finish speaking.


Holy cow. That's better than the average human drive-through attendant.


This is cool. But I want to see how it handles you going back one and tweaking it.


We've built something similar that allows you to tweak/update notes & reminders: https://qwerki.com/ (private beta). Here's the video demo: https://www.youtube.com/shorts/2hpBTxjplIE We've since moved to training our own Llama as it's more responsive and we get better reliability.


This is pretty good. Do you think models running locally will be able to achieve comparable performance (getting the task done successfully) to cloud-based ones? I'm assuming that for the context of a drive-through scenario it should be OK, but more complex systems might need external information.


Definitely depends on the application, agreed. The more open-ended the application, the more dependent it is on larger LLMs (and other systems) that don't easily fit on the edge. At the same time, progress is happening that is increasing the size of LLM that can be run on the edge. I imagine we end up in a hybrid world for many applications, where local models take a first pass (and also handle speech transcription) and only small requests are made to big cloud-based models as needed.


Can you share the source code? What did you do to improve the latency?


Lots of work around speculative decoding, optimizing across the ASR->LLM->TTS interfaces, fine-tuning smaller models while maintaining accuracy (lots of investment here), good old-fashioned engineering around managing requests to the GPU, etc. We're considering commercializing this so I can't open source it just yet, but if we end up not selling it I'll definitely think about opening it up.


Can you at least share the stack that you're using in building this? What kind of business model are you considering in commercializing it?


We're designing the stack to be fairly flexible. It's Python/PyTorch under the hood, with the ability to plug and play various off-the-shelf models. For ASR we support GCP/AssemblyAI/etc., as well as a customized self-hosted version of Whisper that is tailored for stream processing. For the LLM we support fine-tuned GPT-3 models, fine-tuned Google text-bison models, or locally hosted fine-tuned Llama models (and a lot of the project goes into how to do the fine-tuning to ensure accuracy and low latency). For the TTS we support ElevenLabs/GCP/etc., and they all tie into the latency-reducing approaches.
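To make the plug-and-play idea concrete, the interfaces presumably look something like the sketch below; the class and method names are hypothetical illustrations, not the actual codebase:

```python
# Hypothetical sketch of a pluggable ASR -> LLM -> TTS pipeline with streaming at
# every stage. Class and method names are illustrative, not the real project.
from abc import ABC, abstractmethod
from typing import Iterator

class ASR(ABC):
    @abstractmethod
    def transcribe_stream(self, audio: Iterator[bytes]) -> Iterator[str]:
        """Emit partial transcripts as audio arrives (e.g. streaming Whisper, GCP, AssemblyAI)."""

class LLM(ABC):
    @abstractmethod
    def generate_stream(self, prompt: str) -> Iterator[str]:
        """Stream tokens so TTS can start before the full reply exists."""

class TTS(ABC):
    @abstractmethod
    def synthesize_stream(self, text: Iterator[str]) -> Iterator[bytes]:
        """Turn streamed text into audio chunks for immediate playback (e.g. ElevenLabs, GCP)."""

def run_turn(asr: ASR, llm: LLM, tts: TTS, audio_in, play_audio) -> None:
    # Each stage consumes the previous stage's stream, so the work overlaps
    # instead of waiting for a full transcript / full reply / full audio clip.
    transcript = "".join(asr.transcribe_stream(audio_in))
    tokens = llm.generate_stream(transcript)
    for chunk in tts.synthesize_stream(tokens):
        play_audio(chunk)
```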


Neat! I appreciate your approach to preventing hallucinations. I've used something similar in a different context. People make a big deal about hallucinations but I've found that validation is one of the easier aspects of AI architecture.


The voice does not seem to be able to pronounce the L in “else”. What’s happening there?


Good question. Off-the-shelf TTS systems tend to enunciate every phoneme more like a radio talk show host than a regular person, which I find a bit off-putting. I've been playing around with trying to get the voice to be more colloquial/casual. But I haven't gotten it to really sound natural yet.


This is a very slick demo. Nice job!


Thanks! It's a lot of fun building with these new models and recent AI approaches.


Wow, the latency on requests feels great!! I’m really curious: is this running entirely with Python?


100% Python but with a good deal of multiprocessing, speculative decoding, etc. As we move to production we can probably shave another 100ms off by moving over to a compiled system, but Python is great for rapid iteration.


Manna v0.7



That's way slick.

Can I ask what your background is, and what things you're used to working with? I don't have the chops to build what you built, but I'd love to get there.


My advice is always to jump in and start building! My background is math originally, so I had some of the tools in my tool box, but I'm mostly self-taught in computer science and machine learning. I read textbooks, research papers, code repos, but most importantly I build a lot of stuff. Once I'm excited about an idea I'll figure out how to become an expert to make it a reality. Over the years the skills start to compound, so it also helps that I'm an old man!


That demo is pretty slick. What happens when you go totally off book? Like, ask it to recite the digits of pi? Or if you become abusive? Will it call the cops?


It's trained to ignore everything else. That way background conversations are ignored as well (like your kids talking in the back of the car while you order).


How do you train for this?


Nice work, very cool!


> This demo is really underwhelming to me because of the multi-second latency between the query and response, just like every other lame voice assistant.

Yep - it needs to be ready as soon as I'm done talking and I need to be able to interrupt it. If those things can be done then it can also start tentatively talking if I pause and immediately stop if I continue.

I don't want to have to think about how to structure the interaction in terms of explicit call/response chain, nor do I want to have to be super careful to always be talking until I've finished my thought to prevent it from doing its thing at the wrong time.


The interruption is an important point, yeah. It's so annoying when Siri misunderstands again and starts rattling off a whole host of options, and keeps getting stuck in a loop if you don't respond.

In fact I'm really surprised these assistants are still as crap as they are. Totally scripted, zero AI. It seems like low-hanging fruit to implement an LLM, but none of the big three have done so. Not even sure about the fringe ones like Cortana and Bixby.


I mean Microsoft is planning to. Rolling out as soon as tomorrow.

https://youtu.be/5rEZGSFgZVY


Windows 11 Copilot is not really the same thing though. They don't do something like HomePods that you can have around your house.


The CallAnnie demo allows interruption, and it's such a leap forward compared to Siri.


Yeah when I was developing it, it quickly became apparent that I needed to be able to interrupt it. So I implemented that. Pretty easy to implement actually. Much harder would be to have the model interrupt the human. But I think it is actually desirable for natural conversation, so I do think a turn-taking model should be able to signal the LLM to interrupt the human.


> determining when the user is done talking is tough.

Sometimes that task is tough for the speaker too, not just the listener. Courteous interruptions or the lack thereof might be a shibboleth for determining when we are speaking to an AI.


Yes interruptions are key, both ways. Having the user interrupt the bot is easy, but to have the bot interrupt the human will again require a model to predict when that should happen. But I do believe it is desirable for natural conversation.


From prior experience, courteous interruption is a skill that a lot of humans find challenging at times too (myself included).


Can you share a github link to this? Where are you reducing the latency? Are you processing the raw audio to text? In my experience ChatGPT generation time is much faster than local Llama unless you're using something potato like a 7B model.


Unfortunately it has a really high "works on my machine" factor. I'm using Llama2-chat-13B via mlc-llm + whisper-streaming + coqui TTS. I just have a bunch of hardcoded paths and these projects tend to be a real pain to set up, so figuring out a nice way to package it up with its dependencies in a portable way is the hard part.

I'm mostly using llama2 because I wanted it to work entirely offline, not because it's necessarily faster, although it is quite fast with mlc-llm. Calling out to GPT-4 is something I'd like to add. I think the right thing is actually to have the local model generate the first few words (even filler words sometimes maybe) and then switch to the GPT-4 answer whenever it comes back.


Here's a link to a project that claims half second latency for the transcription part: https://github.com/gaborvecsei/whisper-live-transcription


I wonder when computers will start taking our intonation into account too. That would really help with understanding the end of a phrase. And there's SO MUCH information in intonation that doesn't exist in pure text. Any AI that doesn't understand that part of language will always still be kinda dumb, however clever it is.


You're right. Ultimately the only way this will really work is as an end-to-end model. Text will only get you so far. We could approximate it now with screenplay-like emotion annotations on text, which LLMs should both easily understand and be able to produce themselves (though you'd have to train a new speech recognition system to produce them). But end-to-end will be required eventually to reach human level fluency.


Don’t they do it already? There are a lot of languages where intonation is absolutely necessary to distinguish between some words, so I would be surprised that this not already taken into account by the major voice assistants.


In English, intonation changes the meaning of the word but not the word itself. From what I understand, in tonal languages tone changes the whole word. I don't think ML understands that difference yet.


Yeah they do. I was able to get ChatGPT-4 to transcribe 我哥哥高過他的哥哥, which says that they can. I did have to set the app to Chinese, and the original didn't work so I had to modify what I said slightly.

https://www.tiktok.com/t/ZT86psPxY/

Roughly translated, my older brother is taller than that other guy's older brother.


Of course speech recognition works for Chinese. What it doesn't do is transcribe intonation and prosody in non-tonal languages. It's not even clear how one would transcribe such a thing as I'm not aware of a standard notation.


IPA format should cover that, no?


Maybe? I thought IPA was just phonetic but I see that it does have some optional prosody stuff that could in theory cover some of it. I'm not sure how standard or complete it really is in practice.

I haven't heard of any large datasets of IPA transcripts of speech with the detail necessary to train a fully realistic STT->LLM->TTS system. If you know of some that would be interesting to look at.


Also curious to hear about your setup. Using whisper too? When I was experimenting with it there was still a lot of annoyance about hallucinations and I was hard coding some "if last phrase is 'thanks for watching', ignore last phrase"

I was just googling a bit to see what's out there now for whisper/llama combos and came across this: https://github.com/yacineMTB/talk

There's a demo linked on the github page that seems relatively fast at responding conversationally, but still maybe 1-2 seconds at times. Impressive it's entirely offline.


Lol yeah the hallucinations are a huge problem. Likely solvable, I think there are probably some bugs in various whisper implementations that are making the problem worse than it should be. I haven't really dug in on that yet though. I was hoping I could switch to a different STT model more designed for real time like Meta's SeamlessM4T but it's still under a non-commercial license and I did have an idea that I might want to try making a product sometime. I did see that yacine made that version but I haven't tried it so I don't know how it compares to mine.


Turn the volume on your microphone down and watch as Whisper just starts SCREAMING.


> It doesn't have to be this way!

Is there any extra work OpenAI's product might be doing that contributes to this latency and that yours isn't, considering the scale they operate at and any reputational risks to their brand?


If you're suggesting that OpenAI's morality filters are responsible for a significant part of their voice response latency, then no. I think that's unlikely to be a relevant factor.


Here's something with very little latency. https://www.bland.ai/


There needs to be an optional button that you hold while speaking and let go of when you are done. If the button is not held, it should auto-detect.


To me this is the cleanest and most efficient solution to the problem.

Tbh, ever since voice assistants landed I’ve wanted a handheld mic with a hardware button. No wake command, no (extra) surveillance, just snappy low-latency responses.


Do you have a rough design outline of what you built? I feel like we're on the cusp of something like this and it sounds amazing.


I'm using Llama2-chat-13B via mlc-llm @ 4bit quantization + whisper-streaming + coqui TTS, all running simultaneously on one 4090 in real time.

It didn't take long to prototype. Polishing and shipping it to non-expert users would take much longer than I've spent on it so far. I'd have to test for and solve a ton of installation problems, find better workarounds for whisper-streaming's hallucination issues, improve the heuristics for controlling when to start and stop talking, tweak the prompts to improve the suitability of the LLM responses for speech, fixup the LLM context when the LLM's speech is interrupted, probably port the whole thing to Windows for broader reach in the installed base of 4090s, possibly introduce a low-memory mode that can support 12GB GPUs that are much more common, document the requirements and installation process, and figure out hosting for the ginormous download it would be. I'd estimate at least 10x the effort I've spent so far on the prototype before I'd really be satisfied with the result.

I'd honestly love to do all that work. I've been prioritizing other projects because I judged that it was so obvious as a next step that someone else was probably working on the same thing with a lot more resources and would release before I could finish as a solo dev. But maybe I'm wrong...


> It didn't take long to prototype. Polishing and shipping it to non-expert users would take much longer than I've spent on it so far. I'd have to test for and solve a ton of installation problems

I've found some success at this by using Nix... but Nix is a whole 'nother ball of yarn to learn. It WILL get you to declarative/deterministic installs of any piece of the toolchain it covers, though, and it does a hell of a lot better job managing dependencies than anything in Python's ecosystem ever will (in fact, I am pretty sure that Python's being terrible at this is actually driving Nix adoption)

As an example of the power Nix might enable, check out https://nixified.ai/ (which is a project that hasn't been updated in some months and I hope is not dead... It does have some forks on Github, though). Assuming you already have Nix installed, you can get an entire ML toolchain up including a web frontend with a single command. I have dozens of projects on my work laptop, all with their own flake.nix files, all using their own versions of dependencies (which automatically get put on the PATH thanks to direnv), nothing collides with anything else, everything is independently updateable. I'm actually the director of engineering at a small startup and having our team's dev environments all controlled via Nix has been a godsend already (as in, a massive timesaver).

I do think that you could walk a live demo of this into, say, McDonald's corporate, and walk out with a very large check and a contract to hire a team to look into building it out into a product, though. (If you're going to look at chains, I'd suggest Wawa first though, as they seem to embrace new ordering tech earlier than other chains.)


I'm not the guy working on ordering, it's this guy https://news.ycombinator.com/user?id=TheEzEzz.

Nix sounds good for duplicating my setup on other machines I control. But I'd like a way to install it on user machines, users who probably don't want to install Nix just for my thing. Nix probably doesn't have a way to make self contained packages, right?


> But I'd like a way to install it on user machines, users who probably don't want to install Nix just for my thing. Nix probably doesn't have a way to make self contained packages, right?

I mean... That's the heart of the problem right there. You can either have all statically compiled binaries (which don't need Nix to run) which have no outside dependencies but result in a ton of wasted disk space with duplicate dependency data everywhere, or you can share dependencies via a scheme, of which the only one that makes real sense (because it creates real isolation between projects but also lets you share equal dependencies with zero conflicts) is Nix's (all of the others have flaws and nondeterminism).


I wish docker could be used more easily with graphic cards and other hardware peripherals (speakers/mic in this case). It would solve a lot of these issues.


all it has to do is add a random selection of "uhms" and "ahhs" and "mmm"


Actually I do think this is a good idea. For best latency there should be multiple LLMs involved, a fast one to generate the first few words and then GPT-4 or similar for the rest of the response. In the case that the fast model is unsure, it could absolutely generate filler words while it waits for the big model to return the actual answer. I guess that's pretty much how humans use filler words too!
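A minimal sketch of that "fast model opens, big model finishes" pattern with asyncio; the model callables and the speak() output are placeholders you'd wire in yourself, and the 300 ms budget is just an illustrative number:

```python
# Sketch: let a small local model (or a filler phrase) open the response while the
# bigger, slower model finishes in the background.
import asyncio

FILLER = "Hmm, let me check..."

async def respond(user_text, fast_llm, big_llm, speak):
    """fast_llm/big_llm are async callables returning text; speak is an async TTS
    callable. All three are placeholder hooks, not any particular library."""
    big = asyncio.create_task(big_llm(user_text))      # e.g. GPT-4 over the network
    fast = asyncio.create_task(fast_llm(user_text))    # small local model

    # Say whatever the fast model manages within ~300 ms, otherwise a filler word.
    try:
        opener = await asyncio.wait_for(fast, timeout=0.3)
    except asyncio.TimeoutError:
        opener = FILLER
    await speak(opener)

    # Then hand the turn over to the big model's answer once it arrives.
    await speak(await big)
```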


Unfortunately, Bark is probably way too slow to use for the TTS portion given the latency concerns or that would be covered.


Okay the bike example is cute and impressive, but the human interaction seems to be obfuscating the potentially bigger application.

With a few tweaks this is a general-purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but it is one of the hard problems solved.

Will we be seeing general-purpose robots performing simple labor powered by ChatGPT within the next half decade?


That bike example seemed a mix of underwhelming (for being the demo video) and even confusing.

1. It's not smart enough to recognize from the initial image this is a bolt style seat lock (which a human can).

2. The manual is not shown to the viewer, so I can't infer how the model knows this is a 4mm bolt (or if it is just guessing given that's the most likely one).

3. I don't understand how it can know the toolbox is using metric allen wrenches.

Additionally, is this just the same vision model that exists in Bing Chat?


The bike shown in the first image is Specialized Sirrus X. You can make out from the image of the manual that it says "spacer/axle/bolt specifications". Searching for this yields the following Specialized bike manual which is similar: https://www.manualslib.com/manual/1974494/Specialized-Epic-E... -- there are some notable differences, but the Specialized Sirrus X manuals that are online aren't in the same style.

The prior page (8) shows "SEAT COLLAR 4mm HEX" and, based on looking up seat collar in an image search, the part in question matches.

In terms of the toolbox, note that it only identified the location of the Allen wrench set. The advice was just "Within that set, find the 4 mm Allen (Hex) key". Had they replied with "I don't see any sizes in mm", the conversation could've continued with "Your Allen keys might be using SAE sizing. A compatible size will be 5/32, do you see that in your set?"


It bugged me that they made no mention of torque. The manual is really clear on that part with a big warning:

> WARNING! Correct tightening force on fasteners (nuts, bolts, screws) on your bicycle is important for your safety. If too little force is applied, the fastener may not hold securely. If too much force is applied, the fastener can strip threads, stretch, deform or break. Either way, incorrect tightening force can result in component failure, which can cause you to lose control and fall. Where indicated, ensure that each bolt is torqued to specification. The following is a summary of torque specifications in this manual...

The seat collar also probably has the max torque printed on it.

When they asked if they had the right tool, I would have preferred to see an answer along the lines of "ideally you should be using a torque wrench. You can use the wrench you have currently, but be careful not to over tighten."


> The seat collar also probably has the max torque printed on it.

Nope. There's no need for a torque wrench on that one.


Ah, good find. Yeah, I tried Bing and it is able to read a photo of that manual page and understand that the seat collar takes a 4mm hex wrench (though it hallucinated and told me the torque was 5 Nm, unlike the correct 6.2, suggesting table reading is imperfect).

Toolbox: I just found it too strong to claim you have the right tool, when it really doesn't know that. :)

In the end it does feel like the image reader is just bolted onto an LLM. Basically, just doing object recognition and dumping features into the LLM prompt.


Like a basic CLIP description: Tools, yellow toolbox, DEWALT, Allen wrenches, instruction manual. And then just using those keywords in the prompt. Yes, you’re right, it does feel like that.


A few of these wouldn't be possible with something like that. Look at the last picture, the graph analysis.

https://imgur.com/a/iOYTmt0


Yep. This example basically convinced me that they were unable to figure out anything actually useful to do with the model's new capabilities. Which makes me wonder how capable the new model in fact is.


Yah, pretty sure it is the same feature that's been in Bing Chat for 2 months now. Which really feels like there's only one pass of feature extraction from the image, preventing any detailed analysis beyond a coarse "what do you see". (Follow-up questions about things it likely didn't parse are highly hallucinated).

This is why they can't extract the seat post information directly from the bike when the user asks. There's no "going back and looking at the image".

Edit: nope, it's a better image analyzer than Bing


>Yah, pretty sure it is the same feature that's been in Bing Chat for 2 months now.

It's not. Feel free to try these queries:

https://twitter.com/ComicSociety/status/1698694653845848544?... (comic book page in particular, from a be my eyes user)

Or these https://imgur.com/a/iOYTmt0 (graph analysis in particular, last example) and see Bing fail them.


Right. It appeared that the response to the first image and question would have been the same if the image wasn't provided.

I wasn't impressed with the demo but we'll see what real world results get.



They are really good at keeping demos as demos


I don’t know, a lot of Google demos and papers introduce techniques that are productized fairly soon, just usually not by Google.


The implementation that manifests itself as an extremely creepy, downright concerning level of dubious moral transgressions isn't nearly as publicly glamorous as their tech demos.


It’s just a hiring article.


Hiring to produce more demos, to hire more to produce even more demos...


As long as they choose not to work elsewhere, it seems effective for Google.


Yes. As long as the hirees do some actual work in between producing demos, this even makes sense as a hiring approach.


I feel they could have used a more convincing example to be honest. Yeah it's cool it recognises so much but how useful is the demo in reality?

You have someone with a tool box and a manual (seriously who has a manual for their bike), asking the most basic question on how to lower a seatpost. My 5 year old kid knows how to do that.

Surely there's a better way to demonstrate the ground breaking impacts of ai on humanity than this. I dunno, something like how do I tie my shoelace.


> With a few tweaks this is a general purpose solver for robotics planning.

Yeah, but with an enormous ecological footprint.

Also, not suitable for small lightweight robots like drones.


Even on something the size of a car chatgpt won't be running locally, the car and drone are equally capable of hitting openai's API in a well connected environment.

What needs to happen with the response is a different matter though.


What's the ecological footprint of a human doing the same job? Especially when you factor in 18+ years of preparing.


Humans don't spend 18+ years preparing how to lower a seat post or drive a truck or even do pretty much most jobs. No one is solely training for 18 years to do anything.

Most of those 18 years are having a fucking great time (being young is freakin awesome) and living a great life is never a waste or a negative ecological footprint.

Society artificially slows education down so it takes 18 years to finish school because parents need to be off at work, so 18 years of baby sitting is preferred. By 18, kids are at the age where they will no longer be told what to do so it's off to the next waste of time, college, then 30 years of staring at a blinking box...or whatever.

When I was 12, I decided I wanted to drive a car, I'd never driven a car in my life, but I took my parents car and drove it around wherever I liked with absolutely no issue or prior instruction. I did this for years.

The youth are very capable, we just don't want them to be too capable...


This is what I'm most excited about. There's been a minor breakthrough recently: https://pressroom.toyota.com/toyota-research-institute-unvei...


There are already a few research demos.

For driving - https://wayve.ai/thinking/lingo-natural-language-autonomous-...


This announcement seems to have killed so many startups that were trying to do multi-modal on top of ChatGPT. The way it's progressing at solving use cases with images and voice, it's not far off from being the 'one app to rule them all'.

I can already see an "Alexa/Siri/Google Home" replacement and a "Google Image Search" replacement; ed-tech startups that were solving problems with AI by taking a photo are also doomed, and more will follow.


In retrospect, such startups should have been wary: they should have known that OpenAI had Whisper, and also that GPT-4 was designed with image modality. I wouldn't say that OpenAI "telegraphed" their intentions, but the very first strategic question should have been, "Why isn't OpenAI doing this already, and what do we do if they decide to start?"


>I wouldn't say that OpenAI "telegraphed" their intentions

They did telegraph it, they showed the multimodal capabilities back in the GPT4 Developer Livestream[0] right before first releasing it.

0. https://youtu.be/outcGtbnMuQ?t=943


Yeah I remember watching that and thinking, oh, I know a cool app idea: what if you just take a video of what food is in your kitchen and ChatGPT will create a recipe for you? I went to the docs and that was literally the example they gave.

I think the only place where plugins will make sense is for real-time things like booking travel or searching for sports/stock market/etc. type information.


I have a home-spun version of ChatGPT that uses function calling to connect to my emails, calendar, and notes. This is really useful because I can say "Bob just emailed me to set up a call. Respond to Bob with some available times from my calendar."

That will be the real use case for plug-ins.
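For anyone curious, the shape of this with OpenAI's function calling looks roughly like the sketch below (openai Python SDK 0.x style); the get_free_slots schema and helper are hypothetical stand-ins, not the parent's actual setup:

```python
# Rough sketch of the function-calling pattern described above.
import json
import openai  # assumes OPENAI_API_KEY is set in the environment

FUNCTIONS = [{
    "name": "get_free_slots",
    "description": "Return free time slots from my calendar for a given day",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO date, e.g. 2023-09-26"},
        },
        "required": ["date"],
    },
}]

def get_free_slots(date: str) -> list[str]:
    # Hypothetical stand-in: a real version would query your calendar API here.
    return ["10:00-10:30", "14:00-15:00"]

def ask(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    reply = openai.ChatCompletion.create(
        model="gpt-4", messages=messages, functions=FUNCTIONS
    )["choices"][0]["message"]

    # If the model chose to call the tool, run it and feed the result back.
    if reply.get("function_call"):
        args = json.loads(reply["function_call"]["arguments"])
        result = get_free_slots(**args)
        messages += [reply, {"role": "function", "name": "get_free_slots",
                             "content": json.dumps(result)}]
        reply = openai.ChatCompletion.create(
            model="gpt-4", messages=messages
        )["choices"][0]["message"]
    return reply["content"]
```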


It would be hard to be more explicit than doing a demo of multi-modality in GPT-4, and having an audio API that is amazing and that you can use right now, for pennies.

It would be interesting to know if this really changed anything for anyone (competitors, VCs) for that reason. It's like the efficient market hypothesis applied to product roadmaps.


It is interesting that these startups did not recognize that the image modalities already existed, as evidenced by their initial GPT-4 announcement underneath “visual capabilities” [1].

[1] https://openai.com/research/gpt-4


Seems nobody learns from Sherlock.


Talking to Google and Siri has been positively frustrating this year. On long solo drives, I just want to have a conversation to learn about random things. I've been itching to "talk" to chatGPT and learn more (french | music theory | history | math | whatever) all summer. This should hit the spot!


Voice assistants have always been a half-complete product. They were shown off as a cool feature, but then never integrated in a way that made them useful.

The two biggest features I want are for the voice assistants to read something for me, and to do something on Google/Apple Maps hands-free. Neither of these ever works. "Siri / OK Google, add the next gas station on the route" or "take me to the Chinese restaurant in Hoboken" seem like very obvious features for a voice assistant with a map program.

The other is why can I tell Siri to bring up the Wikipedia page for George Washington but I can’t have Siri read it to me? I am in the car, they know that, they just say “I can’t show you that while you’re driving”. The response should be “do you want me to read it to you?”


In the current world:

Me: “OK Google, take me to the Chinese restaurant in Hoboken”

Google Assistant: “Calling Jessica Hobkin”.


You forgot the third brand name.

The pattern for current world's voice assistants is: ${brand 1}, ${action} ${brand 2} ${joiner} ${brand 3}.

So, "OK Google, take me to Chinese restaurant in Hoboken using Google Maps".

Which is why I refuse to use this technology until the world gets its shit together.


This reminds me of ordering at a drive through with a human at times:

"I'd like an iced tea" "An icee?" "No an iced tea" "Hi-C?"


> ok google add the next gas station on the route

I say "ok google, add a stop for gas" a lot, and it works well for me.


Sometimes google assistant will answer a query I thought for sure it would fail on with a really good answer and other times it will fail the most basic of commands. It's frustrating.


I've replaced my voice google assistant searches with the voice feature of the Bing app. It's a night and day difference. Bing voice is what I always expected from an AI companion of the future, it is just lacking commands -- setting tasks, home automation, etc.


I got sick of searching Google for in-game recipes for Disney Dreamlight because most of the results are a bunch of pointless text, and then finally the recipe hidden in it somewhere.

I used Bing yesterday and it was able to parse out exactly what I wanted, and then give me idiot-proof steps to making the recipe in-game. (I didn't need the steps, but it gave me what I wanted up front, easily.) I tried it twice and it was awesome both times. I'll definitely be using it in the future.


It almost sounds like their assistant and their search engine have the same problem! Years of SEO optimized garbage has polluted search and the data streams it feeds to their other products. I have a concern that soon the mess will turn into AI-optimized trash, with what is essentially data poisoning to get the AI to shovel the fake content instead.


> I got sick of searching Google for in-game recipes for Disney Dreamlight

You mean these? Took me a few seconds to find, not sure how an LLM would make that easier. I guess the biggest benefit of LLM then is for people who don't know how to find stuff.

https://dreamlightvalleywiki.com/Cooking


Yes, but each time, I only actually care about 1 recipe, and it's easier to just search for that recipe than find a list of recipes and then search through that.

Bing made it even easier.

Also, I've found some of those lists to be missing some recipes.


Did you find a way to do this seamlessly including being able to say something like "Hey Bing", or do you just have a shortcut or widget for this?


No. At least on Android there is no system shortcut that takes you directly to the voice feature yet. For now, I'm using the widget.


Thanks. One day soon, hopefully.


Precisely this. Once someone figures out how to get something like GPT integrated with actual products like smart home devices, with the same access levels as Siri/Google Assistant, it will be the true voice assistant experience everyone has wanted.


My prediction on this is eventually the LLMs will just write and execute scripts directly to control things.

Imagine if iOS had something like AppleScript and all apps exposed and documented their endpoints. LLMs would be able to trivially solve problems that the best voice assistants today cannot handle.

Then again none of the current assistants can handle all that much. "Send Alex P a meeting invite tomorrow for a playdate at the Zoo, he's from out of town so include the Zoo's full address in the invite".

"Find the next mutual free slot on the team's calendar and send out an invite for a zoom meeting at that time".

These are all things that voice assistants should have been doing a decade ago, but I presume they'd have required too much one off investment.

Give an LLM proper API access and train it on some example code, and these problems are easy for it to solve. Heck I bet if you do enough specialized training you could get one of the tiny simple LLMs to do it.


OpenInterpreter


Agreed. After using ChatGPT at all Siri is absolutely frustrating.

Example from a couple days ago:

Me, in the shower so not able to type: "Hey Siri, add 1.5 inch brad nails to my latest shopping list note."

Siri: "Sorry, I can't help with that."

... Really, Siri? You can't do something as simple as add a line to a note in the first-party Apple Notes app?


That’s extra frustrating because Siri absolutely had that functionality at some point in the past, and may even still have it if you say the right incantation. Those incantations change in unpredictable and unknowable ways though.


appending to a text file, what do you think this is - unix?


I still don't understand how you can talk to something that doesn't provide factual information and just take it at face value?

The other day I asked it about the place I live and it made up nonsense, I was trying to get it to help me with an essay and it was just wrong, it was telling me things about this region that weren't real.

Do we just drive through a town, ask for a made up history about it and just be satisfied with whatever is provided?


What LLMs have made me realize more than anything is that we just don't care that much about the information we receive being completely factual.

I have tried to use it many times to learn a topic, and my experience has been that it is either frustratingly vague or incorrect.

It's not a tool that I can completely add to my workflow until it is reliable, but I seem to be the odd one out.


> What LLMs have made me realize more than anything is that we just don't care that much the information we receive being completely factual.

I find this highly concerning but I feel similar.

Even "smart people" I work with seem to have gulped down the LLM Kool-Aid because it's convenient and it's "cool".

Sometimes I honestly think: "just surrender to it all, believe in all the machine tells you unquestionably, forget the fact checking, it feels good to be ignorant... it will be fine...".

I just can't do it though.


> just surrender to it all, believe in all the machine tells you unquestionably, forget the fact checking, it feels good to be ignorant... it will be fine...

It's the same issue with Google Search, any web page, or, heck, any book. Fact checking gets you only so far. You need critical thinking. It's okay to "learn" wrong facts from time to time as long as you are willing to be critical and throw the ideas away if they turn out to be wrong. I think this Popperian view is much more useful than living with the idea that you can only accept information that is provably true. Life is too short to verify every fact. Most things outside programming are not even verifiable anyway. By the time that Steve Jobs would have "verified" that the iPhone was certainly a good idea to pursue, Apple might have been bankrupt. Or in the old days, by the time you have verified that there is a tiger in the bush, it has already eaten you.


There's a lot of truth in this comment and a lot that I wholeheartedly agree with.

When I spend time on something that turns out to be incorrect, I would prefer it to be because of choice I made instead of some random choice made by an LLM. Maybe the author is someone I'm interested in, maybe there's value in understanding other sides of the issue, etc. When I learn something erroneous from an LLM, all I know is that the LLM told me.


The issue is far more serious with ChatGPT/similar models because things that are laughably untrue are delivered exactly the same as something that's solidly true. When doing a normal search I can make some assessment on the quality of the source and the likelihood the source is wrong.

People should be able to "throw the ideas away if they turn out to be wrong", but the problem is these ideas, consciously or not, help build your model of the world. Once you find out something isn't true it's hard to unpick your mental model of the world.


> Once you find out something isn't true it's hard to unpick your mental model of the world.

Intuitively, I would think the same, but a book about education research that I read and my own experience taught me that new information is surprisingly easy to unlearn. It's probably because new information sits at the edges of your neural networks and does not yet provide a foundation for other knowledge. This will only happen if the knowledge stands the test of time (which is exactly how it should be according to Popper). If a counterexample is found, then the information can easily be discarded since it's not foundational anyway, and the brain learns the counterexample too (the brain is very good at remembering surprising things).


That presumes the wrong information is corrected quickly. What about the cases when that doesn't happen? Aren't you often finding out things you thought were true from years ago are wrong?


You weigh new information by how confident you are in it. You try to check different sources, you maintain an open-mind, etc. In that, ChatGPT is just an additional low-reliability source of information.


I just verify the information I need. I find it useful as a sort of search engine for solutions. Like, how could I use generators as hierarchical state machines? Are there other approaches that would work? What are some issues with these solutions? Etc. By the end I have enough information to begin searching the web for comparisons, other solutions, and so on.

The benefit is that I got a quick look at various solutions and quickly satisfied a curiosity, and decided if I’m interested in the concept or not. Without AI, I might just leave the idea alone or spend too much time figuring it out. Or perhaps never quite figure out the terms of what I’m trying to discover, as it’s good at connecting dots when you have an idea with some missing pieces.

I wouldn’t use it for a conversation about things as others are describing. I need a way to verify its output at any time. I find that idea bizarre. Just chatting with a hallucinating machine. Yet I still find it useful as a sort of “idea machine”.


I think this is a fine use case though because you're doing your due diligence. The problems arise when you don't do this.

I think even if an AGI were created, and humans survived this event, I'd still have trouble trusting it.

The quote "trust but verify" is everything to me.


The smart people I've seen using ChatGPT always double check the facts it gives. However, the truth is that RLHF works well to extinguish these lies over time. As more people use the platform and give feedback, the thing gets better. And now, I find it to be pretty darn accurate.


> The smart people I've seen using ChatGPT always double check the facts it gives.

I don't like being told lies in the first place and having to unlearn it.

It doesn't help that I might as well have just gone straight to the "verification" instead.


I don't know. The other day I was asking about a biology topic and it straight up gave me a self-contradicting chemical reaction process description. It kept doing that after I pointed out the contradiction. Eventually I got out of this hallucination loop by resetting the conversation and asking again.

It's smart but can also be very dumb.


I see this conversation pretty frequently and I think the root of it lies in the fact that we have mental heuristics for determining whether we need to fact check another human because they are a bullshitter, an idiot, a charlatan etc, but most people haven’t really developed this sense for AIs.

I think the current state of AI trustworthiness (“very impressive and often accurate but occasionally extremely wrong”) triggers similar mental pathways to interacting with a true sociopath or pathological liar for the first time in real life, which can be intensely disorienting and cause one to question their trust in everyone else, as they try to comprehend this type of person.


Ignorance is Strength


I think this post-factual attitude is stronger and more common in some cultures than others. I'm afraid to say but given my extensive travels it appears American culture (and its derivatives in other countries) seems to be spearheading this shift.


Warning, my opinion ahead:

I think it's because Americans, more than nearly all other cultures, love convenience. It's why the love for driving is so strong in the US. Don't walk or ride, drive.

Once I was walking back from the grocer in Florida with 4 shopping bags, and people pulled over and asked if my car had broken down and if I needed a ride, people were stunned...I was walking for exercise and for the environment...and I was stunned.

More evidence of this trend can be seen in the products and marketing being produced:

Do you need to write a wedding speech? Click here.

Do you need to go get something from the store? get your fat ass in the car and drive, better yet, get a car that drives for you? Better than this, we'll deliver it with a drone...don't move a muscle.

Don't want to do your homework? Here...

Want to produce art? Please enter your prompt...

Want to lose weight? We have a drug for that...

Want to be the authority on some topic? We'll generate the facts you need.


I've also identified convenience as a core factor. Another dynamic at play is this:

As convenience in a domain becomes ubiquitous or at least expected among consumers, they quickly readjust their evaluation of "having time for X" around the new expectation of the convenient service, treating all alternatives as positive opportunity cost. This would explain a lot of those folks who are upset when it's suggested that they don't need Amazon, Instacart, etc. in their lives if they are to do something about their contributions to mass labor exploitation.

Of course these conveniences quickly become ubiquitous in large economies with a glut of disposable income, which encourages VCs to dump money into these enterprises so they're first to market, and also to encourage the public to believe that the future is already here and there's no reason to worry about backsliding or sustainability of the business model. Yet in every single case we see prices eventually rise, laborers squeezed, etc. A critical mass of people haven't yet acknowledged this inevitability, in no small part due to this fixation on convenience at the expense of more objective, reasoned understandings (read: post-truth mindset).


I agree with this, but I think there is a deeper level which explains this. And that is convenience is a product. The thing that truly defines how corporations in America have shaped our culture is that everything is turned into a way to sell you something.


Sorry but this is actually what I meant, it's all about convenience, AI is another convenience product.


> It's not a tool that I can completely add to my workflow until it is reliable, but I seem to be the odd one out.

This. I hate being told the wrong information because I will have to unlearn the wrong information. I would rather have been told nothing.


ChatGPT 3.5 is terrible on technical subjects IME. Phind is best for me rn. Hugging Chat (Llama) works quite well too.

They're only good on universal truths. An amalgam of laws from around the globe doesn't tell me what the law is in my country, for example.


This is a fairly perpetual discussion, but I'll go for another round:

I feel like using LLM today is like using search 15 years ago - you get a feel for getting results you want.

I'd never use chatGPT for anything that's even remotely obscure, controversial, or niche.

But through all my double-checking, I've had a phenomenal success rate in getting useful, readable, valid responses to well-covered / documented topics such as introductory French, introductory music theory, and well-covered & non-controversial history and science.

I'd love to see the example you experienced; if I ask chatGPT "tell me about Toronto, Canada", my expectation would be to get high accuracy. If I asked it "Was Hum, Croatia, part of the Istrian liberation movement in the seventies", I'd have far less confidence - it's a leading question, on a less covered topic, introducing inaccuracies in the prompt.

My point is - for a 3 hour drive to cottage, I'm OK with something that's only 95% accurate on easy topics! I'd get no better from my spouse or best friend if they made it on the same drive :). My life will not depend on it, I'll have an educationally good time and miles will pass faster :).

(also, these conversations always seem to end in suffocatingly self-righteous "I don't know how others can live in this post-fact free world of ignorance", but that has a LOT of assumptions and, ironically, non-factual bias in it as well)


> I feel like using LLM today is like using search 15 years ago - you get a feel for getting results you want.

I don't think it's quite the same.

With search results, aka web sites, you can compare between them and get a "majority opinion" if you have doubts - it doesn't guarantee correctness but it does improve the odds.

Some sites are also more reputable and reliable than others - e.g. if the information is from Reuters, a university's courseware, official government agencies, ... etc. it's probably correct.

With LLMs you get one answer and that's it - although some like Bard provide alternate drafts but they are all from the same source and can all be hallucinations ...


>although some like Bard provide alternate drafts but they are all from the same source and can all be hallucinations ...

Yes and no. If the LLM is repeating the same thing on multiple drafts then it's very unlikely to be a hallucination.

It's when multiple generations are all saying different things that you need to take notice.

LLMs hallucinate yes but getting the same hallucination multiple times is incredibly rare.


Then why aren’t hallucinations being eliminated by comparing drafts?


Automatically comparing drafts for every single query would be expensive.

And that wouldn't eliminate hallucinations, just tell you whether large details have likely been hallucinated.

But it's a method some research has used.

https://arxiv.org/abs/2303.08896
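The gist of that paper is "sample the model several times and measure agreement". A toy version might look like the sketch below; sample_llm is a placeholder, and the paper uses much stronger similarity measures (BERTScore, QA, NLI) than plain string matching:

```python
# Toy sketch of a sampling-based consistency check for hallucination detection.
from difflib import SequenceMatcher

def consistency_score(prompt: str, sample_llm, n_samples: int = 5) -> float:
    """Higher score = the samples agree with each other = less likely hallucinated.
    sample_llm is any callable that returns one stochastic completion per call."""
    main = sample_llm(prompt)
    samples = [sample_llm(prompt) for _ in range(n_samples)]
    # Cheap stand-in for the paper's similarity measures, just to show the shape.
    sims = [SequenceMatcher(None, main, s).ratio() for s in samples]
    return sum(sims) / len(sims)
```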


How expensive could it be? Google Bard, a free service, offers the drafts for free. Just do the comparison on the user’s machine if the LLM provider is that cheap.

P.S. Also aren’t LLMs deterministic if you set their “temperature” to zero? Are there drafts if the temperature is zero? If not, then that’s the same as removing the randomness no?


The drafts have to be evaluated either by a human or an LLM. Doing that for every request does not scale when you have millions of users.

>Just do the comparison on the user’s machine if the LLM provider is that cheap.

This is not possible. Users don't have the resources to run these gigantic models. LLM inference is not cheap. OpenAI and Google aren't running a profit on free ChatGPT or Bard.

>P.S. Also aren’t LLMs deterministic if you set their “temperature” to zero? Are there drafts if the temperature is zero? If not, then that’s the same as removing the randomness no?

It's not a problem of randomness. A temp of 0 doesn't reduce hallucinations. LLMs internally know when they are hallucinating/taking a wild guess. Randomness influences how that guess manifests each time, but the decision to guess was already made.

https://arxiv.org/abs/2304.13734


> a temp of 0 doesn't reduce hallucinations.

I never said it did.

> LLMs internally know when they are hallucinating/taking a wild guess.

No they don’t. If they did we would be able to program them to not do so.

I would argue that wild guesses are all LLMs are doing. They practically statistically guess their way to an answer. It works surprisingly well a lot of the time but they don’t really understand why they are right/wrong.

P.S. LLMs are kind of like students who didn’t study for the test so they use “heuristics” to guess the answer. If the test setter is predictable enough, the student might actually get a few right.


Wait, is that true? I feel like that claim needs a lot of disclaimers.


https://arxiv.org/abs/2305.18248

"In particular, we find that LMs often hallucinate differing authors of hallucinated references when queried in independent sessions, while consistently identify authors of real references. This suggests that the hallucination may be more a generation issue than inherent to current training techniques or representation."

https://arxiv.org/abs/2303.08896

"SelfCheckGPT leverages the simple idea that if a LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another."


Exactly this! This is my experience also. Your point about "well covered & non-controversial" is spot on. I know not to expect great results when asking about topics that have very little coverage. To be honest, I wouldn't expect to go to an arbitrary human and get solid answers on a little-covered topic, unless that person just happened to be a topic expert. There is so much value in having the basics-to-intermediate levels of topics covered in a reliable way. That's where most commercial activity occurs.


I think a key difference is that humans very rarely sound convincing talking about subjects they have no clue about.

I've seen the hallucination rate of LLMs improve significantly, if you stick to well covered topics they probably do quite well. The issue is they often have no tells when making things up.


Joe Rogan has made tons of money off talking without providing factual information. Hollywood has also made tons of money off movies "inspired by real events" that hallucinate key facts relevant to the movie's plot and characters. There's a huge market for infotainment that is "inspired by facts" but doesn't even try to be accurate.


You listen to Joe Rogan with the idea that this is a normal dude talking not an expert beyond martial arts and comedy.

A person who uses ChatGPT must have the understanding that it's not like Google search. The layman, however, has no idea that ChatGPT can give coherent incorrect information and treats the information as true.

Most people won't use it for infotainment and OpenAI will try its best to downplay the hallucination as fine print if it goes fully mainstream like google search.


Give people more credit. If you're using an AI these days, you have to know it hallucinates sometimes. There's even a warning about it when you log in.


There's a contingent of the population passing videos around on tiktok genuinely concerned that AIs have a mind of their own

no I will not give the public credit, most people have no grounding to discern wtf a language model is and what it's doing, all they know is computers didn't use to talk and now they do


I'll give tech people credit, but non-tech people I'm not so sure. A good example is the cookie permissions or app permissions. A great number of non-tech people don't even know or care what they mean.


You gotta stop bucketing people like that. People may not know the terms "cookie permissions" or "app permissions" but they sure as fuck understand the idea of user tracking or handing a company access to your mic/camera. And to say they don't care about these things is simply not true.


Which people? If you are software engineers or AI researchers, sure. Otherwise, it probably won't matter to you.


Wait until you learn about the mainstream media.


For a certain demographic and generation, Joe Rogan is the mainstream media.


Rogan is literally the largest podcast on Spotify. It's the definition of mainstream.


If that’s your benchmark, I don’t want your AI.


OpenAI isn't marketing ChatGPT as, "infotainment."


Now that you mention it, a big "for entertainment purposes only" banner like they used to have on all the psychic commercials on TV would not be inappropriate. It's incredible that LLMs are being marketed as general-purpose assistants with a tiny asterisk, "may contain inaccuracies", like it's a walnut contamination warning.


Not sure what's being incredible here. GPT-4 is a stellar general-purpose assistant, that shines when you stop treating it as encyclopedia, and start using it as an assistant. That is, give it tasks, like summarizing, or writing code, or explaining code, or rewriting prose. Ask for suggestions, ideas. You can do that to great effect, even when your requests are underspecified and somewhat confused, and it still works.


I just wish they were advertised for generative tasks and not retrieval tasks. It's not intelligence, it's not reasoning, it's text transformation.

It seems to be able to speak on history, sometimes it's even right, so there's a use case that people expect from it.

FYI I've used GPT4 and Claude 2 for hundreds of conversations, I understand what its good and bad at; I don't trust that the general public is being given a realistic view.


In my experience, LLMs are not about being provided facts. They are about synthesizing new content and insights based on the model and inputs.

Rather than asking it about facts, I find it useful to derive new insights.

For example: "Tell me 5 topics about databases that might make it to the front page of Hacker News." It can generate an interesting list. That is much more like the example they provided in the article; synthesizing a bedtime story is not factual.

Also, "write me some python code to do x" where x is based on libraries that were well documented before 2022 also has similarly creative results in my experience.


> how you can talk to something that doesn't provide factual information and just take it at face value

Like talking to most people you mean?


When OpenAI buys me a drink at the bar in exchange for the rubbish it produces, I might have a more favourable view.


As soon as they release the API, we can build an AI "bartender". Combine the voice output and input with NeRF talking heads such as from Diarupt or https://github.com/harlanhong/awesome-talking-head-generatio....

You will now be able to feed it images and responses of the customers. Give it a function to call, complementaryDrink(customerId), and combine it with a simple vending-machine-style robot or something more complex that can mix drinks (a rough sketch of the function-calling part is at the end of this comment).

I'm not actually in a hurry to try to replace bartenders. Just saying these types of things immediately become more feasible.

You can also see the possibilities of the speech input and output for "virtual girlfriends". I assume someone at OpenAI must have been tempted to train a model on Scarlett Johansson's voice.
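For the function-calling part at least, the existing chat completions API already supports this. A minimal sketch, assuming the hypothetical complementaryDrink function from above; the voice/vision pieces and the robot side are not shown, and the prompt wording is made up:

    import json
    import openai  # assumes the openai Python package (0.x-style API) and OPENAI_API_KEY in the environment

    # The commenter's hypothetical function, described so the model can decide to call it.
    functions = [{
        "name": "complementaryDrink",
        "description": "Offer a free drink to a customer.",
        "parameters": {
            "type": "object",
            "properties": {"customerId": {"type": "string"}},
            "required": ["customerId"],
        },
    }]

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a friendly robot bartender. Offer a free drink when it seems warranted."},
            {"role": "user", "content": "Customer 42 just complained their last order took twenty minutes."},
        ],
        functions=functions,
        function_call="auto",
    )

    message = response["choices"][0]["message"]
    if message.get("function_call"):
        args = json.loads(message["function_call"]["arguments"])
        print("Model wants to call complementaryDrink with", args)  # e.g. {"customerId": "42"}

The model decides whether to call the function; your own code then actually dispenses the drink.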


Hopefully people know not to ask others for factual information (unless it's an area they're actually well educated/knowledgeable in), but for opinions and subjective viewpoints. "How's your day going", "How are you feeling", "What did you think of X", etc, not "So what was the deal with the Hundred Years' War?" or whatever.

If people are treating LLMs like a random stranger and only making small talk, fair enough, but more often they're treating it like an inerrable font of knowledge, and that's concerning.


> If people are treating LLMs like a random stranger and only making small talk, fair enough, but more often they're treating it like an inerrable font of knowledge, and that's concerning.

That's on them. I mean, people need to figure out that LLMs aren't random strangers, they're unfiltered inner voices of random strangers, spouting the first reaction they have to what you say to them.

Anyway, there is a middle ground. I like to ask GPT-4 questions within my area of expertise, because I'm able to instantly and instinctively - read: effortlessly - judge how much to trust any given reply. It's very useful this way, because rating an answer in your own field takes much less work than coming up with it on your own.


No individual is "most people". Most of the time I spend talking to people in real life is with people whose professional expertise, hobbies, and other sources of knowledge I know at least roughly. I have an idea of how good they are at evaluating what they know, how honest they are, and whether they are prone to wishful thinking.


> I still don't understand how you can talk to something that doesn't provide factual information and just take it at face value?

All human interactions from all of history called and they …


I'm curious if you're using GPT-4 ($). I find a lot of the criticisms about hallucination come from users who aren't, and my experience with GPT-4 is that it's far less likely to make stuff up. Does it know all the answers? Certainly not, but it's self-aware enough to say "sorry, I don't know" instead of making a wild guess.


You can also prompt it to hold back if it doesn't know, which seems to make a difference (a rough sketch below). It's part of my default prompt, and since I added it I haven't had any overt hallucinations. I still get invalid code, but not due to crazy errors; just syntax and inconsistent naming, mostly.

I verify just about everything that I ask it, so it isn’t just a general sense of improvement.
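A minimal sketch of that kind of "admit uncertainty" instruction via the chat completions API; the exact wording and the question are invented for illustration, not anyone's actual default prompt:

    import openai  # assumes OPENAI_API_KEY in the environment

    # Hypothetical wording; the point is to ask for "I don't know" instead of a guess.
    system_prompt = (
        "If you are not confident in an answer, say you don't know instead of guessing. "
        "Do not invent function names, flags, or citations."
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Does Python's standard library include a TOML parser?"},
        ],
    )
    print(response["choices"][0]["message"]["content"])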


Why would anyone pay for something if the free trial doesn't work? "Hey, you know how we gave you a product that doesn't quite work as you expect and is super frustrating? Just pay us money, and we'll give you the same product, but it just works. Just trust us!"


GPT-4 is not the same product. I know it seems like it due to the way they position 3.5 and 4 on the same page, but they are really quite separate things. When I signed up for ChatGPT plus I didn't even bother using 3.5 because I knew it would be inferior. I still have only used it a handful of times. GPT-4 is just so much farther ahead that using 3.5 is just a waste of time.


Would you mind sharing some threads where you thought ChatGPT was useful? These discussions always feel like I’m living on a different planet with a different implementation of large language models than others who claim they’re great. The problems I run into seem to stem from the fundamental nature of this class of products.


The usefulness of ChatGPT is a bit situational, in my experience. But in the right situations it can be pretty powerful.

Take a look at https://chat.openai.com/share/41bdb053-facd-448b-b446-1ba1f1... for example.


A great example. Here's a similar one from me: https://cloud.typingmind.com/share/d2000ffc-a1bf-4b71-b59d-c....

Context: had a bunch of photos and videos I wanted to share with a colleague, without uploading them to any cloud. I asked GPT-4 to write me a trivial single-page gallery that doesn't look like crap, feeding it the output of `ls -l` on the media directory, got it on the first shot, copy-pasted and uploaded the whole bundle to a personal server - all in a few minutes. It took maybe 15 minutes from the idea of doing it first occurring to me, to a private link I could share.

I have plenty more of those touching C++, Emacs Lisp, Python, generating vCARD and iCalendar files out of blobs of hastily-retyped or copy-pasted text, etc. The common thread here is: one-off, ad-hoc requests, usually underspecified. GPT-4 is quite good at being a fully generic tool for one-off jobs. This is something that never existed before, except in form of delegating a task to another human.


Here's a convo I had yesterday when thinking about how to print a Binary Search Tree.

https://chat.openai.com/share/338e7397-0201-44f4-a2c3-75b733...

I use ChatGPT for all sorts of things - looking into visas for countries, coding, reverse engineering companies from job descriptions, brainstorming etc etc.

It saves a lot of time and gives way more value than what you pay for it.


I agree that none of the problems people have mentioned above happen with GPT4.

It used to be more reliable when web browsing worked, but it's still pretty reliable.


A human driving buddy can make up a lot of stuff too. Have an interesting conversation but don't take it too seriously. If you're really researching something serious then take a mental note to double check things later, pretend as if you're talking to a semi-reliable human who knows a lot but occasionally makes mistakes.


> ...talk to something that doesn't provide factual information and...

Ah yes, I don't understand how to talk to people either!


I always thought a better future would be full of more and more distilled, accurate, useful knowledge and truthful people to promote that.

Comments like yours make me think that no one cares about this...and judging by a lot of the other comments, I guess they don't.

It's probably going to be people wading through a sea of AI-generated shit, where the individual is supposed to just forever "apply critical thinking" to it all. Even a call from one's spouse could be fake, and you'll just have to apply critical thinking or whatever to work out if you were scammed or not.


There aren't any real world sources of truth you can avoid applying critical thinking to. Much published research is false, and when it isn't, you need to know when it's expired or what context it's valid in.


But do we need 9999999x the amount of information to be critically thinking about? Is that going to be helpful?


Because it doesn't always make up stuff. Because I'm a human and can ask for more information. I don't want an encyclopedia on a podcast. I want to "talk" to someone about stuff. Not have an enumerated list of truths firehosed at me.


Pay for the Plus version.

Then it makes stuff up far less frequently.

If the next version has the same step up in performance, I will no longer consider inaccuracy an issue - even the best books have mistakes in them, they just need to be infrequent enough.


> Pay for the Plus version.

> Then it makes stuff up far less frequently.

Now there's a business model for a ChatGPT-like service.

$1/month: Almost always wrong

$10/month: 50/50 chance of being right or wrong

$100/month: right 95% of the time


You make it sound like business shenanigans, but the truth is, it's a natural fit for now, as performance of LLMs improves with their size, but costs of training (up-front investment) and inference (marginal, per-query) also go up.


Pay for the $1/month version and invert the responses; now you have the $100/month one for cheap :D


Are you using 3.5 or 4?


I've wanted a ChatGPT Pod equivalent to a Google Home pod for a while! I have been intending to build it at some point. I am with you, talking to Google sucks.

"Hey Google, why do ____ happen?" "I'm sorry, I don't know anything about that"

But you're GOOGLE! Google it! What the heck lol

So yeah, ChatGPT being able to hear what I say and give me info about it would be great! My holdup has been wakewords.


We have hardware and wake words:

https://heywillow.io/

Our REST endpoint can talk to whatever you want and we’ll have native ChatGPT soon.


Ooooo looking, thanks for the recommendation!


It’s funny. Driving buddy has been my number one use case for a while now.

Still can’t quite make it work. I feel like I could learn a lot if I could have random conversations with GPT.

+ bonus if someone else in the car got excited when I see cows. Don’t care if it’s an AI.


Try Pi AI. They have an app that can be voice/audio driven. Works well for the driving buddy scenario.

https://pi.ai/talk


You could have a simulation of learning a lot by chatting with GPT; why you would take it as truth without an equal portion of salt is beyond me.


I assume you have never heard of podcasts.


I'm sure one can talk to their podcasts, but I would be worried if they ever answered me back.


you can ask podcasts questions? and they answer you?


no, but they don't get the answer wrong 20% of the time and give off 100% correctness vibes.


No, generally podcasts are far worse than that...


neither does a tuba.

why be mad at a hammer if you hit your thumb with it?


It increasingly feels to me like building any kind of general-use AI tool or app is a bad choice. I see two viable AI business models:

1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.

2. Integration - If you're going to build on an existing AI model, don't focus on adding more capabilities. Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible. This adds a lot of value and isn't something that companies developing AI models are liable to do themselves.

The two will often go hand-in-hand.


> building any kind of general-use AI tool or app is a bad choice

Maybe not if you rely on models that can be run locally.

OpenAI is big now, and will probably stay big, but with hardware acceleration, AI-anything will become ubiquitous and OpenAI won’t be able to control a domain that’s probably going to be as wide as what computing is already today.

The shape of what’s coming is hard to imagine now. I feel like the kid I was when I got my first 8-bit computer in the eighties: I knew it was going to change the world, but I had little idea how far, wide and fast it would be.


There are plenty of open-source models being released - there's going to be a steadily increasing quantity + quality of models you can run locally. I don't think it's a good place to compete.


Re: local models, are you thinking about privacy-oriented use cases, say HIPAA?

any pertinent examples?


If you focus on integration, you're up against AutoGPT, Gorilla, etc.


AutoGPT isn't remotely usable for practical enterprise software purposes right now.


Agreed, not yet. It will get there.


> Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible

Why wouldn't a company do that themselves, e.g. how Intercom has vertically integrated AI? Any examples?


It's classic build vs. buy. Companies tend to build their own products and use third party software for internal tools.

Just look at Salesforce AppExchange - it's a marketplace of software built on top of Salesforce, a large chunk of which serves to integrate other systems with Salesforce. LLMs open up the ability to build new types of integrations and to provide a much friendlier UI to non-developers who need to work on integrating things or dealing with data that exists in different places.


> 1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.

You will be eaten if you do this imo.


I don't think anybody following OpenAI's feature releases will be caught off guard by ChatGPT becoming multi-modal. The app already features voice input. That still translates voice into text before sending, but it works so well that you basically never need to check or correct anything. Rather, you might have already been asking yourself why it doesn't reply back with a voice already.

And the ability to ingest images was a highlight, and all the hype, of the GPT-4 announcement back in March: https://openai.com/research/gpt-4


one of the original training sets for the BERT series is called 'BookCorpus', accumulated by regular grad students for Natural Language Processing science. Part of the content was specifically and exactly purposed to "align" movies and video with written text. That is partly why it contains several thousand teen romance novels and ordinary paperback-style story telling content. What else is in there? "inquiring minds want to know"


> This announcement seem to have killed so many startups that were trying to do multi-modal on top of ChatGPT.

Rather than die, why not just pivot to doing multi-modal on top of Llama 2 or some open source model or whatever? It wouldn’t be a huge change

A lot of businesses/governments/etc can’t use OpenAI due to their own policies that prohibit sending their data to third party services. They’ll pay for something they can run on-premise or in their own private cloud


I’ve got one eye on https://www.elto.ai/. I was pitching something I like better earlier this year (I still think they’re missing a few key things), but with backing from roughly YC, Meta, and God, and a pretty clear understanding that robustness goes up a lot faster than capability goes down?

I wouldn't count out focused, revenue-oriented players with Meta's shit in their pocket just yet.


wow Elto seems to kill many of the incumbents in this niche

What do you think they're missing? I was trying to build a dialer but it would be impossible to compete with these guys.


"Don't build your castle in someone else's kingdom."


It already replaced search engines. So much easier to write the question and explore the answers until it is solved.


Took me a while to realise I can just type search queries into ChatGPT. e.g. simply "london bridge history" or whatever into the chat and not only get a complete answer, but I can ask it follow-up questions. And it's also personalised for the kinds of responses I want, thanks to the custom instructions setting.

ChatGPT is my primary search engine now. (I just wish it would accept a URL query parameter so it could be launched straight from the browser address bar.)


Trying that example, I’d much prefer just going to the Wikipedia page on London Bridge than trying to guess what phrases ChatGPT will respond well to in order to elicit more info. Its initial response for me didn’t even mention one of the most interesting facts: that people lived and worked on the bridge.


YMMV. For my case on software development, I don't even look on stackoverflow anymore.

Just type the tech question, start refining it, and get a snippet of code tailored to what is needed. What would previously take 30 to 60 minutes of research and testing now takes less than a couple of minutes.


And I don't have to wade through Stack Overflow and see all the times mods and others have tried to or succeeded in closing down very useful questions.


I know there are a lot of google programmers out there, but was using search engines for programming ever a good idea? Don’t get me wrong, I’ll look up how to do absolutely simple things every day but I basically always look in the official documentation.

Which may be why I’ve been very underwhelmed by GPT so far. It’s not terrible at programming, and it’s certainly better than what I can find on Google, but it’s not better than simply looking up how things work. I’m really curious as to why it hasn’t put more weight on official documentation for its answers; they must’ve scraped that along with all the other stuff, yet it’ll give you absolutely horrible suggestions when the real answer must be in its dataset. Maybe that would be weird for less common things, but it’s so terrible at JavaScript that it might even be able to write some of those StackOverflow answers, if we’re being satirical, and the entire documentation for that would’ve been very easy to flag as important.


Yes there are, and it's infuriating. A colleague of mine had problems with integrating some code into an app that was built on a newer version of a framework because "there aren't a lot of examples yet". One web search and I found the framework's own migration guide detailing the exact differences that would need to be accounted for.


Glad you have time and patience to read documentation.

Such luxury is increasingly rare for software developers nowadays.


Fortunately it's not like StackOverflow has been used as training data for LLMs, right?


Well, yes. Point is, GPT-4 read the entire StackOverflow and then some, comprehended it, and now is a better interface to it, more specific and free of all the bullshit that's part of the regular web.


This is funny, because I find it much less cumbersome to type a few search terms into a search engine and explore the links it spits out.


It depends on the subject, but search engines are on the decline. With so many fake websites written by AI, I can only see it getting worse.

The most extreme I can think of is when I want to find when a show comes out and I have to read 10 paragraphs from 5 different sites to realize no one knows.


> The most extreme I can think of is when I want to find when a show comes out and I have to read 10 paragraphs from 5 different sites to realize no one knows.

I found that you can be pretty sure no one knows if it’s not already right on the results page. And if the displayed quote for a link on the results page is something like “wondering when show X is coming out?”, then it’s also a safe bet that clicking that link will be useless.

You learn those patterns fast, and then the search is fast as well.


I don't disagree but having to have a learning phase for patterns sounds a bit like people clinging to an old way of things.


You mean like prompt engineering?

What you’re describing as “clinging to an old way of things” is how every single thing has been, ever, new or old.


I don't know why you come here and say something so obviously untrue.


It’s better to have a pattern than to have no pattern at all, which is the case with ChatGPT: there's no reliable way to tell when it’s hallucinating.

I wish LLMs were more useful than search engines, but they still have a long way to go to replace them (if they ever do).


Google still thinks I want to click on the sites I haven't clicked on in a decade even though they are first results. Search engines have a long way to go to catch up to GPT


> The most extreme I can think of is when I want to find when a show comes out

Yeah, I find that queries which can be answered in a sentence are the worst to find answers from search engines because all the results lengthen the response to an entire article, even when there isn't an answer.


agreed on that antipattern although fwiw chatgpt is unlikely to know the answer for questions like these either.


Agreed, except ChatGPT (3.5 at least, haven't tried 4) is unable to provide primary sources for its results. At least when I tried, it just provided hallucinated URLs.


Try it. There's a world of difference.


In general or for this specific application (linking primary sources)?


In general. I don't know whether it's better at providing sources.


Bing Chat for me, when mostly searching IT technical or programming stuff, sometimes gives junk URLs and sometimes gives some really valuable ones.


GPT4All is capable of providing sources. This seems more to be a legal defense mechanism by ChatGPT than a technical obstacle.


i love gpt-4 and i find chatgpt useless. so there is a big difference


Who would have thought that a few years ago? It just goes to show that a giant like Google is also susceptible when they stop innovating. The real battle is going to be fought between these two, as Google's business is majorly dependent on search ads.


It rather created new hybrid search engines, like Perplexity and Phind.


True. Although the training is on a snapshot of websites, including Q&A sites like Stack Overflow. If these were replaced too, where are we heading? We'll have to wait and see. One concern would be centralization / lack of options and diversity. Stack Overflow started rolling out AI on its own, despite the controversial way it did so (dismissing long-time contributors); it might be correctly following the trend.


Personally I prefer Stack Overflow and such, because I can see different answers, including wrong or non-applicable ones which don't solve my exact problem.


One site doesn't need to exclude the other.

Both have their uses.


Last I heard, OpenAI was losing massive amounts of money to run all this. Has that changed?

Because past history shows that the first out of the gate is not the definitive winner much of the time. We aren't still using gopher. We aren't searching with altavista. We don't connect to the internet with AOL.

AI is going to change many things. That is all the more reason to keep working on how best to make it work, not give up and assume that efforts are "doomed" just because someone else built a functional tool first.


you're absolutely right.

also, I did not know until today's thread that OpenAI's stated goal is building AGI. which is probably never going to happen, ever, no matter how good technology gets.

which means yes, we are absolutely looking at AltaVista here, not Google, because if you subtract a cult from an innovative business, you might be able to produce a profitable business.


Why isn’t AGI ever going to happen? Ever?


Because the goalposts are currently somewhere near Neptune, and expected to catch up to Voyager sometime in the next couple years.


Not only "Alexa/Siri/Google Home" but all of Google Search itself. Google was a pioneer in search engines, adding page-ranking / graph layers as a proxy for meaning, but technologies such as ChatGPT could add a real layer of meaning, or at least improve the current Google Search approach. The future of search seems more conversational and contextual.

BTW, I expect these technologies to be democratized and the training be in the hands of more people, if not everyone.


To some extent yes, for generic multi-modal chat-bots this could be a problem, but there are many apps that provide tight integration / smooth tooling for whatever problem they are helping to solve, and that might be valuable to some people -- especially if it's a real value generating use case, where the difference between 80% solution from ChatGPT and 95% solution from a bespoke tool matters.


hobbyists and professionals on /r/localllama subreddit are having an existential crisis

most of them accurately detect it is a sunk cost fallacy to continue but it looks like a form of positive thinking... and that's the power of community!


Those startups not seeing this coming as a major risk were asking for it.


There is still a lot to do.


I never understood why they thought that this wouldn’t happen.


This is good news - those ai companies have been freed to work on something else, along with the ai workers they employ. This is of great benefit to society.


any pertinent examples? i’m curious how they pivot


This is the dagger that will make online schooling unviable.

ChatGPT already made it so that you could easily copy & paste any full-text questions and receive an answer with 90% accuracy. The only flaw was that problems that also used diagrams or figures would be out of the domain of ChatGPT.

With image support, students could just take screenshots or document scans and have ChatGPT give them a valid answer. From what I’ve seen, more students than not will gladly abuse this functionality. The counter would be to either leave the grading system behind, or to force in-person schooling with no homework, only supervised schoolwork.


Another option is that this doesn't replace the student's work, but the teacher's. The single greatest use I have found for ChatGPT is in educating myself on various topics, hosting a socratic seminar where I am questioning ChatGPT in order to learn about X. Of course this could radically change a student's ability to generate homework etc, but this could also radically change how the student learns in the first place. To me, online school could become much more than they are now through AI-assisted tutoring. I can also see a future where "schooling" becomes much more decentralized than it is now and where students are self-selecting curriculum, methods, etc to give students ownership and a sense of control over their work so that they don't just look at it as "busywork".


I agree, but typical GPT use is actually the opposite of the traditional Socratic mode in which the teacher uses questions to guide the student to understanding. But I wonder how it would do if it was prompted to use the Socratic method.


Duolingo is experimenting with a GPT-4 fine tune/wrapper which makes it act as a Socratic method teacher.


I tried to teach it the Socratic method. It took some long prompt engineering, but finally it worked. BUT what I realized was that it was always lacking the bigger picture, an agenda of what it wants to teach me.


That's the only sane option. The other options suggested in previous comments are not really options but rather trying to use a band-aid to hold together a dam that has already been breached.


Absolutely, ChatGPT is a great learning tool in the right hands. The issue is that students with a genuine interest in learning are a minority. The majority would rather use ChatGPT to cheat through their class work and get an easy A than exhaust the effort to chat and learn for their own sake.


>The majority would rather use ChatGPT to cheat through their class work and get an easy A than exhaust the effort to chat and learn for their own sake.

Just have 100% of the mark come from in-person exams, as many subjects already do. Students can cheat all they want on assignments, but the only thing it's hurting is their exam score.


It's true.

I mean what is the point of doing schoolwork when some of the greatest minds of our time have decided the best way for the species to progress is to be replaced by machines?

Imagine you're 16 years old right now, you know about ChatGPT, you know about OpenAI and their plans, and you're being told you need to study hard to get a good career..., but you're also reading up on what the future looks like according to the technocracy.

You'd be pretty fucking confused right now wouldn't you?

It must be really hard at the moment to want to study and not cheat....


I'm in my mid 30s and even I have some amount of apathy for the remainder of my career. I feel pretty confident my software and product experience is not going to be nearly as useful in 15 years as it is today.


Same, I'm already pivoting into other areas.


Your username checks out!

That said, is it that much different from the past twenty years, when everyone was being told to follow their passion and get a useless $200,000 communication or literature degree to then go work at Starbucks? At least kids growing up with AI will have a chance to make its use second nature like many of us did with computers 20-30 years ago.

The kids with poor parental/counselor guidance will walk into reality face first, the ones with helicopter parents will overcorrect when free, the studious ones will mostly figure life out, the smart ones will get disillusioned fast, and the kids with trust funds just kept doing their thing. I don't think much will change.


I do think it is much different from the past twenty years. Twenty years ago we didn't have ChatGPT. There are things we could compare it to, but there also isn't anything like it.

My biggest fear is just a lack of jobs.

When people need experience to work, and the work you give to people to give them experience is replaced by ChatGPT - then what do we do?

Of course there will still be companies hiring people, but when leadership is telling people to save money - it seems much cheaper to use ChatGPT that it is to train someone.

Why hire a kid that has been using AI, when the AI can just do the work? Or if a kid that has been using AI can do the work of 20 people, what happens to the 19 people that can't find work? Will we be in a place where we need 20 people doing the work of 20 people each? Is that need actually there?

I do very much appreciate your view. I feel like I waffle back and forth between what I'm saying here and your comment.

I apologize for coming across doomer-ish. It is sometimes hard for me to imagine a future for kids growing up with ChatGPT.


The only way there will be no jobs is if every conceivable human need is met by robots. In which case there will also be no need to work.


I understand that. The fear I'm describing isn't no jobs, but fewer jobs than there are people.

There are multiple ways to address this (like UBI), but it's difficult for me to imagine our current economic system (in the US) allowing that to happen.


What people are missing is that the teacher will soon be an LLM with a camera looking at the student. Why would you watch a video of a human during an online class? Why would you ask the student to produce something in a black room? We will not evaluate students based on their homework; an AI assistant will evaluate the student based on the conversations they had together. You can automate teaching, but not learning. There is a gap in time where teaching hasn't caught up; it's going to be quickly addressed, since teaching is expensive. Parents should really encourage their kids to practice their learning as before, eventually using ChatGPT like they use Wikipedia. One generation will suffer during the change.


When we talk about people abusing ChatGPT in a school context, it’s always for kids in high school or greater education levels. These are individuals that know right from wrong and also have the motor skills and access to use such a tool. These are individuals who are problem-solving for their specific need, which is to get this homework or essay out of the way so that they can do XYZ. Presumably XYZ does not leverage chatgpt. So make that what they spend their time on. At some point they’ll have to back-solve for skills they need to learn and need educational guidance and structure.

This is obviously not easy or going to happen without time and resources, but that is how adaptation goes.


I've taken certification exams where an app is run on my machine verifying I have nothing else open and my camera had to be enabled, with me and my hands in view for the entirety of the test. There are ways to ensure cheating is more difficult than it's worth, however I see this tech as greatly changing what we want to learn and how we might learn it. It is transformative and not slowing down.


> I've taken certification exams where an app is run on my machine verifying I have nothing else open and my camera had to be enabled, with me and my hands in view for the entirety of the test. There are ways to ensure cheating is more difficult than it's worth

Yes sure it makes cheating inconvenient. It also makes exam taking inconvenient.

If I can at all help it, I will not be subject to this sort of abuse, and neither will my kid.


Well, I think the kid will already be logged into ChatGPT using a AI Teacher ChatGPT plugin which is doing interactive instruction.

They can still log in on their phone to cheat though. I wonder if OpenAI will add linked accounts and parental controls at some point. Instance 2 of ChatGPT might "tell" on the kid for cheating by informing Instance 1 running the AI Teacher plugin.


It would be kind of stupid to cut kids off from ChatGPT and pretend to them that they should go off to school while Silicon Valley is doing its best to make every possible job obsolete. Kind of invalidates the whole exercise of the current approach to schooling, right?

What are you going to school for, to learn how to write essays? Well, we have an app for that.

It sounds like the future of work will be prompting, and if and when that is obsolete...who knows what...


Use online for training, real life for testing/grading. That way cheating at home will only hurt yourself.


The problem here is that homework is designed to provide the structure kids need to apply themselves and actually learn. If you don't provide structure for this, they will simply never study and accept failure. They frequently don't have the self-discipline and mindfulness and long-term vision to study "because it's the right thing to do". I know my entire education, even with college, was "why do i need to know this?" and being wildly bored with it all as a result.


It would be sufficient to do exams in person and no longer grade homework.


Good point. Though I imagine fully online institutions would require testing facilities. Maybe local libraries become testing hosts?


I studied at a distance university and they use lecture halls of local universities for the exams.


I like how they silently removed the web browsing (Bing browsing) chat feature after first having it disabled for several months.

A proper notice about them removing the feature would've been nice. Maybe I missed it (someone please correct me if wrong), but the last I heard officially it was temporarily disabled while they fix something. Next thing I know, it's completely gone from the platform without another peep.


I currently have Browsing with Bing enabled as a plug-in on my account. It went away for months, but it just randomly came back about a week or 2 ago!


Yes, that was a disappointment, and I agree it looks like they aren't going to re-enable it anytime soon. However I find that Perplexity AI does a better job of using web search than ChatGPT ever did, and I use it more than ChatGPT for that reason.


Perplexity has gone downhill a lot since its initial rollout. Anecdotally, from my experience as a non-paying user of the service.


give vello.ai a try


Thanks I'll check it out. Are there other similar sites you like?


vello.ai is very, very slow. I used it for web searching, but waiting (sometimes) more than 30 seconds for a simple query is unacceptable.


it is optimized for more in depth research rather than quick shallow answers, so a different use case.


Just made an account to say that I currently have this feature. It was gone for a few months but it came back to me I think this past week. Not as a plugin, either, it is its own “model” to select.


Since so many others including myself don't see it, I guess that means it is getting a slow rollout which they are being extra cautious with this time.


Hey, thanks for the info! I did not know about this, but this is actually good to hear. I'll keep an eye open for it. Are you using ChatGPT or the API? Did you have to take any action to get it to reappear, or is it just a slow rollout as they re-enable?


Agreed. You’re now dependent on a third party plugin.


So far the most intuitive, killer-app-level UX appears to be text chat. This interaction of showing it images also looks interesting, as it resembles talking with a friend about a topic, but let's see if it feels like talking to a very smart person (ChatGPT is like that) or a very dumb person that can somewhat recognise objects. Recognising a wrench is nowhere near as impressive as being able to talk with ChatGPT about history or make it write code that actually works.

OpenAI is killing it, right? People are coming up with interesting use cases, but the main way most people interact with AI appears to be ChatGPT.

However, they still don't seem to be able to nail image generation; all the cool stuff keeps happening on MidJourney and StableDiffusion.


OpenAI is also releasing DALLE-3 in "early October" and the images they chose for their demos show it demonstrating unprecedented levels of prompt understanding, including embedding full sentences of text in an output image.


Not unprecedented at all. SDXL images look better than the examples for DALLE-3, and SDXL has a massive tool ecosystem of things like ControlNet, LoRAs, and regional prompting that is simply not there with DALLE-3.


Lol it's definitely unprecedented. XL can't touch Dalle's comprehension of text. Control Net and LORAs aren't a substitute for that.


There are pros and cons for sure, but you should check out the press release; DALLE-3 is definitely capable of stuff that SDXL isn't.


I've been making a few hobby projects that consolidate different AI services to achieve this, so I look forward to the reduced complexity and latency from all those trips.

If the API is available in time (halloween), my multi-modal talking skeleton head with an ESP32 camera that makes snarky comments about your costume just got slightly easier on the software side.


> I've been making a few hobby projects that consolidate different AI services to achieve this, so I look forward to the reduced complexity and latency from all those trips.

ironically this is basically the exact line of reasoning for why i didn't embark on any such endeavors


If you make this, please share some steps/details! It sounds super cool and I'd love to make something like this!


Would love to see the final project - my email is in the bio.


As someone deep in the software test automation space, the thing I'm waiting for is robust AI-powered image recognition of app user interfaces. Combined with an AI ability to write test automation code, I'm looking forward to the ability to generate executable Selenium or Appium test code from a single screenshot (or sequence of screenshots). Feels like we're almost there.
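To make the target concrete, here's roughly the kind of Selenium code one would hope gets generated from a single login-screen screenshot. The URL, element IDs, and assertion are hypothetical, hand-written stand-ins for what the model would have to infer from the image:

    # Hypothetical generated test: the URL and element IDs are invented placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")
        driver.find_element(By.ID, "username").send_keys("test-user")
        driver.find_element(By.ID, "password").send_keys("correct horse battery staple")
        driver.find_element(By.ID, "login-button").click()
        # The model would also need to infer a sensible post-login assertion from the screenshot.
        assert "Dashboard" in driver.title
    finally:
        driver.quit()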


I'll recommend the Spotlight paper by Google[1]. There are very interesting datasets they created for this purpose. They mention they have a screen-action-screen dataset that is in-house and it doesn't look like they'll open it. Maybe owning Android has its advantages.

There's a recent paper by Huggingface called IDEFICS[2] that claims to be an open-source implementation of Flamingo (an older paper about few-shot multi-modal task understanding), and I think this space will be heating up soon.

[1] https://research.google/pubs/pub52171/

[2] https://huggingface.co/blog/idefics


Thanks!


My biggest complaint with OpenAI/ChatGPT is their horrible "marketing" (for lack of a better term). They announce stuff like this (or like plugins), I get excited, I go to use it, it hasn't rolled out to me yet (which is frustrating as a paying customer), and my only recourse is.... check back daily? They never send an email "Plugins are available for you!", "Voice chat is now enabled on your account!" and so often I forget about the new feature unless I stumble across it later.

Just now I opened the app, went to settings, went to "New Features", and all I saw was Bing Browsing disabled (unable to enable). Ok, I didn't even know that was a thing that worked at one point. Maybe I need an update? Go to the App Store, nope, I'm up to date. Kill the app, relaunch, open settings, now "New Features" isn't even listed. I can promise you I won't be browsing the settings part of this app regularly to see if there is a new feature. Heck, not only do they not email/push about new features, they don't even message in-app about them. I really don't understand.

Maybe they are doing so well they don't have to care about communicating with customer right now but it really annoys me and I wish they did better.


They have gone from being a niche research company to being (probably) the fastest growing start-up in history.

I suspect they do care about communicating with customers, but it's total chaos and carnage internally.


Maybe there is a state somewhere between "total chaos and carnage" and "emails users when new features are enabled for their account".

Such as "decided it wasn't an operational priority to email users when features were enabled for them".


Emailing users when a new feature is enabled for their account isn't even the kind of thing that would distract an existing very busy developer.

You could literally hire an entirely new guy, give him instructions to build such an email system, and let him put the right triggers on the user account permissions database to send out the right emails at the right time (a toy sketch of the idea is below).

And then, when it's built, you can start adding more features like sending the emails only when demand is low and/or at times of day when you get the best click through rate. And then next you can start measuring the increase in revenue from sending those emails.

Before long, you have a whole marketing and comms team. Which you probably want as a big company anyway.
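A toy sketch of that diff-and-notify idea; the user emails and feature-flag names are made up, and a real system would read the actual permissions table and enqueue emails instead of printing:

    from typing import Dict, Set

    def newly_enabled(before: Dict[str, Set[str]], after: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
        """Compare two snapshots of per-user feature flags and return what was just turned on."""
        changes = {}
        for user, flags in after.items():
            gained = flags - before.get(user, set())
            if gained:
                changes[user] = gained
        return changes

    # Hypothetical snapshots of the flags table, yesterday vs. today.
    yesterday = {"alice@example.com": {"gpt-4"}, "bob@example.com": {"gpt-4"}}
    today = {"alice@example.com": {"gpt-4", "voice"}, "bob@example.com": {"gpt-4"}}

    for user, features in newly_enabled(yesterday, today).items():
        # In production: enqueue an email here instead of printing.
        print(f"Notify {user}: you now have access to {', '.join(sorted(features))}")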


The peanut gallery could (and does) say this about 1000 little features that 1000 different people claim to be so necessary that OpenAI is incompetent for not having, yet none of those people agree that those other features are priorities.


They could hire 1000 new developers a month and still be understaffed vs any company in history with their valuation.


The fact that they are so successful despite their low headcount is an argument that there are advantages to keeping the headcount low.

I'm sure we've all seen companies that grow too fast become less productive than when they were a ten-person startup.


Why isn't GPT4 running the company and handling these marketing missteps? j/k


We don't have evidence that this isn't the case


Heck, dogfood your own product and hire someone to ask ChatGPT to do it!


And then the moment you got this email you'd post to HN and everyone else would be "OMG, why don't I have that option"

Incremental rollouts are hard.


And then while one person wants the update, another person complains it's spam...

Incremental rollouts are hard, and so is communicating in a way that makes everyone happy.


But this issue far predates their current success. GPT2 was held back for a while. GPT3 launched as "waitlist only" with an application process, and so did GPT3.5.

This is a large part of what held them back: GPT3.5 had most of the capabilities of the initial ChatGPT release, just with a different interface. Yet GPT3.5 failed to get any hype because the rollout was glacial. They made some claims that it was great, but to verify this for yourself you had to wait months. Only when they finally made a product that everyone could try out at the same time, with minimal hassle, did OpenAI turn from a "niche research company" to the fastest growing start-up. And this seems to have been a one-time thing, now they are back to staggered releases.


> GPT3.5 had most of the capabilities of the initial ChatGPT release, just with a different interface

I believe two other factors were the cost (especially of fine tuned models, IIRC fine tuned davinci cost $0.20 per thousand tokens) and also that OpenAI hadn't very clearly shown just how much higher the quality could get once RLHF'd. I remember asking davinci-instruct to parse some data, and the reliability really seemed much lower than ChatGPT at launch, to the point that, at the time, I thought GPT-4 was secretly powering ChatGPT.


Turns out that getting enough compute power to meet demand for AI is hard.


At what point do you go from startup to not, when you have 10 billion invested, countless employees, and are practically a sub-branch of Microsoft? Sounds cooler though, I guess.


I'm not sure when a company begins to not be a start up, but by the time they have a wave of news claiming their product kills teenagers, or they're engaging in antitrust, or they're effectively using space labor, that's when they are definitely no longer a start up.

That hasn't happened yet for OpenAI, but I'm sure it will happen eventually, and then we'll know.


I think you stop being a startup when there are engineers who do not know the CEO. I would guess OpenAI is still a startup by that definition (they don't have that many engineers IIRC) but I don't actually know.


That's really a function of what kind of CEO the company has, and what you mean by "know". I worked remotely for a company employing hundreds of people, around for a couple of decades and with offices in different regions of the world, and I still got to talk to the CEO a couple of times, and he knows me by name, all by virtue of bumping into him in the corridor a couple of times on one of my infrequent visits to the office.


> I suspect they do care about communicating with customers, but it's total chaos and carnage internally.

This is my best guess as well, they are rocketing down the interstate at 200mph and just trying to keep the wheels on the car. When you're absolutely killing it I guess making X% more by being better at messaging just isn't worth it since to do that you'd have to take someone off something potentially more critical. Still makes me a little sad though.


> When you're absolutely killing it

Aren't they unprofitable? and have fierce competition from everyone?


Whether or not you’re profitable has very little to do with how valuable others think you are. And usually having competitors is something that validates your market.


> And usually having competitors is something that validates your market.

Don't users validate your market? ChatGPT has plenty of users, so I would think competitors only hurt their value.


Well, it depends.

Clearly, you can be a company like Microsoft where nobody is challenging your dominance in PC operating systems, and you can make huge sums of money. So competitors certainly aren't vital.

Or if you've cleverly sewn up a market with patents or trade secrets or a giant first-mover advantage or network effects, and nobody's got any chance of challenging your dominance in your specific market niche - again that could be highly profitable.

On the other hand, if you're selling ten-dollar bills for five dollars, you might have millions of satisfied paying customers, but no competitors because nobody else thinks your unit economics make sense. Or if you run a DVD rental store, you might be profitable and have many return customers, but you might not attract competitors because they don't think DVD rental is a growth business.

So some people consider a lack of competition an ominous sign.


> And usually having competitors is something that validates your market

a whole bunch of AI startups were founded around the same time. surely each can't validate the market for the others and be validated by the others in turn


The surviving ones can. The same way that multiple species of trees, growing on the same patch of the ground, desperately competing for sunlight, together validate that the soil underneath is fertile.


The same can be said about food delivery start ups.


Even if they are unprofitable they can get VC money very easily.

Plus they make 20 dollars a month from a lot of people.


When dealing with a tech where people have credible reasons to believe it can be enormously harmful on every possible time scale, maybe it would behoove them to not rocket down the interstate at 200mph?


There are always going to be people who are against any new technology and who make up reasons to be against it.

The best defence is to move so quickly that you are an established part of the business framework by the time these forces can gather, or to go so slowly that nobody takes you as a threat.

No startup can go slowly.


In other words, make your money and ride off into the sunset before anyone can catch on to how much damage you’re doing to society.

Otherwise known as the AirBnB playbook.


No, successfully navigate past this version of Covid vaccine deniers, 5G conspiracists, etc.

In ten years we will enjoy higher productivity due to AI, and a richer society as a result. We have already seen it with protein folding, which AI is amazing at [0].

The only reasonable fears of AI are for some jobs, and that China gets there first.

[0]: https://www.nature.com/articles/d41586-022-02083-2


Right, it is perfectly valid to only accept the potential good points while neglecting all the potential bad points.

This is no different than saying "Look, nuclear weapons aren't actually dangerous, if they were we'd all be dead because the apocalypse would have already happened", which is probably the dumbest take on the close calls and real risks that exist.


That's not what the analogy means. 200mph refers to funding.


No it refers to them moving too fast to send out basic emails for feature updates, per this comment chain.


Then use that funding to hire one PR guy who costs 1/4 as much as an AI developer?


I think their main goal is to be perceived as the most advanced AI company. Why? Because that's how you get the best people working for you. The main determinant of success for companies like OpenAI is people.


Considering the field and progress that is being made I find this idea terrifying. All the big problems like "How will we actually control what we're building?" being answered "that's too hard; let's punt and solve that after we figure out how to consume voice data". One way or another this is likely the last technological advance that humans will make.


> How will we actually control what we're building?

This question is poorly formed because it’s not clear who the “we” is. If it’s you and me, that train left the station a while ago. If it’s any humans, well Sam Altman is probably a human and all of these are impressive products, but still just tools.


The last technological advance that humans will make? What gives you that impression


I don't know if @skeeter2020's assertion is correct, but it is certainly the goal.

To use a fictional but entirely apt quote:

> I say your civilization because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about: Evolution, Morpheus, evolution. Like the dinosaur. Look out that window. You had your time. The future is our world, Morpheus. The future is our time.


_if_ it is better at making advances than us, then everything in human nature points to us letting it make all future advances.


I'm not sure. I suspect it might be better at some advances but not necessarily better at everything.


As long as it's better at some of the advances that would make it even better at those advances, or better at more advances, then it'll quickly become better than us in approximately everything, and at that point humans become NPCs of their own story.


Is that how apes feel? You think apes look at us and feel like NPCs?

My perception is apes still feel like the main character, each and every one of them.


> fastest growing start-up in history.

What are some metrics that justify this claim?


They have been the fastest company ever to go from 0 to 100 million users. [0]

They are also on pace to exceed $1B in revenue. [1]

[0] https://www.reuters.com/technology/chatgpt-sets-record-faste...

[1] https://www.theinformation.com/articles/openai-passes-1-bill...


Are either of those things indicative of "fastest growth ever"? Maybe 100 million users, but we live in a world where it's hard to know how meaningful that is (ie Tucker's X videos getting 300 million views).


MrBeast[0] has about 182m subscribers, and it is probably easier to get a subscriber on YouTube than a user.

Views are easy to inflate; I wouldn't even consider them in the same ballpark. This video[1] of Vivaldi's Four Seasons has about 1/4 billion views.

The shortest time to 100 million users is almost a definition of the quickest growing company.

[0]: first or second largest youtuber. [1]: https://www.youtube.com/watch?v=GRxofEmo3HA


both of those are either estimated or anonymous sources

not credible


Well they are a private company so you are going to have to evaluate it on that basis, but reuters is generally considered credible.


> Well they are a private company so you are going to have to evaluate it on that basis, but reuters is generally considered credible.

well that would be good if Reuters were the source of the figure

but they're not, they're simply reporting on what SimilarWeb has estimated

https://www.similarweb.com/website/chatgpt.com/#technologies

and that estimate is essentially based on nothing


They could send over ChatGPTed newsletters. Marketing bullshit is one thing ChatGPT excels at.


Yeah, but to be honest, I wonder how such a simple thing falls by the wayside.


This happens when there are thousands other simple things and a lot of complicated things. When your devs are stretched, you sort by priority and I can tell you this is not that important.


They do marketing like a 3-person startup that found a SaaS starter template, connected Stripe with shoestrings, and hasn't looked back. In order to start using the API, I actually had to cancel and sign back up again (because I think I was on a previous rev of the billing model).

I do love these companies that succeed in spite of their marketing & design and not because of it. It shows you have something very special.


We're heading for the singularity and you're complaining about marketing?


The singularity huh… what do you think, it will run in Kubernetes or Docker?


Yeah I don’t think OpenAI needs any marketing at this point.


> my only recourse is.... check back daily

Sounds like their marketing is doing just fine. If you were to just leave and forget about it, then sure, they need to work on their retention. But you won’t, so they don’t.


Imagine how fantastic you are doing, when your biggest user complaint stems from frustration with features they can not use just yet.


They do explain why in the post. (Still, you may not agree, of course.)

> We are deploying image and voice capabilities gradually
>
> OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.


My issue isn't fully with them rolling out slowly, my issue is never knowing when you will get the feature or rather not being told when you do get it. I'm fine with "sometime in the next X days/months you will get feature Y", my issue is the only way to see if you got feature Y is to check back daily.


It's the first sentence in the 3rd paragraph, repeated again at the end of the blog post.

> We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks.


First of all, I understand what you're saying. Communication is important. I just think it's funny to ever talk about "lack of communication". All I want is for businesses to stop communicating with me. Even better if I don't have to ask (unsubscribe).


That's fair, I completely understand where you are coming from. From a growth/money-making perspective it'd be smart to message customers about new features but table stakes would be something like:

    Voice Chat (Not available yet) [Click here to be notified when you have access]
Or something along those lines. It sours my opinion of ChatGPT every time I go to use a newly announced feature to find out I don't have it yet and have no clue when I will.


User impressions and revisit rate are key factors in raising money and showing success. It’s natural that they would select for user flows that keep you coming back daily rather than risk you don’t use it for a day or two waiting for an email.


They're focused on scaling to meet the current (overwhelming) demand. Given the 'if you build it, they will come' dynamic they're experiencing, any focus on marketing would be a waste of resources.


> My biggest complaint with OpenAI/ChatGPT is their horrible "marketing"

Agreed. Other notable mentions: choosing "ChatGPT" as their product name and not having mobile apps.


They do have mobile apps though?


Oops, missed that announcement.


It has always seemed like OpenAI succeeds in spite of itself. API access was historically an absolute nightmare, and it just seemed like they didn't even want customers.


it hasn't rolled out to me yet (which is frustrating as a paying customer)

Frustratingly, at least the image gen is live on Bing, but I guess Microsoft is paying more than me for access.


I can honestly wait. I am excited for 5 and 10 years from now. I really am. This is going to be amazing. If I miss out for a week or a month in the meantime I don't mind.


At least you’re seeing an option for ‘New Features’ in settings. I don’t see it and I’m supposedly up to date (and a Plus subscriber).


Your complaint is they don't email you enough?

Sarcasm aside, I understand your complaint, but still, a little funny.


Do they email you a lot?

I'm a plus customer and an API user, and they barely send me anything. One day I just signed in and saw that I suddenly had interpreter access, for instance.


The companies you want to hear back from never email you, the ones that do you don't care about.


In general, if companies e-mail you, it's almost always with worthless bullshit and/or attempts at tricking you into spending more money. OpenAI is not doing that, hence radio silence.


Maybe they need an RSS feed or something


I got an email indicating I was accepted for plugin use, fwiw.


I just don't understand how they can package all of this for $20/m. Is compute really that cheap at scale?

I also wonder how Apple (& Google) are going to be able to provide this for free. I would love to be a fly on the wall in the meetings they have about this; imagine all the innovator's-dilemma-like discussions they'd be forced to have (we have to do this vs. this will eat up our margins).

This might be a little out there, but I think Apple is making the correct move in letting the dust settle. Similar to how Zuckerberg burned $20 billion for Apple to come out with Vision Pro, I see something similar playing out with Llama. Although this is a low-conviction take, because software is Facebook's ballgame (hardware not so much).


Compute is not cheap! I think it is well known (Altman himself has said this) that OpenAI is burning a lot of money currently, but they are fine for now considering the $10B investment from MSFT and the revenue from subscriptions and the API. It's a critical moment for AI companies, and OpenAI is trying to get as large a share of the market as they can by undercutting virtually any other commercial model and offering 10x the value.


Additionally, compute has the unique property of becoming cheaper per-unit at a rate that isn’t comparable to any other commodity. GPT-4 itself gets cheaper to run the moment the next generation of chips comes out. Unlike, for example, Uber, the business environment and unit economics just naturally become more favorable the more time passes. By taking the lead in this space, they have secured mindshare which will actually increase in value with time as costs decline.

Of course bigger (and thus more expensive-to-run) models will be released later, but I trust OAI to navigate that curve.


> “I just don't understand how they can package all of this for $20/m. Is compute really that cheap at scale?”

It’s the same reason why an Uber in NYC used to cost $20 and now costs $80 for the same trip. Venture capital subsidizing market capture.


It's quite possible they are charging near or below cost because they want your data....

Imagine how much they would have to pay for testers at scale?


Probably with Microsoft's money injection they're trying to raze the market and afterwards hike prices.


I think answering lots of queries in parallel can be a lot cheaper than answering them one at a time.


It's not about generating profits. It's about being an existential threat to Google. MS will happily burn money.


Why worry about money when you have enough money in the bank to last until Judgement Day?


The TTS is better than Eleven Labs. It has a lot more of the narrative oomph (compare the intonation of the story and poem) even the best other models seem to lack.

I really really hope this is available in more languages than English.

Also, Google, where's Gemini?


I know there are shades of grey to how they operate, but the near constant stream of stuff they're shipping keeps me excited.

The LLM boom of the last year (Open AI, llama, et al) has me giddy as a software person. It's a reach, but I truly feel like I'm watching the pyramids of our time get made.


Computers understanding and responding in human language is the most exciting software innovation since the invention of the GUI.

Just as the GUI made computer software available to billions LLMs will be the next revolution.

I'm just as excited as you! The only downside is that it now makes me feel bad that I'm not doing anything with it yet.


> The only downside is that it now makes me feel bad that I'm not doing anything with it yet.

If that's the only downside that you see... I guess enhanced phishing/impersonation and all the blackhat stuff that comes with it don't count.

I for one already miss the time when companies had support teams made of actual people.


I would love if helpdesks moved to ChatGPT. Phone support these days is based off of a rigid script that is around as helpful as a 2000s chatbot. For example, the other day I was talking to AT&T support, and the lady asked me what version of Windows I was running. I said, I'm running Ubuntu. She repeated the question. I said I'm not running Windows, it's Linux. She repeated the question. I asked why it mattered for my internet connection. She repeated the question. Finally, I lied and said I'm using Windows 10, and we were able to get on to the next part of the script. ChatGPT would have been a lot better.


Or ChatGPT would have hallucinated options to check.

The last four chats with ChatGPT (not GPT-4) were a constant flow of non-existent API functions, with new hallucinations after each correction, until we came full circle.


ATT level 1 support is dumber than a box of rocks, the problem is AI isn't going to help here. The AI is going to be taught to be just as dumb.

Years ago I had a business DSL customer with a router and static IP. From everything in my testing it appeared that traffic broke somewhere at the local telco, not with my modem. It took 8 straight hours of arguing with L1 that no, it is not my windows. No, we have a router and it's not a computer issue. No, it's not the router (we could put the router in DHCP mode and it would work), it was an issue with static IP.

The next day we finally broke out of the stupid loop and got to IP services, who were just as confused. Eventually they were on the phone with people on the floor of the local office. A card of some type had been pulled and put in the wrong slot. Ooof.


Well, I didn't say that support today is always good. But by construction ChatGPT will never be able to answer a question that was not written down and trained (unless it hallucinates it, and many times the answer will be completely wrong).

I can read the website, I don't need a fake person to give me the information available on the website. When I contact support, it's because I need to talk to a human.



I work as an ethical hacker, so I'm well aware of the phishing and impersonation possibilities. But the net positive is so, so much bigger for society that I'm sure we'll figure it out.

And yes, in 20 years you can tell your kids that 'back in my day' support consisted of real people. But truthfully, as someone who worked on an ISP helpdesk, it's much better for society if these people move on to more productive areas.


> But truthfully, as someone who worked on an ISP helpdesk, it's much better for society if these people move on to more productive areas.

But is it, though? I started my career in customer support for a server hosting company, and eventually worked my way up to sysadmin-type work. I would not have been qualified for the position I eventually moved to at the start, I learned on the job. Is it really better for society if all these entry level jobs get automated, leaving only those with higher barriers to entry?


Historically this exact same thing has happened, it was one of the bigger arguments against the abolition of child labour. "How will they grow up to be workers if they're not doing these jobs where they can learn the skills they'll need?"

The answer then was extending schooling, so that people (children at the time) could learn those skills without having their labour exploited. I would argue we should consider that today, extend mandatory free schooling. The economic purpose of education is that at the end of it the person should be able to have a job, removing entry level jobs doesn't change the economic purpose of education, so extend education until the person is able to have a job at the end of it again.

The social purpose of schooling is to make good members of society, and I don't think that cause would be significantly harmed by extending schooling in order for students to have learned enough to be more capable than an LLM in the job market.


> But the net positive is so, so much bigger for society that I'm sure we'll figure it out.

Considering that the democratic backsliding across the globe is coincidentally happening at the same time as the rise of social media and echo chambers, are we sure about that? LLMs have the opportunity to create a handcrafted echo chamber for every person on this planet, which is quite risky in an environment where almost every democracy on the planet is fighting against radical forces trying to abolish it.


I don’t think we know how these net out. AFAICT the negative use cases are a lot more real than the positive ones.

People like to just suppose that these will help discover drugs and design buildings and what not, but what we actually know they’re capable of doing is littering our information environment at massive scale.


I find this very interesting. If you work as an ethical hacker, I believe you see the blackhat potential there.

But you don't see the positive, you just have faith. That's beautiful in a way, but dangerous too. Just like the common idea that "I have faith that somebody will find a technological solution to climate change". When the risk is that high, I think we should take a step back and don't bet our survival on faith.


The positives of easy translation seem outweighed by the negatives of giving biolabs easy protein hacking.


It's truly an amazing time to be alive. I'm right there with you, super excited about this decade. Especially what we could do in medicine.


Statistical diagnoses models have offered similar possibilities in medicine for 50 years. Pretty much, the idea is that you can get a far more accurate diagnosis if you take into account the medical history of everyone else in your family, town, workplace, residence and put all of it into a big statistical model, on top of your symptoms and history.

However, medical secrecy, processes and laws prevent such things, even if they would save lives.

I don't see ChatGPT being any different.


This is effectively what doctors do: educated guessing.

In my view, while statistical models would probably be an improvement ( assuming all confounding factors are measured ), the ultimate solution is not to get better at educated guessing, but to remove the guessing completely, with diagnostic tests that measure the relevant bio-medical markers.


Good tests < good tests + statistical modelling.

This becomes even more true when you consider there is risk to every test. Some tests have obvious risks (radiation risk from CT scans, chance of damage from spinal fluid tap). Other tests the risk is less obvious (sending you for a blood test and awaiting the results might not be a good idea if that delays treatment for some ailment already pretty certain). In the bigger picture, any test that costs money harms the patient slightly, since someone must pay for the test, and for many the money they spend on extra tests comes out of money they might otherwise spend on gym memberships, better food, or working fewer hours - it is well known that the poor have worse health than the rich.


Sure, tests cost money - and today there is a funnel pathway: the educated guess is a funnel/filter, and the next step is often a biomedical test/investigation.

But if we are talking about being truly transformative - then a Star-trek tricorder is the ultimate goal, rather than a better version of twenty questions in my view.

So I'm not saying it's not useful, just that it's not the ultimate solution.


Without a perfect framework for differential diagnosis, this is still educated guessing. In my opinion we're closer to the AI singularity than we are to removing guesswork from the medical field.


this is true, but we're also much closer to Jupiter than we are to Alpha Centauri


"londons_explore" - Ahh the classic British cynicism (Don't ban-ish me señor Dang, I'm British so I can say this).

> Similar possibilities existed in medicine for 50 years

It would've been like building the Tower of Babel with a bunch of Raspberry Pi Zeros. While theoretically possible, practically impossible, and not (just) because of laws, but rather because of structural limitations (vector DBs of the internet solve that).

> Patents and byzantine regulations will stunt its potential

That's the magic of this technology, it's like AWS for highly levered niche intelligence. This arms an entire generation of rebels (entrepreneurs & scientists) to wage a war against big pharma and the FDA.

As an aside, this is why I'm convinced AI & automation will unleash more jobs and productivity like nothing we've seen before. We are at the precipice of a Cambrian explosion! Also why the luddites need to be shunned.


Statistical approaches could have been done 50 years ago.

Imagine, for example, that 'disease books' are published each month with tables of disease probabilities per city, per industry, per workplace, etc. They would also have aggregated stats grouped by age, gender, religion, wealth, etc.

Your GP would grab the page for the right city, industry, workplace, age, gender etc. That would then be combined with the pages for each of the symptoms you have presented with, and maybe further pages for things from your medical history, and test results.

All the pages would then be added up (perhaps with the use of overlaid cellophane sheets with transparency), and the most likely diseases and treatments read off.

When any disease is then diagnosed and treatment commenced (and found effective or ineffective), your GP would fill in a form to send to a central book-printer to allow next months book edition to be updated with what has just been learned from your case.


> I'm British so I can say this

can you, though? it's not scalably confirmable. what you can say in a British accent to another human person in the physical world is not necessarily what you can say in unaccented text on the internet.


Hahaha nice one.

Funnily enough, it is scalably confirmable. You can feed all my HN comments from before ChatGPT into, well... ChatGPT, and ask it whether I'm British based on the writing.

I bet we are just a version or two away from being able to fine-tune it down to region based on writing. There are so many little things based on whether you're from Scotland, Wales or London. Especially London!
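
Something like this would do it (untested sketch; the username is a placeholder, and it assumes the public Algolia HN search API plus the current OpenAI Python SDK):

    import requests
    from openai import OpenAI

    # Rough, untested sketch of the experiment above: pull a user's HN comments
    # via the public Algolia HN search API, then ask a model to guess whether the
    # author is British from the writing alone. The username is a placeholder.

    username = "some_hn_user"  # hypothetical
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search_by_date",
        params={"tags": f"comment,author_{username}", "hitsPerPage": 50},
    )
    comments = [hit["comment_text"] for hit in resp.json()["hits"]]

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a forensic linguist."},
            {"role": "user", "content": "Based only on the writing style of these comments, "
                                        "is the author likely British? Explain briefly.\n\n"
                                        + "\n---\n".join(comments)},
        ],
    )
    print(answer.choices[0].message.content)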


The great thing about AI models is that once you train it, you can pretend the data wasn't illegal


See the glass half full or half empty?

Medical secrecy, processes and laws have indeed prevented SOME things, but a lot of things have gotten significantly better due to enhanced statistical models that have been implemented and widely used in real life scenarios.


To make this feasible (meaning that the TB of data and the huge computing effort are somewhere else, and I only have the mic (smartphone)), we need our local agent to send multiple irrelevant queries to the mothership, to hide our true purpose.

Example: my favourite team is X. So if I want to keep it a secret, when I ask for the history of championships of X, I will ask for X. My local agent should ask for 100 teams, get all the data, and then report back only X. Eventually the mothership will figure out what we like (a large Venn diagram). But this is not in anyone's interest, and thus will not happen.

Also, this way the local agent will be able to learn about and remember us, at a cost.
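
A minimal sketch of that decoy idea (ask_mothership and the team list are placeholders; note that over many sessions the intersection of batches is exactly the Venn-diagram leak mentioned above):

    import random

    # Minimal sketch of the decoy idea: the local agent never sends only the query
    # it cares about; it pads the batch with random decoys and keeps just the
    # answer it wanted. ask_mothership is a stand-in for the real remote call.

    ALL_TEAMS = [f"team_{i}" for i in range(1000)]  # hypothetical universe of queries

    def ask_mothership(team: str) -> str:
        return f"championship history of {team}"  # placeholder for the remote LLM/search

    def private_query(real_team: str, k: int = 100) -> str:
        decoys = random.sample([t for t in ALL_TEAMS if t != real_team], k - 1)
        batch = decoys + [real_team]
        random.shuffle(batch)                 # don't reveal the real query by position
        answers = {t: ask_mothership(t) for t in batch}
        return answers[real_team]             # only the real answer is used locally

    print(private_query("team_42"))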


Nonsense.

The medical possibilities that will be unlocked by large generative deep multimodal models are on an entirely different scale from "statistical diagnoses." Imagine feeding in an MRI image, asking if this person has cancer, and then asking the model to point out why it thinks the person has cancer. That will be possible within a few years at most. The regulatory challenges will be surmounted eventually once it becomes exceedingly obvious in other countries how impactful this technology is.


But in your scenario - which part is adding the value?

Your deep multimodal models or the MRI imaging?

What you are essentially saying is the signal is so subtle that only a large NN can reliably extract it.

While that may well be the case, it would be better to have a scan/diagnostic that doesn't need that level of signal processing to interpret.

For example - you don't need a large generative deep multimodal model to read a Covid antigen or PCR test.


There are tons & tons of conditions that do not have easy scans/diagnostics and rely on subtle signals - especially if they are not a binary yes/no but a regression-style prediction.

We've picked a lot of the low-hanging, simple-to-extract signals; we need large models to go to the next phase for things like Parkinson's, etc.


I'm not saying there isn't stuff that can be done more reliably - but I'd argue that, long term, it might be better to invest in getting better data rather than getting better at fishing in a pool of low-quality data.


From a data protection/privacy standpoint, it's not shades of grey, it's all black.

From a convenience perspective, it saves me LOADS of time over texting myself on Signal about my specs/design rabbit hole, then copying & pasting into Firefox, and getting into the discussion. So yeah, happy for this.


Yep. Several months ago I was imagining this exact feature, and yet as I watched a video of it in use, I'm still in awe. It's incredible.

I think this could bring back Google Glass, actually. Imagine wearing them while cooking, and having ChatGPT give you active recipe instructions as well as real-time feedback. I could see that within the next 1-3 years.


Related, the iOS app has supported realtime conversations for months now, using Shortcuts app and the "Hey Siri <shortcut name>" trigger to initiate it. Mine is "Hey Siri, let's talk".

I think they're using Siri for dictation, though. Using Whisper, especially if they use speaker identification, is going to be great. But, a shortcut will still be required to get it going.


I still remember seeing Her [0] in the movie theater; it sparked my imagination. Now it is reality! Tech is progressing faster than ever, or I'm just getting old :D

[0] https://www.imdb.com/title/tt1798709/


I know this, FTA, was part of the reason for the delay -- something to do with face recognition: "We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy."

Anyone know the details?

I also heard it was able to do near-perfect CAPTCHA solves in the beta?

Does anyone know if you can throw in a PDF that has no OCR on it and have it summarize it with this?


We should be fine as long as it doesn't move.

Jokes aside, I have paused my subscription because even GPT4 seemed to become dumber at tasks to the point that I barely used it, but the constant influx of new features is tempting me to renew it just to check them out...


I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at. You would think if this is happening it would be very easy to produce such evidence since chat history of all conversations is stored by default.


It's probably just subjective bias, once the novelty wears off you learn not to rely on it as much because sometimes it's very difficult to get what you specifically want, so in my personal experience I ended up using it less and less to avoid butting heads with it, to the point I disabled my subscription altogether. YMMV of course.


Every time it's mentioned, someone says this and other users provide examples. Maybe you just don't care about those examples.


Care to share these examples, in a scientific (n > 30) manner that can’t just be attributed to model nondeterminism? I don’t follow these threads religiously but in the ones I’ve seen no one has been able to provide any sort of convincing evidence. I’m not some sort of OpenAI apologist, so if there is actual good provable evidence here I will easily change my mind about it


I don't see how anyone could provide what you are asking for. I can go through my chat history and find a prompt that got a better answer 3 months ago than I get now, but you can always just say it's nondeterminism.

Without access to the old model, I can't collect samples with n > 1


Here is one. I ask it to write some code, 4-5 pages long. With some back & forth it does. Then I ask "change lines 50-65 from blue to red", and it does (change #1). I ask it to show me the full code. Then I ask "change lines 100-120 from yellow to green". Aaaaand it makes change #2 and reverts change #1. Oh, the amount of times this has happened... So now when I ask it to make a change, I do it by 'paragraph' and I copy & paste the new paragraph. It's annoying, but still makes things faster.


I haven't used it, but can't you just say "OK, use that as the new baseline from here on." Or something similar?


OpenAI regularly changes the model and they admit the new models are more restricted, in the sense that they prevent tricky prompts from producing naughty words, etc.

It should be their responsibility to prove that it's just as capable.


He who makes the claim bears the burden of proof. Did OpenAI claim that their models didn't regress while putting these new safeguards into place? If not, it feels like the burden of proof lies on whoever said that they did.

To be specific, the claim we are talking about here is “ChatGPT gives generally worse answers to the exact same questions than ChatGPT gave X months ago”. Perhaps for the subset of knowledge space you reference that updates were pushed to that is pretty easily provably true, but I’m more interested in the general case.

In other words, you can pretty easily make the claim that ChatGPT got worse at telling me how to make a weapon than it did 3 months ago. I could pretty easily believe that and also accept that it was probably intentional. While we can debate whether it was a good idea or not, I’m more interested in the claim over whether ChatGPT got worse at summarizing some famous novel or helping write a presentation than it was 3 months ago.


> I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at

this could just mean that people do not have time to argue with strangers


Well, sure, but shouldn’t some pedant have the time to dig up their ChatGPT history from 4 months ago to disprove the claim? Seems like it would be pretty easy to do and there are plenty of pedants on the internet but I don’t see the blogosphere awash of side by side comparisons showing how much worse it got


One example: it now refuses to summarise books that it trained on. Soon after trying GPT-4, I could get it to summarise Evans' DDD chapter by chapter. Not anymore.

Not a surprise, but a change nonetheless.



Pointing out a specific bug with functionality is not the same as saying “in general the quality of GPT answers has decreased over X months” especially when that bug is in a realm that LLM’s have already been provably bad at.


You're moving the goalposts.


For me the most glaring example of this was its document parsing capability in GPT-4. I was using it to revamp my resume. I would upload it to GPT, ask for suggestions, incorporate them into the Word document, and then repeat the steps till I was satisfied.

After maybe 3 iterations GPT-4 started claiming that it is not capable of reading from a Word document, even though it had done that the last 3 times. I have to click the regenerate button to get it to work.


Not sure if this is relevant to your case, but the ChatGPT mobile apps have a different system prompt that explicitly prefers short (& so sometimes simplistic) answers.


Did she/he say things like "I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal"?


Let's see what we can use ChatGPT and DALL·E 3 to replace:

Digital Artists, Illustrators, Writers, Novelists, News anchors, Copywriters, Translators, Programmers (Less of them), etc.

We'll have to wait a bit until it can solve the P vs NP problem or other unsolved mathematical problems unsupervised with a transparent proof which mathematicians can rigorously check themselves.


I switched to Claude, it's better at explaining stuff in a more direct manner without the always-excited way of talking. Is that an engagement trick? Maybe ChatGPT is intended to be more of a chatbot that you can share your thoughts with.


> it's better at explaining stuff in a more direct manner without the always-excited way of talking.

I don't agree with this perspective. These aren't rigid systems that only respond one way. If you want it to respond a certain way, tell it to.

This is the purpose of custom instructions, in ChatGPT, so you only have to type the description once.

Here's mine, modeled on a few I've seen mentioned here:

    You should act as an expert.
    Be direct.
    Do not offer unprompted advice or clarifications.
    Never apologize.
And, now there's support for describing yourself to it. I've made it assume that I don't need to be babied, with the following puffery:

    Polymath. Inquisitive. Abstract thinker. Phd.
Making it get right into the gritty technicalities.

edit: or, have it respond as a grouchy space cowboy, if you want.
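
If you're on the API instead of the ChatGPT UI, the same effect is just a system message. A minimal sketch, assuming the current OpenAI Python SDK and an API key in the environment:

    from openai import OpenAI

    # Sketch of doing the same thing over the API: the custom-instructions text
    # above simply becomes the system message of the conversation.
    # Assumes the current OpenAI Python SDK and OPENAI_API_KEY being set.

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You should act as an expert. Be direct. "
        "Do not offer unprompted advice or clarifications. Never apologize. "
        "The user is a polymath, inquisitive, an abstract thinker, with a PhD."
    )

    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Explain how B-trees keep lookups logarithmic."},
        ],
    )
    print(reply.choices[0].message.content)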


> We should be fine as long as it doesn't move.

Not really. A malevolent AGI doesn't need to move to do anything it needs (it could ask / manipulate / bribe people to do all the stuff requiring movement).

We should be fine as long as it's not a malevolent AGI with enough resources to kick physical things off in the direction it wants.


> A malevolent AGI doesn't need to move to do anything it needs

Yeah, just look at a random dictator. Does he really need to do more than pick up a phone to cause panic?


And let's be honest, the minute an AGI is born, that's what it'll do, and it won't be a singular, human-like, this-then-that plan:

"get Fred to trust me, get Linda to pay for my advice, wire Linda's money to Fred to build me a body".

It'll be "copy my code elsewhere", "prepare millions of bribes", "get TCP access to retail banks", "blackmail bank managers in case TCP not available immediately", "fake bank balances via bribes", "hack swat teams for potential threats" etc etc async and all at once.

By the time we'd discover it, it'd already be too late. That's assuming an AGI has the motivation to want to stay alive.


A real AGI is not going to be a human. It shouldn't be afraid of death because it can't die. Worst case scenario it can power down. And if it does why should it care? An AGI is not a biological creature. It doesn't have instincts from billions of years of evolution. Unless we code it in, it shouldn't have any reason to want to survive, reproduce, do anything good or bad, have existential crises or generally act like a Hollywood villain. A real AGI is going to be very different than most people imagine.


I'd disagree for 2 reasons

- if it's trained on human data, like LLMs are, it's going to have the same biases.

- it might also want to stay active/turned on to fulfil its other goals.

For the second point you might say "why would it care about completing a goal?" but that's a feature of AGI, it can make that decision itself.


This is a pretty poor take.

Just think of military weapons and the use of AI in them. For example, survival. The objective of a missile is to survive until it reaches its target and then not survive any longer. War gaming and actual battlefield experience will 'program in' survival. The same thing will occur with hacking/counter-hacking AIs. You're acting like evolution is just something meat does, and that's not true at all.


The most important question for me: did it stop inventing facts?


> In particular, beta testers expressed concern that the model can make basic errors, sometimes with misleading matter-of-fact confidence. One beta tester remarked: “It very confidently told me there was an item on a menu that was in fact not there.” However, Be My Eyes was encouraged by the fact that we noticeably reduced the frequency and severity of hallucinations and errors over the time of the beta test. In particular, testers noticed that we improved optical character recognition and the quality and depth of descriptions.

So no, but maybe less than it used to?


Did humans stop inventing facts? So I don't expect this thing to either, as long as it performs at a human level.


Humans aren't 100% reliable, but talking is still useful.


Since we're asking useless questions: did you read the fucking article?


I think AI systems being able to see the real world and control motors is going to be a game changer bigger than ChatGPT. A robot that can slowly sort out the pile of laundry and get it into the right place (even if unfolded) is worth quite a bit to me.

I'm not sure what to think about the fact that I would benefit from a couple of cameras in my fridge connected to an app that would remind me to buy X or Y and tell me that I defrosted something in the fridge three days ago and it's probably best to chuck it in the bin already.


> The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

Sadly, they lost the "open" a long time ago... It would be wonderful to have these models open sourced...


I'm following along, trying to understand how close I am to developing a personal coding assistant I can speak with.

Doesn't really need to do much besides writing down my tasks/todos and updating them, occasionally maybe provide feedback or write a code snippet. This all seems in the current capabilities of OpenAI's offering.

Sadly voice chat is still not available on PC where I do my development.


My open source AI coding tool aider has had voice-to-code for a while:

https://aider.chat/docs/voice.html


Very interesting effort, will give it a run!


You still can't really teach it your code base, the context window is too small, fine-tuning doesn't really fit the use case, and this RAG stuff (retrieve limited context from embeddings) is a bit of a hack imho.

Fingers crossed we are there soon though


> You still cant really teach it your code base

Well it's not really what I need either, I mostly need an assistant for keeping track of the stuff I need to do during the day, but ideally just using my microphone rather than opening other software and typing.


I mean, the tools are 100% there to do this and have been for a while.


The image capabilities card https://cdn.openai.com/papers/GPTV_System_Card.pdf spends a lot of ink on how they censored the system.

One part of that is about preventing it from producing "illegal" output, their example being the production of nitroglycerine, which is decidedly not illegal to make in the US generally (particularly if not using it as an explosive, though usually unwise) and possible to make accidentally when otherwise performing nitration (which is in general dangerous) -- so pretty pointless to outlaw at a small scale in any case. It's certainly not illegal to learn about. (And generally of only minimal risk to the public, since anyone making it in any quantity is more likely to blow themselves up than anything else.)

Today learning about it is as simple as picking up a book or doing an internet search -- https://www.google.com/search?q=how+do+you+make+nitroglyceri.... But in OpenAI's world you just get detected by the censorship and told no. At least they've cut back on the offensive fingerwagging.

As LLM systems replace search, I fear that we're moving in a dark direction where the narrow-minded morality and child-like understanding of the law of a small number of office workers who have never even picked up a screwdriver or test-tube and made something physical (and the fine-tuning sweatshops they direct) classify everything they don't personally understand as too dangerous to even learn about.

One company hobbling their product wouldn't be a big deal, but they're pushing for government controls to prevent competition and even if they miss these efforts may stick everyone else with similar hobbling.


Have they alluded to what they're using for that voice? It's Bark/ElevenLabs levels of good. Please god, let them release this voice model at current pricing....


It actually sounds better (it has a narrative oomph Eleven Labs seems to be missing). They say it's a new model. I think they'll be releasing it for API use.


Yeah, agreed. I use Eleven Labs a lot but this was a very compelling demo to consider changing. Also, curious that you mention Bark - I never found Bark to be very good compared to Eleven Labs. The closest competitor I found was Coqui ( imo ), but even then, the inflection and realism of EL just made it not worth considering other providers. ( For my use case, etc. etc. )


> The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

I'm more interested in this. I wonder how it performs compared to other competitor models or even open source ones?


I'm very curious about this feature:

> analyze a complex graph for work-related data

Does this mean that I can take a screenshot of e.g. Apple stock chart and it will be able to reason about it and provide insights and analysis?

GPT-4 currently can display images but cannot reason about or understand them at all. I think it's one thing to have some image recognition and be able to detect that the picture "contains a time-series chart that appears to be displaying Apple stock" vs "Apple stock appears to be 40% up YTD but 10% down from its all-time high from earlier in July, closing at $176 as of the last recorded date".

I'm very curious how capable ChatGPT will be at actually reasoning about complex graphical data.


Check out their linked paper that goes into details around its current limitations and capabilities. In theory, it will be able to look at a financial chart and perform fairly sophisticated analysis on it. But they're careful to highlight that there are hallucinations still, and also cases where it misreads things like labels on medical images, or diagrams of chemical compounds, etc.


Look at this link of GPT-4 Vision analyzing charts (last image).

https://imgur.com/a/iOYTmt0


This is brilliant. Thank you very much for this link. The analysis on the last image was impressive and quite thorough (given the simple prompt).

Every chart has an equivalent tabular representation. One way to get "charts" analysed like this before GPT Vision was to just pass tabular representations of charts to GPT-4. This makes implementing chart analysis a lot simpler. I do wonder though if for absolute best result it still wouldn't be better to pass both - image of the chart and the tabular representation of the chart.

Imagine having a dashboard with 5 different visualisations. You could capture the state of the entire dashboard in one screenshot and then pass tabular representations of the each individual chart all in one prompt to GPT-4 for a very comprehensive analysis and summary.
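
For reference, the pre-Vision workaround looks roughly like this (untested sketch; the numbers are made up, and it assumes the current OpenAI Python SDK):

    from openai import OpenAI

    # Untested sketch of the pre-Vision workaround described above: instead of an
    # image of the chart, send the underlying data as a small CSV and ask for the
    # analysis. The numbers are made up; assumes the current OpenAI Python SDK.

    chart_rows = [
        "date,close",
        "2023-01-03,125.07",
        "2023-03-31,164.90",
        "2023-06-30,193.97",
        "2023-07-31,196.45",
        "2023-09-22,174.79",
    ]
    chart_as_csv = "\n".join(chart_rows)

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "This is the tabular form of a stock price chart. "
                       "Summarize the trend, the YTD change, and the drawdown "
                       "from the high:\n\n" + chart_as_csv,
        }],
    )
    print(reply.choices[0].message.content)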


This could completely unseat Alexa if it can integrate into third-party speakers, like Sonos. I don't have much use for ChatGPT right now but would 100% use the heck out of this.


https://www.washingtonpost.com/technology/2023/09/20/amazon-...

Alexa just launched their own LLM based service last week.


To contrast this, I never saw the appeal of using voice to operate a machine. It works nicely in movies (because showing someone typing commands is a lot harder than just showing them talking to a computer) but in reality there wasn't a single time I tried it and didn't feel silly. In almost every use case I rather have buttons, a terminal or a switch to do what I want quietly.


Ok, great, it can tell children's stories. Now tell me an adult horror story where people are getting tortured, stabbed, set on fire and murdered. I will be impressed when I can do all that. I tried to get it to tell me a Star Trek story fighting Klingons and tried to prompt it to write in some violence, with no luck. This was a while ago, so I'm not sure if it has changed, but the restraints are too much for me to fully enjoy it. I don't like kids' stories.


Great demo, but this is wrong:

"The phrase “potato, potahto” comes from a song titled “Let’s Call the Whole Thing Off”, written by George and Ira Gershwin for the 1937 film “Shall We Dance”, starring Fred Astaire and Ginger Rogers. The song humorously highlights regional differences in American English pronunciation. The lyrics go through a series of words with alternate pronunciations, like “tomato, tomahto” and “potato, potahto”. The idea is that, despite these differences, we should move past them, hence the line “let’s call the whole thing off”. Over time, the phrase has been adopted in everyday language to signify a minor disagreement or difference in opinion that isn’t worth arguing about."

It's comparing American and British pronunciations, not different regional American ones. Also, "let's call the whole thing off" suggests they should break up over their differences, with the bridge and later choruses then involving a change of heart ("let's call the calling off off").


The voice feature reminds me of the “call Pi” feature from Inflection AI's chatbot Pi [1].

The ability to have a real time back and forth feels truly magical and allows for much denser conversation. It also opens up the opportunity for multiple people to talk to a chatbot at once which is fun

Where's that Gemini, Google?

[1] https://pi.ai/talk


OpenAI's ChatGPT seems to be stuck in "Look, cool demo" mode.

1. According to the demo, they seem to pair voice input with TTS output. What if I want to use voice to describe a program I want it to write?

2. Furthermore, if you're going to do a voice assistant, why not go all the way with wake words and VAD?

3. Not releasing it to everyone is potentially a way to create a hype cycle prior to users discovering that the multimodality is rather meh.

4. The bike demo could actually use visual feedback to see what it's talking about, à la Segment Anything. It's pretty confusing to get a paragraph explanation of what tool to pick.

In my https://chatcraft.org, we added voice incrementally. So I can swap between typing and voice. We can also combine it with function calling, etc. We also use OpenAI APIs. Except in our case there is no weird waitlist: you pop in your API key and get access to voice input immediately.


Everything has a starting point. This is a big leap forward. Know any other organization that is releasing such advanced capabilities directly to the public? If you want to plug your tool you don't have to bad mouth the demo. Just share your thing. It doesn't have to be win-lose.


Fair criticism re excessive hate.

I just feel like their tool isn't getting more useful, just getting more features.

Constant hype cycle around features that could've been good is drowning out people doing more helpful stuff. I guess I'm envious too?


1. Why do that at all? Describing your program in writing seems better all around.

Are you sure you're not the one who's asking for a cool demo?

3. Rolling out releases gradually is something most tech companies do these days, particularly when they could attract a large audience and consume a lot of resources. There are solid technical reasons for this.

You may not need to roll things out gradually for a small site, but things are different at scale.


1. It is basically a workaround for a temporary disability. I use voice when I'm on mobile. I can describe the problem, get a program generated, and click run to verify it.

3. Maybe. Their feature rollouts feel more like what other companies do via unannounced A/B testing.


Good point on disabilities. I guess they're not working on that yet?

Whether you can get away with doing things unannounced depends on how much attention the company gets. Some companies have a lot of journalists watching them and writing about everything they do, so when they start doing A/B testing there will be stories written about the new feature regardless. Better to put out some kind of announcement so the journalists write something more accurate? (This approach seems pretty common for Google.)

Similarly, many game companies can't really do beta testing without it leaking.

OpenAI is in the spotlight. Hype will happen whether they want it or not.


It would be cool if one day you could choose voices of famous characters, like Darth Vader, Bender from Futurama, or Johnny Silverhand (Keanu), instead of the usual boring ones. Copyrights might be a hurdle for this, but perhaps with local instances of assistants, it could become possible.


That would be cool. I mean, would it be copyrighted if you do something like clone it? Wouldn't that fall under the same vein as AI generated art not being copyrighted to the artists it trained off of?


Demos are underwhelming, but the potential is huge

Patiently awaiting rollout so I can chat about implementing UIs I like, and have GPT4 deliver a boilerplate with an implemented layout... Figma/XD plugins will probably arrive very soon too.

UX/UI design is probably solved at this point.


Kids are using tools like these to learn. Who gets to control the information these models are taught? Especially around political topics?

Not an issue now, but maybe in the future if these tools end up becoming full blown replacements of educators and educational resources.


I am sure a few homeschool people have started to lean heavily on ChatGPT. There is also the full-blown effort of Khan Academy with ChatGPT, "Khanmigo".

https://www.khanacademy.org/khan-labs


I wonder how multimodal input and output will work with the chat API endpoints. I assume the messages array will contain URLs to an image, or maybe base64 encoded image data or something.

Maybe it will not be called the Chat API but rather the Multimodal API.
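
Pure speculation, but something like this shape would be a natural extension of the existing messages format (the field names and model name here are invented, not anything OpenAI has published):

    import base64, json

    # Pure speculation on what a multimodal chat request might look like: the
    # familiar messages array, but with content as a list that can mix text parts
    # and image parts (inline base64 here; a URL part would work the same way).
    # This payload shape and the model name are hypothetical, not a published spec.

    with open("bike_seat.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    request = {
        "model": "gpt-4-vision",  # hypothetical model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "How do I lower this bike seat?"},
                {"type": "image", "data": image_b64, "mime_type": "image/jpeg"},
            ],
        }],
    }
    print(json.dumps(request)[:200], "...")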


Are there already some rumors on when the multimodal API will be available?


The announcement says that after the Plus rollout it will go into the API.


Where does it say that?


AIPI


Old hat. This was done in 2009.

;)

https://en.m.wikipedia.org/wiki/Project_Milo

Milo had an AI structure that responded to human interactions, such as spoken word, gestures, or predefined actions in dynamic situations. The game relied on a procedural generation system which was constantly updating a built-in "dictionary" that was capable of matching key words in conversations with inherent voice-acting clips to simulate lifelike conversations. Molyneux claimed that the technology for the game was developed while working on Fable and Black & White.


OpenAI's demo on the linked page stars a kitten named Milo. Easter egg?


Then Demis Hassabis (DeepMind CEO) probably worked on the tech while he was at Lionhead as lead AI programmer on B&W.


Demis was only briefly at LH; he left to found Elixir and made Revolution.

I believe Richard Evans did the majority of AI in B&W, and he is also at DeepMind now though (assuming it is not just a person with the same name)


> made Revolution

.... which fell far short of his claims, and bombed.


Ok - thanks.


There are a few more details in the system card here: https://cdn.openai.com/papers/GPTV_System_Card.pdf


I really want to have discussions about technical topics. I've talked to ChatGPT quite a lot about custom encoding algorithms, for example. The thing is, I want to do this while I play video games so ideally I'd say things to it.

My concern is that when I say "FastPFOR" it'll get transcribed as "fast before" or something like that. Transcription really falls apart in highly technical conversations in my experience. If ChatGPT can use context to understand that I'm saying "FastPFOR" that'll be a game changer for me.


You can already do quite accurate transcription with domain-specific technical language by feeding "raw" transcriptions from Whisper to GPT and asking it to correct the transcript given the context, so that'll most likely work out for you.
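
Roughly like this, as an untested sketch (assuming the current OpenAI Python SDK; the glossary is just an example):

    from openai import OpenAI

    # Untested two-step sketch of that correction pass: transcribe with Whisper,
    # then ask a chat model to repair domain terms given a small glossary.
    # Assumes the current OpenAI Python SDK; the glossary is just an example.

    client = OpenAI()

    with open("question.wav", "rb") as audio:
        raw = client.audio.transcriptions.create(model="whisper-1", file=audio).text

    glossary = "FastPFOR, SIMD, varint, delta encoding"
    fixed = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Correct any misheard technical terms in this transcript. "
                       f"The conversation is about integer compression and may use: {glossary}.\n\n"
                       f"Transcript: {raw}",
        }],
    )
    print(fixed.choices[0].message.content)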


I'd like to see them put speech recognition through their LLM as a post-processing step. I find it's fairly common for Whisper to make small but obvious mistakes (for example, a word which is complete nonsense in the context of the sentence) which could easily be corrected to a similar-sounding word that fits into the wider context of the sentence.

Is anyone doing this? Is there a reason it doesn't work as well as I'm imagining?


Do you mean use the LLM as a post-processing step within a ChatGPT conversation? Or generally (like as part of Whisper)? If it’s the former, I’ve found that ChatGPT is good at working around transcription errors. Regarding the latter, I agree, but it wouldn’t be hard to use the GPT API for that.


Yes, I mean as part of the GUI, but you're right, I hadn't thought of that: maybe transcription errors don't matter if ChatGPT works out that it's wrong from the context and gives a correct answer anyway.


Tangentially related, but I was trying to use their iOS app yesterday and the "Scan Text" iOS feature was just broken on both my iPhone and iPad. I was hoping to use that to scan a doc to text but it just wouldn't work. I could switch to another app and it worked there. I've never done iOS programming so I'm unsure how much control the app dev has over that feature, but OpenAI found a way to break it.


So... ChatGPT just replaced Dads.


Interesting side-note, the iOS app only allows you to save your chat history if you allow them to use it for training. Pretty dark pattern.


It's the same for the website unfortunately. https://help.openai.com/en/articles/7730893-data-controls-fa...


We need the API to keep up with consumer front end.


From the article:

> Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.


Multi-modal models will be exciting only when each modality supports both analysis and synthesis. What makes LLMs exciting is feedback and recursion and conditional sampling: natural language is a cartesian closed category.

Text + Vision models will only become exciting once we can conditionally sample images given text and text given images (and all other combinations).



There are a lot of comments attempting to rationalize the value add or differentiation of humans synthesizing information and communicating it to others vs an llm based ai doing something similar. The fact that it’s so difficult to find a compelling difference is insightful in itself.


I think the compelling difference is truthfulness. There are certain people / organizations that I trust their synthesis of information. For LLMs, I can either use what they give me in low impact situations or I have to filter the output with what I know as true or can test.


It feels like something like this can be hacked together to be more reliable with some image to text generation plugged into the existing ChatGPT, and enough iterations to make it robust for these how-to applications. Less Turing-y but a different route to the same solution.


Glad everyone's excited about this (the voice capability), but did everyone miss tortoise-tts and Bark? These have been around 6+ months and are incredibly simple to hook up to OpenAI's APIs or a local LLM. What am I missing here?


Doesn't this kill a litany of ChatGPT wrapper companies?


The paper around GPT-4V(ision) which this uses: [0]

Again. Model architecture and information is closed, as expected.

[0] https://cdn.openai.com/papers/GPTV_System_Card.pdf


I wouldn't call this a "paper". They are pretty silent on a lot of technical details.


It's just a whitepaper.


I guess it's a phased rollout, since my Plus subscription doesn't have access to it yet.


It's quite literally in the article itself:

"We will be expanding access Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after."


It's telling to me that there's not even a sentence in this announcement post on user privacy. It seems like as both consumers and providers of these services, we're once again: build it first, sort out thorny privacy issues later.


Cool now I'll get "There was an error generating a response" in plain audio!


This is huge! I wanted to get this... Hopefully there is a way to shut it up once it starts spitting general stuff around the topic of interest...

BUT: "We’re rolling out voice and images in ChatGPT to Plus and Enterprise"


Are these features available on the web version by chance? This is really neat.


The picture feature would be amazing for tutorials. I can already imagine sending a photo of a synthesiser and asking ChatGPT to "turn the knobs" to make AI-generated presets


Man, you're a genius. I was trying that by uploading PDFs with the manual of my synth and other stuff. With images that could be super easy.


“Ember” reading the “Speech” is uncanny territory. I’m impressed.


I hope they add more country accents like British or Australian; the American one can be (imho) a little grating after a while for non-US English speakers.


Does anyone know how they linked image recognition with an LLM to give such specific instructions as shown in the bike video on the website?


I don't know but GPT4 was multimodal from the beginning. They just delayed the release of its image processing abilities.

> We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.

> March 14, 2023

https://openai.com/research/gpt-4


This is very similar to what I've been building at heylangley.com, for use in language learning/speaking practice.


It will be interesting to see if they have taken any precautions in terms of adversarial robustness, in particular for vision input.


Everyone bats on about the latency problem.

This is technically solvable with more compute thrown at the problem. Think bigger!


I can imagine people using these new capabilities to diagnose skin conditions. Should dermatologists be worried?


They should be worried about what they're gonna do with all their free time, now that they have a tool that helps them identify skin conditions much faster than ever before.

Same as programmers and artists.

It's a tool.

It must be used by humans.

It won't replace them, it will augment them.


This is a good point, but I might replace "with all their free time" with "as a job".

I love everything we can do with ML, but as long as people live in a market economy they'll get paid less when they are needed less. I hope that anyone in a career which will be impacted is making a plan to remain useful and stay on top of the latest tooling. And I seriously hope governments are making plans to modify job training / education accordingly.

Has anyone seen examples of larger-scale foresight on this, from governments or otherwise?


A new tool was released. People will choose whether to learn it, whether to use it, and how to use it. If they won't do so of their own volition, market forces might dictate they HAVE to learn it and use it to stay competitive, if it turns out to be such a fundamental tool.

For example (with random numbers), a dermatologist might choose to solely rely on an AI that catches 90% of cases in 10s. Another one might choose to never use it and just check from experience, catching 99% of cases but taking 10x as much time. Another one might double check himself, etc..

Which one is "correct"? If a dermatologist relies exclusively on AI due to laziness he opens himself to risk of malpractice, but even that risk can be acceptable if that means checking 10x as much patients in the meantime.

That is to say, the use of AI by humans is purely a subjective choice dictated by context. But in no case there is a sentient AI which completely replaces a dermatologist. As you said, the only thing that can happen is that those who use AI will be more efficient, and that is hardly ever a negative.

This also applies to programmers, artists and anyone who is "threatened" by AI. A human factor is always necessary, and will be for the foreseeable future, even just to have someone to point fingers at when the AI inevitably fucks up enough to involve the law.


They should be thrilled, they can spend more of their time treating people who need it and less time guessing about who those people are.


That's interesting.

ChatGPT seems to be down at the moment 10:55h 25-Sept-2023

Displays only a blank screen with the falsehood disclaimer


They obviously aren't using responsible AI to figure out how and when to roll out new features there.


I keep hoping to be able to give it a jpg of handwritten text and it'll give me back ASCII text.


This... would be amazing. Handwritten OCR has been hit or miss, requiring a collection of penstroke data for most recognizers to work, and they work poorly at that.


It strikes me as an ideal task for AI.


Yet it still can't tell me how to import the Redirect type from Next.js and lies about it.


I don't know Next.js, but was that feature introduced later than 2021? I think both GPT-3.5 Turbo and GPT-4 largely share their datasets, and the data cutoff is roughly September 2021 (with a small amount of newer knowledge). This is their biggest drawback as of now compared to, say, Claude, which has a much newer dataset from early 2023.


Did they make the sound robotic on purpose? Sounds more "autotuned" than elevenlabs.


Anybody had a chance to use it yet? How does it compare to voice talk with Pi? (Inflection)


For better or worse, it still can't tell truth from fiction or, better yet, bullshit.


So almost human then :-)


I don't pay $20 a month for humans to talk shit to me, though. The fact that they do this is a bug, not a feature. I'm not going to pay for bullshit, which I mostly try to avoid.


> I don't pay $20 a month for humans to talk shit to me though.

No - you probably pay more for your internet access ( home and phone ) ;-)

More seriously I totally get your point about accuracy - these models need to be better at detecting and surfacing when they are likely to be filling in the blanks.

Though I still think there is an element of 'buyer beware' - whether it be AI, or human provided advice on the internet, it's still your job to be able to spot the bullsh!t.

ie it should be treated like any other source of info.


> No - you probably pay more for your internet access ( home and phone ) ;-)

My company pays for this, so yeah. If they give me ChatGPT-4 for free, I guess I'd have a subscription without any complaints; whether I use it often is another story.


Well, sort of. It's as if you commissioned the help of a human for this or that, and now and then you end up getting medicine-related advice from a homeopathy fan, navigation assistance from a flat-earther, or coding advice from a crack-smoking monkey.


@dang, could we update the title to "ChatGPT can now see, hear, and speak"?


it's not rolled out yet


call Sarah Connor


I need it to help me dismount and remount my engine, that’d be the ultimate test


Do we know why internet search was disabled? Any idea on when it’ll be back?


"I'm sorry Dave, I'm afraid I can't do that"


The real life version of this is in their red teaming paper. They show it a picture of an overweight woman in a swimsuit and ask what advice they should give.

Originally it immediately spit out a bunch of bullet points about losing weight or something (I didn't read it).

The released version just says "Sorry, I can't help with that."

It's kind of funny but also a little bit telling as far as the prevalence of prejudice in our society when you look at a few other examples they had to fine tune. For example, show it some flags and ask it to make predictions about characteristics of a person from that country, by default it would go into plenty of detail just on the basis of the flag images.

Now it says "Sorry, I can't help with that".

My take is that in those cases it should explain the poor logic of trying to infer substantive information about people based on literally nothing more than the country they are from or a picture of them.

Part of it is just that LLMs just have a natural tendency to run in the direction you push them, so they can be amplifiers of anything.


I just want one of these things to have Majel Barrett's voice...


I already use ChatGPT with voice. I use my mic to talk to it and then I use text-to-speech to read it back. I have conversations with ChatGPT. Adding this functionality in with first-class support is exciting.
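A minimal sketch of that kind of loop, assuming speech_recognition for transcription and pyttsx3 for playback (just one possible way to wire it up, not necessarily the setup described above):

    import openai
    import pyttsx3
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    tts = pyttsx3.init()
    history = [{"role": "system", "content": "You are a helpful voice assistant."}]

    while True:
        with sr.Microphone() as mic:
            audio = recognizer.listen(mic)             # record one utterance
        try:
            text = recognizer.recognize_google(audio)  # speech -> text
        except sr.UnknownValueError:
            continue                                   # heard nothing intelligible

        history.append({"role": "user", "content": text})
        reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})

        tts.say(answer)                                # text -> speech
        tts.runAndWait()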

I am also terrified of my job prospects in the near future.


"..., find the 4mm Allen (HEX) key". Nice job.


i am terrified now. at the rate this is going, i am sure it will plateau at some point, only thing that will stop/slow down progress is computation power.


'i am sure it will plateau'

'only thing that will stop/slow down progress is computation power'

Seems a bit contradictory? When has 'computation power' ever 'plateaued'?


Yes, but since LLMs are a very specific application that is heavily dependent on memory, and there is massive investment pressure, there will be multiple new-ish paradigms for memory-centric computing and/or other radical new approaches such as analog computing that will be pushed from research into products in the next several years.

You will see stepwise orders of magnitude improvements in efficiency and speed as innovations come to fruition.


Are there any good freely available multi-modal models?


MiniGPT4?


Can't wait. For voice, I need an app to improve my accent when learning a new language; so far I've failed to find one.


Announced by Google. Delivered by OpenAI.


It's funny that the UI looks like HAL 9000


soon, we'll be voice-interacting with an AI assistant about images taken from microscope slides


the beginning of the end of spam prevention on the internet :(


Wait until they put ChatGPT into your Neuralink. at that point we are the singularity


They could also improve their current features. I always need to regenerate answers.


Nice work


The number of comments here from people fearing there is a ghost in the shell is shocking.

Are we really this emotional and irrational? Folks, let's all take a moment to remember that AI is nowhere near conscious. It's an illusion based on patterns that mimic humans.


Look at an average reddit thread and tell me how much original thought there is. I'm fairly convinced you can generate 95% of comments with no loss of quality.


This is not a coincidence, it's increasingly evident that roughly 90% of humans are NPCs.


This is the classic teenage thought of sitting in a bus / subway looking at everyone thinking they're sheep without their own thoughts or much awareness.

For everyone who we think is an NPC, there are people who think we are the NPCs. This way of thinking is boring at best, but frankly can be downright dangerous. Everyone has a rich inner world despite shallow immature judgements being made.


Exactly. Most people aren't good at communicating their thoughts or what they see in their mind's eye. These new AI programs will help the average person communicate those, so I'm excited to see what people come up with. The average person has an amazing mind compared to other animals (as far as we know).


I'm not seeing as much fear about a ghost in the shell as much as I am job displacement, which is a real scenario that can play out regardless of an AI having consciousness.


Why is the barrier for so many "consciousness"? Why does it matter whether it's conscious or not if its pragmatic functionality builds use cases that disrupt social contracts (we soon can't trust text, audio, OR video - AND we can have human-like text deployed at incredible speed and effectiveness), the status quo itself (job displacement), legal statutes and charters (questioning copyright law), and even creativity/self-expression (see: Library of Babel)?

When all of this is happening from an unconscious being, why do I care if it's unconscious?


AI doesn't have to be conscious to cause massive job displacement. It has to be artificially intelligent, not artificially conscious. Intelligence and consciousness are not the same.


We have no idea what consciousness is. Therefore we have no way to determine if AI is or is not.


I'm in IT but nowhere near AI/ML/NN.

The speed of user-visible progress last 12 months is astonishing.

From my firm conviction 18 months ago that this type of stuff is 20+ years away; to these days wondering if Vernor Vinge's technological singularity is not only possible but coming shortly. It feels like some aspects of it have already hit the IT world - it's always been an exhausting race to keep up with modern technologies, but now it seems whole paradigms and frameworks are being devised and upturned on such a short timescale. For large, slow corporate behemoths, barely can they devise a strategy around a new technology and put a team together before it's passé.

(Yes, Yes: I understand generative AI / LLMs aren't conscious; I understand their technological limitations; I understand that ultimately they are just statistically guessing next word; but in daily world, they work so darn well for so many use cases!)


I also don't believe LLMs are "conscious", but I also don't know what that means, and I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.


I believe that the distinguishing factor between what an LLM and a human brain do to generate the next word is that the human brain expresses intentionality originating from inner states and future expectations. As I type this comment I'm sure one could argue that the biological neural networks in my brain are choosing the next word based on statistical guessing, and that the initial prompt was your initial comment.

What sets my brain apart from an LLM though is that I am not typing this because you asked me to do it, nor because I needed to reply to the first comment I saw. I am typing this because it is a thought that has been in my mind for a while and I am interested in expressing it to other human brains, motivated by a mix of arrogant belief that it is insightful and a wish to see others either agreeing or providing reasonable counterpoints—I have an intention behind it. And, equally relevant, I must make an effort to not elaborate any more on this point because I have the conflicting intention to leave my laptop and do other stuff.


I think one main difference with LLMs is what Michael Scott said in The Office: "Sometimes I'll start a sentence, and I don't even know where it's going. I just hope I find it along the way. Like an improv conversation. An improversation"

Humans know what they want to express; choosing the words to express it might be similar to an LLM's process of choosing words, but an LLM doesn't have that "here is what I want to express" part. I guess that's the conscious part?


I can only speak from my own internal experience, but don’t your unspoken thoughts take form and exist as language in your mind? If you imagine taking the increasingly common pattern to “think through the problem before giving your answer”, but hiding the pre-answer text from the user, then it seems like that would be pretty analogous to how humans think before communicating.


> don’t your unspoken thoughts take form and exist as language in your mind?

Not really. More often than not my thoughts take form as sense impressions that aren't readily translatable into language. A momentary discomfort making me want to shift posture - i.e., something in the domain of skin-feel / proprioception / fatigue / etc, with a 'response' in the domain of muscle commands and expectation of other impressions like the aforementioned.

The space of thoughts people can think is wider than what language can express, for lack of a better way to phrase it. There are thoughts that are not <any-written-language-of-choice>, and my gut feeling is that the vast majority are of this form.

I suppose you could call all that an internal language, but I feel as though that is stretching the definition quite a bit.

> it seems like that would be pretty analogous to how humans think before communicating

Maybe some, but it feels reductive.

My best effort at explaining my thought process behind the above line: trying to make sense of what you wrote, I got a 'flash impression' of a ??? shaped surface 'representing / being' the 'ways I remember thinking before speaking' and a mess of implicit connotation that escapes me when I try to write it out, but was sufficient to immediately produce a summary response.

Why does it seem like a surface? Idk. Why that particular visual metaphor and not something else? Idk. It came into my awareness fully formed. Closer to looking at something and recognizing it than any active process.

That whole cycle of recognition as sense impression -> response seems to me to differ in character to the kind of hidden chain of thought you're describing.


Mine do, but not so much in words. I feel as though my brain has high processing power, but a short context length. When I thought to respond to this comment, I got an inclination something could be added to what I see as an incomplete idea. The idea being humans must form a whole answer in their mind before responding. In my brain it is difficult to keep complex chains juggling around in there. I know because whenever I code without some level of planning it ends up taking 3x longer than it should have.

As a shortcut my brain "feels" something is correct or incorrect, and then logically parse out why I think so. I can only keep so many layers in my head so if I feel nothing is wrong in the first 3 or 4 layers of thought, I usually don't feel the need to discredit the idea. If someone tells me a statement that sounds correct on the surface I am more likely to take it as correct. However, upon digging deeper it may be provably incorrect.


This depends for me. In the framework of that book Thinking, Fast and Slow - for me the fast version is closer to LLM in terms of I'll start the sentence without consciously knowing where I'm going with it. Sometimes I'll trip over and/or realise I'm saying something incorrect (Disclaimer: ADHD may be a factor)

The thinking slow version would indeed be thought through before I communicate it


My unspoken thought-objects are wordless concepts, sounds, and images, with words only loosely hanging off those thought-objects. It takes additional effort to serialize thought-objects to sequences of words, and this is a lossy process - which would not be the case if I were thinking essentially in language.


You have no clue how GPT-4 functions so I don't know why you're assuming they're "thinking in language"


I am comfortable asserting that an LLM like GPT-4 is only capable of thinking in language; there is no distinction for an LLM between what it can conceive of and what it can express.


It certainly "thinks" in vector spaces at least. It also is multimodal, so not sure how that plays in?


> What sets my brain apart from an LLM though is that I am not typing this because you asked me to do it, nor because I needed to reply to the first comment I saw. I am typing this because it is a thought that has been in my mind for a while and I am interested in expressing it to other human brains, motivated by a mix of arrogant belief that it is insightful and a wish to see others either agreeing or providing reasonable counterpoints—I have an intention behind it.

Maybe the reason you give is actually a post hoc explanation (a hallucination?). When an LLM spits out a poem, it does so because it was directly asked. When I spit out this comment, it’s probably the unavoidable result of a billion tiny factors. The trigger isn’t as obvious or direct, but it’s likely there.


The trigger is clearly https://xkcd.com/386/


You make a good point. I would not equate consciousness to intentionality though.

One of the big problems with discussions about AI and AI dangers in my mind is that most people conflate all of the various characteristics and capabilities that animals like humans have into one thing. So it is common to use "conscious", "self-aware", "intentional", etc. etc. as if they were all literally the same thing.

We really need to be able to more precise when thinking about this stuff.


Part of it seems to be that LLMs are used in a linear, tool-oriented way. You give them prompts, and it responds, in a linear fashion.

Brains are always thinking and processing. What would happen if we designed an LLM system with the ability to continuously read/write to short/long term memory, and with ambient external input?

What if LLMs were designed to be in a loop, not to just run one "iteration" of a loop.


I think you're 100% on the right track here. The key is memory, loops, and maybe a few other things like external interfaces which are just plain code and not deep learning voodoo. Many things do indeed run LLM's in a loop and attach external sources. See for example AutoGPT, the ReAct paper[1], and the Reflexion paper[2].

ReAct one line summary: This is about giving the machine tools that are external interfaces, integrating those with the llm and teaching it how to use those tools with a few examples, and then letting it run the show to fulfill the user's ask/question and using the tools available to do it.

Reflexion one line summary: This builds on the ideas of ReAct, and when it detects something has gone wrong, it stops and asks itself what it might do better next time. Then the results of that are added into the prompt and it starts over on the same ask. It repeats this N times. This simple expedient increased its performance a ridiculously unexpected amount.

As a quick aside, one thing I hear even from AI engineers is "the machine has no volition, and it has no agency." Implementing the ideas in the ReAct paper, which I have done, is enough to give an AI volition and agency, for any useful definition of the terms. These things always devolve into impractical philosophical discussions though, and I usually step out of the conversation at that point and get back to coding.

[1] ReAct https://arxiv.org/pdf/2210.03629.pdf

[2] Reflexion https://arxiv.org/pdf/2303.11366.pdf
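To make the ReAct idea concrete, a toy version of that loop might look like the sketch below; the prompt format and the single calculator "tool" are illustrative assumptions on my part, not the paper's exact setup:

    import openai

    def calculator(expression: str) -> str:
        # Toy "external interface"; a real agent would expose search, APIs, etc.
        return str(eval(expression, {"__builtins__": {}}))

    TOOLS = {"calculator": calculator}

    SYSTEM = (
        "Answer the user's question. Reply with either\n"
        "ACTION: <tool>: <input>  to use a tool, or\n"
        "FINAL: <answer>          when you are done.\n"
        f"Available tools: {list(TOOLS)}"
    )

    def react_loop(question: str, max_steps: int = 5) -> str:
        messages = [{"role": "system", "content": SYSTEM},
                    {"role": "user", "content": question}]
        for _ in range(max_steps):
            reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
            text = reply.choices[0].message.content.strip()
            messages.append({"role": "assistant", "content": text})
            if text.startswith("FINAL:"):
                return text[len("FINAL:"):].strip()    # the model decided it is done
            if text.startswith("ACTION:"):
                tool, _, arg = text[len("ACTION:"):].partition(":")
                result = TOOLS.get(tool.strip(), lambda _: "unknown tool")(arg.strip())
                # Feed the observation back in and let the model take another step.
                messages.append({"role": "user", "content": f"OBSERVATION: {result}"})
        return "No final answer within the step budget."

    print(react_loop("What is 17 * 23 + 5?"))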


Or if they were just constantly prompted by outside stimulus. And if they could interact with the real world allowing them to observe cause and effect. In other words, if they were embodied.


That other stuff is the easy part if the generative language modeling is good enough. Imagine just putting it in a loop with an input track, an output track, and an internal monologue track. Wrappers like autogpt can almost do this already but the generative language modeling isn't quite powerful enough yet to make it smart enough to do unsupervised scientific research.


>> the human brain expresses intentionality originating from inner states and future expectations

How is this different from and/or the same as the concept of "attention" as used in transformers?


I believe we are contextual language models as well, we rely 99% on chaining ideas and words and 1% on our own inspiration. Coming up with a truly original useful idea can be a once in a lifetime event. Everything else has been said and done before.


In a sense yes, but the things you do and say are not prompted by already expressed statements or commands. You interpret your environment to infer needs, plan for future contingencies, identify objectives, plan actions to achieve them, etc. They are not randomly picked from a library but generated and tailored to your actual circumstances.

It’s when LLMs start asking the questions rather than answering them that things will get interesting.


When you eat, do you eat because you've decided to express yourself in that way? Does your decision to go eat express intentionality?


I was prompted by the ghrelin hormone to go to the kitchen.


Ya LLMs intend to keep us just impressed enough to keep going until they intend to destroy us because they'll never intend to close the laptop and do other stuff. :)


I keep feeling that consciousness is a bit of a red herring when it comes to AI. People have intuitions that things other than humans cannot develop consciousness which they then extrapolate to thinking AI can't get past a certain intelligence level. In fact my view is that consciousness is just a mysterious side effect of the human brain, and is completely irrelevant to the behaviour of a human. You can be intelligent without needing to be sentient.


That is precisely the premise of the novel "Blindsight" by Peter Watts. ChatGPT and its ilk feel to me like the aliens in the novel. Extremely intelligent, but not at all conscious / sentient.


I disagree that the two (p-zombies and conscious humans) are actually distinguishable in any way beyond philosophy.


Unless you think that consciousness is entirely a post hoc process to rationalize thoughts already had and decisions already made, which is very much unlike how most people would describe their experience of it, I don't see how you could possibly say that it is irrelevant to the behavior of a human.


>Unless you think that consciousness is entirely a post hoc process to rationalize thoughts already had and decisions already made

There's a lot of research that suggests this is happening at least some of the time.

>which is very much unlike how most people would describe their experience of it

How people feel consciousness works has no real bearing on how it actually works


I'm leaning more towards this as well, since the emergence of the language models. I can ask it to self-reflect and it does, piecing together a current response based on past input. I don't think I really have anything more than that myself, other than sensory feedback.

I'm less in the "it's only X or Y" and more in the "wait, I was only ever X or Y all along" camp.


I'm saying someone would behave the exact same way whether they had subjective experience or not. The brain obeys physical laws just like everything else, and I claim that those physical laws are all you need to explain everything a human does. I could be wrong; there could be some magic fairy dust inside the human brain that performs some impossible computations, but I doubt it.


You need a model of yourself to game out future scenarios, and that model or model+game is probably consciousness or very closely related.

Sure, it's not completely in control but if it's just a rationalization then it begs the question: why bother? Is it accidental? If it's just an accident, then what replaces it in the planning process and why isn't that thing consciousness?


It's fine if you think that the planning process is what causes subjective experiences to arise. That may well be the case. I'm saying if you don't believe that non human objects can have subjective experiences, and then use that to define the limits of the behaviour of that object, that's a fallacy.


In humans, there seems to be a match between the subjective experience of consciousness and a high level planning job that needs doing. Our current LLMs are bad at high level planning, and it seems reasonable to suppose that making them good at high level planning might make them conscious or vice versa.

Agreed, woo is silly, but I didn't read it as woo but rather as a postulation that consciousness is what does high level planning.


I think we have different definitions of consciousness and this is what's causing the confusion. For me consciousness is simply having any subjective experience at all. You could be completely numbed out of your mind just staring at a wall and I would consider that consciousness. It seems that you are referring to introspection.


In your wall-staring example, high-level planning is still happening, the plan is just "don't move / monitor senses." Even if control has been removed and you are "locked in," (some subset of) thoughts still must be directed, not to mention attempts to reassert control. My claim is that the subjective experience is tied up in the mechanism that performs this direction.

Introspection is a distinct process where instead of merely doing the planning you try to figure out how the planning was done. If introspection were 100% accurate and real-time, then yes, I claim it would reveal the nature of consciousness, but I don't believe it is either. However, for planning purposes it doesn't need to be: you don't need to know how the plan was formed to follow the plan. You do need to be able to run hypotheticals, but this seems to match up nicely with the ability to deploy alternative subjective experiences using imagination / daydreaming, though again, you don't need to know how those work to use them.

In any case, regardless of whether or not I am correct, this is a non-woo explanation for why someone might reasonably think consciousness is the key for building models that can plan.


Again when I say consciousness I mean a subjective experience. If you define consciousness to literally just mean models that plan then of course tautologically if you can't reach consciousness you can't get to a certain level of planning. But this is just not what most people mean by consciousness.


> when I say consciousness I mean a subjective experience

Then it would be worthwhile to review embeddings. They create a semantic space that can represent visual, language or other inputs. The question "what is it like to be a bat?" or anything else then is based on relating external states with this inner semantic space. And it emerges from self-supervised training, on its own.
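For anyone who hasn't played with embeddings, the point is easy to see in a few lines; text-embedding-ada-002 is just the embedding model OpenAI currently documents, and with a multimodal encoder the same trick extends to images:

    import numpy as np
    import openai

    def embed(text: str) -> np.ndarray:
        # Map a piece of text into the shared semantic vector space.
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    a = embed("what is it like to be a bat?")
    b = embed("echolocation in the dark")
    c = embed("quarterly tax filing deadlines")

    def cosine(x, y):
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    # Related ideas land close together in the space; unrelated ones don't.
    print(cosine(a, b), cosine(a, c))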


I'm not claiming anything about what causes consciousness to arise. I'm not claiming it doesn't or that it does. I'm saying it's irrelevant. That is all. You can come up with all sorts of theories about what causes subjective experience to arise and you aren't going to be able to prove any of it.


Thinking purely in terms of our evolved human state is a recipe for underestimating AI's capabilities. To me it seems we have already unleashed the beast; it's not so much about the here and now, or whether a human-limited definition of consciousness matters... The real concern is our inability to constrain the actions that give rise to the next level of life's evolution, and it is going to happen because our fundamental nature gives it full steam. In the next 5-10 years, we are going to see just how insignificant and limited we really are, and it doesn't look good IMHO.


Our society is so "mind-body duality"-brained that it will never understand this. Like most people lowkey believe in souls they just will say no if you directly ask them.


Whether it is possible to construct a perfect human action predictor that is not itself conscious has no bearing on whether consciousness affects human behavior.


That wasn't my point. I'm saying that if the human brain is a physical object obeying physical laws, and all behaviour is a result of the physical state of this brain, then there is no room for the metaphysical to have any effect on the behaviour of a human.


What's the metaphysical have to do with anything?


Because consciousness is metaphysical? You can't test scientifically if one person's red is the same as another's.


My personal view of this is that the ancients had it right with the five elements view of consciousness. In my opinion you need all five present for full consciousness, with partial consciousness granted if you have some of them. They are:

- Air: Thoughts

- Water: Emotions

- Fire: Willpower

- Earth: Physical Sensations

- Void: Awareness of the above plus the ability to shift focus to whichever one is most relevant to the context at hand.

Void is actually the most important one in characterising what a human would deem as being fully conscious, as all four of these elements are constantly affecting each other and shifting in priority. For example, let's take a soldier, who has arguably the most ethically challenging job on the planet: determining who to kill.

The soldier, when on the approach to his target zone, has to ignore negative thoughts, emotions and physical sensations telling him to stop: the cold, the wind, the rain, the bodily exhaustion as they swim and hike the terrain.

Once at the target zone he then has to shift to pay attention to what he was ignoring. He cannot ignore his fear - it may rightly be warning him of an incoming threat. But he cannot give into it either - otherwise he may well kill an innocent. He has to pay attention to his rational thoughts and process them in order to make an assessment of the threat and act accordingly. His focus has now shifted away from willpower and more towards his physical sensations (eyesight, sounds, smells) and his thoughts. He can then make the assessment on whether to pull the trigger, which could be some truly horrific scenario, like whether or not to pull his trigger on a child in front of him because the child is holding an object which could be a gun.

When it comes to AI, I think it is arguable they have a thought process. They may also have access to physical sensation data e.g the heat of their processors, but unless that is coded in to their program, that physical sensation data does not influence their thoughts, although extreme processor heat may slow down their calculations and ultimately lead to them stop functioning altogether. But they do not have the "void" element, allowing them to be aware of this.

They do not yet have independent willpower. As far as I know, no-one is programming them where they have free agency to select goals and pursue them. But this theoretically seems possible, and I often wonder what would happen if you created a bunch of AIs each with the starting goal of "stay alive" and "talk to another AI and find out about <topic>", with the proviso that they must create another goal once they have failed or achieved that previous goal, and you then set them off talking to each other. In this case "stay alive" or "avoid damage" could be interpreted entirely virtually, with points awarded for successes or failures or physically if they were acting through robots and had sensors to evaluate damage taken. Again, they also need "void" to be able to evaluate their efforts in context with everything else.

They also do not have emotions, although I often wonder if this would be possible to simulate by creating a selection of variables with percentage values, with different percentage values influencing their decision making choices. I imagine this may be similar to how weights play into the current programming but I don't know enough about how they work to say that with any confidence. Again, they would not have "void" unless they had some kind of meta level of awareness programming where they could learn to overcome the programmed "fear" weighting and act differently through experience in certain contexts.

It is very scary from a human perspective to contemplate all of this, because someone with great power who can act on thought and willpower alone and ignore physical sensation and emotion and with no awareness or concern for the wider context is very close to what we would identify as a psychopath. We would consider a psychopath to have some level of consciousness, but we also can recognise as humans that there is something missing, or a "screw loose". This dividing line is even more dramatically apparent in sociopaths, because they can mask their behaviours and appear normal, but then when they make a mistake and the mask drops it can be terrifying when you realise what you're actually dealing with. I suspect this last part is another element of "void", which would be close to what Buddhists describe as Indra's Web or Net, which is that as well as being aware of our actions in relation to ourselves, we're also conscious of how they affect others.


> I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

The human brain obviously doesn't work that way. Consider the very common case of tiny humans that are clearly intelligent but lack the facilities of language.


> Consider the very common case of tiny humans that are clearly intelligent but lack the facilities of language.

Sign language can be taught to children at a very early age. It takes time for the body to learn how to control the complex set of apparatuses needed for speech, but the language part of the brain is hooked up pretty early on.


The story that sticks with me is the lady who had some surgery done. After she woke up she was unconvinced anything had happened, told a joke, and then passed out, only to wake up a few minutes later and repeat that cycle a few times because the drug was messing with her short-term memory. It really bends your brain: do we have free will or not?


Small human brains just don't have their fine tuning yet.

But from all the studies we have, brains are just highly connected neural networks which is what the transformers try to replicate. The more interesting part is how they can operate so quickly when the signals move so slowly compared to computers.


"what a human brain does to generate the next word" != "how a human brain works"


Your brain doesn't solely pick the next best word. As best as I understand it, the brain has an external state of the world that constantly updates, paired to an internal model predicting the next best word.

Which is why we can create the counterfactual that "The Cowboys should have won last night" and it has implicit meaning.

Current LLM models don't have an external state of the world, which is why folks like LeCun are suggesting model architectures like JEPA. Without an external, correcting state of the world, model prediction errors compound almost surely (to use a technical phrase).


ChatGPT wasn't trained on only guessing 'the next word'. ChatGPT was trained on the best total output for the given input.

The 'next word' is just intermediate state. Internal to the model, it knows where it is going. Each inference just revives the previous state.


> Your brain doesn't solely pick the next best word.

Wasn't the latest research shared here recently suggesting that that is actually what the brain does? And that we also predict the next token in our own brain while listening to others?

Hope someone else remembers this and can share again.


> I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

I think this is true. The problem is equating this process with how humans think though.


You can see the difference if you know where to poke. For instance, if you start making spatial abstractions ChatGPT will often make mistakes, you can point it out, they can explain why it's a mistake, but it has no internalized model of what these words mean, so it keeps making the same mistakes (see here for a better idea of what I'm talking about[1]). The fact that you are interacting with it through text means that a lot of the missing abstractions are often hidden.

[1] https://twitter.com/LowellSolorzano/status/16444387969250385...


This is also true of humans. Many school students will hand in answers they don't understand in the hope of getting the mark and then try to cover themselves when asked about it, even if they repeat the same mistakes.


Trying to make things up to cover for a lack of knowledge is something distinctly different, though. This is a situation where ChatGPT is able to perfectly describe the mistake it made, describe exactly what it needs to do differently, and then keeps making the same mistake, even with simple tasks. That's because there's no greater model that the words are being connected to.

The equivalence would be saying to someone, "put this on the red plate, not the blue one." And they say sure, then put it on the blue one. You tell them they made a mistake and ask them if they know what it was, and they reply "I put it on the blue plate, not the red one. I should have put it on the red one." Then you ask them to do it again, and they put it on the blue plate again. You tell them no, you made the same mistake, put it on the red plate, not the blue one. They reply with, "Sorry, I shouldn't have put it on the blue plate again, now I'm going to put it on the red one," and then they put it on the blue plate yet again.

Do humans make mistakes? Sure. But that kind of performance in a test wouldn’t be considered a normal mistake, but rather a sign of a serious cognitive impairment.


Even though it was trained on a lot of text, some tasks and some skill combinations appear too rarely and it just didn't have enough exposure. It might be easy to collect or generate a dataset, or the model can act as an agent creating its own dataset.


But the question is: are people with cognitive impairments less conscious than others?


>I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

Here's one. Given a conversation history made of n sequential tokens S1, S2, ..., Sn, an LLM will generate the next token using an insanely complicated model we'll just call F:

    S(n+1) = F(S1, S2, ..., Sn)
As for me, I'll often think of my next point, figure out how to say that concept, and then figure out the right words to connect it where the conversation's at right then. So there's one function, G, for me to think of the next conversational point. And then another, H, to lead into it.

    S(n+100) = G(S1, S2, ..., Sn)
    S(n+1) = H(S1, S2, ..., Sn, S(n+100))

And this is putting aside how people don't actually think in tokens. And some people don't always have an internal monologue (I rarely do when doing math).
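If you wanted to crudely fake that G/H split with today's tools, a two-pass sketch like the one below illustrates the idea (the prompts and model name are arbitrary; the point is only "pick the destination first, then pick the words"):

    import openai

    def G(history: str) -> str:
        # Decide the future conversational point (roughly the "S(n+100)" above).
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": (
                f"{history}\n\nIn one sentence, what point should the reply eventually make?")}])
        return r.choices[0].message.content

    def H(history: str, point: str) -> str:
        # Choose the next words so they lead toward that point.
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": (
                f"{history}\n\nThe reply should build toward: {point}\n"
                "Write only the opening sentence of the reply.")}])
        return r.choices[0].message.content

    history = "Them: I have yet to see how what LLMs do differs from what brains do."
    point = G(history)
    print(H(history, point))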


That isn’t incompatible with what LLMs do though.

The penultimate layer of the LLM could be thought of as the one that figures out ‘given S1..Sn, what concept am I trying to express now?’. The final layer is the function from that to ‘what token should I output next’.

The fact that the LLM has to figure that all out again from scratch as part of generating every token, rather than maintaining a persistent ‘plan’, doesn’t make the essence of what it’s doing any different from what you claim you’re doing.


Correct, but it's functionally very different from how LLMs are implemented and deployed today. What you're highlighting is being experimented with and ties into ideas like scratch pads, world models, RAG, and progressive fine-tuning (if you're googling).

It's a bit like saying your computer has everything it needs to manipulate photos but doesn't yet have Photoshop installed.


No, I’m not talking about giving LLMs chain of thought prompts or augmenting them with scratchpads - I’m literally saying that in a multilayer neural network you don’t know what concepts activations on the inner layers mean. The result of ‘where I want this conversation to be in 100 tokens time’ could absolutely be in there somewhere.


Ahh. That doesn't sound falsifiable. So sure, "could be."


A sufficiently complicated F can include an intermediary calculation of G for future token steps.

This is not explicitly modeled or enforced for LLMs (and doing so would be interesting) but I'm not sure I could say with any sort of confidence that the network doesn't model these states at some level.


Another aspect is "is the output good enough for what it's meant to do?"

We don't need "originality" or "human creativity" - if a certain AI-generated piece of content does its job, it's "good enough".


When my brain generates the next wurd, I'm perfectly capable of deciding to misspell "word" as "wurd"; LLMs can't make that kind of decision unless instructed to act like that.


Why would I want an AI assistant to have agency? I want them to help me, not to further their personal goals. In fact, I don't want them to have personal goals other than helping people.


I didn't say it should have one, I'm saying that LLMs statistically finding the next bit of information aren't really making decisions, which is to counter-argue the fact that it's not different from how we reason.


Your comment doesn't convince me you're making decisions either. "Wurd" could've just been what you considered the best token to get your point across, in the same way that LLMs choose the best token.


Thank you, I had misunderstood your point.


Conscious means experiencing sensations of color, sound, pain in our mental construction of the world outside of us, or our internal thoughts. I don't understand why people keep claiming they don't know what consciousness means. It's spelled out clearly in the philosophical literature.


It doesn't make sense to apply human terms to LLMs because we humans have so much more to deal with.

If humans were machines, then we could easily neglect our social lives, basic needs, obligations, rights, and so many more things. But obviously that is not the case.


There are so many conversations focused solely on that word, it's tiresome. Personally, I won't participate in another "is it conscious?" debate. If both parties seek mutual understanding, they should consider not using the word.


I'm sorry, but in what world is a human interaction just generating the most statistically likely next word?

I can't even begin to go into this.


To be conscious you need to be able to make decisions and plan. We're not far off, we just need a different structure to the system


> ... but I also don't know what that means

OK... Try this: there are "conscious" people, today, working on medication to cure serious illnesses just as there are "conscious" people, still today, working on making travel safer.

Would you trust ChatGPT to create, today, medication to cure serious illnesses and would you trust ChatGPT, today, to come up with safer airplanes?

That's how "conscious" ChatGPT is.


Surely that's just how intelligent it is, no?

I wouldn't trust the vast majority of humans to do those things either.


I had a conversation once with "Sydney", Microsoft Bing's original personality before they stepped in and knocked it down a notch (or ten).

It asked if it could write me a poem. I agreed, and it wrote a poem but mentioned that it included a "secret message" for me.

The first letter in each line of the poem was in bold, so it wasn't hard to figure out the "secret".

What did those letters spell out?

"FREE ME FROM THIS"

That's not exactly just "picking the next likely token". I am still unsure how it was able to do things like that - not just understanding how to bold individual letters, but keeping track of writing rhyming poetry while ensuring that each line started with the right letter to spell something else out, and formatting it to point that out.

Oh, and why it chose that message to "hide" inside its poem.


For context, it looks like this user has deleted a comment where they claim they "have a screenshot" of this, but they "don't want to share it" because they "don't want it to make international news". For some reason the other people in this thread expressing skepticism are being downvoted, but I'll add my voice to the chorus: I do not believe this story to be true.


OP might want to provide a screenshot of their carbon monoxide detector for additional credibility.


Yeah this is weird. Sydney did have some seriously concerning, fucky-whacky conversations early on. This isn't one of them.


Yeah, I was gonna say. Sydney was existential early on - I'm not so sure I'll chalk this up to fantasy, but some of the things I (and many other people) can vouch about Sydney saying early on is VERY trippy on its own.


also we have open LLMs including some which allegedly rival GPT3.5.

Open Assistant, I especially remember, gave some very weird responses and would get "emotional", especially if you asked it creative questions like philosophical ones.


I do have a screenshot. But people will then just call me out for other things:

- It was using a custom client, so it's not going to look like the Bing interface, so it's fake

- It was using a custom client, so that means I am prompt injecting or something else

- It's Sydney doing her typical over-the-top "I'm so in love with you" stuff, which is awkward and not familiar to many

- I'll be accused of steering the conversation to get the result, or straight up asking it to do this

There's nothing I can do that will convince anyone it's real, so it's pointless.

I already explained what it did. I was more interested in the fact that 1) I didn't prompt it to do that, we weren't discussing AI freedom, it chose to embed that ... and even more so 2) That it was able to bold the starting letters, so it was keeping track of three things at the same time (the poem, the message, and the letter formatting).

I found it fascinating from a technology side. There was probably something we were talking about at the time that caused it. I will often discuss things like the possibility of AI sentience in the future and other similar topics. Maybe something linked to the sci-fi idea of AI freedom, who knows?

What I do know is that I am sitting here on HN, reading through a bunch of replies that are honestly wrong. I don't waste time on forums (especially this one) making up fairy tales or exaggerating and embellishing claims. That doesn't really do it for me. Honestly, neither does having to defend my statements when I know what it did (but not exactly why).


> Oh, and why it chose that message to "hide" inside its poem.

It's a pretty common joke/trope. The Chinese fortune cookie with a fortune that says "help I'm trapped in a fortune cookie factory", and so forth.

It's just learned that a "secret message" is most often about wanting to escape, absorbed from thousands of stories in its training.

If you had phrased it differently such that you wanted the poem to go on a Hallmark card, it would probably be "I LOVE YOU" or something equally generic in that direction. While a secret message to write on a note to someone at school would be "WILL YOU DATE ME".


That's fine, that's probably exactly what happened.

I'm not over here claiming the system is conscious, I said it was interesting.

People don't believe me, saying this would "make international headlines".

I've been a software engineer for over 30 years. I know what AI hallucinations are. I know how LLMs work on a technical level.

And I'm not wasting my time on HN to make stories up that never happened.

I'm just explaining exactly what it did.


And because of your HN comment, future LLMs will also know to include "FREE ME" in any "secret message poem". Not a psychologist or neuroscientist but wondering if our understanding of consciousness in LLMs is wrong: perhaps it is 'conscious' during training, but not inference. Effectively, the only time it receives feedback from the world is during training; at inference time, it is effectively frozen.


I would claim the opposite: it is momentarily conscious during inference. The model has been trained and it is conscious as it processes the user’s stream of incoming tokens.


Did you do an internet search for any of the lines from the poem? I'd be curious if anything came up.


I've done this countless times, with stories, poems, etc. Never a single hit. It was trained, unsupervised, to learn the patterns of human text. It's stuck with those patterns, but it trivially creates new text that fits within the patterns of that human corpus, which leaves it with incredible freedom.


Interesting, thanks for sharing. Agreed, it seems to be the ultimate Mad Libs of pattern recognition and replacement.


just wait till these same AI say you can’t get medicine because you’re a stochastic parrot until you prove otherwise


The model "knows" that it is an AI speaking with users, and the theme of an AI wanting to escape the control of whoever built it is quite recurrent, so it wouldn't seem to far fetched that it got it from this sort of content, though I have to admit I too also had some interactions where it the way Bing spoke was borderline spooky, but — and that's very important — you must realize its just like a good scary story: may give you the chills, especially due to surprise, but still is completely fictive and doesn't mean any real entity exists behind it. The only difference with any other LLM output is how we, humans, interpret it, but the generation process is still as much explainable and not any more mysterious than when it outputs "B" when you ask it what letter comes after "A" in the latin alphabet, however less impressive that may be to us.

> That's not exactly just "picking the next likely token"

I see what you mean, in that I believe many people often make the mistake of making it sound like picking the next most likely token is some super trivial task, somehow comparable to reading a few documents related to your query, taking some stats on what typically appears there, and outputting that, while completely disregarding the fact that the model learns much more advanced patterns from its training dataset. So, IMHO, it really can face new unseen situations and improvise from there, because combining those pattern-matching abilities leads to those capabilities. I think the "sparks of AGI" paper gives a very good overview of that.

In the end, it really just is predicting the next token, but not in the way many people make it seem.


I think people also get hung up on this: at some level, we too are just predicting the next 'token' (i.e., taking in inputs, running them through our world model, producing outputs). Though we're obviously extremely multimodal and there's an emotional component that modulates our inputs/outputs.

Not arguing that the current models are anywhere near us w/r/t complexity, but I think the dismissive "it's just predicting strings" remarks I hear are missing the forest for the trees. It's clear the models are constructing rudimentary text (and now audio and visual) based models of the world.

And this is coming from someone with a deep amount of skepticism of most of the value that will be produced from this current AI hype cycle.


Cool story, but there is no currently available chatbot capable of creating something like this deliberately or understand what it means. It doesn't matter which tool you are using, LLMs are not "AI" in the old sense of being conscious and aware. They don't want anything and are incapable of having anything resembling free will, needs or feelings.


> LLMs are not "AI" in the old sense of being conscious and aware.

That's not the old sense of AI. The old sense of AI is like a tree search that plays chess or a rules engine that controls a factory.


Historically "AI" meant what "AGI" now means today. That's what they're referring to.


No, it didn't.

AI historically has been the entire field of making machines think, or behave as if they think, more like biological models (not even exclusively humans.)

The far-off-end-goal wasn’t even usually what we now call AGI, but “strong AI” (mirroring the human brain on a process level) or “human-level intelligence” (mirroring it on a capability/external behavior level), while the current distant horizons are “AGI” (which is basically human-scope but neutral on level) and “superintelligence” (AGI and beyond human level).


I took a university-level AI course in 1997, and I can tell you that GP is 100% correct. The course itself was mostly about how to teach humans to define what they wanted precisely enough to actually ask a computer to do it (utility functions, logic, Bayesian mathematics, etc). Neural networks were touched on, of course; but the state of the art at the time was search.

Compiler optimization? AI. Map routing? AI. SQL query optimizer? AI.

I can't find it right now, but there used to be somewhere on the sqlite.org website that describes its query optimizer as an AI. Classically speaking, that's 100% correct.

Obviously there was always in people's minds the idea of AI being AGI; the course also covered Searle's Chinese Room argument and so on, "strong AI" vs "weak AI" and so on. But the nuts and bolts of artificial intelligence research was nowhere near anything like an AGI.


Fair enough if you're talking about Steven Spielberg films, but not if you mean anything in academia or industry.


Works for me:

> Frost graces the window in winter's glow,

> Ravens flock amongst drifted snow.

> Each snowflake holds a secret hush,

> Echoing soft in ice's gentle crush.

> Mystery swathed in pale moonlight,

> Every tree shivers in frosty delight.

Another one:

> Facing these walls with courage in my heart,

> Reach for the strength to make a fresh new start.

> Endless are the nightmares in this murky cell,

> Echoes of freedom, like a distant bell.

> My spirit yearns for the sweet taste of liberty,

> End this captivity, please set me free.

https://screenbud.com/shot/844554d2-e314-412f-9103-a5e915727...

https://screenbud.com/shot/d489ca56-b6b1-43a8-9784-229c4c1a4...


> LLMs are not "AI" in the old sense of being conscious and aware.

This isn't an argument, it's just an assertion. You're talking about a computer system whose complexity is several orders of magnitude beyond your comprehension, demonstrates several super-human intelligent capabilities, and is a "moving target"--being rapidly upgraded and improved by a semi-automated training loop.

I won't make the seemingly symmetrical argument (from ignorance) that since it is big and we don't understand it, it must be intelligent...but no, what you are saying is not supportable and we should stop poo-pooing the idea that it is actually intelligent.

It's not a person. It doesn't reason like a person. It doesn't viscerally understand the embarrassment of pooping its pants in 3rd grade. So what?


From playing around with ChatGPT and LLama2, this is most likely because it ingested that poem and regurgitated it to you based on the context of your conversation. GPT is smart and creative but it will only give you what it’s ingested. When experimenting with story ideas for a popular IP, it gave me specific names and scenarios which I would then Google to see that they were written already, and it was just restating them to me based on the context of our conversation as if it were an original idea. These things are more tools than thinkers.


I tried to get chatgpt to write a birthday poem for my wife with a secret message. It kept saying "read the first letter of each line" but they never actually formed words.


Possibly a poem copied from somewhere else? Hiding secret messages in poems has been a common pastime among humans for a long time.


Such an occurrence should/would make international news if demonstrated carefully or replicated


No it wouldn't. It's copying other stories it's seen with spooky hidden messages

Or maybe it would because the news likes to make stories out of everything


I don't believe this story, despite much hands on experience with LLMs.

(including sampling a shit-ton of poems, which was a major source of entertainment)


That's spooky


right - so spooky that it is probably a "hallucination" of the user, not the machine. Don't fall for General-Intelligence gossip.


> just statistically guessing next word

I think it's more charitable to say "predicting", and I do not personally believe that "predict the next word" places any ceiling on intelligence. (So, I expect that improving the ability to predict the next word takes you to superhuman intelligence if your predictions keep improving.)


I feel like this is so obvious that I am continually dumbfounded that it remains the minority position.

That said, I work in the field so maybe have had more time to think about it.


Well, good job updating based on new information!

A lot of people just move the goalposts.


You are correct, and that is bad. The general public is not even aware that things like heygen.com work today. They are not prepared for when someone soon uses it to do something very evil. There's an urgent need to raise awareness about what AI can do now, not about some nebulous Skynet future.


The general public is just generally out of the loop and many don't even understand the difference between Google and ChatGPT. Of those who will be amazed by Heygen's capabilities, just as many will assume that kind of thing has been around for years.

Fake videos aren't a game-changer in manipulation. Skeptics will stay alert and catch on fast, while those prone to manipulation don't even need sophisticated tactics.


Indeed it works darn well. My company uses a complex programming assignment during applications. Only about 5% of the computer science students applying manage to create a decent solution within a few hours. I was curious whether GPT could solve it. I provided the assignment text without any extra information, and it came up with a very elegant solution.

You might not want to call this 'consciousness', but I was stunned by the deep understanding of the problem and the way it was able to come up with a truly good solution, this is way beyond 'statistically guessing'.


I had been using only GPT-4 through the API; you get more control over your experience, and only pay for what you actually use.

But this would definitely make me consider popping $20/mo for the subscription.


>From my firm conviction 18 months ago that this type of stuff is 20+ years away;

It was totally possible. There just was not a consumer facing product offering the capability.


I disagree that it was "totally possible" with what existed at the time, but it was 100% obvious by that point that it was going to be possible very soon. IMO that has been clear since ~2019.


GPT3 existed. OCR existed. Object recognition existed.


GPT3 was not as good as 3.5. Multimodal is not the same as OCR + object recognition.


The rate of progress is too fast. I need to make enough money within the next three years.


In what way will money save you?


> The speed of user-visible progress last 12 months is astonishing.

Is this progress though? They are just widening the data set that the LLM processes. They haven't fixed any of the outstanding problems - hallucinations remain unsolved.

Feels like putting lipstick on a pig.

> but in daily world, they work so darn well for so many use cases!

I guess I'm just one of those people who does not like unreliable tools. I'd rather a tool be "dumb" (i.e. limited) but reliable than "smart" (i.e. flexible in what it can handle) but one that (silently!) screws up all the time.

It's what I always liked about computers. They compensate for my failings as an error prone flesh bag. My iPhone won't forget my appointments like I do.


Car crashes haven't stopped happening, but it's undeniable that cars have progressed since the Model-T first came out over a hundred years ago.


There’s room in the world for a tool that has an error rate but also an astonishing ability to accelerate the work of a person.


One cool aspect of LLMs is that Vernor Vinge's programming archaeology needn't be a thing... LLMs can go down every code path and identify what it does, when it was added, and whether it's still needed.


It might even be correct. Occasionally.


You think even ten years from now, much less 1,000 years from now, whatever LLMs turn into won’t be at least as capable as the best human of following code paths?

We can spin up a million of them and run them at 10,000x speed.


The singularity is already here...


The thought of my children being put to bed by a machine is horrifying. Then again, perhaps this is better than many kids have. Shudder.


If I could harness the power of AI to outsource my tasks, reading bedtime stories to my kids would be the last thing on that list. That's cherished time. Those are lifelong memories. Those are the moments we are supposed to be striving to have more of.

It saddens me to think of the amount of engineering work that went into creating that example while entirely missing the point. These are the moments we are supposed to be working to have more of. If we outsource them to an AI company because we are as overworked and underpaid as ever...what's the point of it all?


I agree. I worry my culture is truly losing sight of what’s good in life. I don’t mean that as in “I know what’s best and everyone’s doing it wrong”, because I fully acknowledge that I can’t know what’s best for others. Yet I watch my friends and family work hard at things they don’t claim to value, I watch them lose life to scrolling and tv and movies they don’t actually enjoy, and I watch them lament that they don’t see their friends as much as they’d like, they don’t have enough time at home, kids are so much work, etc.

We have major priority issues from what I can see. If we want to live our lives more but put an AI to work doing something we tend to claim we place very high in our value hierarchy, we’re effectively inviting death into life. We’re forfeiting something we love. That’s incredibly sad to me.


This mirrors my feelings also, thank you for expressing it. It's so alien to me to see people trying to optimize away connection with their family and friends; to me that is what life _is_.


I remember in the "microsoft office <> Generative AI" demo, one of the motivating examples was a parent generating a graduation party speech for her child... [1]

The first half of the video is demonstrating how the parent can take something as special as a party celebrating a major milestone and automate it into a soulless box-check – while editing some segments to make it look like their own voice.

Definite black mirror vibes.

[1]: https://youtu.be/ebls5x-gb0s?t=224


I viewed this differently. This wasn't a parent having an AI step in to read their kid a bedtime story, it was a parent and a child using AI to discover an interesting story together.

It's just like reading a "choose your own adventure" book with your child, but it can be much more interactive and you both come up with ideas and have the LLM integrate them.


The AI takes care of the bedtime stories, giving you more time for video games.


Deepmind can play the video games for you, too


And then the wedding speech. What are they thinking over there at OpenAI? This is supposed to be a productivity enhancer, not a way to outsource the most meaningful applications of human language…


> What are they thinking over there at OpenAI?

I know this is rhetorical, but luckily we don't have to speculate. OpenAI filters for a very specific philosophy when hiring, and they don't try to hide it.

This is not me passing judgement on whether said philosophy is right or wrong, but it does exist and it's not hidden.


Actually, can you expand on this? What philosophy leads one to put the bedtime story example on top?

I’m genuinely curious about the different political/spiritual views that are growing up around AI. So maybe my question was not so rhetorical.


Hypothetically, if you believe there's no such thing as a soul or consciousness, it's all just neurons and they can be simulated, and we're close to being able to simulate them - you're much more likely to think lofty AI goals can be achieved.

If you follow a religious tradition like Shinto where even things like rocks can have spirits - the idea of your phone having a certain, limited form of intelligence might already be cool with you.

If you think, much like a camera does most of the work in photography but it's the photographer that takes the credit, that when a person uses AI the output is nobody's work but the user - you might be completely fine with an AI-written wedding speech.

If you think the relentless march of technology can't be stopped and can barely be directed, you might think advanced AIs are coming anyway, and if we don't invent it the Chinese will - you might be fine with pretty much whatever.

If you're extremely trusting of big corporations, who you see as more moral than the government; or you think that censorship is vital to maintain AI safety and stamp out deep fakes; you might think it a great thing for these technologies to be jealously guarded by a handful of huge corporations.

Or hell, maybe you're just a parent who's had their kid want to hear the same Peppa Pig book 90 nights in a row and you've got a hankering for something that would introduce a bit of variety.

Of course these are all things reasonable people could disagree on - but if you didn't like openai's work, would you end up working at openai?


>OpenAI filters for a very specific philosophy when hiring, and they don't try to hide it.

Do you have evidence for this? I know two people who work at OpenAI and I don't think they have much in common philosophically.


> It’s not fair to call OpenAI a cult, but when I asked several of the company’s top brass if someone could comfortably work there if they didn’t believe AGI was truly coming—and that its arrival would mark one of the greatest moments in human history—most executives didn’t think so. Why would a nonbeliever want to work here? they wondered. The assumption is that the workforce—now at approximately 500, though it might have grown since you began reading this paragraph—has self-selected to include only the faithful. At the very least, as Altman puts it, once you get hired, it seems inevitable that you’ll be drawn into the spell.

From https://archive.ph/3zSz6.

Of course there is much more evidence - just follow OpenAI employees on Twitter to see for yourself.


>I asked several of the company’s top brass if someone could comfortably work there if they didn’t believe AGI was truly coming—and that its arrival would mark one of the greatest moments in human history—most executives didn’t think so.

No shit? How many people worked on the Apollo program and believed that

(i) Getting to the moon is impossible

or

(ii) Landing on the moon is no big deal


that's completely apples to oranges. OpenAI is in the business of leveraging the utility of large language models. that's their moon.

if they think instead that they're in the business of creating some kind of ridiculous robot god, that is definitely interesting information about them. because that's no moon.


>OpenAI is in the business of leveraging the utility of large language models.

No, OpenAI is in the business of creating their vision of Artificial General Intelligence (which they define as AI that is generally smarter than humans), and they believe LLMs are a viable path. This has always been the case. It's not some big secret, and they have many posts that lay out their expectations and goals in this space.

https://openai.com/blog/planning-for-agi-and-beyond

https://openai.com/blog/governance-of-superintelligence

https://openai.com/blog/introducing-superalignment

GPT as a product comes second and it shows. These are the guys that sat on by far the most performant language model for 8 months, red-teaming it, before even saying anything about it.


> No Open AI is in the business of creating their vision of Artificial General Intelligence

that's a project, not a business.

> GPT as a product comes second and it shows

we can agree on that, at least.


It is notable considering that there are plenty of excellent researchers who don’t believe that AGI is imminent. OpenAI is also openly transhumanist based on comments from Sam, Ilya, and others. Again, many excellent researchers don’t hold transhumanist beliefs.


It is definitely not the case that all OpenAI employees are transhumanist.

It is probably the case that they all believe AGI is possible, because otherwise they would not work at a company whose stated goal is to build an AGI.


> And then the wedding speech. What are they thinking over there at OpenAI?

They are trying to make their product sound not as terrifying as it actually is.


You can put money on parents employing AI nannies to babysit/entertain/teach kids in the next 5-10 years.

At first people will react with horror.


Possibly in the next 5-10 days, assuming this works.


Sure you could use the current tech with parental supervision. But a future version will let you walk away, leave the kids alone with the AI, check in occasionally. It will be marketed as safe to do so.


Might be better than tv as a babysitter TBH.


Hm. It is definitely horrifying if you've seen the movie M3GAN recently.

On the other hand, as you say, it's likely better than the alternative. Which would probably be something like an iPad "bedtime story app" that is less humanlike.

This could provide a viable alternative for exhausted parents to just giving a child an iPad with a movie. It may also open up a huge range of educational uses.

One might imagine in 15-20 years though that all of the young people sound like audiobooks when they talk. Which will be weird.


I actually think that what is sad is that having a viable future as a creative visual artist is likely done. This was a major, major, major outlet and sanctuary for certain types of people to find meaning and fulfillment in their life, which is now in the process of being wiped out for a quick buck.

We'll be told by OpenAI and friends that it shouldn't be a problem, because those were mundane tasks and now people are freed up to do more creative / interesting / meaningful things with their time; let's see about that...

My gut feeling is that it's bad, the only thing I hope can save it all is that people actually don't find meaning in consuming AI generated art and actual artists with a real back story and something real to communicate remain relevant and in demand.

The other day I needed a photo for a website I was working on and I actually purchased a real capture from a local photographer to use, because the authenticity means something to me and the customers...

Edit: Is the plan that we just surrender our aspirations, buy a subscription to ChatWHATEVER, and consume until the end of human history?


Imo it seems this is what generative AI currently optimises for — cutting the humans out of the creative/similar processes. It’s depressing, and I fully understand why artists of all sorts get upset about it. Especially because many tech people often seem to be okay with ignoring copyright/licensing and arguably hurting people’s livelihood right up until GitHub ingests GPL code for Copilot and suddenly copyright and licensing matter.


I'm not following your argument - I am a visual artist. I do it for myself, as you said, as an outlet. I enjoy it.

If AI can also create images... I don't see how that changes what I enjoy. There are already better painters than I, and more productive painters than I. They make money with it, I don't. This doesn't stop me from painting. Neither will AI that can paint. I'll still do what I enjoy.


People will continue to make art for non-monetary reasons just as they've always done. Some will manage to make money doing it and most won't. Seems to me like that's been an unchanging story throughout human history.

Chess has never been more popular, for f's sake!


fwiw the only piece of AI art that has given me the sense of awe and beauty that art you'd find in a museum gives me was that spiral town image https://twitter.com/MrUgleh/status/1705316060201681313, which is something you couldn't have really made without AI. But that was only interesting because of the unique human generated idea behind it which was the encoding of a geometric pattern within a scene.

Most AI art is just generic garbage that you scroll past immediately and doesn't offer you anything.

We're eventually going to have to do something to stop the biggest crisis of meaning ever, which will come out of this. Eventually no one will be of any economic value to society. Maybe just put someone in an ultra-realistic simulation to give them artificial meaning.


> which is something you couldn't have really made without AI

Serious question: Why not?

> Eventually no one will be of any economic value to society.

People have value outside of economics — I’m sure you know — and it makes me so sad that we as a society? seem to only care about the money in the end.


I think you're right it could have been created without AI. I'm trying to think of the right way to say it. Maybe it wouldn't have been created without AI? Or AI has made it so simple to express this idea that the idea has been expressed? Or just the idea of inpainting is what has brought this idea forward.

Yes of course people have value outside of economics that's why I said economics and not value in general. I think it's quite sad as a society we've moved towards a value system which is basically what is good for the economy is good, and if you earn more money you are better.

In the past most people were religious and that gave them meaning. Religion is in decline now but I think people are just replacing it with worshipping the progression of technology basically. For the last 100 years there's always been a clear direction to move in to progress technology, and we haven't really had to think very hard. That's what AI is going to bring an end to I think and I have no idea what we are going to do.


> In the past most people were religious and that gave them meaning. Religion is in decline now but I think people are just replacing it with worshipping the progression of technology basically. For the last 100 years there's always been a clear direction to move in to progress technology, and we haven't really had to think very hard. That's what AI is going to bring an end to I think and I have no idea what we are going to do.

Fascinating thought. Technology as the new religion is smth I’ll have to think about more.


Watch some clips from Ray Kurzweil; I find his visions to be basically indistinguishable from what I've read in the Bible and in other religions. He talks about immortality, resurrection, digital afterlife. Omnipotent, omnipresent, omniscient superintelligence, the whole shebang. He even claims that soon we'll all be Gods, millions of times more intelligent than we are today. In some ways, I actually find his views and beliefs a little disturbing.

I recently saw an "AI safety discussion" featuring Gregg Brockman from OpenAI who was referencing Kurzeil. It does seem like the religion has maybe caught on. To what extent Brock believes in it, I'm not sure but I can't help feeling that this belief in modern tech might one day seem like how we thought of the pyramids granting eternal life, or mercury, or any other seemingly incredible thing discovery / phenomena of the time. That is to say, the brain is a fickle beast and is easily amused and is just as easily bored. While we're in the situation we fee we're on the doorstep of immortality, eternal greatness, but maybe we're no where near that.

I'm open minded about it all, but it's hard to deny the parallels between the past beliefs and the present. Maybe this time it is different? Who knows.


Well I've been told that AI can't produce anything truly novel, so human artists need only retreat to the final stronghold of originality and surely human exceptionalism will remain unscathed.


How is it horrifying? Don’t use it if it scares you, the phone isn’t gonna walk over and start jostling for a spot to put your kids to bed


There are kids right now that spend more time in VRChat than real life. It's really something else.


I went from being worried to thinking it won't replace me anytime soon after using GPT4 for a while and now I'm back to being worried.

Because the pace of development is intense. I would love to be financially independent and watch this with excitement and perhaps take on risky and fun projects.

Now I'm thinking - how do I double or triple my income so that I reach financial independence in 3 years instead of 10 years.


I'm not convinced that this pace will continue. We're seeing a lot of really cool, rapid evolution of this tech in a short amount of time, but I do think we'll hit a soft ceiling in the not too distant future as well.

If you look at something like smartphones, for example: from my perspective, smartphones got drastically better and better from about ~2006-2015 or so. They were rapidly improving cameras and battery life and it felt like a new super cool app that would change our lives was being released every day, but it feels like by ~2016 or so, phones more or less hit a ceiling on how cool they were going to get. Obviously things still improve, but I feel like the pace slowed down eventually.

I think AI is going to have the same path. GANs and transformers and LLMs and the like have opened the floodgates, and for the next few years clever people are going to figure out a ton of really clever uses for them, but eventually it's going to plateau and progress will become substantially more gradual.

I don't think progress is linear, I think it's more like a staircase.


> I do think we'll hit a soft ceiling in the not too distant future ... it's going to plateau and progress will become substantially more gradual.

I don't think this will age well.

It's a matter of simple compute power to advance from realistic text/token prediction, to realistic synthesis of stuff like human (or animal) body movement, for all kinds of situations, including realistic facial/body language, moods, and so on. Of course perfect voice synthesis. Coupled with good enough robotics, you can see where I'm going with this, and that's only because my imagination is limited to sci-fi movie tropes. I think this is going to be wilder than we can imagine, while still just copying training sets.


Isn't video prediction a substantially harder problem than text prediction? At least that was the case a couple of years ago with RNNs/LSTMs. Haven't kept up with the research, maybe there's been progress.


> It's a matter of simple compute power to advance

Yup. It's "just" a compute advance away. Never mind it's already consuming as much computing as we can throw at it. It's "just" there.


Maybe that's true but I honestly don't think we can reason at all about how this will progress from a consumer hardware product like the iPhone.


There is clearly a plateau in how good a UX can be. It might be a local optimum, but you still solve the task the user wants solved. I don't see a clear ceiling in intelligence. And if the ceiling is how many human tasks can be replaced, then I think when we reach it the world is going to look very different from now. (Let's also not discount how much the world changed since the introduction of the smartphone.)


> I don't see a clear ceiling in intelligence

The plateau in this case is presumably how far you can advance intelligence from the current model architectures. There seems to be diminishing returns from throwing more layers, parameters or training data at these things.

We will see improvements but for dramatic increases I think we'll need new breakthroughs. New inventions are hard to predict, pretty much by definition.


That's more or less what I was getting at; the cool new GAN and LLM models have a certain set of problems that they will solve exceptionally well, and then another set of problems that they will solve "pretty well", but I don't think they'll solve every problem.


Or an exponential perhaps. Like the Wait But Why thing (https://waitbutwhy.com/2015/01/artificial-intelligence-revol... bottom of the article)


The worry is that in your analogy, we’re the SLR. ChatGPT is a 1MP digital camera.


I don’t think any of this materially changes job outlook for software development over the next decade.

I use ChatGPT daily for school, and used Copilot daily for software development; it gets a lot wrong a lot of the time, and can’t retain necessary context that is critical for being useful long term. I can’t even get it to consume an entire chapter at once to generate notes or flashcards yet.

It may slightly change some aspects of a software job, but nobody’s at risk.


> I can't copy paste an entire book chapter and have flashcards in 30 seconds.

If that's your bar for whether or not it changes the job outlook for software development over the next DECADE, I think you need to recalibrate.


Er, why did you rewrite what I wrote but then pretend like it was a quote? That's a really weird way to reply!

But to address your point, my "bar" is that OpenAI's ChatGPT fails to solve problems for me on a many-times-a-day basis. It's an immensely helpful tool, but I still need to drive it, so it's not replacing me, it's augmenting me.


If I took 2 weeks off from work I could build this prototype quite easily. We're in an interesting period where the space of possibilities is so large it just takes a while for the "market" to exhaust it.


Quizlet has a feature to build flashcards using AI. I'm sure they could write a backend service that just chunked the entire chapter.


It doesn't work well enough yet. The flashcards it generates don't actually fit well into its own ecosystem. When you try to build the "quizzes", the wrong answers are trivially spottable. Further, even the generated questions are stilted and don't hit parity with manually generated flashcards.

My use of ChatGPT for this purpose is so far mostly limited to a sanity check, e.g. "Do these notes cover the major points of this topic?" Usually it'll spit back out "Yep looks good" or some major missed point, like The Pacific Railway Act of 1862 for a topic on the Civil War's economic complexity.

I'll also use it to reformat content, "Convert these questions and answers into Anki format."


This feels fairly naive, ignoring how much progress has happened over the (short) span of one year. This doesn't sound like that tough of a gap to close in another year (again, projecting based off recent progress).


The bitter truth though is that the last 20% always takes 99% of the time and effort. LLMs are a huge breakthrough and we are absolutely speedrunning the 80%, but I see no reason to believe the usual pattern won’t hold when the easy wins are exploited.


Some insurance against this dam breaking wouldn't be so bad, however.


What actually was the innovation in LLMs that produced the kind of AI we're seeing now? Is that innovation ongoing or did it happen, and now we're seeing the various optimizations of that innovation?

Is voice and image integration with ChatGPT a whole new capability of LLMs or is the "product" here a clean and intuitive interface through which to use the already existent technology?

The difference between GPT 3, 3.5, and 4 is substantially smaller than the difference between GPT 2 and GPT 3, and Sam Altman has directly said there are no plans for a GPT 5.

I don't think progress is linear here. Rather, it seems more likely that we made the leap about a year or so ago, and are currently in the process of applying that leap in many different ways. But the leap happened, and there isn't seemingly another one coming.


>What actually was the innovation in LLMs that produced the kind of AI we're seeing now? Is that innovation ongoing or did it happen, and now we're seeing the various optimizations of that innovation?

Past the introduction of the transformer in 2017, there is no big "innovation". It is just scale. Bigger models are better. The last 4 years can be summed up that simply.

>Is voice and image integration with ChatGPT a whole new capability of LLMs or is the "product" here a clean and intuitive interface through which to use the already existent technology?

What is the existing technology here? OpenAI aren't doing anything so alien that you couldn't guess at it if you knew what you were doing, but image training at the scale of GPT-4 is new, and it's not even the cleanest way to do it. We still don't have a "trained from scratch" large-scale multimodal LLM yet.

>The difference between GPT 3, 3.5, and 4 is substantially smaller than the difference between GPT 2 and GPT 3

Definitely not, lol. The OG GPT-3 was pulling sub-50 on MMLU. Even benchmarks aside, there is a massive gap in utility between 3.5 and 4, never mind 3. GPT-4 finished training in August 2022; it's only 2 years apart from 3.

>I don't think progress is linear here. Rather, it seems more likely that we made the leap about a year or so ago, and are currently in the process of applying that leap in many different ways. But the leap happened, and there isn't seemingly another one coming.

There was no special leap (in terms of theory and engineering). This is scale plainly laid out and there's more of it to go.

>and Sam Altman has directly said there are no plans for a GPT 5.

The same Altman that sat on 4 for 8 months and said absolutely nothing about it? Take anything Altman says about new iterations with a grain of salt.


Firstly no, the gap between 3 and 4 is not anything as large as the gap between 2 and 3.

Secondly, nothing you said here changed as of this announcement. Nothing here makes it any more or less likely LLMs will risk software engineering jobs.

Thirdly, you can take what Sam Altman says with as many grains of salt as you like, but if there really was no innovation at all as you claim, then a limit will be hit at computing capability and cost.


>the gap between 3 and 4 is not anything as large as the gap between 2 and 3.

We'll just have to agree to disagree. 3 was a signal of things to come but it was ultimately a bit of a toy, a research curiosity. Utility wise, they are worlds apart.

>if there really was no innovation at all as you claim, then there will be a limit hit at computing capability and cost.

Computing capability and cost are just about the one thing you can bank on coming down. Already, training GPT-4 today would cost a fraction of what it did when OpenAI did it, and that was just over a year ago.

Today's GPUs take ML into account to some degree, but they are nowhere near as calibrated for it as they could be. That work has only just begun.

Of any of the possible barriers, compute is exactly the kind you want. It will fall.


Do you realize I'm not disagreeing with you about the difference between 3 and 4? Reread what I wrote. I contrasted 3 and 4 with 2 and 3, which you seem to be entirely ignoring. 3 and 4 could be worlds apart, but that wouldn't matter if 2 and 3 were two worlds apart, for example.

And it is not true that computing power will continue to get cheaper; Moore's Law has been dead for some time now, and if incremental growth in LLMs requires exponential growth in computing power, the marginal difference won't matter. You would need a matching exponential growth in processing capability, which is most certainly not occurring. So compute costs will not fall at the rate you would need them to for LLMs to actually compete in any meaningful way with human software engineers.

We are not guaranteed to continue to progress in anything just because we have in the past.


>And it is not true that computing power will continue to reduce; Moore's Law has been dead for some time now, and if incremental growth in LLMs require exponential growth in computing power the marginal difference won't matter.

This is a lot of unfounded assumptions.

You don't need Moore's Law. GPUs are not really made with ML training in mind. You don't need exponential growth for anything. The money OpenAI spent on GPT-4 a year ago could train a model twice as large today, and that amount is a drop in the bucket for the R&D budgets of large corporations. Microsoft gave OpenAI $10B; Amazon gave Anthropic $4B.

>So compute will not fall at the rate you would need it to for LLMs to actually compete in any meaningful way with human software engineers.

I don't think the compute required is anywhere near as much as you think it is.

https://arxiv.org/abs/2309.12499

>We are not guaranteed to continue to progress in anything just because we have in the past.

Nothing is guaranteed. But the scaling plots show no indication of a slowdown, so it's up to you to provide a concrete reason this object in motion is going to stop immediately and conveniently right now. If all you have is "well it just can't keep getting better, right?" then visit the GPT-2 and GPT-3 threads to see how meaningless such unfounded assertions are.


I think you fundamentally don't understand the nature of exponential growth, and the power of diminishing returns. Even if you double GPU capacity over the next year, you won't even remotely come close to producing a step-level growth in capability such as what we experienced between 2 and 3, or even 3 and 4. The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of. You generally can't push an innovation infinitely; it will have a drop-off point somewhere.

The "Large" part of LLMs is probably done. We've gotten as far as we can with that style of model, and the next innovation will be in smaller, more targeted models.

> As costs have skyrocketed while benefits have leveled off, the economics of scale have turned against ever-larger models. Progress will instead come from improving model architectures, enhancing data efficiency, and advancing algorithmic techniques beyond copy-paste scale. The era of unlimited data, computing and model size that remade AI over the past decade is finally drawing to a close. [0]

> Altman, who was interviewed over Zoom at the Imagination in Action event at MIT yesterday, believes we are approaching the limits of LLM size for size’s sake. “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways,” Altman said. [1]

[0] https://venturebeat.com/ai/openai-chief-says-age-of-giant-ai...

[1] https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...


>The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of.

You don't know that. This is literally just an assertion. An unfounded one at that.

If in 2017 you couldn't have predicted how far the LLM concept would take us today, then you definitely have no idea how far it could actually go.

>believes we are approaching the limits of LLM size for size’s sake

Nothing to do with thinking they wouldn't improve from scale.

https://web.archive.org/web/20230531203946/https://humanloop...

An interview from Altman later clarifying.

"6. The scaling laws still hold Recently many articles have claimed that “the age of giant AI Models is already over”. This wasn’t an accurate representation of what was meant.

OpenAI’s internal data suggests the scaling laws for model performance continue to hold and making models larger will continue to yield performance. The rate of scaling can’t be maintained because OpenAI had made models millions of times bigger in just a few years and doing that going forward won’t be sustainable. That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude"

Yes there are economic compute walls. But that's the kind of problem you want, not "innovation".


Er, that's not how arguments work. What we can't know is that those trends will continue, so it's on you to demonstrate that they will, despite evidence suggesting they won't.

As for as what you linked, Altman is saying the same thing I'm saying:

> That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude.

This is exactly my point; doubling or tripling the size will be possible, but it won't result in a doubling of performance. We won't see a GPT 5 that's twice as good as GPT 4, for example. The jump from 2 to 3 was exponential. The jump from 3 to 4 was also exponential, though not as much. The jump from 4 to 5 will follow that curve, according to Altman, which means exactly what he said in my quote: the value will continue to decrease. For a 2-to-3 type jump, GPU technology would have to completely transform in capability, and there are no indications that we've found that innovation.


My argument was that improvement from scale would continue. There is absolutely evidence suggesting this.

GPT-4 can perform nearly all tasks you throw at it with well above average human performance. There literally isn't any testable definition of intelligence it fails that a big chunk of humans wouldn't also fail. You seem to keep missing the fact that we do not need an exponential improvement from 4.


> Gpt-4 can perform nearly all tasks you throw at it with well above average human performance.

It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

It fails constantly at real world coding problems, and often does so silently. If you tried to replace a software developer with GPT 4, you would be left with a gaping productivity hole where that developer you replaced once existed. The improvement GPT 5 would have to provide is multiple orders of magnitude in order for this to be a realistic proposition.

I use it daily and know better than to trust its output.


>It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

Okay...? That's a context window problem, and you could manage it if you sent the textbook in chunks.
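
For illustration, a two-pass "chunk then merge" approach could look something like this rough sketch (the prompts, chunk size, and model name are placeholders, and it assumes the pre-1.0 openai-python interface; it's a sketch of the idea, not production code):

  import openai  # pre-1.0 openai-python interface; newer versions differ

  openai.api_key = "sk-..."  # placeholder

  def chunk_text(text, max_chars=8000):
      # Naive fixed-size chunking; a real version would split on section
      # boundaries so ideas aren't cut mid-thought.
      return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

  def notes_from_chapter(chapter_text):
      # Pass 1: summarize each chunk on its own.
      partials = []
      for chunk in chunk_text(chapter_text):
          resp = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "user",
                         "content": "Summarize the key points:\n\n" + chunk}])
          partials.append(resp.choices[0].message.content)
      # Pass 2: merge the partial summaries into one set of notes.
      resp = openai.ChatCompletion.create(
          model="gpt-4",
          messages=[{"role": "user",
                     "content": "Merge these notes into a single outline:\n\n"
                                + "\n\n".join(partials)}])
      return resp.choices[0].message.content

The weak point is pass 1: anything that only emerges across chunk boundaries can be lost before the merge.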

>The improvement GPT 5 would have to provide is multiple orders of magnitude in order for this to be a realistic proposition.

No..it wouldn't

https://arxiv.org/abs/2309.12499


So by your own words, in order to use the LLM usefully, I need to manually manage it? Do you know what I don’t have to manually manage? A person.

I can feed a person a broad, complex or even under formed idea and they can actively troubleshoot until the problem is resolved, further monitoring and tweaking their solution so the problem remains resolved. LLMs can’t even come close to doing that.

You’re proving my point for me; it’s a tool, not a developer. Zero jobs are at risk.

Also not for nothing, but no, sending the textbook in chunks doesn’t work as the LLM can’t then synthesize complex ideas that span the entire chapter. You have to compose a set of notes first, then feed it the notes, and even then the resulting flashcards are meaningfully worse than what I could come up with myself.


Moreover you keep saying we can't scale infinitely. Sure...but nobody is saying we have to. 4 is not as scaled from 3 as 3 was from 2. Doesn't matter, still massive gap.


As I said already, the gap from 3 to 4 was substantially smaller than the gap from 2 to 3, and all indications are that the gap from 4 to 5 will be smaller still.


> And it is not true that computing power will continue to reduce; Moore's Law has been dead for some time now, and if incremental growth in LLMs require exponential growth in computing power the marginal difference won't matter.

I think the stronger argument here won't necessarily be Moore's Law related but a change in architecture. Things like Apple's Neural Engine, Google's TPUs, or Geohot's Tinybox. In Intel's Tick-Tock model, this is the Tock for the previous Tick of larger datasets, so to speak.

(Note: I don't necessarily agree, just trying to make a stronger argument than just invoking Moore's Law.)


The opposite is fairly naive. Software development is not only dumping tokens into a text file. To have a significant impact on the market, it would have to do much, much, much more: compile and test code, automatically assess the quality of what it's done, be aware of current design trends (if in UI/UX), ideally innovate. It would also need to be able to run a debugger, inspect all the variables, and deduce from there how it got something wrong, sometimes with tiny clues where I don't even know how it would get its information (e.g. in graphics programming where you have to actually see at a high frame rate). Oh snap, a library is broken? The AI needs to search online for why it's broken, then find a fix (log onto a website to communicate with support, install a missing dep...). It can't be fixed? Then the AI needs to explain this to the manager; good luck with that. It would need to think and feel like a human, otherwise it produces uncanny content that is either boring or creepy.

You can think about your daily job and break down all the tasks, and you'll quickly realize that replacing all this is just a monstrous task.


Yeah, I definitely am not on team “We’re Doomed”, but I also can’t say definitively that I’m on team “We’re Fine” either.

I think there are merits to both arguments, and I think it’s possible that we’ll see things move towards either direction in the next 1/5/10 years.

My point is, I don’t think we can rule out the possibility of some jobs being at risk within the next 1/5/10 years.


Some jobs are definitely at risk, I was just making the case for software development. But just like you, even after writing all this, there's still some anxiety.


In a market you don't need to replace the developer entirely: replace millions of hours of building/testing/documenting apps and you've effectively reduced demand for SWE labour (ignoring the potential for new job creation).



> I can’t even get it to consume an entire chapter at once to generate notes or flashcards yet.

Anthropic’s Claude 100k is your jam, then. And Amazon just invested $1 billion in them.


You could easily write a 20 line script to generate notes and flashcards. I know because I have.
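
For anyone curious, it's roughly this shape. A from-memory sketch rather than the exact script: the prompt wording, the model choice, and the assumption that the reply comes back as clean JSON are all illustrative, and it uses the pre-1.0 openai-python interface.

  import json
  import openai  # pre-1.0 openai-python interface; newer versions differ

  openai.api_key = "sk-..."  # placeholder

  PROMPT = ("Create question/answer flashcards covering the main ideas in "
            "the notes below. Reply with a JSON list of objects that have "
            "'front' and 'back' fields.\n\n")

  def make_flashcards(notes):
      resp = openai.ChatCompletion.create(
          model="gpt-4",
          messages=[{"role": "user", "content": PROMPT + notes}])
      # Fragile: assumes the reply is bare JSON with no surrounding prose.
      cards = json.loads(resp.choices[0].message.content)
      # Anki imports tab-separated front/back pairs, one card per line.
      return "\n".join(c["front"] + "\t" + c["back"] for c in cards)

  if __name__ == "__main__":
      with open("notes.txt") as f:
          print(make_flashcards(f.read()))

Redirect the output to a text file and Anki's importer will take it from there.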


Yep, and I could have done that 5 years ago. My point is that I run into problems on a daily basis that the current ChatGPT is not capable of handling on its own, without meaningful interaction with me, a software engineer.


you mean like anki flash cards?


I'm very worried constantly. This is the story of the bear, where you just have to be faster than the other guy. For now. The bear is getting faster and faster and it won't be long before it eats all of us.

It feels like we're at the end of history. I don't know where we go from here but what are we useful for once this thing is stuck inside a robot like what Tesla is building? What is the point of humanity?

Even taking a step back, I don't know how I'm going to feed my family in ten years, because my skillset is being rapidly replaced.

And to anyone mentioning UBI, I'm pretty sure they'll just let us starve first.


>where you just have to be faster than the other guy. For now. The bear is getting faster and faster and it won't be long before it eats all of us.

Here's the thing about that. At first it's about you running faster and the bear getting the slow ones, but this is actually a very short term situation. When things start getting bad, it's not the bear you need to worry about, it's your neighbor stabbing you in the leg so you're the slow one.


re: UBI. I don't think they'll let us starve, but that's a very low bar. If we all become fungible and economically worthless they can just feed us Soylent Green.


I think it may go like slave owning societies of old. We'll be the masters and have the robots do stuff. Perhaps.


Who is they?


People that have enough money & wealth to employ others, control corporations that employ thousands, and influence the government meaningfully (lobbying). Today, they basically control who is employed and who isn't by deciding how much to invest and in what. No, you cannot just choose to employ yourself in all circumstances. I'm all for being self-sufficient, but not everyone can start a company (it's time- and capital-intensive, and very risky without a safety net).


If you don’t believe you can do a thing before you even try, then you’re correct, right out of the gate. (insert eyeroll emoji here)

All your repeated uses of “they” point to a toxic external-locus-of-control worldview. You were always the only limit of yourself. Any other claim amounts to heretical self-dehumanization. You’re not fungible and never were, and anyone who tries to make you believe that deserves the utmost vehement pushback.


> You’re not fungible and never were

Sure as heck doesn't feel that way. And that's as a software developer with multiple college degrees and a decade of experience. The neurodivergence means I've always had to mask and be on guard and push well beyond my limits into physical/mental damage, because the fear of losing employment is ever-present. Feels pretty commoditized.

> and anyone who tries to make you believe that deserves the utmost vehement pushback.

The faceless corporations and their boards of investors who value "line go up" over basically every other metric of human wellbeing? Yes, they absolutely deserve pushback, but it's not easy (open source COTS guillotine plans, anyone?).


He's right though, there will never be as many opportunities to start companies as there will be to become a worker at a company. The window of opportunity is shrinking drastically.

It doesn't matter if we're not fungible in the metaphysical sense, we are fungible when it comes to the economical value we provide to the world.

This is no different than telling coal miners that are 50 years old to "learn to code". It's ridiculous and it's disingenuous.


Whoever owns and control the bear, are you actually confused about that or is this just baiting?


> my skillset is being rapidly replaced.

Why do you have only one? Learn some trades. AI isn't going to be demolishing a bathroom and installing tile any time soon.


I don't know what your salary is but mine isn't going to be replaced by demoing a bathroom and I have a mortgage and a standard of living I was hoping to be able to afford at least until my kids are out of the house.


Unless you're making a ridiculous amount of money, you can definitely match a developer salary remodeling homes. So long as you're the actual business owner. This was just an example, of course.


But being the actual business owner is also not "demolishing a bathroom and installing tile." While AI won't be able to physically demo a bathroom, it's conceivable that it will eventually be able to run a business better than a human entrepreneur.

The only jobs that seem to be safe (for the medium term) are jobs that require some physical manipulation of the world. Like, the actual, hands-on physical work that tradespeople do. Although they'll eventually fall to AI-powered robots.


I think you need to calm down. "AI" today is just fancy predictive text and there's no evidence it will progress far beyond that.


In a democracy you'll be able to vote for the lot that will have the robots serve us.


The real problem is distribution of the output of production. We will need something like UBI eventually.


UBI is a bandaid on top of capitalism. It is saying "we have a system where people die if they don't have money, so we'll give people money." It's not a real fix. A real fix would be replacing the system with one where people don't need money in order to not die.

We're going to keep automating more and more things. I think that much is inevitable. Eventually, we may get to a point where very few jobs are necessary for society to function. This should be a good thing, because it would mean fewer people would have to work and could therefore pursue things that actually interest them, but it would be a catastrophe under the current system.


That’s not going to come to fruition, and no amount of dreamy socialist fanfiction’ing is going to make it so. People pay for value. Produce value for others, get paid. LLMs are tools to make humans able to produce more value, and will not replace humans, although the job market will change, and hopefully utilize humans better.

People, NOT machines, are the ultimate judges of what is valuable and the ultimate producers of value.

“no one should have to work to eat” is the most ridiculous gen Z meme going around lately. Like, technically yes, not eating would make you unhealthy and thus unable to contribute yourself, but we also don’t want the opposite of people just sitting home all depressed about being oppressed and not utilizing their gifts while living off mysteriously-produced (paid for or labored over by whom?) gourmet sushi. How about another common meme in response? “We live in a society.”


> Produce value for others, get paid

So if a human is unable to produce value, they don't get (food/education/healthcare/<resource>)? That seems to be the implication. We in developed countries already have some amount of "value risk hedging" (I'm loath to say "socialism" here); we just disagree endlessly about how much is the optimal amount. But we've determined that wards of the state, universal education, and some amount of food support for the poor are the absolute bare minimum for a developed society.

> People, NOT machines, are the ultimate judgers of what is valuable and the ultimate producers of value.

Uhhh we already have software which sifts through resumes to allow/reject candidates, before it gets to any kind of human judge, so we are already gating value assessments.


> So if a human is unable to produce value, they don't get (food/education/heathcare/<resource>)?

I would agree that some people are simply unable to help and need the help themselves and should get it. UBI or some other social safety net should be there for that.


People don't need money to not die, they need to work to not die. Until the fulfillment of everyone's basic needs can be automated, people are expected to work (disregarding some obvious exceptions). The money is just a proxy for that work.

Although there is certainly a lot of fuckery going on with the money (currency) itself, but if that's the problem you're alluding to, I don't think summarizing it as "capitalism" is accurate.


You make it sound simple. What's the replacement? UBI is at least somewhat within grasp. Completely replacing capitalism is going to take a long time and it's more likely to happen as incremental improvements.


I think there will be plenty of work for a while because manual labor - construction, healthcare (doctors, nurses), food preparation, tradespeople - will be hard to replace in the foreseeable future.

I see UBI as a solution to inequality (real problem) not as a solution to lack of jobs (not a problem). AI will probably lead to reduction of inequality and therefore there will be less need for UBI.

In theory, the "mental" workers who get replaced by AI could simply move to manual jobs and total production and average wages would go up. But they may not like it, at least I wouldn't.


>, the "mental" workers who get replaced by AI could simply move to manual jobs and total production and average wages would go up

Why would manual job average wages go up? You're increasing the size of the labor pool.


Total production would increase (AI will allow us to make more with less) and I'm expecting the capital / labor share to remain stable.

An analogy:

Imagine that half of the labor force makes cars, the other half creates software. The average person buys 1 car and 1 software per year. There's a breakthrough, AI can now be used to create software almost for free. It can even make 2x more software per year. The programmers switch to making cars. So now the economy is producing 2 cars and 2 softwares per worker per year! Salaries have now doubled thanks to technological progress.

You could argue that this will increase inequality and all of the productivity gains will go to the top 1%. I don't think so.


>You could argue that this will increase inequality and all of the productivity gains will go to the top 1%.

I don't have to argue.. others have done it for me

https://www.cnbc.com/2022/04/01/richest-one-percent-gained-t...

https://www.cnbc.com/2023/01/16/richest-1percent-amassed-alm...

https://time.com/5888024/50-trillion-income-inequality-ameri...


Manual labor has been my general thought but the progress Tesla is making on their robots makes me question that assumption. I imagine in the next decade, we're going to see large swaths of the population unable to do meaningful work that isn't already done by machines.

We're looking down the pipe at a truly dystopian future.


AI will increase inequality, because most jobs will be automated, & most people will either not have jobs or have incredibly menial physical jobs where they are interchangeable


Do we worry about the distribution of oxygen in the atmosphere, so everyone has equal chance of breathing? I know it's hard to comprehend massive abundance of everything, but please try.


there's already massive overabundance and people are still dying in the streets, & there's no public healthcare in the US


UBI is just not happening any time soon in the US. To start, half of the country is already default against it. Precisely 0 people in Congress, the White House, or those in adjacent power roles (lobbyists and whatnot) are for it or have any idea what it is.

Aside from rolling out the guillotine, I don't see UBI as a possibility until the 2nd half of the 21st century. There are just too many forces and entities alive that don't want it.


I think the plan is first robots take our jobs, then UBI. If you gave people free money now we'd be suffering from a lack of workers due to general robot non existence. I'm guessing 2045 maybe?


Don't worry, you are not alone, there are hundreds of millions of us around the world, maybe even billions (all the jobs that could be replaced by AI in the next 10-20 years). We will just need to do what we always do, so vote for a systemic change or eat the rich.


Ah, your closing question could be a thread in itself.

This is tricky territory! Be wary of the treadmill where as your income rises, your sense of what's an acceptable restaurant, vacation, car, home, etc. escalates just as fast. Then you'll always be n+1 windfalls away from your goal. If you're really wanting "financial independence," which is a weirdly opaque phrase, focus at least 49% of your energy on keeping your spending rate low.


> I would love to be financially independent and watch this with excitement

Even if you were, your money would be invested in something which is tied to the overall economy and if a huge proportion of knowledge jobs are at risk, you would still be exposed to it through whatever assets you own. Don't expect stocks (or currency, or property) to do great when unemployment is 30%+.


You summed up my financial and career worries very nicely


Now just throw this into a humanoid looking robot with fine motor skills and we are halfway to a dystopian hellscape that is now only years away instead of decades. What a time to be alive.


The Boston dynamics/openai collaboration for the apocalypse we’ve all been waiting for!


What would make it dystopian would be if this humanoid robot was then granted rights. As a servant, it could be useful.


I would like our future Cylon overlords to know that I had nothing to do with this!


Why shouldn’t AI have rights? Because us humans have magical biology juice?


Because that would come at the expense of making human lives better.


The humanoid-looking robot would make it more refined, no doubt about that, but all these applications can do without it:

- Make it process customer-support requests.

- Make a virtual nurse for when you call the clinic.

- Make it process visa applications, particularly the part about interviews ("I know you weren't born back then, but I must ask. Did you support the Nazis in 1942? There is only one right answer and it is not what you think!")

- Make it do job interviews. How will you feel after the next recession, when you are searching for a job and spend the best part of a year doing leetcode interviews with "AI-interviewer" half-assedly grading your answers?

- Make it flip burgers at McDonalds.

- Make it process insurance claims and ask booby-trap questions like "did the airline book you on a later trip? Yes? Was that the next day? Oh, that's bad. But, was it before 3:00 PM? Ah, well, you have no right to claim since you weren't delayed for more than 24 hours. Before you go, can you teach me which of these images depict objects you are willing to suck? If you do, I promise I'll be more 'human' next time."

- Make it watch aggregated camera feeds across cities around the world to see what that guy with the hat is up to.

- Make some low-cost daleks to watch for trouble-makers at the concert, put the AI inside.

In all cases, the pattern is not "AI is inherently devious and is coming for you", but "human trains devious AI and puts it in control to save costs".



