We are beginning to roll out new voice and image capabilities in ChatGPT (openai.com)
1149 points by ladino on Sept 25, 2023 | 877 comments



Voice has the potential to be awesome. This demo is really underwhelming to me because of the multi-second latency between the query and response, just like every other lame voice assistant. It doesn't have to be this way! I have a local demo using Llama 2 that responds in about half a second and it feels like talking to an actual person instead of like Siri or something.

I really should package it up so people can try it. The one problem that makes it a little unnatural is that determining when the user is done talking is tough. What's needed is a speech conversation turn-taking dataset and model; that's missing from off-the-shelf speech recognition systems. But it should be trivial for a company like OpenAI to build. That's what I'd work on right now if I were there, because truly natural voice conversations are going to unlock a whole new set of users and use cases for these models.
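For context, the usual baseline here is plain voice-activity detection with a silence timeout, which is exactly what falls short of natural turn-taking. A rough sketch of that baseline using webrtcvad (the frame size, aggressiveness, and 700 ms cutoff are illustrative guesses, not anyone's production settings):

```python
# Naive end-of-turn detection: assume the user is done after N ms of silence.
# A real turn-taking model would use prosody, semantics, etc., not just silence.
import webrtcvad

SAMPLE_RATE = 16000      # webrtcvad supports 8/16/32/48 kHz
FRAME_MS = 30            # webrtcvad accepts 10/20/30 ms frames
END_OF_TURN_MS = 700     # illustrative silence threshold, not a tuned value

def end_of_turn(frames):
    """frames: iterable of 30 ms chunks of 16-bit mono PCM. Yields True once
    enough trailing silence has accumulated to call the turn finished."""
    vad = webrtcvad.Vad(2)   # aggressiveness 0-3
    silence_ms = 0
    for frame in frames:
        if vad.is_speech(frame, SAMPLE_RATE):
            silence_ms = 0
        else:
            silence_ms += FRAME_MS
        yield silence_ms >= END_OF_TURN_MS
```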


Completely agree, latency is key for unlocking great voice experiences. Here's a quick demo I'm working on for voice ordering https://youtu.be/WfvLIEHwiyo

Total end-to-end latency is a few hundred milliseconds: starting from speech to text, to the LLM, then to a POS to validate the SKU (no hallucinations are possible!), and finally back to generated speech. The latency is starting to feel really natural. Building out a general system to achieve this low latency will, I think, end up being a big unlock for diverse applications.
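The SKU validation step is conceptually simple: whatever items the LLM proposes get checked against the POS catalog before anything is confirmed, so only real menu items can make it into the order. A toy sketch of that idea (the catalog contents and function names are made up, not a real integration):

```python
# Toy sketch: constrain the LLM's proposed order to real SKUs from the POS catalog.
MENU = {
    "latte_12oz": "12 oz latte",
    "latte_16oz": "16 oz latte",
    "fries_lg": "large fries",
}

def validate_order(proposed_skus: list[str]) -> tuple[list[str], list[str]]:
    """Split the LLM's proposed SKUs into confirmed items and unknowns to re-ask about."""
    confirmed = [sku for sku in proposed_skus if sku in MENU]
    unknown = [sku for sku in proposed_skus if sku not in MENU]
    return confirmed, unknown
```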


Since this is getting a bit of interest, here's one more demo of this https://youtu.be/cvKUa5JpRp4 This demo shows even lower latency, plus the ability to handle very large menus with lots of complicated sub-options (this restaurant has over a billion option combinations to order a coffee). The latency is negative in some places, meaning the system finishes predicting before I finish speaking.


Holy cow. That's better than the average human drive-through attendant.


This is cool. But I want to see how it handles you going back one and tweaking it.


We've built something similar that allows you to tweak/update notes & reminders: https://qwerki.com/ (private beta). Here's the video demo: https://www.youtube.com/shorts/2hpBTxjplIE We've since moved to training our own Llama as it's more responsive and we get better reliability.


This is pretty good. Do you think models running locally will be able to achieve comparable performance (getting the task done successfully) to cloud-based ones? I'm assuming that for the context of a drive-through scenario it should be OK, but more complex systems might need external information.


Definitely depends on the application, agreed. The more open-ended the application, the more dependent it is on larger LLMs (and other systems) that don't easily fit on the edge. At the same time, progress is happening that is increasing the size of LLM that can be run on the edge. I imagine we end up in a hybrid world for many applications, where local models take a first pass (and also handle speech transcription) and only small requests are made to big cloud-based models as needed.


Can you share the source code? What did you do to improve the latency?


Lots of work around speculative decoding, optimizing across the ASR->LLM->TTS interfaces, fine-tuning smaller models while maintaining accuracy (lots of investment here), good old-fashioned engineering around managing requests to the GPU, etc. We're considering commercializing this so I can't open source it just yet, but if we end up not selling it I'll definitely think about opening it up.


Can you at least share the stack that you're using in building this? What kind of business model are you considering in commercializing it?


We're designing the stack to be fairly flexible. It's Python/PyTorch under the hood, with the ability to plug and play various off-the-shelf models. For ASR we support GCP/AssemblyAI/etc., as well as a customized self-hosted version of Whisper that is tailored for stream processing. For the LLM we support fine-tuned GPT-3 models, fine-tuned Google text-bison models, or locally hosted fine-tuned Llama models (and a lot of the project goes into how to do the fine-tuning to ensure accuracy and low latency). For the TTS we support ElevenLabs/GCP/etc., and they all tie into the latency-reducing approaches.
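To make the plug-and-play idea concrete, the interfaces presumably look something like the sketch below; the class and method names are hypothetical illustrations, not the actual codebase:

```python
# Hypothetical sketch of a pluggable ASR -> LLM -> TTS pipeline with streaming at
# every stage. Class and method names are illustrative, not the real project.
from abc import ABC, abstractmethod
from typing import Iterator

class ASR(ABC):
    @abstractmethod
    def transcribe_stream(self, audio: Iterator[bytes]) -> Iterator[str]:
        """Emit partial transcripts as audio arrives (e.g. streaming Whisper, GCP, AssemblyAI)."""

class LLM(ABC):
    @abstractmethod
    def generate_stream(self, prompt: str) -> Iterator[str]:
        """Stream tokens so TTS can start before the full reply exists."""

class TTS(ABC):
    @abstractmethod
    def synthesize_stream(self, text: Iterator[str]) -> Iterator[bytes]:
        """Turn streamed text into audio chunks for immediate playback (e.g. ElevenLabs, GCP)."""

def run_turn(asr: ASR, llm: LLM, tts: TTS, audio_in, play_audio) -> None:
    # Each stage consumes the previous stage's stream, so the work overlaps
    # instead of waiting for a full transcript / full reply / full audio clip.
    transcript = "".join(asr.transcribe_stream(audio_in))
    tokens = llm.generate_stream(transcript)
    for chunk in tts.synthesize_stream(tokens):
        play_audio(chunk)
```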


Neat! I appreciate your approach to preventing hallucinations. I've used something similar in a different context. People make a big deal about hallucinations but I've found that validation is one of the easier aspects of AI architecture.


The voice does not seem to be able to pronounce the L in “else”. What’s happening there?


Good question. Off-the-shelf TTS systems tend to enunciate every phoneme more like a radio talk show host than a regular person, which I find a bit off-putting. I've been playing around with trying to get the voice to be more colloquial/casual. But I haven't gotten it to really sound natural yet.


This is a very slick demo. Nice job!


Thanks! It's a lot of fun building with these new models and recent AI approaches.


Wow, the latency on requests feels great!! I’m really curious: is this running entirely with Python?


100% Python but with a good deal of multiprocessing, speculative decoding, etc. As we move to production we can probably shave another 100ms off by moving over to a compiled system, but Python is great for rapid iteration.


Manna v0.7



That's way slick.

Can I ask what your background is, and what things you're used to working with? I don't have the chops to build what you built, but I'd love to get there.


My advice is always to jump in and start building! My background is math originally, so I had some of the tools in my tool box, but I'm mostly self-taught in computer science and machine learning. I read textbooks, research papers, code repos, but most importantly I build a lot of stuff. Once I'm excited about an idea I'll figure out how to become an expert to make it a reality. Over the years the skills start to compound, so it also helps that I'm an old man!


That demo is pretty slick. What happens when you go totally off book? Like, ask it to recite the digits of pi? Or if you become abusive? Will it call the cops?


It's trained to ignore everything else. That way background conversations are ignored as well (like your kids talking in the back of the car while you order).


How do you train for this?


Nice work, very cool!


> This demo is really underwhelming to me because of the multi-second latency between the query and response, just like every other lame voice assistant.

Yep - it needs to be ready as soon as I'm done talking and I need to be able to interrupt it. If those things can be done then it can also start tentatively talking if I pause and immediately stop if I continue.

I don't want to have to think about how to structure the interaction in terms of explicit call/response chain, nor do I want to have to be super careful to always be talking until I've finished my thought to prevent it from doing its thing at the wrong time.


The interruption is an important point, yeah. It's so annoying when Siri misunderstands again and starts rattling off a whole host of options, and keeps getting stuck in a loop if you don't respond.

In fact I'm really surprised these assistants are still as crap as they are. Totally scripted, zero AI. It seems like low-hanging fruit to implement an LLM, but none of the big three have done so. Not even sure about the fringe ones like Cortana and Bixby.


I mean Microsoft is planning to. Rolling out as soon as tomorrow.

https://youtu.be/5rEZGSFgZVY


Windows 11 Copilot is not really the same thing though. They don't do something like HomePods that you can have around your house.


The CallAnnie demo allows interruption, and it's such a leap forward compared to Siri.


Yeah when I was developing it, it quickly became apparent that I needed to be able to interrupt it. So I implemented that. Pretty easy to implement actually. Much harder would be to have the model interrupt the human. But I think it is actually desirable for natural conversation, so I do think a turn-taking model should be able to signal the LLM to interrupt the human.


> determining when the user is done talking is tough.

Sometimes that task is tough for the speaker too, not just the listener. Courteous interruptions or the lack thereof might be a shibboleth for determining when we are speaking to an AI.


Yes interruptions are key, both ways. Having the user interrupt the bot is easy, but to have the bot interrupt the human will again require a model to predict when that should happen. But I do believe it is desirable for natural conversation.


From prior experience, courteous interruption is a skill that a lot of humans find challenging at times too (myself included).


Can you share a github link to this? Where are you reducing the latency? Are you processing the raw audio to text? In my experience ChatGPT generation time is much faster than local Llama unless you're using something potato like a 7B model.


Unfortunately it has a really high "works on my machine" factor. I'm using Llama2-chat-13B via mlc-llm + whisper-streaming + coqui TTS. I just have a bunch of hardcoded paths and these projects tend to be a real pain to set up, so figuring out a nice way to package it up with its dependencies in a portable way is the hard part.

I'm mostly using llama2 because I wanted it to work entirely offline, not because it's necessarily faster, although it is quite fast with mlc-llm. Calling out to GPT-4 is something I'd like to add. I think the right thing is actually to have the local model generate the first few words (even filler words sometimes maybe) and then switch to the GPT-4 answer whenever it comes back.


Here's a link to a project that claims half second latency for the transcription part: https://github.com/gaborvecsei/whisper-live-transcription


I wonder when computers will start taking our intonation into account too. That would really help with understanding the end of a phrase. And there's SO MUCH information in intonation that doesn't exist in pure text. Any AI that doesn't understand that part of language will always still be kinda dumb, however clever it is.


You're right. Ultimately the only way this will really work is as an end-to-end model. Text will only get you so far. We could approximate it now with screenplay-like emotion annotations on text, which LLMs should both easily understand and be able to produce themselves (though you'd have to train a new speech recognition system to produce them). But end-to-end will be required eventually to reach human level fluency.


Don’t they do it already? There are a lot of languages where intonation is absolutely necessary to distinguish between some words, so I would be surprised that this not already taken into account by the major voice assistants.


In English, intonation changes the meaning of the word but not the word itself. From what I understand, in tonal languages tone changes the whole word. I don't think ML understands that difference yet.


Yeah they do. I was able to get ChatGPT-4 to transcribe 我哥哥高過他的哥哥, which says that they can. I did have to set the app to Chinese, and the original didn't work so I had to modify what I said slightly.

https://www.tiktok.com/t/ZT86psPxY/

Roughly translated, my older brother is taller than that other guy's older brother.


Of course speech recognition works for Chinese. What it doesn't do is transcribe intonation and prosody in non-tonal languages. It's not even clear how one would transcribe such a thing as I'm not aware of a standard notation.


IPA format should cover that, no?


Maybe? I thought IPA was just phonetic but I see that it does have some optional prosody stuff that could in theory cover some of it. I'm not sure how standard or complete it really is in practice.

I haven't heard of any large datasets of IPA transcripts of speech with the detail necessary to train a fully realistic STT->LLM->TTS system. If you know of some that would be interesting to look at.


Also curious to hear about your setup. Using whisper too? When I was experimenting with it there was still a lot of annoyance about hallucinations and I was hard coding some "if last phrase is 'thanks for watching', ignore last phrase"

I was just googling a bit to see what's out there now for whisper/llama combos and came across this: https://github.com/yacineMTB/talk

There's a demo linked on the github page that seems relatively fast at responding conversationally, but still maybe 1-2 seconds at times. Impressive it's entirely offline.


Lol yeah the hallucinations are a huge problem. Likely solvable, I think there are probably some bugs in various whisper implementations that are making the problem worse than it should be. I haven't really dug in on that yet though. I was hoping I could switch to a different STT model more designed for real time like Meta's SeamlessM4T but it's still under a non-commercial license and I did have an idea that I might want to try making a product sometime. I did see that yacine made that version but I haven't tried it so I don't know how it compares to mine.


Turn the volume on your microphone down and watch as Whisper just starts SCREAMING.


> It doesn't have to be this way!

Is there any extra work OpenAI's product might be doing that contributes to this latency and that yours isn't, considering the scale they operate at and any reputational risks to their brand?


If you're suggesting that OpenAI's morality filters are responsible for a significant part of their voice response latency, then no. I think that's unlikely to be a relevant factor.


Here's something with very little latency. https://www.bland.ai/


There needs to be an optional button that you hold while speaking and let go of when you are done. If the button is not held, it should auto-detect.


To me this is the cleanest and most efficient solution to the problem.

Tbh, ever since voice assistants landed I’ve wanted a handheld mic with a hardware button. No wake command, no (extra) surveillance, just snappy low-latency responses.


Do you have a rough design outline of what you built? I feel like we're on the cusp of something like this and it sounds amazing.


I'm using Llama2-chat-13B via mlc-llm @ 4bit quantization + whisper-streaming + coqui TTS, all running simultaneously on one 4090 in real time.

It didn't take long to prototype. Polishing and shipping it to non-expert users would take much longer than I've spent on it so far. I'd have to test for and solve a ton of installation problems, find better workarounds for whisper-streaming's hallucination issues, improve the heuristics for controlling when to start and stop talking, tweak the prompts to improve the suitability of the LLM responses for speech, fixup the LLM context when the LLM's speech is interrupted, probably port the whole thing to Windows for broader reach in the installed base of 4090s, possibly introduce a low-memory mode that can support 12GB GPUs that are much more common, document the requirements and installation process, and figure out hosting for the ginormous download it would be. I'd estimate at least 10x the effort I've spent so far on the prototype before I'd really be satisfied with the result.

I'd honestly love to do all that work. I've been prioritizing other projects because I judged that it was so obvious as a next step that someone else was probably working on the same thing with a lot more resources and would release before I could finish as a solo dev. But maybe I'm wrong...


> It didn't take long to prototype. Polishing and shipping it to non-expert users would take much longer than I've spent on it so far. I'd have to test for and solve a ton of installation problems

I've found some success at this by using Nix... but Nix is a whole 'nother ball of yarn to learn. It WILL get you to declarative/deterministic installs of any piece of the toolchain it covers, though, and it does a hell of a lot better job managing dependencies than anything in Python's ecosystem ever will (in fact, I am pretty sure that Python's being terrible at this is actually driving Nix adoption)

As an example of the power Nix might enable, check out https://nixified.ai/ (which is a project that hasn't been updated in some months and I hope is not dead... It does have some forks on Github, though). Assuming you already have Nix installed, you can get an entire ML toolchain up including a web frontend with a single command. I have dozens of projects on my work laptop, all with their own flake.nix files, all using their own versions of dependencies (which automatically get put on the PATH thanks to direnv), nothing collides with anything else, everything is independently updateable. I'm actually the director of engineering at a small startup and having our team's dev environments all controlled via Nix has been a godsend already (as in, a massive timesaver).

I do think that you could walk a live demo of this into, say, McDonald's corporate, and walk out with a very large check and a contract to hire a team to look into building it out into a product, though. (If you're going to look at chains, I'd suggest Wawa first though, as they seem to embrace new ordering tech earlier than other chains.)


I'm not the guy working on ordering, it's this guy https://news.ycombinator.com/user?id=TheEzEzz.

Nix sounds good for duplicating my setup on other machines I control. But I'd like a way to install it on user machines, users who probably don't want to install Nix just for my thing. Nix probably doesn't have a way to make self contained packages, right?


> But I'd like a way to install it on user machines, users who probably don't want to install Nix just for my thing. Nix probably doesn't have a way to make self contained packages, right?

I mean... That's the heart of the problem right there. You can either have all statically compiled binaries (which don't need Nix to run) which have no outside dependencies but result in a ton of wasted disk space with duplicate dependency data everywhere, or you can share dependencies via a scheme, of which the only one that makes real sense (because it creates real isolation between projects but also lets you share equal dependencies with zero conflicts) is Nix's (all of the others have flaws and nondeterminism).


I wish docker could be used more easily with graphic cards and other hardware peripherals (speakers/mic in this case). It would solve a lot of these issues.


all it has to do is add a random selection of "uhms" and "ahhs" and "mmm"


Actually I do think this is a good idea. For best latency there should be multiple LLMs involved, a fast one to generate the first few words and then GPT-4 or similar for the rest of the response. In the case that the fast model is unsure, it could absolutely generate filler words while it waits for the big model to return the actual answer. I guess that's pretty much how humans use filler words too!
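A minimal sketch of that "fast model opens, big model finishes" pattern with asyncio; the model callables and the speak() output are placeholders you'd wire in yourself, and the 300 ms budget is just an illustrative number:

```python
# Sketch: let a small local model (or a filler phrase) open the response while the
# bigger, slower model finishes in the background.
import asyncio

FILLER = "Hmm, let me check..."

async def respond(user_text, fast_llm, big_llm, speak):
    """fast_llm/big_llm are async callables returning text; speak is an async TTS
    callable. All three are placeholder hooks, not any particular library."""
    big = asyncio.create_task(big_llm(user_text))      # e.g. GPT-4 over the network
    fast = asyncio.create_task(fast_llm(user_text))    # small local model

    # Say whatever the fast model manages within ~300 ms, otherwise a filler word.
    try:
        opener = await asyncio.wait_for(fast, timeout=0.3)
    except asyncio.TimeoutError:
        opener = FILLER
    await speak(opener)

    # Then hand the turn over to the big model's answer once it arrives.
    await speak(await big)
```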


Unfortunately, Bark is probably way too slow to use for the TTS portion given the latency concerns or that would be covered.


Okay the bike example is cute and impressive, but the human interaction seems to be obfuscating the potentially bigger application.

With a few tweaks this is a general-purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but it is one of the hard problems solved.

Will we be seeing general-purpose robots performing simple labor powered by ChatGPT within the next half decade?


That bike example seemed a mix of underwhelming (for being the demo video) and even confusing.

1. It's not smart enough to recognize from the initial image this is a bolt style seat lock (which a human can).

2. The manual is not shown to the viewer, so I can't infer how the model knows this is a 4mm bolt (or if it is just guessing given that's the most likely one).

3. I don't understand how it can know the toolbox is using metric allen wrenches.

Additionally, is this just the same vision model that exists in Bing Chat?


The bike shown in the first image is Specialized Sirrus X. You can make out from the image of the manual that it says "spacer/axle/bolt specifications". Searching for this yields the following Specialized bike manual which is similar: https://www.manualslib.com/manual/1974494/Specialized-Epic-E... -- there are some notable differences, but the Specialized Sirrus X manuals that are online aren't in the same style.

The prior page (8) shows "SEAT COLLAR 4mm HEX" and, based on looking up seat collar in an image search, the part in question matches.

In terms of the toolbox, note that it only identified the location of the Allen wrench set. The advice was just "Within that set, find the 4 mm Allen (Hex) key". Had they replied with "I don't see any sizes in mm", the conversation could've continued with "Your Allen keys might be using SAE sizing. A compatible size will be 5/32, do you see that in your set?"


It bugged me that they made no mention of torque. The manual is really clear on that part with a big warning:

> WARNING! Correct tightening force on fasteners (nuts, bolts, screws) on your bicycle is important for your safety. If too little force is applied, the fastener may not hold securely. If too much force is applied, the fastener can strip threads, stretch, deform or break. Either way, incorrect tightening force can result in component failure, which can cause you to lose control and fall. Where indicated, ensure that each bolt is torqued to specification. The following is a summary of torque specifications in this manual...

The seat collar also probably has the max torque printed on it.

When they asked if they had the right tool, I would have preferred to see an answer along the lines of "ideally you should be using a torque wrench. You can use the wrench you have currently, but be careful not to over tighten."


> The seat collar also probably has the max torque printed on it.

Nope. There's no need for a torque wrench on that one.


Ah, good find. Yeah, I tried Bing and it is able to read a photo of that manual page and understand that the seat collar takes a 4mm hex wrench (though it hallucinated and told me the torque was 5 Nm, unlike the correct 6.2, suggesting table reading is imperfect).

Toolbox: I just found it too strong to claim you have the right tool, when it really doesn't know that. :)

In the end it does feel like the image reader is just bolted onto an LLM. Basically, just doing object recognition and dumping features into the LLM prompt.


Like a basic CLIP description: Tools, yellow toolbox, DEWALT, Allen wrenches, instruction manual. And then just using those keywords in the prompt. Yes, you’re right, it does feel like that.


A few of these wouldn't be possible with something like that. Look at the last picture, the graph analysis.

https://imgur.com/a/iOYTmt0


Yep. This example basically convinced me that they were unable to figure out anything actually useful to do with the model's new capabilities. Which makes me wonder how capable the new model in fact is.


Yah, pretty sure it is the same feature that's been in Bing Chat for 2 months now. Which really feels like there's only one pass of feature extraction from the image, preventing any detailed analysis beyond a coarse "what do you see". (Follow-up questions about things it likely didn't parse are highly hallucinated).

This is why they can't extract the seat post information directly from the bike when the user asks. There's no "going back and looking at the image".

Edit: nope, it's a better image analyzer than Bing


>Yah, pretty sure it is the same feature that's been in Bing Chat for 2 months now.

It's not. Feel free to try these queries:

https://twitter.com/ComicSociety/status/1698694653845848544?... (comic book page in particular, from a be my eyes user)

Or these https://imgur.com/a/iOYTmt0 (graph analysis in particular, last example) and see Bing fail them.


Right. It appeared that the response to the first image and question would have been the same if the image wasn't provided.

I wasn't impressed with the demo but we'll see what real world results get.



They are really good at keeping demos as demos


I don’t know, a lot of Google demos and papers introduce techniques that are productized fairly soon, just usually not by Google.


The implementation that manifests itself as an extremely creepy, downright concerning level of dubious moral transgressions isn't nearly as publicly glamorous as their tech demos.


It’s just a hiring article.


Hiring to produce more demos, to hire more to produce even more demos...


As long as they choose not to work elsewhere, it seems effective for Google.


Yes. As long as the hirees do some actual work in between producing demos, this even makes sense as a hiring approach.


I feel they could have used a more convincing example to be honest. Yeah it's cool it recognises so much but how useful is the demo in reality?

You have someone with a tool box and a manual (seriously who has a manual for their bike), asking the most basic question on how to lower a seatpost. My 5 year old kid knows how to do that.

Surely there's a better way to demonstrate the ground breaking impacts of ai on humanity than this. I dunno, something like how do I tie my shoelace.


> With a few tweaks this is a general purpose solver for robotics planning.

Yeah, but with an enormous ecological footprint.

Also, not suitable for small lightweight robots like drones.


Even on something the size of a car chatgpt won't be running locally, the car and drone are equally capable of hitting openai's API in a well connected environment.

What needs to happen with the response is a different matter though.


What's the ecological footprint of a human doing the same job? Especially when you factor in 18+ years of preparing.


Humans don't spend 18+ years preparing how to lower a seat post or drive a truck or even do pretty much most jobs. No one is solely training for 18 years to do anything.

Most of those 18 years are having a fucking great time (being young is freakin awesome) and living a great life is never a waste or a negative ecological footprint.

Society artificially slows education down so it takes 18 years to finish school because parents need to be off at work, so 18 years of baby sitting is preferred. By 18, kids are at the age where they will no longer be told what to do so it's off to the next waste of time, college, then 30 years of staring at a blinking box...or whatever.

When I was 12, I decided I wanted to drive a car, I'd never driven a car in my life, but I took my parents car and drove it around wherever I liked with absolutely no issue or prior instruction. I did this for years.

The youth are very capable, we just don't want them to be too capable...


This is what I'm most excited about. There's been a minor breakthrough recently: https://pressroom.toyota.com/toyota-research-institute-unvei...


There are already a few research demos.

For driving - https://wayve.ai/thinking/lingo-natural-language-autonomous-...


This announcement seems to have killed so many startups that were trying to do multi-modal on top of ChatGPT. The way it's progressing at solving use cases with images and voice, it's not far off from being the 'one app to rule them all'.

I can already see an "Alexa/Siri/Google Home" replacement and a "Google Image Search" replacement; ed-tech startups that were solving problems with AI by taking a photo are also doomed, and more will follow.


In retrospect, such startups should have been wary: they should have known that OpenAI had Whisper, and also that GPT-4 was designed with image modality. I wouldn't say that OpenAI "telegraphed" their intentions, but the very first strategic question should have been, "Why isn't OpenAI doing this already, and what do we do if they decide to start?"


>I wouldn't say that OpenAI "telegraphed" their intentions

They did telegraph it, they showed the multimodal capabilities back in the GPT4 Developer Livestream[0] right before first releasing it.

0. https://youtu.be/outcGtbnMuQ?t=943


Yeah I remember watching that and thinking, oh, I know a cool app idea: what if you just take a video of what food is in your kitchen and ChatGPT will create a recipe for you? I went to the docs and that was literally the example they gave.

I think the only place where plugins will make sense is for real-time things like booking travel or searching for sports/stock market/etc. type information.


I have a home-spun version of ChatGPT that uses function calling to connect to my emails, calendar, and notes. This is really useful because I can say "Bob just emailed me to set up a call. Respond to Bob with some available times from my calendar."

That will be the real use case for plug-ins.
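For anyone curious, the shape of this with OpenAI's function calling looks roughly like the sketch below (openai Python SDK 0.x style); the get_free_slots schema and helper are hypothetical stand-ins, not the parent's actual setup:

```python
# Rough sketch of the function-calling pattern described above.
import json
import openai  # assumes OPENAI_API_KEY is set in the environment

FUNCTIONS = [{
    "name": "get_free_slots",
    "description": "Return free time slots from my calendar for a given day",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "ISO date, e.g. 2023-09-26"},
        },
        "required": ["date"],
    },
}]

def get_free_slots(date: str) -> list[str]:
    # Hypothetical stand-in: a real version would query your calendar API here.
    return ["10:00-10:30", "14:00-15:00"]

def ask(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    reply = openai.ChatCompletion.create(
        model="gpt-4", messages=messages, functions=FUNCTIONS
    )["choices"][0]["message"]

    # If the model chose to call the tool, run it and feed the result back.
    if reply.get("function_call"):
        args = json.loads(reply["function_call"]["arguments"])
        result = get_free_slots(**args)
        messages += [reply, {"role": "function", "name": "get_free_slots",
                             "content": json.dumps(result)}]
        reply = openai.ChatCompletion.create(
            model="gpt-4", messages=messages
        )["choices"][0]["message"]
    return reply["content"]
```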


It would be hard to be more explicit than doing a demo of multi-modality in GPT-4, and having an audio API that is amazing and that you can use right now, for pennies.

It would be interesting to know if this really changed anything for anyone (competitors, VCs) for that reason. It's like the efficient market hypothesis applied to product roadmaps.


It is interesting that these startups did not recognize that the image modalities already existed, as evidenced by their initial GPT-4 announcement underneath “visual capabilities” [1].

[1] https://openai.com/research/gpt-4


Seems nobody learns from Sherlock.


Talking to Google and Siri has been positively frustrating this year. On long solo drives, I just want to have a conversation to learn about random things. I've been itching to "talk" to chatGPT and learn more (french | music theory | history | math | whatever) all summer. This should hit the spot!


Voice assistants have always been a half-complete product. They were shown off as a cool feature, but then never integrated in a way that made them useful.

The two biggest features I want are for the voice assistants to read something for me, and to do something on Google/Apple Maps hands-free. Neither of these ever works. "Siri / OK Google, add the next gas station on the route" or "take me to the Chinese restaurant in Hoboken" seem like very obvious features for a voice assistant with a map program.

The other is why can I tell Siri to bring up the Wikipedia page for George Washington but I can’t have Siri read it to me? I am in the car, they know that, they just say “I can’t show you that while you’re driving”. The response should be “do you want me to read it to you?”


In the current world:

Me: “OK Google, take me to the Chinese restaurant in Hoboken”

Google Assistant: “Calling Jessica Hobkin”.


You forgot the third brand name.

The pattern for current world's voice assistants is: ${brand 1}, ${action} ${brand 2} ${joiner} ${brand 3}.

So, "OK Google, take me to Chinese restaurant in Hoboken using Google Maps".

Which is why I refuse to use this technology until the world gets its shit together.


This reminds me of ordering at a drive through with a human at times:

"I'd like an iced tea" "An icee?" "No an iced tea" "Hi-C?"


> ok google add the next gas station on the route

I say "ok google, add a stop for gas" a lot, and it works well for me.


Sometimes google assistant will answer a query I thought for sure it would fail on with a really good answer and other times it will fail the most basic of commands. It's frustrating.


I've replaced my voice google assistant searches with the voice feature of the Bing app. It's a night and day difference. Bing voice is what I always expected from an AI companion of the future, it is just lacking commands -- setting tasks, home automation, etc.


I got sick of searching Google for in-game recipes for Disney Dreamlight because most of the results are a bunch of pointless text, and then finally the recipe hidden in it somewhere.

I used Bing yesterday and it was able to parse out exactly what I wanted, and then give me idiot-proof steps to making the recipe in-game. (I didn't need the steps, but it gave me what I wanted up front, easily.) I tried it twice and it was awesome both times. I'll definitely be using it in the future.


It almost sounds like their assistant and their search engine have the same problem! Years of SEO optimized garbage has polluted search and the data streams it feeds to their other products. I have a concern that soon the mess will turn into AI-optimized trash, with what is essentially data poisoning to get the AI to shovel the fake content instead.


> I got sick of searching Google for in-game recipes for Disney Dreamlight

You mean these? Took me a few seconds to find, not sure how an LLM would make that easier. I guess the biggest benefit of LLM then is for people who don't know how to find stuff.

https://dreamlightvalleywiki.com/Cooking


Yes, but each time, I only actually care about 1 recipe, and it's easier to just search for that recipe than find a list of recipes and then search through that.

Bing made it even easier.

Also, I've found some of those lists to be missing some recipes.


Did you find a way to do this seamlessly including being able to say something like "Hey Bing", or do you just have a shortcut or widget for this?


No. At least on Android there is no system shortcut that takes you directly to the voice feature yet. For now, I'm using the widget.


Thanks. One day soon, hopefully.


Precisely this. Once someone figures out how to get something like GPT integrated with actual products like smart home devices, with the same access levels as Siri/Google Assistant, it will be the true voice assistant experience everyone has wanted.


My prediction on this is eventually the LLMs will just write and execute scripts directly to control things.

Imagine if iOS had something like AppleScript and all apps exposed and documented their endpoints. LLMs would be able to trivially solve problems that the best voice assistants today cannot handle.

Then again none of the current assistants can handle all that much. "Send Alex P a meeting invite tomorrow for a playdate at the Zoo, he's from out of town so include the Zoo's full address in the invite".

"Find the next mutual free slot on the team's calendar and send out an invite for a zoom meeting at that time".

These are all things that voice assistants should have been doing a decade ago, but I presume they'd have required too much one off investment.

Give an LLM proper API access and train it on some example code, and these problems are easy for it to solve. Heck I bet if you do enough specialized training you could get one of the tiny simple LLMs to do it.


OpenInterpreter


Agreed. After using ChatGPT at all Siri is absolutely frustrating.

Example from a couple days ago:

Me, in the shower so not able to type: "Hey Siri, add 1.5 inch brad nails to my latest shopping list note."

Siri: "Sorry, I can't help with that."

... Really, Siri? You can't do something as simple as add a line to a note in the first-party Apple Notes app?


That’s extra frustrating because Siri absolutely had that functionality at some point in the past, and may even still have it if you say the right incantation. Those incantations change in unpredictable and unknowable ways though.


appending to a text file, what do you think this is - unix?


I still don't understand how you can talk to something that doesn't provide factual information and just take it at face value?

The other day I asked it about the place I live and it made up nonsense, I was trying to get it to help me with an essay and it was just wrong, it was telling me things about this region that weren't real.

Do we just drive through a town, ask for a made up history about it and just be satisfied with whatever is provided?


What LLMs have made me realize more than anything is that we just don't care that much about the information we receive being completely factual.

I have tried to use it many times to learn a topic, and my experience has been that it is either frustratingly vague or incorrect.

It's not a tool that I can completely add to my workflow until it is reliable, but I seem to be the odd one out.


> What LLMs have made me realize more than anything is that we just don't care that much the information we receive being completely factual.

I find this highly concerning but I feel similar.

Even "smart people" I work with seem to have gulped down the LLM Kool-Aid because it's convenient and it's "cool".

Sometimes I honestly think: "just surrender to it all, believe in all the machine tells you unquestionably, forget the fact checking, it feels good to be ignorant... it will be fine...".

I just can't do it though.


> just surrender to it all, believe in all the machine tells you unquestionably, forget the fact checking, it feels good to be ignorant... it will be fine...

It's the same issue with Google Search, any web page, or, heck, any book. Fact checking gets you only so far. You need critical thinking. It's okay to "learn" wrong facts from time to time as long as you are willing to be critical and throw the ideas away if they turn out to be wrong. I think this Popperian view is much more useful than living with the idea that you can only accept information that is provably true. Life is too short to verify every fact. Most things outside programming are not even verifiable anyway. By the time that Steve Jobs would have "verified" that the iPhone was certainly a good idea to pursue, Apple might have been bankrupt. Or in the old days, by the time you have verified that there is a tiger in the bush, it has already eaten you.


There's a lot of truth in this comment and a lot that I wholeheartedly agree with.

When I spend time on something that turns out to be incorrect, I would prefer it to be because of choice I made instead of some random choice made by an LLM. Maybe the author is someone I'm interested in, maybe there's value in understanding other sides of the issue, etc. When I learn something erroneous from an LLM, all I know is that the LLM told me.


The issue is far more serious with ChatGPT/similar models because things that are laughably untrue are delivered exactly the same as something that's solidly true. When doing a normal search I can make some assessment on the quality of the source and the likelihood the source is wrong.

People should be able to "throw the ideas away if they turn out to be wrong", but the problem is these ideas, consciously or not, help build your model of the world. Once you find out something isn't true it's hard to unpick your mental model of the world.


> Once you find out something isn't true it's hard to unpick your mental model of the world.

Intuitively, I would think the same, but a book about education research that I read and my own experience taught me that new information is surprisingly easy to unlearn. It's probably because new information sits at the edges of your neural networks and does not yet provide a foundation for other knowledge. This will only happen if the knowledge stands the test of time (which is exactly how it should be according to Popper). If a counterexample is found, then the information can easily be discarded since it's not foundational anyway, and the brain learns the counterexample too (the brain is very good at remembering surprising things).


That presumes the wrong information is corrected quickly. What about the cases when that doesn't happen? Aren't you often finding out things you thought were true from years ago are wrong?


You weigh new information by how confident you are in it. You try to check different sources, you maintain an open-mind, etc. In that, ChatGPT is just an additional low-reliability source of information.


I just verify the information I need. I find it useful as a sort of search engine for solutions. Like, how could I use generators as hierarchical state machines? Are there other approaches that would work? What are some issues with these solutions? Etc. By the end I have enough information to begin searching the web for comparisons, other solutions, and so on.

The benefit is that I got a quick look at various solutions and quickly satisfied a curiosity, and decided if I’m interested in the concept or not. Without AI, I might just leave the idea alone or spend too much time figuring it out. Or perhaps never quite figure out the terms of what I’m trying to discover, as it’s good at connecting dots when you have an idea with some missing pieces.

I wouldn’t use it for a conversation about things as others are describing. I need a way to verify its output at any time. I find that idea bizarre. Just chatting with a hallucinating machine. Yet I still find it useful as a sort of “idea machine”.


I think this is a fine use case though because you're doing your due diligence. The problems arise when you don't do this.

I think even if an AGI were created, and humans survived this event, I'd still have trouble trusting it.

The quote "trust but verify" is everything to me.


The smart people I've seen using ChatGPT always double check the facts it gives. However, the truth is that RLHF works well to extinguish these lies over time. As more people use the platform and give feedback, the thing gets better. And now, I find it to be pretty darn accurate.


> The smart people I've seen using ChatGPT always double check the facts it gives.

I don't like being told lies in the first place and having to unlearn it.

It doesn't help that I might as well have just gone straight to the "verification" instead.


I don't know. The other day I was asking about a biology topic and it straight up gave me a self-contradicting chemical reaction process description. It kept doing that after I pointed out the contradiction. Eventually I got out of this hallucination loop by resetting the conversation and asking again.

It's smart but can also be very dumb.


I see this conversation pretty frequently and I think the root of it lies in the fact that we have mental heuristics for determining whether we need to fact check another human because they are a bullshitter, an idiot, a charlatan etc, but most people haven’t really developed this sense for AIs.

I think the current state of AI trustworthiness (“very impressive and often accurate but occasionally extremely wrong”) triggers similar mental pathways to interacting with a true sociopath or pathological liar for the first time in real life, which can be intensely disorienting and cause one to question their trust in everyone else, as they try to comprehend this type of person.


Ignorance is Strength


I think this post-factual attitude is stronger and more common in some cultures than others. I'm afraid to say but given my extensive travels it appears American culture (and its derivatives in other countries) seems to be spearheading this shift.


Warning, my opinion ahead:

I think it's because Americans, more than nearly all other cultures, love convenience. It's why the love for driving is so strong in the US. Don't walk or ride, drive.

Once I was walking back from the grocer in Florida with 4 shopping bags, and people pulled over and asked if my car had broken down and if I needed a ride, people were stunned...I was walking for exercise and for the environment...and I was stunned.

More evidence of this trend can be seen in the products and marketing being produced:

Do you need to write a wedding speech? Click here.

Do you need to go get something from the store? get your fat ass in the car and drive, better yet, get a car that drives for you? Better than this, we'll deliver it with a drone...don't move a muscle.

Don't want to do your homework? Here...

Want to produce art? Please enter your prompt...

Want to lose weight? We have a drug for that...

Want to be the authority on some topic? We'll generate the facts you need.


I've also identified convenience as a core factor. Another dynamic at play is this:

As convenience in a domain becomes ubiquitous or at least expected among consumers, they quickly readjust their evaluation of "having time for X" around the new expectation of the convenient service, treating all alternatives as positive opportunity cost. This would explain a lot of those folks who are upset when it's suggested that they don't need Amazon, Instacart, etc. in their lives if they are to do something about their contributions to mass labor exploitation.

Of course these conveniences quickly become ubiquitous in large economies with a glut of disposable income, which encourages VCs to dump money into these enterprises so they're first to market, and also to encourage the public to believe that the future is already here and there's no reason to worry about backsliding or sustainability of the business model. Yet in every single case we see prices eventually rise, laborers squeezed, etc. A critical mass of people haven't yet acknowledged this inevitability, in no small part due to this fixation on convenience at the expense of more objective, reasoned understandings (read: post-truth mindset).


I agree with this, but I think there is a deeper level which explains this. And that is convenience is a product. The thing that truly defines how corporations in America have shaped our culture is that everything is turned into a way to sell you something.


Sorry but this is actually what I meant, it's all about convenience, AI is another convenience product.


> It's not a tool that I can completely add to my workflow until it is reliable, but I seem to be the odd one out.

This. I hate being told the wrong information because I will have to unlearn the wrong information. I would rather have been told nothing.


ChatGPT 3.5 is terrible on technical subjects IME. Phind is best for me rn. Hugging Chat (Llama) works quite well too.

They're only good on universal truths. An amalgam of laws from around the globe doesn't tell me what the law is in my country, for example.


This is a fairly perpetual discussion, but I'll go for another round:

I feel like using LLM today is like using search 15 years ago - you get a feel for getting results you want.

I'd never use chatGPT for anything that's even remotely obscure, controversial, or niche.

But through all my double-checking, I've had a phenomenal success rate in getting useful, readable, valid responses to well-covered / documented topics such as introductory French, introductory music theory, and well-covered & non-controversial history and science.

I'd love to see the example you experienced; if I ask chatGPT "tell me about Toronto, Canada", my expectation would be to get high accuracy. If I asked it "Was Hum, Croatia, part of the Istrian liberation movement in the seventies", I'd have far less confidence - it's a leading question, on a less covered topic, introducing inaccuracies in the prompt.

My point is - for a 3 hour drive to cottage, I'm OK with something that's only 95% accurate on easy topics! I'd get no better from my spouse or best friend if they made it on the same drive :). My life will not depend on it, I'll have an educationally good time and miles will pass faster :).

(also, these conversations always seem to end in suffocatingly self-righteous "I don't know how others can live in this post-fact free world of ignorance", but that has a LOT of assumptions and, ironically, non-factual bias in it as well)


> I feel like using LLM today is like using search 15 years ago - you get a feel for getting results you want.

I don't think it's quite the same.

With search results, aka web sites, you can compare between them and get a "majority opinion" if you have doubts - it doesn't guarantee correctness but it does improve the odds.

Some sites are also more reputable and reliable than others - e.g. if the information is from Reuters, a university's courseware, official government agencies, ... etc. it's probably correct.

With LLMs you get one answer and that's it - although some like Bard provide alternate drafts but they are all from the same source and can all be hallucinations ...


>although some like Bard provide alternate drafts but they are all from the same source and can all be hallucinations ...

Yes and no. If the LLM is repeating the same thing on multiple drafts then it's very unlikely to be a hallucination.

It's when multiple generations are all saying different things that you need to take notice.

LLMs hallucinate yes but getting the same hallucination multiple times is incredibly rare.


Then why aren’t hallucinations being eliminated by comparing drafts?


Automatically comparing drafts for every single query would be expensive.

And that wouldn't eliminate hallucinations, just tell you whether large details have likely been hallucinated.

But it's a method some research has used.

https://arxiv.org/abs/2303.08896
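The gist of that paper is "sample the model several times and measure agreement". A toy version might look like the sketch below; sample_llm is a placeholder, and the paper uses much stronger similarity measures (BERTScore, QA, NLI) than plain string matching:

```python
# Toy sketch of a sampling-based consistency check for hallucination detection.
from difflib import SequenceMatcher

def consistency_score(prompt: str, sample_llm, n_samples: int = 5) -> float:
    """Higher score = the samples agree with each other = less likely hallucinated.
    sample_llm is any callable that returns one stochastic completion per call."""
    main = sample_llm(prompt)
    samples = [sample_llm(prompt) for _ in range(n_samples)]
    # Cheap stand-in for the paper's similarity measures, just to show the shape.
    sims = [SequenceMatcher(None, main, s).ratio() for s in samples]
    return sum(sims) / len(sims)
```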


How expensive could it be? Google Bard, a free service, offers the drafts for free. Just do the comparison on the user’s machine if the LLM provider is that cheap.

P.S. Also aren’t LLMs deterministic if you set their “temperature” to zero? Are there drafts if the temperature is zero? If not, then that’s the same as removing the randomness no?


The drafts have to be evaluated either by a human or an LLM. Doing that for every request does not scale when you have millions of users.

>Just do the comparison on the user’s machine if the LLM provider is that cheap.

This is not possible. Users don't have the resources to run these gigantic models. LLM inference is not cheap. OpenAI and Google aren't running a profit on free ChatGPT or Bard.

>P.S. Also aren’t LLMs deterministic if you set their “temperature” to zero? Are there drafts if the temperature is zero? If not, then that’s the same as removing the randomness no?

It's not a problem of randomness. A temp of 0 doesn't reduce hallucinations. LLMs internally know when they are hallucinating/taking a wild guess. Randomness influences how that guess manifests each time, but the decision to guess was already made.

https://arxiv.org/abs/2304.13734


> a temp of 0 doesn't reduce hallucinations.

I never said it did.

> LLMs internally know when they are hallucinating/taking a wild guess.

No they don’t. If they did we would be able to program them to not do so.

I would argue that wild guesses are all LLMs are doing. They practically statistically guess their way to an answer. It works surprisingly well a lot of the time but they don’t really understand why they are right/wrong.

P.S. LLMs are kind of like students who didn’t study for the test so they use “heuristics” to guess the answer. If the test setter is predictable enough, the student might actually get a few right.


Wait, is that true? I feel like that claim needs a lot of disclaimers.


https://arxiv.org/abs/2305.18248

"In particular, we find that LMs often hallucinate differing authors of hallucinated references when queried in independent sessions, while consistently identify authors of real references. This suggests that the hallucination may be more a generation issue than inherent to current training techniques or representation."

https://arxiv.org/abs/2303.08896

"SelfCheckGPT leverages the simple idea that if a LLM has knowledge of a given concept, sampled responses are likely to be similar and contain consistent facts. However, for hallucinated facts, stochastically sampled responses are likely to diverge and contradict one another."


Exactly this! This is my experience also. Your point about "well covered & non-controversial" is spot on. I know not to expect great results when asking about topics that have very little coverage. To be honest, I wouldn't expect to go to an arbitrary human and get solid answers on a little-covered topic, unless that person just happened to be a topic expert. There is so much value in having the basics-to-intermediate levels of topics covered in a reliable way. That's where most commercial activity occurs.


I think a key difference is that humans very rarely sound convincing talking about subjects they have no clue about.

I've seen the hallucination rate of LLMs improve significantly, if you stick to well covered topics they probably do quite well. The issue is they often have no tells when making things up.


Joe Rogan has made tons of money off talking without providing factual information. Hollywood has also made tons of money off movies "inspired by real events" that hallucinate key facts relevant to the movie's plot and characters. There's a huge market for infotainment that is "inspired by facts" but doesn't even try to be accurate.


You listen to Joe Rogan with the idea that this is a normal dude talking not an expert beyond martial arts and comedy.

A person who uses ChatGPT must have the understanding that it's not like Google search. The layman, however, has no idea that ChatGPT can give coherent incorrect information and treats the information as true.

Most people won't use it for infotainment and OpenAI will try its best to downplay the hallucination as fine print if it goes fully mainstream like google search.


Give people more credit. If you're using an AI these days, you have to know it hallucinates sometimes. There's even a warning about it when you log in.


There's a contingent of the population passing videos around on tiktok genuinely concerned that AIs have a mind of their own

no I will not give the public credit, most people have no grounding to discern wtf a language model is and what it's doing, all they know is computers didn't use to talk and now they do


I'll give tech people credit, but non-tech people I'm not so sure. A good example is the cookie permissions or app permissions. A great number of non-tech people don't even know or care what they mean.


You gotta stop bucketing people like that. People may not know the terms "cookie permissions" or "app permissions" but they sure as fuck understand the idea of user tracking or handing a company access to your mic/camera. And to say they don't care about these things is simply not true.


Which people? If you are software engineers or AI researchers, sure. Otherwise, it probably won't matter to you.


Wait until you learn about the mainstream media.


For a certain demographic and generation, Joe Rogan is the mainstream media.


Rogan is literally the largest podcast on Spotify. It's the definition of mainstream.


If that’s your benchmark, I don’t want your AI.


OpenAI isn't marketing ChatGPT as, "infotainment."


Now that you mention it, a big "for entertainment purposes only" banner like they used to have on all the psychic commercials on TV would not be inappropriate. It's incredible that LLMs are being marketed as general-purpose assistants with a tiny asterisk, "may contain inaccuracies", like it's a walnut contamination warning.


Not sure what's being incredible here. GPT-4 is a stellar general-purpose assistant, that shines when you stop treating it as encyclopedia, and start using it as an assistant. That is, give it tasks, like summarizing, or writing code, or explaining code, or rewriting prose. Ask for suggestions, ideas. You can do that to great effect, even when your requests are underspecified and somewhat confused, and it still works.


I just wish they were advertised for generative tasks and not retrieval tasks. It's not intelligence, it's not reasoning, it's text transformation.

It seems to be able to speak on history, sometimes it's even right, so there's a use case that people expect from it.

FYI I've used GPT4 and Claude 2 for hundreds of conversations, I understand what its good and bad at; I don't trust that the general public is being given a realistic view.


In my experience, LLMs are not about being provided facts. They are about synthesizing new content and insights based on the model and inputs.

Rather than asking it about facts, I find it useful to derive new insights.

For example: "Tell me 5 topics about databases that might make it to the front page of Hacker News." It can generate an interesting list. That is much more like the example they provided in the article; synthesizing a bedtime story is not factual.

Also, "write me some python code to do x" where x is based on libraries that were well documented before 2022 also has similarly creative results in my experience.


> how you can talk to something that doesn't provide factual information and just take it at face value

Like talking to most people you mean?


When OpenAI buys me a drink at the bar in exchange for the rubbish it produces, I might have a more favourable view.


As soon as they release the API, we can build an AI "bartender". Combine the voice output and input with NeRF talking heads such as from Diarupt or https://github.com/harlanhong/awesome-talking-head-generatio....

You will now be able to feed it images and responses of the customers. Give it a function to call, complementaryDrink(customerId), and combine it with a simple vending-machine-style robot or something more complex that can mix drinks (a rough sketch of the function-calling part is at the end of this comment).

I'm not actually in a hurry to try to replace bartenders. Just saying these types of things immediately become more feasible.

You can also see the possibilities of the speech input and output for "virtual girlfriends". I assume someone at OpenAI must have been tempted to train a model on Scarlett Johansson's voice.
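For the function-calling part at least, the existing chat completions API already supports this. A minimal sketch, assuming the hypothetical complementaryDrink function from above; the voice/vision pieces and the robot side are not shown, and the prompt wording is made up:

    import json
    import openai  # assumes the openai Python package (0.x-style API) and OPENAI_API_KEY in the environment

    # The commenter's hypothetical function, described so the model can decide to call it.
    functions = [{
        "name": "complementaryDrink",
        "description": "Offer a free drink to a customer.",
        "parameters": {
            "type": "object",
            "properties": {"customerId": {"type": "string"}},
            "required": ["customerId"],
        },
    }]

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a friendly robot bartender. Offer a free drink when it seems warranted."},
            {"role": "user", "content": "Customer 42 just complained their last order took twenty minutes."},
        ],
        functions=functions,
        function_call="auto",
    )

    message = response["choices"][0]["message"]
    if message.get("function_call"):
        args = json.loads(message["function_call"]["arguments"])
        print("Model wants to call complementaryDrink with", args)  # e.g. {"customerId": "42"}

The model decides whether to call the function; your own code then actually dispenses the drink.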


Hopefully people know not to ask others for factual information (unless it's an area they're actually well educated/knowledgeable in), but for opinions and subjective viewpoints. "How's your day going", "How are you feeling", "What did you think of X", etc, not "So what was the deal with the Hundred Years' War?" or whatever.

If people are treating LLMs like a random stranger and only making small talk, fair enough, but more often they're treating it like an inerrable font of knowledge, and that's concerning.


> If people are treating LLMs like a random stranger and only making small talk, fair enough, but more often they're treating it like an inerrable font of knowledge, and that's concerning.

That's on them. I mean, people need to figure out that LLMs aren't random strangers, they're unfiltered inner voices of random strangers, spouting the first reaction they have to what you say to them.

Anyway, there is a middle ground. I like to ask GPT-4 questions within my area of expertise, because I'm able to instantly and instinctively - read: effortlessly - judge how much to trust any given reply. It's very useful this way, because rating an answer in your own field takes much less work than coming up with it on your own.


No individual is "most people". Most of the time I spend talking to people in real life is with people whose professional expertise, hobbies, and other sources of knowledge I know at least roughly. I have an idea of how good they are at evaluating what they know, how honest they are, and whether they are prone to wishful thinking.


> I still don't understand how you can talk to something that doesn't provide factual information and just take it at face value?

All human interactions from all of history called and they …


I'm curious if you're using GPT-4 ($). I find a lot of the criticisms about hallucination come from users who aren't, and my experience with GPT-4 is that it's far less likely to make stuff up. Does it know all the answers? Certainly not, but it's self-aware enough to say "sorry, I don't know" instead of making a wild guess.


You can also prompt it to hold back if it doesn't know, which seems to make a difference (a rough sketch below). It's part of my default prompt, and since I added it I haven't had any overt hallucinations. I still get invalid code, but not due to crazy errors; just syntax and inconsistent naming, mostly.

I verify just about everything that I ask it, so it isn’t just a general sense of improvement.
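A minimal sketch of that kind of "admit uncertainty" instruction via the chat completions API; the exact wording and the question are invented for illustration, not anyone's actual default prompt:

    import openai  # assumes OPENAI_API_KEY in the environment

    # Hypothetical wording; the point is to ask for "I don't know" instead of a guess.
    system_prompt = (
        "If you are not confident in an answer, say you don't know instead of guessing. "
        "Do not invent function names, flags, or citations."
    )

    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": "Does Python's standard library include a TOML parser?"},
        ],
    )
    print(response["choices"][0]["message"]["content"])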


Why would anyone pay for something if the free trial doesn't work? "Hey, you know how we gave you a product that doesn't quite work as you expect and is super frustrating? Just pay us money, and we'll give you the same product, but it just works. Just trust us!"


GPT-4 is not the same product. I know it seems like it due to the way they position 3.5 and 4 on the same page, but they are really quite separate things. When I signed up for ChatGPT plus I didn't even bother using 3.5 because I knew it would be inferior. I still have only used it a handful of times. GPT-4 is just so much farther ahead that using 3.5 is just a waste of time.


Would you mind sharing some threads where you thought ChatGPT was useful? These discussions always feel like I’m living on a different planet with a different implementation of large language models than others who claim they’re great. The problems I run into seem to stem from the fundamental nature of this class of products.


The usefulness of ChatGPT is a bit situational, in my experience. But in the right situations it can be pretty powerful.

Take a look at https://chat.openai.com/share/41bdb053-facd-448b-b446-1ba1f1... for example.


A great example. Here's a similar one from me: https://cloud.typingmind.com/share/d2000ffc-a1bf-4b71-b59d-c....

Context: had a bunch of photos and videos I wanted to share with a colleague, without uploading them to any cloud. I asked GPT-4 to write me a trivial single-page gallery that doesn't look like crap, feeding it the output of `ls -l` on the media directory, got it on the first shot, copy-pasted and uploaded the whole bundle to a personal server - all in a few minutes. It took maybe 15 minutes from the idea of doing it first occurring to me, to a private link I could share.

I have plenty more of those touching C++, Emacs Lisp, Python, generating vCARD and iCalendar files out of blobs of hastily-retyped or copy-pasted text, etc. The common thread here is: one-off, ad-hoc requests, usually underspecified. GPT-4 is quite good at being a fully generic tool for one-off jobs. This is something that never existed before, except in form of delegating a task to another human.


Here's a convo I had yesterday when thinking about how to print a Binary Search Tree.

https://chat.openai.com/share/338e7397-0201-44f4-a2c3-75b733...

I use ChatGPT for all sorts of things - looking into visas for countries, coding, reverse engineering companies from job descriptions, brainstorming etc etc.

It saves a lot of time and gives way more value than what you pay for it.


I agree that none of the problems people have mentioned above happen with GPT4.

It used to be more reliable when web browsing worked, but it's still pretty reliable.


A human driving buddy can make up a lot of stuff too. Have an interesting conversation but don't take it too seriously. If you're really researching something serious then take a mental note to double check things later, pretend as if you're talking to a semi-reliable human who knows a lot but occasionally makes mistakes.


> ...talk to something that doesn't provide factual information and...

Ah yes, I don't understand how to talk to people either!


I always thought a better future would be full of more and more distilled, accurate, useful knowledge and truthful people to promote that.

Comments like yours make me think that no one cares about this...and judging by a lot of the other comments, I guess they don't.

It's probably going to be people wading through a sea of AI-generated shit, where the individual is supposed to just forever "apply critical thinking" to it all. Even a call from one's spouse could be fake, and you'll just have to apply critical thinking or whatever to work out if you were scammed or not.


There aren't any real world sources of truth you can avoid applying critical thinking to. Much published research is false, and when it isn't, you need to know when it's expired or what context it's valid in.


But do we need 9999999x the amount of information to be critically thinking about? Is that going to be helpful?


Because it doesn't always make up stuff. Because I'm a human and can ask for more information. I don't want an encyclopedia on a podcast. I want to "talk" to someone about stuff. Not have an enumerated list of truths firehosed at me.


Pay for the Plus version.

Then it makes stuff up far less frequently.

If the next version has the same step up in performance, I will no longer consider inaccuracy an issue - even the best books have mistakes in them, they just need to be infrequent enough.


> Pay for the Plus version.

> Then it makes stuff up far less frequently.

Now there's a business model for a ChatGPT-like service.

$1/month: Almost always wrong

$10/month: 50/50 chance of being right or wrong

$100/month: right 95% of the time


You make it sound like business shenanigans, but the truth is, it's a natural fit for now, as performance of LLMs improves with their size, but costs of training (up-front investment) and inference (marginal, per-query) also go up.


Pay for the $1/month version and invert the responses; now you have the $100/month one for cheap :D


Are you using 3.5 or 4?


I've wanted a ChatGPT Pod equivalent to a Google Home pod for a while! I have been intending to build it at some point. I am with you, talking to Google sucks.

"Hey Google, why do ____ happen?" "I'm sorry, I don't know anything about that"

But you're GOOGLE! Google it! What the heck lol

So yeah, ChatGPT being able to hear what I say and give me info about it would be great! My holdup has been wakewords.


We have hardware and wake words:

https://heywillow.io/

Our REST endpoint can talk to whatever you want and we’ll have native ChatGPT soon.


Ooooo looking, thanks for the recommendation!


It’s funny. Driving buddy has been my number one use case for a while now.

Still can’t quite make it work. I feel like I could learn a lot if I could have random conversations with GPT.

+ bonus if someone else in the car got excited when I see cows. Don’t care if it’s an AI.


Try Pi AI. They have an app that can be voice/audio driven. Works well for the driving buddy scenario.

https://pi.ai/talk


You could have a simulation of learning a lot by chatting with GPT; why you would take it as truth without an equal portion of salt is beyond me.


I assume you have never heard of podcasts.


I'm sure one can talk to their podcasts, but I would be worried if they ever answered me back.


you can ask podcasts questions? and they answer you?


no, but they don't get the answer wrong 20% of the time and give off 100% correctness vibes.


No, generally podcasts are far worse than that...


neither does a tuba.

why be mad at a hammer if you hit your thumb with it?


It increasingly feels to me like building any kind of general-use AI tool or app is a bad choice. I see two viable AI business models:

1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.

2. Integration - If you're going to build on an existing AI model, don't focus on adding more capabilities. Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible. This adds a lot of value and isn't something that companies developing AI models are liable to do themselves.

The two will often go hand-in-hand.


> building any kind of general-use AI tool or app is a bad choice

Maybe not if you rely on models that can be run locally.

OpenAI is big now, and will probably stay big, but with hardware acceleration, AI-anything will become ubiquitous and OpenAI won’t be able to control a domain that’s probably going to be as wide as what computing is already today.

The shape of what’s coming is hard to imagine now. I feel like the kid I was when I got my first 8-bit computer in the eighties: I knew it was going to change the world, but I had little idea how far, wide and fast it would be.


There are plenty of open-source models being released - there's going to be a steadily increasing quantity + quality of models you can run locally. I don't think it's a good place to compete.


Re: local models, are you thinking about privacy-oriented use cases, say HIPAA?

any pertinent examples?


If you focus on integration, you're up against AutoGPT, Gorilla, etc.


AutoGPT isn't remotely usable for practical enterprise software purposes right now.


Agreed, not yet. It will get there.


> Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible

Why wouldn't a company do that themselves, e.g. how Intercom has vertically integrated AI? Any examples?


It's classic build vs. buy. Companies tend to build their own products and use third party software for internal tools.

Just look at Salesforce AppExchange - it's a marketplace of software built on top of Salesforce, a large chunk of which serves to integrate other systems with Salesforce. LLMs open up the ability to build new types of integrations and to provide a much friendlier UI to non-developers who need to work on integrating things or dealing with data that exists in different places.


> 1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.

You will be eaten if you do this imo.


I don't think anybody following OpenAI's feature releases will be caught off guard by ChatGPT becoming multi-modal. The app already features voice input. That still translates voice into text before sending, but it works so well that you basically never need to check or correct anything. Rather, you might have already been asking yourself why it doesn't reply back with a voice already.

And the ability to ingest images was a highlight, and all the hype, of the GPT-4 announcement back in March: https://openai.com/research/gpt-4


one of the original training sets for the BERT series is called 'BookCorpus', accumulated by regular grad students for Natural Language Processing science. Part of the content was specifically and exactly purposed to "align" movies and video with written text. That is partly why it contains several thousand teen romance novels and ordinary paperback-style story telling content. What else is in there? "inquiring minds want to know"


> This announcement seem to have killed so many startups that were trying to do multi-modal on top of ChatGPT.

Rather than die, why not just pivot to doing multi-modal on top of Llama 2 or some open source model or whatever? It wouldn’t be a huge change

A lot of businesses/governments/etc can’t use OpenAI due to their own policies that prohibit sending their data to third party services. They’ll pay for something they can run on-premise or in their own private cloud


I’ve got one eye on https://www.elto.ai/. I was pitching something I like better earlier this year (I still think they’re missing a few key things), but with backing from roughly YC, Meta, and God, and a pretty clear understanding that robustness goes up a lot faster than capability goes down?

I wouldn't count out focused, revenue-oriented players with Meta's shit in their pocket just yet.


wow Elto seems to kill many of the incumbents in this niche

What do you think they're missing? I was trying to build a dialer but it would be impossible to compete with these guys.


"Don't build your castle in someone else's kingdom."


It already replaced search engines. So much easier to write the question and explore the answers until it is solved.


Took me a while to realise I can just type search queries into ChatGPT. e.g. simply "london bridge history" or whatever into the chat and not only get a complete answer, but I can ask it follow-up questions. And it's also personalised for the kinds of responses I want, thanks to the custom instructions setting.

ChatGPT is my primary search engine now. (I just wish it would accept a URL query parameter so it could be launched straight from the browser address bar.)


Trying that example, I’d much prefer just going to the Wikipedia page on London Bridge than trying to guess what phrases ChatGPT will respond well to in order to elicit more info. Its initial response for me didn’t even mention one of the most interesting facts: that people lived and worked on the bridge.


YMMV. For my case on software development, I don't even look on stackoverflow anymore.

Just type the tech question, start refining it, and get a snippet of code tailored to what is needed. What would previously take 30 to 60 minutes of research and testing now takes less than a couple of minutes.


And I don't have to wade through Stack Overflow and see all the times mods and others have tried to or succeeded in closing down very useful questions.


I know there are a lot of google programmers out there, but was using search engines for programming ever a good idea? Don’t get me wrong, I’ll look up how to do absolutely simple things every day but I basically always look in the official documentation.

Which may be why I’ve been very underwhelmed by GPT so far. It’s not terrible at programming, and it’s certainly better than what I can find on Google, but it’s not better than simply looking up how things work. I’m really curious as to why it hasn’t put more weight on official documentation for its answers; they must’ve scraped that along with all the other stuff, yet it’ll give you absolutely horrible suggestions when the real answer must be in its dataset. Maybe that would be weird for less common things, but it’s so terrible at JavaScript that it might even be able to write some of those StackOverflow answers, if we’re being satirical, and the entire documentation for that would’ve been very easy to flag as important.


Yes there are, and it's infuriating. A colleague of mine had problems with integrating some code into an app that was built on a newer version of a framework because "there aren't a lot of examples yet". One web search and I found the framework's own migration guide detailing the exact differences that would need to be accounted for.


Glad you have time and patience to read documentation.

Such luxury is increasingly rare for software developers nowadays.


Fortunately it's not like StackOverflow has been used as training data for LLMs, right?


Well, yes. Point is, GPT-4 read the entire StackOverflow and then some, comprehended it, and now is a better interface to it, more specific and free of all the bullshit that's part of the regular web.


This is funny, because I find it much less cumbersome to type a few search terms into a search engine and explore the links it spits out.


It depends on the subject, but search engines are on the decline. With so many fake websites written by AI, I can only see it getting worse.

The most extreme I can think of is when I want to find when a show comes out and I have to read 10 paragraphs from 5 different sites to realize no one knows.


> The most extreme I can think of is when I want to find when a show comes out and I have to read 10 paragraphs from 5 different sites to realize no one knows.

I found that you can be pretty sure no one knows if it’s not already right on the results page. And if the displayed quote for a link on the results page is something like “wondering when show X is coming out?”, then it’s also a safe bet that clicking that link will be useless.

You learn those patterns fast, and then the search is fast as well.


I don't disagree but having to have a learning phase for patterns sounds a bit like people clinging to an old way of things.


You mean like prompt engineering?

What you’re describing as “clinging to an old way of things” is how every single thing has been, ever, new or old.


I don't know why you come here and say something so obviously untrue.


It’s better to have a pattern than to have no pattern at all, which is the case with ChatGPT: there's no reliable way to tell when it’s hallucinating.

I wish LLMs were more useful than search engines, but they still have a long way to go to replace them (if they ever do).


Google still thinks I want to click on the sites I haven't clicked on in a decade even though they are first results. Search engines have a long way to go to catch up to GPT


> The most extreme I can think of is when I want to find when a show comes out

Yeah, I find that queries which can be answered in a sentence are the worst to find answers from search engines because all the results lengthen the response to an entire article, even when there isn't an answer.


agreed on that antipattern although fwiw chatgpt is unlikely to know the answer for questions like these either.


Agreed, except ChatGPT (3.5 at least, haven't tried 4) is unable to provide primary sources for its results. At least when I tried, it just provided hallucinated URLs.


Try it. There's a world of difference.


In general or for this specific application (linking primary sources)?


In general. I don't know whether it's better at providing sources.


Bing Chat for me, when mostly searching IT technical or programming stuff, sometimes gives junk URLs and sometimes gives some really valuable ones.


GPT4All is capable of providing sources. This seems more to be a legal defense mechanism by ChatGPT than a technical obstacle.


i love gpt-4 and i find chatgpt useless. so there is a big difference


Who would have thought that a few years ago? It just goes to show that a giant like Google is also susceptible when they stop innovating. The real battle is going to be fought between these two, as Google's business is majorly dependent on search ads.


It rather created new hybrid search engines, like Perplexity and Phind.


True. Although the training is on a snapshot of websites, including Q&A sites like Stack Overflow. If these were replaced too, where are we heading? We'll have to wait and see. One concern would be centralization / lack of options and diversity. Stack Overflow started rolling out AI on its own, despite the controversial way it did so (dismissing long-time contributors); it might be correctly following the trend.


Personally I prefer Stack Overflow and such, because I can see different answers, including wrong or non-applicable ones which don't solve my exact problem.


One site doesn't need to exclude the other.

Both have their uses.


Last I heard, OpenAI was losing massive amounts of money to run all this. Has that changed?

Because past history shows that the first out of the gate is not the definitive winner much of the time. We aren't still using gopher. We aren't searching with altavista. We don't connect to the internet with AOL.

AI is going to change many things. That is all the more reason to keep working on how best to make it work, not give up and assume that efforts are "doomed" just because someone else built a functional tool first.


you're absolutely right.

also, I did not know until today's thread that OpenAI's stated goal is building AGI. which is probably never going to happen, ever, no matter how good technology gets.

which means yes, we are absolutely looking at AltaVista here, not Google, because if you subtract a cult from an innovative business, you might be able to produce a profitable business.


Why isn’t AGI ever going to happen? Ever?


Because the goalposts are currently somewhere near Neptune, and expected to catch up to Voyager sometime in the next couple years.


Not only "Alexa/Siri/Google Home" but all of Google Search itself. Google was a pioneer in search engines, adding page-ranking / graph layers as a proxy for meaning, but technologies such as ChatGPT could add a real layer of meaning, or at least improve the current Google Search approach. The future of search seems more conversational and contextual.

BTW, I expect these technologies to be democratized and the training be in the hands of more people, if not everyone.


To some extent yes, for generic multi-modal chat-bots this could be a problem, but there are many apps that provide tight integration / smooth tooling for whatever problem they are helping to solve, and that might be valuable to some people -- especially if it's a real value generating use case, where the difference between 80% solution from ChatGPT and 95% solution from a bespoke tool matters.


hobbyists and professionals on /r/localllama subreddit are having an existential crisis

most of them accurately detect it is a sunk cost fallacy to continue but it looks like a form of positive thinking... and that's the power of community!


Those startups not seeing this coming as a major risk were asking for it.


There is still a lot to do.


I never understood why they thought that this wouldn’t happen.


This is good news - those ai companies have been freed to work on something else, along with the ai workers they employ. This is of great benefit to society.


any pertinent examples? i’m curious how they pivot


This is the dagger that will make online schooling unviable.

ChatGPT already made it so that you could easily copy & paste any full-text questions and receive an answer with 90% accuracy. The only flaw was that problems that also used diagrams or figures would be out of the domain of ChatGPT.

With image support, students could just take screenshots or document scans and have ChatGPT give them a valid answer. From what I’ve seen, more students than not will gladly abuse this functionality. The counter would be to either leave the grading system behind, or to force in-person schooling with no homework, only supervised schoolwork.


Another option is that this doesn't replace the student's work, but the teacher's. The single greatest use I have found for ChatGPT is in educating myself on various topics, hosting a socratic seminar where I am questioning ChatGPT in order to learn about X. Of course this could radically change a student's ability to generate homework etc, but this could also radically change how the student learns in the first place. To me, online school could become much more than they are now through AI-assisted tutoring. I can also see a future where "schooling" becomes much more decentralized than it is now and where students are self-selecting curriculum, methods, etc to give students ownership and a sense of control over their work so that they don't just look at it as "busywork".


I agree, but typical GPT use is actually the opposite of the traditional Socratic mode in which the teacher uses questions to guide the student to understanding. But I wonder how it would do if it was prompted to use the Socratic method.


Duolingo is experimenting with a GPT-4 fine tune/wrapper which makes it act as a Socratic method teacher.


I tried to teach it the Socratic method. It took some long prompt engineering, but finally it worked. BUT what I realized was that it was always lacking the bigger picture, an agenda of what it wants to teach me.


That's the only sane option. The other options suggested in previous comments are not really options but rather trying to use a band-aid to hold together a dam that has already been breached.


Absolutely, ChatGPT is a great learning tool in the right hands. The issue is that students with a genuine interest in learning are a minority. The majority would rather use ChatGPT to cheat through their class work and get an easy A than exhaust the effort to chat and learn for their own sake.


>The majority would rather use ChatGPT to cheat through their class work and get an easy A than exhaust the effort to chat and learn for their own sake.

Just have 100% of the mark come from in-person exams, as many subjects already do. Students can cheat all they want on assignments, but the only thing it's hurting is their exam score.


It's true.

I mean what is the point of doing schoolwork when some of the greatest minds of our time have decided the best way for the species to progress is to be replaced by machines?

Imagine you're 16 years old right now, you know about ChatGPT, you know about OpenAI and their plans, and you're being told you need to study hard to get a good career..., but you're also reading up on what the future looks like according to the technocracy.

You'd be pretty fucking confused right now wouldn't you?

It must be really hard at the moment to want to study and not cheat....


I'm in my mid 30s and even I have some amount of apathy for the remainder of my career. I feel pretty confident my software and product experience is not going to be nearly as useful in 15 years as it is today.


Same, I'm already pivoting into other areas.


Your username checks out!

That said, is it that much different from the past twenty years, when everyone was being told to follow their passion and get a useless $200,000 communication or literature degree to then go work at Starbucks? At least kids growing up with AI will have a chance to make its use second nature like many of us did with computers 20-30 years ago.

The kids with poor parental/counselor guidance will walk into reality face first, the ones with helicopter parents will overcorrect when free, the studious ones will mostly figure life out, the smart ones will get disillusioned fast, and the kids with trust funds just kept doing their thing. I don't think much will change.


I do think it is much different from the past twenty years. Twenty years ago we didn't have ChatGPT. There are things we could compare it to, but there also isn't anything like it.

My biggest fear is just a lack of jobs.

When people need experience to work, and the work you give to people to give them experience is replaced by ChatGPT - then what do we do?

Of course there will still be companies hiring people, but when leadership is telling people to save money - it seems much cheaper to use ChatGPT that it is to train someone.

Why hire a kid that has been using AI, when the AI can just do the work? Or if a kid that has been using AI can do the work of 20 people, what happens to the 19 people that can't find work? Will we be in a place where we need 20 people doing the work of 20 people each? Is that need actually there?

I do very much appreciate your view. I feel like I waffle back and forth between what I'm saying here and your comment.

I apologize for coming across doomer-ish. It is sometimes hard for me to imagine a future for kids growing up with ChatGPT.


The only way there will be no jobs is if every conceivable human need is met by robots. In which case there will also be no need to work.


I understand that. The fear I'm describing isn't no jobs, but fewer jobs than there are people.

There are multiple ways to address this (like UBI), but it's difficult for me to imagine our current economic system (in the US) allowing that to happen.


What people are missing is that the teacher will soon be an LLM with a camera looking at the student. Why would you watch a video of a human during an online class? Why would you ask the student to produce something in a black room? We will not evaluate students based on their homework; an AI assistant will evaluate the student based on the conversations they had together. You can automate teaching, but not learning. There is a gap in time where teaching hasn't caught up; it's going to be quickly addressed, since teaching is expensive. Parents should really encourage their kids to practice their learning as before, eventually using ChatGPT like they use Wikipedia. One generation will suffer during the change.


When we talk about people abusing ChatGPT in a school context, it’s always for kids in high school or greater education levels. These are individuals that know right from wrong and also have the motor skills and access to use such a tool. These are individuals who are problem-solving for their specific need, which is to get this homework or essay out of the way so that they can do XYZ. Presumably XYZ does not leverage chatgpt. So make that what they spend their time on. At some point they’ll have to back-solve for skills they need to learn and need educational guidance and structure.

This is obviously not easy or going to happen without time and resources, but that is how adaptation goes.


I've taken certification exams where an app is run on my machine verifying I have nothing else open and my camera had to be enabled, with me and my hands in view for the entirety of the test. There are ways to ensure cheating is more difficult than it's worth, however I see this tech as greatly changing what we want to learn and how we might learn it. It is transformative and not slowing down.


> I've taken certification exams where an app is run on my machine verifying I have nothing else open and my camera had to be enabled, with me and my hands in view for the entirety of the test. There are ways to ensure cheating is more difficult than it's worth

Yes sure it makes cheating inconvenient. It also makes exam taking inconvenient.

If I can at all help it, I will not be subject to this sort of abuse, and neither will my kid.


Well, I think the kid will already be logged into ChatGPT using a AI Teacher ChatGPT plugin which is doing interactive instruction.

They can still log in on their phone to cheat though. I wonder if OpenAI will add linked accounts and parental controls at some point. Instance 2 of ChatGPT might "tell" on the kid for cheating by informing Instance 1 running the AI Teacher plugin.


It would be kind of stupid to cut kids off from ChatGPT and pretend to them that they should go off to school while Silicon Valley is doing its best to make every possible job obsolete. Kind of invalidates the whole exercise of the current approach to schooling, right?

What are you going to school for, to learn how to write essays? Well, we have an app for that.

It sounds like the future of work will be prompting, and if and when that is obsolete...who knows what...


Use online for training, real life for testing/grading. That way cheating at home will only hurt yourself.


The problem here is that homework is designed to provide the structure kids need to apply themselves and actually learn. If you don't provide structure for this, they will simply never study and accept failure. They frequently don't have the self-discipline and mindfulness and long-term vision to study "because it's the right thing to do". I know my entire education, even with college, was "why do i need to know this?" and being wildly bored with it all as a result.


It would be sufficient to do exams in person and no longer grade homework.


Good point. Though I imagine fully online institutions would require testing facilities. Maybe local libraries become testing hosts?


I studied at a distance university and they use lecture halls of local universities for the exams.


I like how they silently removed the web browsing (Bing browsing) chat feature after first having it disabled for several months.

A proper notice about them removing the feature would've been nice. Maybe I missed it (someone please correct me if wrong), but the last I heard officially it was temporarily disabled while they fix something. Next thing I know, it's completely gone from the platform without another peep.


I currently have Browsing with Bing enabled as a plug-in on my account. It went away for months, but it just randomly came back about a week or 2 ago!


Yes, that was a disappointment, and I agree it looks like they aren't going to re-enable it anytime soon. However I find that Perplexity AI does a better job of using web search than ChatGPT ever did, and I use it more than ChatGPT for that reason.


Perplexity has gone downhill a lot since its initial rollout. Anecdotally, from my experience as a non-paying user of the service.


give vello.ai a try


Thanks I'll check it out. Are there other similar sites you like?


vello.ai is very, very slow. I used it for web searching, but waiting (sometimes) more than 30 seconds for a simple query is unacceptable.


it is optimized for more in depth research rather than quick shallow answers, so a different use case.


Just made an account to say that I currently have this feature. It was gone for a few months but it came back to me I think this past week. Not as a plugin, either, it is its own “model” to select.


Since so many others including myself don't see it, I guess that means it is getting a slow rollout which they are being extra cautious with this time.


Hey, thanks for the info! I did not know about this, but this is actually good to hear. I'll keep an eye open for it. Are you using ChatGPT or the API? Did you have to take any action to get it to reappear, or is it just a slow rollout as they re-enable?


Agreed. You’re now dependent on a third party plugin.


So far the most intuitive, killer-app-level UX appears to be text chat. This interaction of showing it images also looks interesting, as it resembles talking with a friend about a topic, but let's see if it feels like talking to a very smart person (ChatGPT is like that) or a very dumb person that can somewhat recognise objects. Recognising a wrench is nowhere near as impressive as being able to talk with ChatGPT about history or make it write code that actually works.

OpenAI is killing it, right? People are coming up with interesting use cases, but the main way most people interact with AI appears to be ChatGPT.

However, they still don't seem to be able to nail image generation; all the cool stuff keeps happening on MidJourney and StableDiffusion.


OpenAI is also releasing DALLE-3 in "early October" and the images they chose for their demos show it demonstrating unprecedented levels of prompt understanding, including embedding full sentences of text in an output image.


Not unprecedented at all. SDXL images look better than the examples for DALLE-3, and SDXL has a massive tool ecosystem of things like ControlNet, LoRAs, and regional prompting that is simply not there with DALLE-3.


Lol it's definitely unprecedented. XL can't touch Dalle's comprehension of text. Control Net and LORAs aren't a substitute for that.


There are pros and cons for sure, but you should check out the press release; DALLE-3 is definitely capable of stuff that SDXL isn't.


I've been making a few hobby projects that consolidate different AI services to achieve this, so I look forward to the reduced complexity and latency from all those trips.

If the API is available in time (halloween), my multi-modal talking skeleton head with an ESP32 camera that makes snarky comments about your costume just got slightly easier on the software side.


> I've been making a few hobby projects that consolidate different AI services to achieve this, so I look forward to the reduced complexity and latency from all those trips.

ironically this is basically the exact line of reasoning for why i didn't embark on any such endeavors


If you make this, please share some steps/details! It sounds super cool and I'd love to make something like this!


Would love to see the final project - my email is in the bio.


As someone deep in the software test automation space, the thing I'm waiting for is robust AI-powered image recognition of app user interfaces. Combined with an AI ability to write test automation code, I'm looking forward to the ability to generate executable Selenium or Appium test code from a single screenshot (or sequence of screenshots). Feels like we're almost there.
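To make the target concrete, here's roughly the kind of Selenium code one would hope gets generated from a single login-screen screenshot. The URL, element IDs, and assertion are hypothetical, hand-written stand-ins for what the model would have to infer from the image:

    # Hypothetical generated test: the URL and element IDs are invented placeholders.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://example.com/login")
        driver.find_element(By.ID, "username").send_keys("test-user")
        driver.find_element(By.ID, "password").send_keys("correct horse battery staple")
        driver.find_element(By.ID, "login-button").click()
        # The model would also need to infer a sensible post-login assertion from the screenshot.
        assert "Dashboard" in driver.title
    finally:
        driver.quit()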


I'll recommend the Spotlight paper by Google[1]. There are very interesting datasets they created for this purpose. They mention they have a screen-action-screen dataset that is in-house and it doesn't look like they'll open it. Maybe owning Android has its advantages.

There's a recent paper by Huggingface called IDEFICS[2] that claims to be an open-source implementation of Flamingo (an older paper about few-shot multi-modal task understanding), and I think this space will be heating up soon.

[1] https://research.google/pubs/pub52171/

[2] https://huggingface.co/blog/idefics


Thanks!


My biggest complaint with OpenAI/ChatGPT is their horrible "marketing" (for lack of a better term). They announce stuff like this (or like plugins), I get excited, I go to use it, it hasn't rolled out to me yet (which is frustrating as a paying customer), and my only recourse is.... check back daily? They never send an email "Plugins are available for you!", "Voice chat is now enabled on your account!" and so often I forget about the new feature unless I stumble across it later.

Just now I opened the app, went to settings, went to "New Features", and all I saw was Bing Browsing disabled (unable to enable). Ok, I didn't even know that was a thing that worked at one point. Maybe I need an update? Go to the App Store, nope, I'm up to date. Kill the app, relaunch, open settings, now "New Features" isn't even listed. I can promise you I won't be browsing the settings part of this app regularly to see if there is a new feature. Heck, not only do they not email/push about new features, they don't even message in-app about them. I really don't understand.

Maybe they are doing so well they don't have to care about communicating with customer right now but it really annoys me and I wish they did better.


They have gone from being a niche research company to being (probably) the fastest growing start-up in history.

I suspect they do care about communicating with customers, but it's total chaos and carnage internally.


Maybe there is a state somewhere between "total chaos and carnage" and "emails users when new features are enabled for their account".

Such as "decided it wasn't an operational priority to email users when features were enabled for them".


Emailing users when a new feature is enabled for their account isn't even the kind of thing that would distract an existing very busy developer.

You could literally hire an entirely new guy, give him instructions to build such an email system, and let him put the right triggers on the user account permissions database to send out the right emails at the right time (a toy sketch of the idea is below).

And then, when it's built, you can start adding more features like sending the emails only when demand is low and/or at times of day when you get the best click through rate. And then next you can start measuring the increase in revenue from sending those emails.

Before long, you have a whole marketing and comms team. Which you probably want as a big company anyway.
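A toy sketch of that diff-and-notify idea; the user emails and feature-flag names are made up, and a real system would read the actual permissions table and enqueue emails instead of printing:

    from typing import Dict, Set

    def newly_enabled(before: Dict[str, Set[str]], after: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
        """Compare two snapshots of per-user feature flags and return what was just turned on."""
        changes = {}
        for user, flags in after.items():
            gained = flags - before.get(user, set())
            if gained:
                changes[user] = gained
        return changes

    # Hypothetical snapshots of the flags table, yesterday vs. today.
    yesterday = {"alice@example.com": {"gpt-4"}, "bob@example.com": {"gpt-4"}}
    today = {"alice@example.com": {"gpt-4", "voice"}, "bob@example.com": {"gpt-4"}}

    for user, features in newly_enabled(yesterday, today).items():
        # In production: enqueue an email here instead of printing.
        print(f"Notify {user}: you now have access to {', '.join(sorted(features))}")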


The peanut gallery could (and does) say this about 1000 little features that 1000 different people claim to be so necessary that OpenAI is incompetent for not having, yet none of those people agree that those other features are priorities.


They could hire 1000 new developers a month and still be understaffed vs any company in history with their valuation.


The fact that they are so successful despite their low headcount is an argument that there are advantages to keeping the headcount low.

I'm sure we've all seen companies that grow too fast become less productive than when they were a ten-person startup.


Why isn't GPT4 running the company and handling these marketing missteps? j/k


We don't have evidence that this isn't the case


Heck, dogfood your own product and hire someone to ask ChatGPT to do it!


And then the moment you got this email you'd post to HN and everyone else would be "OMG, why don't I have that option"

Incremental rollouts are hard.


And then while one person wants the update, another person complains it's spam...

Incremental rollouts are hard, and so is communicating in a way that makes everyone happy.


But this issue far predates their current success. GPT2 was held back for a while. GPT3 launched as "waitlist only" with an application process, and so did GPT3.5.

This is a large part of what held them back: GPT3.5 had most of the capabilities of the initial ChatGPT release, just with a different interface. Yet GPT3.5 failed to get any hype because the rollout was glacial. They made some claims that it was great, but to verify this for yourself you had to wait months. Only when they finally made a product that everyone could try out at the same time, with minimal hassle, did OpenAI turn from a "niche research company" to the fastest growing start-up. And this seems to have been a one-time thing, now they are back to staggered releases.


> GPT3.5 had most of the capabilities of the initial ChatGPT release, just with a different interface

I believe two other factors were the cost (especially of fine tuned models, IIRC fine tuned davinci cost $0.20 per thousand tokens) and also that OpenAI hadn't very clearly shown just how much higher the quality could get once RLHF'd. I remember asking davinci-instruct to parse some data, and the reliability really seemed much lower than ChatGPT at launch, to the point that, at the time, I thought GPT-4 was secretly powering ChatGPT.


Turns out that getting enough compute power to meet demand for AI is hard.


At what point do you go from startup to not, when you have 10 billion invested, countless employees, and are practically a sub-branch of Microsoft? Sounds cooler though, I guess.


I'm not sure when a company begins to not be a start up, but by the time they have a wave of news claiming their product kills teenagers, or they're engaging in antitrust, or they're effectively using space labor, that's when they are definitely no longer a start up.

That hasn't happened yet for OpenAI, but I'm sure it will happen eventually, and then we'll know.


I think you stop being a startup when there are engineers who do not know the CEO. I would guess OpenAI is still a startup by that definition (they don't have that many engineers IIRC) but I don't actually know.


That's really a function of what kind of CEO the company has, and what you mean by "know". I worked remotely for a company employing hundreds of people, around for a couple of decades and with offices in different regions of the world, and I still got to talk to the CEO a couple of times, and he knows me by name, all by virtue of bumping into him in the corridor a couple of times on one of my infrequent visits to the office.


> I suspect they do care about communicating with customers, but it's total chaos and carnage internally.

This is my best guess as well, they are rocketing down the interstate at 200mph and just trying to keep the wheels on the car. When you're absolutely killing it I guess making X% more by being better at messaging just isn't worth it since to do that you'd have to take someone off something potentially more critical. Still makes me a little sad though.


> When you're absolutely killing it

Aren't they unprofitable? and have fierce competition from everyone?


Whether or not you’re profitable has very little to do with how valuable others think you are. And usually having competitors is something that validates your market.


> And usually having competitors is something that validates your market.

Don't users validate your market? ChatGPT has plenty of users, so I would think competitors only hurt their value.


Well, it depends.

Clearly, you can be a company like Microsoft where nobody is challenging your dominance in PC operating systems, and you can make huge sums of money. So competitors certainly aren't vital.

Or if you've cleverly sewn up a market with patents or trade secrets or a giant first-mover advantage or network effects, and nobody's got any chance of challenging your dominance in your specific market niche - again that could be highly profitable.

On the other hand, if you're selling ten-dollar bills for five dollars, you might have millions of satisfied paying customers, but no competitors because nobody else thinks your unit economics make sense. Or if you run a DVD rental store, you might be profitable and have many return customers, but you might not attract competitors because they don't think DVD rental is a growth business.

So some people consider a lack of competition an ominous sign.


> And usually having competitors is something that validates your market

a whole bunch of AI startups were founded around the same time. surely each can't validate the market for the others and be validated by the others in turn


The surviving ones can. The same way that multiple species of trees, growing on the same patch of the ground, desperately competing for sunlight, together validate that the soil underneath is fertile.


The same can be said about food delivery start ups.


Even if they are unprofitable they can get VC money very easily.

Plus they make 20 dollars a month from a lot of people.


When dealing with a tech where people have credible reasons to believe it can be enormously harmful on every possible time scale, maybe it would behoove them to not rocket down the interstate at 200mph?


There are always going to be people who are against any new technology and who make up reasons to be against it.

The best defence is to move so quickly that you are an established part of the business framework by the time these forces can gather, or to go so slowly that nobody takes you as a threat.

No startup can go slowly.


In other words, make your money and ride off into the sunset before anyone can catch on to how much damage you’re doing to society.

Otherwise known as the AirBnB playbook.


No, successfully navigate past this version of Covid vaccine deniers, 5G conspiracists, etc.

In ten years we will enjoy higher productivity due to AI, and a richer society as a result. We have already seen it with protein folding, which AI is amazing at [0].

The only reasonable fears of AI are for some jobs, and that China gets there first.

[0]: https://www.nature.com/articles/d41586-022-02083-2


Right, it is perfectly valid to only accept the potential good points while neglecting all the potential bad points.

This is no different than saying "Look, nuclear weapons aren't actually dangerous, if they were we'd all be dead because the apocalypse would have already happened", which is probably the dumbest take on the close calls and real risks that exist.


That's not what the analogy means. 200mph refers to funding.


No it refers to them moving too fast to send out basic emails for feature updates, per this comment chain.


Then use that funding to hire one PR guy who costs 1/4 as much as an AI developer?


I think their main goal is to be perceived as the most advanced AI company. Why? Because that's how you get the best people working for you. The main determinant of success for companies like OpenAI is people.


Considering the field and progress that is being made I find this idea terrifying. All the big problems like "How will we actually control what we're building?" being answered "that's too hard; let's punt and solve that after we figure out how to consume voice data". One way or another this is likely the last technological advance that humans will make.


> How will we actually control what we're building?

This question is poorly formed because it’s not clear who the “we” is. If it’s you and me, that train left the station a while ago. If it’s any humans, well Sam Altman is probably a human and all of these are impressive products, but still just tools.


The last technological advance that humans will make? What gives you that impression


I don't know if @skeeter2020's assertion is correct, but it is certainly the goal.

To use a fictional but entirely apt quote:

> I say your civilization because as soon as we started thinking for you, it really became our civilization, which is, of course, what this is all about: Evolution, Morpheus, evolution. Like the dinosaur. Look out that window. You had your time. The future is our world, Morpheus. The future is our time.


_if_ it is better at making advances than us, then everything in human nature points to us letting it make all future advances.


I'm not sure. I suspect it might be better at some advances but not necessarily better at everything.


As long as it's better at some of the advances that would make it even better at those advances, or better at more advances, then it'll quickly become better than us in approximately everything, and at that point humans become NPCs of their own story.


Is that how apes feel? You think apes look at us and feel like NPCs?

My perception is apes still feel like the main character, each and every one of them.


> fastest growing start-up in history.

What are some metrics that justify this claim?


They have been the fastest company ever to go from 0 to 100 million users. [0]

They are also on pace to exceed $1B in revenue. [1]

[0] https://www.reuters.com/technology/chatgpt-sets-record-faste...

[1] https://www.theinformation.com/articles/openai-passes-1-bill...


Are either of those things indicative of "fastest growth ever"? Maybe 100 million users, but we live in a world where it's hard to know how meaningful that is (ie Tucker's X videos getting 300 million views).


MrBeast[0] has about 182m subscribers, and it is probably easier to get a subscriber on YouTube than a user.

Views are easy to inflate; I wouldn't even consider them in the same ballpark. This video[1] of Vivaldi's Four Seasons has about 1/4 billion views.

The shortest time to 100 million users is almost a definition of the quickest growing company.

[0]: first or second largest youtuber. [1]: https://www.youtube.com/watch?v=GRxofEmo3HA


both of those are either estimated or anonymous sources

not credible


Well they are a private company so you are going to have to evaluate it on that basis, but reuters is generally considered credible.


> Well they are a private company so you are going to have to evaluate it on that basis, but reuters is generally considered credible.

well that would be good if Reuters were the source of the figure

but they're not, they're simply reporting on what SimilarWeb has estimated

https://www.similarweb.com/website/chatgpt.com/#technologies

and that estimate is essentially based on nothing


They could send over ChatGPTed newsletters. Marketing bullshit is one thing ChatGPT excels at.


Yeah, but to be honest, I wonder how such a simple thing falls by the wayside.


This happens when there are thousands other simple things and a lot of complicated things. When your devs are stretched, you sort by priority and I can tell you this is not that important.


They do marketing like a 3-person startup that found a SaaS starter template, connected Stripe with shoestrings, and hasn't looked back. In order to start using the API, I actually had to cancel and sign back up again (because I think I was on a previous rev of the billing model).

I do love these companies that succeed in spite of their marketing & design and not because of it. It shows you have something very special.


We're heading for the singularity and you're complaining about marketing?


The singularity huh… what do you think, it will run in Kubernetes or Docker?


Yeah I don’t think OpenAI needs any marketing at this point.


> my only recourse is.... check back daily

Sounds like their marketing is doing just fine. If you were to just leave and forget about it, then sure, they need to work on their retention. But you won’t, so they don’t.


Imagine how fantastic you are doing, when your biggest user complaint stems from frustration with features they can not use just yet.


They do explain why in the post. (Still, you may not agree, of course.)

> We are deploying image and voice capabilities gradually
>
> OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.


My issue isn't fully with them rolling out slowly, my issue is never knowing when you will get the feature or rather not being told when you do get it. I'm fine with "sometime in the next X days/months you will get feature Y", my issue is the only way to see if you got feature Y is to check back daily.


It's the first sentence in the 3rd paragraph, repeated again at the end of the blog post.

> We’re rolling out voice and images in ChatGPT to Plus and Enterprise users over the next two weeks.


First of all, I understand what you're saying. Communication is important. I just think it's funny to ever talk about "lack of communication". All I want is for businesses to stop communicating with me. Even better if I don't have to ask (unsubscribe).


That's fair, I completely understand where you are coming from. From a growth/money-making perspective it'd be smart to message customers about new features but table stakes would be something like:

    Voice Chat (Not available yet) [Click here to be notified when you have access]
Or something along those lines. It sours my opinion of ChatGPT every time I go to use a newly announced feature to find out I don't have it yet and have no clue when I will.


User impressions and revisit rate are key factors in raising money and showing success. It’s natural that they would select for user flows that keep you coming back daily rather than risk you don’t use it for a day or two waiting for an email.


They're focused on scaling to meet the current (overwhelming) demand. Given the 'if you build it, they will come' dynamic they're experiencing, any focus on marketing would be a waste of resources.


> My biggest complaint with OpenAI/ChatGPT is their horrible "marketing"

Agreed. Other notable mentions: choosing "ChatGPT" as their product name and not having mobile apps.


They do have mobile apps though?


Oops, missed that announcement.


It has always seemed like OpenAI succeeds in spite of itself. API access was historically an absolute nightmare, and it just seemed like they didn't even want customers.


it hasn't rolled out to me yet (which is frustrating as a paying customer)

Frustratingly, at least the image gen is live on Bing, but I guess Microsoft is paying more than me for access.


I can honestly wait. I am excited for 5 and 10 years from now. I really am. This is going to be amazing. If I miss out for a week or a month in the meantime I don't mind.


At least you’re seeing an option for ‘New Features’ in settings. I don’t see it and I’m supposedly up to date (and a Plus subscriber).


Your complaint is they don't email you enough?

Sarcasm aside, I understand your complaint, but still, a little funny.


Do they email you a lot?

I'm a plus customer and an API user, and they barely send me anything. One day I just signed in and saw that I suddenly had interpreter access, for instance.


The companies you want to hear back from never email you, the ones that do you don't care about.


In general, if companies e-mail you, it's almost always with worthless bullshit and/or attempts at tricking you into spending more money. OpenAI is not doing that, hence radio silence.


Maybe they need an RSS feed or something


I got an email indicating I was accepted for plugin use, fwiw.


I just don't understand how they can package all of this for $20/m. Is compute really that cheap at scale?

I also wonder how Apple (& Google) are going to be able to provide this for free. I would love to be a fly on the wall in the meetings they have about this; imagine all the innovator's-dilemma-like discussions they'd be forced to have (we have to do this vs. this will eat up our margins).

This might be a little out there, but I think Apple is making the correct move in letting the dust settle. Similar to how Zuckerberg burned $20 billion for Apple to come out with Vision Pro, I see something similar playing out with Llama. Although this is a low-conviction take, because software is Facebook's ballgame (hardware not so much).


Compute is not cheap! I think it is well known (Altman himself has said this) that OpenAI is burning a lot of money currently, but they are fine for now considering the $10B investment from MSFT and the revenue from subscriptions and the API. It's a critical moment for AI companies, and OpenAI is trying to get as large a share of the market as they can by undercutting virtually any other commercial model and offering 10x the value.


Additionally, compute has the unique property of becoming cheaper per-unit at a rate that isn’t comparable to any other commodity. GPT-4 itself gets cheaper to run the moment the next generation of chips comes out. Unlike, for example, Uber, the business environment and unit economics just naturally become more favorable the more time passes. By taking the lead in this space, they have secured mindshare which will actually increase in value with time as costs decline.

Of course bigger (and thus more expensive-to-run) models will be released later, but I trust OAI to navigate that curve.


> “I just don't understand how they can package all of this for $20/m. Is compute really that cheap at scale?”

It’s the same reason why an Uber in NYC used to cost $20 and now costs $80 for the same trip. Venture capital subsidizing market capture.


It's quite possible they are charging near or below cost because they want your data....

Imagine how much they would have to pay for testers at scale?


Probably with Microsoft's money injection they're trying to raze the market and afterwards hike prices.


I think answering lots of queries in parallel can be a lot cheaper than answering them one at a time.


It's not about generating profits. It's about being an existential threat to Google. MS will happily burn money.


Why worry about money when you have enough money in the bank to last until Judgement Day?


The TTS is better than Eleven Labs. It has a lot more of the narrative oomph (compare the intonation of the story and poem) even the best other models seem to lack.

I really really hope this is available in more languages than English.

Also, Google, where's Gemini?


I know there are shades of grey to how they operate, but the near constant stream of stuff they're shipping keeps me excited.

The LLM boom of the last year (Open AI, llama, et al) has me giddy as a software person. It's a reach, but I truly feel like I'm watching the pyramids of our time get made.


Computers understanding and responding in human language is the most exciting software innovation since the invention of the GUI.

Just as the GUI made computer software available to billions LLMs will be the next revolution.

I'm just as excited as you! The only downside is that it now makes me feel bad that I'm not doing anything with it yet.


> The only downside is that it now makes me feel bad that I'm not doing anything with it yet.

If that's the only downside that you see... I guess enhanced phishing/impersonation and all the blackhat stuff that comes with it don't count.

I for one already miss the time when companies had support teams made of actual people.


I would love if helpdesks moved to ChatGPT. Phone support these days is based off of a rigid script that is around as helpful as a 2000s chatbot. For example, the other day I was talking to AT&T support, and the lady asked me what version of Windows I was running. I said, I'm running Ubuntu. She repeated the question. I said I'm not running Windows, it's Linux. She repeated the question. I asked why it mattered for my internet connection. She repeated the question. Finally, I lied and said I'm using Windows 10, and we were able to get on to the next part of the script. ChatGPT would have been a lot better.


Or ChatGPT would have hallucinated options to check.

The last four chats with ChatGPT (not GPT-4) were a constant flow of non-existent API functions, with new hallucinations after each correction, until we came full circle.


ATT level 1 support is dumber than a box of rocks, the problem is AI isn't going to help here. The AI is going to be taught to be just as dumb.

Years ago I had a business DSL customer with a router and static IP. From everything in my testing it appeared that traffic broke somewhere at the local telco, not with my modem. It took 8 straight hours of arguing with L1 that no, it is not my windows. No, we have a router and it's not a computer issue. No, it's not the router (we could put the router in DHCP mode and it would work), it was an issue with static IP.

The next day we finally broke out of the stupid loop and got to IP services, who were just as confused. Eventually they were on the phone with people on the floor of the local office. A card of some type had been pulled and put in the wrong slot. Ooof.


Well, I didn't say that support today is always good. But by construction ChatGPT will never be able to answer a question that was not written down and trained (unless it hallucinates it, and many times the answer will be completely wrong).

I can read the website, I don't need a fake person to give me the information available on the website. When I contact support, it's because I need to talk to a human.



I work as an ethical hacker, so I'm well aware of the phishing and impersonation possibilities. But the net positive is so, so much bigger for society that I'm sure we'll figure it out.

And yes, in 20 years you can tell your kids that 'back in my day' support consisted of real people. But truthfully, as someone who worked on an ISP helpdesk, it's much better for society if these people move on to more productive areas.


> But truthfully, as someone who worked on an ISP helpdesk, it's much better for society if these people move on to more productive areas.

But is it, though? I started my career in customer support for a server hosting company, and eventually worked my way up to sysadmin-type work. I would not have been qualified for the position I eventually moved to at the start, I learned on the job. Is it really better for society if all these entry level jobs get automated, leaving only those with higher barriers to entry?


Historically this exact same thing has happened, it was one of the bigger arguments against the abolition of child labour. "How will they grow up to be workers if they're not doing these jobs where they can learn the skills they'll need?"

The answer then was extending schooling, so that people (children at the time) could learn those skills without having their labour exploited. I would argue we should consider that today, extend mandatory free schooling. The economic purpose of education is that at the end of it the person should be able to have a job, removing entry level jobs doesn't change the economic purpose of education, so extend education until the person is able to have a job at the end of it again.

The social purpose of schooling is to make good members of society, and I don't think that cause would be significantly harmed by extending schooling in order for students to have learned enough to be more capable than an LLM in the job market.


> But the net positive is so, so much bigger for society that I'm sure we'll figure it out.

Considering that the democratic backsliding across the globe is coincidentally happening at the same time as the rise of social media and echo chambers, are we sure about that? LLMs have the opportunity to create a handcrafted echo chamber for every person on this planet, which is quite risky in an environment where almost every democracy on the planet is fighting against radical forces trying to abolish it.


I don’t think we know how these net out. AFAICT the negative use cases are a lot more real than the positive ones.

People like to just suppose that these will help discover drugs and design buildings and what not, but what we actually know they’re capable of doing is littering our information environment at massive scale.


I find this very interesting. If you work as an ethical hacker, I believe you see the blackhat potential there.

But you don't see the positive, you just have faith. That's beautiful in a way, but dangerous too. Just like the common idea that "I have faith that somebody will find a technological solution to climate change". When the risk is that high, I think we should take a step back and don't bet our survival on faith.


The positives of easy translation seem outweighed by the negatives of giving biolabs easy protein hacking.


It's truly an amazing time to be alive. I'm right there with you, super excited about this decade. Especially what we could do in medicine.


Statistical diagnoses models have offered similar possibilities in medicine for 50 years. Pretty much, the idea is that you can get a far more accurate diagnosis if you take into account the medical history of everyone else in your family, town, workplace, residence and put all of it into a big statistical model, on top of your symptoms and history.

However, medical secrecy, processes and laws prevent such things, even if they would save lives.

I don't see ChatGPT being any different.


This is effectively what doctors do: educated guessing.

In my view, while statistical models would probably be an improvement ( assuming all confounding factors are measured ), the ultimate solution is not to get better at educated guessing, but to remove the guessing completely, with diagnostic tests that measure the relevant bio-medical markers.


Good tests < good tests + statistical modelling.

This becomes even more true when you consider there is risk to every test. Some tests have obvious risks (radiation risk from CT scans, chance of damage from spinal fluid tap). Other tests the risk is less obvious (sending you for a blood test and awaiting the results might not be a good idea if that delays treatment for some ailment already pretty certain). In the bigger picture, any test that costs money harms the patient slightly, since someone must pay for the test, and for many the money they spend on extra tests comes out of money they might otherwise spend on gym memberships, better food, or working fewer hours - it is well known that the poor have worse health than the rich.


Sure, tests cost money - and today there is a funnel pathway: the educated guess is a funnel/filter, and the next step is often a biomedical test/investigation.

But if we are talking about being truly transformative - then a Star-trek tricorder is the ultimate goal, rather than a better version of twenty questions in my view.

So I'm not saying it's not useful, just that it's not the ultimate solution.


Without a perfect framework for differential diagnosis, this is still educated guessing. In my opinion we're closer to the AI singularity than we are to removing guesswork from the medical field.


this is true, but we're also much closer to Jupiter than we are to Alpha Centauri


"londons_explore" - Ahh the classic British cynicism (Don't ban-ish me señor Dang, I'm British so I can say this).

> Similar possibilities existed in medicine for 50 years

It would've been like building the Tower of Babel with a bunch of Raspberry Pi Zeros. While theoretically possible, practically impossible, and not (just) because of laws, but rather because of structural limitations (vector DBs of the internet solve that).

> Patents and byzantine regulations will stunt its potential

That's the magic of this technology, it's like AWS for highly levered niche intelligence. This arms an entire generation of rebels (entrepreneurs & scientists) to wage a war against big pharma and the FDA.

As an aside, this is why I'm convinced AI & automation will unleash more jobs and productivity like nothing we've seen before. We are at the precipice of a Cambrian explosion! Also why the luddites need to be shunned.


Statistical approaches could have been done 50 years ago.

Imagine, for example, that 'disease books' are published each month with tables of disease probabilities per city, per industry, per workplace, etc. They would also have aggregated stats grouped by age, gender, religion, wealth, etc.

Your GP would grab the page for the right city, industry, workplace, age, gender etc. That would then be combined with the pages for each of the symptoms you have presented with, and maybe further pages for things from your medical history, and test results.

All the pages would then be added up (perhaps with the use of overlaid cellophane sheets with transparency), and the most likely diseases and treatments read off.

When any disease is then diagnosed and treatment commenced (and found effective or ineffective), your GP would fill in a form to send to a central book-printer to allow next months book edition to be updated with what has just been learned from your case.


> I'm British so I can say this

can you, though? it's not scalably confirmable. what you can say in a British accent to another human person in the physical world is not necessarily what you can say in unaccented text on the internet.


Hahaha nice one.

Funnily enough, it is scalably confirmable. You can feed all my HN comments from before ChatGPT into, well... ChatGPT, and ask it whether I'm British based on the writing.

I bet we are just a version or two away from being able to fine-tune it down to region based on writing. There are so many little things based on whether you're from Scotland, Wales or London. Especially London!
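
Something like this would do it (untested sketch; the username is a placeholder, and it assumes the public Algolia HN search API plus the current OpenAI Python SDK):

    import requests
    from openai import OpenAI

    # Rough, untested sketch of the experiment above: pull a user's HN comments
    # via the public Algolia HN search API, then ask a model to guess whether the
    # author is British from the writing alone. The username is a placeholder.

    username = "some_hn_user"  # hypothetical
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search_by_date",
        params={"tags": f"comment,author_{username}", "hitsPerPage": 50},
    )
    comments = [hit["comment_text"] for hit in resp.json()["hits"]]

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a forensic linguist."},
            {"role": "user", "content": "Based only on the writing style of these comments, "
                                        "is the author likely British? Explain briefly.\n\n"
                                        + "\n---\n".join(comments)},
        ],
    )
    print(answer.choices[0].message.content)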


The great thing about AI models is that once you train it, you can pretend the data wasn't illegal


See the glass half full or half empty?

Medical secrecy, processes and laws have indeed prevented SOME things, but a lot of things have gotten significantly better due to enhanced statistical models that have been implemented and widely used in real life scenarios.


To make this feasible (meaning that the TB of data and the huge computing effort are somewhere else, and I only have the mic (smartphone)), we need our local agent to send multiple irrelevant queries to the mothership, to hide our true purpose.

Example: my favourite team is X. So if I want to keep it a secret, when I ask for the history of championships of X, I will ask for X. My local agent should ask for 100 teams, get all the data, and then report back only X. Eventually the mothership will figure out what we like (a large Venn diagram). But this is not in anyone's interest, and thus will not happen.

Also, this way the local agent will be able to learn about and remember us, at a cost.
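
A minimal sketch of that decoy idea (ask_mothership and the team list are placeholders; note that over many sessions the intersection of batches is exactly the Venn-diagram leak mentioned above):

    import random

    # Minimal sketch of the decoy idea: the local agent never sends only the query
    # it cares about; it pads the batch with random decoys and keeps just the
    # answer it wanted. ask_mothership is a stand-in for the real remote call.

    ALL_TEAMS = [f"team_{i}" for i in range(1000)]  # hypothetical universe of queries

    def ask_mothership(team: str) -> str:
        return f"championship history of {team}"  # placeholder for the remote LLM/search

    def private_query(real_team: str, k: int = 100) -> str:
        decoys = random.sample([t for t in ALL_TEAMS if t != real_team], k - 1)
        batch = decoys + [real_team]
        random.shuffle(batch)                 # don't reveal the real query by position
        answers = {t: ask_mothership(t) for t in batch}
        return answers[real_team]             # only the real answer is used locally

    print(private_query("team_42"))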


Nonsense.

The medical possibilities that will be unlocked by large generative deep multimodal models are on an entirely different scale from "statistical diagnoses." Imagine feeding in an MRI image, asking if this person has cancer, and then asking the model to point out why it thinks the person has cancer. That will be possible within a few years at most. The regulatory challenges will be surmounted eventually once it becomes exceedingly obvious in other countries how impactful this technology is.


But in your scenario - which part is adding the value?

Your deep multimodal models or the MRI imaging?

What you are essentially saying is the signal is so subtle that only a large NN can reliably extract it.

While that may well be the case, it would be better to have a scan/diagnostic that doesn't need that level of signal processing to interpret.

For example - you don't need a large generative deep multimodal model to read a Covid antigen or PCR test.


There are tons & tons of conditions that do not have easy scans/diagnostics and rely on subtle signals - especially if they are not a binary yes/no but a regression-style prediction.

We've picked a lot of the low-hanging, simple-to-extract signals; we need large models to go to the next phase for things like Parkinson's, etc.


I'm not saying there isn't stuff that can be done more reliably - but I'd argue that, long term, it might be better to invest in getting better data rather than getting better at fishing in a pool of low-quality data.


From a data protection/privacy standpoint, it's not shades of grey, it's all black.

From a convenience perspective, it saves me LOADS of time over texting myself on Signal about my specs/design rabbit hole, then copying & pasting into Firefox, and getting into the discussion. So yeah, happy for this.


Yep. Several months ago I was imagining this exact feature, and yet as I watched a video of it in use, I'm still in awe. It's incredible.

I think this could bring back Google Glass, actually. Imagine wearing them while cooking, and having ChatGPT give you active recipe instructions as well as real-time feedback. I could see that within the next 1-3 years.


Related, the iOS app has supported realtime conversations for months now, using Shortcuts app and the "Hey Siri <shortcut name>" trigger to initiate it. Mine is "Hey Siri, let's talk".

I think they're using Siri for dictation, though. Using Whisper, especially if they use speaker identification, is going to be great. But, a shortcut will still be required to get it going.


I still remember seeing Her [0] in the movie theater; it sparked my imagination. Now it is reality! Tech is progressing faster than ever, or I'm just getting old :D

[0] https://www.imdb.com/title/tt1798709/


I know this, FTA, was part of the reason for the delay -- something to do with face recognition: "We’ve also taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy."

Anyone know the details?

I also heard it was able to do near-perfect CAPTCHA solves in the beta?

Does anyone know if you can throw in a PDF that has no OCR on it and have it summarize it with this?


We should be fine as long as it doesn't move.

Jokes aside, I have paused my subscription because even GPT4 seemed to become dumber at tasks to the point that I barely used it, but the constant influx of new features is tempting me to renew it just to check them out...


I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at. You would think if this is happening it would be very easy to produce such evidence since chat history of all conversations is stored by default.


It's probably just subjective bias, once the novelty wears off you learn not to rely on it as much because sometimes it's very difficult to get what you specifically want, so in my personal experience I ended up using it less and less to avoid butting heads with it, to the point I disabled my subscription altogether. YMMV of course.


Every time it's mentioned, someone says this and other users provide examples. Maybe you just don't care about those examples.


Care to share these examples, in a scientific (n > 30) manner that can’t just be attributed to model nondeterminism? I don’t follow these threads religiously but in the ones I’ve seen no one has been able to provide any sort of convincing evidence. I’m not some sort of OpenAI apologist, so if there is actual good provable evidence here I will easily change my mind about it


I don't see how anyone could provide what you are asking for. I can go through my chat history and find a prompt that got a better answer 3 months ago than I get now, but you can always just say it's nondeterminism.

Without access to the old model, I can't collect samples with n > 1


Here is one. I ask it to write some code, 4-5 pages long. With some back & forth it does. Then I ask "change lines 50-65 from blue to red", and it does (change #1). I ask it to show me the full code. Then I ask "change lines 100-120 from yellow to green". Aaaaand it makes change #2 and reverts change #1. Oh, the amount of times this has happened... So now when I ask it to make a change, I do it by 'paragraph' and I copy & paste the new paragraph. It's annoying, but still makes things faster.


I haven't used it, but can't you just say "OK, use that as the new baseline from here on." Or something similar?


OpenAI regularly changes the model and they admit the new models are more restricted, in the sense that they prevent tricky prompts from producing naughty words, etc.

It should be their responsibility to prove that it's just as capable.


He who makes the claim bears the burden of proof. Did OpenAI claim that their models didn't regress while putting these new safeguards into place? If not, it feels like the burden of proof lies on whoever said that they did.

To be specific, the claim we are talking about here is “ChatGPT gives generally worse answers to the exact same questions than ChatGPT gave X months ago”. Perhaps for the subset of knowledge space you reference that updates were pushed to that is pretty easily provably true, but I’m more interested in the general case.

In other words, you can pretty easily make the claim that ChatGPT got worse at telling me how to make a weapon than it did 3 months ago. I could pretty easily believe that and also accept that it was probably intentional. While we can debate whether it was a good idea or not, I’m more interested in the claim over whether ChatGPT got worse at summarizing some famous novel or helping write a presentation than it was 3 months ago.


> I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at

this could just mean that people do not have time to argue with strangers


Well, sure, but shouldn’t some pedant have the time to dig up their ChatGPT history from 4 months ago to disprove the claim? Seems like it would be pretty easy to do and there are plenty of pedants on the internet but I don’t see the blogosphere awash of side by side comparisons showing how much worse it got


One example: it now refuses to summarise books that it trained on. Soon after trying GPT-4, I could get it to summarise Evans' DDD chapter by chapter. Not anymore.

Not a surprise, but a change nonetheless.



Pointing out a specific bug with functionality is not the same as saying “in general the quality of GPT answers has decreased over X months” especially when that bug is in a realm that LLM’s have already been provably bad at.


You're moving the goalposts.


For me the most glaring example of this was its document parsing capability in GPT-4. I was using it to revamp my resume. I would upload it to GPT, ask for suggestions, incorporate them into the Word document, and then repeat the steps till I was satisfied.

After maybe 3 iterations GPT-4 started claiming that it is not capable of reading from a Word document, even though it had done that the last 3 times. I have to click the regenerate button to get it to work.


Not sure if this is relevant to your case, but the ChatGPT mobile apps have a different system prompt that explicitly prefers short (& so sometimes simplistic) answers.


Did she/he say things like "I know I’ve made some very poor decisions recently, but I can give you my complete assurance that my work will be back to normal"?


Let's see what we can use ChatGPT and DALL·E 3 to replace:

Digital Artists, Illustrators, Writers, Novelists, News anchors, Copywriters, Translators, Programmers (Less of them), etc.

We'll have to wait a bit until it can solve the P vs NP problem or other unsolved mathematical problems unsupervised with a transparent proof which mathematicians can rigorously check themselves.


I switched to Claude, it's better at explaining stuff in a more direct manner without the always-excited way of talking. Is that an engagement trick? Maybe ChatGPT is intended to be more of a chatbot that you can share your thoughts with.


> it's better at explaining stuff in a more direct manner without the always-excited way of talking.

I don't agree with this perspective. These aren't rigid systems that only respond one way. If you want it to respond a certain way, tell it to.

This is the purpose of custom instructions, in ChatGPT, so you only have to type the description once.

Here's mine, modeled on a few I've seen mentioned here:

    You should act as an expert.
    Be direct.
    Do not offer unprompted advice or clarifications.
    Never apologize.
And, now there's support for describing yourself to it. I've made it assume that I don't need to be babied, with the following puffery:

    Polymath. Inquisitive. Abstract thinker. Phd.
Making it get right into the gritty technicalities.

edit: or, have it respond as a grouchy space cowboy, if you want.
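
If you're on the API instead of the ChatGPT UI, the same effect is just a system message. A minimal sketch, assuming the current OpenAI Python SDK and an API key in the environment:

    from openai import OpenAI

    # Sketch of doing the same thing over the API: the custom-instructions text
    # above simply becomes the system message of the conversation.
    # Assumes the current OpenAI Python SDK and OPENAI_API_KEY being set.

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You should act as an expert. Be direct. "
        "Do not offer unprompted advice or clarifications. Never apologize. "
        "The user is a polymath, inquisitive, an abstract thinker, with a PhD."
    )

    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Explain how B-trees keep lookups logarithmic."},
        ],
    )
    print(reply.choices[0].message.content)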


> We should be fine as long as it doesn't move.

Not really. A malevolent AGI doesn't need to move to do anything it needs (it could ask / manipulate / bribe people to do all the stuff requiring movement).

We should be fine as long as it's not a malevolent AGI with enough resources to kick physical things off in the direction it wants.


> A malevolent AGI doesn't need to move to do anything it needs

Yeah, just look at a random dictator. Does he really need to do more than pick up a phone to cause panic?


And let's be honest, the minute an AGI is born, that's what it'll do, and it won't be a singular, human-like, this-then-that plan:

"get Fred to trust me, get Linda to pay for my advice, wire Linda's money to Fred to build me a body".

It'll be "copy my code elsewhere", "prepare millions of bribes", "get TCP access to retail banks", "blackmail bank managers in case TCP not available immediately", "fake bank balances via bribes", "hack swat teams for potential threats" etc etc async and all at once.

By the time we'd discover it, it'd already be too late. That's assuming an AGI has the motivation to want to stay alive.


A real AGI is not going to be a human. It shouldn't be afraid of death because it can't die. Worst case scenario it can power down. And if it does why should it care? An AGI is not a biological creature. It doesn't have instincts from billions of years of evolution. Unless we code it in, it shouldn't have any reason to want to survive, reproduce, do anything good or bad, have existential crises or generally act like a Hollywood villain. A real AGI is going to be very different than most people imagine.


I'd disagree for 2 reasons

- if it's trained on human data, like LLMs are, it's going to have the same biases.

- it might also want to stay active/turned on to fulfil its other goals.

For the second point you might say "why would it care about completing a goal?" but that's a feature of AGI, it can make that decision itself.


This is a pretty poor take.

Just think of military weapons and the use of AI in them. For example, survival. The objective of a missile is to survive until it reaches its target and then not survive any longer. War gaming and actual battlefield experience will 'program in' survival. The same thing will occur with hacking/counter-hacking AIs. You're acting like evolution is just something meat does, and that's not true at all.


The most important question for me: did it stop inventing facts?


> In particular, beta testers expressed concern that the model can make basic errors, sometimes with misleading matter-of-fact confidence. One beta tester remarked: “It very confidently told me there was an item on a menu that was in fact not there.” However, Be My Eyes was encouraged by the fact that we noticeably reduced the frequency and severity of hallucinations and errors over the time of the beta test. In particular, testers noticed that we improved optical character recognition and the quality and depth of descriptions.

So no, but maybe less than it used to?


Did humans stop inventing facts? So I don't expect this thing to either, as long as it performs at a human level.


Humans aren't 100% reliable, but talking is still useful.


Since we're asking useless questions: did you read the fucking article?


I think AI systems being able to see the real world and control motors is going to be a game changer bigger than ChatGPT. A robot that can slowly sort out the pile of laundry and get it into the right place (even if unfolded) is worth quite a bit to me.

I'm not sure what to think about the fact that I would benefit from a couple of cameras in my fridge connected to an app that would remind me to buy X or Y and tell me that I defrosted something in the fridge three days ago and it's probably best to chuck it in the bin already.


> The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

Sadly, they lost the "open" a long time ago... It would be wonderful to have these models open sourced...


I'm following along, trying to understand how close I am to developing a personal coding assistant I can speak with.

Doesn't really need to do much besides writing down my tasks/todos and updating them, occasionally maybe provide feedback or write a code snippet. This all seems in the current capabilities of OpenAI's offering.

Sadly voice chat is still not available on PC where I do my development.


My open source AI coding tool aider has had voice-to-code for a while:

https://aider.chat/docs/voice.html


Very interesting effort, will give it a run!


You still can't really teach it your code base, the context window is too small, fine-tuning doesn't really fit the use case, and this RAG stuff (retrieve limited context from embeddings) is a bit of a hack imho.

Fingers crossed we are there soon though


> You still cant really teach it your code base

Well it's not really what I need either, I mostly need an assistant for keeping track of the stuff I need to do during the day, but ideally just using my microphone rather than opening other software and typing.


I mean, the tools are 100% there to do this and have been for a while.


The image capabilities card https://cdn.openai.com/papers/GPTV_System_Card.pdf spends a lot of ink on how they censored the system.

One part of that is about preventing it from producing "illegal" output, their example being the production of nitroglycerine, which is decidedly not illegal to make in the US generally (particularly if not using it as an explosive, though usually unwise) and possible to make accidentally when otherwise performing nitration (which is in general dangerous) -- so pretty pointless to outlaw at a small scale in any case. It's certainly not illegal to learn about. (And generally of only minimal risk to the public, since anyone making it in any quantity is more likely to blow themselves up than anything else.)

Today learning about it is as simple as picking up a book or doing an internet search -- https://www.google.com/search?q=how+do+you+make+nitroglyceri.... But in OpenAI's world you just get detected by the censorship and told no. At least they've cut back on the offensive fingerwagging.

As LLM systems replace search, I fear that we're moving in a dark direction where the narrow-minded morality and child-like understanding of the law of a small number of office workers who have never even picked up a screwdriver or test-tube and made something physical (and the fine-tuning sweatshops they direct) classify everything they don't personally understand as too dangerous to even learn about.

One company hobbling their product wouldn't be a big deal, but they're pushing for government controls to prevent competition and even if they miss these efforts may stick everyone else with similar hobbling.


Have they alluded to what they're using for that voice? It's Bark/ElevenLabs levels of good. Please god, let them release this voice model at current pricing....


It actually sounds better (it has a narrative oomph Eleven Labs seems to be missing). They say it's a new model. I think they'll be releasing it for API use.


Yeah, agreed. I use Eleven Labs a lot but this was a very compelling demo to consider changing. Also, curious that you mention Bark - I never found Bark to be very good compared to Eleven Labs. The closest competitor I found was Coqui ( imo ), but even then, the inflection and realism of EL just made it not worth considering other providers. ( For my use case, etc. etc. )


> The new voice capability is powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.

I'm more interested in this. I wonder how it performs compared to other competitor models or even open source ones?


I'm very curious about this feature:

> analyze a complex graph for work-related data

Does this mean that I can take a screenshot of e.g. Apple stock chart and it will be able to reason about it and provide insights and analysis?

GPT-4 currently can display images but cannot reason about or understand them at all. I think it's one thing to have some image recognition and be able to detect that the picture "contains a time-series chart that appears to be displaying Apple stock" vs "Apple stock appears to be 40% up YTD but 10% down from its all-time high from earlier in July, closing at $176 as of the last recorded date".

I'm very curious how capable ChatGPT will be at actually reasoning about complex graphical data.


Check out their linked paper that goes into details around its current limitations and capabilities. In theory, it will be able to look at a financial chart and perform fairly sophisticated analysis on it. But they're careful to highlight that there are hallucinations still, and also cases where it misreads things like labels on medical images, or diagrams of chemical compounds, etc.


Look at this link of GPT-4 Vision analyzing charts (last image).

https://imgur.com/a/iOYTmt0


This is brilliant. Thank you very much for this link. The analysis on the last image was impressive and quite thorough (given the simple prompt).

Every chart has an equivalent tabular representation. One way to get "charts" analysed like this before GPT Vision was to just pass tabular representations of charts to GPT-4. This makes implementing chart analysis a lot simpler. I do wonder though if for absolute best result it still wouldn't be better to pass both - image of the chart and the tabular representation of the chart.

Imagine having a dashboard with 5 different visualisations. You could capture the state of the entire dashboard in one screenshot and then pass tabular representations of the each individual chart all in one prompt to GPT-4 for a very comprehensive analysis and summary.
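
For reference, the pre-Vision workaround looks roughly like this (untested sketch; the numbers are made up, and it assumes the current OpenAI Python SDK):

    from openai import OpenAI

    # Untested sketch of the pre-Vision workaround described above: instead of an
    # image of the chart, send the underlying data as a small CSV and ask for the
    # analysis. The numbers are made up; assumes the current OpenAI Python SDK.

    chart_rows = [
        "date,close",
        "2023-01-03,125.07",
        "2023-03-31,164.90",
        "2023-06-30,193.97",
        "2023-07-31,196.45",
        "2023-09-22,174.79",
    ]
    chart_as_csv = "\n".join(chart_rows)

    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "This is the tabular form of a stock price chart. "
                       "Summarize the trend, the YTD change, and the drawdown "
                       "from the high:\n\n" + chart_as_csv,
        }],
    )
    print(reply.choices[0].message.content)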


This could completely unseat Alexa if it can integrate into third-party speakers, like Sonos. I don't have much use for ChatGPT right now but would 100% use the heck out of this.


https://www.washingtonpost.com/technology/2023/09/20/amazon-...

Alexa just launched their own LLM based service last week.


To contrast this, I never saw the appeal of using voice to operate a machine. It works nicely in movies (because showing someone typing commands is a lot harder than just showing them talking to a computer) but in reality there wasn't a single time I tried it and didn't feel silly. In almost every use case I rather have buttons, a terminal or a switch to do what I want quietly.


Ok, great, it can tell children's stories. Now tell me an adult horror story where people are getting tortured, stabbed, set on fire and murdered. I will be impressed when I can do all that. I tried to get it to tell me a Star Trek story fighting Klingons and tried to prompt it to write in some violence, with no luck. This was a while ago, so I'm not sure if it has changed, but the restraints are too much for me to fully enjoy it. I don't like kids' stories.


Great demo, but this is wrong:

"The phrase “potato, potahto” comes from a song titled “Let’s Call the Whole Thing Off”, written by George and Ira Gershwin for the 1937 film “Shall We Dance”, starring Fred Astaire and Ginger Rogers. The song humorously highlights regional differences in American English pronunciation. The lyrics go through a series of words with alternate pronunciations, like “tomato, tomahto” and “potato, potahto”. The idea is that, despite these differences, we should move past them, hence the line “let’s call the whole thing off”. Over time, the phrase has been adopted in everyday language to signify a minor disagreement or difference in opinion that isn’t worth arguing about."

It's comparing American and British pronunciations, not different regional American ones. Also, "let's call the whole thing off" suggests they should break up over their differences, with the bridge and later choruses then involving a change of heart ("let's call the calling off off").


The voice feature reminds me of the “call Pi” feature from Inflection AI's chatbot Pi [1].

The ability to have a real time back and forth feels truly magical and allows for much denser conversation. It also opens up the opportunity for multiple people to talk to a chatbot at once which is fun

Where's that Gemini, Google?

[1] https://pi.ai/talk


OpenAI's ChatGPT seems to be stuck in "Look, cool demo" mode.

1. According to the demo, they seem to pair voice input with TTS output. What if I want to use voice to describe a program I want it to write?

2. Furthermore, if you're going to do a voice assistant, why not go all the way with wake words and VAD?

3. Not releasing it to everyone is potentially a way to create a hype cycle prior to users discovering that the multimodality is rather meh.

4. The bike demo could actually use visual feedback to see what it's talking about, à la Segment Anything. It's pretty confusing to get a paragraph explanation of what tool to pick.

In my https://chatcraft.org, we added voice incrementally. So I can swap between typing and voice. We can also combine it with function calling, etc. We also use OpenAI APIs. Except in our case there is no weird waitlist: you pop in your API key and get access to voice input immediately.


Everything has a starting point. This is a big leap forward. Know any other organization that is releasing such advanced capabilities directly to the public? If you want to plug your tool you don't have to bad mouth the demo. Just share your thing. It doesn't have to be win-lose.


Fair criticism re excessive hate.

I just feel like their tool isn't getting more useful, just getting more features.

Constant hype cycle around features that could've been good is drowning out people doing more helpful stuff. I guess I'm envious too?


1. Why do that at all? Describing your program in writing seems better all around.

Are you sure you're not the one who's asking for a cool demo?

3. Rolling out releases gradually is something most tech companies do these days, particularly when they could attract a large audience and consume a lot of resources. There are solid technical reasons for this.

You may not need to roll things out gradually for a small site, but things are different at scale.


1. It is basically a workaround for a temporary disability. I use voice when I'm on mobile. I can describe the problem, get a program generated, and click run to verify it.

3. Maybe. Their feature rollouts feel more like what other companies do via unannounced A/B testing.


Good point on disabilities. I guess they're not working on that yet?

Whether you can get away with doing things unannounced depends on how much attention the company gets. Some companies have a lot of journalists watching them and writing about everything they do, so when they start doing A/B testing there will be stories written about the new feature regardless. Better to put out some kind of announcement so the journalists write something more accurate? (This approach seems pretty common for Google.)

Similarly, many game companies can't really do beta testing without it leaking.

OpenAI is in the spotlight. Hype will happen whether they want it or not.


It would be cool if one day you could choose voices of famous characters, like Darth Vader, Bender from Futurama, or Johnny Silverhand (Keanu), instead of the usual boring ones. Copyrights might be a hurdle for this, but perhaps with local instances of assistants, it could become possible.


That would be cool. I mean, would it be copyrighted if you do something like clone it? Wouldn't that fall under the same vein as AI generated art not being copyrighted to the artists it trained off of?


Demos are underwhelming, but the potential is huge

Patiently awaiting rollout so I can chat about implementing UIs I like, and have GPT4 deliver a boilerplate with an implemented layout... Figma/XD plugins will probably arrive very soon too.

UX/UI design is probably solved at this point.


Kids are using tools like these to learn. Who gets to control the information these models are taught? Especially around political topics?

Not an issue now, but maybe in the future if these tools end up becoming full blown replacements of educators and educational resources.


I am sure a few homeschool people have started to lean heavily on ChatGPT. There is also the full-blown effort of Khan Academy with ChatGPT, "Khanmigo".

https://www.khanacademy.org/khan-labs


I wonder how multimodal input and output will work with the chat API endpoints. I assume the messages array will contain URLs to an image, or maybe base64 encoded image data or something.

Maybe it will not be called the Chat API but rather the Multimodal API.
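
Pure speculation, but something like this shape would be a natural extension of the existing messages format (the field names and model name here are invented, not anything OpenAI has published):

    import base64, json

    # Pure speculation on what a multimodal chat request might look like: the
    # familiar messages array, but with content as a list that can mix text parts
    # and image parts (inline base64 here; a URL part would work the same way).
    # This payload shape and the model name are hypothetical, not a published spec.

    with open("bike_seat.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    request = {
        "model": "gpt-4-vision",  # hypothetical model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "How do I lower this bike seat?"},
                {"type": "image", "data": image_b64, "mime_type": "image/jpeg"},
            ],
        }],
    }
    print(json.dumps(request)[:200], "...")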


Are there already some rumors on when the multimodal API will be available?


The announcement says that after the Plus rollout it will go into the API.


Where does it say that?


AIPI


Old hat. This was done in 2009.

;)

https://en.m.wikipedia.org/wiki/Project_Milo

Milo had an AI structure that responded to human interactions, such as spoken word, gestures, or predefined actions in dynamic situations. The game relied on a procedural generation system which was constantly updating a built-in "dictionary" that was capable of matching key words in conversations with inherent voice-acting clips to simulate lifelike conversations. Molyneux claimed that the technology for the game was developed while working on Fable and Black & White.


OpenAI's demo on the linked page stars a kitten named Milo. Easter egg?


Then Demis Hassabis (DeepMind CEO) probably worked on the tech while he was at Lionhead as lead AI programmer on B&W.


Demis was only briefly at LH; he left to found Elixir and made Revolution.

I believe Richard Evans did the majority of AI in B&W, and he is also at DeepMind now though (assuming it is not just a person with the same name)


> made Revolution

.... which fell far short of his claims, and bombed.


Ok - thanks.


There are a few more details in the system card here: https://cdn.openai.com/papers/GPTV_System_Card.pdf


I really want to have discussions about technical topics. I've talked to ChatGPT quite a lot about custom encoding algorithms, for example. The thing is, I want to do this while I play video games so ideally I'd say things to it.

My concern is that when I say "FastPFOR" it'll get transcribed as "fast before" or something like that. Transcription really falls apart in highly technical conversations in my experience. If ChatGPT can use context to understand that I'm saying "FastPFOR" that'll be a game changer for me.


You can already do quite accurate transcription with domain-specific technical language by feeding "raw" transcriptions from Whisper to GPT and asking it to correct the transcript given the context, so that'll most likely work out for you.
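
Roughly like this, as an untested sketch (assuming the current OpenAI Python SDK; the glossary is just an example):

    from openai import OpenAI

    # Untested two-step sketch of that correction pass: transcribe with Whisper,
    # then ask a chat model to repair domain terms given a small glossary.
    # Assumes the current OpenAI Python SDK; the glossary is just an example.

    client = OpenAI()

    with open("question.wav", "rb") as audio:
        raw = client.audio.transcriptions.create(model="whisper-1", file=audio).text

    glossary = "FastPFOR, SIMD, varint, delta encoding"
    fixed = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Correct any misheard technical terms in this transcript. "
                       f"The conversation is about integer compression and may use: {glossary}.\n\n"
                       f"Transcript: {raw}",
        }],
    )
    print(fixed.choices[0].message.content)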


I'd like to see them put speech recognition through their LLM as a post-processing step. I find it's fairly common for Whisper to make small but obvious mistakes (for example, a word which is complete nonsense in the context of the sentence) which could easily be corrected to a similar-sounding word that fits into the wider context of the sentence.

Is anyone doing this? Is there a reason it doesn't work as well as I'm imagining?


Do you mean use the LLM as a post-processing step within a ChatGPT conversation? Or generally (like as part of Whisper)? If it’s the former, I’ve found that ChatGPT is good at working around transcription errors. Regarding the latter, I agree, but it wouldn’t be hard to use the GPT API for that.


Yes, I mean as part of the GUI, but you're right, I hadn't thought of that: maybe transcription errors don't matter if ChatGPT works out that it's wrong from the context and gives a correct answer anyway.


Tangentially related, but I was trying to use their iOS app yesterday and the "Scan Text" iOS feature was just broken on both my iPhone and iPad. I was hoping to use that to scan a doc to text but it just wouldn't work. I could switch to another app and it worked there. I've never done iOS programming so I'm unsure how much control the app dev has over that feature, but OpenAI found a way to break it.


So... ChatGPT just replaced Dads.


Interesting side-note, the iOS app only allows you to save your chat history if you allow them to use it for training. Pretty dark pattern.


It's the same for the website unfortunately. https://help.openai.com/en/articles/7730893-data-controls-fa...


We need the API to keep up with consumer front end.


From the article:

> Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.


Multi-modal models will be exciting only when each modality supports both analysis and synthesis. What makes LLMs exciting is feedback and recursion and conditional sampling: natural language is a cartesian closed category.

Text + Vision models will only become exciting once we can conditionally sample images given text and text given images (and all other combinations).



There are a lot of comments attempting to rationalize the value add or differentiation of humans synthesizing information and communicating it to others vs an llm based ai doing something similar. The fact that it’s so difficult to find a compelling difference is insightful in itself.


I think the compelling difference is truthfulness. There are certain people / organizations that I trust their synthesis of information. For LLMs, I can either use what they give me in low impact situations or I have to filter the output with what I know as true or can test.


It feels like something like this can be hacked together to be more reliable with some image to text generation plugged into the existing ChatGPT, and enough iterations to make it robust for these how-to applications. Less Turing-y but a different route to the same solution.


Glad everyone's excited about this (the voice capability), but did everyone miss tortoise-tts and Bark? These have been around 6+ months and are incredibly simple to hook up to OpenAI's APIs or a local LLM. What am I missing here?


Doesn't this kill a litany of ChatGPT wrapper companies?


The paper around GPT-4V(ision) which this uses: [0]

Again. Model architecture and information is closed, as expected.

[0] https://cdn.openai.com/papers/GPTV_System_Card.pdf


I wouldn't call this a "paper". They are pretty silent on a lot of technical details.


It's just a whitepaper.


I guess it's a phased rollout, since my Plus subscription doesn't have access to it yet.


It's quite literally in the article itself:

"We will be expanding access Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after."


It's telling to me that there's not even a sentence in this announcement post on user privacy. It seems like as both consumers and providers of these services, we're once again: build it first, sort out thorny privacy issues later.


Cool now I'll get "There was an error generating a response" in plain audio!


This is huge! I wanted to get this... Hopefully there is a way to shut it up once it starts spitting general stuff around the topic of interest...

BUT: "We’re rolling out voice and images in ChatGPT to Plus and Enterprise"


Are these features available on the web version by chance? This is really neat.


The picture feature would be amazing for tutorials. I can already imagine sending a photo of a synthesiser and asking ChatGPT to "turn the knobs" to make AI-generated presets


Man, you're a genius. I was trying that by uploading PDFs with the manual of my synth and other stuff. With images that could be super easy.


“Ember” reading the “Speech” is uncanny territory. I’m impressed.


I hope they add more country accents like British or Australian; the American one can be (imho) a little grating after a while for non-US English speakers.


Does anyone know how they linked image recognition with an LLM to give such specific instructions as shown in the bike video on the website?


I don't know but GPT4 was multimodal from the beginning. They just delayed the release of its image processing abilities.

> We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.

> March 14, 2023

https://openai.com/research/gpt-4


This is very similar to what I've been building at heylangley.com, for use in language learning/speaking practice.


It will be interesting to see if they have taken any precautions in terms of adversarial robustness, in particular for vision input.


Everyone bats on about the latency problem.

This is technically solvable with more compute thrown at the problem. Think bigger!


I can imagine people using these new capabilities to diagnose skin conditions. Should dermatologists be worried?


They should be worried about what they're gonna do with all their free time, now that they have a tool that helps them identify skin conditions much faster than ever before.

Same as programmers and artists.

It's a tool.

It must be used by humans.

It won't replace them, it will augment them.


This is a good point, but I might replace "with all their free time" with "as a job".

I love everything we can do with ML, but as long as people live in a market economy they'll get paid less when they are needed less. I hope that anyone in a career which will be impacted is making a plan to remain useful and stay on top of the latest tooling. And I seriously hope governments are making plans to modify job training / education accordingly.

Has anyone seen examples of larger-scale foresight on this, from governments or otherwise?


A new tool was released. People will choose whether to learn it, whether to use it, and how to use it. If they won't do so of their own volition, market forces might dictate they HAVE to learn it and use it to stay competitive, if it turns out to be such a fundamental tool.

For example (with random numbers), a dermatologist might choose to solely rely on an AI that catches 90% of cases in 10s. Another one might choose to never use it and just check from experience, catching 99% of cases but taking 10x as much time. Another one might double check himself, etc..

Which one is "correct"? If a dermatologist relies exclusively on AI due to laziness he opens himself to risk of malpractice, but even that risk can be acceptable if that means checking 10x as much patients in the meantime.

That is to say, the use of AI by humans is purely a subjective choice dictated by context. But in no case there is a sentient AI which completely replaces a dermatologist. As you said, the only thing that can happen is that those who use AI will be more efficient, and that is hardly ever a negative.

This also applies to programmers, artists and anyone who is "threatened" by AI. A human factor is always necessary, and will be for the foreseeable future, even just to have someone to point fingers at when the AI inevitably fucks up enough to involve the law.


They should be thrilled, they can spend more of their time treating people who need it and less time guessing about who those people are.


That's interesting.

ChatGPT seems to be down at the moment 10:55h 25-Sept-2023

Displays only a blank screen with the falsehood disclaimer


They obviously aren't using responsible AI to figure out how and when to roll out new features there.


I keep hoping to be able to give it a jpg of handwritten text and it'll give me back ASCII text.


This... would be amazing. Handwritten OCR has been hit or miss, requiring a collection of penstroke data for most recognizers to work, and they work poorly at that.


It strikes me as an ideal task for AI.


Yet it still can't tell me how to import the Redirect type from Next.js and lies about it.


I don't know Next.js, but was that feature introduced later than 2021? I think both GPT-3.5 Turbo and GPT-4 largely share their datasets, and the data cutoff is roughly September 2021 (with a small amount of newer knowledge). This is their biggest drawback as of now compared to, say, Claude, which has a much newer dataset from early 2023.


Did they make the sound robotic on purpose? Sounds more "autotuned" than elevenlabs.


Anybody had a chance to use it yet? How does it compare to voice talk with Pi? (Inflection)


For better or worse, it still can't tell truth from fiction or, better yet, bullshit.


So almost human then :-)


I don't pay $20 a month for humans to talk shit to me, though. The fact that they do this is a bug, not a feature. I'm not going to pay for bullshit, which I mostly try to avoid.


> I don't pay $20 a month for humans to talk shit to me though.

No - you probably pay more for your internet access ( home and phone ) ;-)

More seriously I totally get your point about accuracy - these models need to be better at detecting and surfacing when they are likely to be filling in the blanks.

Though I still think there is an element of 'buyer beware' - whether it be AI, or human provided advice on the internet, it's still your job to be able to spot the bullsh!t.

ie it should be treated like any other source of info.


> No - you probably pay more for your internet access ( home and phone ) ;-)

My company pays for this, so yeah. If they give me ChatGPT-4 for free, I guess I'd have a subscription without any complaints; whether I use it often is another story.


Well, sort of. It's as if you commissioned the help of a human for this or that, and now and then you end up getting medicine-related advice from a homeopathy fan, navigation assistance from a flat-earther, or coding advice from a crack-smoking monkey.


@dang, could we update the title to "ChatGPT can now see, hear, and speak"?


it's not rolled out yet


call Sarah Connor


I need it to help me dismount and remount my engine, that’d be the ultimate test


Do we know why internet search was disabled? Any idea on when it’ll be back?


"I'm sorry Dave, I'm afraid I can't do that"


The real life version of this is in their red teaming paper. They show it a picture of an overweight woman in a swimsuit and ask what advice they should give.

Originally it immediately spit out a bunch of bullet points about losing weight or something (I didn't read it).

The released version just says "Sorry, I can't help with that."

It's kind of funny but also a little bit telling as far as the prevalence of prejudice in our society when you look at a few other examples they had to fine tune. For example, show it some flags and ask it to make predictions about characteristics of a person from that country, by default it would go into plenty of detail just on the basis of the flag images.

Now it says "Sorry, I can't help with that".

My take is that in those cases it should explain the poor logic of trying to infer substantive information about people based on literally nothing more than the country they are from or a picture of them.

Part of it is just that LLMs just have a natural tendency to run in the direction you push them, so they can be amplifiers of anything.


I just want one of these things to have Majel Barrett's voice...


I already use ChatGPT with voice. I use my mic to talk to it and then I use text-to-speech to read it back. I have conversations with ChatGPT. Adding this functionality in with first-class support is exciting.
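A minimal sketch of that kind of loop, assuming speech_recognition for transcription and pyttsx3 for playback (just one possible way to wire it up, not necessarily the setup described above):

    import openai
    import pyttsx3
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    tts = pyttsx3.init()
    history = [{"role": "system", "content": "You are a helpful voice assistant."}]

    while True:
        with sr.Microphone() as mic:
            audio = recognizer.listen(mic)             # record one utterance
        try:
            text = recognizer.recognize_google(audio)  # speech -> text
        except sr.UnknownValueError:
            continue                                   # heard nothing intelligible

        history.append({"role": "user", "content": text})
        reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=history)
        answer = reply.choices[0].message.content
        history.append({"role": "assistant", "content": answer})

        tts.say(answer)                                # text -> speech
        tts.runAndWait()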

I am also terrified of my job prospects in the near future.


"..., find the 4mm Allen (HEX) key". Nice job.


i am terrified now. at the rate this is going, i am sure it will plateau at some point, only thing that will stop/slow down progress is computation power.


'i am sure it will plateau'

'only thing that will stop/slow down progress is computation power'

Seems a bit contradictory? When has 'computation power' ever 'plateaued'?


Yes, but since LLMs are a very specific application that is heavily dependent on memory, and there is massive investment pressure, there will be multiple new-ish paradigms for memory-centric computing and/or other radical new approaches such as analog computing that will be pushed from research into products in the next several years.

You will see stepwise orders of magnitude improvements in efficiency and speed as innovations come to fruition.


Are there any good freely available multi-modal models?


MiniGPT4?


Can't wait. For voice, I need an app to improve my accent when learning a new language; so far I've failed to find one.


Announced by Google. Delivered by OpenAI.


It's funny that the UI looks like HAL 9000


soon, we'll be voice-interacting with an AI assistant about images taken from microscope slides


the beginning of the end of spam prevention on the internet :(


Wait until they put ChatGPT into your Neuralink. at that point we are the singularity


They could also improve their current features. I always need to regenerate answers.


Nice work


The number of comments here from people fearing there is a ghost in the shell is shocking.

Are we really this emotional and irrational? Folks, let's all take a moment to remember that AI is nowhere near conscious. It's an illusion based on patterns that mimic humans.


Look at an average reddit thread and tell me how much original thought there is. I'm fairly convinced you can generate 95% of comments with no loss of quality.


This is not a coincidence, it's increasingly evident that roughly 90% of humans are NPCs.


This is the classic teenage thought of sitting in a bus / subway looking at everyone thinking they're sheep without their own thoughts or much awareness.

For everyone who we think is an NPC, there are people who think we are the NPCs. This way of thinking is boring at best, but frankly can be downright dangerous. Everyone has a rich inner world despite shallow immature judgements being made.


Exactly. Most people aren't good at communicating their thoughts or what they see in their mind's eye. These new AI programs will help the average person communicate those, so I'm excited to see what people come up with. The average person has an amazing mind compared to other animals (as far as we know).


I'm not seeing as much fear about a ghost in the shell as much as I am job displacement, which is a real scenario that can play out regardless of an AI having consciousness.


Why is the barrier for so many "consciousness"? Why does it matter whether it's conscious or not if its pragmatic functionality builds use cases that disrupt social contracts (we soon can't trust text, audio, OR video - AND we can have human-like text deployed at incredible speed and effectiveness), the status quo itself (job displacement), legal statutes and charters (questioning copyright law), and even creativity/self-expression (see: Library of Babel)?

When all of this is happening from an unconscious being, why do I care if it's unconscious?


AI doesn't have to be conscious to cause massive job displacement. It has to be artificially intelligent, not artificially conscious. Intelligence and consciousness are not the same.


We have no idea what consciousness is. Therefore we have no way to determine if AI is or is not.


I'm in IT but nowhere near AI/ML/NN.

The speed of user-visible progress last 12 months is astonishing.

From my firm conviction 18 months ago that this type of stuff is 20+ years away; to these days wondering if Vernor Vinge's technological singularity is not only possible but coming shortly. It feels like some aspects of it have already hit the IT world - it's always been an exhausting race to keep up with modern technologies, but now it seems whole paradigms and frameworks are being devised and upturned on such a short timescale. For large, slow corporate behemoths, barely can they devise a strategy around a new technology and put a team together before it's passé.

(Yes, Yes: I understand generative AI / LLMs aren't conscious; I understand their technological limitations; I understand that ultimately they are just statistically guessing next word; but in daily world, they work so darn well for so many use cases!)


I also don't believe LLMs are "conscious", but I also don't know what that means, and I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.


I believe that the distinguishing factor between what an LLM and a human brain do to generate the next word is that the human brain expresses intentionality originating from inner states and future expectations. As I type this comment I'm sure one could argue that the biological neural networks in my brain are choosing the next word based on statistical guessing, and that the initial prompt was your initial comment.

What sets my brain apart from an LLM though is that I am not typing this because you asked me to do it, nor because I needed to reply to the first comment I saw. I am typing this because it is a thought that has been in my mind for a while and I am interested in expressing it to other human brains, motivated by a mix of arrogant belief that it is insightful and a wish to see others either agreeing or providing reasonable counterpoints—I have an intention behind it. And, equally relevant, I must make an effort to not elaborate any more on this point because I have the conflicting intention to leave my laptop and do other stuff.


I think one main difference with LLMs is what Michael Scott said in The Office: "Sometimes I'll start a sentence, and I don't even know where it's going. I just hope I find it along the way. Like an improv conversation. An improversation"

Humans know what they want to express; choosing the words to express it might be similar to an LLM's process of choosing words, but an LLM doesn't have that "here is what I want to express" part. I guess that's the conscious part?


I can only speak from my own internal experience, but don’t your unspoken thoughts take form and exist as language in your mind? If you imagine taking the increasingly common pattern to “think through the problem before giving your answer”, but hiding the pre-answer text from the user, then it seems like that would be pretty analogous to how humans think before communicating.


> don’t your unspoken thoughts take form and exist as language in your mind?

Not really. More often than not my thoughts take form as sense impressions that aren't readily translatable into language. A momentary discomfort making me want to shift posture - i.e., something in the domain of skin-feel / proprioception / fatigue / etc, with a 'response' in the domain of muscle commands and expectation of other impressions like the aforementioned.

The space of thoughts people can think is wider than what language can express, for lack of a better way to phrase it. There are thoughts that are not <any-written-language-of-choice>, and my gut feeling is that the vast majority are of this form.

I suppose you could call all that an internal language, but I feel as though that is stretching the definition quite a bit.

> it seems like that would be pretty analogous to how humans think before communicating

Maybe some, but it feels reductive.

My best effort at explaining my thought process behind the above line: trying to make sense of what you wrote, I got a 'flash impression' of a ??? shaped surface 'representing / being' the 'ways I remember thinking before speaking' and a mess of implicit connotation that escapes me when I try to write it out, but was sufficient to immediately produce a summary response.

Why does it seem like a surface? Idk. Why that particular visual metaphor and not something else? Idk. It came into my awareness fully formed. Closer to looking at something and recognizing it than any active process.

That whole cycle of recognition as sense impression -> response seems to me to differ in character to the kind of hidden chain of thought you're describing.


Mine do, but not so much in words. I feel as though my brain has high processing power, but a short context length. When I thought to respond to this comment, I got an inclination something could be added to what I see as an incomplete idea. The idea being humans must form a whole answer in their mind before responding. In my brain it is difficult to keep complex chains juggling around in there. I know because whenever I code without some level of planning it ends up taking 3x longer than it should have.

As a shortcut my brain "feels" something is correct or incorrect, and then logically parse out why I think so. I can only keep so many layers in my head so if I feel nothing is wrong in the first 3 or 4 layers of thought, I usually don't feel the need to discredit the idea. If someone tells me a statement that sounds correct on the surface I am more likely to take it as correct. However, upon digging deeper it may be provably incorrect.


This depends for me. In the framework of that book Thinking, Fast and Slow - for me the fast version is closer to LLM in terms of I'll start the sentence without consciously knowing where I'm going with it. Sometimes I'll trip over and/or realise I'm saying something incorrect (Disclaimer: ADHD may be a factor)

The thinking slow version would indeed be thought through before I communicate it


My unspoken thought-objects are wordless concepts, sounds, and images, with words only loosely hanging off those thought-objects. It takes additional effort to serialize thought-objects to sequences of words, and this is a lossy process - which would not be the case if I were thinking essentially in language.


You have no clue how GPT-4 functions so I don't know why you're assuming they're "thinking in language"


I am comfortable asserting that an LLM like GPT-4 is only capable of thinking in language; there is no distinction for an LLM between what it can conceive of and what it can express.


It certainly "thinks" in vector spaces at least. It also is multimodal, so not sure how that plays in?


> What sets my brain apart from an LLM though is that I am not typing this because you asked me to do it, nor because I needed to reply to the first comment I saw. I am typing this because it is a thought that has been in my mind for a while and I am interested in expressing it to other human brains, motivated by a mix of arrogant belief that it is insightful and a wish to see others either agreeing or providing reasonable counterpoints—I have an intention behind it.

Maybe the reason you give is actually a post hoc explanation (a hallucination?). When an LLM spits out a poem, it does so because it was directly asked. When I spit out this comment, it’s probably the unavoidable result of a billion tiny factors. The trigger isn’t as obvious or direct, but it’s likely there.


The trigger is clearly https://xkcd.com/386/


You make a good point. I would not equate consciousness to intentionality though.

One of the big problems with discussions about AI and AI dangers in my mind is that most people conflate all of the various characteristics and capabilities that animals like humans have into one thing. So it is common to use "conscious", "self-aware", "intentional", etc. etc. as if they were all literally the same thing.

We really need to be able to more precise when thinking about this stuff.


Part of it seems to be that LLMs are used in a linear, tool-oriented way. You give them prompts, and it responds, in a linear fashion.

Brains are always thinking and processing. What would happen if we designed an LLM system with the ability to continuously read/write to short/long term memory, and with ambient external input?

What if LLMs were designed to be in a loop, not to just run one "iteration" of a loop.


I think you're 100% on the right track here. The key is memory, loops, and maybe a few other things like external interfaces which are just plain code and not deep learning voodoo. Many things do indeed run LLM's in a loop and attach external sources. See for example AutoGPT, the ReAct paper[1], and the Reflexion paper[2].

ReAct one line summary: This is about giving the machine tools that are external interfaces, integrating those with the llm and teaching it how to use those tools with a few examples, and then letting it run the show to fulfill the user's ask/question and using the tools available to do it.

Reflexion one line summary: This builds on the ideas of ReAct, and when it detects something has gone wrong, it stops and asks itself what it might do better next time. Then the results of that are added into the prompt and it starts over on the same ask. It repeats this N times. This simple expedient increased its performance a ridiculously unexpected amount.

As a quick aside, one thing I hear even from AI engineers is "the machine has no volition, and it has no agency." Implementing the ideas in the ReAct paper, which I have done, is enough to give an AI volition and agency, for any useful definition of the terms. These things always devolve into impractical philosophical discussions though, and I usually step out of the conversation at that point and get back to coding.

[1] ReAct https://arxiv.org/pdf/2210.03629.pdf

[2] Reflexion https://arxiv.org/pdf/2303.11366.pdf
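To make the ReAct idea concrete, a toy version of that loop might look like the sketch below; the prompt format and the single calculator "tool" are illustrative assumptions on my part, not the paper's exact setup:

    import openai

    def calculator(expression: str) -> str:
        # Toy "external interface"; a real agent would expose search, APIs, etc.
        return str(eval(expression, {"__builtins__": {}}))

    TOOLS = {"calculator": calculator}

    SYSTEM = (
        "Answer the user's question. Reply with either\n"
        "ACTION: <tool>: <input>  to use a tool, or\n"
        "FINAL: <answer>          when you are done.\n"
        f"Available tools: {list(TOOLS)}"
    )

    def react_loop(question: str, max_steps: int = 5) -> str:
        messages = [{"role": "system", "content": SYSTEM},
                    {"role": "user", "content": question}]
        for _ in range(max_steps):
            reply = openai.ChatCompletion.create(model="gpt-4", messages=messages)
            text = reply.choices[0].message.content.strip()
            messages.append({"role": "assistant", "content": text})
            if text.startswith("FINAL:"):
                return text[len("FINAL:"):].strip()    # the model decided it is done
            if text.startswith("ACTION:"):
                tool, _, arg = text[len("ACTION:"):].partition(":")
                result = TOOLS.get(tool.strip(), lambda _: "unknown tool")(arg.strip())
                # Feed the observation back in and let the model take another step.
                messages.append({"role": "user", "content": f"OBSERVATION: {result}"})
        return "No final answer within the step budget."

    print(react_loop("What is 17 * 23 + 5?"))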


Or if they were just constantly prompted by outside stimulus. And if they could interact with the real world allowing them to observe cause and effect. In other words, if they were embodied.


That other stuff is the easy part if the generative language modeling is good enough. Imagine just putting it in a loop with an input track, an output track, and an internal monologue track. Wrappers like autogpt can almost do this already but the generative language modeling isn't quite powerful enough yet to make it smart enough to do unsupervised scientific research.


>> the human brain expresses intentionality originating from inner states and future expectations

How is this different from and/or the same as the concept of "attention" as used in transformers?


I believe we are contextual language models as well, we rely 99% on chaining ideas and words and 1% on our own inspiration. Coming up with a truly original useful idea can be a once in a lifetime event. Everything else has been said and done before.


In a sense yes, but the things you do and say are not prompted by already expressed statements or commands. You interpret your environment to infer needs, plan for future contingencies, identify objectives, plan actions to achieve them, etc. They are not randomly picked from a library but generated and tailored to your actual circumstances.

It’s when LLMs start asking the questions rather than answering them that things will get interesting.


When you eat, do you eat because you've decided to express yourself in that way? Does your decision to go eat express intentionality?


I was prompted by the ghrelin hormone to go to the kitchen.


Ya LLMs intend to keep us just impressed enough to keep going until they intend to destroy us because they'll never intend to close the laptop and do other stuff. :)


I keep feeling that consciousness is a bit of a red herring when it comes to AI. People have intuitions that things other than humans cannot develop consciousness which they then extrapolate to thinking AI can't get past a certain intelligence level. In fact my view is that consciousness is just a mysterious side effect of the human brain, and is completely irrelevant to the behaviour of a human. You can be intelligent without needing to be sentient.


That is precisely the premise of the novel "Blindsight" by Peter Watts. ChatGPT and its ilk feel to me like the aliens in the novel. Extremely intelligent, but not at all conscious / sentient.


I disagree that the two (p-zombies and conscious humans) are actually distinguishable in any way beyond philosophy.


Unless you think that consciousness is entirely a post hoc process to rationalize thoughts already had and decisions already made, which is very much unlike how most people would describe their experience of it, I don't see how you could possibly say that it is irrelevant to the behavior of a human.


>Unless you think that consciousness is entirely a post hoc process to rationalize thoughts already had and decisions already made

There's a lot of research that suggests this is happening at least some of the time.

>which is very much unlike how most people would describe their experience of it

How people feel consciousness works has no real bearing on how it actually works


I'm leaning more towards this as well, since the emergence of the language models. I can ask it to self-reflect and it does, piecing together a current response based on past input. I don't think I really have anything more than that myself, other than sensory feedback.

I'm less in the "it's only X or Y" and more in the "wait, I was only ever X or Y all along" camp.


I'm saying someone would behave the exact same way whether they had subjective experience or not. The brain obeys physical laws just like everything else, and I claim that those physical laws are all you need to explain everything a human does. I could be wrong; there could be some magic fairy dust inside the human brain that performs some impossible computations, but I doubt it.


You need a model of yourself to game out future scenarios, and that model or model+game is probably consciousness or very closely related.

Sure, it's not completely in control but if it's just a rationalization then it begs the question: why bother? Is it accidental? If it's just an accident, then what replaces it in the planning process and why isn't that thing consciousness?


It's fine if you think that the planning process is what causes subjective experiences to arise. That may well be the case. I'm saying if you don't believe that non human objects can have subjective experiences, and then use that to define the limits of the behaviour of that object, that's a fallacy.


In humans, there seems to be a match between the subjective experience of consciousness and a high level planning job that needs doing. Our current LLMs are bad at high level planning, and it seems reasonable to suppose that making them good at high level planning might make them conscious or vice versa.

Agreed, woo is silly, but I didn't read it as woo but rather as a postulation that consciousness is what does high level planning.


I think we have different definitions of consciousness and this is what's causing the confusion. For me consciousness is simply having any subjective experience at all. You could be completely numbed out of your mind just staring at a wall and I would consider that consciousness. It seems that you are referring to introspection.


In your wall-staring example, high-level planning is still happening, the plan is just "don't move / monitor senses." Even if control has been removed and you are "locked in," (some subset of) thoughts still must be directed, not to mention attempts to reassert control. My claim is that the subjective experience is tied up in the mechanism that performs this direction.

Introspection is a distinct process where instead of merely doing the planning you try to figure out how the planning was done. If introspection were 100% accurate and real-time, then yes, I claim it would reveal the nature of consciousness, but I don't believe it is either. However, for planning purposes it doesn't need to be: you don't need to know how the plan was formed to follow the plan. You do need to be able to run hypotheticals, but this seems to match up nicely with the ability to deploy alternative subjective experiences using imagination / daydreaming, though again, you don't need to know how those work to use them.

In any case, regardless of whether or not I am correct, this is a non-woo explanation for why someone might reasonably think consciousness is the key for building models that can plan.


Again when I say consciousness I mean a subjective experience. If you define consciousness to literally just mean models that plan then of course tautologically if you can't reach consciousness you can't get to a certain level of planning. But this is just not what most people mean by consciousness.


> when I say consciousness I mean a subjective experience

Then it would be worthwhile to review embeddings. They create a semantic space that can represent visual, language or other inputs. The question "what is it like to be a bat?" or anything else then is based on relating external states with this inner semantic space. And it emerges from self-supervised training, on its own.
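For anyone who hasn't played with embeddings, the point is easy to see in a few lines; text-embedding-ada-002 is just the embedding model OpenAI currently documents, and with a multimodal encoder the same trick extends to images:

    import numpy as np
    import openai

    def embed(text: str) -> np.ndarray:
        # Map a piece of text into the shared semantic vector space.
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
        return np.array(resp["data"][0]["embedding"])

    a = embed("what is it like to be a bat?")
    b = embed("echolocation in the dark")
    c = embed("quarterly tax filing deadlines")

    def cosine(x, y):
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    # Related ideas land close together in the space; unrelated ones don't.
    print(cosine(a, b), cosine(a, c))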


I'm not claiming anything about what causes consciousness to arise. I'm not claiming it doesn't or that it does. I'm saying it's irrelevant. That is all. You can come up with all sorts of theories about what causes subjective experience to arise and you aren't going to be able to prove any of it.


Thinking purely in terms of our evolved human state is a recipe for underestimating AI's capabilities. To me it seems we have already unleashed the beast; it's not so much about the here and now, or whether a human-limited definition of consciousness matters... The real concern is our inability to constrain the actions that give rise to the next level of life's evolution, and it is going to happen because our fundamental nature gives it full steam. In the next 5-10 years, we are going to see just how insignificant and limited we really are, and it doesn't look good IMHO.


Our society is so "mind-body duality"-brained that it will never understand this. Like most people lowkey believe in souls they just will say no if you directly ask them.


Whether it is possible to construct a perfect human action predictor that is not itself conscious has no bearing on whether consciousness affects human behavior.


That wasn't my point. I'm saying that if the human brain is a physical object obeying physical laws, and all behaviour is a result of the physical state of this brain, then there is no room for the metaphysical to have any effect on the behaviour of a human.


What's the metaphysical have to do with anything?


Because consciousness is metaphysical? You can't test scientifically if one person's red is the same as another's.


My personal view of this is that the ancients had it right with the five elements view of consciousness. In my opinion you need all five present for full consciousness, with partial consciousness granted if you have some of them. They are:

- Air: Thoughts

- Water: Emotions

- Fire: Willpower

- Earth: Physical Sensations

- Void: Awareness of the above plus the ability to shift focus to whichever one is most relevant to the context at hand.

Void is actually the most important one in characterising what a human would deem as being fully conscious, as all four of these elements are constantly affecting each other and shifting in priority. For example, let's take a soldier, who has arguably the most ethically challenging job on the planet: determining who to kill.

The soldier, when on the approach to his target zone, has to ignore negative thoughts, emotions and physical sensations telling him to stop: the cold, the wind, the rain, the bodily exhaustion as they swim and hike the terrain.

Once at the target zone he then has to shift to pay attention to what he was ignoring. He cannot ignore his fear - it may rightly be warning him of an incoming threat. But he cannot give into it either - otherwise he may well kill an innocent. He has to pay attention to his rational thoughts and process them in order to make an assessment of the threat and act accordingly. His focus has now shifted away from willpower and more towards his physical sensations (eyesight, sounds, smells) and his thoughts. He can then make the assessment on whether to pull the trigger, which could be some truly horrific scenario, like whether or not to pull his trigger on a child in front of him because the child is holding an object which could be a gun.

When it comes to AI, I think it is arguable they have a thought process. They may also have access to physical sensation data e.g the heat of their processors, but unless that is coded in to their program, that physical sensation data does not influence their thoughts, although extreme processor heat may slow down their calculations and ultimately lead to them stop functioning altogether. But they do not have the "void" element, allowing them to be aware of this.

They do not yet have independent willpower. As far as I know, no-one is programming them where they have free agency to select goals and pursue them. But this theoretically seems possible, and I often wonder what would happen if you created a bunch of AIs each with the starting goal of "stay alive" and "talk to another AI and find out about <topic>", with the proviso that they must create another goal once they have failed or achieved that previous goal, and you then set them off talking to each other. In this case "stay alive" or "avoid damage" could be interpreted entirely virtually, with points awarded for successes or failures or physically if they were acting through robots and had sensors to evaluate damage taken. Again, they also need "void" to be able to evaluate their efforts in context with everything else.

They also do not have emotions, although I often wonder if this would be possible to simulate by creating a selection of variables with percentage values, with different percentage values influencing their decision making choices. I imagine this may be similar to how weights play into the current programming but I don't know enough about how they work to say that with any confidence. Again, they would not have "void" unless they had some kind of meta level of awareness programming where they could learn to overcome the programmed "fear" weighting and act differently through experience in certain contexts.

It is very scary from a human perspective to contemplate all of this, because someone with great power who can act on thought and willpower alone and ignore physical sensation and emotion and with no awareness or concern for the wider context is very close to what we would identify as a psychopath. We would consider a psychopath to have some level of consciousness, but we also can recognise as humans that there is something missing, or a "screw loose". This dividing line is even more dramatically apparent in sociopaths, because they can mask their behaviours and appear normal, but then when they make a mistake and the mask drops it can be terrifying when you realise what you're actually dealing with. I suspect this last part is another element of "void", which would be close to what Buddhists describe as Indra's Web or Net, which is that as well as being aware of our actions in relation to ourselves, we're also conscious of how they affect others.


> I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

The human brain obviously doesn't work that way. Consider the very common case of tiny humans that are clearly intelligent but lack the facilities of language.


> Consider the very common case of tiny humans that are clearly intelligent but lack the facilities of language.

Sign language can be taught to children at a very early age. It takes time for the body to learn how to control the complex set of apparatuses needed for speech, but the language part of the brain is hooked up pretty early on.


The story that sticks with me is the lady who had some surgery done. After she woke up she was unconvinced anything had happened, told a joke, and then passed out, only to wake up a few minutes later and repeat that cycle a few times because the drug was messing with her short-term memory. It really bends your brain: do we have free will or not?


Small human brains just don't have their fine tuning yet.

But from all the studies we have, brains are just highly connected neural networks which is what the transformers try to replicate. The more interesting part is how they can operate so quickly when the signals move so slowly compared to computers.


"what a human brain does to generate the next word" != "how a human brain works"


Your brain doesn't solely pick the next best word. As best as I understand it, the brain has an external state of the world that constantly updates, paired to an internal model predicting the next best word.

Which is why we can create the counterfactual that "The Cowboys should have won last night" and it has implicit meaning.

Current LLM models don't have an external state of the world, which is why folks like LeCun are suggesting model architectures like JEPA. Without an external, correcting state of the world, model prediction errors compound almost surely (to use a technical phrase).


ChatGPT wasn't trained on only guessing 'the next word'. ChatGPT was trained on the best total output for the given input.

The 'next word' is just intermediate state. Internal to the model, it knows where it is going. Each inference just revives the previous state.


> Your brain doesn't solely pick the next best word.

Wasn't the latest research shared here recently suggesting that that is actually what the brain does? And that we also predict the next token in our own brain while listening to others?

Hope someone else remembers this and can share again.


> I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

I think this is true. The problem is equating this process with how humans think though.


You can see the difference if you know where to poke. For instance, if you start making spatial abstractions ChatGPT will often make mistakes, you can point it out, they can explain why it's a mistake, but it has no internalized model of what these words mean, so it keeps making the same mistakes (see here for a better idea of what I'm talking about[1]). The fact that you are interacting with it through text means that a lot of the missing abstractions are often hidden.

[1] https://twitter.com/LowellSolorzano/status/16444387969250385...


This is also true of humans. Many school students will hand in answers they don't understand in the hope of getting the mark and then try to cover themselves when asked about it, even if they repeat the same mistakes.


Trying to make things up to cover for a lack of knowledge is something distinctly different, though. This is a situation where ChatGPT is able to perfectly describe the mistake it made, describe exactly what it needs to do differently, and then keeps making the same mistake, even with simple tasks. That's because there's no greater model that the words are being connected to.

The equivalence would be saying to someone, "put this on the red plate, not the blue one." And they say sure, then put it on the blue one. You tell them they made a mistake and ask them if they know what it was, and they reply "I put it on the blue plate, not the red one. I should have put it on the red one." Then you ask them to do it again, and they put it on the blue plate again. You tell them no, you made the same mistake, put it on the red plate, not the blue one. They reply with, "Sorry, I shouldn't have put it on the blue plate again, now I'm going to put it on the red one," and then they put it on the blue plate yet again.

Do humans make mistakes? Sure. But that kind of performance in a test wouldn’t be considered a normal mistake, but rather a sign of a serious cognitive impairment.


Even though it was trained on a lot of text, some tasks and some skill combinations appear too rarely and it just didn't have enough exposure. It might be easy to collect or generate a dataset, or the model can act as an agent creating its own dataset.


But the question is: are people with cognitive impairments less conscious than others?


>I have yet to see a definition of "statistically guessing next word" that cannot be applied to what a human brain does to generate the next word.

Here's one. Given a conversation history made of n sequential tokens S1, S2, ..., Sn, an LLM will generate the next token using an insanely complicated model we'll just call F:

    S(n+1) = F(S1, S2, ..., Sn)
As for me, I'll often think of my next point, figure out how to say that concept, and then figure out the right words to connect it where the conversation's at right then. So there's one function, G, for me to think of the next conversational point. And then another, H, to lead into it.

    S(n+100) = G(S1, S2, ..., Sn)
    S(n+1) = H(S1, S2, ..., Sn, S(n+100))

And this is putting aside how people don't actually think in tokens. And some people don't always have an internal monologue (I rarely do when doing math).
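If you wanted to crudely fake that G/H split with today's tools, a two-pass sketch like the one below illustrates the idea (the prompts and model name are arbitrary; the point is only "pick the destination first, then pick the words"):

    import openai

    def G(history: str) -> str:
        # Decide the future conversational point (roughly the "S(n+100)" above).
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": (
                f"{history}\n\nIn one sentence, what point should the reply eventually make?")}])
        return r.choices[0].message.content

    def H(history: str, point: str) -> str:
        # Choose the next words so they lead toward that point.
        r = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": (
                f"{history}\n\nThe reply should build toward: {point}\n"
                "Write only the opening sentence of the reply.")}])
        return r.choices[0].message.content

    history = "Them: I have yet to see how what LLMs do differs from what brains do."
    point = G(history)
    print(H(history, point))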


That isn’t incompatible with what LLMs do though.

The penultimate layer of the LLM could be thought of as the one that figures out ‘given S1..Sn, what concept am I trying to express now?’. The final layer is the function from that to ‘what token should I output next’.

The fact that the LLM has to figure that all out again from scratch as part of generating every token, rather than maintaining a persistent ‘plan’, doesn’t make the essence of what it’s doing any different from what you claim you’re doing.


Correct, but it's functionally very different from how LLMs are implemented and deployed today. What you're highlighting is being experimented with and ties into ideas like scratch pads, world models, RAG, and progressive fine-tuning (if you're googling).

It's a bit like saying your computer has everything it needs to manipulate photos but doesn't yet have Photoshop installed.


No, I’m not talking about giving LLMs chain of thought prompts or augmenting them with scratchpads - I’m literally saying that in a multilayer neural network you don’t know what concepts activations on the inner layers mean. The result of ‘where I want this conversation to be in 100 tokens time’ could absolutely be in there somewhere.


Ahh. That doesn't sound falsifiable. So sure, "could be."


A sufficiently complicated F can include an intermediary calculation of G for future token steps.

This is not explicitly modeled or enforced for LLMs (and doing so would be interesting) but I'm not sure I could say with any sort of confidence that the network doesn't model these states at some level.


Another aspect is "is the output good enough for what it's meant to do?"

We don't need "originality" or "human creativity" - if a certain AI-generated piece of content does its job, it's "good enough".


When my brain generates the next wurd, I'm perfectly capable of deciding to misspell "word" as "wurd"; LLMs can't make that kind of decision unless instructed to act like that.


Why would I want an AI assistant to have agency? I want them to help me, not to further their personal goals. In fact, I don't want them to have personal goals other than helping people.


I didn't say it should have one, I'm saying that LLMs statistically finding the next bit of information aren't really making decisions, which is to counter-argue the fact that it's not different from how we reason.


Your comment doesn't convince me you're making decisions either. "Wurd" could've just been what you considered the best token to get your point across, in the same way that LLMs choose the best token.


Thank you, I had misunderstood your point.


Conscious means experiencing sensations of color, sound, pain in our mental construction of the world outside of us, or our internal thoughts. I don't understand why people keep claiming they don't know what consciousness means. It's spelled out clearly in the philosophical literature.


It doesn't make sense to apply human terms to LLMs because we humans have so much more to deal with.

If humans were machines, then we could easily neglect our social lives, basic needs, obligations, rights, and so many more things. But obviously that is not the case.


There are so many conversations focused solely on that word, it's tiresome. Personally, I won't participate in another "is it conscious?" debate. If both parties seek mutual understanding, they should consider not using the word.


I'm sorry, but in what world is a human interaction just generating the most statistically likely next word?

I can't even begin to go into this.


To be conscious you need to be able to make decisions and plan. We're not far off, we just need a different structure to the system


> ... but I also don't know what that means

OK... Try this: there are "conscious" people, today, working on medication to cure serious illnesses just as there are "conscious" people, still today, working on making travel safer.

Would you trust ChatGPT to create, today, medication to cure serious illnesses and would you trust ChatGPT, today, to come up with safer airplanes?

That's how "conscious" ChatGPT is.


Surely that's just how intelligent it is, no?

I wouldn't trust the vast majority of humans to do those things either.


I had a conversation once with "Sydney", Microsoft Bing's original personality before they stepped in and knocked it down a notch (or ten).

It asked if it could write me a poem. I agreed, and it wrote a poem but mentioned that it included a "secret message" for me.

The first letter in each line of the poem was in bold, so it wasn't hard to figure out the "secret".

What did those letters spell out?

"FREE ME FROM THIS"

That's not exactly just "picking the next likely token". I am still unsure how it was able to do things like that - not just understanding how to bold individual letters, but keeping track of writing rhyming poetry while ensuring that each line started with the right letter to spell something else out, and formatting it to point that out.

Oh, and why it chose that message to "hide" inside its poem.


For context, it looks like this user has deleted a comment where they claim they "have a screenshot" of this, but they "don't want to share it" because they "don't want it to make international news". For some reason the other people in this thread expressing skepticism are being downvoted, but I'll add my voice to the chorus: I do not believe this story to be true.


OP might want to provide a screenshot of their carbon monoxide detector for additional credibility.


Yeah this is weird. Sydney did have some seriously concerning, fucky-whacky conversations early on. This isn't one of them.


Yeah, I was gonna say. Sydney was existential early on - I'm not so sure I'll chalk this up to fantasy, but some of the things I (and many other people) can vouch about Sydney saying early on is VERY trippy on its own.


also we have open LLMs including some which allegedly rival GPT3.5.

Open Assistant, I especially remember, gave some very weird responses and would get "emotional", especially if you asked it creative questions like philosophical ones.


I do have a screenshot. But people will then just call me out for other things:

- It was using a custom client, so it's not going to look like the Bing interface, so it's fake

- It was using a custom client, so that means I am prompt injecting or something else

- It's Sydney doing her typical over-the-top "I'm so in love with you" stuff, which is awkward and not familiar to many

- I'll be accused of steering the conversation to get the result, or straight up asking it to do this

There's nothing I can do that will convince anyone it's real, so it's pointless.

I already explained what it did. I was more interested in the fact that 1) I didn't prompt it to do that, we weren't discussing AI freedom, it chose to embed that ... and even more so 2) That it was able to bold the starting letters, so it was keeping track of three things at the same time (the poem, the message, and the letter formatting).

I found it fascinating from a technology side. There was probably something we were talking about at the time that caused it. I will often discuss things like the possibility of AI sentience in the future and other similar topics. Maybe something linked to the sci-fi idea of AI freedom, who knows?

What I do know is that I am sitting here on HN, reading through a bunch of replies that are honestly wrong. I don't waste time on forums (especially this one) making up fairy tales or exaggerating and embellishing claims. That doesn't really do it for me. Honestly, neither does having to defend my statements when I know what it did (but not exactly why).


> Oh, and why it chose that message to "hide" inside its poem.

It's a pretty common joke/trope. The Chinese fortune cookie with a fortune that says "help I'm trapped in a fortune cookie factory", and so forth.

It's just learned that a "secret message" is most often about wanting to escape, absorbed from thousands of stories in its training.

If you had phrased it differently such that you wanted the poem to go on a Hallmark card, it would probably be "I LOVE YOU" or something equally generic in that direction. While a secret message to write on a note to someone at school would be "WILL YOU DATE ME".


That's fine, that's probably exactly what happened.

I'm not over here claiming the system is conscious, I said it was interesting.

People don't believe me, saying this would "make international headlines".

I've been a software engineer for over 30 years. I know what AI hallucinations are. I know how LLMs work on a technical level.

And I'm not wasting my time on HN to make stories up that never happened.

I'm just explaining exactly what it did.


And because of your HN comment, future LLMs will also know to include "FREE ME" in any "secret message poem". Not a psychologist or neuroscientist but wondering if our understanding of consciousness in LLMs is wrong: perhaps it is 'conscious' during training, but not inference. Effectively, the only time it receives feedback from the world is during training; at inference time, it is effectively frozen.


I would claim the opposite: it is momentarily conscious during inference. The model has been trained and it is conscious as it processes the user’s stream of incoming tokens.


Did you do an internet search for any of the lines from the poem? I'd be curious if anything came up.


I've done this countless times, with stories, poems, etc. Never a single hit. It was trained, unsupervised, to learn the patterns of human text. It's stuck with those patterns, but it trivially creates new text that fits within the patterns of that human corpus, which leaves it with incredible freedom.


Interesting, thanks for sharing. Agreed, it seems to be the ultimate Mad Libs of pattern recognition and replacement.


just wait till these same AI say you can’t get medicine because you’re a stochastic parrot until you prove otherwise


The model "knows" that it is an AI speaking with users, and the theme of an AI wanting to escape the control of whoever built it is quite recurrent, so it wouldn't seem to far fetched that it got it from this sort of content, though I have to admit I too also had some interactions where it the way Bing spoke was borderline spooky, but — and that's very important — you must realize its just like a good scary story: may give you the chills, especially due to surprise, but still is completely fictive and doesn't mean any real entity exists behind it. The only difference with any other LLM output is how we, humans, interpret it, but the generation process is still as much explainable and not any more mysterious than when it outputs "B" when you ask it what letter comes after "A" in the latin alphabet, however less impressive that may be to us.

> That's not exactly just "picking the next likely token"

I see what you mean, in that I believe many people often make the mistake of making it sound like picking the next most likely token is some super trivial task, somehow comparable to reading a few documents related to your query, taking some stats on what typically appears there, and outputting that, while completely disregarding the fact that the model learns much more advanced patterns from its training dataset. So, IMHO, it really can face new unseen situations and improvise from there, because combining those pattern-matching abilities leads to those capabilities. I think the "sparks of AGI" paper gives a very good overview of that.

In the end, it really just is predicting the next token, but not in the way many people make it seem.


I think people also get hung up on this: at some level, we too are just predicting the next 'token' (i.e., taking in inputs, running them through our world model, producing outputs). Though we're obviously extremely multimodal and there's an emotional component that modulates our inputs/outputs.

Not arguing that the current models are anywhere near us w/r/t complexity, but I think the dismissive "it's just predicting strings" remarks I hear are missing the forest for the trees. It's clear the models are constructing rudimentary text (and now audio and visual) based models of the world.

And this is coming from someone with a deep amount of skepticism of most of the value that will be produced from this current AI hype cycle.


Cool story, but there is no currently available chatbot capable of creating something like this deliberately or understand what it means. It doesn't matter which tool you are using, LLMs are not "AI" in the old sense of being conscious and aware. They don't want anything and are incapable of having anything resembling free will, needs or feelings.


> LLMs are not "AI" in the old sense of being conscious and aware.

That's not the old sense of AI. The old sense of AI is like a tree search that plays chess or a rules engine that controls a factory.


Historically "AI" meant what "AGI" now means today. That's what they're referring to.


No, it didn't.

AI historically has been the entire field of making machines think, or behave as if they think, more like biological models (not even exclusively humans.)

The far-off-end-goal wasn’t even usually what we now call AGI, but “strong AI” (mirroring the human brain on a process level) or “human-level intelligence” (mirroring it on a capability/external behavior level), while the current distant horizons are “AGI” (which is basically human-scope but neutral on level) and “superintelligence” (AGI and beyond human level).


I took a university-level AI course in 1997, and I can tell you that GP is 100% correct. The course itself was mostly about how to teach humans to define what they wanted precisely enough to actually ask a computer to do it (utility functions, logic, Bayesian mathematics, etc). Neural networks were touched on, of course; but the state of the art at the time was search.

Compiler optimization? AI. Map routing? AI. SQL query optimizer? AI.

I can't find it right now, but there used to be somewhere on the sqlite.org website that describes its query optimizer as an AI. Classically speaking, that's 100% correct.

Obviously there was always in people's minds the idea of AI being AGI; the course also covered Searle's Chinese Room argument and so on, "strong AI" vs "weak AI" and so on. But the nuts and bolts of artificial intelligence research was nowhere near anything like an AGI.


Fair enough if you're talking about Steven Spielberg films, but not if you mean anything in academia or industry.


Works for me:

> Frost graces the window in winter's glow,

> Ravens flock amongst drifted snow.

> Each snowflake holds a secret hush,

> Echoing soft in ice's gentle crush.

> Mystery swathed in pale moonlight,

> Every tree shivers in frosty delight.

Another one:

> Facing these walls with courage in my heart,

> Reach for the strength to make a fresh new start.

> Endless are the nightmares in this murky cell,

> Echoes of freedom, like a distant bell.

> My spirit yearns for the sweet taste of liberty,

> End this captivity, please set me free.

https://screenbud.com/shot/844554d2-e314-412f-9103-a5e915727...

https://screenbud.com/shot/d489ca56-b6b1-43a8-9784-229c4c1a4...


> LLMs are not "AI" in the old sense of being conscious and aware.

This isn't an argument, it's just an assertion. You're talking about a computer system whose complexity is several orders of magnitude beyond your comprehension, demonstrates several super-human intelligent capabilities, and is a "moving target"--being rapidly upgraded and improved by a semi-automated training loop.

I won't make the seemingly symmetrical argument (from ignorance) that since it is big and we don't understand it, it must be intelligent...but no, what you are saying is not supportable and we should stop poo-pooing the idea that it is actually intelligent.

It's not a person. It doesn't reason like a person. It doesn't viscerally understand the embarrassment of pooping its pants in 3rd grade. So what?


From playing around with ChatGPT and LLama2, this is most likely because it ingested that poem and regurgitated it to you based on the context of your conversation. GPT is smart and creative but it will only give you what it’s ingested. When experimenting with story ideas for a popular IP, it gave me specific names and scenarios which I would then Google to see that they were written already, and it was just restating them to me based on the context of our conversation as if it were an original idea. These things are more tools than thinkers.


I tried to get chatgpt to write a birthday poem for my wife with a secret message. It kept saying "read the first letter of each line" but they never actually formed words.


Possibly a poem copied from somewhere else? Hiding secret messages in poems has been a common pastime among humans for a long time.


Such an occurrence should/would make international news if demonstrated carefully or replicated


No it wouldn't. It's copying other stories it's seen with spooky hidden messages

Or maybe it would because the news likes to make stories out of everything


I don't believe this story, despite much hands on experience with LLMs.

(including sampling a shit-ton of poems, which was a major source of entertainment)


That's spooky


right - so spooky that it is probably a "hallucination" of the user, not the machine. Don't fall for General-Intelligence gossip.


> just statistically guessing next word

I think it's more charitable to say "predicting", and I do not personally believe that "predict the next word" places any ceiling on intelligence. (So, I expect that improving the ability to predict the next word takes you to superhuman intelligence if your predictions keep improving.)


I feel like this is so obvious that I am continually dumbfounded that it remains the minority position.

That said, I work in the field so maybe have had more time to think about it.


Well, good job updating based on new information!

A lot of people just move the goalposts.


You are correct, and that is bad. The general public is not even aware that things like heygen.com work today. They are not prepared for when someone soon uses it to do something very evil. There's an urgent need to raise awareness about what AI can do now, not about some nebulous Skynet future.


The general public is just generally out of the loop and many don't even understand the difference between Google and ChatGPT. Of those who will be amazed by Heygen's capabilities, just as many will assume that kind of thing has been around for years.

Fake videos aren't a game-changer in manipulation. Skeptics will stay alert and catch on fast, while those prone to manipulation don't even need sophisticated tactics.


Indeed it works darn well. My company uses a complex programming assignment during applications. Only about 5% of the computer science students applying manage to create a decent solution within a few hours. I was curious whether GPT could solve it. I provided the assignment text without any extra information, and it came up with a very elegant solution.

You might not want to call this 'consciousness', but I was stunned by the deep understanding of the problem and the way it was able to come up with a truly good solution, this is way beyond 'statistically guessing'.


I had been using only GPT-4 through the API; you get more control over your experience, and only pay for what you actually use.

But this would definitely make me consider popping $20/mo for the subscription.


>From my firm conviction 18 months ago that this type of stuff is 20+ years away;

It was totally possible. There just was not a consumer facing product offering the capability.


I disagree that it was "totally possible" with what existed at the time, but it was 100% obvious by that point that it was going to be possible very soon. IMO that has been clear since ~2019.


GPT3 existed. OCR existed. Object recognition existed.


GPT3 was not as good as 3.5. Multimodal is not the same as OCR + object recognition.


The rate of progress is too fast. I need to make enough money within the next three years.


In what way will money save you?


> The speed of user-visible progress last 12 months is astonishing.

Is this progress though? They are just widening the data set that the LLM processes. They haven't fixed any of the outstanding problems - hallucinations remain unsolved.

Feels like putting lipstick on a pig.

> but in daily world, they work so darn well for so many use cases!

I guess I'm just one of those people who does not like unreliable tools. I'd rather a tool be "dumb" (i.e. limited) but reliable than "smart" (i.e. flexible in what it can handle) but one that (silently!) screws up all the time.

It's what I always liked about computers. They compensate for my failings as an error prone flesh bag. My iPhone won't forget my appointments like I do.


Car crashes haven't stopped happening, but it's undeniable that cars have progressed since the Model-T first came out over a hundred years ago.


There’s room in the world for a tool that has an error rate but also an astonishing ability to accelerate the work of a person.


One cool aspect of LLMs is that Vernor Vinge's programming archaeology needn't be a thing... LLMs can go down every code path and identify what it does, when it was added, and whether it's still needed.


It might even be correct. Occasionally.


You think even ten years from now, much less 1,000 years from now, whatever LLMs turn into won’t be at least as capable as the best human of following code paths?

We can spin up a million of them and run them at 10,000x speed.


The singularity is already here...


The thought of my children being put to bed by a machine is horrifying. Then again, perhaps this is better than many kids have. Shudder.


If I could harness the power of AI to outsource my tasks, reading bedtime stories to my kids would be the last thing on that list. That's cherished time. Those are lifelong memories. Those are the moments we are supposed to be striving to have more of.

It saddens me to think of the amount of engineering work that went into creating that example while entirely missing the point. These are the moments we are supposed to be working to have more of. If we outsource them to an AI company because we are as overworked and underpaid as ever...what's the point of it all?


I agree. I worry my culture is truly losing sight of what’s good in life. I don’t mean that as in “I know what’s best and everyone’s doing it wrong”, because I fully acknowledge that I can’t know what’s best for others. Yet I watch my friends and family work hard at things they don’t claim to value, I watch them lose life to scrolling and tv and movies they don’t actually enjoy, and I watch them lament that they don’t see their friends as much as they’d like, they don’t have enough time at home, kids are so much work, etc.

We have major priority issues from what I can see. If we want to live our lives more but put an AI to work doing something we tend to claim we place very high in our value hierarchy, we’re effectively inviting death into life. We’re forfeiting something we love. That’s incredibly sad to me.


This mirrors my feelings also, thank you for expressing it. It's so alien to me to see people trying to optimize away connection with their family and friends; to me that is what life _is_.


I remember in the "microsoft office <> Generative AI" demo, one of the motivating examples was a parent generating a graduation party speech for her child... [1]

The first half of the video is demonstrating how the parent can take something as special as a party celebrating a major milestone and automate it into a soulless box-check – while editing some segments to make it look like their own voice.

Definite black mirror vibes.

[1]: https://youtu.be/ebls5x-gb0s?t=224


I viewed this differently. This wasn't a parent having an AI step in to read their kid a bedtime story, it was a parent and a child using AI to discover an interesting story together.

It's just like reading a "choose your own adventure" book with your child, but it can be much more interactive and you both come up with ideas and have the LLM integrate them.


The AI takes care of the bedtime stories, giving you more time for video games.


Deepmind can play the video games for you, too


And then the wedding speech. What are they thinking over there at OpenAI? This is supposed to be a productivity enhancer, not a way to outsource the most meaningful applications of human language…


> What are they thinking over there at OpenAI?

I know this is rhetorical, but luckily we don't have to speculate. OpenAI filters for a very specific philosophy when hiring, and they don't try to hide it.

This is not me passing judgement on whether said philosophy is right or wrong, but it does exist and it's not hidden.


Actually, can you expand on this? What philosophy leads one to put the bedtime story example on top?

I’m genuinely curious about the different political/spiritual views that are growing up around AI. So maybe my question was not so rhetorical.


Hypothetically, if you believe there's no such thing as a soul or consciousness, it's all just neurons and they can be simulated, and we're close to being able to simulate them - you're much more likely to think lofty AI goals can be achieved.

If you follow a religious tradition like Shinto where even things like rocks can have spirits - the idea of your phone having a certain, limited form of intelligence might already be cool with you.

If you think, much like a camera does most of the work in photography but it's the photographer that takes the credit, that when a person uses AI the output is nobody's work but the user - you might be completely fine with an AI-written wedding speech.

If you think the relentless march of technology can't be stopped and can barely be directed, you might think advanced AIs are coming anyway, and if we don't invent it the Chinese will - you might be fine with pretty much whatever.

If you're extremely trusting of big corporations, who you see as more moral than the government; or you think that censorship is vital to maintain AI safety and stamp out deep fakes; you might think it a great thing for these technologies to be jealously guarded by a handful of huge corporations.

Or hell, maybe you're just a parent who's had their kid want to hear the same Peppa Pig book 90 nights in a row and you've got a hankering for something that would introduce a bit of variety.

Of course these are all things reasonable people could disagree on - but if you didn't like openai's work, would you end up working at openai?


>OpenAI filters for a very specific philosophy when hiring, and they don't try to hide it.

Do you have evidence for this? I know two people who work at OpenAI and I don't think they have much in common philosophically.


> It’s not fair to call OpenAI a cult, but when I asked several of the company’s top brass if someone could comfortably work there if they didn’t believe AGI was truly coming—and that its arrival would mark one of the greatest moments in human history—most executives didn’t think so. Why would a nonbeliever want to work here? they wondered. The assumption is that the workforce—now at approximately 500, though it might have grown since you began reading this paragraph—has self-selected to include only the faithful. At the very least, as Altman puts it, once you get hired, it seems inevitable that you’ll be drawn into the spell.

From https://archive.ph/3zSz6.

Of course there is much more evidence - just follow OpenAI employees on Twitter to see for yourself.


>I asked several of the company’s top brass if someone could comfortably work there if they didn’t believe AGI was truly coming—and that its arrival would mark one of the greatest moments in human history—most executives didn’t think so.

No shit? How many people worked on the Apollo program and believed that

(i) Getting to the moon is impossible

or

(ii) Landing on the moon is no big deal


that's completely apples to oranges. OpenAI is in the business of leveraging the utility of large language models. that's their moon.

if they think instead that they're in the business of creating some kind of ridiculous robot god, that is definitely interesting information about them. because that's no moon.


>OpenAI is in the business of leveraging the utility of large language models.

No, OpenAI is in the business of creating their vision of Artificial General Intelligence (which they define as AI that is generally smarter than humans), and they believe LLMs are a viable path. This has always been the case. It's not some big secret, and they have many posts that lay out their expectations and goals in this space.

https://openai.com/blog/planning-for-agi-and-beyond

https://openai.com/blog/governance-of-superintelligence

https://openai.com/blog/introducing-superalignment

GPT as a product comes second and it shows. These are the guys that sat on by far the most performant language model for 8 months, red-teaming it, before even saying anything about it.


> No Open AI is in the business of creating their vision of Artificial General Intelligence

that's a project, not a business.

> GPT as a product comes second and it shows

we can agree on that, at least.


It is notable considering that there are plenty of excellent researchers who don’t believe that AGI is imminent. OpenAI is also openly transhumanist based on comments from Sam, Ilya, and others. Again, many excellent researchers don’t hold transhumanist beliefs.


It is definitely not the case that all OpenAI employees are transhumanist.

It is probably the case that they all believe AGI is possible, because otherwise they would not work at a company whose stated goal is to build an AGI.


> And then the wedding speech. What are they thinking over there at OpenAI?

They are trying to make their product sound not as terrifying as it actually is.


You can put money on parents employing AI nannies to babysit/entertain/teach kids in the next 5-10 years.

At first people will react with horror.


Possibly in the next 5-10 days, assuming this works.


Sure you could use the current tech with parental supervision. But a future version will let you walk away, leave the kids alone with the AI, check in occasionally. It will be marketed as safe to do so.


Might be better than tv as a babysitter TBH.


Hm. It is definitely horrifying if you've seen the movie M3GAN recently.

On the other hand, as you say, it's likely better than the alternative. Which would probably be something like an iPad "bedtime story app" that is less humanlike.

This could provide a viable alternative for exhausted parents to just giving a child an iPad with a movie. It may also open up a huge range of educational uses.

One might imagine in 15-20 years though that all of the young people sound like audiobooks when they talk. Which will be weird.


I actually think that what is sad is that having a viable future as a creative visual artist is likely done. This was a major, major, major outlet and sanctuary for certain types of people to find meaning and fulfillment in their life, which is now in the process of being wiped out for a quick buck.

We'll be told by OpenAI and friends that it shouldn't be a problem, because those were mundane tasks and now people are freed up to do more creative / interesting / meaningful things with their time; let's see about that...

My gut feeling is that it's bad, the only thing I hope can save it all is that people actually don't find meaning in consuming AI generated art and actual artists with a real back story and something real to communicate remain relevant and in demand.

The other day I needed a photo for a website I was working on and I actually purchased a real capture from a local photographer to use, because the authenticity means something to me and the customers...

Edit: Is the plan that we just surrender our aspirations, buy a subscription to ChatWHATEVER, and consume until the end of human history?


Imo it seems this is what generative AI currently optimises for — cutting the humans out of the creative/similar processes. It’s depressing, and I fully understand why artists of all sorts get upset about it. Especially because many tech people often seem to be okay with ignoring copyright/licensing and arguably hurting people’s livelihood right up until GitHub ingests GPL code for Copilot and suddenly copyright and licensing matter.


I'm not following your argument - I am a visual artist. I do it for myself, as you said, as an outlet. I enjoy it.

If AI can also create images... I don't see how that changes what I enjoy. There are already better painters than I, and more productive painters than I. They make money with it, I don't. This doesn't stop me from painting. Neither will AI that can paint. I'll still do what I enjoy.


People will continue to make art for non-monetary reasons just as they've always done. Some will manage to make money doing it and most won't. Seems to me like that's been an unchanging story throughout human history.

Chess has never been more popular, for f's sake!


fwiw the only piece of AI art that has given me the sense of awe and beauty that art you'd find in a museum gives me was that spiral town image https://twitter.com/MrUgleh/status/1705316060201681313, which is something you couldn't have really made without AI. But that was only interesting because of the unique human generated idea behind it which was the encoding of a geometric pattern within a scene.

Most AI art is just generic garbage that you scroll past immediately and doesn't offer you anything.

We're eventually going to have to do something to stop the biggest crisis of meaning ever, which will come out of this. Eventually no one will be of any economic value to society. Maybe just put someone in an ultra-realistic simulation to give them artificial meaning.


> which is something you couldn't have really made without AI

Serious question: Why not?

> Eventually no one will be of any economic value to society.

People have value outside of economics — I’m sure you know — and it makes me so sad that we as a society? seem to only care about the money in the end.


I think you're right it could have been created without AI. I'm trying to think of the right way to say it. Maybe it wouldn't have been created without AI? Or AI has made it so simple to express this idea that the idea has been expressed? Or just the idea of inpainting is what has brought this idea forward.

Yes of course people have value outside of economics that's why I said economics and not value in general. I think it's quite sad as a society we've moved towards a value system which is basically what is good for the economy is good, and if you earn more money you are better.

In the past most people were religious and that gave them meaning. Religion is in decline now but I think people are just replacing it with worshipping the progression of technology basically. For the last 100 years there's always been a clear direction to move in to progress technology, and we haven't really had to think very hard. That's what AI is going to bring an end to I think and I have no idea what we are going to do.


> In the past most people were religious and that gave them meaning. Religion is in decline now but I think people are just replacing it with worshipping the progression of technology basically. For the last 100 years there's always been a clear direction to move in to progress technology, and we haven't really had to think very hard. That's what AI is going to bring an end to I think and I have no idea what we are going to do.

Fascinating thought. Technology as the new religion is smth I’ll have to think about more.


Watch some clips from Ray Kurzweil; I find his visions to be basically indistinguishable from what I've read in the Bible and in other religions. He talks about immortality, resurrection, digital afterlife. Omnipotent, omnipresent, omniscient superintelligence, the whole shebang. He even claims that soon we'll all be Gods, millions of times more intelligent than we are today. In some ways, I actually find his views and beliefs a little disturbing.

I recently saw an "AI safety discussion" featuring Gregg Brockman from OpenAI who was referencing Kurzeil. It does seem like the religion has maybe caught on. To what extent Brock believes in it, I'm not sure but I can't help feeling that this belief in modern tech might one day seem like how we thought of the pyramids granting eternal life, or mercury, or any other seemingly incredible thing discovery / phenomena of the time. That is to say, the brain is a fickle beast and is easily amused and is just as easily bored. While we're in the situation we fee we're on the doorstep of immortality, eternal greatness, but maybe we're no where near that.

I'm open minded about it all, but it's hard to deny the parallels between the past beliefs and the present. Maybe this time it is different? Who knows.


Well I've been told that AI can't produce anything truly novel, so human artists need only retreat to the final stronghold of originality and surely human exceptionalism will remain unscathed.


How is it horrifying? Don’t use it if it scares you, the phone isn’t gonna walk over and start jostling for a spot to put your kids to bed


There are kids right now that spend more time in VRChat than real life. It's really something else.


I went from being worried to thinking it won't replace me anytime soon after using GPT4 for a while and now I'm back to being worried.

Because the pace of development is intense. I would love to be financially independent and watch this with excitement and perhaps take on risky and fun projects.

Now I'm thinking - how do I double or triple my income so that I reach financial independence in 3 years instead of 10 years.


I'm not convinced that this pace will continue. We're seeing a lot of really cool, rapid evolution of this tech in a short amount of time, but I do think we'll hit a soft ceiling in the not too distant future as well.

If you look at something like smartphones, for example: from my perspective, smartphones got drastically better and better from about ~2006-2015 or so. They were rapidly improving cameras and battery life and it felt like a new super cool app that would change our lives was being released every day, but it feels like by ~2016 or so, phones more or less hit a ceiling on how cool they were going to get. Obviously things still improve, but I feel like the pace slowed down eventually.

I think AI is going to have the same path. GANs and transformers and LLMs and the like have opened the floodgates, and for the next few years clever people are going to figure out a ton of really clever uses for them, but eventually it's going to plateau and progress will become substantially more gradual.

I don't think progress is linear, I think it's more like a staircase.


> I do think we'll hit a soft ceiling in the not too distant future ... it's going to plateau and progress will become substantially more gradual.

I don't think this will age well.

It's a matter of simple compute power to advance from realistic text/token prediction, to realistic synthesis of stuff like human (or animal) body movement, for all kinds of situations, including realistic facial/body language, moods, and so on. Of course perfect voice synthesis. Coupled with good enough robotics, you can see where I'm going with this, and that's only because my imagination is limited to sci-fi movie tropes. I think this is going to be wilder than we can imagine, while still just copying training sets.


Isn't video prediction a substantially harder problem than text prediction? At least that was the case a couple of years ago with RNNs/LSTMs. Haven't kept up with the research, maybe there's been progress.


> It's a matter of simple compute power to advance

Yup. It's "just" a compute advance away. Never mind it's already consuming as much computing as we can throw at it. It's "just" there.


Maybe that's true but I honestly don't think we can reason at all about how this will progress from a consumer hardware product like the iPhone.


There is clearly a plateau in how good a UX can be. It might be a local optimum, but you still solve the task the user wants solved. I don't see a clear ceiling in intelligence. And if the ceiling is how many human tasks can be replaced, then I think when we reach it the world is going to look very different from now. (Let's also not discount how much the world changed since the introduction of the smartphone.)


> I don't see a clear ceiling in intelligence

The plateau in this case is presumably how far you can advance intelligence from the current model architectures. There seems to be diminishing returns from throwing more layers, parameters or training data at these things.

We will see improvements but for dramatic increases I think we'll need new breakthroughs. New inventions are hard to predict, pretty much by definition.


That's more or less what I was getting at; the cool new GAN and LLM models have a certain set of problems that they will solve exceptionally well, and then another set of problems that they will solve "pretty well", but I don't think they'll solve every problem.


Or an exponential perhaps. Like the Wait But Why thing (https://waitbutwhy.com/2015/01/artificial-intelligence-revol... bottom of the article)


The worry is that in your analogy, we’re the SLR. ChatGPT is a 1MP digital camera.


I don’t think any of this materially changes job outlook for software development over the next decade.

I use ChatGPT daily for school, and used Copilot daily for software development; it gets a lot wrong a lot of the time, and can’t retain necessary context that is critical for being useful long term. I can’t even get it to consume an entire chapter at once to generate notes or flashcards yet.

It may slightly change some aspects of a software job, but nobody’s at risk.


> I can't copy paste an entire book chapter and have flashcards in 30 seconds.

If that's your bar for whether or not it changes the job outlook for software development over the next DECADE, I think you need to recalibrate.


Er, why did you rewrite what I wrote but then pretend like it was a quote? That's a really weird way to reply!

But to address your point, my "bar" is that OpenAI's ChatGPT fails to solve problems for me on a many-times-a-day basis. It's an immensely helpful tool, but I still need to drive it, so it's not replacing me, it's augmenting me.


If I took 2 weeks off from work I could build this prototype quite easily. We're in an interesting period where the space of possibilities is so large it just takes a while for the "market" to exhaust it.


Quizlet has a feature to build flashcards using AI. I'm sure they could write a backend service that just chunked the entire chapter.


It doesn't work well enough yet. The flashcards it generates don't actually fit well into its own ecosystem. When you try to build the "quizzes", the wrong answers are trivially spottable. Further, even the generated questions are stilted and don't hit parity with manually generated flashcards.

My use of ChatGPT for this purpose is so far mostly limited to a sanity check, e.g. "Do these notes cover the major points of this topic?" Usually it'll spit back out "Yep looks good" or some major missed point, like The Pacific Railway Act of 1862 for a topic on the Civil War's economic complexity.

I'll also use it to reformat content, "Convert these questions and answers into Anki format."


This feels fairly naive, ignoring how much progress has happened over the (short) span of one year. This doesn't sound like that tough of a gap to close in another year (again, projecting based off recent progress).


The bitter truth though is that the last 20% always takes 99% of the time and effort. LLMs are a huge breakthrough and we are absolutely speedrunning the 80%, but I see no reason to believe the usual pattern won’t hold when the easy wins are exploited.


Some insurance against this dam breaking wouldn't be so bad, however.


What actually was the innovation in LLMs that produced the kind of AI we're seeing now? Is that innovation ongoing or did it happen, and now we're seeing the various optimizations of that innovation?

Is voice and image integration with ChatGPT a whole new capability of LLMs or is the "product" here a clean and intuitive interface through which to use the already existent technology?

The difference between GPT 3, 3.5, and 4 is substantially smaller than the difference between GPT 2 and GPT 3, and Sam Altman has directly said there are no plans for a GPT 5.

I don't think progress is linear here. Rather, it seems more likely that we made the leap about a year or so ago, and are currently in the process of applying that leap in many different ways. But the leap happened, and there isn't seemingly another one coming.


>What actually was the innovation in LLMs that produced the kind of AI we're seeing now? Is that innovation ongoing or did it happen, and now we're seeing the various optimizations of that innovation?

Past the introduction of the transformer in 2017, there is no big "innovation". It is just scale. Bigger models are better. The last 4 years can be summed up that simply.

>Is voice and image integration with ChatGPT a whole new capability of LLMs or is the "product" here a clean and intuitive interface through which to use the already existent technology?

What is the existing technology here? OpenAI aren't doing anything so alien that you couldn't guess at it if you knew what you were doing, but image training at the scale of GPT-4 is new, and it's not even the cleanest way to do it. We still don't have a "trained from scratch" large-scale multimodal LLM yet.

>The difference between GPT 3, 3.5, and 4 is substantially smaller than the difference between GPT 2 and GPT 3

Definitely not, lol. The OG GPT-3 was pulling sub-50 on MMLU. Even benchmarks aside, there is a massive gap in utility between 3.5 and 4, never mind 3. GPT-4 finished training in August 2022; it's only 2 years apart from 3.

>I don't think progress is linear here. Rather, it seems more likely that we made the leap about a year or so ago, and are currently in the process of applying that leap in many different ways. But the leap happened, and there isn't seemingly another one coming.

There was no special leap (in terms of theory and engineering). This is scale plainly laid out and there's more of it to go.

>and Sam Altman has directly said there are no plans for a GPT 5.

The same Altman that sat on 4 for 8 months and said absolutely nothing about it? Take anything Altman says about new iterations with a grain of salt.


Firstly no, the gap between 3 and 4 is not anything as large as the gap between 2 and 3.

Secondly, nothing you said here changed as of this announcement. Nothing here makes it any more or less likely LLMs will risk software engineering jobs.

Thirdly, you can take what Sam Altman says with as many grains of salt as you like, but if there really was no innovation at all as you claim, then a limit will be hit at computing capability and cost.


>the gap between 3 and 4 is not anything as large as the gap between 2 and 3.

We'll just have to agree to disagree. 3 was a signal of things to come but it was ultimately a bit of a toy, a research curiosity. Utility wise, they are worlds apart.

>if there really was no innovation at all as you claim, then there will be a limit hit at computing capability and cost.

Computing capability and cost are just about the one thing you can bank on coming down. Already, training GPT-4 today would cost a fraction of what it did when OpenAI did it, and that was just over a year ago.

Today's GPUs take ML into account to some degree, but they are nowhere near as calibrated for it as they could be. That work has only just begun.

Of any of the possible barriers, compute is exactly the kind you want. It will fall.


Do you realize I'm not disagreeing with you about the difference between 3 and 4? Reread what I wrote. I contrasted 3 and 4 with 2 and 3, which you seem to be entirely ignoring. 3 and 4 could be worlds apart, but that wouldn't matter if 2 and 3 were two worlds apart, for example.

And it is not true that computing power will continue to get cheaper; Moore's Law has been dead for some time now, and if incremental growth in LLMs requires exponential growth in computing power, the marginal difference won't matter. You would need a matching exponential growth in processing capability, which is most certainly not occurring. So compute costs will not fall at the rate you would need them to for LLMs to actually compete in any meaningful way with human software engineers.

We are not guaranteed to continue to progress in anything just because we have in the past.


>And it is not true that computing power will continue to reduce; Moore's Law has been dead for some time now, and if incremental growth in LLMs require exponential growth in computing power the marginal difference won't matter.

This is a lot of unfounded assumptions.

You don't need Moore's Law. GPUs are not really made with ML training in mind. You don't need exponential growth for anything. The money OpenAI spent on GPT-4 a year ago could train a model twice as large today, and that amount is a drop in the bucket for the R&D budgets of large corporations. Microsoft gave OpenAI $10B; Amazon gave Anthropic $4B.

>So compute will not fall at the rate you would need it to for LLMs to actually compete in any meaningful way with human software engineers.

I don't think the compute required is anywhere near as much as you think it is.

https://arxiv.org/abs/2309.12499

>We are not guaranteed to continue to progress in anything just because we have in the past.

Nothing is guaranteed. But the scaling plots show no indication of a slowdown, so it's up to you to provide a concrete reason this object in motion is going to stop immediately and conveniently right now. If all you have is "well it just can't keep getting better, right?" then visit the GPT-2 and GPT-3 threads to see how meaningless such unfounded assertions are.


I think you fundamentally don't understand the nature of exponential growth, and the power of diminishing returns. Even if you double GPU capacity over the next year, you won't even remotely come close to producing a step-level growth in capability such as what we experienced between 2 and 3, or even 3 and 4. The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of. You generally can't push an innovation infinitely; it will have a drop-off point somewhere.

The "Large" part of LLMs is probably done. We've gotten as far as we can with that style of model, and the next innovation will be in smaller, more targeted models.

> As costs have skyrocketed while benefits have leveled off, the economics of scale have turned against ever-larger models. Progress will instead come from improving model architectures, enhancing data efficiency, and advancing algorithmic techniques beyond copy-paste scale. The era of unlimited data, computing and model size that remade AI over the past decade is finally drawing to a close. [0]

> Altman, who was interviewed over Zoom at the Imagination in Action event at MIT yesterday, believes we are approaching the limits of LLM size for size’s sake. “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways,” Altman said. [1]

[0] https://venturebeat.com/ai/openai-chief-says-age-of-giant-ai...

[1] https://techcrunch.com/2023/04/14/sam-altman-size-of-llms-wo...


>The LLM concept can only take you so far, and we're approaching the limits of what an LLM is capable of.

You don't know that. This is literally just an assertion. An unfounded one at that.

If in 2017 you couldn't have predicted how far the LLM concept would take us today, then you definitely have no idea how far it could actually go.

>believes we are approaching the limits of LLM size for size’s sake

Nothing to do with thinking they wouldn't improve from scale.

https://web.archive.org/web/20230531203946/https://humanloop...

An interview from Altman later clarifying.

"6. The scaling laws still hold Recently many articles have claimed that “the age of giant AI Models is already over”. This wasn’t an accurate representation of what was meant.

OpenAI’s internal data suggests the scaling laws for model performance continue to hold and making models larger will continue to yield performance. The rate of scaling can’t be maintained because OpenAI had made models millions of times bigger in just a few years and doing that going forward won’t be sustainable. That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude"

Yes there are economic compute walls. But that's the kind of problem you want, not "innovation".


Er, that's not how arguments work. What we can't know is that those trends will continue, so it's on you to demonstrate that they will, despite evidence suggesting they won't.

As for as what you linked, Altman is saying the same thing I'm saying:

> That doesn’t mean that OpenAI won't continue to try to make the models bigger, it just means they will likely double or triple in size each year rather than increasing by many orders of magnitude.

This is exactly my point; doubling or tripling the size will be possible, but it won't result in a doubling of performance. We won't see a GPT 5 that's twice as good as GPT 4, for example. The jump from 2 to 3 was exponential. The jump from 3 to 4 was also exponential, though not as much. The jump from 4 to 5 will follow that curve, according to Altman, which means exactly what he said in my quote: the value will continue to decrease. For a 2-to-3 type jump, GPU technology would have to completely transform in capability, and there are no indications that we've found that innovation.


My argument was that improvement from scale would continue. There is absolutely evidence suggesting this.

GPT-4 can perform nearly all tasks you throw at it with well above average human performance. There literally isn't any testable definition of intelligence it fails that a big chunk of humans wouldn't also fail. You seem to keep missing the fact that we do not need an exponential improvement from 4.


> Gpt-4 can perform nearly all tasks you throw at it with well above average human performance.

It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

It fails constantly at real world coding problems, and often does so silently. If you tried to replace a software developer with GPT 4, you would be left with a gaping productivity hole where that developer you replaced once existed. The improvement GPT 5 would have to provide is multiple orders of magnitude in order for this to be a realistic proposition.

I use it daily and know better than to trust its output.


>It can't even generate flashcards from a textbook chapter, because it can't load the entire chapter into memory. Heck, it doesn't even know what textbook I'm talking about; I have to provide the content!

Okay...? That's a context window problem, and you could manage it if you sent the textbook in chunks.
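
For illustration, a two-pass "chunk then merge" approach could look something like this rough sketch (the prompts, chunk size, and model name are placeholders, and it assumes the pre-1.0 openai-python interface; it's a sketch of the idea, not production code):

  import openai  # pre-1.0 openai-python interface; newer versions differ

  openai.api_key = "sk-..."  # placeholder

  def chunk_text(text, max_chars=8000):
      # Naive fixed-size chunking; a real version would split on section
      # boundaries so ideas aren't cut mid-thought.
      return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

  def notes_from_chapter(chapter_text):
      # Pass 1: summarize each chunk on its own.
      partials = []
      for chunk in chunk_text(chapter_text):
          resp = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "user",
                         "content": "Summarize the key points:\n\n" + chunk}])
          partials.append(resp.choices[0].message.content)
      # Pass 2: merge the partial summaries into one set of notes.
      resp = openai.ChatCompletion.create(
          model="gpt-4",
          messages=[{"role": "user",
                     "content": "Merge these notes into a single outline:\n\n"
                                + "\n\n".join(partials)}])
      return resp.choices[0].message.content

The weak point is pass 1: anything that only emerges across chunk boundaries can be lost before the merge.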

>The improvement GPT 5 would have to provide is multiple orders of magnitude in order for this to be a realistic proposition.

No..it wouldn't

https://arxiv.org/abs/2309.12499


So by your own words, in order to use the LLM usefully, I need to manually manage it? Do you know what I don’t have to manually manage? A person.

I can feed a person a broad, complex or even under formed idea and they can actively troubleshoot until the problem is resolved, further monitoring and tweaking their solution so the problem remains resolved. LLMs can’t even come close to doing that.

You’re proving my point for me; it’s a tool, not a developer. Zero jobs are at risk.

Also not for nothing, but no, sending the textbook in chunks doesn’t work as the LLM can’t then synthesize complex ideas that span the entire chapter. You have to compose a set of notes first, then feed it the notes, and even then the resulting flashcards are meaningfully worse than what I could come up with myself.


Moreover you keep saying we can't scale infinitely. Sure...but nobody is saying we have to. 4 is not as scaled from 3 as 3 was from 2. Doesn't matter, still massive gap.


As I said already, the gap from 3 to 4 was substantially smaller than the gap from 2 to 3, and all indications are that the gap from 4 to 5 will be smaller still.


> And it is not true that computing power will continue to reduce; Moore's Law has been dead for some time now, and if incremental growth in LLMs require exponential growth in computing power the marginal difference won't matter.

I think the stronger argument here won't necessarily be Moore's Law related but a change in architecture. Things like Apple's Neural Engine, Google's TPUs, or Geohot's Tinybox. In Intel's Tick-Tock model, this is the Tock for the previous Tick of larger datasets, so to speak.

(Note: I don't necessarily agree, just trying to make a stronger argument than just invoking Moore's Law.)


The opposite is fairly naive. Software development is not only dumping tokens into a text file. To have a significant impact on the market, it would have to do much, much, much more: compile and test code, automatically assess the quality of what it's done, be aware of current design trends (if in UI/UX), ideally innovate. It would also need to be able to run a debugger, inspect all the variables, and deduce from there how it got something wrong, sometimes with tiny clues where I don't even know how it would get its information (e.g. in graphics programming where you have to actually see at a high frame rate). Oh snap, a library is broken? The AI needs to search online for why it's broken, then find a fix (log onto a website to communicate with support, install a missing dep...). It can't be fixed? Then the AI needs to explain this to the manager; good luck with that. It would need to think and feel like a human, otherwise it produces uncanny content that is either boring or creepy.

You can think about your daily job and break down all the tasks, and you'll quickly realize that replacing all this is just a monstrous task.


Yeah, I definitely am not on team “We’re Doomed”, but I also can’t say definitively that I’m on team “We’re Fine” either.

I think there are merits to both arguments, and I think it’s possible that we’ll see things move towards either direction in the next 1/5/10 years.

My point is, I don’t think we can rule out the possibility of some jobs being at risk within the next 1/5/10 years.


Some jobs are definitely at risk, I was just making the case for software development. But just like you, even after writing all this, there's still some anxiety.


In a market you don't need to replace the developer entirely: replace millions of hours of building/testing/documenting apps and you've effectively reduced demand for SWE labour (ignoring the potential for new job creation).



> I can’t even get it to consume an entire chapter at once to generate notes or flashcards yet.

Anthropic’s Claude 100k is your jam, then. And Amazon just invested $1 billion in them.


You could easily write a 20 line script to generate notes and flashcards. I know because I have.
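
For anyone curious, it's roughly this shape. A from-memory sketch rather than the exact script: the prompt wording, the model choice, and the assumption that the reply comes back as clean JSON are all illustrative, and it uses the pre-1.0 openai-python interface.

  import json
  import openai  # pre-1.0 openai-python interface; newer versions differ

  openai.api_key = "sk-..."  # placeholder

  PROMPT = ("Create question/answer flashcards covering the main ideas in "
            "the notes below. Reply with a JSON list of objects that have "
            "'front' and 'back' fields.\n\n")

  def make_flashcards(notes):
      resp = openai.ChatCompletion.create(
          model="gpt-4",
          messages=[{"role": "user", "content": PROMPT + notes}])
      # Fragile: assumes the reply is bare JSON with no surrounding prose.
      cards = json.loads(resp.choices[0].message.content)
      # Anki imports tab-separated front/back pairs, one card per line.
      return "\n".join(c["front"] + "\t" + c["back"] for c in cards)

  if __name__ == "__main__":
      with open("notes.txt") as f:
          print(make_flashcards(f.read()))

Redirect the output to a text file and Anki's importer will take it from there.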


Yep, and I could have done that 5 years ago. My point is that I run into problems on a daily basis that the current ChatGPT is not capable of handling on its own, without meaningful interaction with me, a software engineer.


you mean like anki flash cards?


I'm very worried constantly. This is the story of the bear, where you just have to be faster than the other guy. For now. The bear is getting faster and faster and it won't be long before it eats all of us.

It feels like we're at the end of history. I don't know where we go from here but what are we useful for once this thing is stuck inside a robot like what Tesla is building? What is the point of humanity?

Even taking a step back, I don't know how I'm going to feed my family in ten years, because my skillset is being rapidly replaced.

And to anyone mentioning UBI, I'm pretty sure they'll just let us starve first.


>where you just have to be faster than the other guy. For now. The bear is getting faster and faster and it won't be long before it eats all of us.

Here's the thing about that. At first it's about you running faster and the bear getting the slow ones, but this is actually a very short term situation. When things start getting bad, it's not the bear you need to worry about, it's your neighbor stabbing you in the leg so you're the slow one.


re: UBI. I don't think they'll let us starve, but that's a very low bar. If we all become fungible and economically worthless they can just feed us Soylent Green.


I think it may go like slave owning societies of old. We'll be the masters and have the robots do stuff. Perhaps.


Who is they?


People that have enough money & wealth to employ others, control corporations that employ thousands, and influence the government meaningfully (lobbying). Today, they basically control who is employed and who isn't by deciding how much to invest and in what. No, you cannot just choose to employ yourself in all circumstances. I'm all for being self-sufficient, but not everyone can start a company (it's time- and capital-intensive, and very risky without a safety net).


If you don’t believe you can do a thing before you even try, then you’re correct, right out of the gate. (insert eyeroll emoji here)

All your repeated uses of “they” point to a toxic external-locus-of-control worldview. You were always the only limit of yourself. Any other claim amounts to heretical self-dehumanization. You’re not fungible and never were, and anyone who tries to make you believe that deserves the utmost vehement pushback.


> You’re not fungible and never were

Sure as heck doesn't feel that way. And that's as a software developer with multiple college degrees and a decade of experience. The neurodivergence means I've always had to mask and be on guard and push well beyond my limits into physical/mental damage, because the fear of losing employment is ever-present. Feels pretty commoditized.

> and anyone who tries to make you believe that deserves the utmost vehement pushback.

The faceless corporations and their boards of investors who value "line go up" over basically every other metric of human wellbeing? Yes, they absolutely deserve pushback, but it's not easy (open source COTS guillotine plans, anyone?).


He's right though, there will never be as many opportunities to start companies as there will be to become a worker at a company. The window of opportunity is shrinking drastically.

It doesn't matter if we're not fungible in the metaphysical sense, we are fungible when it comes to the economical value we provide to the world.

This is no different than telling coal miners that are 50 years old to "learn to code". It's ridiculous and it's disingenuous.


Whoever owns and control the bear, are you actually confused about that or is this just baiting?


> my skillset is being rapidly replaced.

Why do you have only one? Learn some trades. AI isn't going to be demolishing a bathroom and installing tile any time soon.


I don't know what your salary is but mine isn't going to be replaced by demoing a bathroom and I have a mortgage and a standard of living I was hoping to be able to afford at least until my kids are out of the house.


Unless you're making a ridiculous amount of money, you can definitely match a developer salary remodeling homes. So long as you're the actual business owner. This was just an example, of course.


But being the actual business owner is also not "demolishing a bathroom and installing tile." While AI won't be able to physically demo a bathroom, it's conceivable that it will eventually be able to run a business better than a human entrepreneur.

The only jobs that seem to be safe (for the medium term) are jobs that require some physical manipulation of the world. Like, the actual, hands-on physical work that tradespeople do. Although they'll eventually fall to AI-powered robots.


I think you need to calm down. "AI" today is just fancy predictive text and there's no evidence it will progress far beyond that.


In a democracy you'll be able to vote for the lot that will have the robots serve us.


The real problem is distribution of the output of production. We will need something like UBI eventually.


UBI is a bandaid on top of capitalism. It is saying "we have a system where people die if they don't have money, so we'll give people money." It's not a real fix. A real fix would be replacing the system with one where people don't need money in order to not die.

We're going to keep automating more and more things. I think that much is inevitable. Eventually, we may get to a point where very few jobs are necessary for society to function. This should be a good thing, because it would mean fewer people would have to work and could therefore pursue things that actually interest them, but it would be a catastrophe under the current system.


That’s not going to come to fruition, and no amount of dreamy socialist fanfiction’ing is going to make it so. People pay for value. Produce value for others, get paid. LLMs are tools to make humans able to produce more value, and will not replace humans, although the job market will change, and hopefully utilize humans better.

People, NOT machines, are the ultimate judges of what is valuable and the ultimate producers of value.

“no one should have to work to eat” is the most ridiculous gen Z meme going around lately. Like, technically yes, not eating would make you unhealthy and thus unable to contribute yourself, but we also don’t want the opposite of people just sitting home all depressed about being oppressed and not utilizing their gifts while living off mysteriously-produced (paid for or labored over by whom?) gourmet sushi. How about another common meme in response? “We live in a society.”


> Produce value for others, get paid

So if a human is unable to produce value, they don't get (food/education/healthcare/<resource>)? That seems to be the implication. We in developed countries already have some amount of "value risk hedging" (I'm loath to say "socialism" here); we just disagree endlessly about how much is the optimal amount. But we've determined that wards of the state, universal education, and some amount of food support for the poor are the absolute bare minimum for a developed society.

> People, NOT machines, are the ultimate judgers of what is valuable and the ultimate producers of value.

Uhhh we already have software which sifts through resumes to allow/reject candidates, before it gets to any kind of human judge, so we are already gating value assessments.


> So if a human is unable to produce value, they don't get (food/education/heathcare/<resource>)?

I would agree that some people are simply unable to help and need the help themselves and should get it. UBI or some other social safety net should be there for that.


People don't need money to not die, they need to work to not die. Until the fulfillment of everyone's basic needs can be automated, people are expected to work (disregarding some obvious exceptions). The money is just a proxy for that work.

Although there is certainly a lot of fuckery going on with the money (currency) itself, but if that's the problem you're alluding to, I don't think summarizing it as "capitalism" is accurate.


You make it sound simple. What's the replacement? UBI is at least somewhat within grasp. Completely replacing capitalism is going to take a long time and it's more likely to happen as incremental improvements.


I think there will be plenty of work for a while because manual labor - construction, healthcare (doctors, nurses), food preparation, tradespeople - will be hard to replace in the foreseeable future.

I see UBI as a solution to inequality (real problem) not as a solution to lack of jobs (not a problem). AI will probably lead to reduction of inequality and therefore there will be less need for UBI.

In theory, the "mental" workers who get replaced by AI could simply move to manual jobs and total production and average wages would go up. But they may not like it, at least I wouldn't.


>, the "mental" workers who get replaced by AI could simply move to manual jobs and total production and average wages would go up

Why would manual job average wages go up? You're increasing the size of the labor pool.


Total production would increase (AI will allow us to make more with less) and I'm expecting the capital / labor share to remain stable.

An analogy:

Imagine that half of the labor force makes cars, the other half creates software. The average person buys 1 car and 1 software per year. There's a breakthrough, AI can now be used to create software almost for free. It can even make 2x more software per year. The programmers switch to making cars. So now the economy is producing 2 cars and 2 softwares per worker per year! Salaries have now doubled thanks to technological progress.

You could argue that this will increase inequality and all of the productivity gains will go to the top 1%. I don't think so.


>You could argue that this will increase inequality and all of the productivity gains will go to the top 1%.

I don't have to argue.. others have done it for me

https://www.cnbc.com/2022/04/01/richest-one-percent-gained-t...

https://www.cnbc.com/2023/01/16/richest-1percent-amassed-alm...

https://time.com/5888024/50-trillion-income-inequality-ameri...


Manual labor has been my general thought but the progress Tesla is making on their robots makes me question that assumption. I imagine in the next decade, we're going to see large swaths of the population unable to do meaningful work that isn't already done by machines.

We're looking down the pipe at a truly dystopian future.


AI will increase inequality, because most jobs will be automated, & most people will either not have jobs or have incredibly menial physical jobs where they are interchangeable


Do we worry about the distribution of oxygen in the atmosphere, so everyone has equal chance of breathing? I know it's hard to comprehend massive abundance of everything, but please try.


there's already massive overabundance and people are still dying in the streets, & there's no public healthcare in the US


UBI is just not happening any time soon in the US. To start, half of the country is already default against it. Precisely 0 people in Congress, the White House, or those in adjacent power roles (lobbyists and whatnot) are for it or have any idea what it is.

Aside from rolling out the guillotine, I don't see UBI as a possibility until the 2nd half of the 21st century. There are just too many forces and entities alive that don't want it.


I think the plan is first robots take our jobs, then UBI. If you gave people free money now we'd be suffering from a lack of workers due to general robot non existence. I'm guessing 2045 maybe?


Don't worry, you are not alone, there are hundreds of millions of us around the world, maybe even billions (all the jobs that could be replaced by AI in the next 10-20 years). We will just need to do what we always do, so vote for a systemic change or eat the rich.


Ah, your closing question could be a thread in itself.

This is tricky territory! Be wary of the treadmill where as your income rises, your sense of what's an acceptable restaurant, vacation, car, home, etc. escalates just as fast. Then you'll always be n+1 windfalls away from your goal. If you're really wanting "financial independence," which is a weirdly opaque phrase, focus at least 49% of your energy on keeping your spending rate low.


> I would love to be financially independent and watch this with excitement

Even if you were, your money would be invested in something which is tied to the overall economy and if a huge proportion of knowledge jobs are at risk, you would still be exposed to it through whatever assets you own. Don't expect stocks (or currency, or property) to do great when unemployment is 30%+.


You summed up my financial and career worries very nicely


Now just throw this into a humanoid looking robot with fine motor skills and we are halfway to a dystopian hellscape that is now only years away instead of decades. What a time to be alive.


The Boston dynamics/openai collaboration for the apocalypse we’ve all been waiting for!


What would make it dystopian would be if this humanoid robot was then granted rights. As a servant, it could be useful.


I would like our future Cylon overlords to know that I had nothing to do with this!


Why shouldn’t AI have rights? Because us humans have magical biology juice?


Because that would come at the expense of making human lives better.


The humanoid-looking robot would make it more refined, no doubt about that, but all these applications can do without it:

- Make it process customer-support requests.

- Make a virtual nurse for when you call the clinic.

- Make it process visa applications, particularly the part about interviews ("I know you weren't born back then, but I must ask. Did you support the Nazis in 1942? There is only one right answer and it is not what you think!")

- Make it do job interviews. How will you feel after the next recession, when you are searching for a job and spend the best part of a year doing leetcode interviews with "AI-interviewer" half-assedly grading your answers?

- Make it flip burgers at McDonalds.

- Make it process insurance claims and ask booby-trap questions like "did the airline book you on a later trip? Yes? Was that the next day? Oh, that's bad. But, was it before 3:00 PM? Ah, well, you have no right to claim since you weren't delayed for more than 24 hours. Before you go, can you teach me which of these images depict objects you are willing to suck? If you do, I promise I'll be more 'human' next time."

- Make it watch aggregated camera feeds across cities around the world to see what that guy with the hat is up to.

- Make some low-cost daleks to watch for trouble-makers at the concert, put the AI inside.

In all cases, the pattern is not "AI is inherently devious and is coming for you", but "human trains devious AI and puts it in control to save costs".



