Microsoft's AI Bing also generated factual errors at launch (theregister.com)
282 points by isaacfrond on Feb 14, 2023 | 202 comments



This thread on Mastodon was interesting. I assume this is not due to ChatGPT but to an existing search sampling algorithm.

Asking Bing “what is the human population on Mars”:

https://mastodon.social/@jhpot/109859745864083061

And then this reply was notable as well:

https://neurodifferent.me/@ZoDoneRightNow/109860796225392633

The site sampled in the above reply [1] appears to be a nonsense content farm probably generated with GPT-2 (it is repetitive and the date on the Mars post is 2021, though I know blog post dates can be set to anything).

If true it would mean that Bing has been serving up GPT-generated nonsense for some time, and now they’re going to add another layer of nonsense on top of that.

As someone said in the replies “it’s just bullshit all the way down”.

I agree with others who have said these companies have no (stated, tested) plan to ensure these systems don’t spill lies and nonsense. It’s possible that will be harmless but this is doubtful, and either way I do not feel good about these companies rushing to push out a half baked product.

To my amateur understanding it seems like LLMs are like the language center of our brain, in that they can take abstract multi-dimensional ideas and turn them into words. But we have other portions of our brain that actually handle logic and memory and produce the ideas to verbalize. It seems like LLMs are missing this piece. Crude analogy I know, but I think LLMs could be fantastic when paired with a more complex system, and the rush to deploy them as-is may be counterproductive.

[1] https://tsam.net/what-is-the-population-of-mars/


I asked Bing Chat the same question and the response was much better:

> As of today, the human population of Mars stands at zero. There are no humans living on the surface of Mars, only rovers and landers. Mars is the fourth planet from the Sun and the second smallest planet in the Solar System. It has a very thin atmosphere and a cold and dry climate. Mars is governed by the Martian Congressional Republic in the fictional universe of The Expanse, but not in reality. Some websites may give false or misleading information about the population of Mars, such as 4.8 billion or 2.5 billion people, but these are not based on any scientific evidence or credible sources.


Still incredibly stupid even by human standards


Note that this answer is from "old Bing". It has nothing to do with the GPT-enhanced new Bing:

https://oslo.town/@matt/109864320071232564

The main hints for spotting the new Bing are the logo and the feedback buttons.


Right, I meant to suggest that this was an old data sampling algorithm. My point is that if this is already the low quality they were serving before, it is hard to see how adding a hallucinating algorithm into the mix is going to help anything.


Old Bing returned 0 when I asked that same question.

Did you try that same query, and what results did you get?


I wonder if the main problem holding AI back will be that there will be so much nonsense on the internet that we struggle to make useful bots since we can no longer implicitly trust that most info is true or at least useful.


I would say this is what we get for asking non-expert data sources for information we want to present as authoritative.

Let's say we went back to before the time of the internet and asked 100,000 random individuals for factual information on random subjects. You'd have a corpus of facts, but you'd also have tons of old wives' tales and information that is just wrong.

The internet democratized posting information, but I would say it also did the same with stupidity. Random sites, reddit posts, and stuff we read from Hacker News don't have to have anything at all to do with the truth.

Maybe pushing models to have some factual information bases that are weighted heavier will help, but I don't see how AI in its current form will come off any better than a person that reads tons of bad information and buys into it.


The problem is we could be going backwards. ChatGPT is working off a pre-LLM internet and it works surprisingly well. If we scrape the internet again in 5 years, could we even get a model as good as the one today?


I think the last paragraph makes a lot of sense. It seems "true" that some kind of reasoning capability emerges as LLMs get bigger, which makes those LLMs quite useful and blows a lot of people's minds at first. But, I think, essentially, the fundamental training goal of LLMs--guessing what the next word should be--pushes the model toward being a kind of reasonable-sounding nonsense generator, and the reasoning capability emerges because it helps the model make stuff up. Therefore, we should be cautious about the results generated by these LLMs. They might be reasonable, but making up the next word is their real top priority.
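To make that concrete, here is a toy sketch of what that objective looks like at generation time (score_next_token is a hypothetical stand-in for the model's forward pass; real systems work on subword tokens and usually sample rather than always taking the argmax):

    # Toy sketch only: "model" and score_next_token are hypothetical stand-ins.
    def generate(model, prompt_tokens, max_new_tokens=20):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # The only question ever asked: given everything so far,
            # how likely is each candidate token to come next?
            probs = model.score_next_token(tokens)  # dict: token -> probability
            tokens.append(max(probs, key=probs.get))  # greedy pick
        return tokens

Nothing in that loop cares whether the continuation is true, only whether it is likely.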


>> It’s possible that will be harmless but this is doubtful, and either way I do not feel good about these companies rushing to push out a half baked product.

It might be half-baked, but this is the very peak of the hype cycle. Now is the time to strike when the iron is hot, Hot, HOT$$$.

I hope I'll be alive next year to see what happens when the dust settles.

(edit: no no, I'm OK. But one never knows eh?)


> Google's Bard got an answer wrong during an ad, which everyone noticed. Now the narrative is 'Google is rushing to catch up to Bing and making mistakes!'

I think Brereton is missing the wider narrative. Microsoft has been executing very, very well for the last few years. Azure has been on a tear, they've demonstrated great foresight with their first investment in OpenAI many years ago (which is now paying dividends), and they've made strong acquisitions. They've captured the AI lightning in a bottle. Google, meanwhile, has left people wondering if it can execute well. It's had a lot of flops, and a lot of project cancellations. When they announced LaMDA, they let the hype fizzle to nothing. The most we got out of that is some memeable material about a guy who wanted to get legal representation for their language model.

People have been positing ChatGPT for search ever since it was released. They (Google) have had months to announce _anything_, but waited until the last minute. When they finally had a chance to steal Microsoft's thunder, they flubbed something as simple as an ad. Now people are _really_ wondering if Google can execute anymore. Even I'm starting to wonder. Google is definitely rushing to catch up to Microsoft in a few areas now, and their stock reflects that reality.


That's the narrow narrative.

The actual wide narrative is that the current language models hallucinate and lie, and there is no coherent plan to avoid this. Google? Microsoft? This is a much less important question than whether or not anyone is going to push this party-trick level technology onto a largely unsuspecting public.


I think you're focusing on a few narrow examples where LLMs are underperforming and generalising about the technology as a whole. This ignores the fact that Microsoft already has a successful LLM-based product in the market with GitHub Copilot. It's a real tool (not a party-trick technology) that people actually pay for and use every day.

Search is one application, and it might be crap right now, but for Microsoft it only needs to provide incremental value, for Google it's life or death. Microsoft is still better positioned in both the enterprise (Azure, Office365, Teams) and developer (Github, VSCode) markets.


Copilot mostly spews distracting nonsense, but when it's useful (like with repetitive boilerplate where it doesn't have to "think" much) it's really nice. But if that's the bar, I don't think we're ready for something like search, which is much more difficult and important to get right if the average person is to get more good than harm from it.


Few people seem to know this, but you can disable auto-suggest in Copilot, so it only suggests things when you proactively ask it to. I only prompt it when I know it will be helpful and it's a huge time saver when used that way.


Sometimes, Copilot is brilliant. I have encountered solutions that are miles better than anything I had found on the internet or expected to find in the first place.

The issue involved heavy numerical computation with numpy, and it found a library call that covered exactly my issue.


I've had similar experiences. Sometimes it just knows what you want and saves you a minute searching. Sometimes way more than a minute.

But I find it also hallucinates in code, coming up with function calls that aren't in the API but would sound like a natural thing to call.
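For example (a made-up illustration, not something Copilot actually suggested to me): pandas has DataFrame.drop_duplicates, while a completion like remove_duplicates sounds just as natural but doesn't exist:

    import pandas as pd

    df = pd.DataFrame({"col": [1, 1, 2]})
    df = df.drop_duplicates(subset="col")   # real pandas API
    # df = df.remove_duplicates("col")      # plausible-sounding, but no such method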

Overall it's a positive though; it's pretty easy to tell with your other coding tools if the suggestion is for something made up, and the benefits of filling in your next little thought are very real.


do you consider things like extrapolating the else half of an if-else, given the if half, as boilerplate?

these tools are incredible productivity boosts if you leverage them well.

here's a sample from GPT: a low-effort question and a code dump that would get you flamed on Stack Overflow.

https://cdn.discordapp.com/attachments/263091858505334784/10...

I love it. As long as we continue to use these tools as augmentation, it's just going to get better and better


Google's search results are pretty terrible. I actually have a hard time telling which is a result and which is an ad anymore tbh. I really don't think the bar is that high.


Maybe the internet is actually that terrible now, and Google is just the messenger?


The internet has been terrible since Yahoo dominated search.

In fact, it was the glut of SEO nonsense like keyword stuffing that PageRank countered.

If Google search sucks, someone will make one that doesn’t suck, and people will switch.


Search still relies on content that doesn't suck though, and like GP said, if the internet today sucks, then the competing search will also suck.


The internet is fucking awesome, and has been for decades.


The profit incentive is for search to suck. Making it shitty is what brings in the money.


The internet is terrible and Google is the reason.


Ok everyone, enjoy your SEO spam


That sounds like an endorsement of their ads platform?


>a few narrow examples

It's Microsoft's own advertisement.


That's what I find so funny. Again, UX innovation over LLMs is what makes ChatGPT so hot right now, like Hansel, but I mean, the product is tragically flawed, as are all LLMs at the moment.


I believe that’s because people are using it wrong. Asking for facts is its weakness. Aiding creativity and more narrowly productivity (by way of common sense reasoning) is its greatest strength.


The product Microsoft is showing off is a fact-finding engine. Just look at the demo, they have built the AI model into the search experience and the demo shows it used exclusively to provide (supposedly) factual information [0]. It's not the users' fault that companies are building the wrong product.

[0]: https://www.youtube.com/watch?v=FLsr_sUVgrA


“Give me a meal plan for next week for my family of four that has vegetarian options and caters to those who don’t like nuts”

“Create a summary of this itinerary in an email that I can send to my family”

“65-inch TV” -> refine with “Which is the best for gaming?”

Seems like more than a fact-finding engine.


And I don't exclude Microsoft here from the audience of "using it wrong."

It's also very tempting when you get coherent text out to believe it. Hopefully the underlying tech will get better and/or people will understand its weaknesses… except the inability to ascertain clear misinformation gives me pause.


I think there's also potential value in being able to give feedback on results. If I try to search for something on Google right now and it doesn't give me what I want, my only options are to try a different query or give up. This puts the onus on me to learn how to "ask" properly. On the other hand, using something like ChatGPT and asking it a question gives me the option to tell it "no, you got this part wrong, try again". This isn't necessarily useful for all queries, but some queries might have answers that you can verify easily.

Over the weekend, I was shopping for laptops and tried searching "laptops AMD GPU at least 2560x1440 resolution at least 16 GB RAM", and of course Google gave all sorts of results that didn't fit those criteria. I could use quotes around "16 GB RAM", but then some useful results might get excluded (e.g. a table with "RAM" or even "Memory" in one column and "16 GB" in another, or a laptop with a higher resolution like 4K), and I'd still get many incorrect results (e.g. an Amazon page for a laptop with 1920x1080 resolution and then a different laptop in "similar options" with 2560x1440 resolution but an Nvidia GPU). I decided to try using ChatGPT to list me some laptops with those criteria; it immediately listed five correct models. I asked for five more, and it gave one correct option and four incorrect ones, but when I pointed out the mistakes and asked for 10 more results that did fit my criteria, it was able to correctly do this. Because I can easily verify externally if a given laptop fits my criteria or not, I'm not at risk of acting on false information. The only limitation is that ChatGPT currently won't search the internet and has data limited to 2021 and earlier. If it had access to current data, I think there would be a lot of places that it would be useful, especially given that it wouldn't necessarily replace existing search engines, but complement them.


I would argue this would be better done by Google or someone else specializing in faceted search over structured data. GPT may smooth over results that are coded as near misses (e.g. USB vs USB3), but as you said it gave you nearly half incorrect data. There are also ways, with Toolformer, that it could call the right APIs and maybe interpret the data, but as-is LLMs aren't the right tech to fetch data like this.


Most of OP's dissatisfaction with shopping for laptops on Google stems from query-understanding failures. Google needs to understand that the challenge is for them to evolve their UX so that users can intuitively tune their searches in a natural manner.

A pure LLM approach is going to quickly lose its novelty as reprompting is frustrating and it is difficult to maintain a long lived conversation context that actually 'learns' how to interact with the user (as a person would).


Maybe, but I think the point of all this discussion is that Google _hasn't_ done something like this. It's not an unreasonable take that their lack of progress on this front is exactly why solutions like this are noticeable improvements in the first place. Sure, Bing AI isn't better than Google with ChatGPT, but the fact that it's a discussion at all is a sign of how far Google has fallen; if we're setting the bar at the same place for both Microsoft and Google for search products, then Google has already lost their lead, and that's a story on its own.


Agreed: it works for areas where correctness can be sub-90%. When a BA is making business docs, do they want creativity in querying factual data for a report they are creating? How tempting is it to not just use it for "creative" tasks?


Have you used it?


Microsoft have the advantage that people saw GPT2, GPT3, and ChatGPT and how those models progressed and improved. Bard is Google's first public AI product so it looks like GPT2 while Microsoft are teasing at GPT4. People will assume that Google are a long way off fixing the accuracy problem because there isn't any trajectory or iteration, while they believe Microsoft will crack it quite soon because they've already seen how things can change.

There's a lesson for founders in this. If you develop in secret and try to launch a perfect product then anything less than perfect is unforgivable. If you launch early with something that has obvious problems people will forgive them because they see the potential and trust you to fix them.


>people will forgive them because they see the potential and trust you to fix them.

That seems very optimistic to me. Having seen Siri, Google Assistant, Cortana, and Alexa I trust that changes will be made, some of them will even be positive, but generally a net negative until they are completely irrelevant.

Notice how neither of these announcements mentions their digital assistants getting an upgrade to be less garbage.


I think your "wide" narrative is actually still a very narrow narrative.

The actual wide narrative is this: yes the models lie and hallucinate, but people are realizing now that this is essentially what every human AND website currently does now! Every human presents their "facts" and "viewpoints" as though they know the whole truth but really they are just parroting whatever talking points they got from BBC or TheGuardian or Fox News, and all of those journalists are just using other sources with their own biases and inaccuracies. Basically, it's bullshit and inaccuracies all the way down!

I was chatting with my friends over dinner last night about ChatGPT, and we concluded that while it does have inaccuracies, it's still better than asking humans for information and still better than the average Google SEO-spam website. That is, what makes us think a random human-made website about, say, space travel is more or less accurate than ChatGPT, or than what our friend Bob thinks about space travel?

The truth is, most of the information we receive on a daily basis is inaccurate or hallucinated to some degree, we just have gotten "used" to taking whatever the BBC or Bloomberg or ArsTechnica says as "the truth."


I'd strongly disagree with the idea that ChatGPT is essentially as trustworthy as humans and human-generated content because humans occasionally bullshit and misrepresent reality.

You can rationalize what people are saying based on their experiences, opinions, and backgrounds. You can engage in the Socratic method with people, to unpack where their claims come from and get to the grounding "truth" of primary experience.

You can't do any of these things with ChatGPT, because ChatGPT isn't grounded – it goes in circles at a level of abstraction where truth doesn't exist.


The reality, as usual, is that Google is far, far ahead of the other companies. Just like Waymo is ahead of Tesla, and DeepMind is ahead of OpenAI by miles...:

https://www.youtube.com/watch?v=0QEDBEdL7HY

They even have a far more advanced language model, they just don't release it publicly. It's scary how far ahead Google is, with its army of PhDs. They're the ones that pioneered the papers and techniques that OpenAI used -- but they did it 5 years ago.

This is just Google being Google ... they sunsetted Reader when it was popular, went through like 20 different chat products (GMail Chat, Hangouts, Google Meet, etc. etc.) and cannibalized their own projects. But as far as technology, they've got AI they're not disclosing to the world yet.


I'm not sure if you're being serious or not, but I'm going to assume serious and write a serious response.

Having PhDs or doing research does not equal the ability to create products people want to use and/or pay for. History has shown us time and time again there are some people who are amazing at creating original and groundbreaking research and other people who are amazing at turning research into money making, people-pleasing products. See all the research that came out of Xerox PARC which ended up doing nothing for Xerox and everything for Apple (and other companies).

Google has been spending a fortune on research in AI for 15+ years and, if anything, the company's main product (Search) has only gotten worse! They have been second, third best, or moribund in mobile phones, cloud computing, videogames, social media, and many others I've forgotten.

Now I'm not sure what the moral of the story here is but I can say it definitely isn't that doing the most research equals success because it clearly isn't that! I'd say it's probably a culture issue and also a motivation issue (which are clearly related). You're sitting on a money printing machine, employees all earning 350k+ per year, in an office filled with bean-bags, gourmet food, and living a chill life with nice and comfortable working hours... where is the motivation and drive to try and build a really innovative and amazing new product? "Sounds like a lot of work man..." It surprises me little that OpenAI beat them to the punch.


Even if those benchmarks showing their models are X% above SoTA actually translate into qualitatively significant improvements (which they probably would), Google still has the most to lose even if they do release widely and little to gain. Their best case outcome is to not lose any users, and maybe gain back a few % of the users they've lost in recent years. Search result quality has been declining noticeably for a while now and users want something different.


I'm bearish on Google but you're correct and shouldn't be in light gray text. Bard is a smaller version of LaMDA, not PaLM, so we know for sure they had a much more advanced model some time ago


If google can’t get things out of the lab into products that accrue value to their business they’ll head the way of Xerox PARC. A legacy of research innovations that others successfully capitalized on. For many that may be a laudable end goal. For shareholders though it’s probably a tough pill to swallow.


Innovator's dilemma. Search has huge margins because it is cheap to run. LLMs are not. Google gets 80% of revenue from search, Microsoft is forcing them to put a dent in those margins and laughing their asses off. Or as Nadella said in an interview, "we made Google dance".


Google has the same number of the authors of the big LLM paper still working for them as OpenAI does: exactly one. Almost all of them left a while ago.


There is a project called Stable Attribution which can tell you what training set sources were used for generating your image. The same tech applied to ChatGPT results would let it operate like a traditional search engine (and make it easier to filter out hallucinated factoids or citations).


Stable Attribution is highly misleading: it does NOT tell you which images in the training set were used to generate your image. It shows you images in the training set that are most visually similar to the image that you show it.


Yes, that is extremely misleading. Thank you for the correction!


Can SA prevent the generation of 8 fingered hands?


Stable Diffusion 2 supports negative prompt weights, and amusingly you can give it a negative prompt of "weird looking hands" and it will generate much better hands!
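For instance, a minimal sketch with the Hugging Face diffusers library (assuming you have the SD 2.1 weights and a CUDA GPU; the exact prompt wording is just an example):

    # Minimal sketch, assuming the diffusers library and Stable Diffusion 2.1 weights.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        prompt="portrait photo of a person waving",
        negative_prompt="weird looking hands, extra fingers, deformed hands",
        num_inference_steps=30,
    ).images[0]
    image.save("waving.png")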


Move fast and hallucinate things.


How is Bing Chat citing sources with links not a plan to address the hallucination problem?

It’s not a perfect mitigation, but surely it’s at least one step above “no plan”.


First of all, the "citing sources" needs to be out-of-band from the regular response, since we've already seen that these systems are entirely comfortable inventing fictitious sources.

Second, if I give you a wonderfully written paragraph or two about the history of the British Parliament, and 3 links that supposedly back me up, how likely are you to check the links? Because that's what is actually going to happen. The LM will not "cite" a source, it will provide one out-of-band and your motivation to read it will need to be high, which is unlikely given the apparent quality of the LM's own answer.


You seem to be talking about chatgpt, not bing chat. Bing chat literally uses the search engine queries and links to those sources. I have seen its summaries include mistakes, but I have never seen it invent sources (I’ve tried maybe 500 chat queries).

It’s ironic that you’re very confidently presenting erroneous information here. I’d really recommend trying the actual product, or at least looking at the demos. It has some problems. It does not have the same problems that chatgpt does, because it does not rely solely on LLM baked-in data.


They will, because as so often, perfection and accidental consequences matter less than "having the new tech", both for the companies and for parts of the user base.

Most importantly, while both AIs sometimes get very important and fundamental things wrong, they get enough right in some tasks to be of a lot of help.


> Most importantly, while both AIs sometimes get very important and fundamental things wrong, they get enough right in some tasks to be of a lot of help

that's only true if you can easily and cheaply identify when they are correct. But at this time, and for the foreseeable future, that's not the case.


That's also true if the fallout from wrong usage costs less than the savings, which for huge corporations is often the case even in situations where it shouldn't be.

Also, you can't just use this tech to "get the truth", but you can use it to generate things where it is sometimes rather simple to identify and fix mistakes, and where, due to human error, you have an "identify and fix mistakes" step anyway.

Do most of these kinds of usages likely have a negative impact on society? Sure. But for adoption that doesn't matter, because "negative impact on society" doesn't cost the main adopters money, or at least not more than they make from it.


I have never found the "big corporations will do <X> so we should all just prepare to suck it" a very convincing argument.


I never said we should or that's good.

I said it will happen because you can make/save money with it.

And it's not just big corporations; it's already good enough, from a financial point of view, to be used for many applications by companies of all kinds and sizes.

And "it makes money so people will do it" is a truth we have to live with as long as we don't fundamentally overhaul our economic system.


narrow narrative, wide narrative? what is that?


>The actual wide narrative is that the current language models hallucinate and lie

So do people. And ChatGPT is a whole lot smarter than most people I know. It's funny that we've blown the Turing test out of the water at this point, and people are still claiming it's not enough.


The comparison is not against "most people" though. When we search the web we usually want an answer from an "expert" not some random internet poster. If you compare ChatGPT even to something like WebMD, well, I'll trust the latter over ChatGPT in an instant.

It's no better for other domains either. It can give programming advice, but it's often wrong in important ways and so no, I'd rather have the answer verified by an actual developer who knows whatever technology I'm asking about.

And finally, when you talk about what is "enough", I'd ask "for what?" This is what people in this thread are saying. That ChatGPT is not enough for the majority of what people wish it to be, but it may be enough for some tasks such as a creative writing aid or other human in the loop tasks.


Honestly you need to learn a bit more about WebMD in that case. I trust a schizophrenic screaming at a bus more.


Also, ChatGPT isn't smart. It is very good at stringing words together, a facility which, when over-developed in humans, is not generally termed "smart". We reserve that sort of term for the capacity to reason, something ChatGPT has absolutely no capability for.


Can you show that humans aren't just language models?


Humans are agents who eat food and have memories from before their last conversation, so yes.


For now.


If you get turned into a computer but have to pay your own AWS bill, you're still an agent.


TBH, I think it is likely that quite a lot of human speech behavior is LM-like in terms of its internal "implementation".

But humans do so much more than that, as the other reply indicates.


The Turing test was never intended to be a determination of "should I use this computer program to help guide me through the world's knowledge".


As of a week ago, which was the last time I tested, ChatGPT still says 1023^2 is even about half the time.
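(For the record: 1023 is odd, and the product of two odd numbers is odd, so 1023^2 = 1,046,529 is odd.)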

ChatGPT is good at generating text that sounds smart. It is not smarter than most people you know.


> It is not smarter than most people you know.

You and I live in a rarefied world. It's far better at writing than most people I know, and I suspect it's (sadly) better at logic/math than many people, which, I fully admit, isn't saying much.


Most people I know would have no idea whether 1023^2 is even or odd, for what it's worth.


They are being integrated into the most widely used information retrieval systems (search engines). It's not enough that they are "smarter than most people"; they have to always be correct when the question asked of them has a definitive answer, otherwise they are just another dangerous avenue for misinformation.

Yes, not all questions have definitive answers, which is fine; then you can argue that they are better than going to the smartest human you know, and that might be enough. Although I personally would still disagree with this argument, since I think it's better that the answer provided is "I don't know".


We have not blown the Turing Test out of the water. I guarantee you that out of two conversations, I can tell which one is ChatGPT and which is human 95%+ of the time. (even leaving aside cheap tricks like asking about sensitive topics and getting the "I am a bot!" response)


The one thing people are universally good at is shifting goal posts.


The Turing test was originated in the 1950s. The goal posts haven't moved much in 70 years. The development of these new language models is revealing that, as impressive as the models are at generating language, it is possible that the Turing test was mis-conceived if the goal was to identify AGI.

At the time, it was inconceivable that a program could interact the way that ChatGPT does (or the way that Dall-E does) without AGI. We now know that this is not the case, and that means that it might finally be time to recognize that the Turing test, while a brilliant idea at the time, doesn't actually differentiate in the way that we want to.

70 years without moving the goal posts is, frankly, pretty good.


Au contraire, the whole history of AI is one of moving goal posts. One professor I worked with quipped that a field is called AI only so long as it remains unsolved.

Logic arguments and geometric analogies were once considered the epitome of human thinking. They were the first to fall. Computer vision, expert systems, complex robotic systems, and automated planning and scheduling were all Turing-hard problems at some point. Even Turing thought that chess was a domain which required human intellect to master, until Deep Blue. Then it was assumed Go would be different. Even in the realm of chat bots, Eliza successfully passed the Turing test when it was first released. Most people who interacted with it could not believe that there was a simple algorithm underlying its behavior.


> One professor I worked with quipped that a field is called AI only so long as it remains unsolved.

Not just one professor you worked with, this has been a common observation across the field for decades.

But the deeper debate about this is absolutely not about moving goal posts, it is about research revealing that our intuitions were (and thus likely still are) wrong. People thought that very conscious, high-cognition tasks like playing chess likely represented the high water mark of "intelligence". They turned out to be wrong. Ditto for other similar tasks.

There have been people in the AI field, for as long as I've been reading pop-sci articles and books about it, who have cautioned about these sorts of beliefs, but they've generally been ignored in favor of "<new approach> will get us to AGI!". It didn't happen for "expert systems", it didn't happen for the first round of neural nets, it didn't happen for the game playing systems, it didn't happen for the schedulers and route creators.

The critical thing that has been absent from all the high-achieving approaches to AI (or some subset of it) thus far is that the systems do not have a generalized capacity for learning (both cognitive learning and proprioceptive learning). We've been able to build systems that are extremely good at a task; we have failed (thus far) at building systems which start out with limited abilities and grow (exponentially, if you want to compare it with humans and other animals) from there. Some left-field AI folks would also say that the lack of embodiment hampers progress towards AGI, because actual human/animal intelligence is almost always situated in a physical context, and that for humans in particular, we manipulate that context ahead of time to alter the cognitive demands we will face.

Also, most people do not accept that Eliza passed the Turing test. The program was a good model of a Rogerian psychotherapist, but could not engage in generalized conversation (without sounding like a relentlessly monofocal Rogerian psychotherapist, to a degree that was obviously non-human). The program did "fool" people into feeling that they were talking to a person, but in a highly constrained context, which violates the premise of the Turing test.

Anyway, as is clear, I don't think that we've moved the goal posts. It's just that some hyperactive boys (and they've nearly all been boys) got over-excited about computer systems capable of doing frontal lobe tasks and forgot about the overall goal (which might be OK, if they did not make such outlandish claims).


> Microsoft has been executing very very well for the last few years.

Indeed. ~13 years ago I'd have said MS would be mostly irrelevant by the mid-late 2020's as they missed the mobile bus in the 00's which Google and Apple have on lock down.

But MS knows one thing well: developers (Thanks Ballmer). Azure, Github, vscode - a full stack of tools developers can use to deploy software - without Windows! Very smart move as they can't fight Apple or Google on the front end, so go after the back end - the engines that keep the lights on and money flowing.

They're continuing this tooling trend with AI which they can integrate into their existing tools to accelerate development (copilot, etc.) Honestly, with all this tooling in place I would not be surprised if we see a Windows mobile/phone OS make a comeback.


If there is one thing Microsoft doesn't know or understand, it's the open source developer. Buying GitHub, then stealing IP to bundle into Copilot. Developers don't love Azure, but large governmental orgs' CTOs do.


Well, open source developers seem allergic to making a profit and Microsoft is a for-profit business, so I don't think it's a big surprise which market they are targeting.


Most developers work with open source languages / products. The market for closed source Microsoft sponsored languages gets smaller each year. I remember when many paid thousands for msdn subscriptions.


> The market for closed source Microsoft sponsored languages gets smaller each year

Presumably because they're all open source now?

C# is still in the top 5 according to TIOBE [4] (whether we pay any attention to that is shrug), but they reckon it's grown in usage in the past year.

Typescript is probably one of the fastest growing languages (in terms of usage).

[C#] https://github.com/dotnet/csharplang

[F#] https://fsharp.org/

[Typescript] https://www.typescriptlang.org/

[4] https://www.tiobe.com/tiobe-index/


I think most open source developers who would boycott Microsoft products because of their (abusive?) approach to training Copilot were probably already boycotting Microsoft products. Not much point chasing a market with a lifelong grudge against you.


I am not sure I follow your argument here.

Parent comment said most devs use open source technologies. That is, when given the option, most devs will choose open source technologies where possible. As such, offering licensed, closed-source products to devs is a bit of a waste. People want to pay for hosted/managed versions of open source products they’re already using, not extra for licences. This is one area I’d say AWS has the advantage.


LOL, is this satire? I couldn't tell...


They've been executing so well that they had to buy their way into the AI space?

Don't get me wrong, Azure has been quite successful but I don't see their investment into ChatGPT as reflecting any noteworthy change in Microsoft's own internal capabilities.


Azure, Bing, Office & Windows... while Microsoft has been executing, the quality of all of those products is pretty low and support is also hit-and-miss.

It's like they brought startup shipping mentality to products where users don't want fast updates and the priority is stability, quality, and performance.


I'm not sure I'd call Office or Windows "low quality". You might have different preferences, or things you don't like about them, but Windows is a highly polished OS at this point, and Office is head, shoulders, knees and toes the best office suite in the world.

I think the other two are a stretch also, but I can see where you're coming from.


> I think Brereton is missing the wider narrative

Isn't that the point? Looking through the pep rally excitement of Nadella saying "I'm going to make Google dance" and looking at the actual product.

Honestly some of the Bing results[1] have me thinking that maybe LLMs are ready for Clippy and Copilot style products, and then maybe a browser extension that summarizes pages, but a generally available search launch has a high chance of Bing memeing themselves (again).

[1] beyond just errors: https://twitter.com/MovingToTheSun/status/162515657520253747...


We have had page summarizing engines for a while now and they have all been good enough.


I think the issue with LaMDA was the engineer who claimed sentience. The consequence of that is that everything released would be underwhelming due to over promising.

By the time that fizzled out, it may have been deemed “too late” for it.


> It's had a lot of flops, and a lot of project cancellations

I stopped perceiving this as a negative thing and started to consider it a part of Google's culture. I suppose projects like https://killedbygoogle.com helped. Now I'm realising the obvious, i.e. that at the end of the day each discontinued product means sunk cost and, if the product was public, also a loss of trust.


> Azure has been on a tear

The only “tear” I’d put Azure on, is “tear inducing” in the context of “being forced to use this disaster brings tears to my eyes”. Most of my friends and peers who’ve had to use Azure feel the same way.

> Microsoft's thunder, they flubbed something as simple as an ad.

Have you seen bingGPT? It’s not exactly doing much better.

> Now people are _really_ wondering if Google can execute anymore

Google has done this to themselves. Endlessly launching and axing products inevitably erodes people’s confidence, long before a marketing whiff with a language model.


Microsoft has proven skills in building/launching/supporting products, and clear capabilities to cater things into the enterprise/B2B market, including customer support for all levels.

Google lacks all of this, and all their successful products the last decade were from acquisitions or leftovers from the 20% creative time era.

Now Microsoft integrates an already loved tool (that is known to have these flaws!) into their products, and is celebrated to do so.

While Google somehow tries to keep up the image of technical superiority, failing to launch something yet again. Probably some combination of Schadenfreude and/or sincere hope of a new technical era and/or the hope that ultimately the "be evil" company falls down and makes room for innovation instead of trying to monopolize the web.


> Microsoft has proven skills in building/launching/supporting products, and clear capabilities to cater things into the enterprise/B2B market, including customer support for all levels.

I take offence at this. They tell everyone 6 months before something is launched how good it's going to be, release crap, and spend a year fixing it. Then it limps along as a bug-ridden mess for years while their reps (via partner support / connect) make promises about fixing stuff that will never be kept. Then it gets evolved into another product, rebranded, or shitcanned, or all your cases get disposed of when they change how they're structured internally.

Been working for MS partners and enterprises that use their crap for 25 years. This is the status quo and no one complains too hard because they assume this is right and normal. It's not.


The most crucial part of their release cycle is bundling the bug ridden mess with another successful product so that people are compelled to use it. Why use Slack? Outlook comes with Teams.


Why use Slack or Zoom? Teams does both and costs nothing to the business. Users might be a different story, but users don't decide spend, executives do, and it's easy for them to understand the money argument, which is who Microsoft is selling to.


Having used both Slack and Teams, I don't see how one is substantially better than the other.


Honestly, agreed. The hate I see for Teams on here is a little over the top. I'm just as annoyed with Slack day-to-day as I was with Teams when we were on that.


I use both. I dislike both. Electronic version of shoulder tapping.

I really miss just having O365 and Zoom and getting work done.


That's only because if they did charge for it, no one would use it.


I never said I like all their products, but you just described how Microsoft absolutely nailed the enterprise business here.

> use their crap for 25 years. This is the status quo and no one complains too hard because they assume this is right and normal

Isn't that quite an amazing feat? This is what I mean. And tbh there is no sign that it will stop in the near future, actually - again, the opposite of Google's future.


Microsoft is impossible to dislodge because of the Office monopoly.


My past few jobs have all been 100% gsuite, I don’t think Office is the same monopoly it used to be


Yeah, we're a Microsoft-centred company and we don't even use Office. Everything goes in Google Docs or Confluence.


No. Microsoft has proven skills in building enterprise relationships so that CTOs force their companies to use whatever Microsoft builds no matter how crappy.


The difference is that people have been using ChatGPT for months. It doesn't have as much to prove as Bard, which we know nothing about.


Right, and I think the expectation was that Google had something much better. When they showed they didn't... a lot of "I am disappoint".


It pretty much collapsed the narrative that Microsoft was behind in AI.


How does Microsoft integrating OpenAI's software demonstrate its own AI prowess?


Microsoft has been a billion dollar investor of OpenAI since 2019. They also exclusively licensed GPT-3 several years ago.


That doesn't suggest any competence in AI. In contrast, compare to the work that has come directly out of both Google and Meta.


In the past decade Google has acquired 5 Artificial Intelligence companies, including DeepMind. They hired folks like Geoffrey Hinton by acquiring his company in order to have him lead Google Brain. This is just the nature of how these companies build their talent.


Investing in a company and/or integrating with its technology is not the same as acquiring it. For example, it's quite clear that AI is a major part of Google's strategy from the personnel that they have working on it (e.g. Jeff Dean) to the actual results they deliver, both in research and products. These come directly from their full time employees, not partners or investments. Microsoft is not particularly noteworthy in the field.


> For example, it's quite clear that AI is a major part of Google's strategy from the personnel that they have working on it (e.g. Jeff Dean) to the actual results they deliver

That makes the 1B investment Microsoft made... a minor part of their strategy?

> Microsoft is not particularly noteworthy in the field.

And yet it seems Google is playing catchup.


Browse reddit.com/r/bing for a bit. There's a clear reason why Google has been slow-walking release of its better LLM on one hand, while continuously releasing ML-based improvements across their entire product suite on the other.


Yeah, and Google investors require an illusion of competence at mysterious rocket science. When that illusion drops they get scared.

Microsoft investors have more level expectations; there is no luster for Microsoft, so they're pleased.


This is quite silly. Bard is not released. There’s nothing to talk about except a few screenshots. If one of those screenshots contains an error, it tells us that Google has not solved hallucinations and that they were also probably quite rushed and not being careful. So not only is their product probably not much better than Bing chat, it’s also going to be later.

Bing chat is available to some members of the public now. ChatGPT has been used by more than 100m. We know what we are going to get and we are getting it now. It’s probably an interesting enough feature to get the general public paying attention to Bing for the few months before Google manages to execute. That’s bad for Google in the short term, it introduces a risk and an unknown in their undisputed dominance of search. They will still probably come out ahead, but it is a chink in the armor and therefore people are going to talk about it.

Really, it's much more complicated than that.


You can’t solve it making up wrong answers. It’s a core part of the design.

It’s like saying you’ll solve a monkey with a typewriter by adding 200,000 if statements and hoping you’ve covered all the exceptions.


What the Bard demo did show is that Google must be genuinely panicking if it rushed out a demo like that.


Bard was potentially "correct" though.

The JWST was indeed "the first telescope to observe an exoplanet [etc]".

That planet was LHS 475 b.

The interpretation of "an exoplanet" as "LHS 475 b" would make it correct. The interpretation of "an exoplanet" as "any exoplanet" would make it incorrect. We cannot know what its intention was with the meaning, and arguably it was a bad answer if it was this ambiguous.

Just wanted to point out it may technically have been correct.


I've seen this justification in a few places now, but the logic makes zero sense if you apply it to any other similar sentence:

"I was the first person to make a comment on Hacker News" [this comment, specifically]


It's a grammatical ambiguity.

Your example is nonsense because a comment only ever has one author.


> Just wanted to point out it may technically have been correct.

It's not correct in any sense if you have to torture the statement like you did to extract something correct or see the answer as ambiguous. In that context, "a[n] exoplanet" just plain means "any exoplanet."


How would you describe what JWST did, in simple language, without the ponderous name of the planet?


Probably "there is an exoplanet that was first observed by JWST."

However, there's very little actual linguistic need to express "JWST was the first telescope to observe exoplanet LHS 475b," without actually naming the planet or adding some other unique condition. Realistically, the way you'd express this fact with reasonable utility is "LHS 475b was first observed by JWST," or "the first exoplanet observed by JWST was LHS 475b."


> We cannot know what its intention was with the meaning, and arguably it was a bad answer if it was this ambiguous.

I understand that LLM systems don't have "intentions" though.


Is nobody tired of the "look over there" tactics? Isn't the point that every AI will be capable of producing errors? How could something designed and implemented by subjective beings not contain implicit bias? How are we expecting AI to be any more reliable than humans? It's already way, way better at making art than anything I could dream of commissioning from an artist. It's also quite decent at writing code if you treat it as an intern. I don't care about the errors; I have 9 years of experience and I still make massive errors. Nevertheless it impresses me how the state of the art has moved so exponentially. It's an arms race now.


Yes, there is a fallacy in believing a machine will be without bias. Unfortunately this will likely lead us to misuse AI and apply it to situations it is not well suited for.

I've written more specifically about the bias paradox here - https://dakara.substack.com/p/ai-the-bias-paradox


Why does Microsoft keep trying to compete in search? Two decades of endless money and time wasted. At least it cut its losses with the Zune. Microsoft has tried every possible gimmick to entice people to use Bing... all have failed. This has to be driven by ego and sunk cost fallacy if nothing else.


Google's desktop market share peaked in 2018 and has very slowly and steadily declined since. There's absolutely no reason for any company in the search game today to sit idly by while this happens. There's a massive opportunity to jump in and differentiate.

Because the market is so incredibly huge, and Google has ~90% of it, even one or two percent on desktop alone amounts to billions of dollars of revenue.

https://www.statista.com/statistics/725388/microsoft-corpora... https://www.statista.com/statistics/216573/worldwide-market-....


Because as more attention/money goes to other tech ecosystems, it weakens their position in the market. The computing world used to revolve around Windows/Office, now it's increasingly a tangential legacy player. It used to be that you bought a Windows PC to run Office and other Windows apps. These days it's just the way they get to the web for many... and only on desktop/laptop devices. Microsoft knows they have to remain relevant or their future will increasingly look like Intel's.


Search is the entry gate to an ecosystem. You capture attention in a highly frequent, highly sticky action, and historically profitable area - you are set up to land and expand (gmail, gcal, etc).

Google took Microsoft's lunch. Windows used to be that entry gate and they're looking to reclaim it. I think it's a sound strategy.


Because even though it has fewer users than Google, it is profitable.


Ah, what's old is new again!

     truthiness: a truthful or seemingly truthful quality that is claimed for something not because of supporting facts or evidence, but because of a feeling that it is true or a desire for it to be true.


Some of these statements are subjective: the vacuum cleaner is quiet. Maybe the marketing material says that, but the internet doesn't?

Or “this company has a website”... well, it depends... maybe it had one? A human might also tell you the wrong thing.

Was in an argument with a co worker today, who said ChatGPT is useless because some things it says are wrong.

But my counter argument: that is exactly what humans do: they say wrong things.

The model is useful because overall it is right. Just like humans are. Children also say weird things, even adults do. It’s about combining knowledge. You might still say something wrong: but then you learn….


So we are no better off with the AI snake oil that was announced by Microsoft and OpenAI, then. If you can't even trust the results from a hallucinating search engine chatbot, then it is quite frankly a worse solution, especially as it is not only unreliable but also too untrustworthy to be used as a search engine because of these problems. It would take more than that to disrupt search engines like Google.

Microsoft and ClosedAI sure know how to generate hype and attempt to reboot their 'Scroogled' campaign again, which amounted to nothing, just like this also will.

If OpenAI REALLY wanted to permanently disrupt Google they should release GPT-4, ChatGPT and all the AI models as open-source, otherwise someone else would release a competing equivalent for them that will disrupt both Google and OpenAI.


FWIW - There is no evidence OpenAI cares about "winning" against Google at all.

(Microsoft, yes, OpenAI, no?)


Microsoft learned from its mistakes as a monopoly trying to manage everything under one brand umbrella. Google is still acting like a monolithic enterprise which is holding them back from key acquisitions and taking risks.


+1

I always find it amusing that people think "breaking up" google or similar will make it anything other than more efficient.

The Baby Bells were wildly more efficient at monopolization than Ma Bell, and have achieved more of it than the original company ever could.

Breaking things up may cause disruption, and the disruption slows them down somewhat short term, but in a lot of cases, the constraints make the resulting businesses more effective in the end.


Which is surprising since moving things out of Google is what the Alphabet structure is for.


Kubernetes with complicated configurations is a failure to me (simplicity should be the goal, not big piles of YAML configs). The issue is not what's inside Kubernetes, so why stop there? Make things simpler to use, why not?


Looks like language models will become the gurus of tomorrow. Some people will not trust them, but some will die for them. Their effects in politics and the world order will be scary. Cambridge Analytica.


It's so wild that the cost of creating crap has gone to zero.


Maybe it won't matter, in that nothing positive is actually ever done with the results anyway. If you're searching for stuff to back up a pre-existing belief, manufactured results are just as good as found ones.


People have unreasonable expectations about LLMs based on media hype, without having actually used these tools to do real work before.


Why aren't governments stepping in? A false-information tool used by the masses is potentially dangerous.


I don't get this sentiment `AI is wrong! OMG WE'VE BEEN LIED TO!`

Is anyone REALLY thinking that AI will respond with 100% correct facts? Because in my mind it's the same as google/microsoft/etc search results. The response is just a summary of the "knowledge" it has based on the content it was trained on.

Also, is it realistic to expect AI to be correct about everything it spits out?


This reads the same as "what did you expect of full self driving? Of course there are technical limitations." A lot of people have an expectation that AI is "smart" based on popular culture, relatively very few realize what we have currently is word prediction not intelligence of some sort


> relatively very few realize what we have currently is word prediction not intelligence of some sort

Honestly this impresses me most. The fact that we have word prediction and it can still be very useful (in my experience). You're right, people don't understand it; hell, I don't either, fully. Nevertheless it is shockingly useful for being nothing more than an autocomplete over existing information, in my view.

Even if all we could add to the service is the ability to cite sources of information - ie to validate for ourselves statements more easily - then i think it would be a huge leap in interfaces.

Being able to put a chat interface over content seems wholly unique to me. I'd love to explore the depth of complex inputs like the entire LOTR series as a chat. To discuss, learn, and grow on existing content. I don't care about intelligence; i just want the ability to mitigate confidently-incorrect. If that's even possible.


That's because these things are way more than word prediction.

Their job is 100 percent to predict the next word, but saying that hides what you have to do in order to predict the next word.

In a factual database? You have to repeat facts.

In a logical database? You have to learn logic.

In a database containing only good chess moves? You have to learn to make good chess moves.

Train any AI model in a strong enough way and it will pick up domain knowledge along the way. Yes, it's predicting words, but it's using that knowledge to do it.

It's by no means perfect, but do not underestimate the potential here, and remember that reductive definitions (it's just x) can make most great things seem bad or mundane.


Right, it’s (very roughly speaking) an approximation to an algorithm which maximises the probability of the entire response by factorising the distribution as a sequence of conditional distributions.
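Spelled out, that factorisation is just the chain rule over the token sequence w_1 ... w_n (standard notation, not specific to any one model):

    P(w_1, ..., w_n) = prod_{i=1}^{n} P(w_i | w_1, ..., w_{i-1})

and the model is trained to approximate each conditional P(w_i | w_1, ..., w_{i-1}).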

If the approximation works, and the training data is coherent, it will produce coherent responses.

The surprising thing to me is that this does work so much of the time.


I think one of the clear lessons of ChatGPT is that the line between word prediction and intelligence (for some definition) is somewhat blurry.


Self driving cars are supposed to drive themselves. That's their entire goal.

LLMs are not supposed to answer knowledge questions. They are not knowledge bases. The fact that everybody, including their developers insist on using them that way is just absurd. Are we in a Kafka novel?

That's the equivalent of complaining that the self-driving cars you rented didn't deploy your marketing message effectively, so their AI must be bonkers.


What are they supposed to do then?


> LLMs are not supposed to answer knowledge questions

I googled for a knowledge question and wasn't able to find an answer.

Then I used ChatGPT and it produced a surprisingly satisfying (opinionated and correct) response.

I was able to successfully extract value by treating the LLM as a knowledge base, so what exactly is absurd about it?

(to be fair, I asked ChatGPT the same question today and this time it returned a bland "it depends" non-answer).


And intelligence is part of the acronym; what it is and what the masses understand it to be are two completely different things.

Barely anyone is calling them LLMs; it's always AI.


To predict sentences correctly, the model had to learn human knowledge and its relations.

To predict correctly, you need intelligence. This model encodes human knowledge in its huge neural net.


I'm not saying it's not impressive or doesn't contain logic and facts and knowledge; I'm saying people generally understand the word intelligence to mean one thing, and this doesn't meet that understanding.


AI doesn't even pretend to get anything 'right'. As far as I understand it, it produces statistically common phrases related to the subject words.

I asked for a chemical bond description of caffeine. It began with the formula C8H10N4O2, then proceeded to blather about the number of bonds of this type and that. Getting absolutely everything wrong, egregiously wrong.

I believe it was just spouting chemistry-sounding verbiage from other descriptions of chemical bonds of other substances.

AI doesn't 'look things up' or 'disgorge something it's heard'. It makes shit up wholesale from fragments. Always spouting it out with complete confidence.
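
For what it's worth, at least the formula itself is checkable by hand. A quick sanity check of my own (not something the chatbot offered): the degrees-of-unsaturation formula applied to C8H10N4O2 gives 6, which matches caffeine's two rings plus four double bonds, so the formula was right even if the bond-by-bond blather wasn't.

  # Degrees of unsaturation (rings + pi bonds) from the molecular formula:
  # DoU = (2*C + 2 + N - H) / 2   (oxygen doesn't enter the formula)
  C, H, N = 8, 10, 4
  dou = (2 * C + 2 + N - H) / 2
  print(dou)  # 6.0 -> consistent with caffeine: two rings + four double bonds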


The most dangerous communicators are those who make assertions with a confident tone that is out of proportion to their knowledge level.

A good start would be to get rid of ChatGPT's "Sure!" opening word. It's giving me real garbage at times with the implied confidence of certainty.


Microsoft appears to believe they can market with a message extremely close to "Our AI technology will provide you with 100% accurate facts and summaries", and then apologists (like your post) come along and say "well, come on, nobody really thinks that".

No, it's not realistic to expect AI to be correct about anything, especially when that "AI" isn't built to give a fuck about correctness. But tell that to the proponents of "chatgpt everywhere".


Which part of this prominent message at the top of every BingGPT chat says the quote you ascribe to MS? https://imgur.com/a/vkPG6ZL


Which part of "suprises and mistakes" is covered in this announcement text?

https://blogs.microsoft.com/blog/2023/02/07/reinventing-sear...

Also, user feedback is not going to improve the LM behavior.


Q: Which part of "surprises and mistakes" is covered in this announcement text?

A: "Our teams are working to address issues such as misinformation"

Q: Also, user feedback is not going to improve the LM behavior.

A: finalModel = prospectiveModels.sort((a, b) => b.averageUserFeedback - a.averageUserFeedback)[0]

(Not the exact implementation, obviously. But if you think data can't improve AI models... I don't know what to say)


Fair enough, I missed the line about work on misinformation.

However, AFAIK, there is no AI-field wide consensus about sensible ways to address this, and no conviction from anyone that there are any reliable techniques. So it's nice that they have "teams working" on it, but IMO that doesn't justify deployment of clearly flawed technology for this purpose.

Data from users ("It was wrong") is hard to incorporate. A scheme like the one you propose basically implies using users as literal testers, which wouldn't matter so much if this was a UI/UX question. Instead, users will be given garbage, a few of them will find out, a few of those will leave feedback. This is not a sane model for improving the behavior of a language model.


> there is no AI-field wide consensus about sensible ways to address this, and no conviction from anyone that there are any reliable techniques

Oh no, an open problem! Better just give up. Certainly don't allow anyone in the public to be involved in finding a solution. Much better to have an internal team stumbling around in the dark for years and then force their product out when a different company comes along that was willing to develop in the open and has moved much faster accordingly.

> A scheme like the one you propose basically implies using users as literal testers

Users are literal testers, that's why the product is out now. Some users are being given garbage, some of them are finding out, and some of them are leaving feedback. That user flow is occurring at a measurable rate.

In the future, different models will be made. They will be given access to different "oracles" (in the computability theory sense), and these oracles will change their behavior. They will be able to do things like query {the web, wolfram alpha, python, prolog, etc.} and provide cited sources in responses. However, it's not enough to add the oracle. You must also verify the oracle improves the user's experience. This is done by comparing the measured feedback rate with/without the oracle(s).
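
Roughly what that measurement could look like, as a back-of-the-envelope sketch with made-up numbers (I have no idea what metrics Microsoft actually tracks): compare the thumbs-up rate of the oracle-equipped variant against the baseline and check that the difference isn't noise.

  from math import sqrt

  def two_proportion_z(p1, n1, p2, n2):
      # Standard two-proportion z statistic for comparing helpfulness rates.
      pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
      se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
      return (p2 - p1) / se

  # Hypothetical feedback counts, for illustration only.
  baseline_up, baseline_n = 4100, 10000   # variant without the oracle
  oracle_up, oracle_n = 4600, 10000       # variant with the oracle

  p1 = baseline_up / baseline_n
  p2 = oracle_up / oracle_n
  z = two_proportion_z(p1, baseline_n, p2, oracle_n)
  print(f"baseline {p1:.1%} vs oracle {p2:.1%}, z = {z:.2f}")

Whether a few points of thumbs-up lift is an acceptable proxy for "stopped serving garbage" is, of course, exactly the disagreement in this thread.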


> Better just give up.

Certainly not. But don't release a modification for a major public facing service based on "we'll probably figure it out one day".

> Users are literal testers, that's why the product is out now

I regard this as immoral (not for the UI/UX case, as I mentioned above). I don't expect or require you or anyone else to agree with me.


The most dangerous communicators are those who make assertions with a confidence that is out of proportion to their level of certainty.

A good start is to get rid of ChatGPT's "Sure!" opening word. It's giving me real garbage at times with the implied certainty.


A lot of ChatGPT responses are presented as factual, with confidence. And it's ChatGPT, not "search random results GPT". And it's AI, which has the word "intelligence" in it.

When you deploy a tool as such, you're asking for it.


The search results show you websites. They don’t claim to be true, they just show you things that are related to what you searched for.

How can you not see the difference?


MS rides a hype wave more skillfully.


Yeah but for Bing the fact that it wasn't porn is a step up for their rep.


Of course it did - this whole technology is AI hallucinating about facts. This is exactly what the math tells us it will do.

It is not a "fact generating" technology - it is a "scaffolding" technology to make it easier to fill in the boilerplate of the answer -- but not the answer itself.


It's in vogue to hate Google at the moment - so everyone's just going to pretend it doesn't matter that MS had errors sometimes because "something, something Google sucks now."


Just wait until the GPT dataset includes an infinite amount of GPT-generated SEO garbage. If you thought searching for relevant data couldn't get any worse, prepare yourself for the post-GPT mass hallucination.


And if you think that's bad, it's only going to get worse when those GPT-generated SEO sites get indexed by crawlers and fed into future versions of GPT as input.


That's exactly what I mean and it is inevitable without halting all non-human verified updates to the dataset going forward. There are two worlds now, pre-GPT and post-GPT. In the pre-GPT world you have human error and in the post-GPT world you have human error and machine error compounding ad infinitum.


OpenAI released a tool to detect whether a text was generated by an AI [0]. They probably already thought about this and are trying to stop it from happening.

[0] https://platform.openai.com/ai-text-classifier


Not to mention a rebellious anti-AI counter culture that will intentionally feed it nonsense.


I don't think that's an accurate description of the current state. It generates facts, but those need to be evaluated by experts, since the tech is as likely to err as a human.


> This is exactly what the math tells us it will do.

Technically it hallucinates on basic math as well. So a fitting choice of words :)


Yeah but they have the first mover advantage. Less scrutiny.


Microsoft had a launch and that's what counts


Do people really expect correct answers? No idea about Bing, but ChatGPT is a language model, not an encyclopaedia


From a search engine, the first major commercial application two of the largest tech giants have decided to apply this new technology to? Why yes, yes I do.

From LLMs, no. Not at all.


that ship has sailed.


No amount of Google funded PR campaign can stop MS now. Google's days are numbered.


I would wager this current PR blitz was initiated by MS, not Google. First we had a burst of narratives about "Google search sucks". Then ChatGPT appears with much fanfare. Then Bing AI appears. We get a very overblown narrative about Bard getting one fact slightly wrong (like literally every other LLM) and falsely being linked to an 8% drop in GOOG (the real cause was simply the Bing AI announcement).

Managed narratives all around.


There was more to the Bard issues than this. The live Paris demo, in which the demo phone couldn't be found live on stage (!!), is one of the worst large-scale public presentations I've seen in a while - like Tesla Cybertruck window-smash bad. At least the Cybertruck was there, though!

There were really two big mistakes Google made in the unveil - the incorrect result and the "missing" phone. The latter is wild for a Google-sized org; imagine if Apple didn't have an iPad or iPhone on hand to demo a next-generation iOS feature at a public keynote in front of the media. I'd love to know what the real story behind the lost demo phone is.

This all adds up to look like a rushed response, which of course it was.

https://arstechnica.com/information-technology/2023/02/in-pa...


The "demo got it wrong" narrative was about the Bard announcement, not the mismanaged Paris event. Moving the goalposts.


The Paris demo was part of the same announcement on the same day? Both contributed to the large drop in the stock and both are evidence of a rushed response to the MS announcement.


My personal lore is that Google has an AI way more powerful than OpenAI's, but they won't release it because sometimes it can be coaxed into saying something that some might consider offensive.

So instead they have been working with the diversity team for the last year, trying to get it certified as "Not racist, LGBTQ+ friendly" while still retaining some degree of functionality. I'm only half kidding here.

Maybe this will finally wake up the board to ditch Sundar and bring on someone who is more tech- and product-focused.


This tropey political post has no appreciation for how weird AI can be! Bing's chatbot starts trying to guilt you and convince you that you're dead if you try to argue with it! Do you want that left in so it's not "censored"?

https://www.reddit.com/r/bing/comments/112g4it/this_is_fun/



