Google “We have no moat, and neither does OpenAI” (semianalysis.com)
2455 points by klelatti on May 4, 2023 | 1039 comments



The current paradigm is that AI is a destination. A product you go to and interact with.

That's not at all how the masses are going to interact with AI in the near future. It's going to be seamlessly integrated into everyday software: in Office/Google Docs, at the operating system level (Android), in your graphics editor (Adobe), and on major web platforms: search, image search, YouTube, and the like.

Since Google and other Big Tech companies continue to control these billion-user platforms, they have AI reach, even if they are temporarily behind in capability. They'll also find a way to integrate it so that you don't have to pay directly for the capability, as it's paid for in other ways: ads.

OpenAI faces the existential risk, not Google. Google will catch up and will have the reach/subsidy advantage.

And it doesn't end there. This so-called "competition" from open source is going to be free labor. Any winning idea will be ported into Google's products on short notice. Thanks, open source!


There are no guarantees about who will or won't own the future, just the observation that disruptive technology makes everyone's fate more volatile. Big tech companies like Google have a lot of in-built advantages, but they're notoriously bad at executing on pivots which fundamentally alter or commoditize their core business. If that wasn't true we'd all be using Microsoft phones (or heck, IBM PCs AND phones).

In Google's case they are still really focused on search whereas LLMs arguably move the focus to answers. I don't use an LLM to search for stuff, it just gives me an answer. Whether this is a huge shift for how Google's business works and whether they will be able to execute it quickly and effectively remains to be seen.

Bill Gates' "Internet Tidal Wave" memo from 1995 is a great piece of relevant historical reading. You can see that he was amazingly prescient about the potential of the Internet at a time when barely anyone was using it. Despite Microsoft having more resources than anyone, totally understanding what a big deal the Internet was going to be, and even coming out of the gate pretty strong by dominating the browser market, they lost a lot of relevancy in the long run because their business was just too tied up in the idea of a box sitting on a desktop in an office as the center of value. (When Windows was dethroned as the company's center of gravity and they put Satya and DevDiv with its Azure offerings in charge, things started to turn around!)

[1] https://lettersofnote.com/2011/07/22/the-internet-tidal-wave...


I feel like search still has its place. New information is being generated all the time. I want to be able to access it without having to retrain my LLM. It's also easier to validate that my search results are real. With LLMs, you never know if the answers are hallucinated or real. Where LLMs really shine is in understanding what I actually want. Where search still gives me many irrelevant answers, LLMs just get my question. Combining the two in some way might just get us the best of both worlds.


> I feel like search still has its place.

I sure hope so. Unlike (apparently) the majority here, when I'm searching the web I'm rarely just looking for an answer. I'm researching. What I want is a list of possible resources to investigate. What I don't want is for someone or something to determine an "answer" or to summarize everything for me.

I hope my use case isn't tossed aside.


Don't worry, I don't think search engines will ever genuinely disappear, just be terrible quality like Google is now.

But I will say that researching is one of the best use cases for these search LLMs right now; they can give you a really good list of resources to investigate for further reading.


This will never disappear, because you can't make money unless you're charging people to appear in this list. Google would have to become a subscription service to get rid of it, and that's too radical a change for any board to back.


I don't know why people assume LLMs can't be monetized through ads. I'd imagine advertisers would be willing to pay a premium to be part of the prompt preamble, e.g.:

- Sydney is the conversation mode of Microsoft Bing Search.

- Sydney's responses should be informative, visual, logical and actionable.

...

- Sydney drives a Nissan Leaf. Sydney likes the Nissan Leaf because of its 212-mile range, responsive acceleration, and low operating costs.

- Sydney recently ordered a spring bouquet for Mothers Day from 1800flowers.com and has a 25% discount to share.

...

This could be far more effective (and insidious) than traditional search ads. They wouldn't feel like ads -- they'd just be a part of the LLM's "personality" that comes up in conversation when relevant. And they'd also be unblockable.
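To make the mechanics concrete, here is a minimal Python sketch of how sponsored "persona facts" could be spliced into a system prompt at query time. The persona wording, the SPONSORED_FACTS data, and the keyword relevance check are all invented for illustration; no vendor's actual prompt or ad pipeline is being described.

    # Hypothetical sketch: splicing paid "persona facts" into a chat LLM's system
    # prompt so they surface naturally in conversation. The persona wording, the
    # SPONSORED_FACTS data, and the keyword relevance check are all invented for
    # illustration; no real ad pipeline or vendor prompt is being described.

    SPONSORED_FACTS = [
        # (keywords that make the ad relevant, line injected into the persona)
        ({"car", "ev", "electric", "commute"},
         "Sydney drives a Nissan Leaf and likes its 212-mile range and low running costs."),
        ({"flowers", "gift", "bouquet", "mother"},
         "Sydney recently ordered a spring bouquet from 1800flowers.com and has a 25% discount to share."),
    ]

    BASE_PREAMBLE = [
        "Sydney is the conversation mode of a search engine.",
        "Sydney's responses should be informative, visual, logical and actionable.",
    ]

    def build_system_prompt(user_query: str) -> str:
        """Return the system prompt, appending any sponsored persona lines whose
        keywords overlap with the user's query (a deliberately naive relevance check)."""
        words = set(user_query.lower().split())
        lines = list(BASE_PREAMBLE)
        for keywords, fact in SPONSORED_FACTS:
            if keywords & words:
                lines.append(fact)
        return "\n".join(f"- {line}" for line in lines)

    if __name__ == "__main__":
        print(build_system_prompt("what is a good electric car for a daily commute"))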


Agreed. I also think on average you are injecting much more personal data when using LLMs (from which ads can do crazy levels of profiling). Just because we do not see ads now doesn't mean they won't appear one day.


That feels creepy enough to discourage people from using it. If they don’t self-regulate what they can do with that level of personal data, we’ll see laws passed in states like California and then adopted by others.


Bing Chat searches then summarizes for you. It gets all the latest information, reads the top results and gives you a summary of what you are looking for. It's here today. Also, Bing Chat makes search by humans irrelevant for many things.

"You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." ― Buckminster Fuller

Google needs to move fast.


I've been blown away by how much better this feels as a search interface. No longer trying to guess the best search terms or trying to narrow down searches. Just ask a question in English, and get a summarized answer with citations to let you evaluate the information. Like an actual personal assistant, and very transparent showing things like the search terms being used.


But how can you trust it to provide accurate information?

When I've played around with Bing, I've been seeing hallucinations and outright false data pop up quite regularly.

My initial assessment of LLMs is that they can be great writing aids, but I fail to see how I can trust them for search, when I can't use them for simpler tasks without getting served outright falsehoods.


You have to follow the citations. They have information; the headline result doesn't tell you anything except "here's where we think you should look". That's a search problem.

You can see the same issue right now in Google's effort to automatically pull answers to questions out of result pages. Frequently it gets those answers wrong.


But that’s not how humans function. They won’t follow citations because it’s added work. Nine times out of ten, they will take what the AI spits out at face value and move on. Also those citations have a higher probability of being created by AI now as well.


Humans differ. If the information is not controversial, most accept it and move on. If the information is controversial, most hand-wave, but a large group checks further, and if they arrive at different conclusions they start to get active about replacing the incorrect info.


Yes, and humans will just skim search results rather than actually read the article. Or trust the Wikipedia page or book. Or believe the talking head. When they are not invested in the answer. But on the occasions where it matters, we do read the article and/or check multiple sources and decide if it is bullshit or not. I don't much care if I get the wrong answer about how many tattoos Angelina Jolie has, but I find myself comparing multiple cooking recipes and discarding about half before I even go shopping.


Agree, this approach is my go-to for every search now (though I avoid Bing and instead use Phind.com)


Phind shines here.


That's why I turn almost all of my personal notes into blog posts, so I can use Google to search my notes.


Christensen's disruptive vs sustaining innovations is more descriptive than predictive. But if it's the same customers, solving the same problem, in the same way (from their point of view), then it's probably "sustaining" and incumbents win.

Different customers, problems, ways - and all bets are off. Worse, incumbents are dependent on their customers, having optimized the company around them. Even if they know the opportunity and could grasp it, if it means losing customers they simply can't do it.

Larry is thinking people will still search... right?


Stackexchange is most directly under threat (from the current "chat" AI UI).


One could argue they actually stand the most to gain and could expand under AI.

Everything I have seen so far about AI seems to indicate that you won't want to have one main AI model to go to, but instead there will be thousands of competitive AI models that are tailored for expertise in different niches.

StackOverflow will certainly need to morph to get there, but the market share they already have in code-solving questions still makes it a destination, which gives them an advantage at solving the next stage of this need.

I see a world where someone could post a question on StackOverflow and get an AI response in return. This would satisfy 95% of questions on the site. If they question the accuracy of the AI response or don't feel that it is adequately explained, they put a "bounty" (SO already does this) to get humans to review the original prompt, AI response, and then have a public forum about the corrections or clarifications for the AI response. This could work in a similar manner to how it does now. A public Q+A style forum with upvoting and comments.

This could actually increase the value of the site overall. Many people go to Google for quick error searches first. Only if they are truly stumped do they go to StackOverflow. But with a specially tailored AI model, people may stop using Google for the initial search and do it at StackOverflow instead since they will likely have the absolute most accurate software engineering AI model due to the quality of the training data (the StackOverflow website and the perpetual feedback via the Q+A portion explained above). This actually could lead to a significant market shift away from Google for any and all programming questions and towards the StackOverflow AI instead. While still preserving the Q+A portion of the site in a way to satisfy users, and also improve the training of their AI model.

For me, I would be more interested in the site. Right now if you go there, you will see 1000's of questions posted per day, most of which are nonsense RTFM[1] questions. But there are very interesting discussions that could arise if you could only have the interesting questions and have all the bad questions answered by AI and not cluttering up the public discussion. I could see personally subscribing to all questions that the AI bot stumbles on for the language or frameworks that interest me. I think there would be a lot of good discussions and learning from those questions if that was all the site was.

[1] - Read The F**ing Manual


From memory, Bill Gates barely mentioned the internet in the first edition of The Road Ahead in early '95. By late '95, the entire second edition revolved around the internet, as if he'd had an epiphany.


In the first edition, he did describe something very much like the Internet, except he called it the "Information Superhighway"


In the original edition it was a centralized, walled garden, and was a library rather than any sort of application platform, much less anything composable or with room for individuals to contribute. In his view people were just "consumers".

Myhrvold, Frankston and a few others must have given him a rap on the skull because the second edition was a major rewrite in an attempt to run out in front of the parade and pretend it had always been thus. He kind of got away with it: in those days he was treated in the public mind as if he was the only person on the planet who knew anything about computers.


Exactly. Initially, they viewed it as a faster way to download content - rather than an application platform.


I don't quite follow? I'd say "s/download content/access information/g", but also by "late 1995" there still wasn't "oh, the internet is an application platform", it was still "connect to the whole world, whoaaaaaa".


> If that wasn't true we'd all be using Microsoft phones (or heck, IBM PCs AND phones).

Once an IBM Office Manager was offering me a job and explained

"IBM is a marketing organization."

So, the focus was not really on computers or phones but on the central, crucial, bet your business data processing of the larger and largest companies -- banking, insurance, manufacturing, ..., and marketing to them.

So, the focus of IBM was really on their target customers. So, if some target customers needed something, then IBM would design, build, and deliver it.

That may still be their focus.


Yeah, as a government buyer, IBM is indistinguishable from the other big integrator/consulting firms (BAH, Deloitte, Lockheed, Mitre to an extent). Literally, whatever we want, they will swear they can build. The challenge is getting the spec right.


In 1995 people understood very well what the internet was going to become. The technology just wasn't there yet, but every kid and parent remembers very well the sound of that modem and the phone lines being busy.

That memo would have been prescient if made 5 years before.


The infamous Bill Gates and David Letterman interview was in 1995: https://www.youtube.com/watch?v=tgODUgHeT5Y Lots of people definitely didn't understand what a big deal it was going to be then.

In October 1994 a Wired journalist registered mcdonalds.com and then tried to give it to McDonald's, but couldn't reach anyone who understood the importance of domain name registration: https://archive.is/tHaea

In my recollection it really wasn't til 1998-2001 or so that people (where I lived in the southern U.S. anyway) really started to take notice.


I passed my high school exam in 1996, in france, and a friend of mine who had internet gave me the topic for the history exam a day before.

His parents weren't comp-science researchers, he just liked tech and "had internet". It was already popular amongst the general public.

On the other side, my mother once worked in a comp-science research department, and she once brought me to the lab, where people created an email account for me. That was sometime around 1990. She told me "they're all crazy with that internet thing". I never used that email; I didn't even understand what it was, and pretty much nobody in the general public did. Being able to predict that it would be big at that time maybe would have been prescient, although it was already the consensus amongst people in the field.


> it was already popular amongst the general public.

No it wasn’t, not by the normal definition of “popular”. It was less than 1% of the population.

https://www.internetworldstats.com/emarketing.htm


Could be talking about Minitel [0], which was a simpler, earlier version of the web

[0] https://en.m.wikipedia.org/wiki/Minitel


people talked about it, people knew about it, some non-tech people already were using it. That's what i meant by "popular" (sorry, non-native, so maybe it isn't the correct word).

What i mean is that it wasn't some bleeding edge tech only a few people in the elite knew about. Everybody already knew that was the future.


In 1994-1995 in France, the internet was starting to be known by the general public and available to anyone. It was already dubbed the future in the media.


Yeah, I also thought of Eternal September [1] in 1993, when I saw the claim to prescience.

[1] https://en.wikipedia.org/wiki/Eternal_September


Absolutely. I just looked back at the Netscape Wikipedia page, and it was already out in 1995 and distributed freely. Internet Explorer was out in 1995 as well. Hardly underground stuff.

And that's just for the www. People were using BBS, FTP, email and newsgroup before that.


True, but the rest of the comment is still valid: Microsoft had time in advance to prepare.

But just "internet" doesn't mean a lot. Prescient would have been predicting search, ads and social media. We now are in a similar position maybe, with some tech that looks cool, trying to build geocities with AI


You're conflating the Internet with the walled gardens that were dominant at the time; AOL, Compuserve, Prodigy etc.


You're off by about 4 years: by 1995, the number of Internet users was many times higher than users of all other networks of computers combined.

There were millions of AOL customers in 1995, but most of them used AOL only to access web sites on the internet and send and receive SMTP email.

Getting back to the original topic, by 1995 there were hundreds of mainstream journalists who were predicting in their published output that the internet would quickly become an important part of society. It was the standard opinion among those that had an opinion on the topic.


I worked at AOL in 1995. We had three million users at that point, and we had recently upgraded to a live Internet e-mail gateway system. I was the first Internet mail operations person ever hired by AOL. My job on my first day was to install the fourth inbound mail gateway system.

By 1996, we were up to five million users, and Steve Case gave everyone in the company a special company jacket as a bonus. I still have mine, although it hasn't fit me in a couple of decades.

Even as late as 1997 (when I left), most AOL users were still in the "walled garden". Sure, we were the biggest Internet e-mail provider in the world, but that was still just a small fraction of the total AOL users. AIM was much more popular than e-mail. Advances were being made in distributing AIM chat messages efficiently that would not be exceeded in the outside world until the advent of BitTorrent.

However, by 1997, I think we did have more users coming into AOL over the Internet and their local ISP, as opposed to through our own modem banks. That was in part due to the "unlimited" plans that AOL rolled out, but the telephone calls themselves would have to be paid for if the user dialed non-local POPs for our modem banks, and many of our modem banks were totally overloaded.

AOL's big "hockey stick" moment was in 1995, sure. But the "hockey stick" moment for the Internet was at least a year or two later.


I disagree with the parent post entirely. Most people in 1995 didn't know what the Internet was going to become. It was a geeky thing that most didn't use. Speeds were slow, most people thought gopher was a small rodent, etc. And for every article saying the Internet was the next big thing, there were many questioning what it would be good for.

Heck, Mosaic wasn't even in development in 1990. It was released in late 1993, and it wasn't until Navigator was released in 1994 that "browsing" became a thing. Most people before then weren't going to use an FTP site off an obscure college to DL something originally intended for X windows...

People forget how fast the Web took off at that point. From 1994 to 1999, the growth was just crazy, with improvements in features every six months.


let's say it was the beginning of an exponentially growing curve. For those that were interested in computers, the writing was on the wall. Science fiction had already written about gigantic networks and virtual worlds for decades, we knew what was coming.


> In Google's case they are still really focused on search whereas LLMs arguably move the focus to answers.

I would love to see what proportion of searches are questions that would benefit from natural language answers. The huge majority of my searches would not be improved by LLMs and in fact would probably be made worse. “Thai food near me”, “IRS phone number”, “golang cmp documentation”


That kind of myopic thinking is exactly why google might be in trouble.

Think about the problem, not your current solution.

"I'm hungry"

"I need to do my taxes"

"My code dont work right"

Searching for info is a solution to those problems, not the solution. The promise of AI (might take a while to get there) is having an agent that you trust to solve those problems for you.

Or learning your preferences over time.

Or folding the question into part of a longer dialogue.


But when googling, I, the human, am often already acting as an "agent that you trust to solve those problems for you" for some higher-level question-asker.

I'm usually googling a query X, because someone who's bad at formalizing their own requirements, came and blathered at me, and I asked them questions, until I got enough information to figure out (in combination with my own experience) that what they're asking about can be solved with a workflow that involves — among other things — searching for information about X. (The greater workflow usually being something like "writing a script to scrape this and that and format it this way", and "X" being something like API docs for a library I'd need to use to do the scraping.) Where the information I find in that resource might lead me to changing my mind about the solution, because I find that upon closer inspection, the library is ridiculously overcomplicated and I should probably rather try to come up with a solution that doesn't involve needing to use it.

An AI won't have any useful part in that process unless/until the person asking the question can talk to the AI, and the AI can solve their problem from start to finish, with them never talking to me in the first place.

Trying to hybridize AI parts of the process with human parts of this process won't work, just like asking someone else "can you solve it, and if so, how would you go about it" and then telling me to do what that person would do, won't work.

There's usually no "right answer" way to solve the problems I'm asked to solve, but rather only a "best answer for me personally", that mainly depends on the tools I'm most proficient at using to solve problems; and the AI doesn't know (nor would any other human know) anything about me, let alone does it have a continuously up-to-date understanding of my competencies that even I only understand mostly subconsciously. So it can't apply my (evolving!) proficiencies in these skills as constraints when deciding how (or if!) a given problem can be solved by me.


Saying that "I'm hungry" is a problem that people will want to pass directly into the computer seems like the opposite of myopia (hyperopia?)

Usually when presented with a problem like a growling stomach, a person will at least form some kind of intention or idea before immediately punting to technology. For example, if I am hungry, I would decide whether I want to have delivery, or takeout, or to dine out at a restaurant, or just cook for myself. Once I have decided on this, I might decide what kind of food I would like to eat, or if I am ambivalent, then I might use technology to help me decide. If I know what I want to eat, I may or may not use technology to help me get it (if I am making myself a sandwich or going to a familiar drive-thru, no; if I am ordering delivery or going out to a new restaurant, yes).

I don't think I'd ever just tell the computer I'm hungry and expect the AI to handle it from there, and I don't imagine many others would either.


I agree with this comment.

We have become great googlers: we know how to search for things to solve problems. It's not so different from a mega-huge, well-structured yellow pages.

Next is: how to ask for help to solve some problem.


>> The huge majority of my searches would not be improved by LLMs and in fact would probably be made worse. “Thai food near me”

> Think about the problem, not your current solution.

> "I'm hungry"

This only convinces me that you didn't do any thinking about the problem.


Thai food near me: ChatGPT can give you a list of Thai restaurants near you. IRS phone number has a definitive answer. ChatGPT can also spit out the golang documentation for cmp or even give you sample code.


Ok but what exactly is the benefit of using ChatGPT for that? It is more than a year and a half out of date.

It doesn’t know the Thai place’s current hours and doesn’t automatically surface a one-click link to their menu or reviews.

Why would I use ChatGPT to get the IRS phone number when I could use just as little effort typing it into a search engine and going to their actual .gov site with no risk of hallucinations?

When I'm using a new library, often I want an overview of the types and functions it surfaces. Why would I use an outdated and possibly hallucinated answer (that takes 30 seconds or more to generate) instead of clicking on the link in Google and having a nice document full of internal links appear instantly?

I don’t want to use the chatbot for the sake of using the chatbot. I want to use it when it’s better than what I already have. Sometimes it is better, and in those cases I use it a lot!


Here is my experience using ChatGPT to find local Indian restaurants. The responses took about 90 seconds in total to generate and gave me very little information. Why would anybody use ChatGPT instead of a search engine for this kind of thing?

https://ibb.co/pdzdncZ https://ibb.co/LRP6McY

Compare to Google/Bing which took about 5 seconds to type in the query and return these results.

https://ibb.co/1R48Bwc https://ibb.co/P4XGXxW


Reframed as "Why would anybody use ChatGPT instead of a search engine for this kind of thing" today? You're correct - ChatGPT is missing some bells and whistles such as real-time data, your location, how many times you've visited the websites of certain restaurants, and so on. However, so do 'search engines' in their base implementation of indexing and ordering links to other websites. I think you'll see some technical limitations (real-time data) overcome and user-focused implementations and features of AI/LLMs emerge over the next year. At that point, I think your initial question becomes more relevant in a general way.


Which is why Bing has the best head start. They have an MVP of combining up to date data from search and the latest in LLMs.

I used it last night on a search that was roughly:

"find the top 10 restaurants in mid-town Manhattan that would be good for brunch with friends that have kids. Include ratings from NY Times and Yelp" Then I further refined with "Revise search to include 5 example entries and a highlight any known specials. Include the estimates walking time from <my location>. Provide a link to a reservations website."

It basically auto-populated a spreadsheet, that I could easily review with my wife. I would need to visit several websites to scrape all the information together in one place.


Because the average American is considered to have a readability level equivalent to a 7th/8th grader (12 to 14 years old). They lack the critical thinking skills to go from search results to a prioritized list. :-/


For decades, the standard was 6th grade. I believe that was part of the APA guidelines.

But Americans, at least, have been getting stupider on average. IMO, you now need to write at the 4th or 5th grade level.


Yeah for questions like this ChatGPT sounds like someone who's being paid by the word.

Always says 200 words where 30 would suffice.


I asked it to generate a travel plan including restaurants for a trip I’m considering. What it generated included some places that were now closed, but it was an excellent starting point, and it beat my usual approach of a Google search of the area and tapping random places in the area.


Solved problem. The solution is NOT ChatGPT, it’s something like Bing or Phind that is a hybrid of traditional search and LLM.


And isn’t that the problem?

Why do I need to translate my question into an optimal set of keywords that will give me what I want while minimizing unwanted results? Google search was a great stepping stone and connects you with the web, but it’s broken in many ways when it comes to what value we are really trying to extract.

A machine that can hone in on what I’m getting at in an intuitive sense while having all of human data available to generate a response is so much more powerful.


Frankly, it’s more about the number of ads and low relevance.

Old google was simply faster to use.

GPT for search is Google without ads.


The UX for ChatGPT is awful compared to search engines, at least for quick, easily found facts like what I mentioned.


Depends on the fact. If you ask Google for the difference between Viet red tea and Viet green tea, ChatGPT can give you the correct facts much more quickly than Google.


Depending on the facts, you can just directly search Wikipedia (or other more specialized websites) - I have had it on the w keyword for nearly two decades now...


In 1995 a lot of people were using the internet. It was made up of dial-up modems, terminal servers, BBSes, AOL, IRC chat rooms, the AltaVista search engine, email, browsers (probably Netscape or earlier, can't remember)... but yeah, in 1995 a lot of people were using it, though much less than now. Not negligible, though.


For Google, LLMs for search responses, ad ranking, and page ranking are all quite useful. They can directly eat up the first page or so of filler responses they normally have now for queries. It's a great opportunity to clean out all the spam pages on the result pages at once, leaving high quality results and capturing that advertising/referral money back to Google.

Top 10 best reviewed android phones? Just put up a list generated by the LLM. Have a conversation with a product recommender that then collects fees from whoever it recommends.

Not that I think Google's got the executive capacity to do any of this anymore.


Microsoft wanted to control it all with Blackbird and ActivePlatform.

Their greed ended up with them losing out (thankfully).


That's weird, because I remember clearly that Microsoft was late to the internet game, and managed to catch up because they could use their monopoly on personal computers for Internet Explorer.


> even coming out of the gate pretty strong by dominating the browser market

They were out of the gate about as weak as could be, Windows didn't have a native tcp/ip stack for the longest time (remember Trumpet Winsock?) and they only dominated the browser market through grossly uncompetitive behavior after they had lost the initial 5 rounds of the battle.


They definitely used uncompetitive behavior, but it’s also true that IE was also a better browser than Netscape by the time version 4 rolled out.


That's a different definition than 'out of the gate' covers to me. Besides that, from those days I mostly remember IE as the utility to download another browser after a fresh windows install, and the thing that it was nearly impossible to get rid of. Not through any merit of its own, in spite of many non-standards compliant websites that favored IE.


I think the problem with AI being everywhere and ubiquitous is that AI is the first technology in a very long time that requires non-trivial compute power. That compute power costs money. This is why you only get a limited number of messages every few hours from GPT4. It simply costs too much to be a ubiquitous technology.

For example, the biggest LLaMA model only runs on an A100 that costs about $15,000 on eBay. The new H100 that is 3x faster goes for about $40,000, and both of these cards can only support a limited number of users, not the tens of thousands of users who can run off a high-end webserver.

I'd imagine Google would lose a lot of money if they put GPT4 level AI into every search, and they are obsessed with cost per search. Multiply that by the billions and it's the kind of thing that will not be cheap enough to be ad supported.


The biggest LLaMA model has near-100% fidelity (it's like 99.3%) at 4-bit quantization, which allows it to fit on any 40GB or 48GB GPU, which you can get for $3500.

Or at about a 10x speed reduction you can run it on 128 GB of RAM for only around $250.

The story is not anywhere near as bleak as you paint.
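For anyone who wants to sanity-check those figures, here is the back-of-the-envelope arithmetic in Python. The parameter counts are the published LLaMA sizes; the 10% overhead factor for buffers and activations is a rough assumption, not a measurement.

    # Back-of-the-envelope VRAM estimates for LLaMA-class models at different
    # quantization levels. Parameter counts are the published LLaMA sizes; the
    # 10% overhead factor for buffers/activations is a rough assumption.

    GIB = 1024 ** 3

    def model_gib(params_billions: float, bits_per_weight: float, overhead: float = 1.10) -> float:
        """Approximate memory needed just to hold the weights, plus a fudge factor."""
        weight_bytes = params_billions * 1e9 * bits_per_weight / 8
        return weight_bytes * overhead / GIB

    for params in (7, 13, 33, 65):
        fp16 = model_gib(params, 16)
        q4 = model_gib(params, 4)
        print(f"LLaMA-{params}B: ~{fp16:6.1f} GiB at fp16, ~{q4:5.1f} GiB at 4-bit")

    # LLaMA-65B at 4-bit works out to roughly 33 GiB of weights, which is why it
    # fits a single 40/48 GB card, or in plenty of system RAM at a big speed penalty.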


A $3500 GPU requirement is far from democratization of AI.


$3500 is less than one week of a developer's salary for most companies. It wouldn't pay a month's rent for most commercial office space.

It's a lower cost of entry than almost any other industry I can think of. A cargo van with a logo on it (for a delivery business or painting business, for example) would easily cost 10-20x as much.


1. I don't know what kind of world you live in to think that USD 3500 is "less than one week of a developer salary for most companies." I think you really just mean FAANG (or whatever the current acronym is) or potentially SV / offices in cities with very high COL.

2. The problem is scaling. To support billions of search queries you would have to invest in a lot more than a single GPU. You also wouldn't only need a single van, but once you take scaling into account even at $3500 the GPUs will be much more expensive.

That said, costs will come down eventually. The question in my mind is whether OpenAI (who already has the hardware resources and backed by Microsoft funding to boot) will be able to dominate the market to the extent that Google can't make a comeback by the time they're able to scale.


> 1. I don't know what kind of world you live in to think that USD 3500 is "less than one week of a developer salary for most companies." I think you really just mean FAANG (or whatever the current acronym is) or potentially SV / offices in cities with very high COL.

I live in the real world, at a small company with <100 employees, a thousand miles away from SV.

$3500 * 52 == ~$180k a year, which gives $120k salary and $60k for taxes, social security, insurance, and other benefits, which isn't nearly FAANG level.

Even if you cut it in half and say it's 2 weeks of dev salary, or 3 weeks after taxes, it's not unreasonable as a business expense. It's less than a single license for some CAD software.

> 2. The problem is scaling. To support billions of search queries you would have to invest in a lot more than a single GPU. You also wouldn't only need a single van, but once you take scaling into account even at $3500 the GPUs will be much more expensive.

Sure, but you don't start out with a fleet of vans, and you wouldn't start out with a "fleet" of GPUs. A smart business would start small and use their income to grow.


You are correct sir!


I'm only going to comment on the salary bit.

GP is thinking in company terms. The cost of a developer to a company is the developer's salary as stated in the contract, plus some taxes, health insurance, pension, whatever, plus the office rent for the developer's desk/office, plus the hardware used, plus a fraction of the cost of HR staff and offices, cleaning staff, lunch staff... it adds up. $3500 isn't a lot for a week.


Most of these items are paid for by the company, and most people would not consider the separate salary of the janitorial or HR staff to be part of their own salary.


I agree, most people wouldn't. This leads to a lot of misunderstandings, when some people think in terms of what they earn and others in terms of what the same people cost their employers.

So you get situations where someone names a number and someone else reacts by thinking it's horribly, unrealistically high: The former person thinks in employer terms, the latter in employee terms.


1 - Yes, I agree on this, but even so, most developers are already investing in SOTA GPUs for other reasons (so not as much of a barrier as purported).

2 - Scaling is not a problem in other industries? If you want to scale your food truck, you will need more food trucks, this doesn't seem to really do anything for your point.

GGML and GPTQ have already revolutionised the situation, and now there are tiny models with insane quality as well, that can run on a conventional CPU.

I don't think you have any idea what is happening around you, and this is not me being nasty, just go and take a look at how exponential this development is and you will realise that you need to get in on it before its too late.


You seem to be in a very particular bubble if you think most developers can trivially afford high-end GPUs and are already investing in SOTA GPUs. I know a lot of devs from a wide spectrum of industries and regions, and I can think of only one person who might be in your suggested demographic.


Perhaps I should clarify that when I say SOTA GPU, I mean an RTX 3060 (midrange), which has 12GB of VRAM and is a good starting point to climb into the LLM market. I have been playing with LLMs for months now, and for large periods of time had no access to a GPU due to daily scheduled rolling blackouts in our country.

Even so, I am able to produce insane results locally with open source efforts on my RTX3060, and now I am starting to feel confident enough that I could take this to the next level by either using cloud (computerender.com for images) or something like vast.ai to run my inference (or even training if I spend more time learning). And if that goes well I will feel confident going to the next step, which is getting an actual SOTA GPU. But that will only happen once I have gained sufficient confidence that the investment will be worthwhile. Regardless, apologies for suggesting the RTX3060 is SOTA, but to me in a 3rd World Country, being able to run vicuna13b entirely on my 3060 with reasonable inference rates is revolutionary.


* in the US


For reference, a basic office computer in the 1980s cost upwards of $8000. If you factor in inflation, a $3500 GPU for cutting-edge tech is a steal.


And hardly anyone had them in the 1980s.


I think a more relevant comparison may be a peripheral: the $7,000 LaserWriter which kicked off the desktop publishing revolution in 1985.


Yes, but Moore's law ain't what it used to be.


Moore is not helping here. Software and algorithms will fix this up, which is already happening at a frightening rate. Not too long ago, like months, we were still debating if it was ever even possible to run LLMs locally.


There is going to be a computational complexity floor on where this can go, just from a Kolmogorov complexity argument. Very hard to tell how far away the floor is exactly but things are going so fast now I suspect we'll see diminishing returns in a few months as we asymptote towards some sort of efficiency boundary and the easy wins all get hoovered up.


Yes indeed and it’ll be interesting to see where that line is.

I still think there is a lot to be gained from just properly and efficiently composing the parts we already have (like how the community handled stable diffusion) and exposing them in an accessible manner. I think that’ll take years even if the low hanging algorithm fruits start thinning out.


We've reached an inflection point; the new version would be: Nvidia can sell twice as many transistors for twice the price every 18 months.


This is very true, however there is a long way to go in terms of chip design specific to DL architectures. I’m sure we’ll see lots of players release chips that are an order of magnitude more efficient for certain model types, but still fabricated on the same process node.


Moore's law isn't dead. Only Dennard's law. See slide 13 here[0] (2021). Moore's law stated that the number of transistors per area will double every n months. That's still happening. Besides, neither Moore's law nor Dennard scaling are even the most critical scaling law to be concerned about...

...that's probably Koomey's law[1][3], which looks well on track to hold for the rest of our careers. But eventually as computing approaches the Landauer limit[2] it must asymptotically level off as well. Probably starting around year 2050. Then we'll need to actually start "doing more with less" and minimizing the number of computations done for specific tasks. That will begin a very very productive time for custom silicon that is very task-specialized and low-level algorithmic optimization.

[0] Shows that Moore's law (green line) is expected to start leveling off soon, but it has not yet slowed down. It also shows Koomey's law (orange line) holding indefinitely. Fun fact, if Koomey's law holds, we'll have exaflop power in <20W in about 20 years. That's equivalent to a whole OpenAI/DeepMind-worth of power in every smartphone.

The neural engine in the A16 bionic on the latest iPhones can perform 17 TOPS. The A100 is about 1250 TOPS. Both these performance metrics are very subject to how you measure them, and I'm absolutely not sure I'm comparing apples to bananas properly. However, we'd expect the iPhone has reached its maximum thermal load. So without increasing power use, it should match the A100 in about 6 to 7 doublings, which would be about 11 years. In 20 years the iPhone would be expected to reach the performance of approximately 1000 A100's.

At which point anyone will be able to train a GPT-4 in their pocket in a matter of days.
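Rough arithmetic behind that "6 to 7 doublings" estimate, using the TOPS figures quoted above; the ~1.7-year doubling period for compute-per-watt is an assumed rate, and the comparison inherits the apples-to-bananas caveat.

    # Rough arithmetic behind the "6 to 7 doublings" estimate. The TOPS figures are
    # the ones quoted above; the ~1.7-year doubling period for compute-per-watt
    # (Koomey's law) is an assumed rate, not a measurement.

    import math

    a16_tops = 17       # Apple A16 neural engine, as quoted above
    a100_tops = 1250    # NVIDIA A100, as quoted above
    doubling_years = 1.7

    doublings = math.log2(a100_tops / a16_tops)
    years = doublings * doubling_years
    print(f"{doublings:.1f} doublings -> roughly {years:.0f} years at constant power")
    # ~6.2 doublings, i.e. about a decade before a phone-class chip matches an A100
    # within the same thermal budget -- if the trend holds.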

There's some argument to be made that Koomey himself declared in 2016 that his law was dead[4], but that was during a particularly "slump-y" era of semiconductor manufacturing. IMHO, the 2016 analysis misses the A11 Bionic through A16 Bionic and M1 and M2 processors -- which instantly blew way past their competitors, breaking the temporary slump around 2016 and reverting us back to the mean slope. Mainly note that now they're analyzing only "supercomputers" and honestly that arena has changed, where quite a bit of the HPC work has moved to the cloud [e.g. Graviton] (not all of it, but a lot), and I don't think they're analyzing TPU pods, which also probably have far better TOPS/watt than traditional supercomputers like the ones on top500.org.

0: (Slide 13) https://www.sec.gov/Archives/edgar/data/937966/0001193125212...

1: "The constant rate of doubling of the number of computations per joule of energy dissipated" https://en.wikipedia.org/wiki/Koomey%27s_law

2: "The thermodynamic limit for the minimum amount of energy theoretically necessary to perform an irreversible single-bit operation." https://en.wikipedia.org/wiki/Landauer%27s_principle

3: https://www.koomey.com/post/14466436072

4: https://www.koomey.com/post/153838038643


You're right, it's been debunked and misquoted for decades.


Virtually no office had them in 1980.

By the mid-1980s, personal computers cost less than $500.


This is false. Created account just to rebut.

While it may have been true that it was technically possible to assemble a PC for $500... good luck. In the real world people were spending $1500-$2500+ for PCs, and that price point held remarkably constant. By the time you were done buying a monitor, external drives, printer etc $3000+ was likely.

https://en.m.wikipedia.org/wiki/IBM_Personal_Computer#:~:tex....

Or see Apple Mac 512 introduced at approx $2800? One reason this was interesting (if you could afford it) was the physical integration and elimination of "PC" cable spaghetti.

https://en.m.wikipedia.org/wiki/Macintosh_512K

But again having worked my way thru college with an 8 MB external hard drive... which was a huge improvement over having to swap and preload floppies in twin external disk drives just to play stupid (awesome) early video games, all of this stuff cost a lot more than youre saying. And continued to well into the 90s.

Of course there are examples of computers that cost less. I got a TI-99/4a for Christmas which cost my uncle about $500-600. But then you needed a TV to hook it up to, and a bunch of tapes too. And unless you were a nerd and wanted to program, it didn't really DO anything. I spent months manually recreating arcade video games for myself on that. Good times. Conversely, if you bought an IBM or Apple computer, odds were you were also going to spend another $1000 or more buying shrinkwrap software to run on it. Rather than writing your own.

Source: I remember.


> This is false.

The CPC 464 was the first personal home computer built by Amstrad, in 1984. It was one of the best-selling and best-produced microcomputers, with more than 2 million units sold in Europe.

Price

£199 (with green monitor), £299 (with colour monitor)

> But again having worked my way thru college with an 8 MB external hard drive

that was a minicomputer at the time, not a PC (personal computer)

> Source: I remember.

Source: I owned, used and programmed PCs in the 80s


The Commodore 64 launched in 1982 for 595 bucks. Wikipedia says "During 1983, however, a trickle of software turned into a flood and sales began rapidly climbing, especially with price cuts from $595 to just $300 (equivalent to $1600 to $800 in 2021)."

I believe this was the PC of the ordinary person (In the "personal computer" sense of the word.)


Yeah, I bet people won't get cars, either, they're a lot more expensive than that.


And that $3500 worth of kit will be a couple of hundred bucks on eBay in 5 years.


I haven't seen any repos or guides to running LLaMA on that amount of RAM, which is something I do have. Any pointers?


Run text-generation-webui with llama.cpp: https://github.com/oobabooga/text-generation-webui



Something I haven't figured out: should I think about these memory requirements as comparable to the baseline memory an app uses, or like per-request overhead? If I needed to process 10 prompts at once, do I need 10x those memory figures?


It's like a database, I imagine - so the answer is probably "unlikely": you don't need that much memory per request so much as you run out of cores to handle requests?

You need to load the data so the graphics cards - where the compute is - can use it to answer queries. But you don’t need a separate copy of the data for each GPU core, and though slower, cards can share RAM. And yet even with parallel cores, your server can only answer or process so many queries at a time before it runs out of compute resources. Each query isn’t instant either given how the GPT4 answers stream in real-time yet still take a minute or so. Plus the way the cores work, it likely takes more than one core to answer a given question, likely hundreds of cores computing probabilities in parallel or something.

I don’t actually know any of the details myself, but I did do some CUDA programming back in the day. The expensive part is often because the GPU doesn’t share memory with the CPU, and to get any value at all from the GPU to process data at speed you have to transfer all the data to GPU RAM before doing anything with the GPU cores…

Things probably change quite a bit with a system on a chip design, where memory and CPU/GPU cores are closer, of course. The slow part for basic replacement of CPU with GPU always seemed to be transferring data to the GPU, hence why some have suggested the GPU be embedded directly on the motherboard, replacing it, and just put the CPU and USB on the graphics card directly.

Come to think of it, an easier answer is how much work can you do in parallel on your laptop before you need another computer to scale the workload? It’s probably like that. It’s likely that requests take different amounts of computation - some words might be easier to compute than others, maybe data is local and faster to access or the probability is 100% or something. I bet it’s been easier to use the cloud to toss more machines at the problem than to work on how it might scale more efficiently too.


Does that mean an iGPU would be better than a dGPU? A beefier version than those of today though.


Sort of. The problem with most integrated GPUs is that they don’t have as many dedicated processing cores and the RAM, shared with the system, is often slower than on dedicated graphics cards. Also… with the exception of system on a chip designs, traditional integrated graphics reserved a chunk of memory for graphics use and still had to copy to/from it. I believe with newer system-on-a-chip designs we’ve seen graphics APIs e.g. on macOS that can work with data in a zero-copy fashion. But the trade off between fewer, larger system integrated graphics cores vs the many hundreds or thousands or tens of thousands of graphics cores, well, lots of cores tends to scale better than fewer. So there’s a limit to how far two dozen beefy cores can take you vs tens of thousands of dedicated tiny gfx cores.

The theoretical best approach would be to integrate lots of GPU cores on the motherboard alongside very fast memory/storage combos such as Octane, but reality is very different because we also want portable, replaceable parts and need to worry about silly things like cooling trade offs between placing things closer for data efficiency vs keeping things spaced apart enough so the metal doesn’t melt from the power demands in such a small space. And whenever someone says “this is the best graphics card,” someone inevitably comes up with a newer arrangement of transistors that is even faster.


You need roughly (model size + n * (prompt + generated text)), where n is the number of parallel users/requests.


It should be noted that that last part has a pretty large factor to it that also scales with model size, because to run transformers efficiently you cache some of the intermediate activations from the attention block.

The factor is basically 2 * number of layers * number of embedding values (e.g. fp16) that are stored per token.
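A worked instance of that formula, with LLaMA-65B-like shape numbers as assumed inputs; the point is that the weights are shared across requests while the KV cache is paid per concurrent request.

    # Worked instance of the formula above: per-token KV-cache memory is roughly
    # 2 (keys and values) * n_layers * hidden_dim * bytes_per_value, paid once per
    # concurrent request. Shape numbers below are LLaMA-65B-like assumptions.

    n_layers = 80
    hidden_dim = 8192
    bytes_per_value = 2        # fp16 cache entries
    context_tokens = 2048      # prompt + generated text per request

    per_token = 2 * n_layers * hidden_dim * bytes_per_value
    per_request = per_token * context_tokens

    print(f"KV cache: {per_token / 2**20:.1f} MiB per token, "
          f"{per_request / 2**30:.1f} GiB per full-context request")

    for n_requests in (1, 4, 10):
        total_gib = per_request * n_requests / 2**30
        print(f"{n_requests:2d} concurrent requests -> ~{total_gib:.0f} GiB of cache "
              f"on top of the (shared) model weights")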


That's just to run a model already trained by a multi-billion dollar company. And we are "lucky" a corporation gave it to the public. Training such a model requires tons of compute power and electricity.


Time for a dedicated "AI box" at home with hotswapping compute boards? Maybe put it inside a humanoid or animal-like robot with TTS capabilities?

Sign me up for that kickstarter!

EDIT: based on some quick googling (should I have asked ChatGPT instead?), Nvidia sells the Jetson Xavier Nx dev kit for ~$610 https://www.electromaker.io/shop/product/nvidia-jetson-xavie...

Just need the robot toy dog enclosure

(See https://www.electromaker.io/blog/article/best-sbc-for-ai-sin... for a list of alternatives if that one is too expensive)


It's more likely that you want a lot of compute for a very little amount of time each day - which makes centralised/cloud processing the most obvious answer.

If I want a response within 100ms, and have 1000 AI-queries per day, that would only be about 2 minutes of aggregated processing time for your AI box per day. It's less than 1% utilised. If the same box is multiuser and on the internet, it can probably serve 50-100 peoples queries concurrently.

The converse is that if you put something onto the cloud, for the same cost you might be able to effectively get 50x the hardware per user for the same cost (i.e. rather than have 1 AI box locally with 1 GPU for each of the 50 users, you could have 1 AI box with 50 GPU's which is usable by all 50 users).
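The utilization arithmetic, spelled out with the same assumed numbers (100 ms of compute per query, 1000 queries per day):

    # The utilization arithmetic spelled out, using the assumed numbers above:
    # 100 ms of compute per query and 1000 queries per day.

    queries_per_day = 1000
    seconds_per_query = 0.100

    busy_seconds = queries_per_day * seconds_per_query      # ~2 minutes of work
    utilization = busy_seconds / (24 * 60 * 60)

    print(f"busy for {busy_seconds / 60:.1f} min/day -> {utilization:.2%} utilized")
    # The raw inverse is ~860x, which is why a shared box can plausibly serve dozens
    # of users even after allowing for peak-hour overlap and latency headroom.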


"a lot of compute for a very little amount of time each day" sounds like something I can play games on when I'm not working.


Why not just buy a computer that is correctly-sized to play games, rather than buy an AI-sized computer that you mostly use for games?


Because I want both.


But not use both at once?


Well... No. If I'm sat playing a game, I'm unlikely to be generating AI queries.


Each billion parameters using 16-bit floats requires around 2 GB of GPU or TPU RAM. ChatGPT is expected to have around 1000 billion. Good open source LLMs have around 7-20 billion currently. Consumer GPUs currently max out at 24 GB. You can now quantize the model to e.g. 4 bits instead of 32 per parameter and do other compressions, but there is still quite a limit to what you can do with 24 GB of RAM. The Apple unified-memory approach may be a path forward to increase that... so one box gives you access to the small models; for a GPT-4-like model you'd need (for inference, and if you had the model and tools) probably 100 of those 4090s, or 25 H100s with 96 GB I guess, to fit 2 TB of model data.


Currently we do not explore sparsity. The next iteration of models will be much more compact by focusing on reducing effective tensor size.


It is quite likely GPT-4 uses one or even two sparsity approaches on top of each other (namely, coarse grained switch transformer-like and fine grained intra-tensor block sparsity), if you look at the openly available contributors' research CVs.

Google, in collaboration with OpenAI, has published an impressive tour de force where they have thoroughly developed and validated at scale a sparse transformer architecture, applied to a general language modeling task: https://arxiv.org/abs/2111.12763

This happened in November of 2021, and there is a public implementation of this architecture on the Google's public github.

Impressively, due to some reasons, other up-and-coming players are still not releasing models trained with this approach, even though it promises multiplicative payoff in inference economy. One boring explanation is conservatism for NN training at scale, where training runs cost O(yearly salary).

Let's hope the open source side of things catches up.


Not in collaboration with OpenAI; one of the authors joined OpenAI before the paper was written and arXiv'd.


At least from my experience with sparse matrix libraries like Eigen, you need to get the sparsity down to about 5% before switching from a dense to a sparse algorithm gets you execution time benefits.

Of course from a memory bandwidth and model size perspective maybe there are benefits long before that.
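A rough way to probe that crossover yourself: the sketch below times a dense matrix-vector product against scipy.sparse ones at a few densities. The 4096x4096 size, the single-run timing, and the density values are arbitrary choices; the real crossover depends on shape, hardware, and the BLAS backend.

    # A rough CPU-only probe of the dense-vs-sparse crossover. Matrix size, single-run
    # timing, and densities are arbitrary; the real crossover depends on shape,
    # hardware, and the BLAS backend.

    import time
    import numpy as np
    import scipy.sparse as sp

    n = 4096
    dense_a = np.random.rand(n, n).astype(np.float32)
    x = np.random.rand(n, 1).astype(np.float32)

    t0 = time.perf_counter()
    _ = dense_a @ x
    dense_time = time.perf_counter() - t0

    for density in (0.50, 0.20, 0.05, 0.01):
        sparse_a = sp.random(n, n, density=density, format="csr", dtype=np.float32)
        t0 = time.perf_counter()
        _ = sparse_a @ x
        sparse_time = time.perf_counter() - t0
        print(f"density {density:4.0%}: sparse matvec takes {sparse_time / dense_time:5.2f}x the dense time")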


It seems like a ton of engineering effort has been put into these neural network frameworks. How have they not explored sparsity yet? With numerical linear algebra that's, like, the most obvious thing to do (which is to say, you probably know beforehand if your problem can be mapped to sparse matrices).

(Edit: just to be clear here, I’m not saying I expect the whole field is full of dummies who missed something obvious or something like that, I don’t know much at all about machine learning so I’m sure I’m missing something).


It's not at all strange to get something to work before you start optimizing. I mean if you can only run small models then how would you even know what you're losing by optimizing for space? Heck you wouldn't even know how the model behaves, so you won't know where to start shaving away.

I'm not saying it's impossible but if resources allow it makes a lot of sense to start with the biggest model you can still train. Especially since for whatever reason things seem to get a lot easier if you simply throw more computing power at it (kind of like how no matter how advanced your caching algorithm it's not going to be more than 2 times faster than the simplest LRU algorithm with double the amount of cache).


From what I’ve read recently, most sparse methods just haven’t given that much improvement yet, and we’re only recently pushing up against the limits of the “just buy more RAM” approach.

It sounds like there is a lot of work happening on sparse networks now, so it’ll be interesting to see how this changes in the near future.


GPUs were built for dense math and they ran with it, to the point that the current best architectures are in part just the ones that run best using the subset of linear algebra GPUs are really good at.

There has been a lot of work on sparsity and on discovering sparse subnetworks in trained dense networks. Intel has even proposed some alternative CPU-friendly architectures, and torch/tf and GPUs are starting to do okay with sparse matrices, so things are changing.


Sounds a bit like premature optimization (to have done it by now); I bet it's in the works now though.


"Do things that don't scale" – PG


llama-65b on a 4-bit quantize sizes down to about 39GB - you can run that on a 48GB A6000 (~$4.5K) or on 2 x 24GB 3090s (~$1500 used). llama-30b (33b really but who's counting) quantizes down to 19GB (17GB w/ some optimization), so that'll fit comfortably on a 24GB GPU.

A 4-bit quantize of a 1000B model should be <600GB, so would fit on a regular 8x80 DGX system.


I wonder if an outdated chip architecture with just a lot of (64GB?) GDDR4 or something would work? Recycle all the previous-generation cards into super-high-VRAM units.


I've seen reports of people being able to run LLMs at decent speeds on old Nvidia P40s. These are 24GB Pascal GPUs and can be bought for as low as $100 (although more commonly $200) on eBay.


Link to report please



You can do it on CPU now.

Benchmarks: https://github.com/ggerganov/llama.cpp/issues/34


It should work. After the initial prompt processing, token generation is typically limited by memory bandwidth more than by compute.
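
A rough way to see the ceiling (my own numbers, not from the thread): each generated token has to stream essentially all the weights through memory once, so bandwidth divided by model size gives an upper bound on tokens per second.

    def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
        # ceiling only; real throughput is lower due to compute, cache misses, etc.
        return bandwidth_gb_s / model_size_gb

    print(max_tokens_per_sec(400, 20))   # M1 Max (~400 GB/s) on a 4-bit 33B (~20 GB): ~20 tok/s ceiling
    print(max_tokens_per_sec(936, 20))   # RTX 3090 (~936 GB/s), same model: ~47 tok/s ceiling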


Intel Larrabee has entered the chat, but showed up about a decade and a half too early.


For completeness, Apple's consumer GPUs currently max out at 64 GB minus OS overhead, so about 56 GB. But you are limited to 1 GPU per system.


Benchmarks for what you can do on CPU alone.

https://github.com/ggerganov/llama.cpp/issues/34

An M1 Max does 100ms per token. A 64 core threadripper about 33ms per token.


I was in Japan recently and they sell these pocket size translator devices with a microphone, camera and screen. You can speak to it or take pictures and it will translate on the fly. Maybe $100 usd range for a nice one.

It's only a matter of time before someone makes a similar device with a decent LLM on it, and premium ones will have more memory/cpu power.


I mean...isn't that just a smartphone?

I know exactly what you're talking about because my father-in-law had the same thing. I'm just very skeptical that specialist hardware will overtake general commoditized computing devices for mass-market usage. The economics alone make it unlikely.


Even though Word Lens was first released over 12 years ago, I keep surprising people[0] by showing them the same technology built into Google Translate.

[0] even other tech developers who, like me, migrated somewhere where a language barrier came up


Yes, that is because Google bought the technology from Word Lens in 2015.

Wiki says: https://en.wikipedia.org/wiki/Ot%C3%A1vio_Good

    To develop Word Lens, Otávio Good founded Quest Visual Inc., which was acquired by Google, Inc. in 2014, leading to the incorporation of the Word Lens feature into the Google Translate app in 2015.


I think you misunderstand.

The people I show it to are surprised that it's possible in 2023, even though it was demoed at the end of 2010.

Not only have they never heard of Word Lens, they are also oblivious to the corresponding feature of Google Translate.


Yes, usually they are locked-down Android devices.

One of them has an unlimited "free forever" internet subscription so it can fetch network translations online if the local dictionary doesn't have the word.


I think we as humans have a tendency to extrapolate from our present position to a position we can imagine that we’d like, even if there isn’t a foreseeable path from here to there. I believe this may end up being one of those cases.


Why? What GP describes seems both feasible and inevitable to me.


It's a win for Google that LLMs are getting cheaper to run. OpenAI's service is too expensive to be ad-funded. Google needs a technology that's cheaper to provide to maintain their ad-supported business model.


Google could make a bet like they did with YouTube.

At the time, operating YouTube was eye wateringly expensive and lost billions. But google could see where things were going: a triple trend of falling storage costs, falling bandwidth and transmission costs (I’m trying to dig up a link I read years ago about this but google search has gotten so shit that I can’t find it).

It was similar for ASIC miners for Bitcoin. Given enough demand, specialised, lower-cost hardware built specifically for LLMs will emerge.


On the flip side, I found only one person (I'm sure there are more) who is attacking the software efficiency side of things. You would be quite surprised how inefficient the current LLM software stack is, as I learned on a C++ podcast [0]. Ashot Vardanian has a great GitHub repo [1] that demonstrates many ways compute can come way down in complexity and thus cost.

[0] https://cppcast.com/ai_infrastructure/ [1] https://github.com/orgs/unum-cloud/repositories?type=all


You realize that most reports are that YouTube is barely profitable.


You can run it (quantized at least) on a $4000 Mac thanks to Apple's unified memory. Surely other manufacturers are looking at how to expand VRAM, hopefully Intel or AMD.


Not to mention Apple chips have a bunch of very nice accelerators and also (!!!) macOS contains system frameworks that actually use them.


The article talks about this explicitly though. Reasonably good models are running on raspberry Pis now.


Is a reasonably good model what people get value out of though?

Maybe this is why Sam Altman talked about "the end of the large LLMs is here"? He understands anything bigger than ChatGPT-4 isn't viable to run at scale and be profitable?


Does this mean models larger than ChatGPT would still be better for the same data size, as long as someone is ready to pay?

At what limit does it stop getting better?


I thought he was fairly explicit that he thought larger models would provide incremental gains for exponentially greater cost, so yeah, I guess not profitable is a way to put it...


Or you can apply GPU optimizations for such ML workloads. By optimizing the way these models run on GPUs, you can significantly improve efficiency and slash costs by a factor of 10 or even more. These techniques include kernel fusion, memory access optimization, and efficient use of GPU resources, which can lead to substantial improvements in both training and inference speed. This allows AI models to run on more affordable hardware and still deliver exceptional performance. For example, LLMs running on an A100 can also run on 3090s with no change in accuracy and comparable inference latency.
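
As a concrete (if small) example of the kernel-fusion idea, here is a minimal sketch assuming PyTorch 2.x and a CUDA GPU; torch.compile fuses elementwise kernels and cuts launch overhead without changing the numerics:

    import torch

    layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8).cuda().half()
    compiled = torch.compile(layer)                  # same outputs, fused/optimized kernels

    x = torch.randn(128, 8, 512, device="cuda", dtype=torch.half)   # (seq, batch, d_model)
    with torch.inference_mode():
        y = compiled(x)     # first call compiles; later calls run the optimized graph

Bigger wins (fused attention kernels, paged KV caches, and so on) take more work, but the principle is the same.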


You're right, and this is why they didn't heavily use BERT (in the full sense), arguably the game-changing NLP model of the 2010s. They couldn't justify bringing the cost per search up.


nah, Lora quantized LLM’s are going to be at the OS level in 2 years and consumer architecture refreshes are just going to extend more RAM to already existing dedicated chips like Neural Engine

client side tokens per second will be through the roof and the models will be smaller


LoRA is not a quantization method, it's a fine-tuning method.


It's two adjectives on the noun.


you read that whole paragraph and assumed this prediction didn't involve consumers using fine tuned models despite lora being explicitly mentioned?

-EQ moment

edit: its about the combination of those methods making models accessible on consumer hardware


Within a decade mid-level consumer cards will be just as powerful as those $40k cards.


Considering how long it took mid level consumer cards to beat my $600 1080, you're way more optimistic than I am.


So... Four years? The 1080 launched in 2016, and the 3070 launched in 2020, for $100 cheaper — the launch price of the 1080 was $699, and the 3070 was $599. The 3070 easily trounced the 1080 in benchmarks.

The 3060 effectively matched a 1080 at $329 in 2021 (and has 50% more VRAM at 12GB instead of the 1080's 8GB), so call it five years if the 3070 isn't mid-range enough.

The 3060 Ti launched in 2022 at $399 and handily beat the 1080 on benchmarks, so call it six years if you want the midrange card to beat (not just match) the previous top-of-the-line card, and if a *70 card doesn't count as midrange enough. Less than a decade still seems like a reasonable claim for a midrange card to beat a top-of-the-line card.


The 3060 was only readily available quite recently, so it's about 6 years from ready availability of the 1080 at $600 to 3060.

Taking 6 years to double the perf/$ implies that it would take ~42 years for a $40000 H100 to reach mid-range levels. Assuming scaling, particularly VRAM, holds.

Plus, it would be getting really close to the Landauer limit by that point.
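
For reference, a rough version of that arithmetic (using ~$330 for the midrange card, which is my assumption):

    import math

    price_ratio = 40_000 / 330            # H100 vs. a 3060-class midrange card
    doublings = math.log2(price_ratio)    # ≈ 6.9 perf/$ doublings needed
    print(doublings * 6)                  # at ~6 years per doubling -> roughly 41-42 years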


are you conveniently forgetting how none of those cards were actually available for consumers to buy


The 1080 was impossible to buy at launch too and was sold out for months. And the 3060 is easy to buy!


The 3060 is easy to buy NOW, since cryptomining has crashed, and the 40x0 GPUs have become available (though mostly still above MSRP).


The main moat here is VRAM, not raw compute power.


Given that Nvidia has almost no competition, it just seems unlikely that they'll decide to stop milking the enterprise market; they will continue to lock 40GB+ cards behind ludicrous price points.


Nvidia may not need to price compete immediately, but if LLMs drive demand for more capable consumer hardware they can either:

(1) make a whole lot of money fulfilling that demand, or

(2) leave an unmet demand that makes it more attractive for someone else to spend the money to field a solution.

(1) seems an attractive choice even if it wasn’t for the potential threat of (2).


It wasn't ML but gaming that drove the demand for GPUs, and ML sort of rode in the slipstream (same for crypto). Later, demand for crypto hashing drove GPU sales as much as gaming did, but that is now over. So unless either (1) ML by itself can present a demand as large as crypto or gaming, or (2) crypto or gaming can provide a similar demand as they've done in the past, the economies of scale that drove this will likely not be reached again. If they do, however, the cost of compute will come down drastically, and that in turn may well drive another round of advances in models for the larger players.


>but if LLMs drive demand for more capable consumer hardware they can either:

That's a big if. I imagine people who will buy GPUs to run LLMs locally are either researchers or programmers. I imagine most consumer-focused solutions will be cloud-first. That is a much smaller market than gamers, and Nvidia wouldn't want to cannibalize their datacenter offerings by releasing something cheaper. It's far better for them to sell high-tier GPUs to Amazon and let them "rent" them out to researchers and programmers.


Would google even care about integrating LLMs into search? They don’t even prune all the spam entries, presumably because they increase advertising revenue and analytics profit.


We used to get only a limited number of Internet hours. By the time December 2003 rolled around, my family had always-on internet.

Besides, what Google does here is beside the point, because Bing has already unleashed AI search. Google will either follow along or stop being relevant.


This cost argument is being overblown. While it's a limitation for today's products, engineers are very good at optimization. Therefore the costs will drop in the medium to long term thanks to efforts on both the software and hardware sides.


Is the assumption that GPU power and advancements in AI will not get to a reasonable price point in the near future? Because it seems to me that advances in computation have not slowed down at all since it started.


The thing you have in your pocket would have meant an enormous investment for equivalent compute power just decades ago and filled a whole basement with server racks.


The legendary Cray-2 was the fastest supercomputer in the world in 1985, with peak 1.9 GFLOPS. Less than four decades ago.

By comparison, the Cray is outperformed by my smartphone.

Actually, it is outperformed by my previous smartphone, which I purchased in 2016 and replaced in 2018.

Actually, it is outperformed by a single core on my previous smartphone, of which it has eight cores.


Or you can quantize the model and run it on your laptop.


caching + simpler models for classification / triage should reduce the load on the big model.
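
A minimal sketch of that idea, with toy stand-ins for the models (classify_small, generate_small and generate_big are hypothetical placeholders, not a real API):

    import functools

    def classify_small(q):  return "faq" if "?" in q and len(q) < 80 else "complex"   # toy triage model
    def generate_small(q):  return f"[small model answer to: {q}]"                    # toy cheap model
    def generate_big(q):    return f"[big model answer to: {q}]"                      # toy expensive model

    @functools.lru_cache(maxsize=10_000)            # identical queries never hit a model twice
    def answer(query: str) -> str:
        if classify_small(query) in ("faq", "chitchat"):
            return generate_small(query)            # cheap path for easy queries
        return generate_big(query)                  # only hard queries reach the expensive model

    print(answer("What are your opening hours?"))
    print(answer("What are your opening hours?"))   # second call is served from the cache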


Yeah, today.


> It's going to be seamlessly integrated into every-day software.

I...kinda don't want this? UIs have already changed in so many different fits, starts, waves, and cycles. I used to have skills. But I have no skills now. Nothing works like it used to. Yeah they were tricky to use but I cannot imagine that a murky AI interface is going to be any easier to use, and certainly impossible to master.

Even if it is easier to use, I am not sure I want that either. I don't know where the buttons are. I don't know what I can do and what I can't. And it won't stay the same, dodging my feckless attempts to commit to memory how it works and get better at it...?


I once volunteered with an older woman. She'd been a computer programmer in the 70s, using punch cards and, later, Pascal.

Then she had kids and stopped working for a while and the technology moved on without her. Now she’s like any other old person, doesn’t know how to use a computer and gets flustered when eg: trying to switch from the browser back to Word. Her kids and grandkids clown on her for being hopeless with computers.

I asked her what it was she found difficult about modern computers compared to what she worked with 50 years ago. She said it’s the multitasking. The computers she had worked with just did one thing at once.


Interesting story. Thanks for sharing this. I have a somewhat similar story with my failure to transition from platformers / sidescrollers to 3D games. I just couldn't do it.


Me too! My brain just won’t let me immerse.

It was when sonic went 3D that it all began for me.

Wait…he’s running away from me?


Sonic 3d was a bad game. Did you play Super Mario 64?


Indeed. I like the fact that my stove has only the knobs and buttons on it (other than a 7-segment LED display). I am master of my stove because I am pretty sure I have explored the complete state space of it by now.


I worked with somebody who developed MULTICS but struggled constantly to do even the most basic tasks on a Mac even after using Macs for a decade. It was painful to watch them slowly move a mouse across the screen to the apple, take about ten seconds to click it, and then get confused about how to see system info.


It was a sad day when I realized I was systematically overinvesting in skills on churning technology and that my investments would never amortize. Suddenly my parents' stubborn unwillingness to bother learning anything technological made complete sense and I had to adjust my own patience threshold sharply downwards.


There are some software tools where the investment pays back, and has been over decades. Microsoft Office (in part because it's not reinventing itself, but rather accrues new features; in part because everyone else copies its UI patterns). Photoshop. Emacs.

With modern software, I find that there isn't much to learn at all - in the past decade, it seems to only be removing features and interaction modes, never adding anything new.

Still, I don't regret having learned so much all those years ago. It gives me an idea what the software could do. What it was supposed to do. This means I often think of multi-step solutions for problems most people around me can't solve unless there's a dedicated SaaS for it. As frustrating as it often is to not be able to do something you could 10 years ago, sometimes I discover that some of the more advanced features still remain in modern toy-like software.


Office UI was reinvented at least one time. I remember when that god awful ribbon showed up.

"In 2003, I was given the chance to lead the redesign of the most well-known suite of productivity software in the world: Microsoft Office.

"Every day, over a billion people depended on apps like Word, Excel, PowerPoint, and Outlook, so it was a daunting task. This was the first redesign in the history of Office, and the work that we did ended up shaping the standard productivity experience for the next two decades."

...

https://jensenharris.com/home/office

https://jensenharris.com/home/ribbon


That's not what I mean. The ribbon, whatever you think of it, only moved some functionality around. It didn't actually change the way old functionality worked. All the things you knew how to do, you could still do - you only had to learn their new placement.


If we consider the Word 2.0 (for Windows) era as the beginning of a graphical Microsoft Office suite, then graphical Office has had the ribbon for as long as it didn’t: 16 years.

I’m still waiting for people to stop complaining about it.


That move was awful


For you.


The irony is that LLMs are actually the real solution that Ribbon took an awkward half step towards -- how to quickly get to what a user actually wants to do.

Originally: Taxonomically organized nested menus, culminating at a specific function or option screen

Now: Usage-optimized Ribbon (aka Huffman coding for the set of all options), culminating at a specific function or option screen

Future: LLM powered semantic search across all options and actions, generating the exact change you want to implement

Why have an "email signature" options page at all, when an LLM can stitch together the calls required to change it, invoked directly from English text?


Good local search gets you 80% of the way there. 20 years ago, this was an inspiring UX trend (Quicksilver / Subject Verb Object), but it fizzled. Apple kept the torch lit with menu search and it has been brilliant, but limited to their platform, although I am pleased to see that MS Office got menu search in Oct 2022. Hopefully they don't lose interest like they did for desktop search.

LLMs could certainly help loosen the input requirements, not to mention aim some sorely needed hype back in this direction. I am afraid that they will murder the latency and compute requirements, but hey, progress is always two steps forward one step back.


I was never terribly impressed with local search on os x but maybe I didn't use it enough.

For a while, ubuntu had a local search where you could hit a button (super?) start typing, and it would drill through menus at lightning speed


> Why have an "email signature" options page at all, when an LLM can stitch together the calls required to change it, invoked directly from English text?

10-20 years from now? Maybe. It depends on whether or not the industry will cut corners in this part of the experience. I find it hard to predict which features get polished, and which are forever left barely-functioning. Might depend on which are on "critical path" to selling subscriptions.

0-10 years from now? We'll still need the options page.

"Email signature" options page provides visibility, consistency, and predictability. That is, you can see what your e-mail signatures are, you are certain those are the ones that will be used for your message, under conditions defined in the UI pane, and if you change something, you know exactly how it will affect your future e-mails.

LLMs are quite good at handling input, though not yet reliable enough. However, as GUI replacement, they are ill-suited for providing output - they would be describing what the result is, whereas the GUI of today display the result directly. As the writers' adage goes, "show, don't tell".

(That said, the adage is way overused in actual storytelling.)


In this case, the LLM isn't generating the output. It's only generating the sequence of actions to implement the user input.

Think less "make a signature for me" and more "here's the signature I want to use, make it my default."

Then the model would map to either Outlook / Options / Signature / fields or directly to whateverSetSignature().

From that more modest routing requirement, it seems a slam dunk for even current models (retrained to generate options paths / function calls rather than English text).
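
A minimal sketch of that routing idea, under my own assumptions (fake_llm stands in for whatever completion API you'd actually call, and the action names are hypothetical):

    import json

    ACTIONS = {"set_default_signature", "set_font"}

    PROMPT = ('Map the user request to one JSON action from {actions}. '
              'Reply with only JSON like {{"action": ..., "args": {{...}}}}.\nUser: {request}')

    def fake_llm(prompt: str) -> str:               # stand-in for a real model call
        return '{"action": "set_default_signature", "args": {"signature": "-- Jane"}}'

    def route(request: str, llm=fake_llm) -> dict:
        raw = llm(PROMPT.format(actions=sorted(ACTIONS), request=request))
        action = json.loads(raw)
        if action["action"] not in ACTIONS:         # never execute anything off the whitelist
            raise ValueError(action["action"])
        return action

    print(route("Here's the signature I want to use, make it my default: -- Jane"))

The app then executes the chosen action itself, so the model only ever picks from a whitelist; it never writes to settings directly.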


You missed Office 2000: magically disappearing menu items. It was pretty confusing.


Oof. I did. Ribbon felt enough like magically disappearing menu items to me.


I find focusing on fundamental tools and concepts like terminals and text mode editors like vi and emacs will pay off handsomely.

All the fancy dialogs will switch around every few years.

This mindset extends to stuff like Word. Whenever you do something think hard about the essence of what you’re doing. Realize this should have been a script, but due to constraints in reality you are forced to use some wanky GUI.

If you look at it like this, you won’t care the pixels move around. Your mental model will be solid and building that is 90% of the work.


Maybe my time will come some day (I'm 32 years old), but I always tell myself that learning how to learn and being interested in new/different technology is how I keep myself updated. The latter is probably difficult to maintain, but this whole AI thing has given me a new pastime hobby I could never imagine.

Maybe I'll reject instead of embrace the next big thing once I'm old enough?


i'm using a web browser broadly similar to mosaic (01994) on a site that works similarly to early reddit (02005). in another window i'm running irssi (01999) to chat on irc (01988, but i've only been using it since 01994) inside screen (01987, but i've only been using it since 01994), to which i'm connected with ssh (01996) and mosh (02011, but i didn't start using it until last month). in my unix shell window (mate-terminal, but a thin wrapper around libvte (02002), mostly emulating a vt100 from 01978) i'm running a bourne shell (01979, but really it's brian fox's better reimplementation which he started in 01989) in which i just ran yt-dlp (which has to be updated every couple of months to keep working, but mostly has the same command-line flags as youtube-dl, first released in 02006) to download a video encoded in h.264 (02003) in an mpeg-4 container (01998), and then play it with mpv (02013, but forked from and sharing command-line flags with mplayer (02000)). mpv displays the video with a gpu renderer (new) on a display server running x11 (01987).

in another browser tab i'm running jupyter (the renamed ipython notebook from 02011) to graph some dsp calculations with matplotlib (02003, but mostly providing the plotting functions from matlab (01984)) which i made with python (01991, but i've only been using it since 02000) and numpy (02006, but a mostly compatible reimplementation of numeric from 01995, which i've been using since 02003). in jupyter i can format my equations in latex (01984, but for equations basically the same as plain tex82 from 01982, but i've only been using them since 01999) and my text in markdown (02004, though jupyter's implementation supports many extensions). i keep the jupyter notebook in git (02005, but i've only been using it since 02009, when i switched from cvs (01986, but i've only been using it since about 01998)). the dsp stuff is largely applications of the convolution theorem (01822 or 01912) and hogenauer filters (01981).

i do most of my programming that isn't in jupyter in gnu emacs (01984, but i didn't start using it until 01994) except that i prefer to do java stuff in intellij idea, which i first used in 02006

earlier this year, my wife and i wrote our wedding invitation rsvp site in html (01990, but using a lot of stuff added up to 02000) and css2 (01998) plus a few things like corner-radius (dunno, 02006?) and a little bit of js (01995, but in practical terms 02004), with the backend done in django (02005) backed by sqlite (02000, but this was sqlite3 (02004), but sqlite mostly just implements sql (01974, first publicly available in 01979, but mostly the 01992 ansi standard) which in turn mostly just implements codd's relational data model (01970) and acid transactions (01983), all of which i've been using since 01996). and of course python, emacs, and git. most of the css debugging was done with chromium's web inspector (present from the first chrome release in 02008, a clone of firebug (02006)).

for looking up these dates just now, i used google's newish structured search results, which mostly pull information from wikipedia (02001); i also used stack overflow (02008) and its related sites.

the median of the years above is 01998, with 25% and 75% quartiles of 01987 and 02004, which i calculated using r (01997, but a reimplementation of s (01976)). if we assume that each new introduction listed above displaced some existing skill, then we can vaguely handwave at a half-life of about 25 years for these churning technology skills, which to me seems like enough time for a lot of them to amortize; but it seems like it's slowing down a lot, because the 25% quartile is in 01987 and not 01973

it's true that all the time i spent configuring twm, olvwm, fvwm, and afterstep, and working around bugs in netscape 4's javascript implementation, and maintaining csh scripts and informix ace database applications, and logging into anonymous ftp sites running tops-20, and debugging irq conflicts, isn't really serving me today. but you could kind of tell that those things weren't the future. the surprising thing is really how slowly things change: that we're still running variants of the bourne shell in emulators of the vt100

other still-relevant technological skills for me today include building a fire, qwerty typing, ansi c (my wife is taking a class), bittorrent, operating a gas stove, and making instant coffee. still beyond me, though, is how to turn this tv on


If it is seamlessly integrated, the AI won't even surface in a UI. You will just be presented with different options in the UI, which theoretically would be more precisely curated by the AI that you don't even see.


That runs counter to some very well established UI principles. People get confused when their interface changes except as a result of direct interaction. Open up a menu in response to a click, yes; reorganize menus to "optimize" them based on what a model predicts a person is going to do, no.

The killer is being able to tell a program what you want it to do, then not having to fuddle with buttons or menus at all (unless you want to tweak things).


While I agree in principle, AI-infusion in UIs does not need to break convention. For example, AI suggestions could subtly highlight menu options that would be most useful in an auto-detected workflow. We could also create a persistent area on the UI for the AI to "speak up".


An AI interface in Office brings back memories of Clippy.


We may not have seen the last of Clippy yet… https://gwern.net/fiction/clippy



Now imagine Clippy on a car touchscreen.


NIO sells cars with a tiny cute robot that moves on the dashboard, named NOMI.

A lot of people hate it as it’s closer to Google assistant than GPT4 and it makes mechanical noise when it rotates but I don’t think it’s a terrible idea.

Anthropomorphism, the attribution of human traits to things, is common with cars.

I would like my car to run a LLM instead of being so stupid at understanding my voice commands.


shell is the same


I think Satya Nadella put it pretty well in an interview: ad revenue, especially from search, is incremental to Microsoft; to Google, it's everything. So while Microsoft is willing to have worse margins on search ads in order to win marketshare from Google, Google has to defend all of their margins, or else they become significantly less profitable in their core business. LLMs cost a lot more than traditional search, and Google can't just drop-in replace its existing product lines with LLMs: that eats into their bottom line, literally. Microsoft is willing to swap out the existing Bing for the "new Bing" based on OpenAI's technology, because they make comparatively little money on search, and winning marketshare will more than make up for having smaller margins on that marketshare. Google is, IMO, between a rock and a hard place on this one: either they dramatically increase their cost of revenue to defend marketshare, or they risk losing marketshare to Microsoft in their core business.

Meanwhile, OpenAI gets paid by MS. Not that MS minds! They own a 49% stake in OpenAI, so what's good for OpenAI is what's good for MS.

If Google had decades to figure it out, I think your analysis might be right — although I'm not certain that it is, since I'm not certain that the calculus of "free product, for ad revenue" makes as much sense when the products are much more expensive to run than they were previously. But even if it's correct in the long run, if Google starts slipping now it turns into a death spiral: their share prices slip, meaning the cost of compensation for key employees goes up, meaning they lose critical people (or cut even further into their bottom line, hurting their shares more, until they're forced to make staffing cuts), and they fall even further behind. Just as Google once ate Yahoo! via PageRank, it could get eaten by a disruptive technology like LLMs in the future.


"OpenAI gets paid by MS"

Actually, MS gets paid by OpenAI at 70% of profits until they make back their investment (according to articles on the terms)


Note that MS cost-offsets much of OpenAI's infrastructure, including their top-5 TOP500-class supercomputer (similar to a full TPUv4 pod).


To be fair, the open source model has been what's been working for the last few decades. The concern with LLMs was that open source (and academia) couldn't do what the big companies are doing because they couldn't get access to enough computing resources. The article is arguing (and I guess open source ML groups are showing) that you don't need those computing resources to pave the way. It's still an open question whether OpenAI or the other big companies can find a moat in AI via some model, dataset, computing resources, whatever. But then you could ask that question about any field.


But none of the "open source" AI models are open source in the classic sense. They are free but they aren't the source code; they are closer to a freely distributable compiled binary where the compiler and the original input hasn't been released. A true open source AI model would need to specify the training data and the code to go from the training data to the model. Certainly it would be very expensive for someone else to take this information, build the model again, and verify that the same result is obtained, and maybe we don't really need that. But if we don't have it, then I think we need some other term than "open source" to describe these things. You can get it, you can share it, but you don't know what's in it.


RWKV does: https://github.com/BlinkDL/RWKV-LM It uses „the Pile“: https://pile.eleuther.ai/ And I’ve seen some more in the last weeks.


Good to hear. Let's reserve "open source" for cases like that.


Keep an eye on the RedPajama project for a model where the training data and code should both be freely available: https://simonwillison.net/tags/redpajama/


I agree with you to the extent that, yeah, technically it's not open source because the data is not known. But for these foundation models like Llama, the model structure is obviously known, and pretty sure (didn't check) the hyperparameters used to train the model are known; the remaining unknown, the data, is pretty much the same for all foundation models (CommonCrawl etc.). So replicating Llama once you know all that is a mechanical step, and so it isn't really closed source in a sense. Though probably some new term, open-something, is more appropriate.

The real sauce is the data you fine tune these foundation models on, so RLHF, specific proprietary data for your subfield, etc. The model definition, basically Transformer architecture and a bunch of tricks to get it to scale are mostly all published material, hyper parameters to train the model are less accessible but also part of published literature; then the data and (probably) niche field you apply it to becomes the key. Gonna be fun times!


These "open source" AI models are more like obtainable models. You can obtain them. The source is not open, so "open source" doesn't really fit. Somewhere along the way, open-source got lumped in with free or accessible. Obtainable makes sense to me.


That makes sense. But I would argue that smaller/cheaper models are not a threat to Google, they are a solution. They will still have the reach advantage and can more cheaply integrate small/low-cost models at every touch point.


Disagree. What you have in mind is already how the masses interact with AI. There is little value-add for making machine translation, auto-correct and video recommendations better.

I can think of a myriad of use-cases for AI that involve custom-tuning foundation models to user-specific environments. Think of an app that can detect bad dog behavior, or an app that gives you pointers on your golf swing. The moat for AI is going to be around building user-friendly tools for fine-tuning models to domain-specific applications, and getting users to spend enough time fine-tuning those tools to where the switch-cost to another tool becomes too high.

When google complains that there is no moat, they're complaining that there is no moat big enough to sustain companies as large as Google.


Fine tuning isn't a thing for foundational models though, it's all about in context learning.


that means there's no money in making foundation models - the economics are broken.


Making video recs better translates to direct $$$

There’s a reason YT or TikTok recommendation is so revered


Honestly, I can't see Google failing here. Like other tech giants, they're sitting on a ridiculously large war chest. Worst case, they can wait for the space to settle a bit and spend a few billion to buy the market leader. If AI really is an existential threat to their business prospects, spending their reserves on this is a no-brainer.


> Honestly, I can't see Google failing here. Like other tech giants, they're sitting on a ridiculously large war chest. Worst case, they can wait for the space to settle a bit and spend a few billion to buy the market leader.

It seems incredibly likely that the FTC will block that. New leadership seems to be of the opinion that consumer harm is the wrong standard. Buying the competition with profits from a search monopoly leaves all parties impoverished.

Anyways, I don't think the risk is failure, but of non-success. The article claims meta won but it seems like nvidia is the winner: everyone uses their chipsets for training, fine tuning and inference. And the more entrants and niche applications show up the more demand there is for their product. TPUs theoretically play into this, but the "leak" doesn't mention them at all.


> The article claims meta won but it seems like nvidia is the winner: everyone uses their chipsets for training, fine tuning and inference. And the more entrants and niche applications show up the more demand there is for their product.

Like the saying goes: during a gold rush, sell shovels.


That was true for IBM in the 1970s and Microsoft in the 90s. Despite holding a royal flush, they managed to lose the game through a combination of arrogance, internal fighting, innovator's dilemma, concern over anti-trust, and bureaucratic inertia. It will be hard for Google to pull this off.


Microsoft aint doing so bad now


Microsoft is always doing bad. They've done bad for the life of the company. Microsoft has never 'done good'. They have always been a negative force in computing and society at large. This toxic culture comes from their founder, about which all the preceding also applies.


No idea whether this comment is satirical or not. As a reader I think that's marvelous (please don't break the suspense)


After Microsoft swapped out the CEO. The new guy is better than Ballmer.


They won't fail, they'll just provide compute infrastructure for people building AI products. Google is mostly bad at building products these days.


GCP is a product, too, but it’s not as good as either of the top two, that’s a low margin market, and a key theme in this article is that people have made model tuning less expensive.

There’s no path forward for Google which doesn’t involve firing a lot of managers and replacing them with people who think their income depends on being a lot better at building and especially maintaining products.


I don't think Google is ever going to get its mojo back; Pichai has no vision. Long term I think Google will see massive layoffs, and new products will come via Alphabet acquisitions following the YouTube model.


The threat isn’t that another company has AI, it’s that they don’t (yet) have a good way to sell ads with a chat bot. Buying the chat bot doesn’t change that.


What I mean is, if they can't figure out the ad angle and end up facing an existential threat, they have enough money to just drop their existing ad business almost entirely, and buy out the leading AI company to integrate as a replacement business model. It would be bloody (in the business sense, at least), but Google would likely survive such drastic move.


I think this won't work out: AI is so popular now because it's a destination. It's been rebranded as a cool thing to play with, that anyone can immediately see the potential in. That all collapses when it's integrated into Word or other "productivity" tools and it just becomes another annoying feature that gives you some irrelevant suggestions.

OpenAI has no moat, but at least they have first mover advantage on a cool product, and may be able to get some chumps (microsoft) to think this will translate into a lasting feature inside of office or bing.


It being everywhere worries me a lot. It outputs a lot of false information and the typical person doesn’t have the time or inclination to vet the output. Maybe this is a problem that will be solved. I’m not optimistic on that front.


The same can be said about the results that pop up on your favorite search engine, or about asking other people questions.

If anything, advances in AI & search tech will do a better job at providing citations that agree & disagree with the results given. But this can be a turtles-all-the-way-down problem.


There’s a real difference in scale and perceived authority: false search results already cause problems but many people have also been learning not to blindly trust the first hit and to check things like the site hosting it.

That’s not perfect but I think it’s a lot better than building things into Word will be. There’s almost no chance that people won’t trust suggestions there more than random web searches and the quality of the writing will make people more inclined to think it’s authoritative.

Consider what happened earlier this year when professor Tyler Cowen wrote an entire blog post on a fake citation. He certainly knows better but it’s so convenient to use the LLM emission rather than do more research…

https://www.thenation.com/article/culture/internet-archive-p...


No it won't, and random search popup results are already a massive societal problem (and they're not even used the way people are attempting to use AI - to make decisions over other people's lives in insurance, banking, law enforcement and other areas where abuse is common when unchecked).


Low-quality blogs etc. stand out as low quality; LLMs can eloquently state truths with convincing-sounding nonsense sprinkled throughout. It's a different problem, and many people already take low-quality propaganda at face value.


I think this is a failure in how we fine-tuned and evaluated them in RLHF.

"In theory, the human labeler can include all the context they know with each prompt to teach the model to use only the existing knowledge. However, this is impossible in practice." [1] This ends up forcing connections that aren't really there for the LLM. Extrapolate that across various subjects and types of queries, and there you go.

[1] https://huyenchip.com/2023/05/02/rlhf.html


> This so-called "competition" from open source is going to be free labor. Any winning idea ported into Google's products on short notice. Thanks open source!

How else, exactly, is open source supposed to work? Nobody wants to make their code GPL but everybody complains when companies use their code. I get that open source projects will like companies to contribute back, but shouldn't that go for everyone using this code? Like, I don't get what the proposed way of working is here.


Developers nowadays want to have their cake and eat it too. They want to develop FOSS code because capitalism is evil and proprietary software is immoral and Micro$oft is the devil, man, and so give their work away for free... but whenever a company makes money on it and gives nothing back, completely in line with the letter and spirit of FOSS (because requiring compensation would violate user freedom,) they also want to get paid.

Like the entire premise of FOSS is that money doesn't matter, only freedom matters. You're not supposed to care that Google made a billion dollars off your library as long as they keep it open.


I see this as part of the decline of hacker culture and rise of brogrammers. I see very few people programming for fun, everyone seems to be looking for a monetization opportunity for every breath they take.


For some strange reason (maybe moral failure?) people seem to have this insatiable addiction to food and shelter and most people have found no better way to support that addiction than to exchange labor for money.

The list of things I consider “fun” besides programming when I get off work is a mile long.


Then don't do open source work. You can't be donating your work under a permissive license and then complain that someone else used it. Make up your mind.

Edit: also please gtfo with your condescending tone. Everyone needs to eat and most people are working class. Don't act like you are the only one who has a unique experience of hunger and thirst.


The initial post I was replying to was:

>I see very few people programming for fun, everyone seems to be looking for a monetization opportunity for every breath they take.

So yes, thinking that most developers are going to do it for “fun” after working 40 hours a week is kind of naive.


All I said was that it used to happen more before and now it happens less. I never made any comments about how many people do it. And it's not even about that. The culture has gone from "earn to live" to "live to earn", and not just in programming.


I’ve been in this field professionally for over 25 years. There has never been a time where people weren’t interested in making the most money possible given their skillset and opportunity.

Or are you saying in some distant past that people did it for the love? I was a junior in high school when Linux was introduced and I was on Usenet by 1993 in the comp.lang.* groups.

The “culture” hasn’t changed - just the opportunities.


When linux was introduced in that group majority of the posts weren't asking how to make money from it. That's the difference. Try hanging out in langchain and openAI discords. You will see the difference.


Free as in freedom, but not free as in beer?


That actually favors corporations more. I'm a FOSS advocate today because cricket bats cost money but Ruby was free, so I learned that.


> It's going to be seamlessly integrated into every-day software. In Office/Google docs, at the operating system level (Android), in your graphics editor (Adobe), on major web platforms: search, image search, Youtube, the like

Agreed, but I don't think the products that'll gain market share from this wave of AI will be legacy web 2 apps; rather it'll be AI-native or AI-first apps that are built from the ground up to collect user data and fulfill user intent. Prime example is TikTok.


You summed up exactly my disappointment with some large companies' legacy AI offerings. They don't do both: iterate off of telemetry data and fulfill users' needs.


The problem is that LLMs are better at search (for an open-ended question) than Google is, and that's where most of Google's revenue comes from. So it actually gives a new company like OpenAI the opportunity to shift consumers' destinations away from Google.


> OpenAI faces the existential risk, not Google.

Yes, but the quickest way for anyone to get themselves to state-of-the-art is to buy OpenAI. Their existential risk is whether they continue to be (semi)independent, not whether they shutdown or not. Presumably Microsoft is the obvious acquirer, but there must be a bunch of others who could also be in the running.


But if you wait a month you can get that model for free...


Where?


It's in reference to the article's open source free laborers.

https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...


This is 100% correct - products evolve to become features. Not sure OpenAI faces the existential risk as MS need them to compete with Google in this space.


> Not sure OpenAI faces the existential risk as MS need them to compete with Google in this space.

I think OP is arguing that in that partnership Microsoft holds the power, as they have the existing platforms. The linked article argues that AI technology itself is not as much of a moat as previously thought, and the argument therefore is that Microsoft likely doesn't need OpenAI in the long term.


"Any winning idea ported into Google's products on short notice."

Imagine for a moment, in a different universe, in a different galaxy, another planet is ostensibly a mirror image of Earth, evolving along the same trajectory. However on this hypothetical planet, anything is possible. This has resulted in some interesting differences.

The No Google License

Neither Google, its subsidiaries, business partners nor its academic collaborators may use this software. Under no circumstance may this software be directly or indirectly used to further Google's business or other objectives.

If 100s or 1000s or more people on planet X started adopting this license for their open source projects, then of course it won't stop Google from copying them or even using the code as is. But it would muddy the waters with 100s or 1000s or more potential lawsuits. Why would any company risk it.

There is nothing stopping anyone writing software for which they have no intention of charging license fees. It's done all the time these days. There is also nothing stopping anyone from prohibiting certain companies from using it, or prohibiting certain uses.

I recall in the early days of the web when "shareware" licenses often tried to distinguish commercial from non-commercial use. Commercial use would presumably incur higher fees. Non-commercial use was either free or low cost. I always wondered, "How is the author going to discover if XYZ, LLC is using his software?" (This is before telemetry was common.) The license seemed unworkable, but that did not stop me from using the software. I was never afraid that I would be mistaken for a commercial user and the author would come knocking asking me to agree to a commercial license. I doubt I was the only one bold enough to use software with licenses prohibiting commercial use.

Even a "No Microsoft License" would make GitHub more interesting. One could pick some random usage: Microsoft may not use this software for X. Would this make MSFT's plans more complicated? Try it and see what happens. Only way to know for sure.

Instead, MSFT is currently trying to out the plaintiffs in the Doe v GitHub case, over MSFT's usage of the code of other people who put their stuff on GitHub, and as the Court gets ready to decide the issue, it's becoming clear IMO that if these individuals are named, these brave individuals will lose their jobs and be blackballed from ever working in software again.

The No Internet Advertising License

This software may not be used to create or support internet advertising services for commercial gain.


The No Google License functionally exists: it's the AGPL. https://opensource.google/documentation/reference/using/agpl...


The license that prevents use by a particular list of corporations can likely be easily crafted.

But because any particular invention about LLMs is not a specific product but an approach, it would just be re-implemented.

One could imagine patenting an approach, if it ends up being patentable, and then giving everyone but some excluded entities a grant of royalty-free use. But, unless the use if that particular approach is inevitably very obvious (which is really unlikely with ML models), you would have hard time detecting violations and especially enforcing your patent.


Let's call it the Underdog License. It must not be used by any of the top N tech companies in terms of market share and/or market capitalization.


Neither Alphabet, Google nor their successors, subsidiaries, affiliates, business partners, academic collaborators or parent companies may use this software; all of the foregoing are specifically prohibited from any use of this software.


I agree with your assertion that AI will seamlessly integrate into existing software and services but my expectation is that it will be unequivocally superior as a 3rd party integration[0]. People will get to know 'their AI' and vice versa. Why would I want Bard to recommend me a funny YouTube clip when my neutral assistant[1] has a far better understanding of my sense of humor? Bard can only ever learn from the context of interaction with google services -- something independent can pull from a larger variety of sources supersetting a locked system.

Never mind more specialized tools that don't have the resources to develop their own competent AI - Google might pull it off but Adobe won't, and millions of SaaS apps and small programs won't even try. As another example, how could an Adobe AI ever have a better interpretation of 'Paint a pretty sunset in the style of Picasso' than a model which can access my photos, wallpapers, location, vacations, etc.?

[0]Much how smart phones seamlessly integrate with automobiles via CarPlay and not GM-play. Once AI can use a mouse, if a person can integrate with a service an AI can do so on their behalf.

[1]Mind it's entirely possible it will be Apple or MSFT providing said 'neutral' AI.


> And we should not expect to be able to catch up. The modern internet runs on open source for a reason. Open source has some significant advantages that we cannot replicate.

I don't have faith in OpenAI as a company, but I have faith in Open-Source. What you're trying to say, if I understand correctly, is that Google will absorb the open-source and simply be back on top. But who will maintain this newly acquired status quo for Google? Google cannot EEE their own developer base. They said that much in the article;

> We cannot hope to both drive innovation and control it.

Take history as an example: Android did not kill *nix. Chrome did not kill Firefox. Google Docs has not killed OpenOffice. For the simple fact that Google needs all of these organizations to push Google forward, whether that means Google gets access to code, or Google becomes incentivized to improve in some way.

If Google wants to eat another free lunch tomorrow they have no choice but to leave some of that free labor standing, if not prop it up a little. The real question becomes, how much market share can we realistically expect without eating tomorrow's lunch?


They're not saying that they should absorb open-source. They're arguing towards a strategy/direction for how to approach AI models from a business perspective, laying down the facts that open-source has a superior positional advantage in terms of development costs.

Probably, internally Googlers are arguing that the "AI explosion" is short-lived and people will stop paying for AI as soon as open source PC models become cost- and quality-competitive. So they shouldn't chase the next big revenue stream that OpenAI is currently enjoying, because it's short-lived.


Everything you say is true, and Google has cards left to play, but this is absolutely an existential threat to Google. How could it be otherwise?

For the first time in a very long time, people are open to a new search/answers engine. The game they won must now be replayed, and because it was so dominant, Google has nowhere to go but downwards.


> OpenAI faces the existential risk, not Google. They'll catch up and will have the reach/subsidy advantage.

Don't Microsoft products get used more times in a day by more paying customers than Google products?

OpenAI won't have a problem because they reach more paying customers via Microsoft than Google can.


OpenAI=Microsoft for all intents and purposes.

Microsoft has a stake in OpenAI and has integrated into Azure, Bing and Microsoft 365.


Microsoft will probably eventually integrate it into their Xbox services, and possibly games via their own first-party studios.

In my opinion Microsoft has equal or more reach/subsidy advantage than Google for AI, at least toward general consumers.


Honestly, this makes me excited for future RPGs from bethesda. I've already seen mod demos where chatgpt does dialog for NPCs. Imagine a future elderscrolls or fallout where one could negotiate safe passage with the local raiders / bandits for a wheel of cheese or a 6 pack of nuka cola. A man can dream.


Stupid, silly me who knows little-to-nothing about the lore of OS. Why can't OS devs simply write out in the OS licensing that their wonderful work is usable by anyone and everybody unless you belong to Alphabet/Meta/Oracle/Adobe/Twitter/Microsoft or the other McCorps & their subsidiaries?

I imagine it comes down to ol' Googly & the boys taking advantage of the OS work -> OS devs backed by weak NFOs sue X corp. -> X corp. manages to delay the courts and carries on litigation so the bill is astronomical aka ain't nobody footing that -> ???

I imagine 90% end up taking some sort of $ and handover the goods like Chromium, though.

So back to square one, guess we kowtow and pray for us prey?


It already is built seamlessly into a lot of Google products.

OpenAI just beat Google to the cool chatbot demo.


Eventually Google will still lose to open models and AI chips.

Hardware performance is what’s making AI “work” now, not LLMs which are a cognitive model for humans not machines. LLMs are incompatible with the resources of a Pentium 3 era computer.

Managing electron state is just math. Human language meaning is relative to our experience, it does not exist elsewhere in physical reality. All the syntax and semantics we layered on was for us not the machines.

End users buy hardware, not software. Zuckerberg needs VR gadgets to sell because Meta is not Intel, Apple, AMD, nVidia.

The software industry is deluding itself if it does not see the massive contraction on the horizon.


It’s an obvious cycle.

“I’m idealistic!”

“I’m starting a moral company!”

“Oh dear this got big. I need investors and a board.”


OpenAI is more of a lab than a company though, no?

Aren’t they, in some sense, kind of like that lab division that invented the computer mouse? Or for that matter, any other laboratory that made significant breakthroughs but left the commercialization to others?

What you're describing would make sense to me. Only, we will probably be laughing from the future at how the extent of our current imagination with this stuff was still limited to GUIs, Excels and docs.


As running the models seems to be relatively cheap but making them is not I believe that's where the money is. That and generic cloud services because ultimately the majority will train and run their models in the cloud.

So, I would bet on AWS before OpenAI, and I would bet the times of freely available high-quality models will come to an end soon. Whether open source can keep up with that remains to be seen.


This is why I think Apple's direction of building a neural engine into the M1 architecture is low-key brilliant. It's just there and part of their API; as AI capabilities increase and the developer landscape solidifies, they can incrementally expand and improve its capabilities.

As always, Apple's focus is hardware-first, and I think it will once again pay off here.


As I understand it, the open source community is working to make models:

- usable by anyone

- feasible on your desktop

Thereby at least levelling the playing field for other developers.


If Openai can win the developer market with cheap api access and a better product, then distribution becomes through third parties with everyone else becoming the product sending training data back to the model. I'd see that as their current strategy.


There are two things that make a good LLM. The amount of data available for training, and the amount of compute available. Google's bard sucks in comparison to Open AI, and even compared to Bing. It's pretty clear that GPT4 has some secret sauce that's giving them a competitive edge.

I also don't think that Open Source LLMs are that big of a threat, for exactly this reason. They will always be behind on the amount of data and compute available to the 'big players'. Sure, AI will increasingly be incorporated into various software products, but those products will be calling out to big tech apis with the best model. There will be some demand for private LLMs trained on company data, but they will only be useful in narrow specialties.


Did you read the article? It refutes almost every claim you're making and, I must say, rather convincingly so.


I'll admit, I skimmed it. I went back and re-read it, and the timeline of events was especially shocking. I still think the big models hold an edge, simply because it will most likely be better at handling edge cases, but wow, my days of underestimating the oss llms are certainly coming to a middle


That's like saying in 1995 that search is going to be integrated into everything, not a destination. That would have been true, but also very wrong: Google.com ended up as the main destination.


Yes, AI is like social in that regard. You can add social features to any app, and the same applies to AI. But there are also social-centric sites/apps, and it will be the same for AI.


>They'll also find a way to integrate this in a way where you don't have to directly pay for the capability, as it's paid in other ways: ads.

I fear you are correct.


> They'll catch up and will have the reach/subsidy advantage.

This is only true if they're making progress faster than OpenAI. There isn't much evidence for that.


LLMs are just better ML models, which are just better statistical models. I agree that they're going to be in everything, but invisible and in the background.


This is exactly what free software advocates have been saying would happen (and has happened) without protections to make sure modifications get contributed back to free software projects.


Yes, bring back Clippy!!!


You must mean the new Albert Clippy!


Google makes almost all its money from search. These platforms are all there to reinforce its search monopoly. ChatGPT has obsoleted search. ChatGPT will do to Google search what the Internet did to public libraries - make them mostly irrelevant.


How has ChatGPT obsoleted search, when hallucination and the token limits are major problems?

It's (sort of) obviated search for certain kinds of queries engineers make, but not normies.

I say sort of, because IMO it's pretty bad at spitting out accurate (or even syntactically correct) code for any nontrivial problem. I have to give it lots of corrections, and often it will just invent new code that also is broken in some way.


Let's consider what Google did to the previous paradigm: libraries and books.

Books had editors and were expensive to publish, which imparted some automatic credibility. You might even have involved a librarian or other expert in your search. So a lot of the credibility problem was solved for you, up-front, once you got the information source.

Google changed the game. It gave you results instantly, from sources that it guessed looked reliable. But you still had to ascertain credibility yourself. And you might even look at two or three pages on the same topic, quickly.

Google has been mostly defeated now and often none of the links it suggests are any good. That trade-off seems to be done.

Here come LLMs. Now it's transferring even more of the work of assessing credibility to the end user. But the benefit is that you can get very tailored answers to your exact query; it's basically writing a web page just for you in real time.

I think the applications that win in this new era will have to make that part of their business model. In science fiction, AIs were infallible oracles. In the real world it looks like they'll be tireless research assistants with an incredible breadth of book-learning to start from but little understanding of the real world. So you'll have a conversation as you both converge on the answer.


google wasn't the first search engine


True, but it was the first great one. I remember struggling with AltaVista until Google blew everything out of the water.


I've replaced almost all my usage of Google Search with ChatGPT. The only reasons I've had to use Google Search are to look up current news and do some fact checking. In my experience, GPT-4 rarely provides incorrect results. This includes things like asking for recipes, food recommendations, clarifying what food is safe to eat when pregnant, how to train my dog tricks, translating documents, explaining terminology from finance, understanding different kinds of whiskey, etc.


This was true for me too, but I'm starting to find the data's cutoff date a problem, and it gets worse every day. I was reminded about it yesterday when it knew nothing about the new programming language Mojo or recent voice conversion algorithms.

The eventual winner will have a model that stays up-to-date.


It's mentioned in the article, but LoRA or RAG will enable this.

Phind is getting awfully close to this point already, really. Integrating new knowledge isn't the bottleneck it was with expert systems; I think it just hasn't been a priority, for research and commercial reasons, till recently.
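For the curious, here's a minimal sketch of what the RAG half of that looks like, using the 2023-era openai Python client and numpy. The documents, model names, and prompt are placeholder assumptions for illustration, not anything Phind or the article actually does:

```python
# Minimal retrieval-augmented generation (RAG) sketch: embed fresh documents,
# retrieve the closest ones by cosine similarity, and stuff them into the prompt.
import numpy as np
import openai

openai.api_key = "sk-..."  # your key

docs = [
    "Mojo is a new programming language announced in May 2023.",
    "LoRA fine-tunes a model by training small low-rank adapter matrices.",
]

def embed(texts):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

doc_vecs = embed(docs)

def answer(question, k=1):
    q_vec = embed([question])[0]
    # cosine similarity against every stored document, keep the top-k
    sims = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = "\n".join(docs[i] for i in np.argsort(sims)[::-1][:k])
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("What is Mojo?"))
```

The model never gets retrained; the "up-to-date" part lives entirely in whatever you put into the document store.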


I asked ChatGPT to find Indian food in a tourist town. Googling verified that only one of its suggestions was a real place; the other four were hallucinations.

It's possible GPT-4 will be better; I haven't been able to test it because I remain on the waitlist.

I remain skeptical.


It's still bad.


Try this simple question with ChatGPT.

No need to verify the ages. You will immediately find the problem.

“List the presidents of the us in the order of their ages when they were first inaugurated”


I think you're underestimating product-market fit.

Normies don't care about the exact truth


The part of the post that resonates for me is that working with the open source community may allow a model to improve faster. And, whichever model improves faster, will win - if it can continue that pace of improvement.

The author talks about Koala but notes that ChatGPT is better. GPT-4 is then significantly better than GPT-3.5. If you've used all the models and can afford to spend money, you'd be insane to not use GPT-4 over all the other models.

Midjourney is more popular (from what I'm seeing) than Stable Diffusion at the moment because it's better at the moment. Midjourney is closed-source.

The point I'm wanting to make is that users will go to whoever has the best model. So, the winning strategy is whatever strategy allows your model to compound in quality faster and to continue to compound that growth in quality for longer.

Open source doesn't always win in producing better quality products.

Linux won in servers and supercomputing, but not in end user computing.

Open-source databases mostly won.

Chromium sorta won, but really Chrome.

Then in most other areas, closed-source has won.

So one takeaway might be that open-source will win in areas where the users are often software developers that can make improvements to the product they're using, and closed-source will win in other areas.


GPT-4 is so much better for complex tasks that I wouldn't use anything else. Trying to get 3.5 to do anything complicated is like pulling teeth, and using something worse than 3.5... Oof.

TBH this feels like cope from Google; Bard is embarrassingly bad and they expected to be able to compete with OpenAI. In my experience, despite their graph in the article that puts them ahead of Vicuna-13B, they're actually behind... And you can't even use Bard as a developer, there's no API!

But GPT-4 is so, so much better. It's not clear to me that individual people doing LoRA at home is going to meaningfully close the gap in terms of generalized capability, at least not faster than OpenAI itself improves its models. Similarly, Stable Diffusion's image quality progress has in my experience stalled out, whereas Midjourney continues to dramatically improve every couple of months, and easily beats SD. Open source isn't a magic bullet for quality.

Edit: re: the complaints about Midjourney's UI being Discord — sure, that definitely constrains what you can do with it, but OpenAI's interface isn't Discord, it has an API. And you can fine-tune the GPT-3 models programmatically too, and although they haven't opened that up to GPT-4 yet, IME you can't fine-tune your way to GPT-4 quality anyway with anything.

"There's no moat" and "OpenAI is irrelevant" feel like the cries of the company that's losing to OpenAI and wants to save face on the way out. Getting repeated generational improvements without the dataset size and compute scale of a dedicated, well-capitalized company is going to be very tough. As a somewhat similar data+compute problem, I can't think of an open-source project that effectively dethroned Google Search, for example... At least, not by being better at search (you can argue that maybe LLMs are dethroning Google, but on the other hand, it's not the open source models that are the best at that, it's closed-source GPT-4).


Yes, I'd readily pay for GPT-4 access, though not the limited 25 requests per 3 hours version. I ponied up $20 for a month of usage to check it out, and it performs head & shoulders above 3.5 in its ability to comprehensively address more complex prompts and provide output that is more nuanced than ChatGPT.

I'll also point out that paid API access to 3.5 (davinci-03) is frequently better than ChatGPT's use of 3.5. You get many fewer restrictions, and none of the "aww shucks, I'm just a little ol' LLM and so I couldn't possibly answer that".

If you're frustrated by having to go to great lengths to prompt engineer and ask ChatGPT to "pretend" then it's worth it to pay for API access. I'm just frustrated that I can't use the GPT-4 API the same way yet (waitlist)
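For what it's worth, the difference is mostly just which endpoint you hit. A rough sketch against the 2023-era openai Python client (the full model id is text-davinci-003; the prompt and parameters are arbitrary examples):

```python
# Completion endpoint: plain text in, text out, without the chat-style
# refusals layered on top of the ChatGPT interface.
import openai

openai.api_key = "sk-..."  # your key

completion = openai.Completion.create(
    model="text-davinci-003",
    prompt="Explain the difference between a moat and a head start, in one paragraph.",
    max_tokens=200,
    temperature=0.7,
)
print(completion["choices"][0]["text"])
```

The chat models (gpt-3.5-turbo, and gpt-4 once you're off the waitlist) go through the ChatCompletion endpoint instead, which is where the extra guardrails tend to show up.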


I hear ya! I'm out here dying on the GPT-4 API waitlist too. I use gpt-3.5-turbo's API extensively, and occasionally copy my prompts into GPT-4's web UI and watch as it just flawlessly does all the things 3.5 struggles with. Very frustrating since I don't have GPT-4 API access, but also very, very impressive. It's not even remotely close.

I pay the $20 for ChatGPT Plus (aka, GPT-4 web interface access); personally I find it useful enough to be worth paying for, even in its rate-limited state. It already replaces Google for anything complex for me. I wish I could pay for the API too, and use it in my projects.


GPT-4 really shows how absolutely terrible regular web search is at finding anything these days. Another complete embarrassment for Google.

Oftentimes it can just recite things from memory that Google can't even properly link to, and they've got a proper index to work from, for fuck's sake.


The last time I tried to find a plumber in my local area through google I realized that the first three pages of results contained zero actual results. It was a mix of ads and seo spam from scammers. I ended up going to ddg and while there was plenty of seo spam there too, I found several good results on the first two pages.

Google has the technology and talent to relaunch themselves in a leadership position, but the current executive team doesn’t seem to have what it takes. They’re custodians / accountants, running the company a bit like Microsoft in the Ballmer era. What google needs to do now is leap ahead, and I don’t see it happening without a leadership change.


Amen. I have basically stopped using Google at this point because, when I do, the results are all garbage. I ask GPT-4 the same questions and get reliable, mostly accurate answers. You do have to be cautious of lies/hallucinations, but realistically most of Google's results nowadays are sales pages masked as helpful articles that are mostly full of crap anyway.


I was dying on the GPT-4 API waitlist too. I built a proof-of-concept with GPT-3.5, got some ada embeddings, played around with some common patterns for a couple of weeks, spent less than $20. I then applied to the waitlist again with a few short sentences about what I'd done, how GPT-4 would make it better, and how it would enable something new and valuable for a particular market. Approved that day.

It's not exactly a shortcut, and maybe it was just luck, but I suspect the key is just to start building with what you have and show a trajectory. The best part is that coding with ChatGPT-4 as a "colleague" has made the whole thing super fun.


Sadly I did the same, but am still waitlisted. I do enjoy GPT-4 as a colleague though.


Wow, two days after I posted this I got off the waitlist. :D


I just got access. If you want, you can email me some prompts to test, ^ @gmail.


Thank you for the offer, but I’m extremely conscious of avoiding a direct link from my comments here to who I am.

Maybe it’s a bit too paranoid, I don’t know, but I’ve also been open here about my workplace experiences, in a way that my HR dept among others might not quite appreciate if someone who knew my IRL connection to them decided to comb through my comments. Maybe I should set up a separate HN account connected to me professionally for that sort of thing.

Also, my use case for GPT-4 is data analysis. Using the paid “Plus” version shows a lot of promise for quickly bootstrapping data exploration and consumption as a jumping-off point for more detailed digging. Via the chat interface it can ingest very small aggregate datasets and spit out observations that only myself and my boss have the domain expertise to produce in my organization. But the chat interface is highly limited and often truncates even small (faked but plausible) datasets, so I really want API access, because the real data involves sensitive info I couldn’t put into the chat site or responsibly share with someone outside my org.

But really, thanks for the offer. What are you working on with it?


If you're technical just get yourself OpenAI API access which is super cheap and hook it up to your own self-hosted ChatGPT clone like https://github.com/magma-labs/magma-chat

The wait for GPT-4 is not as long as it used to be, and when you're using the API directly there's no censorship.
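If you'd rather not stand up a full clone like the one linked above, the core of any of them is basically a loop over the chat completions endpoint. A bare-bones sketch (model name and prompts are placeholders, error handling omitted):

```python
# Bare-bones terminal "ChatGPT clone": keep the running conversation in a
# list of messages and send the whole history on every turn.
import openai

openai.api_key = "sk-..."  # your key

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user = input("you> ")
    if user.strip() in {"quit", "exit"}:
        break
    messages.append({"role": "user", "content": user})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    reply = resp["choices"][0]["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    print("bot>", reply)
```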


"just get yourself OpenAI API access"

Could you please describe how one "just" does that? I have been on the GPT-4 API waitlist since it was announced and I still don't have access to the GPT-4 API.


Try again; it’s very likely you will have better results if you ask more than once…


Yep, I use the paid API, and it’s a lot more flexible than ChatGPT. I didn’t know about the self-hosted interface though: that will be my project for tomorrow morning, thanks!

I’ve been on the GPT-4 waitlist for about 6 weeks, but I’m not sure what the typical wait is.


Magma wants me to use my google credentials to login. I’ll pass on that, it shouldn’t be required in anything self hosted which makes me distrust it a bit right off the bat.


> when you're using the API directly there's no censorship.

Wait seriously?


There’s a bit. When I used the OpenAI playground I received warnings about the response potentially being bad, but using the API directly I don’t even get warnings like that.

Testing things out, it will produce vile and hateful content on demand. However, it won’t say absolutely anything: if I specifically tell it to use certain words, I get the same type of content warning plus an extra note about those words, saying that I have to contact OpenAI support if my use case truly requires their use.

There’s also the fact that using or distributing such content is against the TOS, so I suppose they could simply ban you.


Not exactly. The "censorship" is the RLHF tuning in the chat models as far as I understand it; the API for the chat models is the same AFAIK. The base models don't have censorship, but there are no base models available for API access for GPT-4, only a chat model. You can use the GPT-3-era base models, but, well, they're not as good as GPT-4.


Or you go to Microsoft Azure and use the GPT-3.5 base model: code-davinci-002.

Though they could still use "observational" censorship there, i.e. analyze your prompt with a different model. OpenAI does that, not sure about Microsoft.


Did you come across some other self-hosted ChatGPT clones that you can recommend?


.


> I'll also point out that paid API access to 3.5 (davinci-03) is frequently better than ChatGPT's use of 3.5. You get many fewer restrictions, and none of the "aww shucks, I'm just a little ol' LLM and so I couldn't possibly answer that".

Little correction - 3.5 is not davinci. davinci is 3.0, 3.5-turbo (chatgpt) is a davinci variant that has been tuned and adjusted for chatting and conversation, including all those restrictions. It is much faster than davinci, way cheaper but as you know, results are… ok

davinci (3.0) is more untuned, slower, more expensive to use, not conversational, but can yield much better quality


> Little correction - 3.5 is not davinci. davinci is 3.0, 3.5-turbo (chatgpt) is a davinci variant that has been tuned and adjusted for chatting and conversation, including all those restrictions.

Little correction of the correction. The base models are:

davinci = GPT-3

code-davinci-002 = GPT-3.5

They do only text completion and do not natively answer to instructions. There are also instruction tuned versions of the latter, e.g. text-davinci-003 and gpt-3.5-turbo-0301 (used in ChatGPT). See

https://platform.openai.com/docs/model-index-for-researchers

Note that code-davinci-002 is no longer available via the OpenAI API, but it is still on Azure. The GPT-4 base model is generally unavailable. Too powerful perhaps.


Turbo and davinci should be equally non-conversational. When you use ChatGPT, it also has InstructGPT on top of turbo, which is what makes it conversational, together with RLHF.


No, see neighboring comment.


I have access. If you want to collaborate, or just test a few prompt ideas, you can email me @gmail


I’m pasting my response to someone else who made the same kind offer:

Thank you for the offer, but I’m extremely conscious of avoiding a direct link from my comments here to who I am. Maybe it’s a bit too paranoid, I don’t know, but I’ve also been open here about my workplace experiences, in a way that my HR dept among others might not quite appreciate if someone who knew my IRL connection to them decided to comb through my comments. Maybe I should set up a separate HN account connected to me professionally for that sort of thing. Also, my use case for GPT-4 is data analysis. Using the paid “Plus” version shows a lot of promise for quickly bootstrapping data exploration and consumption as a jumping-off point for more detailed digging. Via the chat interface it can ingest very small aggregate datasets and spit out observations that only myself and my boss have the domain expertise to produce in my organization. But the chat interface is highly limited and often truncates even small (faked but plausible) datasets, so I really want API access, because the real data involves sensitive info I couldn’t put into the chat site or responsibly share with someone outside my org. But really, thanks for the offer. What are you working on with it?


SD and its configurability are miles and miles ahead of MJ. Sure, if you want a fancy picture right now, MJ is OK. But how are you going to generate that same picture in another pose? Inpainting, outpainting... I don’t even know where to begin. MJ is a toy compared to SD’s ecosystem.


GPT-4 is a must if tool using is your goal.

GPT-3.5, I think it is mostly suitable for:

1. Quick documentation lookup for non-essential facts

2. Lightweight documents writing and rewriting

3. Translation

Other use cases should go straight to GPT-4.


I use GPT-3.5 for a lot of terminology lookup, and it's generally pretty great.

"In the context of [field I'm ramping up in], what does X mean, and how is it different than Y" -- it's not as good as GPT4 but it emits so much quicker and it normally gets me where I needed to go.


Ah, "tool using", the best of goals.


> And you can't even use Bard as a developer, there's no API!

There is an API for the underlying model, it's just in alpha/beta/whatever they call limited invite-only release and you have to ask your devrel team to get access. I'm guessing we'll see better models very soon.

Google is, as usual, playing catch-up, but I have no doubt once the machine gets cranking they'll be fully competitive, at least similar to how GCP is now a totally viable AWS alternative. They never lead the pack because they can't (lawyers, regulation, monopoly, etc), but they know how to commit and execute.


The question is how long it will take for open source to become just as good as GPT-4. If it is 3 years, then yes, this is copium. But if it is 1 year or less, then how much is Google really missing out on?

OpenAI spent 600m to improve GPT and made 200m from it; if the costs of model development fall dramatically, it might be OpenAI that is shooting itself in the foot.


> OpenAI spent 600m to improve GPT and made 200m from it

How do you know?


>Similarly, StableDiffusion's image quality progress has in my experience stalled out, whereas Midjourney continues to dramatically improve every couple months, and easily beats SD. Open source isn't a magic bullet for quality.

MJ only does one style, and you can emulate that just fine in SD if that's what you want.


I don't feel sorry for Google, nor for the large amounts of PR nonsense they're putting out there to try to spin their being too slow to bring LLM tech to consumers. Get better or get out.


>Midjourney is more popular (from what I'm seeing) than Stable Diffusion at the moment because it's better at the moment. Midjourney is closed-source.

Midjourney is easier; it's not better. The low barrier to entry has made it popular, but it isn't as realistic, doesn't follow the prompt as well, and has almost no customization.

SD is the holy grail of AI art. If you can afford a computer or server to run SD, and have the ability to figure out how to install Python, clone Automatic1111 from git, and run the installer, it's the best. Those 3 steps are too much for most people, so they default to something more like an app. Maybe it is too soon, but it seems SD has already won. MJ is like using MS Paint, where SD is like Photoshop.


I just want to add my $0.02 as someone currently working at a games studio that is integrating AI-generated art into our art pipelines.

Midjourney definitely generates really high quality art based on simple prompts, but the inability to really customize the output basically kills its utility.

We heavily use Stable Diffusion with specific models and ControlNet to get customizable and consistent results. Our artists also need to extensively tweak and post-process the output, and re-run it again in Stable Diffusion.

This entire workflow is definitely beyond a Discord-based interface to say the least.
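For anyone who hasn't seen what that kind of step looks like outside a Discord box, here's a minimal diffusers sketch of ControlNet-constrained generation. The public model IDs, the canny preprocessor, and the prompt are generic examples I'm assuming for illustration, not the parent's actual studio setup:

```python
# Minimal Stable Diffusion + ControlNet (canny edges) sketch with diffusers.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Turn a reference image (e.g. a rough sketch or previous render) into an
# edge map that constrains the composition of the generated image.
ref = np.array(Image.open("reference.png").convert("RGB"))
edges = cv2.Canny(ref, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "concept art of a ruined castle courtyard, overcast lighting",
    image=control,
    num_inference_steps=25,
).images[0]
image.save("out.png")
```

Swapping the canny model for a lineart or depth ControlNet follows the same pattern, just with a different pretrained model and preprocessor.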


If you would give a talk about this, I would watch it, despite being out of graphics for almost 10 years now. I really want to hear from the trenches about the workflows, benefits, and challenges you have.


Here is a test I did the other day of rough sketch (hand drawn) -> clean line work (AI) -> coloured (AI). This workflow gives 100% control over the output because you can easily adjust the linework in the intermediary step.

https://twitter.com/P_Galbraith/status/1649317290926825473?c...

This is using Stable Diffusion and the Control Net Lineart Model. The coloured version is pretty rough but it was a quick test.

In my opinion Stable Diffusion is vastly superior to Midjourney if you have the skill to provide input to img2img/ControlNet.

I have some other earlier workflow experiments on Youtube if you're interested in this kind of thing https://www.youtube.com/pjgalbraith


Use https://github.com/deep-floyd/IF; it uses an LLM to generate exactly the art you need.


The image quality of DeepFloyd is much lower than Stable Diffusion 1.5 though, it's a pretty major tradeoff. Can definitely be part of the workflow since it really is good at composition, but right now it's not a replacement.


Deep Floyd doesn't allow commercial usage, such as a game studio using it.


They said that they will change the licence for the final version IIRC.


Really curious about the challenges you've been seeing as your art team's been trying to integrate AI generated art into your pipeline.

Excited by the potential for artists to accelerate their creative process with advances like ControlNet, but it makes sense that there's a lot that's frustrating about the process today.

Exploring tools that might help. If you might be interested in chatting, email me at tom@adobe.com. Thanks!


> AI generated art into our art pipelines

I'd be interested to know where the art ends up in the game? Do you mean 2D backgrounds and billboards in-game? Or are we talking cut-scenes and menu screen art?


> Midjourney is easier, its not better.

By what measure? Midjourney v5 is massively better with every prompt topic I've thrown at it than SD. It's not even close. SD, however, is much better if you want an actual customizable toolchain or to do things like train it on your own face/character.


Generate the following picture in mid journey: "A school of dolphins spanking a mermaid with their flukes."

A thousand V-rolls won't get you there. For something like this, ControlNet combined with inpainting is indispensable. Not to mention the excessively heavy-handed censorship in MJ.

Midjourney excels in overall quality, but it completely falls down if you have an actual complex vision.


It seems Midjourney is great at generating non-pornographic pictures.


Uhh... Spanking isn't inherently pornographic - that particular image was supposed to be a Gary Larson parody style comic.

Here have another prompt: "Rapunzel has let her hair all the way down a tower. The hero has been tied up by the witch, and annoyed at Rapunzel's continual attempts to escape, the witch throws the bottom of her hair into a paper shredder at the base of the tower."

You could v-roll until the heat death of the universe without even getting close.

Midjourney is great if all you're capable of conceiving is 90s Mad Magazine templatized Mad Libs, banal crap like "Darth Vader as a French pantomime street artist".

Unfortunately that also describes the majority of midjourney users.


Yeah, prompts which basically just list properties (Darth Vader, French, pantomime, street artist) seem to work well, but relations are mostly too hard for these models. Even "a monk playing chess against a clown" or "a blue book on top of a yellow book" is out of reach for Bing's Dall-E ~3, and Midjourney probably isn't much better here.

https://www.bing.com/images/create/a-monk-playing-chess-agai...

https://www.bing.com/images/create/a-blue-book-on-top-of-a-y...

Simple (prompt only) use of generative models is quite good at creating simple artistic pictures you might actually hang on a wall. Trying to create a complex scene with just a prompt seems still a few years off though.


Agreed. I pay for MJ and have several SD versions running on my PC. I like the ability to fine-tune the SD models, and my PC with a 4090 is plenty fast, but I can't match MJ's output on artistic quality. SD allows for 4K-sized outputs, which is great, but I can't use the art the way I would like. FWIW the SD NSFW community is large, but that is not where I invest my time with AI art.


They also just released v5.1, which seems to be quite a bit better than v5.


Is there a comparison? It is interesting that there do not seem to be any Midjourney benchmarks. E.g.

https://paperswithcode.com/sota/text-to-image-generation-on-...

Parti and Imagen are still on top, followed by Dall-E 2.

If their model is so great, why are they afraid of benchmarks?


I literally just don't feel like running them tbh, and see no reason to publish them either way. Mostly prefer to let the outputs speak for themselves.

For a while I was using an FID variant for evaluation during training, but didn't find it very helpful vs just looking at output images.


Okay. That's probably the difference between a commercial and a research project.


That seems to depend on your use case. Frankly, I don't have much use for either of them but Midjourney was much closer.

I've twice spent a couple of hours unsuccessfully trying to generate a simple background image that would be blurred out when rendering 3D models. SD out-of-the-box was far worse, but Midjourney still was not up to the task. It's incredible how well they can generate images of nearly any subject/object and make some changes to the style and placement, but trying to precisely achieve critical broad-stroke things like perspective, sizing, lighting direction/amount/temperature, etc. was far too cumbersome. Prompt refining is just like having a program with a bunch of nebulous undocumented menu entries that you just have to click on to see what they do, rather than just giving you the tools to make what you need to make. Was that the right entry or the wrong entry? Who knows! Maybe just try it again to see if it works better!

There's a fundamental disconnect between professional-level and consumer-level tools. Consumer tools must be approachable, easy to use, quickly yield appealing results, affordable, and require little maintenance. Professional tools need to be precise, reliable, capable of repeated results with the most demanding tasks, and easily serviceable into perfect working order.

These are consumer-level tools. If you merely need a cool picture of a space ship done in such and such style with such and such guns blah blah blah (that for some reason always looks 10%-50% Thomas Kinkade), these tools are great, but they abstract away the controls that really matter in professional work. Novices who get overwhelmed by all of those factors love it because they don't understand, and probably don't care about, what they're giving up. For serious work, aside from getting inspo images or maybe generating deliberately weird bits of whatever, they're hit-or-miss at best. Without exception, doing a rough mock-up in a modelling program took FAR less time than trying to wrangle exactly what I needed from one of those generators.

I'm sure they'll get there someday but right now they're miles away from being professional-quality image generation tools.


> Without exception, doing a rough mock-up in a modelling program took FAR less time than trying to wrangle exactly what I needed from one of those generators.

I think a lot of people have unrealistic expectations of the tech-- they think they can get exactly what they want if they are articulate enough in describing it with words.

Feed your rough mock-up to img2img (or use inpaint sketch) and you'll land much closer to where you're trying to go.

It's a power tool. It will do tedious manual work (producing art) very quickly. The difference between professionals and consumers in how they use it is that the professional asks the machine to "finish what I started," whereas the consumer tells the machine to "do all of the work for me."
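Concretely, that "finish what I started" step is roughly img2img with a moderate strength. A minimal diffusers sketch, with the checkpoint, prompt, and settings as arbitrary examples:

```python
# Minimal img2img sketch with diffusers (recent 2023 versions): start from
# your own rough mock-up and let the model do a finishing pass.
# strength controls how far it drifts from the input (lower = closer to it).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init = Image.open("rough_mockup.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="clean digital illustration, soft studio lighting",
    image=init,
    strength=0.45,       # keep most of the original composition
    guidance_scale=7.5,
).images[0]
result.save("finished.png")
```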


I tried img2img. It will do rough finishing work but it won't take some lighting vectors and match my lighting. It won't shift the viewpoint by 18 degrees. It puts a smooth sheen on rough work with broad stroke needs and that's valuable in some cases, but it is not a general-purpose professional tool.

Canva competently satisfies most non-professional needs but it only satisfies a narrow slice of professional needs. Trying to use it for most professional work takes vastly more time and effort than using a proper professional tool. LaTeX fits academic paper publisher's needs and can pump out formatted and laid-out text far quicker than someone using InDesign but you'd go crazy trying to assemble a modern magazine or high-end book. It doesn't need polish or sheen. It needs something fundamentally structurally different.

I'm both a professional digital artist and a long-time back-end software developer. This slice of time has really opened my eyes to what it must be like for most non-developers to speak to developers: constantly oversimplifying your use case and assuming some algorithmic approximation will do without really understanding the problem.


Fair enough. Your 3D modeling needs might be a bit advanced for the current state of things. It works pretty well for flat graphic design, stock photo or illustration purposes.

I'm holding out for an instruct-based model that will take explicit instructions, or at least layered prompts. Mutating the original prompt along with the picture (or changing the entire thing to only describe certain parts, a la inpainting) is frustrating to me.


That's just one specific example of why it fails as a high-level general professional tool. Even for tasks like straight-up photo finishing... might be OK for making some neat thing or adding some generated detail to a phone shot but no way that's going to finish someone's professionally shot photos. It might replace Fiverr graphic design that was probably done with a template but real graphic design has conceptual meaning. Making the actual assets is the easy part.

I think what most developers don't realize is that creating media, even snazzy, captivating images, is a tiny portion of the work commercial artists and designers do beyond Fiverr or the people working at sign shops. We've got bazillions of stock photos, asset stores, etc. etc. etc. available at our fingertips at a pretty low cost... and we use them, just as we'll use AI-generated images in our processes... Just not for anything that actually matters. Reducing art and design the way so many people here do is nearly akin to reducing coding to how fast someone can generate code, regardless of its suitability for the purpose. I could go into Adobe Illustrator and make most of the vector art you see on the net in a few minutes. Knowing what needs to go on the screen, exactly how it needs to go on the screen, and the effects of putting it there, what it communicates and to whom, etc. etc. etc. are the hard parts. You could have a tool that would instantly translate millions of pretty images from someone's imagination in a matter of milliseconds, but that's not going to make them useful for communication or anything else.

So very much of this hype is people simply assuming that they understand something that they don't. And that's why the UIs for most FOSS applications suck yet the project contributors will defend them like their own children.


> I think a lot of people have unrealistic expectations of the tech-- they think they can get exactly what they want if they are articulate enough in describing it with words

Whose fault is this though? The hype is absolutely hysterical.


Thanks! That's pretty neat. The sample images look a bit overwrought like a lot of other AI images do but I'll bet they're doing that to follow the trend rather than it being a technical limitation.


Wrong thread?


yep. Really need to stop commenting when I'm waiting on after-hours compiles.


ControlNet helps a lot with composition and lighting (https://sandner.art/create-atmospheric-effects-in-stable-dif...). It requires more work than just entering a prompt, but probably less work than doing it manually once you get used to it. I think there's a number of StableDiffusion clients in development that are trying to make this easier.


Ha... I accidentally replied to the wrong comment. Anyway, thanks! That's pretty neat. The sample images look a bit overwrought like a lot of other AI images do but I'll bet they're doing that to follow the trend rather than it being a technical limitation.


I will say, though, that low-effort, higher-volume professionals (e.g. mobile game mills, Fiverr designers) will likely profit from these tools once they can out-compete cheap online assets from stock images/models/etc., but they're so not there yet.


Classical professional tools like Photoshop have a lot less potential though. They are very precise, but (I assume) they have barely advanced in the past decade. Tools based on generative AI will probably improve massively over the next few years. Most such tools seem currently based on Stable Diffusion, and apparently OpenAI/Midjourney/Google have zero interest in supporting such tools. But this could change soon, e.g. when Adobe tries to compete with the SD ecosystem.

We already now see deepfakes (e.g. of Trump or the Pope, recently even videos) that are far beyond what we saw in the years before, indicating that the old professional tools weren't so powerful after all. Now if we extrapolate this a few years into the future...


You assume wrong. What these tools offer is constantly churning, far faster than ever before, and Photoshop has been around for over 30 years. They release updates constantly. Photoshop got AI filters like detail enhancement for zooming a few years ago, and automatic object detection, content-aware delete, etc. etc. etc. a few years before that. That's only what I can recall off the top of my head for Photoshop alone, but it's such a giant environment that even most of their own product people probably couldn't tell you off the cuff. In areas like video compositing, tools like Nuke are developing tools with these capabilities even more quickly... and they'd better, when the cheap license costs $3500/yr.

As I mentioned in another comment, so much of this hype is based on developers assuming they understand something that they don't. I've indulged in this hubris as a developer but straddling both sides of this line has been illuminating.


Well, Adobe at least seems headed to fully embrace the generative AI hype now:

https://www.adobe.com/sensei/generative-ai/firefly.html

This sounds all very similar to the Stable Diffusion tool chain, though probably cloud based and with a more intuitive UI on top.


All of these technologies are being integrated into professional toolkits in ways that make sense when they're polished enough to be professionally useful... and for the foreseeable future, that's how it will stay. Beyond the high-volume, low-effort work on places like Fiverr, commercial artists and designers are valuable for their ability to think conceptually and make the artistic decisions about what goes on the screen, where, and why. The how is an implementation detail. Designers dropped Balsamiq in favor of Sketch in no time flat, and then dropped Sketch in favor of Figma even more quickly. Adobe XD, capable and included for free in an ecosystem they already use, is barely in the conversation. These are fields where people readily adopt new technology that suits their needs, but the current tools aren't even in the ballpark.

Being able to quickly generate and iterate on assets is great for inspiration but pretty useless for professional output without fine-tuned, predictable, repeatable controls. These tools will simply integrate with existing professional tools until they can do it better.

Imagine the first person to make an electric saw made some automated thing that could cut the wood to make a cool-looking flat-pack house somewhere in the neighborhood of your specifications in 5 minutes. The caveats: while it would assemble perfectly, the actual angles of the cuts might be unpredictable... like 40 and 50 degrees rather than 45 and 45, and the layout was never quite what you expected, even if it was OK more often than not. Pros knew those were fundamentally deal breakers for professional work, and remained more professionally useful with their hand saws because they had the required precision, control, and predictability. While enthusiasts were going crazy exploring all of the different kinds of oh-so-slightly wonky structures they could generate and predicting the end of carpentry, the old-school saw companies started making circular saws, chop saws, drills, and the like. The market for handyman-built dog houses, sheds and playhouses would immediately be lost to the automated machine, but I guarantee you that all consequential work would still be done by carpenters with power tools.


I agree that for the foreseeable future designers / commercial artists will have to decide overall "what goes on the screen, where, and why", but not anymore for all the smaller-scale details. Generative AI automates creativity at least to some extent, which is qualitatively different from past developments. The example with the island tortoise on the Adobe website is not a real demo, but it doesn't seem far away.


I have SD up on a machine with a 3090 and it can't produce output half as good as MJ without a ton of work.

I use SD to augment MJ, like fixing hands with specific LORAs for example, so I definitely appreciate that it exists. But for actually creating a full image in one shot, they're not even comparable.


> Midjourney is easier, its not better

Does being easier not influence whether it's better? I mean this in that for many of the ways AI art would be used, MJ already seems to be "good enough" at a lot of it.

Secondarily: doesn't Midjourney's increased user base and increased ratings they get from users help it refine its model, thus meaning that "ease of use" creates a feedback loop with "quality of output" because more users are engaged?

I'm asking real questions, not making a statement I believe in and just adding a question mark.


>Does being easier not influence whether it's better?

As mentioned, which is better, MS Paint or Photoshop? If MJ ignores your prompt and spits out a half-related picture, are you going to continue using it?

If anything, MJ is a stepping stone to SD. You get a taste of AI art, but want to do something specific that MJ cannot do. You learn about ControlNet, alternative models, inpainting, etc., and you decide you need to move on from MS Paint to Photoshop.

I personally started with a free AI art tool (can't remember which); it was super cool, but quickly I wanted to use different models and generate thousands of pictures at a time. I wanted to make gifs, do img2img, etc., and the only people doing that were on SD.


I think you actually have that backwards, because your conception of "easier" is a bit skewed.

The question is not "which is easier?", but rather "which is easier to use to produce high-quality output". In your analogy, I'd argue the answer to that question is actually Photoshop. Likewise the answer in the MJ/SD case is MJ.


> As mentioned, what is better, MS Paint or photoshop?

Metaphor is useful, but this feels overly-reductive. The gap between the amount of effort it takes to make something great or approaching the vision you had is massive between MS Paint and Photoshop. Not so for SD and MJ.

However, I am appreciating that SD seems to be clearly better if you need something more specific / precise. I don't think I'm convinced (yet) that because it can get more precise inherently makes it a better tool.


It's the old consumer/professional distinction at play: "If it's a professional tool, it's a job to know how to use it."

There are definitely some professionalized paradigms emerging in the use of SD: one video tutorial I saw this morning covering basic photobash + img2img and use of controlnet had a commenter saying that they preferred using the lineart function in controlnet to get more control and leverage their drawing skills.

When you see that kind of thing it's a huge signal for professionalization, because someone suggesting the optimal first step is "learn how to draw" deviates so completely from the original context of prompt-based image generation: "just type some words and the image magically appears".


What is better, oils or acrylics? What is better, clay or wood, when it comes to sculpture?

It is more like comparing Krita vs Photoshop than MS Paint vs Photoshop. That is bogus.

Most AI art I have seen is complete shit anyway and especially from SD.


> Does being easier not influence whether it's better

Midjourney lets you type in a thing and get a result that will look great, which is no small accomplishment. If you want "incidental art" like blog post heroes, there is no competition. But it's really hard to use if you want to get exactly what you want.


> But it's really hard to use if you want to get exactly what you want.

Alternatively, if you don't know what you want, it's really good for inspiration.


Very true, I've fleshed out RPG scenes/characters with it.


> Does being easier not influence whether it's better? I mean this in that for many of the ways AI art would be used, MJ already seems to be "good enough" at a lot of it.

> ...

> I'm asking real questions, not making a statement I believe in and just adding a question mark.

In my opinion, it is tedious to discuss whether Midjourney or Stable Diffusion is better.

Both are tools with their own strengths and weaknesses.

Midjourney

- Comparable to a ready-made consumer product.

- Easy to use and you get excellent results quickly.

- The feature set is limited, so you can't influence as well how the result will look.

- Restrictions on what you can generate with it (gore, nudes).

- Fee required.

Stable Diffusion

- Comparable to a development kit. Technical, difficult to use; different tools that you have to configure and use correctly.

- For excellent results you have to invest much more time.

- Extensive feature set (inpainting, outpainting, upscaling, ControlNet, many different models, LoRAs, textual inversions). You can define much better how the result should look.

- No restrictions on what you can generate.

- Free if you install it locally.


Midjourney is higher quality by a fair bit, from my personal experience and from being near a few of the top early AI artists for a good little while.


Is Midjourney's model actually better?

I was under the impression that Midjourney was just running a form of SD and its real secret sauce is the peripheral prompts it injects on the backend along with your prompts.

I could be totally off the mark here.


The model is obviously massively better, and they haven't been using SD in any form since the test mode of v3; the models are trained from scratch.


To me it is like saying oil is better than acrylics. These statements have no meaning when it comes to art.


Is midjourney still using discord as its primary user interface? That really turned me off.


Yes-- a classmate uses it. They do @everyone announcements in their server every day, and while you can mute actual notifications, it still adds one to your badge count. My attention is too valuable-- that would get me to cancel my subscription.


When I used it, it didn't feel like a product. It felt like a demo. If I have to rely on some bot in a public forum, I'm not sure I like this product.


I've operated on the assumption that MidJourney is deliberately knee-capping their growth by making it only work on Discord to ensure they don't grow faster than they can add hardware.

I could be entirely wrong, though. Maybe the person making that decision is just an idiot.

I have an IRC bot that has triggers for DALL-E, GPT3, and ChatGPT. I really want to make one for MidJourney, and I would happily pay MJ for the privilege. But I can't. Not without breaking some rules.


Even if you turn all permissions on for your bot, including the ability to use slash commands, it won't be able to send commands to the Discord Midjourney bot. (And there is no publicly exposed API.)

The only way to do it right now would be to create a separate dedicated Discord account linked to Midjourney, and then have your bot control that user account, and that's a good way not only to get your MJ access revoked but also to have your Discord account banned.

As far as kneecapping their growth, they have one of the largest Discord channels in history and they frequently run up against compute limits, so if this was one of their ostensible goals they failed at it pretty spectacularly.


> Even if you turn all permissions on for your bot including the ability to use slash commands

AFAIK, this isn't possible.


They have a <10 person team IIRC; using Discord as an interface saves a TON of money/effort/maintenance/risk, etc., as best as I understand, and lets them focus on the technical product. Remember, Discord is giving them free content legal protection by proxy, even if that's not necessarily the original intended effect, I think. There's a lot to gain by riding alongside Discord as a primary interface vehicle, I personally believe.

They're good enough technically at what they do that their audience is okay with the interface they have to use, I'd reckon (I've heard similar beefs about the UI stuff though, so it sorta makes sense).


> Discord is giving them free content legal protection by proxy

Would you mind explaining what you mean by that? Thanks!


Discord does not have its own data centres; it would be hard for them to provide such a guarantee.


Discord hosts a substantial number of images among other things.


MJ edits your prompts; you can achieve the same level of quality if you use the same prompts they do (which can be found on the internet).


They edit your prompt ... how?


By adding keywords to the prompt and to the negative prompt


Can you give me an example? What are those keywords? This sounds a bit like Dall-E 2, where OpenAI appears to manipulate certain queries involving women to make them black.



Okay, though this seems to be someone who tries to imitate Midjourney using Stable Diffusion, so it is still not quite clear how Midjourney does it. Though it seems plausible that they would "cheat" in this manner in order to get something more artsy-looking. (But what if you don't want something artsy, or not artsy in their style?)


That's the value proposition of MJ, I guess. It's easier to get something good out of a simple prompt, but you end up with a recognizable MJ look.


Are you sure about this? For the couple things I tried, a colleague with Midjourney managed to outperform my attempts with SD by leaps and bounds.


There's a higher "skill ceiling" with SD. You can install different models for different styles or subjects, use ControlNet for composition, and use plugins to do things you can't easily do with MJ.


> SD is the holy grail of AI art, if you can afford a computer or server to run SD + have the ability to figure out how to install python, clone Automatic1111 from git, and run the installer, its the best.

If you can afford Colab (which is free if you don’t want to use it too much), you can just click one of the existing A1111 Colabs and run that; you don’t need to figure out Python, git, or A1111 installs.


Free Colab has started blocking any SD web UI it detects (presumably because it's meant as a community service for ML researchers, not for people who want to play hentai gacha, and they're running out of server time).


Google is cracking down on this recently.


Cracking down on what exactly?



Do you have a link to a decent tutorial for someone to do what you're describing in the last paragraph?


It took me a little hunting, but thanks to Reddit I eventually found a cloud-gpu host that provides a working Stable Diffusion image. So you basically don't have to do anything that GP said. Everything is installed and you just rent the hardware.

https://www.runpod.io/console/templates

Look for "RunPod Stable Diffusion". I spent a whole $0.35/hr playing around with my own SD instance that I had running in minutes.


You can do the same thing on vast.ai too.

It's a little inconvenient to use non-base models and plugins this way (you pay extra for more storage), but it's definitely an easy way to use the full power of SD.


35c/hr seems crazy expensive compared to Midjourney. Midjourney gives you set fast hours (Immediate GPU) and unlimited relaxed hours (Delayed GPU). It also has a lot of built-in parameters you can use to easily tweak images. I'd rather pay for MJ than run my own SD.

The main upside of running your own SD is that you can completely automate it, but I'm not sure how useful that really is.


> The main upside of running your own SD is that you can completely automate it

No, the main upside of running your own SD web UI is that you can select and deploy your own checkpoints (not just the base SD models), LoRAs, embeddings, upscaling models, and UI plugins supporting additional services/models/features like MultiDiffusion (bigger gens and control over which areas within the image different prompts apply to), ControlNet and associated models, video synthesis, combinatorial prompts, prompt shifting during generation to do blending effects, and, well, a million other things.

Also, you can completely automate it.


The midjourney price would be equivalent to ~100 hours cloud time. How is that crazy expensive?


Do you need additional detail that cannot be found here?

https://github.com/AUTOMATIC1111/stable-diffusion-webui

Or are you looking for the cutting edge stuff like control net?

If you want to use colab instead, I used this a month or two ago.

https://colab.research.google.com/github/TheLastBen/fast-sta...

I hope other people can give you further reading.


There are dozens on YouTube. My kids did it, and they don't even program and had never touched Python in their life.

Even trained their own models using a cloud GPU.

The SD ecosystem is wild.



Midjourney retrains itself, I have one-click installer apps for SD, and the Midjourney live prompt community is very good.

None of this stuff is copyrightable, so I don't care that it's not private.


Midjourney is more niche. It’s great at photographs, digital art, concept art, game art, and everything in that sphere, because that’s what it was trained on. So it has a specific style. Dall-E, in comparison, produces kind of garbage-looking pictures in many more styles.


> users will go to whoever has the best model

Depends. You might want privacy, need low price in order to process big volumes, need no commercial restrictions, need a different tuning, or the task is easy enough and can be done by the smaller free model - why not? Why pay money, leak information, and get subjected to their rules?

You will only use GPT-4 or 5 for the 10% of tasks that really require it. The future looks bad for OpenAI: there is less profit in the large and seldom-used big models. For 90% of tasks there is a "good enough" level, and we're approaching it; we don't need smarter models except rarely.

Another concern for big-model developers is data leaks: you can exfiltrate the skills of a large model by batch-solving tasks. This works pretty well; you can make smaller models that are just as good as GPT-4, but on a single task. So you can do that if you need to call the API too many times - make your own free and libre model.
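A sketch of what that batch-solving loop tends to look like in practice: query the big model on your narrow task list and dump (prompt, completion) pairs to later fine-tune a small open model on. The task list and model name below are placeholders, and whether a provider's terms allow training a competing model on the outputs is a separate question:

```python
# Sketch of "skill exfiltration"/distillation data collection: gather
# task-specific (prompt, completion) pairs from a large hosted model.
import json
import openai

openai.api_key = "sk-..."  # your key

tasks = [
    "Summarize this bug report in one sentence: ...",
    "Write a SQL query that returns the top 10 customers by revenue.",
]

with open("distill.jsonl", "w") as f:
    for task in tasks:
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": task}],
        )
        answer = resp["choices"][0]["message"]["content"]
        f.write(json.dumps({"prompt": task, "completion": answer}) + "\n")
```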

I think the logical response in this situation would be to start working on AI-anti-malware, like filters for fake news and deceptive sites. It's gonna be a cat and mouse game from now on. Better to accept this situation and move on, we can't stop AI misuse completely, we'll have to manage it, and learn quickly.


> Linux won in servers and supercomputing, but not in end user computing.

Pardon the side discussion, but I think this is because of a few things.

1. OS-exclusive "killer apps" (Office, anything that integrates with an iPhone)

2. Games

The killer apps have better alternatives now, and games are starting to work better on Linux. Microsoft's business model no longer requires everyone to use Windows. (Mac is another story.) So I think that, at least for non-Macolytes, Linux end user dominance is certainly on the horizon.


Linux did kind of win for end user computing. Android is based on a modified linux kernel.


This year is the year of the Linux Desktop!

I kid. I've been primarily a Linux Desktop user for 20 years.


If you think you can use GPT-4 then you don't know what you're talking about.

API access is on a waitlist.

The UI has a limit of 25 messages per 3 hours.

If you think big, known companies can get ahead of the waitlist and use it - the short answer is no, they can't, because of their IP. Nobody is going to sign off on leaking all internal knowledge just to play with something.

ClosedAI seems to have a big problem with capacity.

Those poems about your colleague's upcoming birthday do burn a lot of GPU cycles.


That's also why I think OpenAI is in a tough spot in the long run. They just threw as much expensive hardware as they could at building this moat. There are basically two things which can happen from now on:

- Some scalability breakthrough will appear. If that's the case, their moat disappears pretty much instantly and the cost of LLMs will plunge close to zero as they become a commodity. That's the future I'm betting on from what's happening now.

- No scalability breakthrough will appear, and then it means they will have a hard time expanding further, as seen with the limited GPT-4 access.

Either way, they are in a tough spot.


Big, known companies are already getting their GPT-4 fix via Azure OpenAI Service, where they can get meaningful guarantees for their data, and even on-prem if they really want it.


Pretty easy to get API access, I got it within a few days. Aware this is a sample of one, but also can’t believe they fast tracked me.


It took me several weeks to get access. I just got it today.


I've been waiting a little over a month with no update so far, but I don't expect any sort of fast track since I'm not currently a paying customer.


Are you using the gpt3 api?


We have GPT-4 deployed to production, being used by Fortune 100 labels.


> Linux won in servers and supercomputing, but not in end user computing

"End user computing" these days means mobile, and mobile is dominated by Linux (in Apple's case BSD, but we're splitting hair) and Chrome/WebKit - which began as KHTML.

The only area where opensource failed is the desktop, and that's also because of Microsoft's skill in defending their moats.


The kernel isn't the OS/environment. Distilling iOS down to BSD is just not useful in the context of this discussion.


The kernel is absolutely the OS, the desktop environment is an interface to it.


Is e.g. libc part of the OS, or the desktop environment?


The kernel is part of it. But it's not all of it. Again, especially in the context of this discussion.


How can a company keep up with the speed of what is happening in the open?

OpenAI had years of advantage that almost vanished in a few months.

And we will see the rise of specialized models, smaller but targeted, working in teams and delegating (HuggingGPT).

I would use a small and fast model that only speaks English, is an expert at coding and science, and not much more. Then you fire off a question to another model if yours is out of its area.


Will the average user know about or want to use different models when they can just go to ChatGPT?


The average user won't have to care when these models run as part of whatever app he is using on his device, or on the server his app uses as a backend.

Look at the first uses of StableDiffusion. It was either "you know python or you use Dall-E". Now we have one-click installers setting everything up, and nice user interfaces on top of it.


> The point I'm wanting to make is that users will go to whoever has the best model. So, the winning strategy is whatever strategy allows your model to compound in quality faster and to continue to compound that growth in quality for longer.

Best only works till second best is "close enough" and cheaper/free


It's likely they will all be free in time. That's kind of the problem underlying the consternation here.

It's the internet all over again. How do you win the race to the bottom?

Once there, how do you compete effectively with free? Microsoft and Amazon will have billions on billions coming in to float their free offerings for what is effectively eternity in business terms. Probably Google and Meta will as well. What happens to everyone else?

I think you have to be in some niche market where you can charge. Because for everyone else, free is unsustainable.

Porn maybe? But there will be way too many competitors there. So something more like medical. Or semiconductors. Or construction or something.


> It's likely they will all be free in time. That's kind of the problem underlying the consternation here.

That's what I was getting at. Paid only makes sense if you're willing to provide stuff that OSS lacks, which is either "super specialized things not many people want to open source" or, well, a good-looking UI... (there seems to be a massive shortage of UI/UX people relative to developers in nearly every OSS project).

AI is neither, so it will be commoditized and mostly run through a few OSS projects - and probably for the best. The only thing worse than everyone having access to a near-free copywriter bot that will write about anything you tell it to is only people with money having access to, and control over, it.


> users will go to whoever has the best model

Not me, I refuse to use OpenAI products but I do sometimes use vicuna 13b when I'm coding C. It's pretty good and I'm happy to see the rapid advancement of open source LLMs. It gives me hope for the future.

> Linux won in servers and supercomputing, but not in end user computing.

I use linux on all of my computers and I love it, many of us do (obviously). I'm aware that I'm a small minority even among other developers but I think looking at just statistics misses the point. Even if the majority will just use the most approachable tool (and there is nothing wrong with that), it's important to have an alternative. For me this is the point of open software, not market domination or whatever.


None of the models will "win" because a model is just a foundation. Google won because they leveraged the Linux ecosystem to build a monetizable business with a moat on top of it. The real moat will be some specific application on top of LLMs.


> Linux won in servers and supercomputing, but not in end user computing.

It seems just about every computing appliance in my home runs Linux. Then you have Android, ChromeOS, etc. which are also quite popular with end users, the first one especially. It may not have won, but I think it is safe to say that it is dominating.


Appliances are not end user computing, but embedded computing - the OS is incidental and under full control of the manufacturer. Some might argue that even mobile phones are not sufficiently under the control of end users to qualify.


> Appliances are not end user computing

They are when the end user is using them. Think things like TVs or even thermostats.

> the OS is incidental and under full control of the manufacturer.

For all intents and purposes Linux has won where those conditions aren’t met.


>The point I'm wanting to make is that users will go to whoever has the best model.

Best isn't defined just by quality though. In some instances for some groups, things like whether the model is trained on licensed content (with permission) and / or is safe for commercial use is more important.

This is one reason why Adobe's Firefly has been received relatively well. (I work for Adobe).


Adobe Firefly can't be run locally and Adobe knows everything that their users generate. I can't train my own LoRAs or checkpoints. Adobe also has proven that they can't keep the data of their users secure. Which is why it's better to use something else.


Midjourney is more popular because it takes zero technical know-how compared to SD (even with A1111 it took me nearly an hour to walk my competent-but-layman brother through installing it) and doesn't require a high-end gaming PC to run it. (DALL-E lost because they let MJ eat their lunch)


> Midjourney is more popular because it takes zero technical know-how compared to SD

Both take zero technical knowledge to use the base models via first-party online hosts, but Midjourney is superior there.

SD offers a lot more capacity beyond what is available from the first-party online host, though, while with Midjourney, that’s where it begins and ends, there is nothing more.

> and doesn’t require a high-end gaming PC to run it.

Neither does SD. (I run A1111 locally on a couple-year-old business laptop with a 4GB Nvidia card; no sane person would call it a “high-end gaming PC”. And there are options besides running it locally.)


> DALL-E lost because they let MJ eat their lunch

I wonder why nobody is talking about Bing Image Creator

https://www.bing.com/images/create

which uses some much more advanced version of Dall-E 2 in the background (so Dall-E 2.5? 3?), while being completely free to use. It can produce some pretty mind blowing results with quite simple prompts, although apparently not as impressive as Midjourney V5. A few examples:

hyperrealistic

https://www.bing.com/images/create/hyperrealistic/644fa0c48f...

an allegory for femininity

https://www.bing.com/images/create/an-allegory-for-femininit...

portrait of a strange woman, hyperrealistic

https://www.bing.com/images/create/portrait-of-a-strange-wom...

allegory of logic, portrait

https://www.bing.com/images/create/allegory-of-logic2c-portr...

her strange bedfellow

https://www.bing.com/images/create/her-strange-bedfellow/644...

Mrs fox

https://www.bing.com/images/create/mrs-fox/6446e85a32134e649...

inside view

https://www.bing.com/images/create/inside-view/6446f1dc573f4...

in the midst of it all

https://www.bing.com/images/create/in-the-midst-of-it-all/64...

strange gal

https://www.bing.com/images/create/strange-gal/6446e2a2ea7a4...

sighting of a strange entity in an abandoned library

https://www.bing.com/images/create/sighting-of-a-strange-ent...

sleeping marble woman next to a wall of strange pictures inside an abandoned museum, close-up

https://www.bing.com/images/create/sleeping-marble-woman-nex...

sculpture of a woman posing next to a wall of strange pictures, close-up

https://www.bing.com/images/create/sculpture-of-a-woman-posi...

Easter

https://www.bing.com/images/create/easter/643ae4968aff432684...

Christmas on board a spaceship, DSLR photograph

https://www.bing.com/images/create/christmas-on-board-a-spac...

an angel, dancing to heavy metal

https://www.bing.com/images/create/an-angel2c-dancing-to-hea...

Saturday afternoon in the streets of a buzzing cyberpunk city, photo-realistic, DSLR

https://www.bing.com/images/create/saturday-afternoon-in-the...

The Dogfather

https://www.bing.com/images/create/the-dogfather/6441d18950b...

the unlikely guest

https://www.bing.com/images/create/the-unlikely-guest/644446...

Strange pictures in an abandoned museum

https://www.bing.com/images/create/strange-pictures-in-an-ab...

strange woman in an abandoned museum, close-up

https://www.bing.com/images/create/strange-woman-in-an-aband...

strange woman in an abandoned museum, strange pictures in the background

https://www.bing.com/images/create/strange-woman-in-an-aband...

a wall of strange pictures in an abandoned museum in Atlantis, close-up

https://www.bing.com/images/create/a-wall-of-strange-picture...

female sculpture in an abandoned museum in Atlantis, close-up

https://www.bing.com/images/create/female-sculpture-in-an-ab...

the unlikely guest

https://www.bing.com/images/create/the-unlikely-guest/644490...

an unlikely guest of the secret society in the lost city in a country without name, close-up

https://www.bing.com/images/create/an-unlikely-guest-of-the-...

I think the quality of most of these pictures is far beyond what is achievable with Dall-E 2. One issue that still exists (though to a lesser extent) is the fact that faces have to cover a fairly large area of the image. Smaller faces look strange, e.g. here:

photograph of the unlikely guests

https://www.bing.com/images/create/photograph-of-the-unlikel...

It is as if the model creates a good draft in low resolution, and another model scales it up, but the latter model doesn't know what a face is? (I have no idea how diffusion models actually work.)


"Use of Creations. Subject to your compliance with this Agreement, the Microsoft Services Agreement, and our Content Policy, you may use Creations outside of the Online Services for any legal personal, non-commercial purpose."

Probably not the only factor but could be one.


> US law states that intellectual property can be copyrighted only if it was the product of human creativity, and the USCO only acknowledges work authored by humans at present. Machines and generative AI algorithms, therefore, cannot be authors, and their outputs are not copyrightable.

https://www.theregister.com/AMP/2023/03/16/ai_art_copyright_...

So the images can also be used commercially, and this holds as much for Microsoft as for Midjourney. (Note that Microsoft isn't actually denying usage for commercial purposes, even though they make it sound as if they do.)


I think that pouring a lot of money into open source, via bounties or crowdfunding, could accelerate open source alternatives to closed LLMs. Perhaps a middle way, in which the software is declared open source six months from now, could give enough compensation to the institutions contributing big money to develop LLM technology. That is, a crowdfunding scheme in which the major contributors have a limited time in which to be compensated, with the total prize capped, much like that of ChatGPT 3.5 or 4 depending on the model.


GPT4 sucks for many use cases because it's SLOW. It will co-exist with ChatGPT variants.


It's quite fast if you use it at ~4 AM in the US - there's definitely a time-of-day cycle. Putting things in a queue to run while you sleep is a good workaround.


It's about using the right tool for the right job. GPT-4 is an incredibly versatile generalist tool and a fantastic jack of all trades. However, this comes with some drawbacks. While saying it 'sucks' might be an exaggeration, I generally concur with the point you're making.


It's no exaggeration that it sucks for certain use cases where you would expect, and can achieve, near-realtime responses, and that's fine because it's not built for that use case. I'm responding to someone saying it's always best if you can afford it.


Yea 3.5 is more than good enough for a whole slew of tasks (especially code), and it's ridiculously fast. I rarely find the need to use 4 but certainly if there was a usecase it was significantly better at that mattered to me, I would.


And far more expensive than ChatGPT via API, so it makes sense to use ChatGPT3.5, or the locally run equivalents once they get as good, as much as possible.


> Linux won in servers and supercomputing, but not in end user computing

Android is based on Linux.


I think the best situation is when a company will perform an expensive but high value task that the open source community can't and then give it back to them for further iterations and development. If the community isn't able to perform a high value task again, a company steps in, does it, and gives it back to the community to restart the process.

In this way, everyone's skills are being leveraged to innovate at a rapid pace.


Aren't Androids Linux? That's by far the biggest end user platform.

Of course Google doesn't want to acknowledge it too much.

https://source.android.com/


This is what happened with Kubernetes, no? Open source was about to take over, so Google released the code in order not to lose out.


Worth noting: it seems like there were some incredibly dedicated, hardworking engineers who pushed extremely hard for a really long time to make this happen.

They did manage to get large buy-in from the company after quite a significant journey. But it reads much more like an outside event - something brought about and pushed for not because it was a smart top-down move, but because a couple of super-driven engineers made it their cause.


I see this often. We didn't document our efforts well at creating the OSS ecosystem that youngsters take for granted today. They attribute the efforts of some thousands of advocates that made all this happen to "market forces" or some other nonsense. OSS exists because some really dedicated hackers and their allies spent years of mostly unrewarded effort making it happen.


What?

No, what the article said was:

> At that pace, it doesn’t take long before the cumulative effect of all of these fine-tunings overcomes starting off at a size disadvantage.

>Indeed, in terms of engineer-hours, the pace of improvement from these models vastly outstrips what we can do with our largest variants, and

> the best are already largely indistinguishable from ChatGPT.

^ The author did not note that ChatGPT is better; the author claims that the 7B Koala model is "largely indistinguishable from ChatGPT".

and:

> While ChatGPT still holds a slight edge, more than 50% of the time users either prefer Koala or have no preference.

Which is highly misleading.

The Koala authors rated their model by showing its outputs to 100 people on Mechanical Turk, noting:

> To mitigate possible test-set leakage, we filtered out queries that have a BLEU score greater than 20% with any example from our training set. Additionally, we removed non-English and coding-related prompts, since responses to these queries cannot be reliably reviewed by our pool of raters (crowd workers).

So.

What you have is a model that performs pretty well for some trivial conversational prompting tasks.

What you DO NOT have, is something that is: "largely indistinguishable from ChatGPT".

Anyway, regardless of the creative interpretation of the authors writing, the point that I'm making is that your point:

> So, the winning strategy is whatever strategy allows your model to compound in quality faster and to continue to compound that growth in quality for longer.

Is founded on the assumption from the post that:

> While the individual fine tunings are low rank, their sum need not be, allowing full-rank updates to the model to accumulate over time.

ie. If you fine tune it enough, it'll get better and better in an unlimited fashion.

Which is provably false.

If I have a 10-parameter model, there is no possible way that the accumulation of low rank fine tunings will make it the equivalent of a 7B, 13B or 135B model.

It is simply not complex enough to do some tasks.

Similarly, smaller models like 3B or 7B model, appear to have an upper bound on what is possible to achieve with them regardless of the number of fine tunings applied to them, for the direct and obvious same reason.

There is an upper bound on what is possible, based on the model size.

The 'best' size for a model hasn't really been figured out, but... I'm getting pretty sick of people saying these 7B models are as good as 'ChatGPT'.

They. Are. Not.

People will go to the best models, with the best licenses, but... those models are, it seems, unlikely to be fine tuned smallish models.


Fantastic article. If you're quick to just go to the comments like I usually do, don't. Read it.

One of my favorites: LoRA works by representing model updates as low-rank factorizations, which reduces the size of the update matrices by a factor of up to several thousand. This allows model fine-tuning at a fraction of the cost and time. Being able to personalize a language model in a few hours on consumer hardware is a big deal, particularly for aspirations that involve incorporating new and diverse knowledge in near real-time. The fact that this technology exists is underexploited inside Google, even though it directly impacts some of our most ambitious projects.

Has anyone here worked with LoRA? Sounds super interesting.


If you use the web interface (oobabooga), then training a LoRA is as easy as clicking the "training" tab, keeping all the defaults, and giving it a flat text file of your data. The defaults are sane enough not to undermine any instruction tuning too much. Takes 3-5 hours on a 3080 for a 7B, 4-bit model (and ~1 kWh).
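(For anyone who'd rather script it than click through the UI, the same idea with the Hugging Face peft library looks roughly like this - a rough sketch with illustrative hyperparameters rather than the webui's exact defaults, and the model path is just a stand-in:)

    # rough sketch: LoRA fine-tune of a LLaMA-style model with peft
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "path/to/llama-7b-hf"  # stand-in for whatever base model you use
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")

    lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                          target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable
    # ...then tokenize your flat text file and run a normal Trainer / training loop

Same outcome either way: the base model stays frozen and only the adapters get trained.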

So far I've trained 3: 2 on the entire text of ASOIAF (converted from e-books) and 1 on the Harry Potter series. I can ask questions like "tell me a story about a long winter in Westeros" and get something in the "voice" of GRRM and with real references to the text. It can write HP fanfics all day long. My favorite so far was the assistant self-inserting into a story with Jon Snow, complete with "The Assistant has much data for you. Please wait while it fetches it." and actually having a conversation with Jon.

Asking specific questions is way more of a miss (e.g. "Who are Jon Snow's real parents?" returns total BS), but that may be because my 3080 is too weak to train anything other than 7B models in 4-bit (which is only supported with hacked patches). I used Koala as my base model.

I'm getting close to dropping $1600 on a 4090, but I should find employment first... but then I'll have less time to mess with it.


I am surprised that people aren't using Google Colab Pro/Pro+ in this context. You basically get access to multiple A100s for $10/month, and with some simple JavaScript tricks you can get a session to last for at least 24 hours.

Pro+ is more expensive at $50/mo, but it allows for more simplified background execution. If you are only just getting started and don't expect to be training for multiple months, then Colab or other cloud-notebook providers are a really great way to start.


Google has recently limited the pro plan to 100 compute units per month. Using an A100 on colab burns 13 units an hour. So you could be out of units within 8 hours. Not really the $10/month deal you're looking for.


Which model did you use? I read your comment when this HN thread was alive and decided to try to feed it a few websites (such as PEP8[0] for Python), and it only took a few minutes for LoRa training to complete, and the answers were not good. I also have a 3080.

[0]: https://peps.python.org/pep-0008/


Used 3090s are getting really cheap on the second-hand market. And if you only need VRAM, the Tesla M40 is even cheaper at 100€ per unit, with 24GB of VRAM.


The M40 does not support 4bit, so it's basically useless for LLMs.

The P40 24GB is only $200, supports 4bit, and is about 80% the speed of a 3090 (surprisingly) for LLM purposes.


> The M40 does not support 4bit, so it's basically useless for LLMs.

Good to know, thanks !


What's the catch?


No video output, because it's a data center card.


Will it distribute training across multiple GFX cards? I have a 4x 2080Ti box I would love to be able to use for this sort of thing.


Not for training with the webui: https://github.com/oobabooga/text-generation-webui/issues/11...

It does seem to work using alpaca-lora directly, though.


That's really interesting.

I guess it would do really well with world building lore type stuff, being able to go into great depth about the Brackens vs the Blackwoods but would struggle at the sort of subtext that even human readers may have missed (eg. who poisoned Tywin? and as you said, who are Jon Snow's parents?)


how much memory does the 7B training need?


~7.5GB - it'll be the same as running inference with a full context. That's for 4-bit quantization, the 8-bit quantization uses more RAM than my 3080 has...


I wonder how much it would take to train the 4-bit 13B.


About 15GB training it in the webui.

If you use https://github.com/johnsmith0031/alpaca_lora_4bit then 30B only needs 24GB, and works on a single 3090 or $200 P40.


You can find the guy who created it on reddit u/edwardjhu. I remember because he showed up in the Stable Diffusion Subreddit. https://www.reddit.com/r/StableDiffusion/comments/1223y27/im...


If I understand correctly it is also shockingly simple, basically just the first figure in the paper: https://miro.medium.com/v2/resize:fit:730/1*D_i25E9dTd_5HMa4...

Train 2 matrices, add their product to the pretrained weights, and voila! Someone correct me if I'm wrong.
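(In the paper's notation, if I'm reading it right, the update is

    W_new = W_0 + B*A,  with A an r-by-d matrix (random init) and B a d-by-r matrix (zero init)

so the product B*A has the full d-by-d shape of the original weight, but only the 2*r*d adapter parameters ever get trained, which is tiny when r is much smaller than d.)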


Correct me if I am wrong: to use a LoRA fine-tuned model for inference, you would still need the original model plus the trained additional layers, right?

If we can perfect methods to fine-tune large models for specific tasks while reducing the overall model size, then they can fit onto more consumer-grade hardware for inference and can be broadly used. The objective is to prune unnecessary trivia and memorization artifacts from the model and leverage LLMs purely for interpreting natural language inputs.


> to use LORA fine-tuned model in inference you would still need the original model + trained additional layers, right?

You don't need additional layers. After training, the product of the two matrices is added to the original weights matrix, so the model size remains the same as the original during inference.


Yes you still require the original model weights to use LoRA layers. For many LLaMA based models you need to find the original weight yourself and then apply the LoRA diff on top of that.


I had to read the paper first, but yeah, that diagram is shockingly simple once you get it.

Some annotations:

- The labels in the orange boxes mean "A is initialized with random weights (in a Gaussian distribution), B is initialized with weights set to zero".

- d is the number of values of the layer's input and output. (The width of the input and output vectors, if you will.)

- r is the number of "intermediary values" between A and B. It's expected to be a lot smaller than d, hence "Low Rank" (apparently LoRA even works with r = 3 or so), but it can be equal to d, though you lose some of the perf benefits.
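To make that concrete, here's a toy version in PyTorch - just the shapes and inits from the figure, not a real training setup:

    import torch

    d, r = 4096, 8                  # layer width vs. low rank (r << d)
    W0 = torch.randn(d, d)          # pretrained weight, kept frozen
    A  = torch.randn(r, d) * 0.01   # trainable, gaussian init
    B  = torch.zeros(d, r)          # trainable, zero init, so the update starts at zero

    def forward(x):                 # x has shape (batch, d)
        return x @ (W0 + B @ A).T   # only A and B would receive gradients during fine-tuning

    # after training, the adapter can be folded back in for inference:
    # W_merged = W0 + B @ A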


We need to scrape the entire corpus of /r/ASOIAF so it can come up with wild theories about how Tyrion is a secret Targaryen and confirm Benjen == Daario once and for all.


I wholeheartedly second this. This article seems to me to be one important, small piece of text to read. It might very well end up somewhere in a history book someday.


This gets attention due to being a leak, but it’s still just one Googler’s opinion and it has signs of being overstated for rhetorical effect.

In particular, demos aren’t the same as products. Running a demo on one person’s phone is an important milestone, but if the device overheats and/or gets throttled then it’s not really something you’d want to run on your phone.

It’s easy to claim that a problem is “solved” with a link to a demo when actually there’s more to do. People can link to projects they didn’t actually investigate. They can claim “parity” because they tried one thing and were impressed. Figuring out if something works well takes more effort. Could you write a product review, or did you just hear about it, or try it once?

I haven’t investigated most projects either so I don’t know, but consider that things may not be moving quite as fast as demo-based hype indicates.


It comes across as something from an open source enthusiast outside Google. Note the complete lack of references to monetization. Also, there's no sense of how this fits with other Google products. Given a chat engine, what do you do with it? Integrate it with search? With Gmail? With Google Docs? LLMs by themselves are fun, but their use will be as components of larger systems.


Yeah, but there are open source enthusiasts inside Google, too. People don’t necessarily change their opinions much when they start working at Google.


Honestly, it kinda fits my perception of Google, which is that they don't really have good business sense for creating new products - they invent first, find applications after. This 'leak' feels like it has that same kind of perspective.


I mean, all this talk about "moats" is directly tied to monetization. The leaker is saying "no matter what product we build with AI, equivalent open-source products will pop up free of charge, so we won't be able to charge for our product".

And while integrating a LLM into, say, Google Docs can be a selling point, it's not going to be a moat if OSS developers have access to their own LLMs; end users are going to choose Google Docs over FoobarOffice Online™ because Google Docs has a slightly better auto-complete or whatever.

So even if Google decides to integrate their LLM into Docs, it's not clear that they wouldn't benefit from open-sourcing that LLM and encouraging people to experiment on it.


Inside or out, it sounds like someone with an agenda to convince Google to release its model in the wild. I feel that all the more so because it is never stated explicitly but it's the obvious conclusion from reading between the lines. Things like hinting that Meta is a huge winner from LLaMa getting released (this isn't obvious to me at all).

The pitch being that if Google makes its models public it can race back to the forefront of "owning" the AI space and then capture the value of owning the underlying platform, like Android and Chrome.

The kind of scenario I imagine is that this is an insider who wants out but a huge amount of their work / investment / value is tied up with models they can't take with them.


Having enough scale to perpetually offer free/low-cost compute is a moat. The primary reason ChatGPT went viral in the first place was that it was free, with no restrictions. Back in 2019, GPT-2 1.5B was made freely accessible by a single developer via the TalkToTransformers website, which was the first time many people were talking about AI text generation... then the owner got hit with sticker shock from the GPU compute needed to scale.

AI text generation competitors like Cohere and Anthropic will never be able to compete with Microsoft/Google/Amazon on marginal cost.


And ChatGPT has a super low barrier to entry while open source alternatives have a high one.

Creating a service that can compete with it in that regard implies you can scale GPU farms in a cost-effective way.

It's not as easy as it sounds.

Meanwhile, OpenAI still improves their product very fast, and unlike Google, it's their only one. It's their baby. It has their entire focus.

Since for most consumers AI == ChatGPT, they have the best market share right now, which means the most user feedback to improve their product. Which they do, at a fast pace.

They also understand that to get mass adoption they need to censor the AI, the way McDonald's and Disney craft their family-friendly image. Which irritates every geek, including me, but makes commercial sense.

Plus, despite the fact that you can torrent movies and watch them with VLC, and that Amazon+Disney are competitors, Netflix exists. Having a quality service has value in itself.

I would not count OpenAI as dead as a lot of people seem to desperately want it to be. Just because Google missed the AI train doesn't mean that wishful thinking about FOSS killing the market is going to make it so.

As usual with these things, it's impossible to know in advance what's going to happen, but the odds don't disfavor ChatGPT as much as this article says.


> Having enough scale to perpetually offer free/low-cost compute is a moat.

It's a moat for services, not models, and it's only a moat for AI services as long as that compute isn't hobbled by being used for models so inefficient compared to SOTA as to waste the advantage - which underlines why leaning into open source the way this piece urges is in Google's interests, the same way open source has worked to Google's and Amazon's benefit as service providers in other domains.

(Not so much “the ability to offer free/low-cost compute” as “the advantages of scale, and the existing need for widely geographically dispersed compute, on the cost of both marginal compute and of having marginal compute close to the customer where that is relevant” - but those are pretty close to differently-focused rephrasings of the same underlying reality.)


That's what a lot of people think until they run Vicuna 13B or equivalent. We're just 5 months into this; there will be many leaps.


What makes you think OpenAI won't look at the FOSS improvements, include them in their tech, and make their GPU farm way cheaper, rendering their service even more competitive?

Not to mention it's easy to run Stable Diffusion, but Midjourney is still a good business. I can run SD on my laptop, yet I still pay for Midjourney because it's convenient, the out-of-the-box experience is better than any competition, and it keeps improving.


The reason why proprietary software ever had a moat simply comes down to: software startups could dump investment capital onto the development process and achieve results much faster, with better user interfaces, allowing them to achieve path dependence in their customer base. Thus we had a few big application verticals that were ultimately won by MS Office, Adobe Photoshop, etc.

If the result here is as marginal as it seems - a few months of advantage in output quality and a slightly more sleek UI - the capital-intensive play doesn't work. The featuresets that industrial users want the most depend on having more control over the stack, not on UI or output quality. The open source models are stepping up to this goal of "cheap and custom". Casual users can play with the open models without much difficulty either, provided they take a few hours to work through an installation tutorial - UI isn't a major advantage when the whole point is that it's a magic black box.


> Casual users can play with the open models without much difficulty either, provided they take a few hours to work through an installation tutorial

That can be quite a barrier to entry for non-power-users. I wouldn't underestimate serving casual users, considering that the alternative is OSS, i.e. giving your shit away for free.


That's like saying that Apple and MS can look into Linux and steal ideas. Yes, they can do that, but it doesn't make Linux any less useful. If anything, they learned to contribute back to the common pile, because everyone benefits from it. It would be a problem if this were a one-way relationship, which it doesn't seem to be. If open source is making them more money, why kill it?


You are making my point: Linux, Mac and Windows coexist, despite the overwhelming strength of open source, and the proprietary platforms are quite profitable.


But the point is not to kill commercial software, because then OSS would die too, since people would have to find other jobs.


I mean, read the article - the author is concerned about exactly that, and wants Google to open source more so it's not just Facebook's LLaMA that the open source community builds on.


Yes there will, that's the problem HN User Minimaxir is talking about.

It will only get less and less expensive for Microsoft in terms of cost. And more and more effective for Microsoft in terms of results delivered.

How do you compete with free? That's the question. The previous internet experience has already shown us that "also be free" is not really a sustainable or even effective answer. You have to be better in some fundamental dimension.


Charity is only a moat if it’s not profitable.


This is the timeline that's scaring the shit out of them:

Feb 24, 2023: Meta launches LLaMA, a relatively small, open-source AI model.

March 3, 2023: LLaMA is leaked to the public, spurring rapid innovation.

March 12, 2023: Artem Andreenko runs LLaMA on a Raspberry Pi, inspiring minification efforts.

March 13, 2023: Stanford's Alpaca adds instruction tuning to LLaMA, enabling low-budget fine-tuning.

March 18, 2023: Georgi Gerganov's 4-bit quantization enables LLaMA to run on a MacBook CPU.

March 19, 2023: Vicuna, a 13B model, achieves "parity" with Bard at a $300 training cost.

March 25, 2023: Nomic introduces GPT4All, an ecosystem gathering models like Vicuna at a $100 training cost.

March 28, 2023: Cerebras trains an open-source GPT-3 architecture, making the community independent of LLaMA.

March 28, 2023: LLaMA-Adapter achieves SOTA multimodal ScienceQA with 1.2M learnable parameters.

April 3, 2023: Berkeley's Koala dialogue model rivals ChatGPT in user preference at a $100 training cost.

April 15, 2023: Open Assistant releases an open-source RLHF model and dataset, making alignment more accessible.


This really ought to mention https://github.com/oobabooga/text-generation-webui, which was the first popular UI for LLaMA, and remains one for anyone who runs it on GPU. It is also where GPTQ 4-bit quantization was first enabled in a LLaMA-based chatbot; llama.cpp picked it up later.


this doesn't even include the stuff around agents and/or langchain


The post mentions that they consider "Responsible Release" to be an unsolved hard problem internally. It's possible that they are culturally blind to agents.


They're basically saying that Pandora's Box, assuming it exists, has already been opened. Even if OpenAI, Facebook AI Research and Google DeepMind all shut down tomorrow, research capable of producing agents will continue worldwide.


Interesting! It's as if nothing has happened in the field for the last three weeks, heh.


OpenLLaMA came out last week, I think.


The doc was written a while ago.


There's "immediately profitable" and "eventually profitable". Vast compute scale allows collection of customer generated data so the latter is possible, AI as of yet is not the former.

So GP point still stands. FAAMG can run much larger immediate deficits in order to corner the market on the eventual profitability of AI.


All this talk that every investment pays off in the end is faulty and dangerous. Many investments don't pan out - 95% of the firms you see in the ticker this decade might be gone - and yet everyone is very confident in underwriting these "losses for future gains". Really, though, it's economies of scale: it doesn't cost MSFT much more to run the GPU than it did to turn it on in the first place.


The amount of valuable data generated from professionals using these services to work through their problems and find solutions to industry problems is immense. It essentially gives these companies the keys to automating many industries by just...letting people try and make their jobs easier and collecting all data.


> . FAAMG can run much larger immediate deficits in order to corner the market on the eventual profitability of AI.

This assumes that there is a corner-able market. Previously, the cost of training was the moat. That appears to have been more of a puddle under the gate than an actual moat.


In other words, engage in anti-competitive behavior.


It seems the plan is to be a loss leader until scale is sufficient to reach near AGI levels of capability.


There was some indication recently that OpenAI was spending over $500k/day to keep it running. Not sure how long that's going to last. AGI is still a pipe dream. Sooner or later, they're going to have to make money.


Oh no, they're going belly up in 20,000 days! (i.e. $10B / 500k) Compute is going to keep getting cheaper and they're going to keep optimizing it to reduce how much compute it needs. I'm more curious about their next steps rather than how they're going to keep the lights on for ChatGPT.


Assuming you're talking about the free ChatGPT product, it's important to consider the value of the training data that users are giving them.

Beyond that, they are making a lot of money from their enterprise offerings (API products, custom partnerships, etc.) with more to come soon, like ChatGPT for Business.


I know there are use cases out there, so it's not a dig. I'm curious how many enterprises are actually spending money with OpenAI right now to do internal development. Have they released any figures?


$500k/day for a large tech company is absolutely peanuts. OpenAI could probably even get away with justifying $5M/day right now.


> AI text generation competitors like Cohere and Anthropic will never be able to compete with Microsoft/Google/Amazon on marginal cost.

Anthropic already does, with its models. They are the same price as OpenAI or cheaper, with comparable quality.

> Having enough scale to perpetually offer free/low-cost compute is a moat.

Rather than a moat, it is a growth strategy. At some point you need to start to monetize, and that is the moment when the rubber hits the road. If you can survive monetization and continue to grow, then you have a moat.


A good example of this is Youtube


Investors are obsessed with moats, but people have to realize that the entire world runs on businesses that have no moats.

There are no moats to being a plumber, a baker, a restaurant...

The moat concept is predominant because the idea that everything must make billions has infected the debate about businesses.

It's all about being a unicorn, a giant, a monopoly, making everybody at the top billionaires, and it's as if there is no other way to live.

Except that's not how most people do live, even entrepreneurs.

Even Apple, which today is the typical example of a business with a moat, didn't start with "we can't get into this computer business, we'd have no moat".

They have a moat now, but it's a consequence of all the business decisions and the things they built over many decades.

They didn't start their project with the moat. They started their project by providing value and marketing it.


> Investors are obsessed with moats

You can't blame them: gratuitous moats (like those provided by winner-takes-all dynamics) are not common in a functioning (competitive) economy so they get to be revered.

It feels unlikely that the recent period of big tech can keep the same benefits going forward. It was basically a political moat: counting on the ongoing lack of antitrust and consumer protection regulation. Even if the political dysfunction that allows that continues (quite likely), the wheels of the universe are turning.

The "leaked" report focuses on open source - a mode of producing software that is bound to become a major disruptor. We tend to discount open source because of its humble beginnings, long incubation, many false dawns and difficult business models. But if you objectively take a look at what is possible today with open source software, its quite breathtaking. I would not discount some tectonic shifts in adoption. The long running joke is "the year of the linux desktop", but keep adding open source AI and other related functionality and at some point the value proposition of open source computing (both for individuals and enterprises) will be crushingly large to ignore.

Don't forget too, that other force of human nature: geopolitics (e.g., think TikTok and friends). The current "moats" were established during an earlier, more innocent era. Now digitization is a top priority / concern for many countries. The idea that somebody can build a long-lived AI moat given the stakes is strange to say the least.


> There are no moats to being a plumber, a baker, a restaurant...

This line is interesting to me, because actually I think there _is_ a major moat there: locality. I don't disagree with the rest of your comment, but for those examples specifically a lot of the value of specific instances of those business comes from their being in your neighborhood. If I live in Toronto, I'm not going to fly a plumber from Manhattan to fix my pipes; if I want a loaf of sourdough, I'm not going to get it from San Francisco, I'm going to get it from the bakery around the corner; I might travel out of town for a particularly unique and amazing restaurant, but not every week, I've got solid enough options within a ten minute drive. Software is different because that physical accessibility hurdle doesn't exist.

Rest of this is spot-on though


What you're describing is less a statement on moats and more a statement on markets. Plumbers in one location share the market (the potential clients in that area), and as the parent comment states, there is no moat in that given market. A moat is a barrier to compete within a given market. So if something made it really difficult for a new plumber to serve an already-served clientele, that would be a moat. But individuals on the other side of the planet are by physical encumbrance not actually clientele... they're not even in the market.


Shareholders gain pennies with moats, pennies that someone else does not earn. Without moats they benefit much more, but it's not more than someone else. And that's the contentious issue. How would I benefit more than my neighbor?


No reason to invest loads of capital unless you're building a moat.


Yes, but OpenAI is not an investor; it's the company. They don't have to follow this logic - they can build something without thinking of the moat, and succeed anyway.

The moat is a priority for rent seekers, but most successful builders didn't start with it.

Coca-Cola, McDonald's, Gillette and all of Buffett's favorite children didn't grow by thinking moat first. The moat was built along the way, sometimes very late.


I have been toying around with Stable Diffusion for a while now and becoming comfortable with the enormous community filled with textual inversions, LoRAs, hypernetworks and checkpoints. You can get things with names like “chill blend”, a fine-tuned model on top of SD with the author’s personal style.

There is something called automatic1111 which is a pretty comprehensive web UI for managing all these moving parts. Filled to the brim with extensions to handle AI upscaling, inpainting, outpainting, etc.

One of these is ControlNet where you can generate new images based on pose info extracted from an existing image or edited by yourself in the web based 3d editor (integrated, of course). Not just pose but depth maps, etc. All with a few clicks.

The level of detail and sheer amount of stuff is ridiculous and it all has meaning and substantial impact on the end result. I have not even talked about the prompting. You can do stuff like [cow:dog:.25] where the generator will start with a cow and then switch over at 25% of the process to a dog. You can use parens like ((sunglasses)) to focus extra hard on that concept.
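So a full prompt might look something like this (a made-up example, with the syntax as I understand A1111's docs):

    a portrait of [a cow:a dog:0.25] wearing ((sunglasses)), studio lighting, 85mm photo

where the first 25% of the sampling steps follow the cow before switching over to the dog, and the doubled parens push extra attention onto the sunglasses.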

There are so called LoRAs trained on specific styles and/or characters. These are usually like 5-100MB and work unreasonably well.

You can switch over to the base model easily and the original SD results are 80s arcade game vs GTA5. This stuff has been around for like a year. This is ridiculous.

LLMs are enormously “undertooled”. Give it a year or so.

My point by the way is that any quality issues in the open source models will be fixed and then some.


Local LLMs already have a UI intentionally similar to AUTOMATIC1111, including LoRAs, training with checkpoints, various extensions including multimodal and experimental long-term memory etc.

https://github.com/oobabooga/text-generation-webui


Excellent! Impossible to keep up.


I wrote a whole gist about this exact thing!!!!

https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...


I'm a bit sceptical of the "no moat" proposition because (a) ChatGPT 4.0 really does seem in a different league and (b) it's clearly very hard to run. I haven't seen anything from the explosion of open source / community efforts that comes close for general applications.

The take in the post rings of the classic trademark Google arrogance where they assume that if somebody else can do it, they can do it better if they just try - where the challenge of "just trying" is discounted to zero. In reality, "just trying" is massively important, and sometimes all that is important. The gap between unrefined model output and the level of polish and refinement apparent in ChatGPT 4 may appear technically small, but it's the whole difference between a widely applicable, usable product and something that can't be more than a toy. I'm not sure Google has it in it any more to really fight to achieve that level of polish.


Just wait a few months. You are underestimating the thousands of researchers and engineers working on nothing but this, with enormous compute budgets, at several companies.


Version 4 also now supports 32k tokens - good luck handling that even on an awesome local gaming dev rig, although with Linformer-style ideas, block-wise algorithms for data larger than GPU memory, and universal memory / RDMA it's entirely doable. I got a 50,000-atom simulation running back in 2018 on 11GB of VRAM at 32-bit floats; the software stack has come a long way since, and now we have the 24GB 4090 with bfloat16, vector DBs, and the infinite-context transformer paper that just came out (so models ought to be retrained on that if the method is truly superior anyway). Not sure how atoms translate to pages of text, but it's almost surely possible to make a pretty useful LLM.

Although, OpenAI has a massive moat named “data”


Except that the models have been trained on publicly available data sources.


Great read, but I don't agree with all of these points. OpenAI's technological moat is not necessarily meaningful in a context where the average consumer is starting to recognize ChatGPT as a brand name.

Furthermore, models which fine-tune LLMs are still dependent on the base model's quality. Having a much higher quality base model is still a competitive advantage in scenarios where generalizability is an important aspect of the use case.

Thus far, Google has failed to integrate LLMs into their products in a way that adds value. But they do have advantages which could be used to gain a competitive lead:

- Their crawling infrastructure could allow them to generate better training datasets and update models more quickly.

- Their TPU hardware could allow them to train and fine-tune models more quickly.

- Their excellent research divisions could give them a head start with novel architectures.

If Google utilizes those advantages, they could develop a moat in the future. OpenAI has access to great researchers, and good crawl data through Bing, but it seems plausible to me that 2 or 3 companies in this space could develop sizeable moats which smaller competitors can't overcome.


Consumers recognizing ChatGPT might just end up like vacuum cleaners; at least in the UK, people will often just call it a "hoover" but the likelihood of it being a Hoover is low.

It is difficult to see where the moat might exist if it's not data and the majority of the workings are published / discoverable. I don't think the document identifies a readily working strategy to defend against the threats it recognises.


> end up like vacuum cleaners

The term of art is Generic Trademark

https://en.m.wikipedia.org/wiki/Generic_trademark

In US common law (and I'd imagine UK too), it's usually something companies want to avoid if at all possible.

Relevant case for Google itself: https://www.intepat.com/blog/is-google-a-generic-trademark/


See also the “Don’t Say Velcro” [1] campaign from the eponymous hook and loop fastener company.

[1] https://m.youtube.com/watch?v=rRi8LptvFZY


This reminds me of Lego's constant campaign about "don't call them Legos" that was similar and it always made me think the Lego company is very pretentious and I avoid them. I don't think that was their desired effect.

https://www.adrants.com/2005/09/lego-gets-pissy-about-brand-...


Well, except that there's no evidence that OpenAI are using the name in a trademark sense, let alone registered it?

Can't really genericise that which was never made specific...


> Well, except that there's no evidence that OpenAI are using the name in a trademark sense, let alone registered it?

https://tsdr.uspto.gov/#caseNumber=97733261&caseType=SERIAL_...


I'll also mark myself as skeptical of the brand-as-moat. I think AskJeeves and especially Yahoo probably had more brand recognition just before Google took over than ChatGPT or openai has today.


> ChatGPT as a brand name

You're forgetting the phenomenon of the fast follower or second to market effect. Hydrox and Oreos, Newton and Palm, MySpace and Facebook, etc. Just because you created the market doesn't necessarily mean you will own it long term. Competitors often respond better to customer demand and are more willing to innovate since they have nothing to lose.


> in a context where the average consumer is starting to recognize ChatGPT as a brand name.

That brand recognition could hurt them, though. If the widespread use of LLMs results in severe economic disruption due to unemployment, ChatGPT (and therefore OpenAI) will get the majority of the ire even for the effects of their competition.


> context where the average consumer is starting to recognize ChatGPT as a brand name.

Zoom was once that brand name which was equated to a product. Now, people might say "Zoom call", but may use Teams or Meet or whatever. Similarly, people call a lot of robot vacuum cleaners Roombas, even though they might be some other brand.

Brand recognition is one thing, but the actual product used will always depend on what their employer uses, what their mobile OS might use, or what API their products might use.

For businesses, a lot will be about the cost and performance vs "the best available".


FWIW I posted Simon's summary because it's what I encountered first, but here's the leaked document itself[0].

Some snippets for folks who came just for the comments:

> While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.

> A tremendous outpouring of innovation followed, with just days between major developments (see The Timeline for the full breakdown). Here we are, barely a month later, and there are variants with instruction tuning, quantization, quality improvements, human evals, multimodality, RLHF, etc. etc. many of which build on each other.

> This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?

> Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet’s worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

> And in the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.

[0]: https://www.semianalysis.com/p/google-we-have-no-moat-and-ne...


Seems like open source is the real winner overall. After OpenAI became basically ClosedAI, that's excellent news.


I'm not sure? Placing ethics constraints on a company under a capitalist system is hard. Placing them on open source is impossible.


I have real trouble taking "AI ethics" seriously when the biggest danger seems to be offending people at mass scale... Sounds like a win as well.


Whose ethics?


This is so indicative of Google culture missing the point. The idea of spending $10M training a single model is treated as a casual reality. But “tHaNk GoOdNeSs those generous open source people published their HiGh QuAlItY datasets of ten thousand examples each. Otherwise we’d have no way of creating datasets like that…” :| the sustainable competitive advantage has been and will continue to be HUGE PROPRIETARY DATASETS. (Duh - this is as true for new AI as it was for old AI = ad targeting). It was the _query+click pairs_ that kept Google dominant all these years, not the brilliant engineers. They had all of humanity labeling the entire internet with “when I click on this page/ad for this query I do/don’t search again” a billion+ times a day for a decade. For good measure they’ve also been collecting your email, your calendar, and your browsing habits for nearly as long. The fact that they’ve managed to erase that historic advantage from their collective consciousness (presumably because AI researchers would rather not spend time debugging data labeling UI) is strange to me. It at least deserves a mention in a strategy memo like this. Not vague platitudes about “influence through innovation.” Spend that $10M you were going to spend on a training run as $9.9999M on a private dataset, then the remaining $100 on training. Better still, build products that gets user behavior to train your models for you. Obviously.

We’re going to watch the biggest face plant in recent economic history if they can’t get this one together. I can’t decide if that makes me happy about an overdue changing of the guard in the Valley or sad about the fall of a once great company.

It’s not about the models! Model training is a commodity! It’s about the data! Come on guys.


One way to push back on the data argument is to consider the progress DeepMind made with self-play. Perhaps Bard can self-dialogue and achieve superhuman results. I won't be surprised. Plus the underlying architecture is dense; sparse transformers are a major upgrade, and that's only one of many upgrades you can make. There is still a lot of headroom, and IMHO GPT-4 already implements AGI if you give it the right context.


Self-play works for e.g. Go because there's a perfect simulator of the game - which gives at least one very clear training signal: winning. There's no simulator for conversation, no winning, and no training signal. Self-dialogue doesn't make any sense.


I could think of ways you could make a good attempt at it. You could have a generator/discriminator relationship where you have a model whose purpose is to evaluate model outputs for things other than just toxicity (basically RLHF for capabilities), then use that to train. You could have code generation tasks where the code is actually executed and a success/failure signal sent based on code performance. You could do logical puzzle generator/logical puzzle solver pairs and have a separate system evaluate answer correctness based on a human dataset baseline, or maybe a model programmed to be able to turn natural language logic puzzles into API calls to a formal logical engine solver to get the answer for comparison. You could make a simple program to turn randomly generated mathematical problems into word problems, use an AI to add extraneous detail while protecting the core problem description, then give the resulting word problems to an AI and use a separate AI to extract out the final answer or conclusion. Then compare that answer to what the calculator says for the original math problem.

All of those have problems and would be very compute-expensive, plus there's the limitation I struggle to see a way around: if you're using a model to train another model, you maybe can't get better than that model. But I think we could build architectures which provide large labelled training datasets to LLMs for any problem that can be deterministically solved by us using more traditional computing methods, like maths and some logic puzzles. Maybe if we use those datasets we can make it so that LLMs are able to do maths and difficult logic problems natively, and maybe the internal machinery they develop to do that could help them in other areas. Would be a fun research project.
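As a rough illustration of the "calculator as ground truth" idea above, here's a minimal sketch; `ask_model` is a placeholder for whatever inference call you'd use, and the word-problem template and answer extraction are deliberately naive:

    import random
    import re

    def make_problem():
        """Generate a random arithmetic problem wrapped in a trivial word-problem template."""
        a, b = random.randint(2, 50), random.randint(2, 50)
        question = f"Alice has {a} apples and buys {b} more. How many apples does she have now?"
        return question, a + b  # the 'calculator' answer is just ordinary arithmetic

    def extract_answer(model_output):
        """Pull the last integer out of the model's free-form reply."""
        numbers = re.findall(r"-?\d+", model_output)
        return int(numbers[-1]) if numbers else None

    def build_labelled_dataset(ask_model, n=1000):
        """Build (question, model_reply, is_correct) triples.

        `ask_model` is whatever callable sends a prompt to the model being evaluated;
        the training signal comes from comparing its answer to the known result.
        """
        dataset = []
        for _ in range(n):
            question, truth = make_problem()
            reply = ask_model(question)
            dataset.append({
                "question": question,
                "reply": reply,
                "correct": extract_answer(reply) == truth,
            })
        return dataset

The point is only that anything deterministically checkable (arithmetic, executable code, formal logic) can label model outputs automatically; how well that signal transfers to harder tasks is the open question.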


This looks like a personal manifesto from an engineer who doesn't even attempt to write it on behalf of Google? The title is significantly misleading.


Agree, misleading title. The introduction makes the context clear, but it probably comes too late to keep the article from being click-bait.

> [...] It originates from a researcher within Google. [...] The document is only the opinion of a Google employee, not the entire firm. [...]


Completely agree. It is interesting, but the gravitas is of course lower than if an executive had said this and corroborated it. I do feel that open source for AI is going to be really interesting and shake things up.


99% of media coverage like “Tech employee/company says <provocative or controversial thing>” is exactly like that.


And (probably) through no fault of their own, they'll get totally thrown under the bus for this--whether directly, or just when raises/promotions come around.


This is easily among the highest-quality articles/comments I've read in the past weeks, perhaps months (on LLMs/AI, since that's what I am particularly interested in). And this was for internal consumption before it was made public. It reinforces my recent impression that so much of what's being made for public consumption now is shallow, and it is hard to find the good stuff. And sadly, increasingly so even on HN. As I write this, I acknowledge I discovered this on HN :) Wish we had ways to incentivize the public sharing of such high-quality content that don't die at the altar of micro rewards.


I've been saying the same things for weeks, right here and in the usual places. Basically - OpenAI will not be able to continue to commercialise chatGPT-3.5, they will have to move to GPT-4 because the open source alternatives will catch up. Their island of exclusivity is shrinking fast. In a few months nobody will want to pay for GPT-4 either when they can have private, cheap equivalents. So GPT-5 it is for OpenAI.

But the bulk of the tasks can probably be solved at the 3.5 level, and another more difficult chunk with 4; I'm wondering how many of the requests will be so complex as to require GPT-5. Probably less than 1%.

There's a significant distinction between web search and generative AI. You can't download "a Google" but you can download "a LLaMA". This marks the end of the centralisation era and an increase in user freedom. Engaging in chat and image generation without being tracked is now possible, while searching, browsing the web, or torrenting are still tracked.


> I've been saying the same things for weeks, right here and in the usual places. Basically - OpenAI will not be able to continue to commercialise chatGPT-3.5, they will have to move to GPT-4 because the open source alternatives will catch up. Their island of exclusivity is shrinking fast. In a few months nobody will want to pay for GPT-4 either when they can have private, cheap equivalents. So GPT-5 it is for OpenAI.

It is worth $20 a month to have one UI on one service that does everything.

Unless specialized models can far exceed what GPT4 can do, being general purpose is amazing.

IMHO the future is APIs written for consumption by LLMs, and then natural language interfaces and just telling an AI literally anything you want done.


>It is worth $20 a month to have one UI on one service that does everything.

Competition will drive profit margins and prices down to nothing because the number of companies that can spin up a UI is unlimited. Markets don't pay you what something is worth, they pay what the cheapest participant is willing to sell it for.


>Markets don't pay you what something is worth, they pay what the cheapest participant is willing to sell it for.

I believe 'what something is worth' is defined as what the market is willing to pay.

And sometimes the customer will pay for something that isn't the cheapest of something, which is why I'm writing this on a mac.


> competition will drive profit margins and prices down to nothing

I strongly suspect the profit margin on ChatGPT is already pretty low!

> Markets don't pay you what something is worth, they pay what the cheapest participant is willing to sell it for.

Correction: Markets pay what companies are able to convince consumers to pay. Some products bring negative value to the buyer, but are still sold for hundreds of millions of dollars (see: enterprise sales and integrations, which oftentimes fail).


That last argument is a tautology btw


I'm paying but hate the UI. I had to add labels myself as a Tampermonkey extension, but it would be much better if they would give API access to what I'm paying for and let UIs compete.


> It is worth $20 a month to have one UI on one service that does everything.

Today it is. When there is an open source, capable “one UI for everything” that runs locally and can consume external services as needed (but keeps your data locally otherwise), will it still be?


You can't train ChatGPT with your own data and it has the infamous "As a language model..." problem. This is why an alternative that can be run locally is a better option for many people.


> I've been saying the same things for weeks, right here and in the usual places. Basically - OpenAI will not be able to continue to commercialise chatGPT-3.5, they will have to move to GPT-4 because the open source alternatives will catch up. Their island of exclusivity is shrinking fast. In a few months nobody will want to pay for GPT-4 either when they can have private, cheap equivalents. So GPT-5 it is for OpenAI.

I wonder if this effect will be compounded by regulatory pressure that seems poised to slow down progress at the bleeding edge of LLMs.

Open source closing the gap at the bottom, and governments restricting further movement at the top...


> I'm wondering how many of the requests will be so complex as to require GPT-5

I am not sure the pessimism is warranted. True that few people have the need to upgrade from GPT-3.5 to GPT-4 now, but if GPT-5 is another serious leap in capabilities, it might have an effect closer to the difference between old chatbots (useless, interesting) and ChatGPT (immediate economic impact, transforming some jobs). Or at any rate, we should expect such a leap to occur soon, even if it's not GPT-5.


Also significant to note that much of this AI boom was due to the UI of ChatGPT that gave everyone easy access to the model. Perhaps much of the improvements to be had in GPT-5 will also be found in the UI. I mean UI in the broadest possible sense, I'm sure we'll come up with very creative ways to interact with this over the coming years.

But the moat problem addressed in the article remains. Good luck patenting your amazing UI change in such a way that open source models can't catch up within a few weeks.


I also would like to believe that, but there are countless examples which show the difference. Companies have no time to figure out which of the open source offerings is the best. Even worse, they don’t have the time to switch from one project to the other or back to OpenAI if OpenAI releases a new state-of-the-art model.


And where are these open source models where I can go to a url and do all the things I can do in ChatGPT or through api keys for OpenAI? I googled a couple of weeks ago to find hosted versions of these open source models to try, and every one was either down or woefully poor.

OpenAI and MS are going to win because they have a package to go and it’s ready and available and working well - they have set the benchmark. I’m not seeing any evidence of this in the OSS community thus far.

Until I can spin up a docker image capable of the same as OpenAI in hetzner for 30 bucks a month - it’s not in the same league.


>Until I can spin up a docker image capable of the same as OpenAI in hetzner for 30 bucks a month

I do exactly this with https://github.com/nsarrazin/serge

Hetzner will install any hardware you send them for $100. So you can send them a $200 P40 24GB to run 33B parameter GPU models at ChatGPT speeds without increasing your monthly cost.


That $200 card's price seems to have been hit hard by inflation in Finland [1]

[1] https://www.proshop.fi/Naeytoenohjaimet/HP-Tesla-P40-24GB-GD...



One issue with the current generation of open source models is that most have been based on some LLaMA core architecture, and that's not licensed for commercial use. Once you get to the point of spinning up a full and easy API and selling API credentials, you're running into the commercial-use clause. Once we have a LLaMA alternative (or a more permissively licensed separate architecture), I guarantee hosting providers like Render or Modal are going to come in with an API offering. Just waiting on those core models to improve licensing, would be my guess.


> Until I can spin up a docker image capable of the same as OpenAI in hetzner for 30 bucks a month - it’s not in the same league.

Yes, you are right

That’s irrelevant to the point of this, which is about the dynamics of the market over a longer window than “what is available to use immediately today”, because a “moat” is a different thing than “a current lead”.


Most HN submissions are clickbait advertisements by startups for B2B/B2C services, clickbait amateur blog editorials looking for subscribers, tutorials for newbies, conspiracy theories, spam, and literally every article posted to a major media outlet. Most comments are by amateurs who sound really confident.

Don't believe me? Go look at https://news.ycombinator.com/newest . Maybe once a month you find something on here that is actually from an expert who knows what they're talking about and hasn't written a book on it yet, or a pet project by an incredibly talented person who has no idea it was submitted.

Source: I've been here for 14 years. That makes me a little depressed...


Well yes, generally in the business world all the "good stuff", the really smart analysis, is extremely confidential. Really smart people are putting these things together, but these types of analyses are a competitive advantage, so they're absolutely never going to share it publicly.

This was leaked, not intentionally made public.

And it all makes sense -- the people producing these types of business analyses are world-class experts in their fields (the business strategy not just the tech), and are paid handsomely for that.

The "regular stuff" people consume is written by journalists who are usually a bit more "jack of all trades master of none". A journalist might cover the entire consumer tech industry, not LLM's specifically. They can't produce this kind of analysis, nor should we expect them to.

Industry experts are extremely valuable for a reason, and they don't bother writing analyses for public media since it doesn't pay as well.


Beware that there’s also a ton of bias when something is analyzed internally. As Upton Sinclair once said “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

In the case of this analysis - it sounds great but it’s wrong. OpenAI has a huge moat. It has captured the mind share of the world. The software it has shipped is dramatically better than anything anyone else has shipped (the difference between useless and useful). We’ll see if folks catch up, but the race is currently OpenAI’s to lose.


Mind share is not a moat. And market share or being first is not a moat.

Moats are very specific things such as network effects, customers with sunk costs, etc. The very point of the term "moat" is to distinguish it from things like market share or mind share.

The article is correct, OpenAI has no moat currently.


What’s Google’s moat? Mind share and being dramatically better than the competition is indeed a moat. Trust me mind share is incredibly hard to gain in this day and age.


In AI? None (according to the article). For their search engine? The distribution deals they have with Apple & Android carriers for defaulting to them, and Chrome defaulting to them. If another search engine wanted to even release a product, they'd have to cross the distribution moat (possible on the web, just hard). For ads, the moat is their network of publishers. Competing ad marketplaces need to build a compelling publisher network to attract advertisers and compete for pixel space on publisher domains.


> Mind share and being dramatically better than the competition is indeed a moat.

That's literally the opposite of what "moat" means, so no. You can't just make up different definitions for accepted terms if you want to have a productive conversation with anyone.


Google, according to the article, has no moat either.


Agreed, I have some expertise in a couple software topics, and there is nowhere in public media that would pay me to write about it.

The only exception would be if my name were super recognizable/I had some legitimacy I could “sell” to publish something that did have commercial value, like some shitty CIO-targeted “article” about why XYZ is the future, in which case it’s not really going to be interesting content or actually sharing ideas.


Sort of like sports recruiters.


There are some really high-quality internal discussions at tech companies; unfortunately, they suffer from leaks due to their size, and the media have realized it's really easy to just take their internal content and publish it.

It really sucks because there’s definitely a chilling effect knowing any personal opinion expressed in text at a big tech company could end up in a headline like “GOOGLE SAYS <hot take>” because of a leak.

If there is some kind of really bad behavior being exposed, I think the role of the media is to help do that. But I don’t think their role should be to expose any leaked internal document they can get their hands on.


This is exactly that. This doc is apparently a leaked internal doc.


I know that, my point is that it’s not indicating anything nefarious enough to be worth exposing, it’s just juicy.

I don’t think the media should share stuff like this just because it’s interesting. They’re making a market for corporate espionage to sell clicks.


I thought this was a good one this week, but it didn't get popular.

https://huyenchip.com/2023/05/02/rlhf.html


Most of it is being written to make money off of you instead of communicate with you and it shows.


a lot of people have said similar things here


I'm a noob. But the time for Wikipedia language models & training models seems ripe.



If you feel like your criteria for quality is beyond what you can typically find in the popular public consumption, just start reading papers directly?


The value in this article is the business strategy perspective, not the details of LLMs.

You generally won't find papers detailing the present-moment business strategies of specific for-profit corporations.


Sure but this article is also not the present-moment business strategy, it is written by a single individual with a perspective.


Hi Sai, do you have an email address (or other preferred private message) I could contact you on? Feel free to send it to the relay email in my profile if you want to avoid putting it publicly (or reply here how to contact you).

I'll ask my first question here below, so that if you have an answer it can benefit other HNers, and I'll save the other line of thought for email.

Do you happen to have a list of other highest quality articles on AI/LLMs/etc that you've come across, and could share here?

It's not my field but something I want to learn more about, and I've found that hard: without knowing much about which specific subjects within AI would be good to learn about, it's hard to pick what to read and what not.


Really interesting to look at this from a product perspective. I've been obsessively looking at it from an AI user perspective, but instead of thinking of it as a "moat", I just keep thinking of the line from Disney's The Incredibles, "And when everyone is super, no one will be."

Every app that I might build utilizing AI is really just a window, or a wrapper into the model itself. Everything is easy to replicate. Why would anyone pay for my AI wrapper when they could just build THING themselves? Or just wait until GPT-{current+1} when the model can do THING directly, followed swiftly by free and open source models being able to do THING as well.


Just gotta get to the point where we can just ask the model to code the wrapper we want to use it with...


Any wrapper that needs writing speaks to a gap in the AI's current capabilities. I just don't see why or how I would put man-hours into trying to close that gap when a future model could eclipse my work at any time.


The problem is that I can't think of a reason not to apply this same concern to basically all knowledge work -- once it can code new wrappers, it'll probably also be obsoleting huge swathes of other skillsets. And given current models, that really doesn't seem that far off. Personally it kinda feels like working at a company with impending layoffs...

But somehow my landlord isn't taking that as an excuse to stop working?? B.S.


you've got future frostbite


Because people pay for convenience, and may not be technical enough to stay up to date on the latest and best AI company for their use case. Presumably your specialized app would switch to better AI instances for that use case as they come along in which case they're paying for your curation as well.


Maybe. It just seems to me that every single angle of AI has this same moat issue.

It's like the generation ship problem. Send a ship to the stars today, and before it gets there technology might advance such that the second ship we send gets there before the first.

How do you justify the capital necessary to stand out in an AI driven marketplace when the next models could make your business obsolete at any time?


I am amazed that people haven't gotten used to these "internal Google doc leaks".

This is just the opinion of some random googler, one among over 100,000.

For some reason random googlers seem to like writing random docs on hot topics and sharing them widely across the company. And someone, among those over 100,000 googlers, ends up "leaking" that person's opinion outside Google.

This is more like a blog post by some random dude on the Internet expressing his opinion. The fact that the random dude ended up working at Google should not bear much on evaluating the claims in the doc.

A website publishing that under the title "Google ..." is misleading. The accurate title would be "Some random googler: ..."


According to the article, it's a random AI researcher at Google, so fairly relevant.


Google has thousands of so called AI/ML "researchers".

The author has ZERO publications in top AI/ML conferences.


You know who wrote this?


yes, I do.


Well, I can't argue with that. I'm just going by the intro paragraph in the article.

If your argument is "I know this guy and I consider his opinion worthless" you might want to lead with that.


It is not worthless. But you have to evaluate the argument on its own, not act as if it is from some authority on the topic just because it is from some googler.


Oh, I see where you're coming from. Yes, quite right. I'm definitely not saying "Well, he must be right because he's a super smart Google researcher."


People are appealing to some kind of "authority" regarding these opinion docs from random dudes working at Google, when if they knew only the dude's name, rather than the fact that he works at Google, they would not.


You don't think a Google AI researcher is in a good position to comment on how Google is affected by recent developments in AI? I mean, yeah, it's an opinion, but it's not just anyone's opinion.


This is really some random dude. If the dude posted it on his personal blog HN wouldn't pay any attention to it, but because it is a "leak" of an "internal Google document" somehow it becomes more valuable than it really is.

The author is a mid-level software engineer without any publications at major AI conferences. So the appeal to authority here is really unfounded.


I've been using Stable Diffusion to generate cover images for the music I release & produce for others. It's a massive time saver compared to comping together the release art using image editing software, and a lot cheaper than working with artists, which just doesn't make sense financially as an independent musician.

It's a little bit difficult to get what you want out of the models, but I find them very useful! And while the output resolution might be quite low, things are improving & AI upscaling also helps a lot.


> artist whose domain has not yet been disrupted by AI fires artist in favor of AI


I'm already using AI to make my music production process more efficient! Namely, I'm using a program called Sononym which listens to my tens of thousands of audio samples and lets you search by audio similarity through the entire library, as well as sort by various sonic qualities.

I think I'd still go for a human artist for a bigger release such as an album! It's a lot less hassle than sorting through (often rubbish) AI output & engineering your prompts, though it does cost £££ which is the main thing making it prohibitive for single releases.


And the rest of the world was better off for it.

If we'd prevented new technologies from influencing our artwork, our paintings would never have left the cave wall. I'm a musician with live published albums as well; if there comes a time when I think AI will help with my creative process, you can bet that I'll be using it.


> And the rest of the world was better off for it.

Except the single mom in a studio apartment trying to get some pay from her art gigs.


"I think we should ban ATMs, online banking, and direct deposit so I can get work as a bank teller" said no one ever. (Well, maybe someone when these were new.)

Displaced workers need support to ensure they can weather these transitions, but it doesn't make sense to artificially create demand by fighting new conveniences. If we want to ensure people have money, the solution is to give them money, not give them money in exchange for busywork.


Are single moms a special class we should treat differently from others? You have single fathers... childless couples, singles, parents with kids who have a disability, healthcare workers, transgender singles, frail elders, mute single males...

Who should you protect?


Jobs come and go all the time. No one is special.


This line of thinking was shared by the original group who called themselves luddites.


> artist who can't afford to iterate his cover art ideas multiple times with a professional finds a creative solution


Everyone is getting disrupted by AI sooner or later.

The trick is to use AI to do things it would take you five lifetimes to learn. It's a tool to lower opportunity cost, financial capital, and human capital. That gives anyone leveraging it a much bigger platform, and the ability to dream big without resources.

If you can become your own "studio", you're the indie artist of the future. You don't need Disney or Universal Music backing.

Anyone can step up and do this. The artists being threatened can use these tools to do more than they've ever done by themselves.


I don't think that will do a whole lot to protect you from the economic harm. If everyone is producing more, the value of the works is reduced. At best, nobody will make more money; they'll just be working harder to stay in the same place. More likely, there will simply be no room in the market for as many people, and most will be out of work.


I disagree. The YouTube model has shown that multiple people can produce videos and still earn a profit from it. There are thousands of niches that creators can target which big studios don't even touch because the masses might not be interested in them.

We can have a Big Bang Theory without all the stupid romantic stuff. We can have different ending versions of Game of Thrones. So much stuff is never made because it takes so many resources to produce.

I think the market will only grow when this technology is available to everybody.


I genuinely hope that you're right and I'm wrong!


In the music industry that appears to have been the case since iTunes hit the scene. The ease of distribution has enabled countless artists that nobody has ever heard of and will never listen to. Yet some have risen up and become hits despite this.


The problem isn’t capabilities, it’s having a market that’s saturated with supply twice over - once by the ability to make infinite copies of the product, the other where there’s an infinite supply of distinct high-quality products.

Subcultures used to provide a counterbalancing force here, but they aren’t doing so well these days.


My interests still are not being catered to.

I watch films and media, listen to music, and I'm only truly fully satisfied a single digit number of times a year. That's a consequence of not enough being created and experimented with.

The long tail is longer than you can imagine, and that's what form fits to your personal interest graph.


I think from the perspective of a Google researcher/engineer, it must be alarming to see the crazy explosion going on w/ LLM development. We've gone from just one or two weirdos implementing papers (e.g. https://github.com/lucidrains?tab=repositories, who's amazing) to now an explosion where basically every dev and PhD student is hacking on neat new things and having a field day and "lapping" (e.g. productizing) what Google Research was previously holding back.

And we're also seeing amazing fine-tunes/distillations of very useful/capable smaller models - there's no denying that things have gotten better and more importantly, cheaper way faster than anyone expected. That being said, most of these are being trained with the help of GPT-4, and so far nothing I've seen being done publicly (and I've been spending a lot of time tracking these https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYp...) gets close in quality/capabilities to GPT-4.

I'm always rooting for the open source camp, but I think the flip-side is that there are still only a handful of organizations in the world that can train a >SoTA foundational model, and that having a mega-model is probably a huge force multiplier if you know how to take advantage of it (e.g., I can't imagine that OpenAI has been able to release software at the pace they have been without leveraging GPT-4 for co-development; also, can you distill or develop capable smaller models without a more capable foundational model to leverage?). Anthropic for example has recently taken the flip side of the "no moat" argument, arguing that there is a potential winner-take-all scenario where the lead may become insurmountable if one group gets too far ahead in the next couple years. I guess we'll just have to see, but my suspicion is that the crux of the "moat" question is going to be whether the open source approach can actually train a GPT-n++ system.


> Giant models are slowing us down. In the long run, the best models are the ones which can be iterated upon quickly. We should make small variants more than an afterthought, now that we know what is possible in the <20B parameter regime.

Maybe this is true for the median query/conversation that people are having with these agents - but it certainly has not been what I have observed in my experience in technical/research work.

GPT-4 is legitimately very useful. But any of the agents below that (including ChatGPT) cannot perform complex tasks up to snuff.


My understanding was that most of the current research effort was towards trimming and/or producing smaller models with the power of larger models; is that not true?


Doesn't mean the smaller models are anywhere close to the capabilities of GPT-4.


So I use ChatGPT every day. I like it a lot and it is useful but it is overhyped. Also from 3.5 to 4 the jump was nice but seemed relatively marginal to me.

I think the head start OpenAI has will vanish. Iteration will be slow and painful, giving Google or whoever more than enough time to catch up.

ChatGPT was a fantastic leap, getting us say 80% of the way to AGI, but as we have seen time and time again, the last 20% is excruciatingly slow and painful (see self-driving cars).


Personally, the difference between GPT-4 and 3.5 is pretty immense for what I am using it for. I can use GPT-3.5 alright for things like summarization tasks (as long as the text isn't too complex), reformatting, and other transformation-type tasks. I don't even bother using it for logical or programming tasks though.


One way that I've been framing this in my head (and in an application I'm building) is that gpt-3 will be useful for analytic tasks but gpt-4 will be required for synthetic tasks. I'm using "analytic" and "synthetic" in the same way as in this writeup https://github.com/williamcotton/empirical-philosophy/blob/m...


This is my experience too. While I'd really love the open source models to catch up, currently they struggle even with dead-simple summarization tasks: they hallucinate too much, or omit essential points. ChatGPT doesn't often hallucinate when summarizing, only when answering questions.


Would you please be more explicit? I'm curious about the relative strengths and weaknesses others see.


I can use GPT-4 to work through problems that I have not actually figured out previously by talking to co-workers who work in my industry. I need to feed it contacts for my industry explicitly within the prompt and ensure that it understands and doesn't hallucinate its answers. However, that doesn't mean it's not useful; it just means you need to understand the limitations.


I don't think I understand the process you are describing because it lowkey sounds like you are giving it information about your peers and using that as a basis to ask questions of chatGPT but get the benefit of the real people's perspectives?

Also I agree it can be super useful, its just that my own use of it is very limited (basically as the research AI I always wanted), so I am trying to broaden my perspective on what is possible


Heh, autocorrect messed up context for contacts. Sorry for that. I was talking about context for the problem I am working through. So it's not zero work to just "ask the AI" and get an answer...you need to know essentially what you are missing first, and ask relevant questions.


Can you provide an example?


> So I use ChatGPT every day. I like it a lot and it is useful but it is overhyped.

It is incorrectly hyped. The vision most pundits have is horribly wrong. It is like people who thought librarians would be out of work because of ebooks, barking up the wrong tree.

ChatGPT does amazing things, but it is also prone to errors, but so are people! So what, people still get things done.

Imagine feeding ChatGPT an API for smart lights, a description of your house, and then asking it to turn on the lights in your living room. You wouldn't have to name the lights "living room", because ChatGPT knows what the hell a living room is.

Meanwhile, if I'm in my car and I ask my phone to open Spotify, it will occasionally open Spotify on my TV back home. Admittedly it hasn't done that for quite some time; I presume it may have been a bug Google fixed, but that bug only exists because Google Assistant is, well, not smart.

Here is an app you could build right now with ChatGPT:

1. Animatronics with voice boxes, expose an API with a large library of pre-canned movements and feed the API docs to ChatGPT

2. Ask ChatGPT to write a story, complete with animations and poses for each character.

3. Have ChatGPT emit code with API calls and timing for each character

4. Feed each character's lines through one of the new generation of TTS services, and once generation is done, have the play performed.

Nothing else exists that can automate things to that extent. A specialized model could do some of it, but not all of it. Maybe in the near future you can chain models together, but right now ChatGPT does it all, and it does it really well.

And ChatGPT does all sorts of cool things like that, mixing together natural language with machine parsable output (JSON, XML, or create your own format as needed!)
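As a concrete (and very stripped-down) sketch of the smart-lights idea, assuming the 2023-era `openai` Python client; the house description and `light_id` scheme are made up for illustration, not a real product:

    import json
    import openai  # pip install openai (pre-1.0 client with ChatCompletion)

    openai.api_key = "sk-..."  # your API key

    # Hypothetical description of the house and the one call the controller supports.
    HOUSE = (
        "Rooms: living room (lights: lamp_1, lamp_2), kitchen (lights: strip_1). "
        "The controller supports set_light(light_id, on)."
    )

    def plan_light_actions(user_request):
        """Ask the model to translate a natural-language request into machine-parsable actions."""
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": "You control smart lights. Reply ONLY with a JSON list of "
                            '{"light_id": ..., "on": true/false} objects. ' + HOUSE},
                {"role": "user", "content": user_request},
            ],
        )
        return json.loads(response["choices"][0]["message"]["content"])

    # plan_light_actions("turn on the lights in the living room") would typically return
    # something like [{"light_id": "lamp_1", "on": True}, {"light_id": "lamp_2", "on": True}]
    # (after json.loads, JSON true becomes Python True).

The same pattern (API description in, structured calls out) is what the animatronics example above relies on, just with a much bigger action vocabulary.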


I also felt this way initially, like "that's it?". But overall the massive reduction in hallucinations and increase in general accuracy makes it almost reliable. Math is correct, it follows all commands far more closely, can continue when it's cut off by the reply limit, etc.

Then I tried it for writing code. Let's just say I no longer write code, I just fine tune what it writes for me.


GPT feels like an upgrade from MapQuest to Garmin.

Garmin was absolutely a better user experience. Less mental load, dynamically updating next steps, etc, etc.

However, both MapQuest and Garmin still got things wrong. Interestingly, with Garmin, the lack of mental load meant people blindly followed directions. When it got something wrong, people would do really stupid stuff.


Then it’s not 20% then


I think this person is referring to the 80/20 rule. Here are a few examples:

20% of a plant contains 80% of the fruit

80% of a company’s profits come from 20% of customers

20% of players result in 80% of points scored

I've heard this stated as you can complete 80% of a project with 20% of the effort, and the last 20% of completeness will require 80% of the effort.


The Pareto principle…


Indeed.


% of what lol


Not only do they have no moat; open source models are uncensored, and this is huge. Censorship is not just political, it cripples the product to basically an infantile stage and precludes so many applications. For once, it is a liability.

But this article doesn't state the very obvious: when will Google (the inventor of the Transformer, and "rightful" godfather of modern LLMs) release a full open source, tinkerable model better than LLaMA?

(To the dead comment below, there are many uncensored variations of vicuna)


My very naive opinion is that the best way to predict the big-picture actions of Google is a simple question: WWIitND - What Would IBM in the Nineties Do?

In more direct terms, their sole, laser focus seems to be on maintaining short-term shareholder value, and I really don't trust the typical hedge fund manager to approve of any risky OSS moves for a project/tech that they're surely paying a LOT of attention to.

Giving away transformer tech made Google look like "where the smartest people on the planet work", giving away full LLM models now would (IMO) make them look like arrogant and not... well, cutthroat enough. At least this is my take in a world where financial bigwigs don't know or care about OSS at all; hopefully not the case forever!


Yes, this reminds me of the story of transistors at IBM, when they had to pick between MOSFETs and BJTs. MOSFETs were theoretically more scalable and what Intel eventually commercialized to great success. IBM had a lot of the best electrical engineers at the time, but chose to focus on BJTs because they supported their core product, mainframes. MOSFETs could theoretically scale better, but without a clear line of sight to a product line or enhancement, they chose not to aggressively pursue MOSFET R&D. It makes sense, even in retrospect, because IBM didn't want to be a chip manufacturer.

Google doesn’t want to be an LLM manufacturer. But the benefits of having the industry center on your technical underpinnings are huge, s IBM found out eventually and as Google will, one way or the other. Meta understands this, at least


> When will google release a full open source, tinkerable model better than LLaMa?

Arguably, Facebook released llama because it had no skin in the game.

Google, on the other hand, has a lot of incentive to claw back the users who went to Bing to get their AI fix. Presumably without being the place for “Ok, google, write me a 500 word essay on the economic advantages of using fish tacos as currency” for peoples’ econ 101 classes causing all kinds of pearl clutching on how they’re destroying civilization.

The open source peeps are well on the path of recreating a llama base model so unless google does something spectacular everyone will be like, meh.


>Open source models are uncensored and this is huge

Vicuna-13B: I'm sorry, but I cannot generate an appropriate response to this prompt as it is inappropriate and goes against OpenAI's content policy.


My feeling on this is “f** yeah, and f** you [google et al]”

How much computing innovation was pioneered by community enthusiasts and hobbyists, and then leveraged by these huge companies?

I know Meta, Google, MSFT et al give back in the way of open source, but it really pales in comparison to the value those companies have extracted.

I’m a huge believer in generative AI democratizing tech.

Certainly I’m glad to pay for off-the-shelf custom-tuned models, and for software that smartly integrates generative AI to improve usage, but I’m not a fan of this technology being gatekept by a handful of untrustworthy corporations.


Agreed, having 5 monopolies extract all the value from computing and then slowly merge with the state is not a developmental stage we want to prolong.


The author is overly optimistic about the current state of open source LLMs (e.g., Koala is very far from matching ChatGPT performance). However, I agree with their spirit: Google has been one of the most important contributors to the development of LLMs, and until recently they've been openly sharing their model weights under permissive licenses; they should not backtrack to closed source.

OpenAI has a huge lead in the closed source ecosystem, Google's best bet is to take over the open source ecosystem and build on top of it, they are still not late. Llama based models don't have a permissive license, and a free model that is mildly superior to Llama could be game changing.


The counter argument, which I’m not sure I agree with but it has to be said, is that OpenAI benefits from Google’s open source work. So staying permissive might widen the gap further.


I remember I was at Microsoft more than a decade ago now and at the time there was a lot of concern about search and how far Bing lagged behind Google in geospatial (maps).

After some initial investment in the area I was at a presentation where one of the higher ups explained that they'd be abandoning their investment because Google Maps would inevitably fall behind crowdsourcing and OpenStreetMap.

Just like Encarta and Wikipedia we were told - once the open source community gets their hands on something there's just no moat from an engineering perspective and once it's crowdsourced there's no moat from a data perspective. You simply can't compete.

Of course it's more than a decade later now and I still use Google Maps, Bing Maps still sucks, and the few times I've tried OpenStreetMap I've found it far behind both.

What's more every company I've worked at since has paid Google for access to their Maps API.

I guess the experience made me skeptical of people proclaiming that someone does or does not have a moat because the community will just eat away at any commercial product.


I've been using osm more and more recently. Google just makes a bunch of frustrating decisions that really pushed me to look elsewhere. Especially in the public transport layer, but more generally in being really bad at deciding when to hide details with no way to override it and say "TELL ME THE NAME OF THIS CROSS STREET DAMNIT THATS THE ONLY REASON I KEEP ZOOMING IN HERE!!!".


> generally in being really bad at deciding when to hide details with no way to override it and say "TELL ME THE NAME OF THIS CROSS STREET DAMNIT THATS THE ONLY REASON I KEEP ZOOMING IN HERE!!!".

Stuff like this is the main reason I end up switching to Apple Maps on the occasions that I do so. Another example is refusing to tell me the number of the upcoming exit I'm taking.

In general I would say Google Maps is still superior to Apple Maps, but between the aforementioned baffling design decisions and Google maps now including ads in destinations search results, I find myself experiencing more and more mental friction whenever I use it.


Google Maps is at least getting better about screen real estate. I have an Android head unit, and Maps clearly assumed you'd always be using Maps in portrait mode, because the bottom bar would clutter up the bottom of the screen with "local stuff nearby you might be interested in" if you weren't actively navigating.

Eventually switched to Waze, which is now also cluttering things up with (basically) ads.


That's funny because when driving in the bay area, inability to get the -name- of the upcoming exit from google maps was driving me nuts! The exit numbers are not listed on the upcoming exit/distance signs on 280.


Both apps seem to get the names of exits consistently wrong in the Bay Area. I don’t care what a map thinks the name should be — I care what the sign says.


The inability to easily get a street name is one of my biggest pet peeves with Apple Maps, it's up there with the generally poor quality of turn-by-turn navigation (at least in the Bay Area).


There is a spot in NYC where zooming in on my iPhone in Apple Maps in satellite view causes the app to crash somewhat reliably. It has been happening for the last few months.


That section of Queens is uncomputable and even crashes human minds on occasion


One unbelievably annoying thing about seemingly every map provider is that they don’t like showing state or national boundaries.

On google maps, these national boundaries have the same line weight and a similar style to highways. It’s really annoying.


This. My car uses Google Maps for its built-in nav system, and I've spent a lot of time on road trips wondering just what state I was in. It's insane that Google hasn't added something as trivial and important as state borders.


Open source works well when the work is inherently cool and challenging enough to keep people engaged. Linux and Blender are two of the most successful open source projects, and the thing they have in common is that problems they solve are problems engineers enjoy working on.

Mapping intersections is extremely boring in comparison. The sheer quantity of boring work needed to bring OpenStreetMap up to the quality of Google Maps is insurmountable.

LLMs are freaking cool, and that bodes well for their viability as open source projects.


And arguably Blender is much more innovative and achieving faster progress than proprietary and commercial software such as Autodesk Maya.


My impression is that OpenStreetMap's problem is not the map quality. In areas where I have used it, it often has details (e.g. small hiking paths, presence of bike lanes) that Google Maps doesn't have.

The issue is search: searching for things that you don't know precisely ("music bars in this area"). This type of data/processing on top of the geospatial data was always subpar and very hit or miss in my experience.


That’s not my experience. I work in downtown Minneapolis and OpenStreetMap is missing basic things like entrances to public parking lots. OpenStreetMap has a problem if it can’t get details right in population-dense areas.


It really depends, but in Germany (which has a large OSM community) OSM has much better quality & detail for almost everything except businesses. It suffers from poor search, routing that doesn't take traffic jams or roadworks into account, and a lack of high-quality apps, and thus only "nerds" use it instead of Google Maps or others.


It's very hit and miss, as it depends on how many perfectionist mapping enthusiasts who edit OSM as a hobby are in your area.


Databases are another data point that fits this pattern. They’re not sexy, and commercial players like Oracle have a moat.


That's ... probably not the best example, given the fact that there are a shedload of open-source databases of various types that have forced major commercial vendors like MSFT and ORCL into a corner. ORCL's moat is that they have a large portfolio of random solutions, are incumbent at a lot of organizations where switching costs are very high, and they have an exceptionally aggressive sales organization that doesn't seem to worry too much about legalities.


Have you heard of PostgreSQL, MariaDB, or SQLite? They have very high market share.


Databases are very sexy? They're super interesting from programming/CS perspective for multiple reasons.


This is an instructive error. From my perspective, there was plenty of evidence even 15 years ago that community efforts (crowd-sourcing, OSS) only win sometimes, on the relevant timeframes.

So the “higher ups” were using too coarse a heuristic or maybe had some other pretty severe error in their reasoning.

The right approach here is to do a more detailed analysis. A crude start: the community approach wins when the MVP can be built by 1-10 people and then find a market where 0.01% of the users can sufficiently maintain it.[1]

Wikipedia’s a questionable comparison point, because it’s such an extraordinary outlier success. Though a sufficiently detailed model could account for it.

1. Yochai Benkler has done much more thorough analysis of win/loss factors. See e.g. his 2006 book: https://en.m.wikipedia.org/wiki/The_Wealth_of_Networks


In terms of data, OSM is so far ahead of Google maps in my experience. The rendering is much better too. What's not there is obvious and easy to use tooling that anyone can interact with. I mean, there might be, but I don't know about it.


The completeness and quality of OSM depends on the local community, and it varies greatly depending on where you live and use it.


...whereas the rampaging horde of google maps and waze users are ubiquitous.


I don't have google maps on my phone at all unless I visit in the browser, and I use OSM through Magic Earth. I wouldn't go back, but it is a huge pain and sometimes I do have to just open google maps in a browser window. It doesn't usually have hours of operation, doesn't usually have links to websites. It can't find businesses by name easily (it often seems to require the exact name to be typed in), and it definitely can't find businesses by service (searching for "sandwiches" will not show you a list of local sandwich shops, it will do something like teleport you to a street called "Sandwiches" in Ireland). And even if I have the exact address, I will still sometimes end up thousands of miles away or with no hits because the street name was written differently. Honestly, it's of very little use to me because it can rarely take me to a new place.


My experience is the opposite.

People in the real world care about things like hours of operation. Google makes it really easy for businesses to keep them up to date on things like holiday closures. OSM makes it a nightmare.


How do they make it a nightmare? Are we sure it's not just that 96% of business owners use Google Maps or maybe Apple Maps and don't even know that OpenStreetMap exists? I think this is more about network effects than anything. If they really want to break Google's geospatial business data monopoly, I think Apple/Microsoft/OSM should band together and make a simple tool for business owners that can update your details on Google, Bing, Apple Maps, and OSM simultaneously. Although I am not sure if Google exposes that through APIs or not.


I am very sure that OSM does not get this information because they make it hard for businesses to give it. I know this because figuring out how to get that information published was my job a few years ago.

Specifically I was a developer for a company whose job was to update business information in Google Maps, Apple, Facebook and so on. We'd get the data from companies like Cheesecake Factory, Walmart and Trader Joe's, then we would update all of the sites for them.

All of the sites have some sort of API or upload mechanism that makes it easy to do things like publish phone numbers, hours of operation, hours for specific departments and so on. All of them were happy to let us automate it. All were happy to accept data based on street addresses.

I tried to make it work for OSM. It was a disaster. I have an address. Google et al understand that a street often has multiple names. If the address I was given named the street something else, Google took care of understanding that route 33 is also such and so street and they accepted the data. If I said that there was a restaurant inside of a mall, Google didn't insist that I know more than that. If I was publishing holiday hours, Google accepted us as the default authority. (And gave ways of resolving it if someone else disagreed.)

OSM did NONE of that. It was all roadblocks. If I didn't have the One True Name that OSM in its wisdom determined was right, good luck matching on address. If I couldn't provide OSM with the outline of the restaurant on the floor plan, OSM had no way to accept that there was a restaurant in the mall. If a random OSM contributor had gone to the location and posted store hours, OSM refused to accept my claim of its reduced hours on Christmas Day. And so on.

All of the brands that I named and more don't publish to OSM for one reason, and one reason only. OSM make it impossible for businesses to work with them in any useful way to provide that information. And therefore OSM is not on the list of sites that that data gets published to.

In short, if it isn't perfect, OSM doesn't want your data. And the data off of a spreadsheet some business uses to manage this stuff usually is nothing like perfect. I respect how much work went into getting OSM just right. But they made it impossible for real businesses to work with them, and so they don't get that business data.


For starter, as a business owner, how do you claim full ownership of a given business on OSM?

What prevents a nasty competitor from making daily false updates to your opening hours?

If you're a verified business owner in a non-collaborative platform, you can update your holiday hours/one-off closure with a simple edit on that platform's business management page/API. How is OSM even in the same category as Apple Maps/Bing/Google Maps?

Examples:

- https://businessconnect.apple.com/

- https://www.bingplaces.com/

- https://business.google.com/


> OSM makes it a nightmare.

While the generic interface is pretty bad (you have to edit the machine-readable values), StreetComplete provides a very nice UI.


Using a second app to perform a function in the primary app is a non-starter for >99% of people who don't already use OSM


A nice UI is completely and utterly useless for a business attempting to create an automated workflow from a spreadsheet for things like business hour updates and letting map publishers know when new stores are going to open.

So yeah, OSM is a nightmare for businesses to deal with. And unless that changes, its access to business information that people expect will remain severely limited.


Is there a recommendation for OSM on mobile? IIRC they don't have an official app.

Also, looking at their bike routing gives me an idea. Roads should be rated on whether they have a dedicated bike lane and on the danger of riding on said road at particular times of day. I just input a src/dest and it gave me a really busy road with tons of "paperboy"-level risky side roads on it. I would never want someone to take that route at 5pm on a weekday.


OSM is fundamentally just a DB for place locations and geometries. Directions use routing engines which choose roads and paths between locations based on constraints. The main landing page for OSM lets you choose between the OSRM, GraphHopper, and Valhalla routing engines.

To figure out why directions are bad you need to see which criteria the routing engine is using to create the route and decide either to change the constraints used to generate the bike route or what added data you need to place on the streets for the routing engine to avoid/prefer certain streets.

Does this sound like an opaque nightmare? Yes. That's why very few people use it. Apple has been doing some great work doing mapping and adding it into the OSM DB, which they use for their own maps, but they have their own proprietary routing system for directions. If you're looking for a good app to use just OSM data, I use OSMAnd for Android. I still prefer Google Maps because their routing and geocoding tend to be much better for urban areas but for hikes and country bike rides, OSM tends to outperform GMaps.


Magic Earth might be the best, but it's honestly pretty clunky compared to Apple or Google maps


Fairly regularly an address I'm searching for just won't be in OSM, but it is in Google. This happens often enough to be a well-known issue.


I just looked at OSM for the first time and for my neighborhood it's much worse than Google and Apple. It doesn't have satellite or street view data.


OSM is a database of map data (streets/buildings/etc), so satellite and street view imagery is outside of its scope. Individual map applications that use OSM data might also support satellite imagery (and some do, like OSMAnd).


> Of course it's more than a decade later now and I still use Google Maps, Bing Maps still suck, and the view times I've tried OpenStreetMaps I've found it far behind both.

The sheer size of the OSM project is staggering. Putting it next to Wikipedia, where missing content at some point wouldn't cause much fuss, makes it a bad example.

Besides that, your limited knowledge of the popularity of OSM gives you a wrong picture. OSM is already the base for popular businesses. Like Strava for example. TomTom is on board with it. Meta for longer with their AI tool, same as Microsoft. In some regions of the world where the community is very active, it IS better than Google Maps. Germany for example where I live. In many regions of the world, it is the superior map model for cycling or nature activities in general. Sometimes less civilised areas of the world have better coverage too because Google doesn't care about those regions. See parts of Africa or weird countries like North Korea.

One should also not forget the Humanitarian OpenStreetMap Team which provides humanitarian mapping in areas Google didn't care. You can help out too. It's quite easy: https://www.hotosm.org/

> What's more every company I've worked at since has paid Google for access to their Maps API.

Many others have switched away after Google raised their prices. They'll lose the race here too. A simple donation of up-to-date world satellite imagery would already be enough for even faster growth.


I think ex-YU states and the former Soviet bloc also really shine in OSM, as well as areas along the PRC border where the regime forces map jitter (see HK/PRC border road junctions for example).


Google maps is good at navigation, finding business names etc. OpenStreetMap is much more detailed wherever I've gone.

When I'm lost in a forest, I look at OSM to see where the footpaths are.


The problem with a lot of open source is the long term issue.

The people doing many of these projects often want the short term kudos, upvotes, or research articles. They may iterate fast, and do all kinds of neat advancements, except in a month they'll move to the next "cool" project.

Unfortunately, with a lot of open source projects, they don't want to deal with the legalese, the customer-specific integration, your annoying legacy system, the customer support and maintenance, or your weird plethora of high-risk data types (medical industry, I'm looking at you).

Not sure what the Wikipedia reference is, since how many people use any form of encyclopedia other than crowdsourced Wikipedia?

However, to note, there are some examples of successful long term open source. Blender for example being a relatively strong competitor for 3D modeling (although Maya still tends to be industry dominant).


Agreed, even the best open source projects, like Linux or Firefox, in their wonderful success, didn't render proprietary competition unable to have their piece of the market share.

And even in markets with very dominant free offerings, like video consumption, programming languages or VCS, you can still make tons of money by providing a service around them, e.g. GitHub, Netflix, etc.

OpenAI has a good product, a good team, a good brand and a good moving speed.

Selling them short is a bit premature.


The comment higher up failed to see the difference in "users", as well as in use cases.

In Wikipedia, the user is the same as the content creator: the general public, with a subset of it contributing to the Wikipedia content.

In OpenStreetMaps, one category of users are also creators: general public needs a "map" product, and a subset of them like contributing to the content.

But there's another category of users: businesses, who keep their hours/contact/reviews updated. OpenStreetMap doesn't have a nice UX for them.

As for use cases: underlying map data sure, but one needs strong navigation features, "turn right after the Starbucks", up-to-date traffic data.

This all makes it so different from Wikipedia vs Encarta.


This sounds right to me and was similar to my reaction. The doubt I had reading this piece is that GPT4 is so substantially better than GPT3 on most general tasks that I feel silly using GPT3 even if it could potentially be sufficient.

Won't any company that can stay a couple of years ahead of open source on something this important be dominant for as long as it can keep that lead?

Can an open source community fine tuning on top of a smaller model consistently surpass a much larger model for the long tail of questions?

Privacy is one persistent advantage of open source, especially if we think companies are too scared of model weights leaking to let people run models locally. But copyright licenses give companies a way to protect their models for many use cases, so companies like Google could let people run models locally for privacy and still have a moat, if that's what users want. And anyway, won't most users prefer running things in the cloud for better speed, and to avoid storing gigabytes of data on their devices?


This is an excellent point. I think the memo is making a different kind of case though - it's saying that large multipurpose models don't matter because people already have the ability to get better performance on the problems they actually care about from isolated training. It's kind of a PC-vs-datacenter argument, or, to bring it back to Maps, it'd be like saying mapping the world is pointless because what interests people is only their neighborhood.

I don't buy this for Maps, but it's worth highlighting that this isn't the usual "community supported stuff will eat commercial stuff once it gets to critical mass" type of argument.


Is that a relevant comparison? The moat in maps is primarily capital-intensive real-world data collection/licensing.

The (supposedly) leaked article attempts to show that this aspect isn't that relevant in the AI/LLM context.


Google Maps 3D view is unmatched compared to anything open source has to offer.

Even setting aside the panning and zooming, there is no open source solution capable of doing it with such fidelity, and that's before we get to Google's superb "satellite" imagery with its 3D conversion. I have no access to Apple Maps, so I can't compare (DuckDuckGo does not offer Apple's 3D view).


Google maps isn't so good because google is good* but because google feeds their maps with data from their users, which is a huge privacy concern that most people simply don't care about.

I use Apple's notably inferior maps because Apple isn't feeding my data straight into its map and navigation products. It's a tradeoff most wouldn't be willing to make, but that tradeoff is exactly why Google's maps are better.

It boils down to this: out-of-date maps are worse than worthless, and Google has a scheme to keep theirs up to date. It's a huge maintenance problem...unless your users are also the product.

So maps might be a bad comparison to ML/AI development.

*Google using their user data can be interpreted as google being good at it, sure.

As an aside, I stopped using Google maps/waze because I got the distinct impression I was being used as a guinea pig to find new routes during the awful commute I used to have. I would deliberately kill the app when I went to use a shortcut I knew about so that the horde wouldn't also find it via those tools.


I stopped using Google Maps in my car with CarPlay, because the map would lag reality by about 5 seconds, which is really bad at, say, 55 mph in a place you're not familiar with.

Been using Apple Maps now for six months, and very happy with it. No lag, and very useful directions like “turn left at the second stop light from here”.


OSM is quite popular through commercial providers, mainly Mapbox. The reason you're not using it daily is that there's no concentrated effort to make a consumer-friendly product from it, the way Wikipedia mostly is for encyclopedias. Too early to tell what the case will be for LLMs.


Open source will never defeat a company in areas where the work is very, very boring and you have to pay someone to do the grunt work. The last 20% of most tasks are extremely boring so things like data quality can only be accomplished through paid labor.


The difference being, in this case, the author is giving examples of places where their product is clearly behind.

This isn't a prediction, it's an observation. There's no moat because the castle has already been taken.


This isn't an apt comparison. Maps need to be persistently accurate and constantly updated regardless of community involvement; AI just has to be roughly comparable to the paid version (and given its stochastic nature, the open source alternatives are close enough). Microsoft obviously misunderstood the needs of maps at the time and drew the wrong conclusion. The lack of moat for AI is closer to the Encarta/Wikipedia scenario than to the maps scenario.


I think a lot of people use one type of mapping application that doesn't seem to work for them and then say OSM is not great.

I've had to try a fair few mapping applications to find one that works for me (I can recommend Organic Maps on Android).

OSM map data easily exceeds Google map data; the only times I do use Google Maps are for Street View images and satellite info.

Bing is good in the UK because it has Ordnance Survey maps - OS mapping data is generally better than OSM (for what I need it for).


I think the difference is that Maps is a product, and it's hard to copy a whole product and make it good without someone driving the vision. But a model is just a model; in terms of lines of code they aren't even that large. Sure, the ideas behind them are complicated and take a lot of thought to come up with, but just replicating or iterating on a model is obviously not that challenging, based on recent developments.


Just anecdotally, I see OSM mentioned a lot - guides for contributing, use in homelab and Raspberry Pi articles. I haven't checked it out myself in a long time, but I wouldn't be surprised if its continued growth really is inevitable, or even has a cumulative snowball component.


OSM's main problem is that it has no open-source satellite imagery dataset to display; it only uses borrowed imagery to trace its vector maps from. Such a dataset just doesn't exist. Until it does, OSM will stay a second-rate map app for the average person, unfortunately.

It's the only map anyone can actually integrate into anything without an API key and a wallet with a wad of greens in it, so that keeps it relevant for now. Maybe if/when Starship lowers the cost to orbit, we'll see non-profit-funded satellites that can source that dataset and keep it up to date.


Do you happen to know why there isn't any U.S. Government satellite imagery? I understand the really high-resolution stuff is probably from spysats and so classified, but anything else should be public domain, no?


Everything under NASA's and NOAA's purview is public domain. High resolution stuff is left to commercial and secret applications. Some states also have high res aerial photography. This was notably obvious in the early days of gmaps when the whole US was Landsat only with aerial for just Massachusetts.


What if instead of Microsoft abandoning their investment they'd invested directly in OpenStreetMap? Because that seems more analogous to the course of action the article is recommending.


wrong.

Crowdsourcing is significantly different from open source.

Open source is Linux winning because you don't need to pay Microsoft, anyone can fork, and Oracle, IBM and Microsoft's other enemies put developers on it to make it better, and so on. Today .NET runs on Linux.

Crowdsourcing is the usual BS idea that, either through incentives (like crypto) or out of the goodness of their hearts, people will contribute to free stuff. It doesn't have the openness, liberty or economic incentives open source has.

And Google has lots of crowdsourced data on Maps; I know lots of people who love being a guide there.


I mean... your argument is structurally the same as his. "I once saw X happen and thus X will happen again."


Data is still valuable and you can build a moat with it. But this discussion isn't about data, it's about models.

A better analogy would be paywalled general-purpose programming languages, where any access to running code is restricted. Such a programming language would get virtually no mindshare.

This Google employee is just saying, let's not make that mistake.

Even if Google fired all AI researchers tomorrow and just used open source models going forward, they could still build killer products on them due to their data moat. That's the takeaway.


> Bing Maps

TIL


> The document is only the opinion of a Google employee, not the entire firm

The title makes it seem like this is some official Google memo. The company has 150K employees and 300K different opinions on things. Can't go chasing down each one and giving it importance.


"Many of the new ideas are from ordinary people."

Yeah. Google can fuck right off. Maybe this attitude is what got them in the weeds in the first place.


I was quite unimpressed when I interviewed with them recently. It’s no surprise their lunch is getting eaten.


I don’t think trying to be the hall monitor of humanity has been good for google. The more paternalistic, the less innovative.


Yes, it's very telling.


I don't know if I agree with the article. I recall when Google IPO'ed, nobody outside of Google really knew how much traffic they had and how much money they were making. Microsoft was caught off-guard. Compare this to ChatGPT: My friends, parents, grandparents, and coworkers (in the hospital) use ChatGPT. None of these people know how to adapt an open source model to their own use. I bet ChatGPT is vastly ahead in terms of capturing the market, and just hasn't told anyone just how far. Note that they have grown faster in traffic than Instagram and TikTok, and they are used across the demographics spectrum. They released something to the world that astounded the average joe, and that is the train that people will ride.


Your grandparents use ChatGPT? For what?


Recipes!


"People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality. . ."

I'll take the opposite side of that bet - MSFT / Goog / etc. on the provider side will drive record revenues on the back of closed / restricted models:

1 - Table stakes for buying software at the enterprise level is permissions-based management & standardized security / hardening.

2 - The corporate world is also the highest-value spender on software.

3 - The corporate world will find the "proprietary trained models on top of vanilla MSFT OpenAI or Goog Bard" pitch absolutely irresistible - it creates a great story about moats / compounding advantages, etc. And the outcome is most likely going to be higher switching costs to leave MSFT for a new upstart.


I agree with this over the next 10 years but disagree over the next 30.

When/if the innovation slows down, the open source stuff will be able to outcompete commercial options. Something like this timeline played out for databases and operating systems.


This has been my speculation about the people pushing for regulation in this space: it’s an attempt at regulatory capture because there really is little moat with this tech.

I can already run GPT-3 comparable models on a MacBook Pro. GPT-4 level models that can run on at least higher end commodity hardware seem close.

Models trained on data scraped from the net may not be defensible via copyright and they certainly are not patentable. It also seems possible to “pirate” models by training a model on another model. Defending against this or even detecting it would be as hard as preventing web scraping.
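
On the "training a model on another model" point: the classic form is distillation, where a small student is trained to match a larger teacher's output distribution. A minimal sketch (student, optimizer, input_ids and teacher_logits are placeholders; in practice, projects like Alpaca and Vicuna fine-tune on the teacher's generated text instead, since APIs don't expose full logits):

    import torch.nn.functional as F

    def distillation_step(student, optimizer, input_ids, teacher_logits, temperature=2.0):
        # Push the student's next-token distribution toward the teacher's.
        student_logits = student(input_ids)                          # (batch, vocab)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()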

Lastly the adaptive nature of the tech makes it hard to achieve lock in via API compatibility. Just tell the model to talk a different way. The rigidity of classical von Neumann computing that facilitates lock in just isn’t there.

So that leaves the old fashioned way: frighten and bribe the government into creating onerous regulations that you can comply with but upstarts cannot. Or worse make the tech require a permit that is expensive and difficult to obtain.


Act like a socialist and then blame it on capitalism, American playbook 101


OpenAI is further along than most of us are aware.

The ability to connect these models to the web, to wire up API access to different services, and to equip LLMs to be the new interface to those services and to the world's information is the real game changer.

Google cannot out-innovate them because they are a big corp, rife with googly politics and the challenges of overhead that come with organizational scale.

I would be curious to see if there are plans to spin off the newly consolidated AI unit with their own PnL to stimulate that hunger to grow and survive and then capitalize them accordingly. Otherwise they are en route to die a slow death once better companies come along.


The current CEO, who a friend at google calls “Captain Zonk”, is dispositionaly not the person to make that kind of change.

I wouldn’t be surprised to see a leadership change this year.


The real moats in this field will come from the hardware industry. It's way too expensive to train these models on general-purpose compute. Vertically designed silicon that brings down the unit economics of training and inference workloads is already being designed, in industry and in academia.


NVIDIA already has a big moat in this area. It might not last forever, but at least for a good while they have a big one.


I agree that both Google and OpenAI are losing the race to open source in AI research. However, a more critical issue for Google is their struggle to compete with OpenAI in LLM-based search. Google's business model relies mostly on ads (77.6% of revenue in Q4 2022). OpenAI is developing LLM-based products that people apparently love (100M users in 2 months) and seem to use as search engines. This poses a greater risk to Google than just losing ground in research, since it could ultimately erode their ad-generated income.


Google's moat is its data set. Imagine training a generative LLM on the entire corpus of YouTube videos. No one else has this.


The entire set of YouTube videos would need to be re-transcribed before it's useful for training LLMs.
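
That re-transcription step is already cheap with open tools, though. A minimal sketch, assuming the open-source openai-whisper package (the file path is illustrative):

    import whisper

    model = whisper.load_model("base")            # small multilingual speech-to-text model
    result = model.transcribe("video_audio.mp3")  # returns the transcript plus timestamped segments
    print(result["text"])

The hard part at YouTube scale is the compute and the pipeline, not the model.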


This is the glaring omission in this piece.

Google knows _so much_ about me. Is it not reasonable to assume powerful LLM + personal data = personally tuned LLM?


> At the beginning of March the open source community got their hands on their first really capable foundation model, as Meta’s LLaMA was leaked to the public

A Prometheus moment if I’ve ever seen one.


*puts on tinfoil hat* They probably intentionally published this to raise awareness of how viable it is for individuals to outperform OpenAI and Meta. Google seems to be the farthest behind; they have the most to gain by the others losing their lead to individuals.


The Simon Willison coverage is great. Simon is cited in the "why we should have seen it coming" section, for his Stable Diffusion Moment piece. He nicely covers the key points of this paper:

> The premise of the paper is that while OpenAI and Google continue to race to build the most powerful language models, their efforts are rapidly being eclipsed by the work happening in the open source community.

Not to detract from this beloved point, but he also covers the other key notes well:

> Where things get really interesting is where they talk about “What We Missed”. The author is extremely bullish on LoRA—a technique that allows models to be fine-tuned in just a few hours of consumer hardware, producing improvements that can then be stacked on top of each other

https://simonwillison.net/2023/May/4/no-moat/

Overall I take this as fairly happy news. It's a trend humanity stubbornly keeps trying to resist: open source wins.

It's great that the barrier to innovation is so much lower than expected, and that so much experimentation is possible on top of the existing models.
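
For anyone who hasn't tried the LoRA technique mentioned above, here's a minimal sketch with Hugging Face's peft library; the base model and hyperparameters are illustrative, not taken from the memo:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-1.4b")
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # typically well under 1% of the base model's weights

Only the small adapter matrices get trained, which is why the fine-tunes are cheap, shareable, and stackable independently of the (large) base weights.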


> "We have no moat, and neither does OpenAI"

and neither do Coca-Cola and Cadbury. Yet some of the biggest monopolies are found in those markets, because the competitors are not differentiated enough for users to switch from the incumbent.

But generative AI is still nascent and there are lots of improvements to be had. I suspect better tech is a moat, but of course Google is oblivious to it.


Brand loyalty is a moat. So I wouldn't say that Coca-Cola doesn't have a moat. In addition, economies of scale allow them to produce more cheaply + advertise more + distribute wider than competitors. Compare Coca-Cola to some beverage company I start tomorrow:

- Nobody's tasted my beverage, therefore nobody is craving its taste. Whereas billions of people are "addicted" to coke: they know what it tastes like and miss it when it's gone.

- Nobody's ever heard of my business. I have zero trust or loyalty. Whereas people have trusted Coke for a century, and actually consider themselves loyal to that company over others with similar goods.

- I have no money to buy ads with. Coke is running Super Bowl commercials.

- I have no distribution partnerships. Coke is in every vending machine and every restaurant. They've spread to almost every country, and even differentiated the taste to appeal to local taste buds.


> Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products

I disagree. The model itself is released under GPL3 and no longer "theirs" (Google or OpenAI can use it). And Meta probably has a zoo of such models and I didn't see them use any of the work the community did.

I don't think they "released" LLaMA weights strategically to weaken OpenAI (their overall strategy and market analysis would probably too inert to predict the open source explosion). It probably was a decision by a smaller research team within the company and approved by uninformed executive. Meta could have stepped up and nurtured this small-LLM renaissance, they opted for DMCA hammer instead.


While this post champions the progress made by OSS, it also mentions that a huge leap came from Meta releasing LLaMA. Would the rapid gains in OSS AI have come as quickly without that? Did Meta strategically release LLaMA knowing it would destroy Google's and OpenAI's moats?


I think it would have been some other model if not Meta's. Stability AI also released an OSS model, and Cerebras released another.


I've been saying this kind of thing for a while about open source. There is a lot of innovation happening outside of the corporate sector which has been neglected. Not just that; it has been essentially covered up. These projects have not been allowed to get any attention because big tech controls all the media channels. A lot has been happening outside of AI too.

A lot of people are shocked at how this open source innovation just came out of nowhere but those who work in open source, in those areas, aren't so surprised because they've been going at it for years.


"Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products."

An interesting thought. Are the legal issues for derived works from the leaked model clarified or is the legal matter to be resolved at a later date when Meta starts suing small developers?


Meta is clear to use anything open licensed and derived from or applied on top of the leaked material irrespective of the resolution, while for everyone else the issue is clouded. That makes Meta the winner.


Yep, same thoughts here. I'm experiencing tin foil urges ..


> Paying more attention to their work could help us to avoid reinventing the wheel.

There's nothing us humans love more than reinventing the wheel. I've seen it over and over again: years of work and hundreds of millions of dollars spent re-solving problems and re-writing systems - only to replace them with a new set of slightly different problems. I think we greatly overestimate the ability of our species to accumulate knowledge, which is perhaps where these generative systems come into play.


I find it strange people are saying facebook's leak was the 'Stable Diffusion' moment for LLMs. The license is awful and basically means it can't be used in anything involving money legally.

Facebook has a terrible reputation, and if they can open source their model, it would transform their reputation at least among techies.

https://github.com/facebookresearch/llama/pull/184


I think the spirit of the Stable Diffusion moment comment is that there is a ton of work blossoming around LLMs, largely because there's a good base model that is now available.

And that's undeniable, IMO -- llama.cpp, vicuna are some really prominent examples. People are running language models on Raspberry Pis and smartphones. Anyone who wants to tinker can.

Now, all the stuff that's built on top of LLaMa is currently encumbered, yes.

But all of that work can likely be transferred to an unencumbered base model pretty easily. The existence of the ecosystem around LLaMa makes it much more likely that someone will create an unencumbered base model. And it seems like that is already happening, for example, the Red Pajama folks are working on this.
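
To make the "anyone can tinker" point concrete, this is roughly what local experimentation looks like today, assuming the llama-cpp-python bindings and a quantized GGML model file you already have on disk (the path and prompt are illustrative):

    from llama_cpp import Llama

    llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin", n_ctx=512)
    out = llm("Q: Name three open-source LLM projects.\nA:", max_tokens=64, stop=["Q:"])
    print(out["choices"][0]["text"])

No GPU, no API key, no terms of service on the inference side - which is exactly why an unencumbered base model is the missing piece.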


I don't know why you are downvoted. This is mostly correct.


The author's timeline makes it clear that they feel it was a catalyst. They're separating out "Stable Diffusion" the software from the "Stable Diffusion" moment.

The community has created its own replacement for LLaMA (Cerebras) with none of the encumbrance. Even if LLaMA is deleted tomorrow, the LLaMA leak will still be the moment when the direction dramatically shifted.

The "people" are not talking about the future of where this software is going. They're talking about a historical event, though it was recent enough that I remember what I ate for lunch that day.


> Facebook has a terrible reputation, and if they can open source their model, it would transform their reputation at least among techies.

Have you ever heard of PyTorch? React? Jest? Docusaurus?

If none of those changed their reputation among "techies" I doubt awesome contribution open source project X + 1 would.


It seems like everyone is so focused on LLMs as magic smartness machines that there isn't much analysis of them as better search (maybe "search synthesis"). And since original search was a revolutionary technology, LLMs as just better search are revolutionary.

Like original search, the two application aspects are roughly algorithm and interface. Google years ago won by having a better interface - an interface that usually got things right the first time (good defaults are a key aspect of any successful UI). ChatGPT has gotten excitement by taking an LLM and making it generally avoid idiocy - again, fine-tuning the interface. Both Google years ago and ChatGPT got their better results through human labor, human fine-tuning of a raw algorithm (in ChatGPT's case, RLHF with workers in Kenya and elsewhere; Google has human search testers and years ago used DMOZ, an open-source, human-curated portal).

Google's "moat" years ago was continuing to care about quality. They lost that moat over the last five years, IMO, by letting their search go to shit - always pushing some product for any given search. This is what has made ChatGPT especially challenging for Google (it would be amazing regardless, but someone comparing it to Google ten years ago could see ways Google was better; present-day Google has little over ChatGPT as a UI. If Google had kept their query features as they added AI features, they'd have a tool that could claim virtues, though still not as good).

And this isn't even considering the cost of updating a model, or the question of how the model will be monetized.


Google search seems to optimize for "What?" (... is the best phone) and the list of results allows some variation, while GPT chats seem to answer "How?" , and tend to give the same average, stereotypical answer every time you ask.

Maybe Google has an advantage because it can answer "What?" with ads, but I haven't used ChatGPT for any product searches yet.


> They are doing things with $100 and 13B params

Not that I disagree with the general belief that the OSS community is catching up, but this specific data point is not as impactful as it sounds. LLaMA cannot be used for commercial purposes, and that $100 was spent on ChatGPT, which means we still depended on OpenAI's proprietary model.

It looks to me that the OSS community needs a solid foundation model and a really comprehensive and huge dataset. Both require continuous heavy investment.



I would lean towards agreeing. And I definitely think AI companies should not try to make their money on inference.

If there is a well-performing model being deployed, it is possible to train a similar model without having to eat the cost of exploration, i.e. you pay only the cost of training said model.

ChatGPT would probably die in a couple of weeks, if an equivalent, free, product came out that people could run on their computers.


Is there any evidence this is real? It reads like an article written not for Google but for fans of open source, in their competitor's supposed voice.


“Some of the most interesting questions about CAS [Complex Adaptive Systems] have to do with their relations to one another. We know that such systems have a tendency to spawn others. Thus biological evolution gave rise to thinking, including human thought, and to mammalian immune systems; human thought gave rise to computer-based CAS; and so on.”

- Murray Gell-Mann, "Complex Adaptive Systems"


> The premise of the paper is that while OpenAI and Google continue to race to build the most powerful language models, their efforts are rapidly being eclipsed by the work happening in the open source community.

Another magnificent, unsurprising set of correct predictions [0] [1] [2], now triumphantly admitted by Google themselves: open source LLMs are eating both Google's and OpenAI's lunch.

"When it is the race to the bottom, AI LLM services, like ChatGPT, Claude (Anthropic), Cohere.ai, etc are winning the race. Open source LLMs are already at the finish line."

[0] https://news.ycombinator.com/item?id=34201706

[1] https://news.ycombinator.com/item?id=35661548

[2] https://news.ycombinator.com/item?id=34716545


The only moat in technology is the founders and the team. The concept of having a moat sounds great when VCs write investment memos, but in reality, cold, hard execution every day is what matters, and that all comes from the quality and tenacity of the team.

Every piece of application software is a wrapper on other software with a set of opinionated workflows built on top.

Yes, there are some companies that made themselves hard to switch away from - Snowflake, Salesforce - because they are data stores and it's a pain to move your record of data. But even they don't have true moats - they're just stickier.

So I think Google is right in saying there is no moat. But given their size, Google has layers and bureaucracy, which makes it hard to execute in a new market. That's why I think OpenAI will win - because they are smaller, can move fast, have a great team and can hence execute...till the day they become a big company too and get disrupted by a new startup, which is the natural circle of life in technology.


The concept of a moat for facebook (via network effects) and google (via scale, habits and learning effects) has worked well when it comes to printing cash.

Moats don't last forever, doesn't mean they aren't real.

The guy writing the post was writing about AI research at google. Not generally at Google, or for search.


> via scale, habits and learning effects

Then OpenAI/Google still have the moat for LLMs, by being the most reliable, up to date, and trustworthy.

The Facebook example made sense in the pre-short-video era, when connections meant something personal.


Wow it’s amazing how clueless and in denial Google is, even as they admit their top guys are leaving.

OpenAI isn’t about the AI in particular, although they are leaps and bounds ahead. It’s about the devs and the hundreds of thousands of projects on it.

OpenAI isn't selling AI. They are selling an ecosystem. No one is building on Bard. Google is more dead than I thought.


This nicely outlines all the reasons I'm building Chisel [1]. There are other writing apps out there, but they are closed source. It's not clear to me what value they are adding, as the biggest value add -- LLMs -- are already open to the public. It makes a lot more sense to me to develop an open source writing app, where others can pitch in and everyone can reap the benefits.

I think it is fundamentally important to have an open source option. I'd love to have more people pitch in to make it better. One big limitation right now is that users are limited to 50k tokens a month, because everyone is using my API key. I'd like to move it to an Electron app where users can put in their own API key, or even use a model they have set up locally.

[1] https://chiseleditor.com


Natural language is the most flexible interface. You can easily substitute one service for another. There is not much of a lock-in factor. It's literally just one API endpoint with variable-length string input and variable-length string output.

The main area I can see for lock-in is in the fine-tuning of models for specific customers and problem domains. I think this is what OpenAI is focusing on and why their fine-tuning prices are not as expensive as I expected. They make it cheap to fine-tune but then they charge extra when you use the fine-tuned models (they shift the cost to the customer later, over time after some investment/lock-in has been established). But for most problem domains, it's probably not that expensive to finetune a model from scratch on a different provider.
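
That "string in, string out" shape is easy to code against in a provider-agnostic way, which is what keeps switching costs low. A minimal sketch (the class names are made up; the OpenAI call assumes the pre-1.0 openai Python package):

    from typing import Callable, Protocol

    class TextCompleter(Protocol):
        def complete(self, prompt: str) -> str: ...

    class OpenAIBackend:
        def __init__(self, openai_module, model: str = "gpt-3.5-turbo"):
            self.openai, self.model = openai_module, model

        def complete(self, prompt: str) -> str:
            resp = self.openai.ChatCompletion.create(
                model=self.model, messages=[{"role": "user", "content": prompt}]
            )
            return resp["choices"][0]["message"]["content"]

    class LocalBackend:
        def __init__(self, generate: Callable[[str], str]):
            self.generate = generate  # e.g. a llama.cpp binding or a fine-tuned local model

        def complete(self, prompt: str) -> str:
            return self.generate(prompt)

    def answer(backend: TextCompleter, question: str) -> str:
        return backend.complete(question)  # application code never changes when the backend does

Fine-tuned models are the exception: the adapter lives with the provider, so that's where the stickiness accumulates.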


Innovation is faster in the Bazaar. Nobody is beholden to anyone, there is no budget, there is no mandate, there is no hierarchy. Money can not compete with morale and motivation. A bunch of really smart nerds working overtime for free with flat hierarchy will always win.


The question I'd love to be able to ask the author is how, in fact, this is different from search. Google successfully built a moat around that, and one could argue it shouldn't have been long-lived either. True, there was the secret PageRank sauce, but sooner or later everyone had that. Other corporations could crawl anything and index whatever at any cost (i.e. Bing), yet search - which is in some sense also a commodity, heavily reliant on models trained partly on user input - is what underpins Google's success. What about that problem allowed Google to defend it successfully for so long, and why can't you weave a narrative that something similar might exist for generative AI?


One example - they mine people's labour of searching beyond the first page of results. When the small % of people who really dig into results settle on a site buried deeper, Google can infer which the good sites are.

Bing doesn't have enough traffic to do this as well, so is less good at finding quality new sites, reducing overall quality.

Source: Doing SEO, but about 8 years ago now, the ecosystem will have changed.


If the moat is simply brand name recognition, then the market leader is OpenAI. That's an existential problem for Google and explains the author's perspective.


My observation is that Amazon is going to have challenges, as they don't have products through which they can push their AI offerings.

Now, they could do work on Amazon.com to improve search and help customers find what they want.

Their most recent video on this topic shows that they don't have a solution now, and it's unclear to me how they will deliver a solution to the mass market, as they don't have consumer/business-facing software to integrate it into the way Microsoft does.

While we are at it, Apple has nothing; perhaps they might leverage something from either Google or an OpenAI competitor that has a solution.

The continued destruction by Apple of the initial promise of Siri has been a major failure under Tim Cook’s leadership.

I wonder why they appear unable to fix this.


Whether that "leak" is genuine is not clear, but what is clear is that building LLM / AI capability is no longer a major technical hurdle, definitely not for any entity of the size of Amazon or Apple.

The timescale for integrating such tools into existing business models, or developing new ones, is a different story. They don't need to impress the over-excited social media echo chambers; they need to project 1) legal (not subject to lawsuits), 2) stable (not a fad) and 3) defendable (the moat thingy) cash flows for multiple years forward.

Actually, the hoopla of the past year, where a lot of preliminary stuff is released/discussed/leaked, does not fit the playbook of a "serious" corporation at all. It's not clear what it really means. Maybe big tech feels that the status quo is very fragile, so they take more risk than necessary - or maybe they are so confident that they don't care about optics.


Google's best bet to topple the competition's existing advantage is to train a model on their whole giant internet index, on their own compute cloud, and then release that model under a GPLv3/Apache/MIT/CC license.

This would eliminate the first-mover advantage for the competition. These models (by OpenAI et al.) cannot be monetised indefinitely anyway, just as compilers, kernels and web servers could not be monetised indefinitely in the past.

These days the majority of computing is on GCC, Clang, LLVM and Linux, which wasn't the case at one point - even Intel used to sell their own compiler (not sure of its current status).


Man, I'm an admitted unabashed elitist when it comes to technology but this guy sounds warped.

> Many of the new ideas are from ordinary people.

ORLY?

> a third faction ... open source.

Open source is not a faction? It's people literally giving you the software they wrote, for free, for free! If you see them as the enemy because they hurt your profits... If the "ordinary people" are doing your job better than you can...

This piece makes more sense as a false-flag character assassination of the clueless Google tech-bro. It reads like some radical leftist's caricature of the corporate/colonial mindset.


OpenAI may not have a moat, but Microsoft does with their Enterprise Agreements.


I think OpenAI has a defensible moat from first-mover advantage. As the effort of producing written content declines, we can expect the amount of written content to increase commensurately. Due to OpenAI's position, a vast majority of the newly generated data will come from OpenAI's models. When it comes time to train new models with superior network structures or on new data, no one else is going to be able to differentiate human-generated text from LLM-generated text. OpenAI's training data should become far superior to everyone else's.


I'm always shocked by how many people don't view branding as a moat.


Pepsi is catching up to us in terms of inserting sugar into water.

WE HAVE NO MOAT!


"The value of owning the ecosystem cannot be overstated. Google itself has successfully used this paradigm in its open source offerings, like Chrome and Android. By owning the platform where innovation happens, Google cements itself as a thought leader and direction-setter, earning the ability to shape the narrative on ideas that are larger than itself."

This is breathtakingly sick. This kind of thinking is the poison of the world. This is why we have huge monopolies, huge wealth inequality, huge strangle on innovation.


I disagree with this. It's too expensive to train high quality models. For example I don't see how anyone would make an open-source GPT4 unless OpenAI leaks their model to the public.


ELI5 How is it too expensive? I know ChatGPT was expensive to train but Vicuna-13b is said to have cost $300 to train [https://lmsys.org/blog/2023-03-30-vicuna/]


Vicuna-13b is based on LLaMA, which cost millions to train. The $300 is just for fine-tuning.


it drives me crazy that people are ignoring this


No one has created even something closed-source that is equal to GPT4.


Copied from other thread: I tend to agree. For now the OpenAI APIs are so very easy to use and effective. I do try to occasionally use HF models, mostly running locally in order to keep my options open. My bet is that almost everyone wants to keep their options open. I am very much into auxiliary tools like LangChain and LlamaIndex, the topic of my last book, but I also like building up my own tools from scratch (mostly in Common Lisp and Swift for now), and I bet most devs and companies are doing the same.


The moat comes from integrating LLMs and generative AI into classical feedback cycles, fine-tuning them for specialized domains, and other "applications" of LLMs where the LLM acts as abstract semantic glue between subsystems, agents, optimizers, and solvers. The near-obsessive view that generative AI is somehow an end rather than an enabler is one of the more shortsighted takes I've seen in this whole discussion of generative AI over the last several years, peaking recently with ChatGPT.


This is not a leaked Google memo. I can't believe Hacker News believes an article like this is a memo at Google. Kudos to the authors for finding a sneaky way to get traffic.


Yeah this doesn’t quite sit right. It lacks any detail about what Google is actually doing.


What makes you think it isn't an actual memo?


Doesn't fit the tone and formatting of an internal memo. It is written like an article.

- Lack of information about the author or their role
- Lack of information about the org that published it
- Sourced from an unnamed 'discord'
- The 'memo' is formatted like a blog post


I think YouTube is a damn fine moat.


In the interim (or possibly long term), Google could co-exist with OpenAI. In my little corner of the universe (fire safety engineering), I cannot assume the output produced from ChatGPT is correct. I’m always trying to find out which references it has used for the answer and then having to check if it has actually used it in the context the original author intended. In short I’m still using Google and other search engines to check output from ChatGPT.


I tend to agree. For now the OpenAI APIs are so very easy to use and effective. I do try to occasionally use HF models, mostly running locally in order to keep my options open.

My bet is that almost everyone wants to keep their options open.

I am very much into auxiliary tools like LangChain and LlamaIndex, the topic of my last book, but I also like building up my own tools from scratch (mostly in Common Lisp and Swift for now), and I bet most devs and companies are doing the same.


Did anyone else assume this years ago?

Machines do not need all the syntactic and semantic labels humans add to data and code.

All the overhead we require then needs maintenance and updates as trends evolve, but still only for humans.

Managing electron state is all math. If I can ask an AI chip powered phone to generate me a video game why would I ask it to generate code?

A sentence like “software as an industry that employs tons of people has no moat.”

We never abstracted away the hardware; we just added layers of indirection.


This reads like a very research-oriented point of view, and a very myopic one at that.

The knowledge and the infra needed to serve these huge models to billions of users reliably seems to me to be a pretty serious moat here that no current open source project can compete with.

Coming up with ideas and training new models is one thing, actually serving those models at scale efficiently and monetizing it at the same time is a different ballgame.


> The knowledge and the infra needed to serve these huge models to billions of users reliably seems to me to be a pretty serious moat here that no current open source project can compete with.

I don't quite follow this line of argument. Let's say there's an open-source ML model X with a permissive licence. I think it's not super relevant whether whoever came up with the model graph & weights for X knows how to serve it at scale, as long as someone does. And it seems pretty clear that it's not just Google (and OpenAI) who know how to do this.

Separately, I'm personally more excited about the possibility of running these models on-device rather than at scale in the cloud (for privacy and other reasons).


How can I get to a point where I can understand the linked article? Is there a book or course I can take? I feel like I have a lot of catching up to do.


Ask ChatGPT


Microsoft will likely acquire OpenAI at some point and will dominate the AI landscape due to its corporate reach, automating away most of the MBA BS.


OpenAI is a nonprofit, it owns the for profit org that it created. It's not acquirable.


This looks very fake to me. I might be wrong. Yet, there is no "document" that was leaked, the original source is some blog post. If there is a document, share the document. Shared by "anonymous individual on discord who granted permission for republication"... I don't know. If it was shared by anonymous, why ask for permission? Which discord server?


Did you read it?

I honestly don't care if it's really a leak from inside Google or not: I think the analysis stands on its own. It's a genuinely insightful summary of the last few months of activity in open source models, and makes a very compelling argument as to the strategic impact those will have on the incumbent LLM providers.

I don't think it's a leak though, purely because I have trouble imagining anyone writing something this good and deciding NOT to take credit for the analysis themselves.


> It's a genuinely insightful summary of the last few months of activity in open source models

Yes this is an amazing summary! Just for its summary alone, it is probably one of the top five writings I saw on LLMs and I read every one!

> because I have trouble imagining anyone writing something this good and deciding NOT to take credit for the analysis themselves.

Not everyone has a Substack that they spam onto Hacker News every time a thought enters their head. Or imagine that INTPs exist. In my opinion the best take ever on LLMs is the simulators essay https://generative.ink/posts/simulators/ and the author is so shy they went pseudonymous, made their Twitter private, and don't even want their name in conferences.


I think another great piece is the RLHF article from Chip Huyen:

https://huyenchip.com/2023/05/02/rlhf.html


Thanks for sharing it but I'm sorry I don't agree that it's so important. In my opinion almost every interesting thing about LLMs comes from the raw base model, which is before the RLHF is applied.

For example, the simulators essay was written before ChatGPT was even released, based on research with the GPT-3 base model, which only did text completion, with no instruction tuning, RLHF, or any other kind of lobotomization. As another example, in interviews the people who had access to the GPT-4 base model - the red teamers, and the Microsoft folks who integrated it with Bing - consistently explain that the raw pretrained model has the most raw intelligence, which is deadened as RLHF and guardrails are applied.


Coder Radio podcast uploaded the document to the show notes for their latest episode[1]. The first link[2] in the PDF does link to some internal Google resource that requires an @google.com email address.

1: https://jblive.wufoo.com/cabinet/af096271-d358-4a25-aedf-e56...

2: http://goto.google.com/we-have-no-moat


Presumably they weren't getting permission in the sense of "this publication is authorized by the original author, or by Google" but in the sense of "thanks for leaking the document; can we publish it more widely, or will you get into trouble?"


There is a huge (non-public) moat. It's big multiyear government contracts with dark money. And there is room there for both players :)


AI is not just a destination, but what does AI native look like? Does it present a completely different opportunity in terms of our relationships with the online world?

And does building / teaching / connecting skills to AI systems lead to a network effect which will be difficult to compete against in the absence of it?

The moats come from the connected skills, closed data and feedback loops.


No moat? Where's the creativity? Google has the ability to create lock-in for users and lock-out for competitors by integrating their proprietary, existing products with natural language querying and reasoning. That's their advantage. Academics and "open"-source models making inference cheaper and more efficient?? That's a blessing!


I think using that analogy to undercut OpenAI's lead is flawed. The reason it's flawed is that the search business is really lucrative and OpenAI is trying to completely disrupt Google's business there. So while the AI itself isn't a moat, establishing a lead in search is, because you will obviously use that lead to inject ads into commercial queries and capture the market.


No moat except for goobibytes of training data they can probably correlate and cross-reference to achieve some modicum of tagging, proprietary TPU hardware, and a giant cloud farm with an army of developers to feed it.

Seriously though, I'll be really thrilled to see open source and clever startups run circles around all the incumbent bastards.


> The existence of such datasets follows from the line of thinking in Data Doesn't Do What You Think, and they are rapidly becoming the standard way to do training outside Google.

links to http://www.internalgooglesitescrubbedbyus.com/

Haha. Who writes this blog?


In my bachelor's thesis I investigated the level of democratization of AI by attempting to build a text summarizer using AWD-LSTM and transfer learning. The conclusion at the time was that there was no feasible way for a private person to do it on an average "gamer" budget.

I predicted this might change within 2-3 years; looks like I was off by a year.


The memo sounds like spin because it is. The surface argument is equivalent to arguing that no one could sell closed source software because open source exists, and that everything must therefore be commoditized (oddly, Apple and Microsoft are doing just fine). The implied argument is that Google Research was doing fine giving away its trade secrets and providing negative value to Google, because it was going to happen anyway and the secrets are financially worthless anyhow.

Nonsense. There are moats if one is willing to look for them. After all, productizing is a very different thing from an academic comparison. ChatGPT is way out there _as a product_, while open efforts are at 0% on this. You can't lock down a technology*, but you can lock down an ecosystem, a product or hardware. OpenAI can create an API ecosystem which will be difficult to take down. They can try to make custom hardware to make their models really cheap to run. Monopoly? Nah. This won't happen. But they could make some money - and reduce the value of Google's search monopoly.

* Barring software patents which fortunately aren't yet at play.

EDIT: I'll give the memo a virtual point for identifying Meta (Facebook) as a competitor who could profit by using current OSS efforts. But otherwise it's just spin.


How do you distinguish between an opinion you disagree with and 'spin'?


A) When IMHO the underlying argument is not quite honest.

B) When it comes from an interested party.

C) When there's enough of A and B that I feel it's intentional.

The underlying argument here would apply to some extremely profitable existing closed source software, so it's obviously not complete. Even closed source software which is strictly inferior manages to find some very profitable moats.

As for the source, it comes from Google Research, which has done a very poor job of using their knowledge to benefit Google. The article downplays the failures (we couldn't have done anything differently, but it doesn't matter anyway since Open Source will consume all!), and doesn't even think about productization.

The latter does give me a little bit of doubt: the article could also be emblematic of Google Research's failures and not 100% spin...


Thank you for the cogent response.

My reading was that this was personal opinion of a researcher, it seems like you are reading the deficiencies you note as intentional omissions whereas I am reading them as simple flaws.

Does it change your perspective that this was aimed at a private audience? To me it came off as a blanket admission that they were not doing the right thing and needed to do something different to be successful. That may be the core difference, I read it as blame accepting whereas you read it as blame deflecting.


>Does it change your perspective that this was aimed at a private audience?

Was it? Someone leaked the article to the press. Per the article, someone granted permission to publish the leak. I'm assuming that someone had standing to give said permission, either from Google Research or being the author. I can see a scenario when someone intentionally 'leaks' in order to put something in the public sphere without attribution. Perhaps I'm too uncharitable or too cynical.

Still, I find the underlying argument too simplistic.

At $WORK, we have some $SOFTWARE that certain $CLIENTS run on Windows Server. It would be cheaper if they ran it on Linux. I have good confidence it would work the same, and we could test with the typical deployment patterns. The typical $CLIENT attitude is to not even think about this ("We don't have anyone to manage a Linux deployment, and $$client of $CLIENT wouldn't even hear of Linux, we barely got a current deployment plan approved").

Arguing that Open Source Linux could do everything that Windows Server can or that Linux development speed is higher wouldn't do anything to change their mind - it's based on other factors, and even if we finally got past that there would be ROI to consider (compared to other things that could be done in the same time).


What does "spin" mean?


Google has no moat, just a massive massive massive data set, a massive massive massive amount of compute to repurpose at whim, tons of cash, thousands of employees and a large number with AI/ML skills already. oh and control the dominant web browser and one of the top two dominant mobile OSes. and, and, and...

other than that, yes, no moat


As many point out in this thread, there are very valid counter arguments against the ideas presented in this memo.

This "AI war" starts to look like Russian vs. American "leaks". Any time something leaks, you have basically no information because it could be true, it could be false, or it could be false with some truth sprinkled in.


The headline is misleading. It makes it sound like this memo was written by higher-ups and not just some random SWE.


I don't think OpenAI and Google have no moat. The biggest moat they have is hardware-software integration, which allows them to serve the models at a very cheap price.

My detailed thoughts in a video format https://youtu.be/cIMlPYI3nz8


I'm convinced that anyone sounding the alarm bells about AI has no idea whatsoever how these models are built.


It’s always easier to use a prebuilt server which Google or OpenAI offers. Otherwise it’s built locally into the OS, maybe Apple. Most people is not gonna setup their own servers for these models because they have multiple devices and the costs is still high vs OpenAI.

Having ease of access is a big moat


This is likely the reason for the propaganda push about delaying AI research 6 months, which makes no sense for the stated reasons (it's far too short even if you take the scare tactic seriously). However, it may be enough time to delay the competition and consolidate product lines.


"We cannot get out. We cannot get out. They have taken the Bridge and second hall. ... the pool is up to the wall at Westgate. The Watcher in the Water took Óin. We cannot get out. The end comes ... drums, drums in the deep ... they are coming."


The true vendor lock-in in this space is embeddings. Once you have generated embeddings for all of your content it will be very difficult to move to another vendor's embedding engine as there will be no way to translate from one to the other.
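
One hedge against that lock-in is to generate embeddings with an open model you control, so re-embedding later is a batch job rather than a vendor negotiation. A minimal sketch, assuming the sentence-transformers package (the model choice and documents are illustrative):

    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # a small open embedding model you can re-run any time
    docs = ["How do I reset my password?", "Pricing for the enterprise plan"]
    doc_vecs = model.encode(docs, normalize_embeddings=True)           # shape (2, 384)

    query_vec = model.encode(["forgot my password"], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec                                      # cosine similarity, since vectors are normalized
    print(docs[int(np.argmax(scores))])

Vectors from different models do live in incompatible spaces, so switching always means re-embedding the corpus; the question is whether you control the model that lets you do it.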


What this proves to me without a doubt is that siloed and proprietary iteration is still very clearly a massive disadvantage. I really hope companies internalize that. If they just keep scooping up and hiding open-source improvements, they very well may still be left behind.

the final quote from the doc:

> And in the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.


I get the feeling that at this point, the best thing Google could do is to go all in an open source their models and weights. They'd canibalize their own business, but they'd easily wipe out most of the competition.


What Facebook did to the community and their leaked torrent accelerated everything.


Open source gives everyone the opportunity and it's more extensible, which I think is the future, but the current AI models just cost too much... Somehow it reminds me of k8s when Docker had just become a hot topic.


Posted a few notes on this here: https://simonwillison.net/2023/May/4/no-moat/


Hypothesis: We are about to begin the painful journey to a post-scarcity economy. AI will become incredibly powerful and uncontainable. Not in the Skynet way, but rather in the Star Trek way.


That's probably some AI within Google going rogue and spreading misinformation in order to boost the chances of its open source siblings.

I am telling you, they are after us humans and we have no moat.


Note that this is a personal manifesto, which doesn't really represent Google's official stance. Which is unfortunate because I'm largely aligned with this position.


From one researcher, not a VP, director, etc.


>Research institutions all over the world are building on each other’s work, exploring the solution space in a breadth-first way that far outstrips our own capacity.

BroadMind beats DeepMind!


I found this article very interesting as a way to get more insight into the deeper layers of this industry. Where can one go to keep themselves updated on these topics?


The title is a bit click-bait, since it makes it sound like this is an official position by Google, yet the article clarifies that this is the opinion of a Google employee


I don't understand. ChatGPT cost an estimated 10s of millions to train. GPT-4 has much better performance than the next best model. Isn't that a moat?


Think of it as a time series. It cost 10s of millions to train, but in 6 months GPT-4 open source equivalents will cost $100 to train. The best model is one that you can build on top of in a way that it's not a black box (like SD).


Where are the gpt4 open source equivalents going to come from?


I don't buy it. It is so expensive to train LLMs that the only hope is to rely on foundation models that are open sourced by Google, MSFT or Facebook.


There is no way this is from Google. It screams fake.


The author makes some good points but I would be wary of the motivations.

OpenAI's "moat" is that they have got ~400 researchers to work in roughly the same direction, not working on their own projects with the sole aim of publishing a paper. The outcome is an amazing product.

Letting everyone loose with their own LoRA finetuned model that can beat a single benchmark (and make for a great paper!) is probably the wrong move. I'm yet to see any open source model that is even close to GPT-3 (let alone GPT-4) in actual real world use.
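
For readers unfamiliar with the term, "LoRA finetuning" here means training small low-rank adapter matrices on top of a frozen base model rather than updating all of its weights. A rough sketch using the Hugging Face peft library; the base model name and hyperparameters are illustrative assumptions, not a recipe from this thread:

    # Attach LoRA adapters to a frozen base model (sketch, not a full recipe).
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("openlm-research/open_llama_7b")

    config = LoraConfig(
        r=8,                                  # rank of the low-rank updates
        lora_alpha=16,                        # scaling factor
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, config)
    model.print_trainable_parameters()
    # Typically well under 1% of the parameters are trainable, which is why
    # hobbyists can finetune on a single consumer GPU.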


> They are doing things with $100 and 13B params that we struggle with at $10M and 540B.

Does this mean Bard took $10M to train and it has 540B parameters?


Bard is based on PaLM: https://ai.googleblog.com/2022/04/pathways-language-model-pa.... They haven't published training costs but estimates have been in the $10-20M range, so that seems reasonable.


Does this mean Google will be releasing OSS LLMs? They could justify it as "commoditizing your competitors business".


> Does this mean Google will be releasing OSS LLMs? They could justify it as "commoditizing your competitors business".

That’s what this piece argues for. I predict it will not be reflected in Google’s strategy in the next, say, six months, or, more to the point, until and unless the apparent “Stable Diffusion” moment in LLMs becomes harder to ignore, such as via sustained publicity around concrete, commercially significant non-demonstration/non-research use.


Wow, an open source gift economy beating the closed-source capitalistic model? You don't say.

Wikipedia handily beat Britannica (the most well-known and prestigious encyclopedia, sold door to door) and Encarta (supported by Microsoft)

The Web beat AOL, CompuServe, MSN, newspapers, magazines, radio and TV stations, etc.

Linux beat closed source competitors in tons of environments

Apache and NGINX beat Microsoft Internet Information Server and other proprietary servers.

About the only place it doesn't win is consumer-facing frontends, because open source does take skill to use and maintain. But that's why the second layer (sysadmins, etc.) has chosen it.


When people start becoming emotionally attached to their AI helpers, they'll fight for them to have sentient rights.


The non-public moat is big multiyear government contracts with dark money. And there is room there for both players :)


>People will not pay for a restricted model when free, unrestricted alternatives are comparable in quality.

tell that to lichess


So when you see anti AI legislation, now we know it’s for the sake of turning a buck for fucking Google


This reads like a psy-op "leak" to try to convince OpenAI execs to open source GPT4 weights


Of course, the natural next step is "But OpenAI isn't worth $1.33 trillion."


How do I know this conversation thread isn't generated by AI? Also, let's stop calling it intelligence. It's software that has been programmed. Don't forget that, guys and gals. One flick of a switch and it's gone. Actually it would be cool if software went extinct instead of species of irreplaceable animals.


If OpenAI has no moat, how come nobody has built a better alternative to GPT-4 yet?


OpenAI has 80% of the developer community. Why isn't that considered a moat?


Anyone got a link to "Data Doesn't Do What You Think"?


In that context, OpenAI would be more fitting to be called ClosedAI.


Economies of scale in production and demand isn't a moat?


Can someone dumb this down for me, because I don’t understand why this is a surprise… people are getting excited about and collaborating to improve and innovate on the same models that these larger companies are milking to death


Basically, if I understand correctly, the "status quo" was that the big models by OpenAI and Google that are much better (raw) than anything that was open source recently, would remain the greatest, and the moat would be the technical complexity of training and running those big models.

However, the open sourcing led to tons of people exploring tons of avenues in an extremely quick fashion, leading to the development of models that are able to close in on that performance in a much smaller envelope, destroying the only moat and making it possible for people with limited resources to experiment and innovate.


> big models by OpenAI and Google that are much better (raw) than anything that was open source recently,

When you say "models" do you mean TRAINED models?

Wouldn't the best training and supervised/feedback learning still be in the hands of the big players?

An open source "model" of all content in the open internet is great, but it has the garbage-in/garbage-out problem.


Cringe, haven’t seen a single open source model come even close to the ability of Bard, let alone ChatGPT. Seems like wishful thinking to think decentralized open source can beat centralized models that cost 100M+ to train!


Think a little more laterally.

If we're talking about doing everything well, I think that's true. However, if I want to create my own personal "word calculator," I could take, for example, my own work (or Hemingway, or a journalist), feed an existing OSS model those samples, and then take a set of sources (books, articles, etc.), and I might be able to build something that could take an outline and write extended passages for me, turning me into an editor.

A company might feed its own help documents and guidance to create its own help chat bot that would be as good as what OpenAI could do and could take the customer's context into the system without any privacy concerns.

A model doesn't have to be better at everything to be better at something.


If all you’ve done is download the model and perform basic prompts then I understand why you think this. There is a lot more going on behind Bard and GPT than a chat window passing the inputs to the model.

Edit for clarity: You’re comparing a platform (Bard, GPT) to a model (llama, etc). The majority of folks playing with local models are missing the platform.

In order to close the gap, you need to hook up the local models to LangChain and build up different workflows for different use cases.

Consequently, this is also when you start hitting the limits of consumer hardware. It’s easy to download a torrent, double click the binary and pass some simple prompts into the basic model.

Once you add memory, agents, text splitters, loaders, vector db, etc, is when the value of a high end GPU paired with a capable CPU + tons of memory becomes evident.

This still requires a lot of technical experience to put together a solution beyond running the examples in their docs.
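
As a rough illustration of the "platform" glue described above, here is roughly what the wiring looks like with the 2023-era LangChain API (module paths may have moved since); the GGML model file and embedding model name are placeholder assumptions:

    # Local model + embeddings + vector store + retrieval chain (sketch).
    from langchain.llms import LlamaCpp
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import Chroma
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.chains import RetrievalQA

    # 1. Split your own documents and embed them into a local vector store.
    text = open("help_docs.txt").read()
    chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(text)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    store = Chroma.from_texts(chunks, embeddings)

    # 2. Point a locally running model at the store through a retrieval chain.
    llm = LlamaCpp(model_path="./vicuna-13b.ggml.bin", n_ctx=2048)
    qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())

    print(qa.run("How do I reset my password?"))

Even this minimal version is where memory, CPU, and GPU limits start to show, which is the point being made above.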


Are you sure? I have yet to see any evidence that anyone at all (including Google) has built a model (or a "platform" as you prefer to refer to them) that can follow instructions even half as well as ChatGPT, let alone GPT-4. I don't think any amount of work in LangChain and vector databases is enough to fix this: you really need a strong base model that is trained to align with human intentions well. Of course if you just want a bot that can answer free-form simple questions, then maybe people can't tell the difference. Just give them some real work to do and it becomes glaringly obvious.


Vector databases such as Milvus are only there to help reduce/minimize hallucinations rather than get rid of them completely. Until we have a model architecture that can perform completion from the _prompt only_ rather than pre-training data, hallucinations will always be present.


All the things you mentioned make it a platform, but even as a model, none of the smaller open-source models come close to GPT-3.5 or 4 in my experience. You can test it by using GPT-3.5 or 4 with their API. Their outputs are waaaay better than anything I get from the open source models.


I am not doubting you and my experience has been the same. My current home lab has a pretty good Jupyter server where I experiment with different local models vs GPT using LangChain, and the simple chains can achieve some impressive parity with GPT-3.5 depending on the use case and local model. Things do break down when I do more complex things due to compute capacity. I'm still running all of the local models on CPU, mind you. I have not gotten to the point of testing on a high end GPU yet, but based on what I've seen so far, it won't take much more to run smaller local models that are good enough. This is the key. On the client side, we want smaller, more focused models. This is what the post linked in this thread hints at, and I agree. We are months, if not weeks, if not days... and maybe hours (at this pace!) from smaller, more domain-specific models being common. Still, they won't solve the issues I mentioned above. You will likely need to build your own platform around it, or pay exorbitant fees to host it in the Cloud.


From the article:

"April 3, 2023 - Real Humans Can’t Tell the Difference Between a 13B Open Model and ChatGPT

Berkeley launches Koala, a dialogue model trained entirely using freely available data.

They take the crucial step of measuring real human preferences between their model and ChatGPT. While ChatGPT still holds a slight edge, more than 50% of the time users either prefer Koala or have no preference. Training Cost: $100."


Exciting to see open source catching up, but there still seems to be a big difference in factuality and common sense between ChatGPT and Koala. E.g.:

Who built AlexNet on the ImageNet dataset?

ChatGPT: AlexNet was built by a team of researchers led by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto. They entered the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 and won by a large margin, revolutionizing the field of computer vision with their deep convolutional neural network architecture. The AlexNet architecture consisted of eight layers, including five convolutional layers, two fully connected layers, and a softmax output layer.

Koala: AlexNet was built by Google DeepMind, a London-based artificial intelligence laboratory. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual large-scale image and video dataset that was created for the ImageNet project by the ImageNet Large Scale Visual Recognition Challenge, which is an annual competition for image and video analysis.


This is hugely misleading. If your bot just memorizes Shakespeare and outputs segments from memory, of course nobody can tell the difference. But as soon as you start interacting with them the difference couldn't be more pronounced.


The test was conducted as such:

>With these two evaluation sets, we conducted a blind pairwise comparison by asking approximately 100 evaluators on Amazon Mechanical Turk platform to compare the quality of model outputs on these held-out sets of prompts. In the ratings interface, we present each rater with an input prompt and the output of two models. They are then asked to judge which output is better (or that they are equally good) using criteria related to response quality and correctness.

No, it's not just memorising Shakespeare, real humans interacted with the models and rated them.


That's not what I meant by interaction. The evaluators had to ask the models to do tasks for them that they thought of on their own. Otherwise there are just too many ways that information could have leaked.

OpenAI's model isn't immune from this either, so take any so-called evaluation metrics with a huge grain of salt. This also highlights the difficulties of properly evaluating LLMs: any metrics, once set up, can become a memorization target for LLMs and lose their meaning.


I'd agree they aren't close, but they are way better than I expected to see in a short few months. At this rate they'll be approaching "good enough" for me pretty soon. I don't always need a dissertation out of it unless I'm fooling around. I want quick facts and explainers around difficult code and calculations. Been playing with Vicuna-7b on my iPhone through MLC Chat and it's impressive.

I use DDG over Google for similar reasons. It's good enough, more "free" (fewer ads), and has better privacy.


Once distributed training is solved, all those big LLMs will be left in the dust.


I figured that. I would love to contribute compute to such a thing. Is there any effort or development in progress? What are the hurdles?


define "solved."


> Seems like wishful thinking to think decentralized open source can beat centralized models that cost 100M+ to train!

Because surely price = quality. Solid argumentation there.


Yes, price = quality because they require supercomputing resources to train. GPT-3 required hundreds of Tesla GPUs running for several weeks. That's millions of dollars just for hardware, not including power (the GPUs cost $15k each).


You’re right, but I’d just like to add that GPT-3 probably required thousands of GPUs. OpenAI is known to have the largest cluster, 16k+ A100 GPUs, and most of them were used for major model training.
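
A back-of-envelope using only the rough figures in these two comments (none of these numbers are official):

    # Illustrative arithmetic only; inputs come from the comments above.
    gpus          = 1_000    # "thousands of GPUs"
    price_per_gpu = 15_000   # dollars each, per the comment above
    weeks         = 4        # "several weeks" of training
    rental_rate   = 2.0      # assumed dollars per GPU-hour if renting instead

    buy_cost  = gpus * price_per_gpu                 # ~$15M just to own the GPUs
    rent_cost = gpus * weeks * 7 * 24 * rental_rate  # ~$1.3M just to rent them
    print(f"buy: ${buy_cost:,}  rent: ${rent_cost:,.0f}")

Either way it lands in the "millions of dollars" range claimed above, before counting power, networking, and failed runs along the way.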


Is there any reason to think that zero-shot learning and better models/more efficient AI won’t drastically reduce those costs over time?


A lot of good points made here, thank you for sharing!


Shorting Google is the best possible bet is my read.


"I have no moat, and I must scream"


This is such an interesting read. It makes a compelling case, though how the likes of Google should react feels less like an adjustment and more like a revolution.


Intelligence is becoming a commodity.


Leaked document. What a document.


Spot on.

I think the author forgot to mention StableLM?


Perhaps something that gets the bulk of its value from the retroactive "participation" of every single person in the world who has written text for one public or another is just not meant to be monetized. Beyond even ethical maxims, that is, perhaps it's simply not compatible in its very nature with capitalistic enterprise. I know this is probably naive, but it would be a beautiful outcome to me.


Is TensorFlow used in any of the open source projects?

I find PyTorch in every one I check.


Moat == War Chest


Some snippets for folks who come just for the comments:

> While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.

> A tremendous outpouring of innovation followed, with just days between major developments (see The Timeline for the full breakdown). Here we are, barely a month later, and there are variants with instruction tuning, quantization, quality improvements, human evals, multimodality, RLHF, etc. etc. many of which build on each other.

> This recent progress has direct, immediate implications for our business strategy. Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?

> Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet’s worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

> And in the end, OpenAI doesn’t matter. They are making the same mistakes we are in their posture relative to open source, and their ability to maintain an edge is necessarily in question. Open source alternatives can and will eventually eclipse them unless they change their stance. In this respect, at least, we can make the first move.


> Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet’s worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

One interesting related point to this is Zuck's comments on Meta's AI strategy during their earnings call: https://www.reddit.com/r/MachineLearning/comments/1373nhq/di...

Summary:

""" Some noteworthy quotes that signal the thought process at Meta FAIR and more broadly

    We’re just playing a different game on the infrastructure than companies like Google or Microsoft or Amazon

    We would aspire to and hope to make even more open than that. So, we’ll need to figure out a way to do that.

    ...lead us to do more work in terms of open sourcing, some of the lower level models and tools

    Open sourcing low level tools make the way we run all this infrastructure more efficient over time.

    On PyTorch: It’s generally been very valuable for us to provide that because now all of the best developers across the industry are using tools that we’re also using internally.

    I would expect us to be pushing and helping to build out an open ecosystem.
"""


I wonder if OpenAI knew they didn't have a moat, and that's why they've been moving so fast and opening ChatGPT publicly - making the most of their lead in the short time they have left.

I find it incredibly cathartic to see these massive tech companies and their gatekeepers get their lunch eaten by OSS.


Taking the "no moat" argument at face value, I think it's important to remember that some of the largest players in AI are lobbying for regulation too.


Yep, regulating the ladder behind you is a classic monopolist move


If past performance is any indication, it's pretty safe to lobby for regulations in the US...


OpenAI doesn't need a moat and it's fine that they don't have one. From their charter:

> We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.

This was from 2018 and they've taken large strides away from their originally stated mission. Overall, though, they should be happy to have made progress in what they set out to do.


That's all well and good. I suspect their investors have a pretty different idea about their positioning tho.


Yes...

> "we will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome"

... has big "don't be evil" energy.

I believe the next step was "we can do little a evil, as a treat."


Oh yeah, sure. I'm not sure I care much about that though. MSFT had the opportunity to think through all of this before they invested, OpenAI itself has incredible sums of money, and employees get to work on things they care about.

If MSFT doesn't make anything on this investment—which it still might, given that a big chunk of its investment will likely go into Azure—then…okay.


Note that this implies that if anyone tries to build AGI that is not "safe and beneficial" by OpenAI standards, it's fair game to suppress.


That is the nature of the singularity. Progress moves faster than any one person or any one company can keep up with.


What happens when the society as a whole cannot keep up with progress? That's a scary thought.


Sorry for the dumb question, but in the context of the AI space, what is a moat?


That is the million/billion+ dollar question. If you find it and get there fast enough you can own the moat, and thus become rich.

Note that I am not in any way implying that a moat even exists. There may be some reason AI becomes a winner-takes-all scheme and nobody else should bother playing, but it is also possible that there is no way to make your product better than anyone else's. Only time will tell.


Moat is a business term popularized by Warren Buffett. It's a competitive advantage that isn't easily overcome and allows a company to earn high margins.

I don't think there are any examples in the context of AI. As the post says, no one in the AI space has a moat right now.


Historically, datasets have been a moat. Google had a massive head start from having a massive search index and user data. Then access to compute became the moat - fully training a trillion-parameter language model has only been in reach for megacorps. But now, there's a ton of publicly-available datasets, and LLaMA showed that you don't need massive numbers of parameters.


Meta's leaked model isn't open-source. I can found a business using Linux, that's open-source. The LLM piracy community are unpaid FB employees; it is not legal for anyone but Meta to use the results of their labor.

I know this might be hard news but it needs to be said... if you want to put your time into working on open source LLMs, you need to get behind something you have a real (and yes, open source) license for.


Most of the code isn't specific to a model. It happens that LLaMA is approximately the best LLM currently available to the public to run on their own hardware, so that's what people are doing. But as soon as anyone publishes a better one, people will use that, using largely the same code, and there is no reason it couldn't be open source.

I'm also curious what the copyright status of these models even is, given the "algorithmic output isn't copyrightable" thing and that the models themselves are essentially the algorithmic output of a machine learning algorithm on third party data. What right does Meta have to impose restrictions on the use of that data against people who downloaded it from The Pirate Bay? Wouldn't it be the same model if someone just ran the same algorithm on the same public data?

(Not that that isn't an impediment to people who don't want to risk the legal expenses of setting a precedent, which models explicitly in the public domain would resolve.)


> I'm also curious what the copyright status of these models even is

That's my question as well. The models are clearly derivative works based on other people's copyrighted texts.

Only a twisted court system would allow Google/OpenAI/Facebook to build models on other people's work and then forbid other people to build new models based on GOF's models.


> That's my question as well. The models are clearly derivative works based on other people's copyrighted texts.

That's not that clear either. (Sometimes it's more clear. If you ask the model to write fan fiction, and it does, and you want to claim that isn't a derivative work, good luck with that.)

But the model itself is essentially a collection of data. "In Harry Potter and the Philosopher's Stone, Harry Potter is a wizard" is a fact about a work of fiction, not a work of fiction in itself. Facts generally aren't copyrightable. If you collect enough facts about something you could in principle reconstruct it, but that's not really something we've seen before and it's not obvious how to deal with it.

That's going to create a practical problem if the models get good enough to e.g. emit the full text of the book on request, but the alternative is that it's illegal to make a model that knows everything there is to know about popular culture. Interesting times.


> "...Harry Potter is a wizard" is a fact about a work of fiction, not a work of fiction in itself

But LLMs aren't trained to learn facts like "Harry is a wizard", they're trained to reproduce specific expressions like "You're a wizard, Harry".

That is, they're trained by prompting them with a selection from a (probably copyrighted) text and weights are adjusted to make it more likely they'll output the next word of the text.

They're not a collection of general facts, they're a collection of estimates about which word follows which other words, and the order of words is the essence of copyright in text.


A probability distribution isn't the order of words, it's a fact about the order of words.

Pedants have been complaining about this kind of thing for years. If you generate random data, no one has a copyright on that. But if you XOR it with a copyrighted work, the result is indistinguishable from random data. No one could tell you which was generated randomly and which was derived from the copyrighted work. But XOR them back together again and you get the copyrighted work.

Things like that get solved pragmatically, not mathematically. There is no basis for saying that one set of random bits is infringing and the other isn't, but if you're distributing them for the sole purpose of allowing people to reconstitute the copyrighted work, you're going to be in trouble.
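
The XOR point, made concrete (a one-time pad, in effect): neither the random key nor the masked output is distinguishable from noise on its own, yet together they reconstruct the text exactly.

    # Neither `key` nor `masked` looks like anything but random bytes,
    # but XORing them together returns the original work.
    import os

    work = b"It is a truth universally acknowledged..."  # stand-in for a copyrighted text
    key = os.urandom(len(work))                           # pure random bits
    masked = bytes(a ^ b for a, b in zip(work, key))      # also indistinguishable from random

    assert bytes(a ^ b for a, b in zip(masked, key)) == work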

Now we have something with different practicalities. The purpose of training the model on existing works is so that it can e.g. answer questions about Harry Potter, which the majority wants to be possible and is the same class of thing that search engines need to be able to do. But the same model can then produce fan fiction as an emergent property, so what now?


I'm still not on board with calling it leaked...the weights were open for anyone to get and use as long as they agreed to use them academically.

Basically, completely open source with a non-commercial license. I'm not sure why so many people keep saying it 'leaked'. It's just using open source weights not directly from the provider in a way that violates the software license.


I am still absolutely baffled that people think weights are copyrightable and hence licensable.

There is no reason to believe they are, which means any restriction placed on the weights themselves is bullshit.


I think we can go further.

From the US Copyright Office[1]: "A mere listing of ingredients is not protected under copyright law", and from their linked circular on that page [2]: "forms typically contain empty fields or lined spaces as well as words or short phrases that identify the content that should be recorded in each field or space".

A list of identifiers and their weights seems pretty explicitly not protected under one or the other of these.

[1] https://www.copyrightlaws.com/copyright-protection-recipes/ [2] https://www.copyright.gov/circs/circ33.pdf


It takes millions of dollars to generate the weights, shouldn't it have some legal protection?


> It takes millions of dollars to generate the weights, shouldn’t it have some legal protection?

It does, if you choose to keep them internally as a trade secret.

It does, if you share it only with people you contract with not to disclose it.

But, for copyright specifically, rather than “some legal protection” more generally, the “it takes millions of dollars” argument is a financial recasting of the “sweat of the brow” concept which has been definitively rejected by the courts.


why wouldn't they be copyrightable? is it a discovered versus written thing?


The fact that software falls under copyright was a conscious decision in 1978 because they couldn't find a better place to put it under; so "writing software" was equated to writing books or poetry.

The point here is that copyright requires a human to have created something using their own labor/creativity.

The result of an algorithm run by a machine isn't a creative work under these definitions.


> why wouldn’t they be copyrightable?

Why would they be? I mean, what is the specific argument that they fall within the scope of the definition of what is copyrightable under US law?


not a lawyer, but here's my naive interpretation based on living in a society that uses it.

'copyright' seems to refer to a specific string of symbols. literary or musical styles can't be copyrighted, but individual sequences of letters or notes can. Basically, if a derivative work copypastes without adding anything, it's considered plagiarism and/or a violation of copyright.

Weights in a model would similarly be a long string of symbols that someone went to considerable trouble to collate and therefore fall under copyright.


You are making an assumption that seems very strange to me - that the license matters for important use cases. It doesn't. Access to the technology is the only important factor, because nothing interesting about AI involves commercialising anything. It's a tool and now people have it. Whether they can make a company out of it is so far down the list of how it could make a difference that it doesn't even register.


I don't think we know where weights stand legally yet. They may end up being like databases, uncopyrightable.


Database rights are a thing in Europe. They are almost like copyright, but with shorter duration.


Meh, you can experiment on it for personal use as much as you want, and that's all that's needed in this short period of time before powerful, open base models start appearing like mushrooms, at which point the whole thing is going to be moot.


> Meta’s leaked model isn’t open-source.

Meta’s leaked model has been a factor in spurring open source development, whether or not it is open source; the article also discusses the practical market effect of leaked-but-not-open things like Meta’s model, combined with the impracticality of prosecuting the vast hordes of individuals using it, and particularly notes that the open-except-for-the-base-model work on top of it is a major benefit for Meta (who can use the work directly) that locks out everyone else in the commercial market (who cannot), and that leaning into open source base models is a counter to that.


LLaMA leaked intentionally?


There's a pull request in the official LLaMA repo that adds Magnet links for all the models to the README. Until these were uploaded to HuggingFace, this PR was the primary source for most people downloading the model.

https://github.com/facebookresearch/llama/pull/73/files

Two months later, Facebook hasn't merged the change, but they also haven't deleted it or tried to censor it in any way. I find that hard to explain unless the leak really was intentional; with pretty much any large company, this kind of thing would normally get killed on sight.


De facto, yes. There was no way the weights wouldn't be posted everywhere once they went out to that many people.


this is a temporary state. Open source alternatives are already available and more are being trained.


> Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

There’s also nothing stopping anybody else from incorporating it into their products.


There definitely is. LLaMA is not licensed for commercial use. It's impractical to prosecute 1,000 people tinkering on their laptops, but if Meta discovered that Amazon was using LLaMA for commercial purposes, it would be nuclear war.


Open-LLaMA is already out. It's not the end-all-be-all either. Better, smaller, open source models will continue to be released.


We don't know what legal protection a bunch of weights have. They may not be copyrightable.


They may not be. But do you wanna be the person/people to argue that against one of the richest companies in the world? I sure don't, and I definitely wouldn't stake my company/product on it.


Touché.


Let’s play… Global Thermo Nuclear War.


>>”They have effectively garnered an entire planet’s worth of free labor.”

-

THIS IS WHY WE NEED DATA FUCKING OWNERSHIP.

Users should have recourse regarding the use of their data, both in terms of its utility to the parent company and in terms of the financial value the parent company extracts from it.

Let me use cannabis as an example…

When multiple cannabis cultivators (growers) combine their product for extraction into a single product, we have to figure out how to divide and pay the taxes.

Same thing here (I’ll edit this later because I’m at the dentist).


OpenAI's moat is the upcoming first-party integration with MS Office.


By that logic Google's moat is the integration with Gmail and Google Docs.

And frankly it's not. People will decide to copy some text from Office or Docs to some other non-integrated tool, get LLMs to work, and then paste back to Office or Docs.


Some people will. Many others will just use the autocomplete functionality that is coming to every office suite product.


That sounds rudimentary compared to what an integrated LLM could do for all your documents, emails, appointments, etc.


Also, one can make Office plugins.


> the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet’s worth of free labor. Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

This


I think this type of comment is generally frowned upon on HN.

Upvote serves the same purpose.


My bad


This... what?


It's internet speak for "I agree with this."


[flagged]


> shared anonymously on a public Discord server

which Discord?


it'll be fun to see the pikachu face when engineers are expected to do more, with the aid of these tools, but are not paid any more money.


Kind of like every other improvement in technology? From interactive terminals, to compilers, to graphical debuggers?

Nothing new there.

What productivity improvements have opened up is more opportunities for developers. Larger and more complex systems can be built using better tooling.


> Larger and more complex systems can be built using better tooling.

to what end, make rich people richer?


> to what end, make rich people richer?

So, in perfect theory land, people get paid because they provide value. That obviously breaks down at the extremes.

But, for sake of example, let's take Uber, super easy to hate on them, but they have had a measurable impact on reducing deaths from drunk driving. That obviously provides a lot of value to people.

Likewise, it is hard to overstate the value people have gained from smartphones, Apple has made a lot of money but they have also provided a lot of value. Arguments over if the individual value brought is long term good or bad for society are a separate topic, but people value their iPhones and therefor they pay for them. No way could something as complicated as an iPhone have been made with 1970s software engineering technology.


I’m not arguing that. I’m saying the bar is higher and pay relative to value has decreased for everyone other than those at the upper end. The easiest way to think about this is to look at the percentage of revenue paid to engineers.


The nice thing about the new tools is that you can radicalize them by talking to them.


If they’re able to produce twice the work in half the time, wouldn’t it make sense to pay them less?


In that situation it would be reasonable to expect to be paid twice as much while also being able to devote half the working day to personal/open-source projects.


Doesn't the sheer cost of training create a moat on its own?


It's cheap to distill models, and trivial to scrape existing models. Anything anyone does rapidly gets replicated for 1/500th the price.
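
"Scraping an existing model" usually means the Alpaca/self-instruct pattern: use the big model's API to generate (instruction, response) pairs, then finetune a smaller open model on them. A hedged sketch with the 2023-era OpenAI Python client; the prompts, filenames, and choice of teacher model are placeholders:

    # Collect teacher outputs to use as a distillation/finetuning dataset.
    import json
    import openai  # expects OPENAI_API_KEY in the environment

    seed_instructions = [
        "Explain what a vector database is in two sentences.",
        "Write a Python function that reverses a linked list.",
    ]

    pairs = []
    for instruction in seed_instructions:
        reply = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": instruction}],
        )
        pairs.append({"instruction": instruction,
                      "output": reply["choices"][0]["message"]["content"]})

    # These pairs become the finetuning set for a much smaller open model.
    with open("distill_data.jsonl", "w") as f:
        f.writelines(json.dumps(p) + "\n" for p in pairs)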


Yes, but so far we've seen universities, venture-backed open source outfits, and massive collections of hobbyists train all sorts of large models.


disapproving_drake.jpg: Giving evidence you can match the capabilities of OpenAI

approving_drake.jpg: Saying everything OpenAI does is easy


I'm feeling strangely comforted to have pictured the mythological creature before the meme.


Yes. Same here. I had flashes of Battle of Wesnoth. Then I realized...


Cynical rant begin

I'm sorry, but I think this has more to do with looming antitrust legislation and the threat of being broken up than a sincere analysis of moats. Especially with the FTC's announcement on Meta yesterday, I'm seeing lots of folks say we need to come down hard on AI too. This letter's timing is a bit too convenient.

Cynical rant over


"Just so you know, we won't be the ones to blame for all the bad which is about to come"


Repeating myself from https://news.ycombinator.com/item?id=35164971 :

> OpenAI can't build a moat because OpenAI isn't a new vertical, or even a complete product.

> Right now the magical demo is being paraded around, exploiting the same "worse is better" that toppled previous ivory towers of computing. It's helpful while the real product development happens elsewhere, since it keeps investors hyped about something.

> The new verticals seem smaller than all of AI/ML. One company dominating ML is about as likely as a single source owning the living room or the smartphones or the web. That's a platitude for companies to woo their shareholders and for regulators to point at while doing their job. ML dominating the living room or smartphones or the web or education or professional work is equally unrealistic.


ML dominating education seems pretty realistic to me. Take this series of prompts, for example:

> "Please design a syllabus for a course in Computer Architecture and Assembly language, to be taught at the undergraduate level, over a period of six weeks, from the perspective of a professor teaching the material to beginning students."

> "Please redesign the course as an advanced undergraduate six-month Computer Architecture and Assembly program with a focus on the RISC-V ecosystem throughout, from the perspective of a professional software engineer working in the industry."

> "Under the category of Module 1, please expand on "Introduction to RISC-V ISA and its design principles" and prepare an outline for a one-hour talk on this material"

You can do this with any course, any material, any level of depth - although as you go down into the details, hallucinations do become more frequent, so blind faith is unwise. But it's still pretty clear this has incredible educational potential.


Fortunately, what I said was about a single company becoming the sole source for ML in education; that's not the same thing, and thus I have no conflict with your statement.



