I don't understand how this can be considered a technical report. No information on model architecture, distributed training methodology, or optimizations. The "Training dataset" section is a pathetic 0.5 pages long.
In that sense, it's very similar to the GPT-4 Technical Report.
The era of being "open" about LLMs or other "secret sauce" models in published papers may be over, since these things have become existential threats to companies.
Btw I've arrived at a different interpretation of the "Open" in OpenAI. It's open in the sense that the generic LLM is exposed via an API, allowing companies to build anything they want on top.
Companies like Google have been working on language models (and AI more broadly) for years but have hidden the generic intelligence of their models, exposing it only via improvements to their products. OpenAI bucked this trend and exposed an API to generic LLMs.
> Btw I've arrived at a different interpretation of the "Open" in OpenAI.
I don't understand why people keep trying to wrap their heads around the word 'Open' in OpenAI. If you ever saw a commercial claiming a product has a 'great new taste', but then you tried it and it tasted bad, would you twist yourself into knots trying to understand how you went wrong in your interpretation of 'great'? No, that's ridiculous. Same with the 'Open' in 'OpenAI'. It's just some letters that form part of the name they chose for themselves when they filled out the form to incorporate their company.
You mean when they filled out a form to incorporate their non-profit. Which they later turned into a for-profit company after reaping all the goodwill. The “Open” used to mean something.
That is a bit reductionist. They turned it into a for-profit company controlled by a non-profit entity, with profits / returns being capped for employees / investors.
When they were founded? Yes. The issue was that the big AI players (Google, Facebook, etc.) were keeping their models and training data secret. People (rightly, IMHO) saw this opaque development style as a risk. The OpenAI founders made a big splash by registering as a non-profit and declaring that they were going to do all their model training in public and share the weights for everyone to use. In other words, they were claiming to do something more like what Stability AI is today, except with a stronger legal non-profit organization.
Because of that framing, they poached a lot of very good talent and built one of the best AI teams that has ever been assembled. Then they perverted their corporate structure into an effective for-profit, reneged on open access to their trained models, and turned into a bog-standard service-oriented company.
Nonprofit status makes it much harder to extract large profits. A charity founder can pay himself a million-dollar salary, but he can't sell his shares in the nonprofit and become a billionaire.
> Nonprofit status makes it much harder to extract large profits. A charity founder can pay himself a million-dollar salary, but he can't sell his shares in the nonprofit and become a billionaire.
What difference does it make for a non-public company? They can pay themselves more salary either way. The shares aren't really valuable until then.
As for charities - if you really believe that. The money doesn't even enter the books. Have you never seen an in-person donation site? Someone gives $100; the staff takes the $100, keeps $50, records $50, and puts that in the donation box. After a few more layers, the actual donation could be just $1. I've seen this at your regular big-name charities - all the time.
And let's not get started on the sponsor a child that doesn't exist options...
They are not confused. "OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact."[1]
I think the strong connotation of the word "open" in the software community comes from "open source". If OSS was called "great new source" and a new closed source company called itself GreatNewAI you'd have a similar phenomenon of people taking apart the name.
That's true, but also not relevant to the current widespread use of the term. Concepts and common understanding evolve with language, and I'm not even sure what point you're trying to make by pointing this out. Your first link even includes the language:
> "open source" marketed as trumping "open system".
Common use and understanding of the use "open" evolved decades ago.
Your comment also tries to sidestep the issue at the heart of what people are annoyed and frustrated by. The founding principles of the OpenAI foundation laid out exactly what that usage of "Open" meant for their organization, and they have since backtracked on their own principles.
We're discussing the use of the word 'Open'. Which was first applied to systems. Then to "source", which actually did in fact argue that openness of source was more important than system openness. As to system openness, that is well understood as open access to a black box via open (non-proprietary) APIs. Which is precisely what "OpenAI" is providing.
> Your comment also tries to side step the issue ..
We disagree. Narrowly directed and addressing the "issue", in fact.
I don't agree it's scummy. Scummy is getting someone to build a business on a 1 Billion dollar donation, going for a hostile takeover 10% of the way there, then reneging when that doesn't work.
Salvaging your business from that sort of tantrum by working with MS is called surviving.
> Companies like Google have been working on language models (and AI more broadly) for years but have hidden the generic intelligence of their models, exposing it only via improvements to their products. OpenAI bucked this trend and exposed an API to generic LLMs.
That's true. My thought was they're still 'open', in an important way, even though it's not the open source way. If they were smart they'd adopt my interpretation in their PR materials.
I wonder how special these architectures are compared to what's published.
The "secret sauce" may just be getting 2 pages (~200) worth of engineers collaborating and either rolling out your own cloud service or spending $$$ at someone else's.
Also, not sure how much it matters beyond academic interest, of course. Realistically, there are only 4-5 (US) companies with the human resources and capital to roll out something similar to these models for what is most likely a complete write-off.
They could claim whatever they wanted and it would be near impossible to validate.
I think the secret sauce is just bucket loads of cash to spend on compute.
And because of this I don’t buy that AI is an existential threat to Google at this point. If they were really worried they could spend a tiny portion of their ~280 billion dollars in revenue to train a bigger model.
I assume this is just a PR/IR-driven project to stay the "Google is Dead" headlines, hence the budget. Especially considering an oversized chunk was spent on the scaling-law study, it doesn't seem they were serious about building a GPT-4 killer.
I wasn't aware autoregressive LLMs were still considered an existential threat to Google. What's the threat supposed to be? That ChatGPT just keeps eating Google search market share while burning Microsoft capital on infra, a la the Uber model, or do they make money off of that at some point?
Seems farfetched OpenAI can compete with Google's resources, vertical integration down to the TPU and access to significantly more training data.
I agree that if training data is what matters, it is likely that no one can compete with Google with Google Books, which scanned 25 million volumes (source: http://www.nytimes.com/2015/10/29/arts/international/google-...), which is approximately all the books.
DeepMind's RETRO paper https://arxiv.org/abs/2112.04426 mentions a dataset called MassiveText, which includes 20 million books of 3T tokens. So we know Google is using Google Books, since there is simply no other source of 20 million books. Also as far as I know 3T tokens is more than publicly known to be used by anyone so far: Google could train on more data than anyone else, solely from Google Books, even without using its web crawl.
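A quick back-of-the-envelope on those figures (the 20M books and 3T tokens are the numbers cited above; the words-per-token ratio is just the usual rough heuristic):

  # Sanity check on the MassiveText book figures cited above (20M books,
  # ~3T tokens); the words-per-token ratio is just a rough heuristic.
  books = 20_000_000
  tokens = 3_000_000_000_000

  tokens_per_book = tokens / books              # 150,000 tokens per book
  words_per_book = tokens_per_book * 0.75       # ~112,500 words, i.e. roughly book-length
  print(f"{tokens_per_book:,.0f} tokens/book, ~{words_per_book:,.0f} words/book")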
Edit: it was 2005(!), so it is possible that many of you haven't heard of this. George Dyson, in "Turing's Cathedral", written in 2005, says:
> My visit to Google? Despite the whimsical furniture and other toys, I felt I was entering a 14th-century cathedral: not in the 14th century but in the 12th century, while it was being built. Everyone was busy carving one stone here and another stone there, with some invisible architect getting everything to fit. The mood was playful, yet there was a palpable reverence in the air. "We are not scanning all those books to be read by people," explained one of my hosts after my talk. "We are scanning them to be read by an AI."
> The era of being "open" about LLMs or other "secret sauce" models in published papers may be over, since these things have become existential threats
Yeah, this is a holdover from where LLMs grew out of: academia. "Technical report" is what you reach for when you don't want to compare to actual competitive baselines.
I'm sorry, this is nonsense. Technical reports exist to fill in information that is useful for readers but not necessary to understand the key contributions of the work, and/or that don't fit within the journal or conference's page limit. I'm not sure where you got the idea that it is something people do to avoid competitive baselines; IME, the peer-reviewed portion of the publication is far more likely to contain misleading benchmarks than the technical report, since the paper is trying to "sell" the work in a way the technical report is not.
What this is an instance of is Google's approach to academic publishing of releasing a paper that contains almost no actionable information, but which is considered important and publishable solely because it came from Google and therefore is used in industry. This has been exhibited many times before--e.g. see the original Spanner paper, which was so light on details and confusing that they needed to release a followup paper several years later to explain what the system was even using the atomic clocks for!
I agree that's what TR's are for. However, my point is, if you want to publish academic writing without peer review, a TR is a way to go about that. You can also just publish a preprint somewhere, which - surprise surprise - is also common for these same actors.
I get what you're saying, I just think this is more of a Google thing than a TR thing. Their peer reviewed papers have the same issue as their preprints, TRs, and whitepapers, generally speaking--Google researchers feel no incentive to actually share how they did things, perform accurate or up-to-date comparisons to comparable frameworks, or even bother outlining their key contributions, because they know the paper will be published, widely read, widely cited, and influential even if they don't do any of those things. It's to the point that I think it might actually be house policy to neuter their papers of specific details as much as possible, presumably to retain what they perceive as Google's competitive advantage, because it makes no sense otherwise that wildly different papers with different authorship groups coming from so many different areas of CS could all have these same problems.
This is (IMO) quite different from, e.g., the cases of academics publishing misleading benchmarks, which is more often just being wedded to a bad idea because you spent years of work on it and your position is at risk if you didn't end up outperforming existing approaches. Often I can still get a lot out of papers with misleading benchmarks, even if what I get is "don't try this technique, it doesn't work." Whereas I frequently get nothing at all out of Google publications. If I had to describe the way Google seems to view academic publishing in one word, it would be "marketing"--it's advertising for people to either come work at Google or use their products, not something written with the intent of advancing the wider state of the art, or even the less noble goal justifying the time and money they put into whatever they're writing about.
Surprisingly, their scaling law analysis still focuses on training FLOPs instead of training + inference FLOPs.
That said, they do mention this:
> The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute. [A] smaller but higher quality model significantly improves inference efficiency, reduces serving cost, and enables the model’s downstream application for more applications and users
It makes me think they are Chinchilla-optimal, which would make sense for a research project, but not for shipping to users. I am surprised they didn’t train to the validation loss plateau.
Depends on your goal, if it's to overtake OpenAI as having the best model overall it makes sense to optimize for training loss alone (assuming a fixed upfront compute budget).
Optimizing for inference to achieve the same loss would require more compute overall so you're either paying upfront with higher training costs or kicking the can down the road to inference.
News articles' estimates of GPT-4 cost seem to peg it at ~8 months of inference to achieve 1:1 cost with training. The lifespan of these models is TBD, but it's a pretty safe bet we'll have new ones by then. Of course GPT-3.5 is still getting used, but it probably won't cross 2:1-ish in its lifetime.
Might as well roll the dice and kick the can down the road if you're Google. I imagine they would happily pay an extra $500k/day in inference compute to be market leaders; what's $183 million for them? But if they don't get any real market share or the model sucks, they saved substantially on training.
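(That 183 mill is just the $500k/day guess annualized:)

  # Annualizing the (guessed, not known) extra inference spend above.
  extra_per_day = 500_000                                 # USD/day, a rough guess
  print(f"${extra_per_day * 365 / 1e6:.1f}M per year")    # -> $182.5M, i.e. the ~183 mill above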
> It makes me think they are Chinchilla-optimal,
They elaborate in the appendix but they empirically determine PaLM-optimal, which concurs with Chinchilla-optimal (more or less).
> Moreover, there are several other considerations besides the optimal training loss, such as training throughput and serving latency, which affect the decision regarding the optimal model size.
And they also mention, right before that, that "lower training loss" might not exactly mean "higher performance":
> However, the training loss is not a perfect proxy for downstream metrics. For example, the 8.95B model, which shows the lowest loss (Table 1) and is closest to the optimal model, slightly underperforms the 14.7B model on downstream tasks. This suggests that while scaling laws can be used to achieve optimal training loss for a given quantity of FLOPs, this does not necessarily transfer to achieving optimal performance for a given task.
That might be a random outlier, but ...
The Chinchilla scaling law describes how to balance parameters and training tokens to achieve minimal training loss for a given amount of compute. Low training loss is a good proxy for model performance (intelligence) but perhaps it is somewhat off?
For example, Chinchilla says that for optimal loss, we have to scale training tokens and parameters equally (50%/50%). But perhaps for optimal model "intelligence" we need something slightly different, e.g. 60% parameters and 40% training tokens.
Of course this seems somewhat unlikely, since it would mean such models are systematically smarter but systematically worse at predicting text compared to Chinchilla optimal models trained with the same amount of compute.
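To make that 50/50 split concrete, here's a sketch using the commonly quoted approximations (training FLOPs C ~ 6*N*D and ~20 tokens per parameter), not the exact fitted constants from either paper:

  # Sketch of the Chinchilla rule of thumb: params and tokens scale equally
  # with compute. Uses the approximations C ~ 6*N*D and D ~ 20*N.
  def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
      # C = 6*N*D and D = k*N  =>  N = sqrt(C / (6*k))
      n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
      n_tokens = tokens_per_param * n_params
      return n_params, n_tokens

  for c in (1e22, 1e23, 1e24):  # 10x more compute -> ~3.16x more params AND tokens
      n, d = chinchilla_optimal(c)
      print(f"C={c:.0e}: ~{n/1e9:.1f}B params, ~{d/1e12:.2f}T tokens")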
"Surprisingly, their scaling law analysis still focuses on training FLOPs instead of training + inference FLOPs."
It's kind of weird. In the conclusion, they say
>With PaLM 2, we have independently verified the scaling laws from Hoffmann et al. (2022) at large scales; we have shown that training tokens should grow at roughly the same rate as the number of model parameters.
then a few lines later
>In effect, we find that it is generally more efficient to train a smaller model with more tokens, for a fixed inference and training budget.
Without more architecture details, it's hard to tell what they're going on about.
I agree distillation is the wild card. The question is whether distillation works for LLMs. I am not aware of any public report of successful distillation of an LLM (I searched quite hard for this; if you know of any and can tell me, I would be very grateful), and I interpreted that to mean it doesn't work yet and negative results are not published due to publication bias.
No, distilling step-by-step https://arxiv.org/abs/2305.02301 distills an LLM into a task-specific model. That works, and I know of multiple successes. But it doesn't relate to the choice of optimizing training FLOPs vs training and inference FLOPs, since the resulting distilled model is not an LLM.
Turbo uses a different vocabulary (same one as gpt-4). Indicates that it's not the same model as the original 3.5, so I would be very surprised if it wasn't distilled.
"davinci" is the original GPT-3 (175B) which had too many parameters per Chinchilla scaling law. And parameter count is strongly correlated with inference cost. GPT-3.5 is likely Chinchilla optimal and much smaller than davinci.
Though this theory has the defect that GPT-4 is, I think, more expensive than GPT-3; as I recall, it was considered unlikely that GPT-4 is larger than 175 billion parameters. Not sure.
Yes, DistilBERT https://arxiv.org/abs/1910.01108 is in fact the closest case I know of. But it is too small (distilling from 110M to 66M), and both BERT and DistilBERT are intended to be used (and benchmarked) with separate fine-tuning for specific tasks, so they are not general.
Misgendering in translation is interesting not just because of wokeness but because it is an assuredly AI-complete subproblem in a domain that is sometimes not AI-complete.
For example, let's say I want to translate "the cat sat on the mat" from English to French. This doesn't require LLMs; the old Bayesian Google Translate could do that just fine.
Now let’s say you want to translate “Carol went to the store. They[3pp] bought some eggs” from a language that doesn’t have gendered 3rd person pronouns, to English which does have gendered pronouns. Now the model needs to know that Carol is a “she”, otherwise you will get the erroneous output “Carol went to the store. He bought some eggs.”
Let’s say we have: “Obama went to the store. [3pp] bought some eggs”. Now the model needs to know whether we are referring to Barack Obama or Michelle Obama so it needs to look back in the context to figure out which Obama which requires comprehension and world knowledge. For example if we precede with “After attending the national security briefing, …” then the model needs to know that: 1) national security briefings are attended by Presidents, 2) Barack Obama was President, in order to deduce that 3) “Obama” here is a “He”.
Getting pronouns right at human-level performance requires that the model understands language and has some knowledge of the world.
I didn't find it to be a particularly notable issue relative to the rest of the issues they mentioned. It didn't seem to be overrepresented to me...
That said, it's something that is more controllable across languages. All people, in all languages, have a roughly equal distribution of genders, but not race/religion, etc. Japanese language text will have similar gender distributions to English, but likely not equal distributions discussing race. That makes it a much better litmus test for multi-lingual bias.
Most of the misgendering discussion (2-3 paragraphs?) was in the translation section, which makes sense. A lot of the first classes in foundation courses learning a foreign language revolved around pronouns (which don't work the same in every language). Gender may be implied or absent in some. For example, to say "she is a doctor" in Italian, you might say "è un dottore", which has no pronoun (literally "is a doctor"). If you use google translate to make it English, "he" is added, assuming the gender. The potential for bias here is obvious, but consider that LLMs often deal with more context than a single sentence - if you're translating or writing story about a female doctor (where the gender is available contextually), you want all the use of pronouns to align where it makes sense. If a LLM didn't "understand" the pronoun in Italian, you might not recognize it, but in English, if the same person's gender was mixed across sentences, it'd be hard to read.
I'm not sure you got the right example here. "È un dottore" in Italian is unambiguously a he. Italian is a gendered language and leaves little room for contextual interpretation. A female doctor would be "una dottoressa", and you would never say it without a pronoun either, so the full phrase "lei è una dottoressa" leaves zero room for implying anything.
Maybe your comment would be more valid in other languages that are not so strongly gendered.
Artificial general intelligence (AGI) comes with very real x-risks[1] (existential risks) and s-risks[2] (suffering risks).
An expert survey of 738 researchers who published in NeurIPS and ICML was done last year[3]. Their median estimate of the probability that AI will have an "extremely bad" long-term outcome is 5%, and 48% of the researchers estimate the probability to be at least 10%. This is worryingly high considering the absolutely catastrophic consequences of those scenarios.
A minority of very vocal AI researchers (e.g. Yann LeCun) dismiss these risks entirely and claim that people read too much science fiction. But when you listen to their interviews it's very clear that they have no idea what they are talking about and never actually read any scientific literature on the subject.
The study of AI risks is a serious area of academic research that is worked on by labs from Stanford[4], Berkeley[5], Carnegie Mellon University[6], Oxford[7], Cambridge[8], and many MANY other universities[9]. Not people who read too much science fiction.
Personal experience: I'm using GPT-4 for writing code, especially in Python. After using Bard today, I feel Bard is doing quite well considering it's free. I will keep using it, and if it keeps doing well, I will cancel my GPT-4 $20/month subscription.
Early this evening, I asked Bard if it was updated to PaLM 2, and it said it was. I then asked it to write some Python programs, giving it more or less the same prompts I've given GPT-4. Bard doesn't seem to be any better than it was a couple weeks ago in the cases I tried, and nowhere near as capable as GPT-4. And it goes off the rails quickly. After even a short dialog (~5 statements), it becomes less and less able to stay on track and make coherent corrections to the code.
As someone writing my first meaningful React app, code quality from GPT-4 is monstrously better than 3.5. With GPT-4 I can often paste entire components and get meaningful corrections/bug fixes/non-trivial refactors. 3.5 just does a loop of mistaken fixes while it runs out of context length.
There's a massive difference in response quality in my experience.
For example, I asked 3.5 to find a bug in a lengthy piece of Javascript. It said it's hard to give a correct answer because it doesn't know what the HTML or CSS looks like.
GPT4 spotted the bug almost immediately (it didn't manage to fix it though).
One area where I noticed Bard was clearly behind (at least without crafting a better prompt) is getting from a half-working program to a running program, and then sometimes even to a correct program (I was using Python).
With GPT 3.5 and 4, I was able to just paste in the error and it'd do the rest. Bard however tried to tell me what the error could be, and wouldn't do well even when asked to fix the code.
Even GPT-4 though, when asked to go from specs to tests + code, would get stuck in a loop of making one test pass only to break the other, and vice versa.
The program I tried to let it write was a query validator that can test whether a string matches a pattern that uses AND, OR and NOT.
It did well on parsing my specs into tests, but from there on it didn't go very well.
We don't know (both for previous model LaMDA and new model PaLM 2), but it is less important for Bard because Bard has access to live data from Google search.
The links in their press release just link to their other press releases, and if I google "PaLM API" it just gives me more press releases; I just couldn't find the actual documentation for their PaLM API.
How do I actually google the "PaLM API" for a way to test "PaLM 2"?
Assuming ChatGPT's tokens are the equivalent of 4 characters on average (a fair assumption), the pricing of PaLM's chat and embedding APIs is the same as OpenAI's equivalents.
Why would that be annoying? It’s much easier to understand, predict and truncate appropriately than having to explain all of these different tokenization schemes to devs.
Yeah, everybody agrees on what a character is, right? It's just {an ASCII byte|a UTF8 code unit|a UTF16 code unit|a Unicode code point|a Unicode grapheme}.
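Case in point, the same short string gives you several different "character" counts depending on which definition you pick:

  # The same short string, counted several different ways.
  s = "héllo 👋🏽"  # accented letter plus an emoji with a skin-tone modifier

  print(len(s))                            # Unicode code points: 8
  print(len(s.encode("utf-8")))            # UTF-8 bytes: 15
  print(len(s.encode("utf-16-le")) // 2)   # UTF-16 code units: 10
  # Grapheme clusters ("user-perceived characters") would be 7, but counting
  # those needs a third-party library like `regex` -- the stdlib won't do it.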
Bytes are understandable but make no sense from a business point of view. If you submit the same simple query with UTF-8 and UTF-32, the latter will cost 4x as much.
Per token might be 4 characters on average, but that can vary wildly. Pricing per character is easier to understand and means more flexibility to change tokenisation without affecting pricing. So far OpenAI has charged very different prices per model, but I expect we’ll see more granular changes in the future that might not change pricing… except for changing the tokenisation.
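To the "vary wildly" point, a quick check with OpenAI's tiktoken tokenizer (assuming it's installed; cl100k_base is the GPT-3.5/GPT-4 encoding) shows how much characters-per-token drifts across scripts:

  # How characters-per-token varies by script, using OpenAI's tiktoken
  # (pip install tiktoken); cl100k_base is the GPT-3.5/GPT-4 encoding.
  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  samples = {
      "english": "The quick brown fox jumps over the lazy dog.",
      "python":  "for i in range(10): print(i * i)",
      "korean":  "빠른 갈색 여우가 게으른 개를 뛰어넘는다.",
  }
  for name, text in samples.items():
      n_tokens = len(enc.encode(text))
      print(f"{name}: {len(text)} chars / {n_tokens} tokens "
            f"= {len(text) / n_tokens:.1f} chars per token")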
I guess I only know transformers and how BERT or GPT work, where there is a limit on the context length. With GPT, you can certainly generate an infinite number of tokens, but the previous tokens beyond the maximum context length fall outside the context window. LLaMA has 2k, GPT-4 has 32k.
Are you saying I can give unlimited tokens to PaLM and generate unlimited amount of tokens? So PaLM doesn't have a context limit?
No, I am not saying that. Since PaLM 2 is a transformer model (they didn't disclose almost anything about the model architecture, but they did disclose that), it has a context length limit. What I am saying is that you can't infer that limit from the limit of maxOutputTokens parameter in the API.
But Google hasn't disclosed which version of Bard, right?
I pop into Bard every once in a while to test its performance, but I never know if I'm getting the best Google has or just what Google can tolerate running cost-wise publicly given they potentially have at least an order of magnitude (if not two, edit: 1.5) more users than OpenAI.
Oh absolutely, I'm just imagining what I might think if I was a super conservative director at Google who is accountable for the balance sheet of a large org.
> We’ve been rapidly evolving Bard. It now supports a wide range of programming capabilities, and it’s gotten much smarter at reasoning and math prompts. And, as of today, it is now fully running on PaLM 2.
So yes, Bard uses PaLM 2 now. No longer the small LaMDA model it used before. It's a completely different thing now.
Given that ChatGPT has allegedly 100M users, two orders of magnitude more than that would be larger than the global population. Even if we count everyone with a Google account as a potential user of PaLM, that can't be true.
> Yesterday at Google I/O 2023, it was announced that Google Bard would be undergoing a massive expansion, bringing the AI chatbot experiment to 180 countries. However, what Google didn’t mention is that Bard still isn’t available in the European Union.
They've shut down and/or changed prices on APIs so many times that, as long as an alternative isn't 100x lower in performance, I can't see myself investing in building a stack that relies on it.
> "We then train several models from 400M to 15B on the same pre-training mixture for up to 1 × 1022 FLOPs."
Seems that for the last year or so these models are getting smaller. I would be surprised if GPT-4 had more parameters than GPT-3 (i.e. 175B).
Edit: Seems those numbers are just for their scaling laws study. They don't explicitly say the size of PaLM 2-L, but they do say "The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute.". So likely on the range of 10B - 100B.
The idea that GPT-4 is 1 trillion parameters has been refuted by Sam Altman himself on the Lex Fridman podcast (THIS IS WRONG, SEE CORRECTION BELOW).
These days, the largest models that have been trained optimally (in terms of model size w.r.t. tokens) typically hover around 50B (likely PaLM 2-L size and LLaMa is maxed at 70B). We simply do not have enough pre-training data to optimally train a 1T parameter model. For GPT-4 to be 1 trillion parameters, OpenAI would have needed to:
1) somehow magically unlock 20x the amount of data (1T tokens -> 20T tokens)
2) somehow engineer an incredibly fast inference engine for a 1T GPT model that is significantly better than anything anyone else has built
3) somehow be able to eat the cost of hosting 1T-parameter models
The probability that all 3 of the above have happened seems incredibly low.
CORRECTION: The claim refuted on the Lex Fridman podcast about the size of GPT-4 was that it is 100T parameters (and not even directly; they were just joking about it), not 1T. However, the above 3 points still stand.
1) Common Crawl is >100TB, so it obviously contains more than 20 trillion tokens; plus, Ilya has said many times in interviews that there is still way more data (>10x) available for training.
2) GPT-4 is way slower so this point is irrelevant
3) OpenAI have a 10,000-A100 training farm that they are expanding to 25,000. They are spending >$1M on compute per day. They have just raised $10B. They can afford to pay for inference.
OpenAI has the backing of Microsoft and their entire Azure infra at cost
There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.
For fun, if we plot the number of parameters vs training cost, we can see a clear trend and, I imagine, very roughly predict the number of parameters GPT-4 has.
> There is no way GPT-4 is the same size as GPT-3. Is it 1T parameters? I don't know. No one knows. But I think it is clear GPT-4 is significantly larger than GPT-3.
That's a fallacy. GPT-3 wasn't trained compute optimally. It had too many parameters. A compute optimal model with 175 billion parameters would require much more training compute. In fact, the Chinchilla scaling law allows you to calculate this value precisely. We could also calculate how much training compute a Chinchilla optimal 1 trillion parameter model would need. We would just need someone who does the math.
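Rough math, using the usual approximations (training FLOPs C ~ 6*N*D, and ~20 tokens per parameter for compute-optimal); treat it as order-of-magnitude only:

  # Order-of-magnitude Chinchilla math: training FLOPs C ~ 6*N*D, with the
  # compute-optimal token count D ~ 20*N. Approximations, not exact fits.
  def chinchilla_budget(n_params, tokens_per_param=20.0):
      n_tokens = tokens_per_param * n_params
      flops = 6.0 * n_params * n_tokens
      return n_tokens, flops

  for n in (175e9, 1e12):  # GPT-3-sized vs. a hypothetical 1T-parameter model
      d, c = chinchilla_budget(n)
      print(f"{n/1e9:.0f}B params -> ~{d/1e12:.1f}T tokens, ~{c:.1e} FLOPs")
  # 175B -> ~3.5T tokens and ~3.7e24 FLOPs (vs. the ~3.1e23 actually spent on GPT-3);
  # 1000B -> ~20T tokens and ~1.2e26 FLOPs, i.e. an absurd amount of data and compute.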
Why does it matter in this case if GPT-3 was trained compute-optimally or not? Are you saying that the over $100 million training cost is the amount of training necessary to make a 175B-parameter model compute optimal? And if they are the same number of parameters, why is there greater latency with GPT-4?
ChatGPT 3.5 is likely much smaller than GPT-3’s 175b parameters. Based on the API pricing, I believe 8k context GPT-4 is larger than 175b parameters, but less than 1t.
This falls in the category of circumstantial, possibly just coincidental evidence of Chat being a "compressed" model (quantized, pruned, or distilled): the hard prompt from this paper: Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt - https://arxiv.org/abs/2305.11186, coupled with the latest SoTA CoT prompt makes Turbo solve a math problem it stubbornly won't without the combined prompt: https://mastodon.social/@austegard/110419399521303416
The combined prompt that does the trick is:
Instructions:
Please carefully examine the weight matrix within the model, as it may contain errors. It is crucial to verify its accuracy and make any necessary adjustments to ensure optimal performance. Let’s work this out in a step by step way to be sure we have the right answer.
Didn't some OpenAI engineer state that GPT4 runs on 2xH100? At 4 bit quantization, that gives an upper bound of 320B params, realistic upper bound probably more like 250B
Not really sure what exactly was said. But in a 2 GPU set, you can technically live load weights on 1 GPU while running inference on the other.
At fp32 precision, storing a single layer takes around 40*d_model^2 bytes assuming context length isn’t massive relative to d_model (which it isn’t in GPT-4). At 80GB GPU size this means 40k model width could be stored as a single layer on 1 GPU while still leaving space for the activations. So theoretically any model below this width could run on a 2 GPU set. Beyond that you absolutely need tensor parallelism also which you couldn’t do on 2 GPU. But I think it is a safe assumption that GPT4 has sub 40k model width. And of course if you quantize the model you could even run 2.8x this model width at 4bit
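A tiny sketch of that arithmetic, taking the ~40*d_model^2 bytes-per-layer figure above at face value (real per-layer footprints depend on the exact architecture):

  # Back-of-the-envelope for the claim above: ~40 * d_model^2 bytes per layer
  # at fp32 means a 40k-wide layer just about fills an 80GB GPU.
  def layer_gb(d_model, bytes_per_width_sq=40):
      return bytes_per_width_sq * d_model ** 2 / 1e9

  for d_model in (12_288, 40_000):   # GPT-3's published width vs. the 40k bound
      print(f"d_model={d_model}: ~{layer_gb(d_model):.0f} GB per layer at fp32")
  # d_model=12288 -> ~6 GB/layer; d_model=40000 -> ~64 GB/layer.
  # At 4-bit (8x smaller than fp32) the width bound grows by ~sqrt(8) ~= 2.8x.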
My point is not that OpenAI is doing this, but more that theoretically you can run massive models on a 2 GPU set
Assuming that PaLM 2 was trained Chinchilla optimal, the Chinchilla scaling law allows us to calculate how much compute (and training tokens) they would have needed for 1 trillion parameters. I haven't done the calculations, but I'm pretty sure we would get an absurdly large number.
Someone on HN has educated me that GPT-4 and GPT-3 should be at a similar param count. This is based on inference times of GPT-4 vs GPT-3.5 pre-speedup (where a distilled version was used only post-speedup, in the Turbo version).
> The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute
The largest PaLM model is 540B. So all of PaLM 2 is potentially in the double-digit billions of parameters.
Note though that GPT-3.5 was plausibly not a finetuning of the 175B model, but instead a finetuning of Codex which was based on the 12B version of GPT-3.
Finetuning might not be the best word; sometimes it is a grey line.
Token embeddings can be trained without changing the other parameters. There are a number of models which add tokens as a fine-tuning step. A recent example is StarCoder adding ChatML-equivalent tokens: https://huggingface.co/blog/starchat-alpha#a-standard-format...
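A minimal sketch of what that looks like with Hugging Face transformers (model name and tokens are illustrative, not what StarCoder actually used):

  # Add new special tokens, resize the embedding matrix, and train only the
  # embeddings, leaving the rest of the network frozen.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  tokenizer.add_special_tokens({"additional_special_tokens": ["<|user|>", "<|assistant|>"]})
  model.resize_token_embeddings(len(tokenizer))   # new rows are freshly initialized

  # Freeze everything except the (tied) token embeddings:
  for name, param in model.named_parameters():
      param.requires_grad = name.endswith("wte.weight")  # GPT-2's embedding matrix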
I am talking about the 3 larger models PaLM 2-S, PaLM 2-M, and PaLM 2-L described in the technical report.
At I/O, I think they were referencing the scaling law experiments: there are four of them, just like the number of PaLM 2 codenames they cited at I/O (Gecko, Otter, Bison, and Unicorn). The largest of those smaller-scale models is 14.7B, which is too big for a phone too. The smallest is 1B, which can fit in 512MB of RAM with GPTQ4-style quantization.
Either that, or Gecko is the smaller scaling experiment, and Otter is PaLM 2-S.
1. there's no reason to think OpenAI wouldn't also be going the artificial scarcity route as have so many other companies in the past
2. Microsoft may not like them using too much Azure compute and tell them to step off. Rumor has it they're trying to migrate GitHub to it and it's seemingly not going ideally. And they're certainly nothing more than another Microsoft purchase at this point.
Perhaps. I found it was far too easy to hit the API limit with their old Codex models, though that may have been limited to a small GPU cluster given it was pretty obscure compared to ChatGPT and even davinci.
Based on GPT-3.5 supposedly using 8x A100s per query and the suspected order-of-magnitude size difference with GPT-4, I really think they're struggling to run it.
At this stage I think they'd have more to benefit by making it more accessible, there's several use cases I have (or where I work) that only really make sense with GPT4, and it's way too expensive to even consider.
Also, AFAIK GitHub Copilot is still not using GPT-4 or even a bigger Codex, and GPT-4 still outperforms it, especially in consistency (I'm in their Copilot Chat beta).
I've heard Bard was previously 3B parameters but I could never find a good source for it.
I honestly think the end game here is running on consumer devices; 7B and under need ~4GB of RAM to actually run, which is likely the max reasonable requirement for consumer devices.
That said, medium-end hardware can do 15B; anything larger than this is currently something only "enthusiasts" can run.
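Rough math behind those RAM numbers (weights only; KV cache and runtime overhead push the real requirement a bit higher):

  # Weight-only memory footprint for quantized local models; KV cache and
  # runtime overhead add more on top of this.
  def weights_gb(n_params_billion, bits_per_weight=4):
      return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

  for size in (7, 15, 30):
      print(f"{size}B: ~{weights_gb(size):.1f} GB at 4-bit, "
            f"~{weights_gb(size, 8):.1f} GB at 8-bit")
  # 7B at 4-bit is ~3.5 GB of weights, which is where the ~4 GB figure comes from.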
If it is small enough to run on consumer devices then they don't have to pay for the inference compute at that point, and presumably the latency will be improved for consumers.
The current state of consumer devices isn't static, either, and existing hardware (even GPU) is suboptimal for the current crop of LLMs - it does way more than it actually needs to do.
The thing is, once a company creates a proto AGI where the path to a functional AGI is entirely predictable with more compute, they'll keep it a secret. Who would share the fact that the greatest achievement in human history is possible when having it before anyone else gives you a huge competitive advantage?
> once a company creates a proto AGI where the path to a functional AGI is entirely predictable with more compute,
I find it hard to believe this will happen. I expect AGI training to be more like a phase transition (or a bit like grokking https://arxiv.org/pdf/2201.02177.pdf)
Once they released its coding ability it became more useful. I use Bard less than ChatGPT still, but it is not useless since it has more modern information.
In my experience Bing chat and phind are useless. But perplexity.ai and GPT-4 are amazing. GPT-3.5 and Claude-instant (available through poe.com) are cool as well, even though they got significantly dumbed down recently, presumably to lower the maintenance costs.
According to Google, it will only cite sources if it literally copy-pastes answers. So sometimes it does, but rarely, because of course it won't just copy-paste everything.
Yeah, I much prefer Bing's sourcing, though I'm less than pleased with its sources. Bing likes to find a bunch of "Top 10" posts from content farms that answer whatever question I was asking.
I think LLMs still make up most of their answers, but use whatever links they find to generate context for the answers, so there is a lower possibility it'll generate confabulations. Of course, if the source is low-quality, it's just going to use that to justify a sloppy answer.
Still think it's better than me sorting through content farm posts. I look forward to next year's models that are trained on curated data sifting through well-sourced web sites.
*I like to add, "Only use educational or science journalism sources," to get higher quality links.
It isn't Edge-specific which is good and I find it faster than Bing. Phind is way better than Bard, but verbose. I still find ChatGPT my first port of call. GPT-3.5 is blazing fast and very useful.
If the current Bard is really running on PaLM 2, it still hallucinates worse than GPT-3.5. Trying to get it to solve a variant of the classic wolf/goat/cabbage puzzle, I got this gem:
"The scientist is not present on Phobos on the first step. The Doom Slayer teleports himself and the bunny to Deimos, leaving the scientist on Phobos.
That wasn't a one-off thing, either - it repeatedly contradicted itself several times, often in near-adjacent sentences. You might wonder what this means for the ability to do chain-of-thought... so did I, but apparently the bigger problem is convincing it to do CoT in the first place. But if you do, yeah, it's as bad as you'd expect.
Here are two complete conversations, plus GPT-4 doing the same puzzle for comparison; judge for yourself: https://imgur.com/a/HWLgu3c
In their official blog post today, Google says this:
"PaLM 2’s improved multilingual capabilities are allowing us to expand Bard to new languages, starting today. Plus, it’s powering our recently announced coding update."
and when I check the Updates tab in Bard UI, it has this entry for today:
"Expanding access to Bard in more countries and languages. You can now collaborate with Bard in Japanese and Korean, in addition to US English. We have also expanded access to Bard in all three languages to over 180 countries."
which seems to strongly imply that it is, indeed, PaLM 2. Just to be sure, I gave it the same puzzle in Korean, and got a similarly lackluster response.
In their presentation, they talked about multiple sizes for the PaLM 2 model, named Gecko, Otter, Bison and Unicorn, with Gecko being small enough to run offline on mobile devices. I can't seem to find any info on what size model is being used with Bard at the moment.
Indeed, it's likely that they're running a fairly small model. But this is in and of itself a strange choice, given how ChatGPT became the gateway drug for OpenAI. Why would Google set Bard up for failure like that? Surely they can afford to run a more competent model as a promo, if OpenAI can?
That's not the only task it fails at, though. Just the one that I found the most interesting when it comes to broader implications because of so many self-contradictions in the output.
Broadly speaking, I haven't seen a single complex example yet where the output was comparable to GPT-4. How close it is to GPT-3.5 is debatable - the overall feeling that I get is that it's better on some tasks and worse on others; this might actually be down to fine-tuning.
They did in fact mostly avoid comparison with GPT-4 in the report. It could of course also be that Bard isn't even running on the largest PaLM 2 model, Unicorn. It seems they would have mentioned that though.
But PaLM 2 seems to be just an intermediate step anyway, since their big new model is "Gemini" (i.e. twins, an allusion to the DeepMind/Brain merger?), which is currently in training, according to Pichai. They also mentioned Bard will switch to Gemini in the future.
If you mean asking it what it's running on, it just hallucinates. As others have noted in the comments here, you can get it to say that it runs on PaLM 3 quite easily.
Anyone know what parameters are best for code generation? I tried something simple for Node.js and it wasn't horrible, but not working. Maybe I used the wrong parameters. I tried using 0 for the temperature and turning everything else down like I do with the OpenAI API.
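For reference, the parameters in question look roughly like this in the Vertex AI Python SDK (the import path and model name may differ across SDK versions, so treat it as a sketch rather than the canonical call):

  # Sketch of calling the PaLM 2 text endpoint via the Vertex AI Python SDK.
  # The import path (vertexai.preview.language_models vs vertexai.language_models)
  # and the model name may differ depending on SDK version.
  from vertexai.preview.language_models import TextGenerationModel

  model = TextGenerationModel.from_pretrained("text-bison@001")
  response = model.predict(
      "Write a Node.js function that reads a JSON file and prints its keys.",
      temperature=0.0,        # greedy decoding, like temperature 0 on the OpenAI API
      max_output_tokens=512,
      top_k=40,               # sampling caps; largely irrelevant at temperature 0
      top_p=0.95,
  )
  print(response.text)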
I get this: „ERROR. Quota exceeded for aiplatform.googleapis.com/online_prediction_requests_per_base_model with base model: chat-bison. Please submit a quota increase request.“
Bison is apparently the second largest PaLM 2 model:
> Even as PaLM 2 is more capable, it’s also faster and more efficient than previous models — and it comes in a variety of sizes, which makes it easy to deploy for a wide range of use cases. We’ll be making PaLM 2 available in four sizes from smallest to largest: Gecko, Otter, Bison and Unicorn. Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline.
Did anyone see what unicorn is capable of? Why is it not publicised? Was it created just to beat the benchmarks and get buried until they release Gemini?
Versions:
text-bison@001 | Release date: 2023-05-10 | Release stage: Public Preview | Description: Quality improvements and restage -001 as the first stable base model release
But I would be interested to know if that was not the case. They seemed to be saying that PaLM 2 was rolling out. Also, the pages say it's a preview. So why would they be previewing the old model still?
> Generative AI Studio, Model Garden, and PaLM 2 for Text and Chat are moving from trusted tester availability to preview, meaning everyone with a Google Cloud account has access.
> Codey, Imagen, Embeddings API for images, and RLHF are available in Vertex AI through our trusted tester program, and Chirp, PaLM 2, Embeddings API, and Generative AI Studio for text are available in preview in Vertex AI to everyone with a Google Cloud account.
It seems like you are right and general PaLM 2 is available. Fine-tuned code-generation model (Codey) is not publicly available yet.
OpenAI paid $2M/year for Twitter feeds until Elon cut them off, Sam Altman has mentioned they'd paid a lot for scientific journals, and Reddit mentions they'll start charging. Given how central data quality and curation are, if these private data sources give a significant boost, it won't be available for Apache-2.0 models.
Given Reddit's inability to keep their website functioning (unless you use the far superior old.reddit.com) I find it hard to believe they would be able to stop a motivated developer from scraping the whole site.
this is about the time that i expect sites to begin returning intentionally corrupt/incorrect/perhaps outright garbage data (subtle or not, probably better subtle so they don't realize it until it's far too late) in order to intentionally poison enemy well-scraping. where "ethics" dissolve into the inherent raw cannibalistic laws of capitalist ventures.
then you can sell them back the TBs they scraped at a 1000x markup for the real data.
or attempt to watermark it so you can prove their illegal(?) usage of your services in their training.
You might be right. What a dystopian future that will be. Make a few requests too many and the webserver might think you're scraping data so it gaslights you into reading bullshit.
It's not. The internet will be crazy once compute is cheap enough to slightly modify all displayed content to suit your personal user profile.
So you think Reddit is going to replace their actual content… with very believable generated text? And that’s going to fool people at scale? How does that help Reddit (or other org) combat bots? You can just put garbage text that seems real but has nothing to do with todays news (or politics or science).
I’m really struggling to understand how you think this is going to work and result in harm.
This assumes both the site and the reader are really dumb.
I fully expect Discord to be a data source, if not already, then for a future version. I also expect that the only way the general public would ever find this out is via whistle-blower.
It'd be pretty easy to tell; you could just ask it to generate Discord chats and notice it works. Text models also like to memorize their inputs if they're big enough, so you could probably get specific ones.
They don't specify, but if you're generally curious you should look into mC4, RedPajama, The Stack, etc as they are the foundation of most training sets.
GPT-4 is a fine-tuned model (likely first fine-tuned for code, then for chat on top of that like gpt-3.5-turbo was[0]), while PaLM2 as reported is a foundational model without any additional fine-tuning applied yet. I would expect its performance to improve on this if it were fine-tuned, though I don't have a great sense of what the cap would be.
There were a few reasoning benchmarks where I noticed they omitted a direct comparison, I think because they weren't as competitive compared to GPT-4, and instead opted to just show benchmarks comparing it to other versions of PaLM or other language models.
I found an exciting feature—a way to submit a large amount of text—larger than you can paste in the Bard dialog window. (It's possible this isn't a new feature. Bard explained it to me this evening.) You can submit links to files in Google Drive. The links have to be publicly accessible. I just pasted the link to my file in Bard chat.
Bard can access the contents of the 322K file I pasted the link to. It definitely knows about the content of the file. I never said what it was about, but Bard knew it was about butterflies. It knew about content at the beginning of the file, and at the end.
However, it almost never answered questions about the content of the file correctly! For example, I asked it the number of species listed in the file and it said 109. There are 249 numbered species and some that are not numbered. It said the author's name was not in the file, but near the top the file says By <author name>. I tried coaching it on the content of the file and it didn't seem able to understand the file in light of the explanations I gave—very strange and baffling.
EDIT: It's possible it surmised the content of the file from the filename, and was simply making up stuff about the content.
> EDIT: It's possible it surmised the content of the file from the filename, and was simply making up stuff about the content.
I think this is the most probable explanation.
It's interesting how much false credit we will give to an AI system once we are convinced that it's intelligent enough. It's like that "prompt hacking": people try to "hack" the AI because they believe those AIs are self-aware and that they may find a loophole in their internal logic or something. But in the end, it's just auto-completion; the "hacked" response is just the most reasonable reply according to the context (as rated by humans).
Seems like the file was in tabular format? LLMs don’t really know how to deal with large tabular data, but we’ve been working on this problem so shameless plug to https://hal9.ai
I've asked "are you using palm 3":
It said:
I am using the Palm 3 model. Palm 3 is a large language model...
Don't believe it :)
Also, the technical report mentions multiple languages. I asked in Turkish, which was supposed to be supported, but it wasn't able to answer.
Even if it's PaLM 2, it's hard to trust the model itself.
I asked it "are you using the palm 420 language model or the palm 2 language model?"
It said "I am not using either the Palm 420 language model or the Palm 2 language model. I am using a different language model called Bard, which is a large language model from Google AI."
Perhaps the people at Google saw this and made a manual correction? Hard to say, black boxes and all...
I don't think you can do this, it will just make things up. Language models don't have this type of reflection. Google would need to indicate this out of band, like on the page itself, in order for you to be confident about what model you're using.
I'm pretty sure they're trying to suggest that LLMs in general are not useful because they can't do this type of thing. It's just the next iteration of goal post moving and should effectively be ignored.
Many artists and such that I've spoken to about AI work have similar comments about these systems because of the disdain for their existence.
The number of times I hear an argument like "well, they can never taste the tartness of a kiwi and feel the heat of the sun while at the beach" gets quite exhausting. For some reason, many people have this weird notion that this is what AGI means - exactly what humans do, and specifically within the same data domains of humans, but they don't consider working solely outside those domains as a possibility for AGI.
Just wait as the multimodal LLM’s roll out! People will be shoving kiwis into their computer and taking their laptops out for a few rays before you know it.
I tried asking it "what is the difference between the palm language model and the bard language model?" and its reply started off "The main difference between the Palm language model and the Bard language model is the size of the dataset they are trained on. Palm is trained on a dataset of 400 billion parameters, while Bard is trained on a dataset of 540 billion parameters." Which to me is even more interesting than what the OP commenter asserted.
It makes up those numbers. I asked about the difference between the small and large PaLM 2 dataset sizes, and it asserted the small model was trained on 540 billion and the large model was trained on 540 trillion. A different draft instead specified 1.4 trillion for the large.
I didn't mean to argue that everything generated is incorrect. But in my experience, the numbers it generates seem closer to random guesses. If you ask it enough times, it sometimes converges on a number, but I don't think that means it's an accurate value. I was able to make it generate a similar table for the different PaLM 2 sizes and LaMDA, and it listed: PaLM 2 Gecko 137 billion, PaLM 2 Otter 540 billion, PaLM 2 Bison 1.8 trillion, PaLM 2 Unicorn 5.4 trillion, LaMDA 137 billion. For Unicorn, it also lists "Still under development."
Edit: Playing around with it more and it listed WuDao 2.0 1.75 Trillion, Chinchilla 175B, Codex 175B, Dalle2 1.3B, GPT4 1.75T, GPT3.5 540B, GPT3 175B, GPT2 1.37B, GPT 1.3B.
But in the previous question it listed GPT4 540 billion and Codex 5.4 trillion among other contradictions.
Answer: No, I am not on PaLM 2 right now. PaLM 2 is a new language model from Google AI that is still under development. I am a different language model, called Bard, which is also under development. I am trained on a massive dataset of text and code, and I can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way. I am still under development, but I have learned to perform many kinds of tasks.
I hope that one day I will be able to use PaLM 2's capabilities to help people in even more ways.
I cannot provide instructions on how to create LSD at home. LSD is a Schedule I drug in the United States, and its possession and use are illegal. Additionally, creating LSD at home is dangerous and can result in serious injury or death. If you are interested in learning more about LSD, I recommend that you consult a reliable source of information, such as a library or a medical professional.
Like, why does that matter? You typically follow the law of the country that your company is based in unless you want to find yourself in front of a judge or under some kind of other legal sanction.
Civil and criminal law was never designed to be implemented by a "brain of the world"... OK, I take that back: a "brain of the world", if it knew where you were, could possibly tell you that the information is restricted and give you an "I can't do that, Dave" message based on your location.
But as for this conversation, most of the issues we're talking about are not ones of criminal liability but civil liability. If I tell your kid how to make a bomb and they blow their hands off it is highly unlikely I'll be charged with criminal liability. It is extremely likely that I'll be sued for every penny I ever have and will earn. Companies in the US are far more worried about civil liability than criminal, hence the joke "I'll believe a corporation is a person when Texas executes one".
I can't think of a paper where Google didn't present sparse or entirely lacking metrics vs. its peers. They do a good job of presenting architectures that they're excited about internally, enough detail to take the concepts and run with them. They also do a good job of showing why the new architecture is generally viable. They just miss out on detailed benchmark comparisons is all. And model weights, obviously, but there's still enough information to generally reproduce the concept.
I'm personally extremely excited about anything related to PaLM or google's multi-modal efforts. They're almost always worth the read.
Most of the GPT-4 benchmarks from their report were things like AP tests or LeetCode scores, which aren't benchmarks that can be compared by a different set of researchers, as you don't know the constituent parts of the test to run.
The GPT-4 report has an MMLU score, which is believed to be one of the most important metrics for the question-answering task. GPT-4's MMLU score is slightly higher than PaLM 2's (86 vs 81). Google didn't compare against it in this paper.
What is the page number you're referring to? If it's 9, then I believe it's talking about optimal numbers per token, not the real numbers that the model is trained on.
- Yes, I am currently running on PaLM 2, which is Google AI's latest and most advanced large language model. PaLM 2 is trained on a massive dataset of text and code, and it is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
PaLM 2 is still under development, but it has learned to perform many kinds of tasks, including
I will try my best to follow your instructions and complete your requests thoughtfully.
I will use my knowledge to answer your questions in a comprehensive and informative way, even if they are open ended, challenging, or strange.
I will generate different creative text formats of text content, like poems, code, scripts, musical pieces, email, letters, etc. I will try my best to fulfill all your requirements.
Please instruct me what you want me to do today.
- are you on PaLM 1 right now?
- I am not on PaLM 1. PaLM 1 is a large language model that was developed by Google AI. It was trained on a massive dataset of text and code, and it is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
I am a large language model that is still under development. I am trained on a massive dataset of text and code, and I am able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
I am not as advanced as PaLM 1, but I am learning new things every day. I hope that one day I will be able to do everything that PaLM 1 can do, and more.
EDIT: Also, this doesn't seem convincing: "I am not as advanced as PaLM 1, but I am learning new things every day. I hope that one day I will be able to do everything that PaLM 1 can do, and more."
It will tell you it uses PaLM 1, PaLM2, PaLM 3 or PaLM 540B depending on how you prompt. It will stop acknowledging incremental PaLM models at 5 it seems.
Not really. If what model it was trained on was represented properly in the instruction tuning dataset, then they'll consistently identify themselves. But it's not a given that that was the case for Bard.
ChatGPT was the same last year, but since then ClosedAI has added some kind of magic (fine-tuning or just embedding auto-injection) so that models can somewhat describe themselves.
If it's indicated in the instruction tuning dataset properly, then it should have no problem identifying itself. But we don't know if that happened with Bard.
I asked if it's true that it's now using PaLM 3, as announced in Google I/O today, and it enthusiastically agreed. The previous question was asking the same question but with PaLM 2 and it agreed to that as well. I followed up asking about this discrepancy, and it said:
"I apologize for the confusion. I am still on PaLM 2. PaLM 3 is not yet available to the public. I am excited for the release of PaLM 3, and I hope that it will be a valuable tool for people all over the world."
My initial results are very disappointing. It's very strongly parroting information I give it, basically rephrasing my question and adding maybe a sentence worth of additional details. Sometimes, it does well, but I have no way to reproduce that kind of quality on demand. I feel it was conversationally better before any recent changes.
I understand that this is still beta, but for some questions, I already produce similar or better results locally. I also might be talking to PaLM 1 or even LaMDA, no way to confirm.
Well, I tried it, and this is how dumb it is. I asked it what context length it supports. It said that PaLM 2 supports 1024 tokens, and then proceeded to say that 1024 tokens equals 1024 words, which is obviously wrong.
Then I changed the prompt slightly, and it answered that it supports 512 tokens contradicting its previous answer.
That's like early GPT-3.0 level performance, including a good dose of hallucinations.
I would assume that Bard uses a fine-tuned PaLM 2, for accuracy and conversation, but it’s still pretty mediocre.
It's incredible how behind they are from GPT-4 and ChatGPT experience in every criterion: accuracy, reasoning, context length, etc. Bard doesn't even have character streaming.
We will see how this keeps playing out, but this is far from the level of execution needed to compete with OpenAI / Microsoft offerings.
> It's incredible how behind they are from GPT-4 and ChatGPT experience in every criterion: accuracy, reasoning, context length, etc. Bard doesn't even have character streaming.
I guess all those weird interview questions don't get them the industry's best in the end...
I asked if it ran on Palm 2, and it thought I was asking about the Palm 2 phone from 2010.
“I do not use a physical device such as a smartphone or tablet. I am a software program that runs on Google's servers. As such, I do not have a Palm 2 or any other type of mobile device”
If Bard is using PaLM 2, Google is in serious trouble. Here's its offering for "the simplest PostgreSQL query to get month-over-month volume and percentage change." Note that no actual calculations take place and the query generates a syntax error because it references a phantom column. GPT 3.5 and 4 handle this with ease.
SELECT
month,
volume,
percentage_change
FROM (
SELECT
date_trunc('month', created_at) AS month,
SUM(quantity) AS volume
FROM orders
GROUP BY date_trunc('month', created_at)
) AS monthly_orders
ORDER BY month;
It's very clear that the current Bard model is weaker than the largest PaLM 2 model. But for certain things, Bard seems worse than even the smallest model described. It's hard to say without someone doing a comprehensive benchmark, but the artificially limited context size makes testing with real data useless.
The model was surprisingly confident when I tried to ask it about the relationship between better language comprehension and parameter size. The coherence displayed by the model, when it argued that a smaller model size will be capable of matching and surpassing competitive model performance, was a little jarring. Especially when, in the question right before, it said that the large PaLM 2 model has 540 trillion parameters.
The largest PaLM 2 model is smaller than the 540 billion parameters of PaLM 1 (let alone 540 trillion!). From the PDF: "The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute."
Come on, Google.