I can't think of a paper where Google didn't present sparse or entirely missing comparison metrics against its peers. They do a good job of presenting architectures they're excited about internally, with enough detail to take the concepts and run with them. They also do a good job of showing why the new architecture is generally viable. They just skip the detailed benchmark comparisons, is all. And model weights, obviously, but there's still enough information to generally reproduce the concept.
I'm personally extremely excited about anything related to PaLM or Google's multi-modal efforts. They're almost always worth the read.
Most of the GPT-4 benchmarks in their report were things like AP tests or LeetCode scores, which aren't benchmarks that a different set of researchers can reproduce, since you don't know the constituent parts of the test to run.
The GPT-4 report does have an MMLU score, which is considered one of the most important metrics for question-answering tasks. GPT-4's MMLU score is slightly higher than PaLM 2's (86 vs. 81). Google didn't compare against it in this paper.