Problem solving across 100,633 lines of code – Gemini 1.5 Pro Demo [video] (youtube.com)
99 points by dario_satu 8 months ago | 21 comments



I want to see something where we have a big piece of code, and a big standards document it purports to implement, and the system can answer questions like "Is this part of the spec implemented? Where is it implemented? What does this piece of code mean (w.r.t. the spec)? If I implemented this part of the spec, where would the changes go?"


How about people just write the spec and the AI gives us the code from that? That would be mind-blowing.


Less immediately plausible, though.

A system for converting a natural language specification document into a formal specification would be interesting.


I'm pretty excited about the increased context length (e.g. in my other comment here[0]), but I'm kind of disappointed by the examples here.

The codebase is 100k lines, but the tasks given seemed to each focus on just a few hundred of those lines. The examples are probably largely independent, so it doesn't seem like this is really flexing anything a relatively simple RAG approach with a much smaller context window couldn't handle. The prompts said "the demo that ...", so it's a matter of identifying the demo in question and looking at just that code, which is a much smaller necessary context. There was the "use the GUI approach from other examples" task, which gets closer, but that's still just another small, distinct bit of code.

In other words, while the codebase has lots of lines, the actual inference across them seemed to use relatively few of them, and identifying the relevant lines didn't seem that hard to me based on the tasks given. That means it could be done with some retrieval and a much smaller context window.
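To make concrete what I mean by "some retrieval": a minimal retrieve-then-prompt sketch. The file layout, question text, and keyword scoring here are my own illustration, not anything from the demo; the point is just that the independent demo files make selection easy.

    // Rank the independent demo files by keyword overlap with the question,
    // then hand only the top file to the model as context.
    import { readdirSync, readFileSync } from "node:fs";
    import { join } from "node:path";

    function overlap(question: string, source: string): number {
      const terms = question.toLowerCase().match(/[a-z]{3,}/g) ?? [];
      const text = source.toLowerCase();
      return terms.filter((t) => text.includes(t)).length;
    }

    function pickRelevantDemo(question: string, demoDir: string): string {
      const files = readdirSync(demoDir).filter((f) => f.endsWith(".js"));
      return files
        .map((name) => ({
          name,
          score: overlap(question, readFileSync(join(demoDir, name), "utf8")),
        }))
        .sort((a, b) => b.score - a.score)[0].name;
    }

    // The chosen file (a few hundred lines) becomes the context, not all 100k lines.
    const file = pickRelevantDemo("add a slider to control the animation speed", "./examples");
    console.log(`Would prompt the model with just ${file}`);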

From the title, I thought it would be loading 100k lines into the context and then asking some deeper questions, like "find the bug that spans several function calls" or something like that. Something that wouldn't be trivial to accomplish with current techniques.

[0] https://news.ycombinator.com/context?id=39384034


Why would anybody trust this after they faked the last Gemini demo?


Exactly. Give us access to the model and let independent researchers test it. OpenAI did this with GPT-4, opening access publicly and giving deeper access to researchers both within and outside of Microsoft.

I simply don't believe the model is that good. Otherwise, why not try to compete with OpenAI directly?


Wonder why they're not just giving us access, if it's indeed so good? Seems it's just to generate some noise and hype around Gemini. Hardly believable after the previous faked demo, as someone already said.


Google faces a different calculus than Microsoft/OpenAI when throwing these things out. It's just like Google Cloud. They have huge, valuable first-party workloads that compete for the hardware resources that would be used by generally-available free AI toys.

For Microsoft it doesn't make a difference. They are taking their own cash, investing it in OpenAI, and then turning right around and booking it as revenue. As a bonus it makes Google look wrong-footed. But fundamentally Microsoft doesn't care how much money they torch doing this.


Even this demo is careful to show curated but plausible things now. They learned their lesson.

The code changes are the most common tutorials you can find on the web: adding a speed slider, and the terrain tutorials are literally called "height maps" and focus on making it taller or flatter.
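For a sense of scale, the "speed slider" change is essentially the stock tutorial snippet. A sketch of that kind of change (my illustration, not the demo's actual code, assuming the three and lil-gui packages):

    // Illustrative "speed slider" change in the style of the common three.js tutorials.
    import * as THREE from "three";
    import GUI from "lil-gui";

    const renderer = new THREE.WebGLRenderer();
    renderer.setSize(window.innerWidth, window.innerHeight);
    document.body.appendChild(renderer.domElement);

    const scene = new THREE.Scene();
    const camera = new THREE.PerspectiveCamera(60, window.innerWidth / window.innerHeight, 0.1, 100);
    camera.position.z = 3;

    const cube = new THREE.Mesh(new THREE.BoxGeometry(), new THREE.MeshNormalMaterial());
    scene.add(cube);

    // The requested "change": a GUI slider that scales the animation speed.
    const params = { speed: 1.0 };
    new GUI().add(params, "speed", 0, 5);

    const clock = new THREE.Clock();
    renderer.setAnimationLoop(() => {
      cube.rotation.y += clock.getDelta() * params.speed; // timestep scaled by the slider
      renderer.render(scene, camera);
    });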


The lesson was don’t get caught, not don’t do it.


To be fair, they mostly faked the near instantaneous, real-time flow of the conversations. The answers were, as far as I know, legit. But I still agree that we should be skeptical.


The prompts they used were also different from the ones shown: for example, "is this the right order" was actually "is this the right order, consider the distance from the sun". They noted this in their post on the Google dev blog.

This one seems to be very upfront about timing and capabilities, but the examples might be a bit simpler than people think. It's pretty amazing, but like someone else said, you could achieve similar results with RAG, given the lack of novelty in these questions and the fact that each dealt with a pretty independent example rather than custom code developed elsewhere in the codebase.


It's really interesting to see where all of this is going. I guess the long-standing best practice of clearly naming things for human interpretation has made code amenable to training and evaluation by ML models too.

Meta, but the speaker sounds eerily close to Mark Zuckerberg.


Companies will fire 70% of boilerplate coders in the coming years.


Nah, we’ll all be moved to perpetual on-call, every day an endless fire drill as hundreds of services are launched on top of the crumbling, crash-looping, burning landscape of the millions of services launched last quarter, a Mad Max world of endless adrenaline and New Relic AI-enhanced alerts.


Complexity will expand to consume all resources allocated to manage it.


But some companies that build reliable, old-school software will win the market?


Or just write 3x more boilerplate. I don't think we are saturated with software yet.


Is this where unit tests will be very useful, where you ask it to fix all the bugs found and make sure it passes all the unit tests? This is where all of GitHub's public repos will get really interesting forks.
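Something like this gating loop, I suppose. A rough sketch where the model call and the patching step are hypothetical stubs (you'd wire in whatever API you actually use); the only real check is whether the existing test suite passes:

    // Sketch: gate a model-proposed fix on the project's existing unit tests.
    import { spawnSync } from "node:child_process";

    // Hypothetical stand-ins for an LLM call and for applying its suggested diff.
    function proposePatch(failingOutput: string): string {
      throw new Error("call your model here with the failing test output");
    }
    function applyPatch(patch: string): void {
      throw new Error("apply the suggested diff to the working tree here");
    }

    function runTests(): { ok: boolean; output: string } {
      const run = spawnSync("npm", ["test"], { encoding: "utf8" });
      return { ok: run.status === 0, output: `${run.stdout}${run.stderr}` };
    }

    let result = runTests();
    for (let attempt = 0; attempt < 3 && !result.ok; attempt++) {
      applyPatch(proposePatch(result.output)); // feed the failures back to the model
      result = runTests();
    }
    console.log(result.ok ? "All unit tests pass." : "Still failing; needs human review.");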


What they did in this demo is collect a bunch of small demos, small enough that earlier models could have answered questions about them, or tweaked them, individually, and mostly demonstrated that the model could figure out which demo was pertinent to the question being asked and focus only on that.

But the input was still divisible into self-contained little bits -- so this is still somewhat different from dumping the full source code for a database engine into it, and having it answer questions about, say, where foreign key constraints are implemented -- or, more dramatically, how several different parts of the codebase work together to implement, say, transaction isolation levels.


I like the fact that this is a Three.js example, because it implies some concept of 3D understanding of worldspace as x-y-z (not exactly in the examples given). I have had a hard time just getting x-y plots right with GPT-4, so this would be a nice bonus in addition to the context increase.



