Again, one of the few advantages of having been round the sun a few more times than most is that this isn’t the first time this has happened.
Packages were supposed to replace programming. They got you 70% of the way there as well.
Same with 4GLs, Visual Coding, CASE tools, even Rails and the rest of the opinionated web tools.
Every generation has to learn “There is no silver bullet”, even though Fred Brooks explained why in 1986. There are essential tasks and there are accidental tasks. The tools really only help with the accidental tasks.
AI is a fabulous tool that is way more flexible than previous attempts because I can just talk to it in English and it covers every accidental issue you can imagine. But it can’t do the essential work of complexity management for the same reason it can’t prove an unproven maths problem.
As it stands we still need human brains to do those things.
This seems to apply to all areas of AI in its current form and in my experience 70% may be a bit generous.
AI is great at getting you started or setting up scaffolds that are common to all tasks of a similar kind. Essentially anything with an identifiable pattern. It’s yet another abstraction layer sitting on an abstraction layer.
I suspect this is the reason we are really only seeing AI agents being used in call centers, essentially providing stand-ins for chatbots, because chatbots are designed to automate highly repetitive, predictable tasks like changing an address or initiating a dispute. But for things like “I have a question about why I was charged $24.38 on my last statement” you will still be escalated to an agent because inquiries like that require a human to investigate and interpret an unpredictable pattern.
But creative tasks are designed to model the real world, which is inherently analog and ever changing. Closing that gap, identifying what’s missing between what you have and the real world and coming up with creative solutions, is what humans excel at.
Self-driving, writing emails, generating applications: AI gets you a decent starting point. It doesn’t solve problems fully, even with extensive training. Being able to fill that gap is true AI imo, and probably still quite a ways off.
> But for things like “I have a question about why I was charged $24.38 on my last statement” you will still be escalated to an agent because inquiries like that require a human to investigate and interpret an unpredictable pattern.
Wishful thinking? You'll just get kicked out of the chat because all the agents have been fired.
> You know what is even cheaper, more scalable, more efficient, and more user-friendly than a chatbot for those use cases?
> A run of the mill form on a web page. Oh, and it's also more reliable.
Web-accessible forms are great for asynchronous communication and queries but are not as effective in situations where the reporter doesn't have a firm grasp on the problem domain.
For example, a user may know printing does not work but may be unable to determine if the issue is caused by networking, drivers, firmware, printing hardware, etc.
A decision tree built from the combinations of even a few models of printer and their supported computers could be massive.
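To put a rough number on "massive", a toy sketch in Python (the model, OS, and symptom names below are made up purely for illustration):

    # Toy sketch of how fast a printer-support decision tree grows.
    # All of the model, OS, and symptom names here are hypothetical.
    from itertools import product

    models = ["LaserJet A", "LaserJet B", "InkJet C", "InkJet D"]
    operating_systems = ["Windows 10", "Windows 11", "macOS 14", "Ubuntu 22.04"]
    symptoms = ["no output", "garbled output", "driver error",
                "not found on network", "firmware update loop"]

    # Each (model, OS, symptom) combination is potentially its own branch,
    # before you even add follow-up questions at every step.
    branches = list(product(models, operating_systems, symptoms))
    print(len(branches))  # 4 * 4 * 5 = 80 leaf paths from just three questions

And that's before drivers, firmware versions, or cabling enter the tree.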
In such cases, hiring people might be more effective, efficient, and scalable than creating and maintaining a web form.
> but are not as effective in situations where the reporter doesn't have a firm grasp on the problem domain
Hum... Your point is that LLMs are more effective?
Because, of course people are, but that's not the point. Oh, and if you do create that decision tree, do you know how you communicate it better than with a chatbot? You do that by writing it down, as static text, with text-anchors on each step.
> Because, of course people are, but that's not the point.
Are they?
If the LLMs could talk to grandma for 40 minutes until it figures out what her problem actually is, as opposed to what she thinks it is, and then transfer her over to a person with the correct context to resolve it, I think that's probably better than most humans in a customer service role. Chatting with grandma while she's being random for an extended amount of time is not something that very many customer service people can put up with day in and day out.
The problem is that companies will use the LLMs to eliminate customer service roles rather than make them better.
Great analysis, and I agree it's Fred Brooks' point all over again.
None of these tools hurt, but you still need to comprehend the problem domain and the tools -- not least because you have to validate proposed solutions -- and AI cannot (yet) do that for you. In my experience, generating code is a relatively small part of the process.
Yeah, it's more like it can generate 70% of the code by volume, rather than get you 70% of the way to a complete solution. 12-week projects don't become 4-week projects; at best they become 9-10 week projects.
It’s an old one, but I think Joel Spolsky’s take on leaky abstractions is relevant again in this discussion, as we add another abstraction layer with LLM-assisted coding.
Agree.
So far "the progress" implied understanding (discovering) previously unknown things. AI is exactly the opposite: "I don't understand how, but it sorta works!"
Where AI really, really shines is in helping an engineer get proficient in a language they don't know well. Simon Willison says this somewhere, and in my experience it's very true.
If you can code, and you understand the problem (or are well on your way to understanding it), but you're not familiar with the exact syntax of Go or whatever, then working with AI will save you hundreds of hours.
If you can't code, or do not (yet) understand the problem, AI won't save you. It will probably hurt.
I used to agree, but as an experienced engineer asking about Rust and y-crdt, it sent me down so many wrong rabbit holes with half-valid information.
I used Claude recently to refresh my knowledge on the browser history API and it said that it gets cleared when the user navigates to a new page because the “JavaScript context has changed”
I have the experience and know how to verify this stuff, but a new engineer may not, and that would be awful.
Things like these made me cancel all my AI subscriptions and just wait for whatever comes after transformers.
But actually, whenever this happens you get a rich signal about the hidden didactic assumptions of your method of prompting it about things you yourself are unsure you can verify, and also about how you thought the tool worked. This is a good meta-skill to hone.
A visitor to physicist Niels Bohr's country cottage, noticing a horseshoe hanging on the wall, teased the eminent scientist about this ancient superstition. "Can it be true that you, of all people, believe it will bring you luck?"
"Of course not," replied Bohr, "but I understand it brings you luck whether you believe it or not."
s/luck/lower negative log likelihood. Besides, a participant can still think about and glean truths from their reflections about a conversation they had which contained only false statements.
The internal mechanisms by which a model achieves a low negative log likelihood become irrelevant as it approaches perfect simulation of the true data distribution.
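For context, the identity that claim leans on is the standard cross-entropy decomposition (stated here for completeness):

    \mathbb{E}_{x \sim p}[-\log q(x)] = H(p) + D_{\mathrm{KL}}(p \,\|\, q)

Since H(p) is fixed by the data, pushing the negative log likelihood down is exactly pushing D_KL(p||q) toward zero, i.e. pushing the model q toward the true distribution p, regardless of what machinery the model uses internally.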
I haven’t seen this demonstrated in gpt-4 or Claude sonnet when asking anything beyond the most extreme basics.
I consistently get subtly wrong answers and whenever I ask “oh okay, so it works like this” I always get “Yes! Exactly. You show a deep understanding of…” even though I was wrong based on the wrong info from the LLM.
Useless for knowledge work beyond RAG, it seems.
Search engines that I need to double check are worse than documentation. It’s why so many of us moved beyond stack overflow. Documentation has gotten so good.
That’s true. But if you are already an experienced developer who’s been around the block enough to call bullshit when you see it, these LLM thingies can be pretty useful for unfamiliar languages. But you need to be constantly vigilant, ask it the right questions (eg: “is this thing you wrote really best practice for this language? Cause it doesn’t seem that way”), and call bullshit on obvious bullshit…
…which sometimes feels like it is more work than just fucking doing it yourself. So yeah. I dunno!
> Where AI really, really shines is in helping an engineer get proficient in a language they don't know well.
I used GitHub Copilot in a project I started mostly to learn Go. It was amazing. I spent not so much time fiddling around with syntax, and much more time thinking about design.
I guess it depends on how you define "proficiency". For me, proficiency implies a fundamental understanding of something. You're not proficient in Spanish if you have to constantly make use of Google Translate.
Could code assistants be used to help actually learn a programming language?
- Absolutely.
Will the majority of people that use an LLM to write a Swift app actually do this?
- Probably not, they'll hammer the LLM until it produces code that hobbles along and call it a day.
Also, LEARNING is aided by being more active, but relying on an LLM inherently encourages you to adopt a significantly more passive behavior (reading rather than writing).
Not sure I get how that would work. It seems to me that to do my job I will have to validate the semantics of the program, and that means I will have to become familiar with the syntax of Go or whatever, at a fairly sophisticated level. If I am glossing over the syntax, I am inevitably glossing over the fine points of how the program works.
It depends on the language and the libraries you will use. Python with a well known library? Sure no problem. Almost any model will crank out fairly error free boilerplate to get you started.
Terraform? Hah. 4o and even o1 both absolutely sucked at it. You could copy & paste the documentation for a resource provider, examples and all, and it would still produce almost unusable code. Which was not at all helpful given I didn’t know the language or its design patterns and best practices at all. Sonnet 3.5 did significantly better but still required a little hand holding. And while I got my cloud architecture up and running now I question if I followed “best practices” at all. (Note: I don’t really care if I did though… I have other more important parts of my project to work on, like the actual product itself).
To me one of the big issues with these LLMs is they have zero ability to do reflection and explain their “thought process”. And even if they could, you couldn't trust what they say, because they could be spouting off whatever random training data they hoovered up, or they could be “aligned” to agree with whatever you tell them.
And that is the thing about LLMs. They are remarkably good bullshitters. They’ll say exactly what you want them to and be right just enough that they fool you into thinking they are something more than an incredibly sophisticated next-token generator.
They are both incredibly overrated and underrated at the same time. And it will take us humans a little while to fully map out what they are actually good at and what they only pretend to be good at.
Yes! Reading some basic documentation on the language or framework, then starting to build in Cursor with AI suggestions works so well. The AI suggests using functions you didn't even know about yet, then you can go read documentation on them to flesh out your knowledge. Learned basic web dev with Django and Tailwind this way and it accelerated the process greatly. Related to the article, this relies on being curious and taking the time to learn any concepts the AI is using, since you can't trust it completely. But it's a wonderfully organic way to learn by doing.
LLMs are a great help with Terraform and devops configuration; they often invent things, but at least they point at the documentation I need to look up.
Of course everything needs double-checking, but just asking the LLM "how do I do X" will usually at least output all the names of the terraform resources and most of the configuration attributes I need to look up.
They are great for any kind of work that requires "magical incantations" as I like to call them.
So very much this. As I was learning Rust, I'd ask what the equivalent was for a snippet I could create in Java. It is funny. I look at the Java code provided by prompts and go meh. The Rust code looks great. I realize this is probably due to 1) me being that junior level in Rust or 2) less legacy crap in the training model. I'm sure it is both, with more of the former as I work from working to beautiful code.
Software development is absolutely a fractal. In the 1960s we were solving complexity by using high-level languages that compile to machine code, to enable more people to write simple code. This has happened again and again and again.
But different generations face different problems, which require another level of thinking and abstraction, pushing the boundaries until we reach the next generation. None of this is solved by a single solution, but by combinations built on basic principles that never change, and those things, at least for now, only humans can do.
Interestingly, it seems like we are investing many more orders of magnitude of capital for smaller and smaller gains.
For example, the jump in productivity from adding an operating system to a computer is orders of magnitude larger than adding an LLM to a web development process despite the LLM requiring infrastructure that cost tens of billions to create.
It seems that while tools are getting more and more sophisticated, they aren’t really resulting in much greater productivity. It all still seems to result in software that solves the same problems as before. Whereas when HTML came around it opened up use cases that had never been seen before, despite being a very simple abstraction layer by today’s standards.
Perhaps the opportunities are greatest when you are abstracting the layer that the fewest understand, whereas LLMs seem to assume the opposite.
The real gains in software are still to be had by aggressively destroying incidental complexity. Most of the gunk in a web app doesn't absolutely need to exist, but we write it anyway. (Look at fasthtml for an alternate vision of building web apps.)
The issue with LLMs is they enshrine the status quo. I don't want ossified crappy software that's hard to work with. Frameworks and libraries should have to fight to justify their existence in the marketplace of ideas. Subverting this mechanism is how you ruin software construction.
You mentioned a great point: LLMs are hitting a point of diminishing marginal gains, at least I think so. Many applications are struggling to provide real benefits instead of just entertaining people.
Another funny thing is that we are using LLMs to replace creative professionals, but real creativity comes from human experience, perception and our connections, which are exactly what's missing from LLMs.
As someone who is not an artist, I want AI to do art so I can restore my antique tractor. Of course we all have different hobbies, but there are also hobbies we don't want to get into but may need to.
I think the parent comment means "art" as "having fun", like playing a guitar; it's definitely no fun to watch the robot playing it and not letting you even touch it.
AI generated art/music/etc is the answer to people having creative vision and lacking technical expertise or resources to execute it. There are lots of stories waiting to be told if only the teller had technical ability/time/equipment to tell it. AI will help those stories be told in a palatable way.
Curation of content is also a problem, but if we can come up with better solutions there, generative AI will absolutely result in more and better content for everyone while enabling a new generation of creators.
The AI will also take over your work of restoring antique tractors, much faster and cheaper. It won't be historically accurate, and it may end up with the fuel pump connected to the radio but it'll look mostly Good Enough. The price of broken tractors will temporarily surge as they need them for training data.
If it can create some decal close enough when nobody knows the original other than the fragments that remain, that helps. For common tractors we know what they looked like, but I'm interested in things where exactly one is known to exist in the world.
I see it very differently. We are just at the very dawn of how to apply LLMs to change how we work.
Writing dumb scripts that can call out to sophisticated LLMs to automate parts of processes is utterly game changing. I saved at least 200 hours of mundane work this week and it was trivial.
My favorite example of this is grep vs method references in IDEs. Method references are more exact, but grep is much simpler (to implement and to understand for the user).
I think you're also right about LLMs. I think the path forward in programming is embracing more formal tools. Incidentally, search for method references is more formal than grepping, and that's probably why people prefer it.
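A tiny sketch of that gap (hypothetical Python; the grep pattern is shown in a comment):

    # An IDE's "find references" resolves the receiver's type, so asking for
    # references to FileCache.flush returns only the call marked (A).
    # A plain `grep -rn "flush("` also hits the unrelated call (B), both
    # definitions, and comments like (C). More exact vs. simpler.

    class FileCache:
        def flush(self):
            print("write cache to disk")

    class Toilet:
        def flush(self):
            print("whoosh")

    def shutdown(cache: FileCache, toilet: Toilet) -> None:
        cache.flush()    # (A) the reference we actually wanted
        toilet.flush()   # (B) same name, unrelated type
        # remember to flush() on exit   (C) text match only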
> Software development is absolutely a fractal.
I think this analogy is more apt than you may realize. Just like a fractal, the iterated patterns get repeated on a much smaller scale. The jump to higher-level languages was probably a greater leap than all the rest of software innovation will provide. And with each iterative gain we approach some asymptote, but never get there. And this frustration of never reaching our desired outcome results in ever louder hype cycles.
Too bad most of society is accidental as well. With which I mean to say that there are a lot of nonsensical projects being done out there, that still make a living for many people. Modern AI may well change things, similar to how computers changed things previously.
I get your sentiment, I've been through a few hype cycles as well, but besides learning that history repeats itself, there is no saying how it will repeat itself.
> With which I mean to say that there is a lot of nonsensical projects being done out there, that still make a living for many people.
I don't know why this is a bad thing. I don't think projects that you believe are nonsensical shouldn't exist just because of your opinion, especially if they're helping people survive in this world. I'm sure the people working on them don't think they're nonsensical.
The arts have a place in society. Tackling real problems like hunger or health do too, arguably more so - they create the space for society to tolerate, if not enjoy art.
But the downside is we have a huge smear of jobs that either don't really matter or only matter for the smallest of moments, that exist in this middle ground. I like to think of the travel agent of yesteryear as the perfect example: someone who makes a profession of organising your leisure so you don't have to, using questionable industry deals. This individual does not have your consumer interests at heart, because being nice to you is not generally where the profit is.
The only role they actually play is rent seeking.
Efficiency threatens the rent seeking models of right now, but at the same time leads to a Cambrian explosion of new ones.
Yeah, when you take two steps back, ignore IT for a second and look at mankind as a whole, there are hundreds of millions of jobs that could be called nonsensical from certain points of view. We are not above this in any meaningful way; maybe it's just a bit more obvious to the keen eye.
Yet society and the economy keep going, and nobody apart from some academic discussions really cares. I mean, companies have every incentive to trim fat to raise income, yet they only do the bare minimum.
At this point, I don't think that (truly) AI-informed people believe that AI will replace engineers. But AI tools will likely bring a deep transformation to the workflow of engineers (in a positive and collaborative way).
It may not be tab-tab-tab all the way, but a whole lot more tabs will sneak in.
I think you have that backwards (sort of). The high tier programmers who can write things AI can't will be worth more since they'll be more productive, while the programmers below the AI skill floor will see their value drop since they've been commoditized. We already have a bimodal distribution of salaries for programmers between FAANG/not, this will just exacerbate that.
As somebody who makes extensive use of LLMs, I very much disagree. Large language models are completely incapable of replacing the kind of stuff you pay a developer $200k for. If anything, they make that $200k developer even more of a golden goose.
I suspect you're right, but I think it'll follow the COBOL engineer salary cycle, engineers that have a deeper understanding of the whole widget will be in demand when companies remember they need them.
No, I don’t believe you truly know where AI is right now. Tools like Bolt and v0 are essentially end to end development AIs that actually require very little knowledge to get value out of.
If I could sketch out the architecture I wanted as a flow chart annotated with types and structures, implementable by an AI, that would be a revolutionary leap.
I design top-down, component by component, and sometimes the parts don't fit together as I envisioned, so I have to write adapters or - worst case - return to the drawing board. If the AI could predict these mismatches, that would also be helpful.
Unfortunately, I don't think AI is great with only the accidental tasks either.
AI is really good at goldfish programming. It's incredibly smart within its myopic window, but falls apart as it is asked to look farther. The key is to ask for bite-sized things where that myopia doesn't really become a factor. Additionally, you as the user have to consider whether the model has seen similar things in the past, as it's really good at regurgitating variations but struggles with novelty.
Maybe we need better terminology. But AI right now is more like pattern-matching than anything I would label as "understanding", even when it works well.
> Even though Fred Brooks explained why in 1986. There are essential tasks and there are accidental tasks. The tools really only help with the accidental tasks.
I don't know this reference, so I have to ask: Was "accidental" supposed to be "incidental"? Because I don't see how "accidental" makes any sense.
Chapter 16 is named "No Silver Bullet—Essence and Accident in Software Engineering."
I'll type out the beginning of the abstract at the beginning of the chapter here:
"All software construction involves essential tasks, the fashioning of the complex conceptual structures that compose the abstract software entity, and accidental tasks, the representation of these abstract entities in programming languages and the mapping of these onto machine languages within space and speed constraints. Most of the big past gains in software productivity have come from removing artificial barriers that have made the accidental tasks inordinately hard, such as severe hardware constraints, awkward programming languages, lack of machine time. How much of what software engineers now do is still devoted to the accidental, as opposed to the essential? Unless it is more than 9/10 of all effort, shrinking all the accidental activities to zero time will not give an order of magnitude improvement."
From the abstract that definitely sounds like he meant "incidental": Something that's a necessary consequence of previous work and / or the necessary but simpler part of the work.
Brooks makes reference to this at some point in a later edition of the book, and about the confusion the word choice caused.
By accidental, he means "non-fundamental complexity". If you express a simple idea in a complex way, the accidental complexity of what you said will be high, because what you said was complex. But the essential complexity is low, because the idea is simple.
Anniversary edition, p182.
"... let us examine its difficulties. Following Aristotle, I divide them into essence - the difficulties inherent in the nature of the software - and accidents - those difficulties that today attends its production but that are not inherent"
I wonder why people no longer write technical books with this level of erudition and insight; all I see is "React for dummies" and "Mastering AI in Python" stuff (which are useful things, but not timeless)
I'm actually writing a book right now, Effective Visualization, and I'll explain why. It is a book focused on Matplotlib and Pandas.
I have almost a dozen viz books. Some written over 50 years ago.
While they impart knowledge, I want the knowledge but also the application. I'm going to go out and paint that bike shed. You can go read Tufte or "Show me the Numbers" but I will show you how to get the results.
Right there is your problem. Read the Mythical Man-Month and Design of Design. They are not long books and it's material that's hard to find elsewhere. Old rat tacit knowledge.
Buy and read the book. There is a reason the 25th anniversary edition has been in print for more than 30 years. It is a timeless computer book that everyone should read and keep on their bookshelf.
> Same with 4GLs, Visual Coding, CASE tools, even Rails and the rest of the opinionated web tools.
How many of those things were envisioned by futurists or great authors? This AI stuff is the stuff of dreams, and I think it’s unwise to consider it another go around the sun.
Until it’s actually AI, and not machine learning masquerading as AI because AI is the sector’s marketing pitch, I would strongly hesitate to consider it as anything other than a tool.
Yes, a powerful tool, and as powerful tools go, they can reshape how things get done, but a tool nonetheless, and therefore we must consider what its limits are, which is all OP is getting at; the current and known near-future state suggests we aren’t evolving past the tool stage.
This AI stuff? No, not really. The stuff of dreams is an AI that you can talk to and interact infinitely and trust that it doesn’t make mistakes. LLMs ain’t it.
Better tech often lowers the barrier for people to do things but raises the bar of users' (and, for contract projects, stakeholders') expectations. It is plainly visible in web development, where the amount of tooling has grown dramatically (both frontend and backend) to do things.
Like, for example, all the big-data stuff we do today was unthinkable 10 years ago, today every mid-sized company has a data team. 15 years ago all data in a single monolithic relational database was the norm, all you needed to know was SQL and some Java/C#/PHP and some HTML to get some data wired up into queries.
The most valuable thing I want AI to do with regards to coding is to have it write all the unit tests and get me to 100% code coverage. The data variance and combinatorics needed to construct all the meaningful tests are sometimes laborious, which means it doesn't get done (us coders are lazy...). That is what I want AI to do: all the mind-numbing, draining work so I can focus more on the system.
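That combinatorial grunt work is exactly what I'd want to hand off. A minimal sketch of its shape, assuming pytest and a made-up parse_price function standing in for real code:

    # Sketch: enumerating input combinations for a hypothetical parse_price(),
    # the tedious part an LLM could plausibly draft and a human then prunes.
    from itertools import product
    import pytest

    def parse_price(raw: str) -> float:
        """Made-up function under test: parses '<CURRENCY> <amount>'."""
        currency, _, amount = raw.strip().partition(" ")
        if currency not in ("USD", "EUR"):
            raise ValueError(f"unknown currency: {currency!r}")
        return float(amount)  # float() raises ValueError on junk like 'abc'

    currencies = ["USD", "EUR", ""]               # include an invalid currency
    amounts = ["0", "19.99", "-5", "1e9", "abc"]  # boundaries and junk
    padding = ["", " ", "\t"]                     # leading-whitespace variants

    CASES = [f"{p}{c} {a}" for c, a, p in product(currencies, amounts, padding)]

    @pytest.mark.parametrize("raw", CASES)
    def test_parse_price_returns_float_or_raises_valueerror(raw):
        # Every combination must either parse cleanly or fail with ValueError,
        # never crash with anything else.
        try:
            assert isinstance(parse_price(raw), float)
        except ValueError:
            pass

45 generated cases from three small lists; the human work is deciding which of them actually mean something.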
Not necessarily. I have used LLMs to write unit tests based on the intent of the code and have it catch bugs. This is for relatively simple cases of course, but there's no reason why this can't scale up in the future.
LLMs absolutely can "detect intent" and correct buggy code. e.g., "this code appears to be trying to foo a bar, but it has a bug..."
How do you expect AI to write unit tests if it doesn't know the precise desired semantics (specification)?
What I personally would like AI to do would be to refactor the program so it would be shorter/clearer, without changing its semantics. Then, I (human) could easily review what it does, whether it conforms to the specification. (For example, rewrite the C program to give exactly the same output, but as a Python code.)
In cases where there is a peculiar difference between the desired semantics and real semantics, this would become apparent as additional complexity in the refactored program. For example, there might be a subtle semantic differences between C and Python library functions. If the refactored program would use a custom reimplementation of C function instead of the Python function, it would indicate that the difference matters for the program semantics, and needs to be somehow further specified, or it can be a bug in one of the implementations.
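A concrete instance of the kind of subtle difference I mean, using integer division (the C-vs-Python behaviour itself is well known; the helper below is just my sketch of how it would surface in such a refactor):

    # If the refactored Python has to drag along a helper like this instead of
    # just using //, that's a visible signal the original program depends on
    # C's truncating division (C99) for negative operands.

    def c_div(a: int, b: int) -> int:
        """Integer division with C99 semantics: truncate toward zero."""
        q = abs(a) // abs(b)
        return q if (a >= 0) == (b >= 0) else -q

    assert -7 // 2 == -4        # Python floors
    assert c_div(-7, 2) == -3   # C truncates toward zero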
I've been having good results having AI "color in" the areas that I might otherwise skimp on like that, at least in a first pass at a project: really robust fixtures and mocks in tests (that I'm no longer concerned will be dead weight as the system changes because they can pretty effectively be automatically updated), graceful error handling and messaging for edgier edge cases, admin views for things that might have only had a cli, etc.
Are we not already more or less there? It is not perfect, to be sure, but LLMs will get you pretty close if you have the documentation to validate what it produces. However, I'm not sure that removes the tedium the parent speaks of when writing tests. Testing is not widely done because it is not particularly fun having to think through the solution up front. As the parent alludes to, many developers want to noodle around with their ideas in the implementation, having no particular focus on what they want to accomplish until they are already in the thick of it.
Mind you, when you treat the implementation as the documentation, it questions what you need testing for?
>AI is a fabulous tool that is way more flexible than previous attempts because I can just talk to it in English
In an era when UIs become ever more Hieroglyphic(tm), Aesthetical(tm), and Nouveau(tm), "AI" revolutionizing and redefining the whole concept of interacting with computers as "Just speak Human." is a wild breath of fresh air.
Programming and interacting with computers in general is just translation to a more restricted and precise language. And that's what makes them more efficient. Speaking human is just going the other way and losing productivity.
It's akin to how everyone can build a shelter, but building a house requires more specialized knowledge. The cost of the latter is training time to understand stuff. The cost of programming is also training time to understand how stuff works and how to manipulate it.
An inefficient computer you can use is more productive than an efficient computer you can't use.
Most people can't use mice or keyboards with speed, touchscreens are marginally better except all the "gestures" are unnatural as hell, and programming is pig latin.
Mice and keyboards and programming languages and all the esoteric ways of communicating with computers came about simply because we couldn't just talk Human to them. Democratizing access to computers is a very good and very productive thing.
That's the thing. You don't communicate with computers. You use them. You have a task to do that the computer has been programmed for, and what you want is to get the parameters of that task to the computer. And you learn how to use the computer because the task is worth it, just like you learn how to play a game because you enjoy the time doing it. The task supersedes the tool.
Generative AI can be thought of as an interface to the tool, but it's been proven that they are unreliable. And as the article outlines, if it can get you to 70% of the task but you don't have the knowledge required to complete it, that's pretty much the same as 0%. And if you have the knowledge, more often than not you realize that it just goes faster on a zigzag instead of the straight route you would have taken with more conventional tools.
The first lead I worked with inoculated me to this. He taught me about hype trains long before the idea was formalized. He’d been around for the previous AI hype cycle and told me to expect this one to go the same. Which it did, and rather spectacularly. That was three cycles ago now and while I have promised myself I will check out the next cycle, because I actually do feel like maybe next time they’ll build systems that can answer why not just how, this one is a snooze fest I don’t need to get myself involved in.
Just be careful you don't let your pendulum swing too much in the other direction, where you turn into an old curmudgeon that doesn't get excited by anything and thinks nothing is novel or groundbreaking anymore.
AI is a potential silver bullet, since it can address the "essential complexity" that Fred Brooks said regular programming improvements couldn't address. It may not yet have caused an "order of magnitude" improvement in overall software development, but it has caused that improvement in certain areas, and that will spread over time.
> The tools really only help with the accidental tasks
I don't think that's really the problem with using LLMs for coding, although it depends on how you define "accidental". I suppose if we take the opposite of "essential" (the core architecture, planned to solve the problem) to be boilerplate (stuff that needs to be done as part of a solution, but doesn't itself really define the solution), then it does apply.
It's interesting/amusing that on the surface a coding assistant is one of the things that LLMs appear better suited for, and they are suited for it, as far as boilerplate generation goes (essentially automated Stack Overflow and similar-project cut-and-pasting)... But in reality it is one of the things LLMs are LEAST suited for, given that once you move beyond boilerplate/accidental code, the key skills needed for software design/development are reasoning/planning, as well as experience-based ("inference time") learning to progress at the craft, which are two of the most fundamental shortcomings of LLMs that no amount of scale can fix.
So, yeah, maybe they can sometimes generate 70% of the code, but it's the easy/boilerplate 70% of the code, not the 30% that defines the architecture of the solution.
Of course it's trendy to call LLMs "AI" at the moment, just as previous GOFAI attempts at AI (e.g. symbolic problem solvers like SOAR, expert systems like CYC) were called "AI" until their limitations became more apparent. You'll know we're one step closer to AI/AGI when LLMs are in the rear view mirror and back to just being called LLMs again!
Other options are available, for instance ploughing into a village because your second stage didn't light, or, well, this: https://youtu.be/mTmb3Cqb2qw?t=16
Most of the "you'll never need programmers again!" things have ended up more "cars-showered-with-chunks-of-flaming-HTPB" than "accidentally-land-on-moon", tbh. 4GLs had an anomaly, and now we don't talk about them anymore.
(It's a terrible adage, really. "Oops, the obviously impossible thing didn't happen, but an unrelated good thing did" just doesn't happen that often, and when it does there's rarely a causal relation between A and B.)
AI totally is a silver bullet. If you don't think so, you're just using it wrong and it's your fault. If you think that it takes you just as long or longer to constantly double-check everything it does, then you don't understand the P vs NP problem. </sarcasm>
Hardly, if you worked with the web in the mid 90’s, modern tooling is a much larger improvement than what LLMs bring to the table on their own. Of course they aren’t on their own, people are leveraging generations of improvements and then stacking yet another boost on top of them.
Programming today is literally hundreds of times more productive than in 1950. It doesn’t feel that way because of scope creep, but imagine someone trying to create a modern AAA game using only assembly and nothing else. C didn’t show up until the 70’s, and even Fortran was a late 50’s invention. Go far enough back and people would set toggle switches and insert commands that way no keyboards whatsoever.
Move forward to the 1960s and people coded on stacks of punch cards and would need to wait overnight for access to a compiler. So just imagine the productivity boost of a text editor and a compiler. I'm not talking about an IDE with syntax checks etc., just a simple text editor was a huge step up.
Well, even with more primitive tools people would create an abstraction of their own for the game; even in very old games you will find some rudimentary scripting languages and abstractions.
Yes, that's the point. You needed to do this (accidental) work in order to do what you actually wanted to achieve. Hence there was less time spent on the actual (~business) problem, and hence the whole thing was less productive.
Oh I disagree. Like the GP, I’ve been round the block too. And there’s entire areas of computing that we take for granted as being code free now but that used to require technical expertise.
Django/Rails-like platforms revolutionised programming for the web, people take web frameworks for granted now but it wasn't always like that.
And PHP (the programming language) just before that, that was a huge change in "democratising" programming and making it easier, we wouldn't have had the web of the last 20-25 years without PHP.
From what I have seen, LLMs are the worst (by far) in terms of gained productivity. I'd rate simple but type-correct autocomplete higher than what I get from the "AI" (code that makes little sense and/or doesn't compile).
Supermaven recently suggested that I comment a new file with “This file belongs to {competitor’s URL}.” So, it’s definitely not at the point you can just blindly follow it.
That said, it’s a really nice tool. AI will probably be part of most developers’ toolkits moving forward, the way LSP and basic IDE features are.
I wish my IDE would type-correct the LLM. When the function doesn't exist, look for one with a similar name (often the case is different or some other small thing); also show me the parameter options, because the LLM never gets the order right and often skips one.
Going from punched cards to interactive terminals surely must have been a big productivity boost. And going from text based CAD to what is possible on modern workstations has probably also helped a bit in that field.
In that view I'd say the productivity boost by LLMs is somewhat disappointing, especially with respect to how amazing they are.
I think the field is too new and the successful stories too private atm. However I think the best apples to apples example in this context is Amz's codebase update project that they've blogged about.
From memory, they took some old java projects, and had some LLM driven "agents" update the codebase to recent java. I don't know java enough to know how "hard" this task is, but asking around I've heard that "analog" tools for this exist, but aren't that good, bork often, are hardcoded and so on.
Amz reported that ~70% of the code that came out passed code review; presumably the rest had to be tweaked by humans. I don't know if there are any "classical" tools that can do that ootb. So yeah, that's already impressive and "available today", so to speak.
Java is intent as code. It’s so verbose that you have to use an IDE to not go crazy with all the typings. And when using an IDE, you autocomplete more than you type because of all the information that exists in the code
Quantifying programmer productivity has been a problem since the field's inception. Lines of code is a terrible metric; so is Jira ticket points. I can tell you that using an LLM, I can make a Chrome extension that puts a div saying "hello world" at the top of every webpage far quicker than if I had to read the specifications of extension manifests and do it manually. But how do you quantify that generically? How do you quantify it against the time wasted when it doesn't understand some nuance of what I'm asking it to do, or when it gets confused about something and goes in circles?
The problem is not what AI can do; rather, most people in the workforce don't know how to use the current generation of AI. Only when the children who grew up using ChatGPT etc. get into the workforce will we see the real benefits of AI.
Oh yeah, the "digital native" myth. I'm not convinced children using ChatGPT to do their homework will actually make them more productive workers. More likely it's going to have the opposite effect, as they're going to lack deeper understanding that you can build only through doing the homework yourself.
Really it's not about just using technology, but how you use it. Lots of adults expected kids with smartphones to be generally good with technology, but that's not what we're witnessing now. It turns out browsing TikTok and Snapchat doesn't teach you much about things like file system, text editing, spreadsheets, skills that you actually need as a typical office worker.
That's different from what I am talking about; it's the problem of inertia: people already in jobs are used to doing them in a particular way. New, curiosity-driven people entering the workforce would optimize a lot of office work. A 10-12 year old who has learned how to use AI from the very start will be using an AI that has had 12-15 years of incremental improvements by the time he or she gets into the workforce.
A lot of people here on Hacker News disparage newer generations. But how many of you can run a tube-based or punch-card-based computer? So if you don't know, are you an idiot?
All the pieces are there, we just need to decide to do it. Today's AI is able to produce an increasingly tangled mess of code. But it's also able to reorganize the code. It's also capable of writing test code and assessing the quality of the code. It's also capable of making architectural decisions.
Today's AI code is more like a Frankenstein's composition. But with the right prompt OODA loop and quality-assessment rigor, it boils down to just having to sort and clean the junk pile faster than you produce it.
Once you have a coherent unified codebase, things get fast quickly; capabilities grow exponentially with the number of lines of code. Think of things like the Julia language or the Wolfram Language.
Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.
There is a huge gap in performance and reliability in control systems between open-loop and closed-loop.
You've got to bite the bullet at one point and make the transition from open-loop to closed-loop. There is a compute cost associated to it, and there is also a tuning cost, so it's not all silver lining.
>Once you have a coherent unified codebase, things get fast quickly; capabilities grow exponentially with the number of lines of code. Think of things like the Julia language or the Wolfram Language.
>Once you have a well written library or package, you are more than 95% there and you almost don't need AI to do the things you want to do.
That's an idealistic view. Packages are leaky abstractions that make assumptions for you. Even stuff like base language libraries - there are plenty of scenarios where people avoid them - they work for 9x% of cases but there are cases where they don't - and this is the most fundamental primitive in a language. Even languages are leaky abstractions with their own assumptions and implications.
And these are the abstractions we had decades of experience writing, across the entire industry, and for fairly fundamental stuff. Expecting that level of quality in higher level layers is just not realistic.
I mean just go look at ERP software (vomit warning) - and that industry is worth billions.
"AI is like having a very eager junior developer on your team"
That's a perfect summary, in my opinion. Both junior devs and AI tools tend to write buggy and overly verbose code. In both cases, you have to carefully review their code before merging, which takes time away from all the senior members of the team. But for a dedicated and loyal coworker, I'm willing to sacrifice some of my productivity to help them grow, because I know they'll help me back in the future. But current AI tools cannot learn from feedback. That means with AI, I'll be reviewing the exact same beginner's mistakes every time.
And that means time spent on proofreading AI output is mostly wasted.
A very eager junior developer who is supremely confident, always says yes, does trivial work in seconds but makes very critical mistakes in the difficult stuff and when you thought he was learning and improving, he forgets everything and starts from square zero again.
In my experience, if I'm looking how to do something pretty standard with an API I'm unfamiliar with, it's usually correct and faster than trying to trawl through bad, build-generated documentation that would rather explain every possible argument than show a basic example.
And in the case it's wrong, I will know pretty quickly and can fall back to the old methods.
> Like economists, who have predicted 7 of the last 3 recessions, AI knows 17 out of 11 API calls!
It's definitely been said before, but the fact that they're calling out these non-existent functions in the first place can tell library devs a lot about what new features could be there to take up the namespace.
I love this idea. In the past I stored our coding style guidelines and philosophy in our wiki. Putting it into git brings it closer to where it is used. Also, it makes it more easily accessible to AI tools, which is an added bonus.
Interesting idea. I have been using a SPECIFICATION.md and TODO.md to keep my models on track. What kind of stuff do you put in LESSONS.md that can't just live in the system prompting?
Nothing, that's roughly the same idea I think. it's just when I'm using Aider I don't really have a good way to feed a system prompt in, so I just put REPOPROMPT.md in the root folder.
TODO.md and FEATURE_TODO.md are also very valuable for keeping on track.
I don't think people (in this context) are suggesting replacing junior developers with AI, but rather treating the AI like a junior: being clear with what you need, and being defensive with what you accept back from them; trying to be conscious of their limitations when asking them to do something, and phrasing your questions in a way that will get you the best results back.
They might not be, but using language which equates these generative LLMs with junior developers does allow a shift of meaning to actually equate juniors with LLMs, meaning they are interchangeable, and therefore that generative LLMs can replace juniors.
LLMs are advancing as well, just not from your/my direct input. Or from our direct input ( considering they learn from our own questions ) and from 100k others that are using them for their work.
Juniors today can learn exponentially faster with LLMs and don't need seniors as much.
Take me for example: I've been programming for 20 years, been through C, C++, C#, Python, JS, PHP, but recently had to learn Angular 18 and FastAPI. Even though I knew JS and Python beforehand, these frameworks have ways of doing things I'm not used to, so I'd been fumbling with them for the first 100 hours. However, when I finally installed Copilot and had a little faith in it, I boosted my productivity 3-4x. Of course it didn't write everything correctly, of course it used outdated Angular instead of the latest (which is why I was so reluctant to ask it stuff at the start), but it still helped me a lot, because it is much easier (for me) to modify some bad/outdated code and get it to where I want it than to write it from scratch without the muscle memory of the new framework.
So for me it's been a godsend. I expect that for stuff that's not as cutting edge as framework oddities that appeared in the last 12 months it is even more helpful, and the percentage of correct output would be way higher. So for juniors doing, say, Python coding on frameworks that are at least 3-4 years old and stable enough, the seniors would need to intervene much, much less in correcting the junior.
> Juniors today can learn exponentially faster with LLMs and don't need seniors as much. [...] Take me for example, I've been programming for 20 years
You are not a junior, you already rely on 20 years of experience.
The last time I did any sort of web development was 20 years ago, but I thought I'd try some C# (last touched ~10 years ago) + Blazor for an idea I had, and it took me a couple of days to feel comfortable and start making stuff. While I haven't written for the web in a very, very long time, my experience with other tech helped a lot.
His experience is the same as mine: the juniors on our team are super productive in a way that realistically would not have been possible for them before these tools. They just don't get stuck that much anymore, so they don't need the seniors as much. I do think the field will be somewhat commoditized in the coming decade.
The web, especially frontend feels far more foreign than any backend or "traditional" programming. The errors suck, sometimes you get no error and have no idea why it isn't working etc. So in a sense I feel like a junior
It's interesting because it actually endangers the junior dev job market in the present.
And in the near future the mid/senior level will have no replacements as we've under-hired juniors and therefore don't have a pipeline of 5YOE/10YOE/etc devs who have learned to stop being juniors.
I see it the other way: assuming these tools keep on improving, you will only need junior developers, as there's no point in knowing more than the basics about programming to get a job done.
You say this like it is incremental improvement needed, or that we can see signs of a major shift in capabilities coming. Yes, people are predicting this. People were predicting personal travel by jet pack at one point as well.
My favorite is when I gave ChatGPT a serializer function that calls a bunch of "is enabled" functions and asked it to implement those according to the spec, then hit enter before adding the actual spec to the prompt.
And it happily wrote something. When I proceeded to add the actual spec, it happily wrote something reasonable which couldn't work, because it assumed all 'is_something' functions can be used as guard statements. Uh oh.
Funny, I think it's a perfect summary, but in a positive sense. Some of the tools you can modify the prompt, or include a .md file in context to help direct it. But even without that, I don't find it a waste of time because I have lower expectations. "This just saved me 15 minutes of typing out html+css for this form, so I don't mind taking 2 minutes to review and tweak a few things."
My experience is that junior devs write most of the code at a company and senior devs spend most of their time making sure the junior devs don't break anything.
Which seems to work pretty well, in my experience.
Which is something that in principle AI tools could do; learning from feedback is how they got created in the first place.
However the current generation of models needs a specific form of training set that is quite different from what a human would produce through direct interaction with the model.
For one it needs many more examples than a human would need. But also the form of the example is different: it must be an example of an acceptable answer. This way a model can compute how far it is from the desired outcome.
Further research in how to efficiently fine tune models will make this gap narrower and perhaps senior devs will be able to efficiently give learnable feedback through their normal course of interaction
Well the time isn't wasted - you get code! In my experience even with the added work of checking the AI's output, overall it is faster than without coding assistants.
I think one of OPs points is that it is more of a boost for juniors and a "tax" for seniors. The senior engineer wouldn't open a PR without cleaning up the code; the junior can't tell the difference.
> But for a dedicated and loyal coworker, I'm willing to sacrifice some of my productivity
Probably the more we sacrifice of our own productivity, the quicker they gain experience (and seniority), right? The only thing that confused me personally in your statement was that they would have to be loyal. Isn't that something that one can only hope for but must be proven over time? Meaning that at the time you trust that they'll turn out well, you have no way of proving that they are "loyal" yet. Loyalty is nigh impossible to request upfront; I mean, you have to deserve it. And a lot can also go wrong along the way.
I call AIs "aggressive interns". They're fantastic, very fast, and eager... but often go off the rails or get confused. And as you noted, never learn.
Just the dialog with an AI I find instructive. Sometimes it suggests things I don't know. Often after 1-2-3 mediocre AI solutions I'll code up something that re-uses some AI code but has much better code that I write.
"AI is like having a very eager junior developer on your team"
I think this also applies to AI being like an early or intermediate senior engineer on your team.
So in effect it would mean having fewer engineers, probably 1 or 2 senior engineers at best, with the rest guiding the AI senior engineer in the codebase.
I didn't need to hire any senior engineers for a while for my SaaS, and only needed good juniors for 3 months.
Everyone in the future is going to have access to senior engineers building projects.
Not only that, but one who is infected with terminal Dunning-Kruger syndrome. Of all the things that LLMs are great at, demonstrating a hopeless case of Dunning-Kruger has to be at the very top.
For the most part, engineering interview processes haven't adapted to this yet. I think a lot of engineering orgs are kind of head in the sand about this shift.
There is a surprising lack of focus on code reviews as part of that process.
A few months back, I ran into one company (a YC company) that used code reviews as their first technical interview. Review some API code (it was missing validation, error handling, etc.), review some database code (missing indices, bad choices for ID columns, etc.), and more.
I think more companies need to rethink their interview process and focus on code reviews as AI adoption increases.
I worry about 2 main pitfalls for junior devs, one more tractable than the other.
Firstly there is the double edged sword of AI when learning. The easy path is to use it as a way to shortcut learning, to get the juice without the pressing, skipping the discomfort of not knowing how to do something. But that's obviously skipping the learning too. The discomfort is necessary. On the flip side, if one uses an llm as a mentor who has all the time in the world for you, you can converse with it to get a deeper understanding, to get feedback, to unearth unknown unknowns etc. So there is an opportunity for the wise and motivated to get accelerated learning if they can avoid the temptation of a crutch.
The less tractable problem is hiring. Why does a company hire junior devs? Because there is a certain proportion of work which doesn't take as much experience and would waste more senior developers time. If AI takes away the lower skill tasks previously assigned to juniors, companies will be less inclined to pay for them.
Of course if nobody invests in juniors, where will the mid and senior developers of tomorrow come from? But that's a tragedy of the commons situation, few companies will wish to invest in developers who are likely to move on before they reap the rewards.
I think the tragedy of the commons problem for juniors has already existed for some time. Previously, companies were reluctant to hire juniors because they had a tendency to leave after a year or two, once you finished training them up. AI will just make the situation a lot worse.
Another reason companies hire juniors is because they cannot find/afford seniors. The demand that stems from this reason will increase over time when companies are not hiring "enough" juniors (because if we aren't hiring juniors we aren't making more seniors, so they become increasingly scarce and expensive).
Yes but then as all else this can easily be cyclic. Too few seniors to hire and they ask for ridiculous packages? Well lets train some of them in house, its not like the situation will explode overnight.
Weird times ahead, probably, but we will be fine, mostly.
which is a python binding of rust's csscolorparser created by Claude without me touching editor or terminal. I haven't reviewed the code yet, I just ensured that test cases really passed (on github actions), installed the package and started using it directly.
The readme even confuses itself, as the example shows rgba_255 returning a list and not a tuple. Oh well, I guess Claude was confused by the conventions between Rust and Python.
Also, all the checks of "if u8 < 255" will make me not want to use this library with a 10-foot pole. It screams "ai" or "I don't know what I'm doing" so much.
The first one is due to me asking it to return a 4-tuple instead of a list for rgba_255 specifically; I guess it didn't update the README or the other return values.
The second is an artefact of a test case failing, which it tried to fix using this check. Thankfully not a correctness failure, only an optimisation issue.
You're right though it's not worth publishing for general public.
Well, this is a good experiment. I don't find your idea bad at all: use AI to autogenerate bindings to expose a library in another language. This would be a good use case for AI, as it's not complex (well, most of the time) and involves a lot of boilerplate.
Publishing the repo is worth it, because it showcases what the AI can (and cannot) do, and it is not there yet. But as a real package on PyPI, indeed less so.
What gets me is that tools like SWIG exist, in that case a tool which started in the 1990s to read a C header file and autogenerate bindings for Python and other languages.
Or, JPype uses reflection to generate run-time bindings from Python to Java.
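For concreteness, a minimal JPype-style sketch of what those run-time bindings look like in use (JPype 1.x conventions; the JVM discovery details are assumed):

import jpype
import jpype.imports

jpype.startJVM()                    # start an embedded JVM (default discovery)
from java.util import ArrayList     # binding generated at run time via reflection

lst = ArrayList()
lst.add("hello")
print(lst.size())                   # prints 1

jpype.shutdownJVM()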
Why does it require AI and all of this infrastructure?
For the same reason people use AI for some coding tasks: it generates boilerplate without needing to be programmed by a human. SWIG needs to be adapted to every target language, and it only reads C and C++ headers. What about Rust, like in this case?
This is a case where LLMs can be good, because they can infer quite a bit about the needed scaffolding, and can generate bindings between languages where no such tool exist. We don't need to write a tool for every combination of languages (or adapt one like SWIG to add a language).
But in the long run, depending on how many bindings we need, the cost of each binding may end up outweighing the upfront cost of a human writing the tool.
I didn't explain myself well enough then. In the specific case of Rust, why doesn't Rust already have a way to tell tools what the API is, and why aren't there already tools which convert that API to bindings for other languages?
We already saw from pcwelder's other responses that this AI generation required human involvement ("me asking it to return a 4 tuple instead of a list").
We can also see style questions, like how is_valid is a static method on the ColorParser class, rather than a module-level function as I would have done, so the API intrinsically cannot be automated.
Which I knew from my own experience in developing Python bindings to C libraries. If you want it to feel like Python, rather than C, you will have to put in the work. (In my case, to add class structure, to hook into Python's garbage collection, to match the iterator protocol, and so on.)
I've been saying for a while that formal verification seems like the obvious way forward here. Test cases are good but only cover a limited number of cases, whereas formal verification can cover all cases.
The biggest reason that we don't see formal verification being used more is that it takes some time to formally define the output of an algorithm and significantly more time to convince a theorem prover that your code is correct according to the pre- and post-conditions. If we can get LLMs to work well alongside formal verification tools then I think that would greatly improve their usefulness.
Here's a made up and trivial example of how I imagine this might look:
User: Write me a binary search over "type A_Type is array (Integer range <>) of Integer;"
LLM: Implementing the following:
function Search (A : A_Type; Target : Integer) return Integer
  with Pre  => Sorted (A) and Target in A,
       Post => A (Search'Result) = Target;
First implementation:
function Search (A : A_Type; Target : Integer) return Integer is
   Left  : Integer := A'First;
   Right : Integer := A'Last;
begin
   while Left <= Right loop
      declare
         Mid : Integer := (Left + Right) / 2;
      begin
         if A (Mid) = Target then
            return Mid;
         elsif A (Mid) < Target then
            Left := Mid + 1;
         elsif A (Mid) > Target then
            Right := Mid - 1;
         end if;
      end;
   end loop;
end Search;
Compiler:
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
wrapper.adb:12:36: medium: overflow check might fail, cannot prove lower bound for Left + Right
12 | Mid : Integer := (Left + Right) / 2;
| ~~~~~~^~~~~~~~
reason for check: result of addition must fit in a 32-bits machine integer
wrapper.adb:12:45: info: division check proved
wrapper.adb:14:19: medium: array index check might fail
14 | if A (Mid) = Target then
| ^~~
reason for check: value must be a valid index into the array
LLM: Looks like there's a possible overflow for very large arrays, attempting new solution...
Etc., until a valid solution is found (e.g. computing Mid as Left + (Right - Left) / 2 to sidestep the overflow) and the result is returned to the user.
The use of formal methods has historically been limited to very limited niche areas. The volume of code that is covered by formal verification is some ridiculously small percentage of the overall amount of code out there.
The reason is probably that it is too tedious/difficult and you need some rare skills to do it. And mostly the categories of bugs it eliminates are not problematic enough. Either way, the number of people capable of writing code vastly outnumbers the people capable of formally verifying that code. I know a lot of programmers without computer science backgrounds that definitely have never been exposed to any of this. I have been exposed to some of this. But that's 25 years ago. And the person teaching me lived out his career in academia without ever working on real code that mattered. A lot of this stuff is rather academic and esoteric.
Of course, LLMs could change this quite a bit. A lot of programming languages are optimized for humans. Lots of programmers prefer languages that sacrifice correctness for flexibility. E.g. static typing is the simplest form of adding some formal verification to a language, and a lot of scripting languages get rid of that because the verification step (i.e. compilation) is somewhat tedious, and so is having to spell out your intentions. Python is a good example of a language that appeals to people without a lot of formal training in programming. And some languages go the other way and are harder to use and learn because they are more strict. Rust is a good example of that. Great language. But not necessarily easy to learn.
With LLMs, I don't actually need to learn a lot of Rust in order to produce working Rust programs. I just need to be able to understand it at a high level. And I can use the LLM to explain things to me when I don't. Likewise, I imagine I could get an LLM to write detailed specifications for whatever verifiers there are and even make helpful suggestions about which ones to pick. It's not that different from documenting code or writing tests for code. Which are two things I definitely use LLMs for these days.
The point here is that LLMs could compensate for a lack of trained people that can produce formal specifications and produce larger volumes of such specifications. There's probably a lot of value in giving some existing code that treatment. The flip side here is that it's still work and it's competing with other things that people could spend time on.
That Java issue you mentioned is an example of something that wasn't noticed for 9 years; probably because it wasn't that big of a problem. The value of the fix was lowish and so is the value of preventing the problem. A lot of bugs are like that.
Formalism starts with intent and then removing ambiguity from that intent. Having intent is easy; removing the ambiguity is not. Especially when you do not know the limitations of what you're using to materialize that intent.
Python is easy because it lets you get somewhere: the inputs will roughly be the set of acceptable inputs, so the output will be as expected, and you can tweak as things go (much faster for scripting tasks). But when you need a correct program that has to satisfy some guarantees, this strategy no longer cuts it, and suddenly you need a lot more knowledge.
I don't think LLMs would cut it, because they don't understand ambiguity and how to chisel it away so that only the most essential understanding remains.
I already wrote this on another thread but will say it again: Copilot failed me for any serious task. Be it refactoring a somewhat more complex Java method or IaC code, every time there are hidden quirks and failures that make it easier to just do it myself instead of searching for the needle for minutes. This, combined with the fact that AI is already hitting a wall in terms of scaling, gives a good outlook on what its future seems to be: successful in the far future, when we have quantum computing or the like…
Personally I've been mostly avoiding using AI tools, but I have friends and colleagues who do use or have used LLMs, at least they've tried to.
Those who seem to get the best results ask for a prototype or framework for how to do something. They don't expect to use the AI-generated code; it's purely there as inspiration and something they can poke at to learn about a problem.
Most seem to have a bad experience. The LLMs don't actually know much, if anything, about the subject, and make up weird stuff. A few colleagues have attempted to use LLMs for generating Terraform or CloudFormation code, but have given up on making it work. The LLMs they've tried apparently cannot stop making up non-existent resources. SRE-related code/problems anecdotally seem to do worse than actual development work, but it feels like you still need to be a fairly good developer to have much benefit from an LLM.
The wall we're hitting may be the LLMs not actually having sufficient data for a large set of problems.
I've found Copilot quite good at writing tests. Describe the test, let Copilot generate, review + fix up takes me around 5 minutes, versus 20 or so to write them myself most of the time. It's also very good at scaffolding out new features; basically an extremely quick junior, as the article said.
I see the same things as Addy, though I'm not 100% sure it's something new happening because of AI assistants. I started learning programming in the late nineties as a 9-year-old sitting at a library paying 10 NOK for an hour of internet access (the librarians were sweet and "forgot" how long I was sitting at the computer because they saw how much I was enjoying it). And I did the exact same thing described in this article: I grabbed whatever code I could that did something, didn't know how to debug it, and at best I could slightly tweak it to do something slightly different. After a few years I got better at it anyway. I started pattern matching, and through curiosity I found out what more senior developers were doing.
Maybe the fact that I was just a kid made this different, but I guess my point is that just because AI can now write you a code file in 10 seconds, doesn't mean your learning process also got faster. It may still take years to become the developer that writes well-structured code and thinks of edge cases and understands everything that is going on.
When I imagine the young people that will sit down to build their own first thing with the help of AI, I'm really excited knowing that they might actually get a lot further a lot faster than I ever could.
I started learning how to program around 2010. I was learning Blender and wanted to use its game engine. It was Python-based, so after running a few examples found on some forum, I downloaded the Python interpreter and some tutorials/a book. Maybe my learning process is different, but I don't enjoy spending a day tweaking things in the hope I will get something. I much prefer getting a book on the subject and starting to learn how to do it instead.
Yeah that's fair, I think everyone has their own learning style. I mostly felt a need to respond with a slightly more optimistic view on what this new technology means for juniors, in particular responding to the part that "[AI coding tools] can actually impede learning." Though to be fair to Addy, I like his advice on how to approach this, those are good tips.
This mirrors my own experiences with Claude with one caveat.
GenAI can get deeper into a solution that consists of well known requirements. Like basic web application construction, api development, data storage, and oauth integration. GenAI can get close to 100%.
If you’re trying to build something that’s never been done before or is very complex, GenAI will only get to 50% and any attempt to continue will put you in a frustrating cycle of failure.
I’m having some further success by asking Claude to build a detailed Linear task list and tackling each task separately. To get this to work, I’ve built a file combining script and attaching these files to a Claude project. So one file might be project-client-src-components.txt and it contains all the files in my react nextjs app under that folder in a single file with full file path headers for each file.
We’ll see how deep I get before it can’t handle the codebase.
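For concreteness, a combining script of the kind described might look roughly like this (the folder layout and output naming are my assumptions, not the author's actual script):

import os

def combine(src_root: str, out_path: str) -> None:
    """Concatenate every file under src_root into one text file,
    with a full-path header before each file's contents."""
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(src_root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                out.write(f"\n===== {path} =====\n")
                with open(path, "r", encoding="utf-8", errors="replace") as f:
                    out.write(f.read())

# e.g. produce project-client-src-components.txt from client/src/components
combine("client/src/components", "project-client-src-components.txt")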
In general it can get further the better the design/interfaces are. I find that if you can define your problem with a really clean set of interfaces, it can generally implement them perfectly. Most of the real thinking work is at the interfaces anyway, so this makes sense.
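To make the interface point concrete, here is a minimal sketch of what "a really clean set of interfaces" can mean in Python (the names are invented for illustration):

from typing import Protocol

class RateLimiter(Protocol):
    """The interface is the thinking work; the implementation is what gets delegated."""

    def allow(self, key: str) -> bool:
        """Return True if the caller identified by key may proceed."""
        ...

    def reset(self, key: str) -> None:
        """Clear any stored state for key."""
        ...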
But for a really tricky logic problem, accurately explaining it in English to an LLM might be less natural than just writing the code.
> While engineers report being dramatically more productive with AI
Where are these people in real life? A few influencers or wannabes say that on Twitter or LinkedIn, but do you know actual people in real life who say they’re "dramatically more productive with AI"?
Everyone I know or talked to about AI has been very critical and rational, and has roughly the same opinion: AIs for coding (Copilot, Cursor, etc.) are useful, but not that much. They’re mostly convenient for some parts of what constitutes coding.
I'm one of them. I've been coding for 22 years and now I use copilot all day every day. It gets stuff wrong, but I only ask it things I expect it to do well at, and I find it easy ignore or verify the bad answers.
I've never used electron and I got a small electron project up and running considerably faster than I would have otherwise.
I did some consulting for a project written in Vue, and I know React; I got a good summary of the differences in the concepts, how to lay out and structure files, etc. I had to modify a PHP project that was hosting Vue, and I used chatgpt to point me to where in the project and the code I needed to make the changes.
Just this morning I needed to use git bisect but I couldn't remember the exact syntax. I could have googled it and gone through the verbose documentation, the stackoverflow reply, or a long blog post. Instead, I got exactly what I needed back in seconds.
I had to develop a migration plan for another project, I already had a rough idea of what to do, but I asked chatgpt anyway because why not, it takes seconds. It came up with what I had thought of already, some things I didn't need, and some things I hadn't thought of.
I didn't realise we were being that pedantic about the phrasing. I had 3 months to evaluate a project, it took 1 month; I got asked to create a technical document for a migration in 2 weeks, it took 1 week. I wrote a prototype in a framework I didn't know and did it in less time than I could in a framework I know well without AI.
I've always been a backend developer, but started my own company about a year ago. From that, I had to become full stack and AI has helped dramatically with the learning and implementation. There's a lot of things which would have simply just taken far longer to understand or fix with just stackoverflow/google.
And that's just on the coding side, even more on the actual start up side!
Oh, I know of a handful of people that report this.
The quality of their work has gone down dramatically since they started being "faster", and it needs rewriting way more often than before LLMs existed. But they do claim they are now much faster.
I do freelance/custom dev work. I continue to bid projects as if AI didn't exist. Recently (particularly since Cursor came onto the scene) I'm finding I finish projects 50-60% faster than my estimates.
It has been utterly game-changing and time-saving. I question each new bid I do: Should I assume AI will continue to help this much, and adjust my pricing?
Yeah, it's pretty decent when you're doing the nuts and bolts scaffolding part of coding.
Something like creating an ORM database table class, where all the fields are predictable and it's just a matter of writing out the right fields in the right format.
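For instance, the kind of predictable boilerplate meant here, sketched with SQLAlchemy's declarative style (SQLAlchemy is my assumption; any ORM looks similar):

from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    # Entirely predictable field-by-field boilerplate: name, type, constraints.
    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True, nullable=False)
    display_name = Column(String(100))
    created_at = Column(DateTime)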
It's also replaced the really stupid Stack Overflow queries, when Ruby and Python and Javascript have corrupted my brain and I can't remember whether it's bar.lower() or bar.lowercase() or bar.lcase() or lower(bar) or whatever.
I'd like to share a particular case showing the necessity of verifying AI's work.
Yesterday I asked o1-preview (the "best" reasoning AI on the market) how could I safely execute untrusted JavaScript code submitted by the user.
AI suggested a library called vm2, and gave me fully working code example. It's so good at programming that the code runs without any modifications from me.
However, then I looked up vm2's repository. It turns out to be an outdated project, abandoned due to security issues. The successor is isolated-vm.
The code the AI gave me is 100% runnable. Had I not googled it, no amount of unit tests would have told me that vm2 is not the correct solution.
I agree with the author. My work involves designing simulations for control. Yesterday, I asked GPT-4o to write a python-only simulation for a HVAC system (cooling tower, chiller on the water side, and multiple zones with air handling units on the air side).
It wrote functions to separately generate differential equations for water/air side, and finally combined them into a single state vector derivative for integration. Easy peasy, right?
No. On closer inspection, the heat transfer equations had flipped signs, or were using the wrong temperatures. I'd also have preferred to have used structured arrays for vectors, instead of plain lists/arrays.
However, the framework was there. I had to tweak some equations, prompt the LLM to re-write state vector representations, and there it was!
AI-assisted coding is great for getting a skeleton for a project up. You have to add the meat to the bones yourself.
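To give a sense of the shape of such a skeleton, here is a deliberately toy single-zone sketch (assumed parameters and a constant cooling load, nothing like the full cooling-tower/chiller model above):

from scipy.integrate import solve_ivp

# Toy single-zone energy balance:  C * dT/dt = Q_internal + UA*(T_out - T) - Q_cooling
UA = 500.0           # W/K   envelope conductance (assumed)
C = 2.0e6            # J/K   zone thermal capacitance (assumed)
Q_INTERNAL = 3000.0  # W     internal gains (assumed)
Q_COOLING = 2000.0   # W     cooling delivered by the air handling unit (assumed)
T_OUT = 305.0        # K     outdoor temperature, held constant for simplicity

def zone_rhs(t, T):
    """State derivative for the single zone temperature."""
    return [(Q_INTERNAL + UA * (T_OUT - T[0]) - Q_COOLING) / C]

sol = solve_ivp(zone_rhs, (0.0, 8 * 3600.0), [300.0], max_step=60.0)
print(f"Zone temperature after 8 h: {sol.y[0, -1]:.2f} K")

The real work, as the comment says, is getting the signs and the coupling between subsystems right; the integration scaffold itself is the easy part.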
My problem with AI-assisted coding is that if I use it to scaffold hundreds of lines of code, I then need to review every single line, because bugs can be so subtle. Imagine, for example, Go's for-loop reference footgun. I really don't know if AI can handle these cases, or similar cases that I don't know about. So this is potentially more work than just writing from scratch.
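To illustrate the class of subtle bug meant (Go's shared loop variable, changed in Go 1.22), here is the direct Python analog with late-binding closures:

# All three lambdas close over the same variable i, so they see its final value.
callbacks = [lambda: i for i in range(3)]
print([f() for f in callbacks])        # [2, 2, 2], not [0, 1, 2]

# Conventional fix: bind the current value through a default argument.
callbacks = [lambda i=i: i for i in range(3)]
print([f() for f in callbacks])        # [0, 1, 2]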
Using it as a smarter autocomplete is where I see a lot of productivity boosts. It replaces snippets, it completes full lines or blocks, and because verifying block likely takes less time than writing it, you can easily get a 100%+ speed up.
I'm replacing things that I used to delegate to juniors with generated code. Because it's quicker and better. And there's a category of stuff I used to not bother with at all that I'm also taking on. Because I can get it done in a reasonable time frame. It's more fun for me for sure and I definitely am more productive because of it.
My feeling is that this stuff is not bottle-necked on model quality but on UX. Chat is not that great of an interface. Copy pasting blobs of text back to an editor seems like it is a bit monkey work. And monkey work should be automated.
With AI interactions now being able to call functions, what we need is deeper integration with the tools we use. Refactor this, rename that. Move that function here. Etc. There's no need for it to imagine these things perfectly it just needs to use the tools that make that happen. IDEs have a large API surface but a machine readable description of that easily fits in a context window.
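As a sketch of what that deeper integration could look like mechanically, here is a hypothetical refactoring tool described in the JSON-schema style that current function-calling APIs accept (the tool name and parameters are invented for illustration, not any IDE's real API):

# Hypothetical "rename symbol" tool a model could call instead of regenerating code.
rename_tool = {
    "type": "function",
    "function": {
        "name": "rename_symbol",
        "description": "Rename a symbol across the project using the IDE's refactoring engine.",
        "parameters": {
            "type": "object",
            "properties": {
                "file": {"type": "string", "description": "Path of the file containing the symbol."},
                "line": {"type": "integer", "description": "1-based line of the symbol."},
                "column": {"type": "integer", "description": "1-based column of the symbol."},
                "new_name": {"type": "string", "description": "The new identifier."},
            },
            "required": ["file", "line", "column", "new_name"],
        },
    },
}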
Recently chat gpt added the ability to connect applications. So, I can jump into a chat, connect Intellij to the chat and ask it a question about code in my open editor. Works great and is better than me just copy pasting that to a chat window. But why can't it make a modification for me? It still requires me to copy text back to the editor and then hope it will work.
Addressing that would be the next logical step. Do it such that I can review what it did and undo any damage. But it could be a huge time saver. And it would also save some tokens. Because a lot of code it generates is just echoing what I already had with only a few lines modification. I want it to modify those lines and not risk hallucinating introducing mistakes into the rest, which is a thing you have to worry about.
The other issue is that iterating on code gets progressively harder as there's more of it and it needs to regenerate more of it at every step. That's a UX problem as well. It stems from the context being an imperfect abstraction of my actual code. Applying a lot of small/simple changes to code would be much easier than re-imagining the entire thing from scratch every time. Most of my conversations the code under discussion diverges from what I have in my editor. At some point continuing the conversation becomes pointless and I just start a new one with the actual code. Which is tedious because now I'm dealing with ground hog day of having to explain the same context again. More monkey work. And if you do it wrong, you have to do it over and over again. It's amazing that it works but also quite tedious.
> this stuff is not bottle-necked on model quality but on UX. Chat is not that great of an interface. Copy pasting blobs of text back to an editor seems like it is a bit monkey work. And monkey work should be automated.
I agree wholeheartedly, and that's why I recommend Cursor to the point I'm being called a shill for them. I have no relationship with them, but they've shipped the first product that actually addresses this!
They have a "small model" that takes a suggestion in the chat mode (provided by Claude 3.5 usually, but o1 / 4o also work) and "magic merges" it into your codebase at the click of a button. It feels like such an easy task, but I bet it's not, and a lot of tinkering went into it and the small model they use. But the UX results are great. You start a chat, frame the problem, get an answer, hit "apply" and watch it go line by line and incorporate the changes into your existing code.
> The other issue is that iterating on code gets progressively harder as there's more of it and it needs to regenerate more of it at every step.
You might know this already, but if you're using the chatbot interfaces it helps quite a bit to prompt it with something along the lines of "only give me the bits that changed". There is nothing worse than fine-tuning a tiny bit of some code you didn't bother writing yourself only to have the bot give you an entire prompt's worth of code.
Check out continue.dev or (my favorite) Zed. It allows you to generate code in patch format and the editor will apply the changes to the various files itself. Copy pasting from ChatGPT is so 2023.
I don't want to replace my IDE (intellij) because I actually like it and use a lot of what it does most of which is not supported in other tools. I want AI models to work with my tools.
Tools like this are alright if your expectations of an IDE are low. E.g. if you are happy using just VS Code or whatever. Unfortunately I'm used to a bit more than that. Jetbrains has been doing some of their AI stuff. But I haven't really looked at it that much.
Don't get me wrong; I actually use VS Code for some stuff. But it's just not a replacement for intellij. Not even close. So, not really looking for IDEs with even less features just so I can have some AI.
I've done a little bit of javascript but I was doing a hobby project with a raspberry pi. That meant learning python and linux. Chatgpt was invaluable in completing the project because although I know the very basics I don't know the libraries. The script Chatgpt provided based on my description included several libraries that are super common and useful but I had never heard of. So instead of trying to reinvent the wheel or endlessly googling until I found something that sort of did what I wanted and then looked at that code, I was able to get something that worked. Then I could adjust it and add features.
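For illustration, the kind of library the script might have pulled in (gpiozero is my guess; the comment doesn't name the libraries it used):

from gpiozero import LED
from signal import pause

led = LED(17)                       # LED wired to BCM pin 17
led.blink(on_time=1, off_time=1)    # toggles in a background thread
pause()                             # keep the script alive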
But the fact remains that if you did all those things yourself, although much slower and more frustrating at times, you would have learned and remembered more, and would have understood the task, the libraries, and your own program on a deeper level.
Which many times is the actual point of the exercise.
This article looks like a case of skating to where the puck is. Over the next 2-4 years this will change - the rate of improvement in AI is staggering and these tools are in their infancy.
I would not be confident betting a career on any of those patterns holding. It is like people hand-optimising their assembly back in the day. At some point the compilers get good enough that the skill is a curio rather than an economic edge.
There was a step change over the last few years but that rate of improvement is not continuing. The currently known techniques seem to have hit a plateau. It's impossible to know when the next "Attention is All You Need" will materialize. It could be in 2025 or 2035.
o1-style techniques (tree search, test-time training type things) have not hit any recognizable plateau. There's still low-hanging fruit all around.
We said the same thing 3 years ago and we still have errors on basic questions. I don't know where people get their estimates from. Their intuition?
I think the tech chauvinism (aka accelerationism) comes from the crypto-hype era and unfortunately has been merged into the culture wars making reasonable discussion impossible in many cases.
Yes, that was exactly my point. For AI to get there? Sure. But how do they throw out a specific time prediction? 2-3 years is specific. It's so specific that companies could make strategic decisions to incorporate it faster, and there is a huge price to pay if it turns out not to be as trustworthy and bug-free as we hoped. That could be a huge problem for the economy, for companies needlessly dealing with problems that cost money and time to solve. If people said « it's amazing now, and in the next decade it will be production-ready and could be used with trust », then it casts a different outlook and different strategies would be taken. But because of the specific and close estimates, everything changes, even if every 3 years for the next 10 years they say it again. So yeah, eventually we'll get there one day.
> the actual software we use daily doesn’t seem like it’s getting noticeably better.
100% agree. I am testing o1 on some math problems. I asked it to prove that the convolution of two Gaussians is Gaussian. It gave me a 3-page algebraic solution; it is correct, but neither elegant nor good. I have seen more ingenious solutions. These tools are really good at doing something, but not at doing it like an expert human, as claimed.
> I asked it to prove that the convolution of two Gaussians is Gaussian. [The solution] is correct but not elegant.
The goalposts are moving at the speed of light.
A few years ago, if someone had told us that you could ask a computer to compose a poem, critique a painting, write a professional 360 performance review based on notes, design a website based on a napkin sketch, prove convolution theorems... we would have said that's a stretch even for sci-fi.
Now we have a single LLM that can do all of that, at some level of quality. Yet, the solutions are not elegant enough, the code not completely correct, the design is not inspired and the poem is "slop".
Eh, I mean that proof is all around in its training set. It's a fundamental, basic theorem in probability. You can put the same thing into a search engine and get a better solution, for [example](https://jeremy9959.net/Math-5800-Spring-2020/notebooks/convo...)
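For reference, the short argument usually considered elegant here goes through characteristic functions (a sketch, not the linked solution). For independent X ~ N(\mu_1, \sigma_1^2) and Y ~ N(\mu_2, \sigma_2^2), the density of X + Y is the convolution of the two densities, and characteristic functions turn convolution into multiplication:

\varphi_X(t)\,\varphi_Y(t)
  = e^{i\mu_1 t - \sigma_1^2 t^2 / 2} \cdot e^{i\mu_2 t - \sigma_2^2 t^2 / 2}
  = e^{i(\mu_1+\mu_2) t - (\sigma_1^2+\sigma_2^2) t^2 / 2},

which is the characteristic function of N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2); since characteristic functions determine distributions, the convolution is Gaussian.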
Nobody's saying that these aren't fascinating, just that it's not looking like their models are getting significantly better and better as all the hype wants you to believe.
Transformers + huge data set is incredible. But literally we've scraped all the data on the web and made huge sacrifices to our entire society already
There's no thought or reasoning behind anything LLMs generate, it's just a statistical pile of stuff. It's never going to generate anything new. It literally can't.
However, they are really good at highlighting just how many people will believe nonsense stated confidently.
> It's never going to generate anything new. It literally can't.
It can't on its own. But why does it need to? As a tool, the user can provide insight, imagination, soul, or guidance.
And let's be honest, very little in our life, work, entertainment or science is completely new. We all stand on the shoulders of giants, remix existing work and reinterpreting existing work.
While so far I consider NNs to be mostly useless / harmful myself, don't you think that you might be overestimating what human beings themselves are doing?
At this point, if you believe they don't produce anything new, it's one of two things: a) you haven't given a fair shot to the current flagship models, or b) you have a very narrow definition of new that is satisfied only by a minuscule fraction of the human population.
If it's the latter, then agreed, it doesn't produce anything new, but neither does most of humanity, and it doesn't need to in order to be of assistance.
tbh I am an AI skeptic; I think AGI cannot be achieved with deep learning alone, but I'm really impressed with o1. I didn't like 4o. I am only against overhype.
That's far from my experience.
Last time I used chatgpt it added many comments to explain the generated code and also attached rough explanatory text.
You can also just ask more details about something, or to list other approaches, explain tradeoffs, etc.
You can also ask for general questions to help you get started and find a good design.
To me it's about asking the right questions no matter the level of experience.
If you're a junior, it will help you ramp up at a speed I could not have imagined before.
It's great at doing things that have a tutorial or something out there; it falls flat when you try to do something slightly novel that doesn't have tons of articles on how to do it.
In my case, ChatGPT was quite useful to bootstrap code for working with Apple's CoreAudio, and it sucked when I tried to make it write code to do something specific to my app.
I had the same experience when I tried to hammer out an app dealing with cryptography (certificates, keys, etc.) and was only able to solve the issue through Apple's developer forums, thanks to the infamous eskimo1. Then some Russian dev who had the same issues appeared and gave me the full context. ChatGPT, Claude, etc. can't put together Swift code that does things usually done in C rather than Swift, and can't leverage their knowledge of cryptography to help themselves out.
I had similar problems in Node.js etc. too; the instant you do anything non-standard, the AI falls apart and you can see how stupid it actually is.
> Last time I used chatgpt it added many comments to explain the generated code and also attached rough explanatory text.
Good way to know if a PR opened by someone on your team was LLM generated is to look for inane explanatory comments like
// Import the necessary dependencies
import { Foo } from './Foo';
// Instantiate a new Foo
const myFoo = new Foo();
// return the expected value
return true;
I don't know if this is because a lot of the training materials are blog posts and documentation, which are focused more on explaining how trivial cases work than "real world" examples, and thus the LLM has "learned" that this is what real code looks like.
I would not leave it to junior developers to ask the right questions.
In my experience, from a few juniors I've cooperated with, they get absolutely awful code from ChatGPT. Things like manual opening and closing of files with unnecessary exception handling that crudely reimplements stuff that's already in the standard library and should have been a one-liner, and sure, ChatGPT will happily suggest explanatory comments and whatnot, but it's like copying in material from a reference manual, i.e. provocatively useless.
To me it also seems like they don't learn much from it either. They've made much more progress from a bit of mentoring and heavy use of REPLs and similar shells.
ScreenshotToCode wants me to pay for a subscription, even before I have any idea of its capabilities. V0 keeps throwing an error in the generated code, which the AI tries to remedy, without success after 3 tries. Bolt.New redirects to StackBlitz, and after more than an hour, there are still spinners stuck on trying to import and deploy the generated code.
Sounds like snake oil all around. The days of AI-enabled low-/no-code are still quite a while away, I think, if at all feasible.
I fear that in the push from "manual coding" to "fully automated coding", we might end up in the middle, with "semi-manual coding" assisted by AI, which needs a different set of software engineering skills.
I do many other things as a software engineer; writing code was always a small part of it, but a time-consuming one.
The second most time-consuming thing is meetings and explaining things to non-technical people, something like: "No Jerry, we cannot just transfer 100GB of data to the web app in each user's browser for faster searching while also having it 'real time' updated".
Probably right. Feedback loops are interesting. An anecdote: I spent my career learning how to do things well... so that I could eventually be promoted to try to do it by proxy. Infinitely more difficult, garners no interest. Now we're all disappointed.
I'm not learning, just forgetting. Entirely different skills - exercise is important.
Yeah that's what I had in mind from the very beginning.
In fact I built a tool [1] that applies this principle for semi-automated coding. It uses LLMs for generating code, but leaves the context selection and code editing for the human to complete.
I find this a sweet spot between productivity and quality of output.
I like how you put the context and prompts into the foreground. In so many tools, it’s invisible. We all know that context and prompts are there - the operation of LLMs is well known. Yet tools try and hide this and pretend that they are magic, instead of exposing control points and handles for the developer to use.
We have always been there since compilers were invented. AI is just another (rather big) iteration. Previous steps included API documentation tools, syntax highlighting, checking and formatting tools, refactoring tools, linters, and of course Stackoverflow :)
Yeah, a bit like the many frameworks that were supposed to make development trivial in X, many of which ended up being quite useful, but remained _additional_ skills to master in an ever-growing list of things we need to know to be able to do our jobs.
Something that concerns me the most is how we are going to train new generations. I taught a course at the university and many students just ChatGPT'd everything, without any critical thinking.
It doesn't matter how many times you showed that it invented assembly instructions or wxwidgets functions, they insist on cheating. I even told them the analogy of going to the gym: you lift with your own strength, you don't use a crane.
And of course, it is evident when you receive students who don't know what a function is or who cannot complete simple exercises during a written test.
We learned by reading, lots of trial and failing, curiosity, asking around, building a minimal reproducible bug for stackoverflow... they (the ones that rely only on chatgpt and not their brain) cannot even formulate a question by themselves.
In coding but also generally with deep expertise in other fields too, I find LLMs help only if you deal with them adversarially. You can quickly get its stabs on the current state of topics or opinions about your ideas, but you’ve got to fight it to get to better quality. It tends to give you first the generic crap, then you battle it to get to really interesting insights. Knowing what to fight it on is the key to getting to the good stuff.
This is it. Always be distrustful and challenge the LLM. That along with providing ample context and guiding it towards a solution you have in mind at a high level is the trick to make LLM assisted coding work really well.
And once you're in that groove, and have built the intuition on what works and what doesn't and where you should challenge or follow up, productivity really goes up quite a bit.
To quote Memento: "Don't believe his lies." (though thankfully, as the LLMs advance, this is becoming less of an issue. Claude 3.5 Sonnet V2 is already a huge step ahead compared to where it once was).
So for example, if it responded with something like this to me, I would point out that Clever Hans, the trick horse that answered math questions, was giving the specific answer the trainer provided, whereas an LLM is yielding well-structured content it was trained on, which I, the prompter, am trying to get out and haven't seen before. I know what it does not look like and I know pieces of what it does. So no, not the same.
Then the LLM would take this and redo its answer. Unlike internet strangers, who usually respond unhelpfully to adversarial exchanges because they tend to be too fixated, for my purposes, on finding ways to make their original answer right.
I used to think that as well. However, I’ve been thinking recently about how I might be biased.
What I realized is that the above quote, and what follows in the article, was true before AI as well. Juniors always missed things that made code production-ready.
I don’t think ai in the hands of juniors will create worse code. I think it will spawn a lot more code that was as bad as it was before.
This is still fresh thinking so I don't have a satisfying conclusion. :)
But what about the learning rate that converts juniors into seniors over time? This will slow that process for most people, and introduce artificial cliffs and holes based on what the AI can’t fix and the junior doesn’t have enough learned skills to figure out.
OTOH, you can learn so much from the AI, so I think if you just stop and read and ask questions, I think it is a lot easier for a junior to learn than before.
> I think it will spawn a lot more code that was as bad as it was before.
And that makes it even harder for seniors to teach: it was always hard to figure out where someone has misconceptions, but now you need to work through more code. You don't even know if the misconceptions are just the AI misbehaving, the junior doing junior things, or if the junior should read up on certain design principles that may not be known to the junior yet. So, you end up with another blackbox component that you need to debug :)
I hope so. The problem is that I learnt good design despite the prevalence of "Clean Code" thinking in the world, through actual experience. But if AI is doing most of the coding, and if that code mostly works, will there be an incentive for junior devs to try out different ideas?
I've been wondering about this. They'll still get that hard-won engineering wisdom that comes from watching AI agents code badly, deploy it, and have it fail---as much as we'd like it to be otherwise, engineering wisdom often comes from experience, not just from hearing stories from senior devs.
I guess the question is whether this role is still employable.
It's not really second hand. The LLM didn't merge and deploy the code. The developer did. The LLM won't fix the bad push, the developer will have to. This is no different than a more junior dev copying and pasting a snippet from Stack Overflow that they partially understand without accounting for the edge cases or sometimes even their specific use case. Experience comes from failure and LLMs will help you fail and potentially help you recover from it just like any other resource developers have been using for years.
I’ve found it effective in the past as a manager to assign the implementation of a feature, end to end to a junior developer, and basically rubber stamp their PRs.
Our company has a culture of expecting the person who wrote the code to support it, and so if it’s poorly written, they inevitably have to learn to fix it, and build it back in a way that can prevent issues in the future.
Obviously care has to be taken to assign the right projects with the right level of guidance and guard rails but when done well, people learn quickly.
I think the same spirit can be applied to AI generated code
Of course, the question remains of whether companies that buy into AI will hire a sufficient stream of junior developers so that some of them will graduate into seniors.
Between the pandemic (no real mentorship, terrible discipline, rampant cheating) and the productivity gains for seniors, the next generation of juniors will need a lot of help.
I often use the AI as a starting point and don’t use it unless I understand it. As a jr dev I think you just have to be smart and do the same. I’ve learned new ways to do things from the AI output, I imagine jr devs would be able to too.
I dunno. They’re starting with this as their default universe, right at the age where you pick up skills fastest. I’m worried about my kid but also the dinosaurs I know, who don’t even know there’s an asteroid.
I had a realisation today: AI gives you more shots at goal. When I was learning (the game) go, being able to play a lot of games against the computer in quick succession built intuition faster. When coding I can try more ideas, faster. Kids these days can make apps faster than you can have lunch. They’ll do it differently but they’ll learn faster than we will.
I have worked with juniors. Those who use AI copy bad code without learning anything. Those who learn and will become seniors are not using AI. We will have a massive reduction in the amount of seniors (good for me, not good for programming in general).
Yeah. I don't know how junior devs, outside of those motivated by an insatiable quest for knowledge and never-ending curiosity, are ever going to close the skill gap when they use AI tools to shit out circa-2021 best practices from stackoverflow.
They won't. Just as most modern devs can't edit assembly and would take days to write bare-bones network communication, they won't need to learn certain things we did. And they will excel in other skills, making some old-aged senior devs obsolete.
A senior who uses modern dev tool chains will always have a huge edge. That has always been true. But a senior relying only on their hard-earned knowledge will become the kind of dinosaur we knew when we started.
Yea they can't edit assembly because they have reliable tools that work 100% of the time, always. They don't have to manually inspect the output of the assembler every time they write any code. This is not even close to the same thing as LLMs.
They are not f*cked. They have a free tutor that they can ask any time, one that will always do its best to help.
My son is currently studying engineering, and whenever he is stuck on anything (construction, math, programming, mechanics) he fires up ChatGPT and asks for help. In maybe 90% of cases the AI gives him a hint so he can continue, which is an extremely short feedback cycle. In the remaining 10% he has to ask his human tutor at university, who is usually available a few days later. And it is not blindly following the AI's advice, but rather picking ideas from it. It is actually pretty awesome to see what opportunities there are, if AI is not simply used for cheating.
One impressive example he showed me was feeding a screenshot of a finished (simple) construction drawing to the AI and asking for potential errors. The AI replied with a few useless, but also with one extremely helpful suggestion, which helped him eradicate the last mistake in his drawing. I am still not sure if those were generic suggestions or if the AI was able to interpret the drawing to a certain degree.
If they use it as a tutor or as a glorified Google search, then that's okay. The problem is if they start using the code-generation tools AI provides and paste the output directly into the code base.
As an experiment, at my work I've stopped using all AI tools and gone back to my pre-AI workflows. It was kind of weird and difficult at first, like having to drive without GPS navigation, but I feel like I'm essentially back at my pre-AI speed.
This experiment made me think, maybe most of the benefit from AI comes from this mental workload shift that our minds subconsciously crave. It's not that we achieve astronomical levels of productivity but rather our minds are free from large programming tasks (which may have downstream effects of course).
I usually say this about all assistive AI, not just coding: you still need a close-to-expert human at the keyboard who can detect hallucinations. A great answer can only be deemed so by someone already very knowledgeable in the technical / deep subject matter.
At least the bread-making machine is predictable. But you now have shitty bread, and a technician who has no idea whether the bread is acceptable or not. Exactly like LLMs: we're throwing away the past and forgetting why it was this way.
I just happen to live in Singapore at the moment. German supermarket (factory produced) bread is also better than most of what you can get in the UK.
It's just that German consumers demand a certain level of quality in their sourdough, and the market is big enough for people to build machines to deliver that quality at a good price.
Yes, bread here in Singapore is a bit sad. (But we got lots of other great food options to make up for that.)
Making bread is like implementing an algorithm: the machines are designed by humans, the parameters are dialed in by humans, and they are serviced/maintained by humans.
The technicians are monitoring the results. I don't think the analogy is any good.
Nope, that's quite the opposite.
Machines are designed to be as repeatable as possible with deterministic controls. Food engineer then monitors some meta parameters to gauge process deviations.
Whereas GPTs are built to be as unrepeatable as possible with non-deterministic controls.
and yet the food industry hasn't mastered the sourdough.
So you end up with factory white bread - amazingly fluffy, stores for unusually long without going stale and very shelf stable without refrigeration, has little to no nutrition, but tastes amazing.
It's because the type of product suited to industrialized, low-skill but highly automated production is a very different product from artisanal production (which is what sourdough is; you can't easily use automation for sourdough). I reckon AI-coded products will have similarities.
No, it's full of carbohydrates, it might be lacking B vitamins and what not but energy/nutrition it does have. In some cases it would be sweetened with sugar (so both glucose/fructose)
What you describe is mostly demand, though; in northern Europe black bread (incl. rye) is common, for instance.
What’s amazing to one person might not be to another. Is it rich and nuanced like a well-made, hand-crafted German sourdough bread (hard to get these days, nearly impossible in South Africa), or just overly sweet and processed?
If "other countriers" is "the UK" [which you mentioned in another comment] then that makes perfect sense: UK bread is uniformly shite, whether artisanal or not, so much so that I can tell you the one place where I've had decent bread in the last ten years (a sandwich shop in Canary Warf, in the train station; don't know if it's still there).
The bread I make at home is order of magnitudes better than any of the bland, dead, oversalted bread I get at local supermarkets in the UK. And I'm nowhere near a baking enthusiast, I just make it at home so I can eat tolerably good bread.
But try, say, France, or Italy, or Spain, or Greece. Just go to a bakery -if you can figure out which ones make the dough in house (in France there are rules for this). And then we can talk about mass-produced German sourdough.
Although I bet the Germans make great pumpernickel.
Yes, the UK is part of the other countries. However, I managed to get good bread in London. So it's not completely uniformly bad; just generally bad.
(They had some great Ficelle at a small bakery in one of those markets under the rail arches near the Bermondsey beer mile. I think it was 'Little Bread Pedlar', but don't quote me on this. Their other baked goods were also tasty. But this was in 2017.)
> The bread I make at home is orders of magnitude better than any of the bland, dead, oversalted bread I get at local supermarkets in the UK.
Interesting that you complain about oversalting. We put quite a lot of salt into our wheat/rye-mixed sourdough here in Singapore; partially for taste but also partially to retard the rapid fermentation you get in the local climate.
> But try, say, France, or Italy, or Spain, or Greece. Just go to a bakery -if you can figure out which ones make the dough in house (in France there are rules for this). And then we can talk about mass-produced German sourdough.
You can also get artisanal bread in Germany, and you can get arbitrarily fancy there. If you are in Berlin, try Domberger Brotwerk. (Their yeasted cakes and (open) sandwiches are also great.)
You can get decent-ish bread in the countries you mentioned, though I think it's all rather white and wheat-y? I prefer at least some rye mixed in. (So I might prefer a German factory produced Mischbrot over an artisanal white wheat; even though the latter might be a better example of its style.)
My point is not that German factory produced bread is the best bread ever. It is not. My point is that it's decent. Decent enough to deny the statement 'and yet the food industry hasn't mastered the sourdough.'
>> However, I managed to get good bread in London. So it's not completely uniformly bad; just generally bad.
Well, OK, you can find good bread if you get lucky and look for it really hard, but the thing is that the British don't really understand what good bread means. I'm sorry to be racist. I find the same thing about coffee and about most food. The British... they try, right? London is full of posh restaurants. But I really don't think they get it.
>> You can get decent-ish bread in the countries you mentioned, though I think it's all rather white and wheat-y?
You get a whole lot more than "decent-ish" bread in the countries I mentioned! And you don't need to go looking for "artisanal" bread. To my understanding that's a term that's applied to bread made in the UK or US because ordinary bread sucks. But the same is not needed in, e.g., France where there are rules for "pain tradition" ("bread made to tradition"; nothing to do with BDSM :| ) that basically enforce that the bread is made by the baker on the day it is sold. This is a French language site that explains the rules:
To summarise, the dough can't be refrigerated, the bread must be baked on premise and then there's some restrictions on the ingredients (e.g. no additives except fungal amylase).
Btw having rules like that is a very French thing. The French (well, some of them) are very picky about their food and so they have all sorts of standards like AOP (which was a French thing before it was an EU thing) for cheese, wine, pork products and everything else that you can eat really. And that's a good thing and it works: you really should try the bread in France. I get the feeling you haven't - no offence.
Other places like Italy and Greece may not have the same stringent rules, so you find more variation (as in all things, e.g. coffee: good in Italy and Greece, passable in France; I wouldn't drink it in Belgium or Germany), but for whatever historical and cultural reasons in those countries you're very likely to get very good bread in any random bakery you walk into.
Like you say white is the mainstay, but in Greece I find that in the last few years that has changed a good deal. Even out in the boondocks where I stay you can find like six or seven varieties of bread per bakery, with white the minority really. My local area has three bakeries, wall-to-wall and the two sell wholemeal, spelt and rye, with and without sourdough. That's partly thanks to the many Albanians who have migrated to Greece in the last few decades and who are master bakers (and stone masons to boot). Also: heavenly pies. Oh man. Now I want one of the "kourou" spinach pikelets with spelt from the Albanian bakery and I'm stuck in the UK :(
Btw, that Albanian bakery also makes bread without salt. In a couple different varieties. I've tried their wholemeal sourdough (I have family with health issues so). Not great but eh, it's without salt. Greece gets very hot in the summer (40+ degrees is unsurprising) but the salt-less bread works just as fine. After all, this is modern times: we can control the temperature and humidity of enclosed spaces, yes? Salt is not needed for preservation anymore, it's now only there for the taste. So I'm very suspicious of industries that claim they can't reduce the salt content of their products "because preservation". As far as I'm concerned, any such claims make me suspicious of a cover-up; specifically that extra salt is used to cover up poor ingredients and poor production.
The term you are looking for might be something like 'culturalist'?
> France where there are rules for "pain tradition" ("bread made to tradition"; nothing to do with BDSM :| ) that basically enforce that the bread is made by the baker on the day it is sold.
Yes, but that's still white wheat bread.
> To summarise, the dough can't be refrigerated, the bread must be baked on premise and then there's some restrictions on the ingredients (e.g. no additives except fungal amylase).
We do some of these things at home, they don't prevent you from making good bread.
> Btw having rules like that is a very French thing. The French (well, some of them) are very picky about their food and so they have all sorts of standards like AOP (which was a French thing before it was an EU thing) for cheese, wine, pork products and everything else that you can eat really. And that's a good thing and it works: you really should try the bread in France. I get the feeling you haven't - no offence.
I've had French bread. It's good for what it is, but it's rather limited. They don't even like rye.
These mandatory rules seem a bit silly to me. (The Germans also really like them.) If you want to make something that conforms to some arbitrary rules, you should be allowed to and be allowed to label it as such, but other people should also be allowed to use whatever ingredients and processes they like.
(I'm still sore about Bavaria forcing their beer purity law on our tasty North Germany beers. But I guess that was the concession we made to get them to join the Prussian-led German Reich.)
> Btw, that Albanian bakery also makes bread without salt.
Yeah, that's a mistake in my opinion.
> Not great but eh, it's without salt.
You seem to think being without salt is a benefit?
(From what I can tell, there are some people with specific health problems for whom salt might be a problem. But normal healthy people do just fine with salt, as long as they drink enough liquids---which the salt makes you want to do naturally anyway. Salt is especially important in your diet if you sweat a lot.)
> After all, this is modern times: we can control the temperature and humidity of enclosed spaces, yes? Salt is not needed for preservation anymore, it's now only there for the taste.
Well, if you want to live in harmony with the local environment, you'll go with salt rather than aircon. So in addition to helping slow down the fermentation, the salt and sourness also help our bread last longer once it's baked here in Singapore.
I think the same can be said about AI-assisted writing…
I like the ideas presented in the post but it’s too long and highly repetitive.
AI will happily expand a few information dense bullet points into a lengthy essay. But the real work of a strong writer is distilling complex ideas into few words.
I think the real problem is that people are misunderstanding what programming is: understanding problems.
The hard truth is that you will learn nothing if you avoid doing the work yourself.
I'm often re-reading ewd-273 [0] from Dijkstra, The programming task considered as an intellectual challenge. How little distance have we made since that paper was published! His burning question:
> Can we get a better understanding of the nature of the programming task, so that by virtue of this better understanding, programming becomes an order of magnitude easier, so that our ability to compose reliable programs is increased by a similar order of magnitude?
I think the answer AI assistants provide is... no. Instead we're using the "same old methods" Dijkstra disliked so much. We're expected to rely on the Lindy effect and debug the code until we feel more confident that it does what we want. And we still struggle to convince ourselves that these programs are correct. We have to content ourselves with testing and hoping that we don't cause too much damage in the AI-assisted programming world.
Not my preferred way to work and practice programming.
As for, "democratizing access to programming..." I can't think of a field that is more open to sharing it's knowledge and wisdom. I can't think of a field that is more eager to teach its skills to as many people as possible. I can't think of any industry that is more open to accepting people, without accreditation, to take up the work and become critical contributors.
There's no royal road. You have to do the work if you want to build the skill.
I'm not an educator but I suspect that AI isn't helping people learn the practice of programming. Certainly not in the sense that Dijkstra meant it. It may be helping people who aren't interested in learning the skills to develop software on their own... up to a point, 70% perhaps. But that's always been the case with low-code/no-code systems.
>The hard truth is that you will learn nothing if you avoid doing the work yourself.
I understand where you're coming from, but this would imply that managers, product people and even tech leads don't learn anything from working on a project, which would strongly go against my experience. It is absolutely possible to delegate implementation details but stay close to the problem.
Do you feel that by using an AI-assisted editor you are doing less of "the work" of building a product than you would by delegating it to another developer? That is not my experience at all.
"Get a working prototype in hours or days instead of weeks"
This is nothing new. Algorithmic code generation has been around since forever, and it's robust in a way that "AI" is not. This is what many Java developers do, they have tools that integrate deeply with XML and libraries that consume XML output and create systems from that.
Sure, such tooling is dry and boring rather than absurdly polite and submissive, but if that's your kink, are you sure you want to bring it to work? What does it say about you as a professional?
As for IDE-integrated "assistants" and free-floating LLMs: when they don't give me outright wrong code, they consistently give suggestions that are much, much more complicated than the code I intend to write. If I were to let the ones I've tried write my code, I'd be a huge liability for my team.
I expect the main result of the "AI" boom in software development to be a lot of work for people that are actually fluent, competent developers maintaining, replacing and decommissioning the stuff synthesised by people who aren't.
It seems most developers that are hyping AI capabilities have never used a proper IDE like IntelliJ IDEA. The productivity boost compared to using a text editor is real. Most experienced programmers who use a text editor subscribe to the Unix-as-an-IDE philosophy or have built their own integration using Emacs or whatnot. Code generation is not an issue, as I have an idea of what to write (if I don't, I should go learn about it instead of writing code). The issues are boilerplate (solved by code snippets and generators), syntax mistakes (highlighting and linting), and bugs (debugger and console log).
Probably true for at least some of them. Having crunched out prototypes by both more formal and ad hoc code generation in a bunch of languages, and also having inherited some mature applications that started that way, I'm not really impressed with the notion of putting down some foundations fast.
If you're building something possibly serious, you expect to be hacking away for years anyway. Whether it takes you one or five weekends to make something for recruiting or financing or whatever doesn't actually matter much, it's not a very good sales pitch.
I think a lot of the hype is from managerial people, the kind that have been swayed by promises from people selling RAD and "low code" and so on over the decades. And yeah, you can put your non-technical employees to work building things if you want, but when it breaks your profits are at risk anyway and the consultants you need to hire won't be cheap.
Using AI-assisted coding is like using an exoskeleton to lift things. It makes your life easier, but you gradually lose strength because your muscles work less, and when it breaks down, you break down too, because you no longer have the strength you used to have.
As someone who often forgets the syntax for a for loop in a language I've been using every day for 10 years, for some people the muscles were never there in the first place.
I don't really want to, which was kind of my point. I don't want to spend 5 days in the gym to lift a heavy box once a year, it just doesn't make sense.
Which tool can actually help with coding and refactoring, not just autocomplete? The Copilot plugin for JetBrains IDEs can only suggest source to copy-paste, or at most replace a single snippet I selected.
What I'd like to do is to ask "write me libuv based event loop processing messages described by protobuf files in ./protos directory. Use 4 bytes length prefix as a frame header" and then it goes and updates files in IDE itself, adding them to CMakeLists.txt if needed.
That would be an AI assisted coding and we can then discuss its quality, but does it exist? I'd be happy to give it a go.
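To pin down the wire format in that prompt: each frame is a 4-byte big-endian length prefix followed by one protobuf-encoded message. Here's a rough sketch of just that framing, written in Python/asyncio as a stand-in for libuv to keep it short (handle_message, the port, and the decoding hook are hypothetical; real code would use the classes generated from the files in ./protos):

    import asyncio
    import struct

    async def handle_message(payload: bytes) -> None:
        # hypothetical hook; real code would do e.g. MyMessage.FromString(payload)
        print(f"got a {len(payload)}-byte message")

    async def on_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
        try:
            while True:
                header = await reader.readexactly(4)        # 4-byte frame header
                (length,) = struct.unpack(">I", header)     # big-endian unsigned length
                payload = await reader.readexactly(length)  # exactly one framed message
                await handle_message(payload)
        except asyncio.IncompleteReadError:
            pass  # peer closed the connection
        finally:
            writer.close()

    async def main() -> None:
        server = await asyncio.start_server(on_client, "127.0.0.1", 9000)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())

The framing itself is trivial; what I'm asking for is a tool that generates and wires up the libuv/CMake version of this across the actual project files.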
Similar to how we have language servers for code analysis and syntax highlighting, we need an AI-assist server, so that any IDE can pull instructions on what to do with which file.
Great article, Addy gets to the core of it and explains in a non-biased (pro or con), non-hype way. The examples, patterns and recommendations match what I've seen pretty well.
I've been working on an agentic full-app codegen AI startup for about a year, and used Copilot and other coding assistance tools since it was generally available.
Last year, nobody even thought full-app coding tools were possible. Today they're all the rage: I track ~15 full codegen AI startups (what now seems to be called "agentic coding") and ~10 coding assistants. Of these, around half focus on a specific niche (e.g. resolving GitHub issues, full-stack coding a few app types, or building the frontend prototype), and half attempt to do full projects.
The paradox that Addy hints at is that senior, knowledgeable developers are much more likely to get value out of both of these categories. For assistants, you need to inspect the output and fix/adapt it. For agentic coders, you need to be able to micromanage or bypass them on issues that block them.
However, more experienced developers are (rightly) wary of new hyped-up tools promising the moon. It's the junior devs, and even non-developers, who drink the kool-aid and embrace this, and then get stuck at 70%, or 90%... and they don't have the knowledge or experience to go further. It's worse than useless: they've spent their time, money, and possibly reputation (within their teams/orgs) on it, and got nothing out of it.
At the startup I mentioned, virtually all our dev time was spent on trying to move that breaking point from 50%, to 70%, to 90%, to larger projects, ... but in most cases it was still there. Literally an exponential amount of effort to move the needle. Based on this, I don't think we'll be able to see fully autonomous coding agents capable of doing non-toy projects any time soon. At the same time, the capabilities are rising and costs dropping down.
IMHO the biggest current limit for agentic coding is the speed (or lack thereof) of state-of-the-art models. If you can get 10x speed, you can throw in 10x more reasoning (inference-time compute, to use the modern buzzword) and get 1.5x-2x better, in terms of quality or capability to reason about more complex projects.
Today an AI tool let me build a new tool from scratch.
I published it to a git repo with unit tests, great coverage, security scanning, and pretty decent documentation of how the tool works.
I estimate just coding the main tool would have been 2 or 3 days and all the other overhead would have been at least another day or two. So I did a week of work in a few hours today. Maybe it did 70%, maybe it did 42.5%, either way it was a massive improvement to the way I used to work.
Can you please capture the process next time? Make a screencast? Apparently not everyone is in tune with AI. It keeps doing something other than what they want, resulting in frustration. Happens to me often.
Yeah, there was one point where I told it there was an error and it said "Oh, my mistake, I'll publish the update to the repository" and I said "Did you publish the update, I don't see it" and it said "yes" and I said "How did you do that? I don't think you can use those tools" and it said "Oh yeah, my mistake, I am not able to access external tools like github and make changes for you but I can share the changes with you"
I am paraphrasing of course. It was a little comical that all of a sudden it switched from typing the artifacts to trying to tell me it committed the changes directly.
I realized today that the average person types at half my typing speed. I wonder if this factors into people's tendency to use AI? Typing takes too long, so just let the AI do it?
In some ways, I'm not impressed by AI because much of what AI has achieved I feel could have been done without AI, it's just that putting all of it in a simple textbox is more "sleek" than putting all that functionality in a complex GUI.
Most of my time isn't spent coding. It's spent designing, discussing, documenting, and debugging. If AI wrote 90% of my code for me I'd still be busy all day.
Surely not 70% but like 5-10% might be a better ballpark figure. Coding or generating with LLMs is just part of the problem and always the fastest part of software building. All the other things eat disproportionately larger amount of time. QA, testing, integration testing, making specs, dealing with outages, dealing with customers, docs, production monitoring etc. etc. It would be cool if we get AI involved there especially for integration testing though.
I really dislike the entire narrative that's been built around the LLMs. Feels like startups are just creating hype to milk as much money out of VCs for as long as they can. They also like to use the classic and proven blockchain hype vocabulary (we're still early etc.).
Also, the constant anthropomorphizing of AI is getting ridiculous. We're not even close to replacing juniors with shitty generated code that might work. Reminds me of how we got "sold" automated shopping terminals: more convenient and faster than standing in line with a person, but now you've got to do all the work yourself. Also, the promise of doing stuff faster is nothing new. Productivity is skyrocketing, but burnout is the hot topic at your average software conference.
Out of curiosity, I tried to use a freely available LLM to generate simple Python tests. I provided the code and specified exactly what requirements I wanted tested. What I found is that it initially generates repetitive, non-DRY code, so I have to write prompts for improvement like "use parametrization for these two tests" or "move this copy-paste code into a function". And it turns out that it is faster to take the initial version of the code and fix it yourself than to type those prompts and explain what you want done. And worse, the model I was using doesn't even learn anything and will make the same mistakes the next time.
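To give an idea of the cleanup I kept prompting for, here is roughly the shape I wanted, as a minimal pytest sketch (apply_discount and the test cases are made up for illustration):

    import pytest

    def apply_discount(price, percent):
        # stand-in for the real code under test
        return round(price * (1 - percent / 100), 2)

    @pytest.mark.parametrize(
        "price, percent, expected",
        [
            (100.0, 0, 100.0),   # no discount
            (100.0, 25, 75.0),   # typical case
            (19.99, 100, 0.0),   # full-discount edge case
        ],
    )
    def test_apply_discount(price, percent, expected):
        assert apply_discount(price, percent) == expected

One @pytest.mark.parametrize decorator replaces the copy-pasted tests, which is quicker to just write than to coax out of the model.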
But, this was a generic LLM, not a coding assistant. I wonder if they are different and if they remember what you were unhappy with the last time.
Also LLMs seem to be good with languages like Python, and really bad with C and Rust, especially when asked to do something with pointers, ownership, optimization etc.
When the AI boom started in 2022, I was already focused on how to create provably (or likely) correct software on a budget.
Since then, I've figured out how to create correct software fast, on rapid iteration. (https://www.osequi.com/)
Now I can combine productivity and quality into one single framework / method / toolchain ... at least for a niche (React apps)
Do I use AI? Only for pair programming: suggestions for algorithms, suggestions for very small technical details like Typescript polymorphism.
Do I need more AI? Not really ...
My framework automates most of the software development process: design (specification and documentation), development, verification. What's left is understanding, i.e. designing the software architecture, and for that I'm using math, not AI, which gives me provably correct, translatable-to-code models in a deterministic way. None of this will be offered by AI in the foreseeable future.
I have friends who are building products from scratch using tools like Cursor. It’s impressive what someone who is already an expert developer can do. What I don’t see (yet) are these tools delivering for non-developers. But this is just a matter of time.
I see a lot of devs who appear to be in a complete state of denial about what is happening. Understandable, but worrying.
I've been a dev for 20 years. I recently watched a YT video of a non-developer putting together an iOS app, from scratch, using Cursor Composer. I can't vouch for the legitimacy of his claim at not being a dev, but some of the language used to describe things definitely suggested they were not.
Anyway, it was pretty impressive. I decided, having never used Swift and having never built a 2D iOS game, to give it a go myself. In just a couple of evenings, I have a playable prototype that I confidently say would've taken me weeks to get to on my own.
And I'm learning Swift. Reading open-source projects or Swift tutorials is one thing, but seeing code written to satisfy a prompt request - that provides an entirely different level of early comprehension.
I am not a professional software engineer but LLMs have showed me how hard software engineering is at the overall level of a system.
For me, it feels like being a magically skilled carpenter with the ability to build a giant building, but with no idea what architects do to make blueprints.
Just end up building a useless mess of wood and nails that eventually gets burned down.
I am actually not impressed by the new o1 at all. I think we might be in denial of how progress is slowing down.
All AI proponents use this phrase repeatedly. Yet I still can’t get these tools to stop making up APIs that don’t exist, and they constantly produce face-palm security issues like hard-coded credentials and SQL injection.
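To be concrete about that class of face-palm and its boring fix, here is a hedged sqlite3 sketch (the users table, find_user, and APP_DB_PATH are made-up illustrations):

    import os
    import sqlite3

    def find_user(conn, name):
        # BAD (the kind of thing assistants still emit): string interpolation
        #   conn.execute(f"SELECT id FROM users WHERE name = '{name}'")
        # GOOD: a parameterized query; the driver handles quoting
        return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

    # likewise, credentials and paths belong in the environment, not in the source
    db_path = os.environ.get("APP_DB_PATH", ":memory:")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    print(find_user(conn, "alice'; DROP TABLE users; --"))  # returns [] instead of running the payload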
I built a full featured Laravel crud app recently with probably 10 tables, auth, users, ‘beautiful’ tailwind styling, dark/light mode button, 5 different tabs, history function, email functions. 99% ai generated code. I almost didn’t even look at the code, just run some tests and make sure I’ve got functionality and no unexpected ‘normal’ bugs like min/max/0. Took me 15-20 hours with Windsurf. Windsail. Waveshark. Whatever that nice VSCode ai editor skin is called (completely forgettable name btw). It’s completely blowing my mind that this is even possible. There were of course some frustrating moments and back/forth (why did you change the layout of the list? I just wanted a filter…), but overall phenomenal. Deploying it shortly because if it dies, so what, this was a free job for a farm stand anyway. =)
Thinking that AI assistants are going to make programmers better, as opposed to just faster, is like thinking hiring a paralegal is going to make you a better lawyer, or hiring a nanny is going to make you a better parent, etc. It's helpful, but only in terms of offloading some things you could do yourself.
Yes, by offloading the lower skilled tasks, you can spend more time on the higher skilled tasks, and be a better parent or chef or lawyer or programmer.
Just like I'm a better programmer in Rust than in C, because I offloaded lots of mundane checking to the compiler.
This tracks with my experience as a more "senior" dev using Copilot/Cursor. I can definitely see these tools being less useful, or more misleading, for someone just starting out in the field.
One worry I have is what will happen to my own skills over time with these tools integrated into my workflow. I do think there's a lot of value in going through the loop of struggling with -> developing a better understanding of technologies. While it's possible to maintain this loop with coding assistants, they're undoubtedly optimized towards providing quick answers/results.
I'm able to accomplish a lot more with these coding assistants now, but it makes me wonder what growth I'm missing out on by not always having to do it the "hard" way.
It saves the trip to Stack Overflow, or provides valid links to Stack Overflow (SO) as a comment or chat answer. So it is not that bad when compared to reading SO directly.
But yes, you often get stuck, with the genAI looping in its proposals.
The 70% framing suggests that these systems are asymptotically approaching some human "100%," but the theoretical ceiling for AI capabilities is much higher.
> the future isn't about AI replacing developers - it's about AI becoming an increasingly capable collaborator that can take initiative while still respecting human guidance and expertise.
I believe we will see humans transition to a purely ceremonial role for regulatory/liability reasons. Airplanes fly themselves with autopilot, but we still insist on putting humans at the yoke because everyone feels more comfortable with the arrangement.
Assistants that work best in the hands of someone who already knows what they're doing, removing tedium and providing an additional layer of quality assurance.
Pilot's still needed to get the plane in the air.
But even if the output from these tools is perfect, coding isn't only (or even mainly) about writing code, it's about building complex systems and finding workable solutions through problems that sometimes look like cul de sacs.
Once your codebase reaches a few thousand lines, LLMs struggle to see the big picture and begin introducing one new problem for every one they solve.
I don’t ever use the code-completion functionality; in fact, it can be a bit annoying. However, asking it questions is the new Google search.
Over the last couple of years I’ve noticed that the quality of answers you get from googling has steeply declined, with most results now being terrible ad filled blog spam.
Asking the AI assistant the same query yields so much better answers and gives you the opportunity to delve deeper into said answer if you want to.
No more asking on stack overflow and having to wait for the inevitable snarky response.
It’s the best money I’ve spent on software in years. I feel like Picard asking the computer questions
I’m actually surprised this isn’t talked about more… it has been great for interrogating the docs of unfamiliar frameworks (and giving me good references on where to go in the actual docs for more info). A better mousetrap than "Google-like search" has been my exact experience as well. I once scoffed at a blog article by an “AI communicator” (right around a year ago) entitled something like “ChatGPT vs Google search, what’s the difference?” I have also found myself using ChatGPT as a Google search because the results of actually searching Google have been so unsatisfactory (even after I found myself appending “reddit” as the suffix to every Google search for the better part of 2 years). I can’t tell whether this says more about the AI or about Google.
> I’ve noticed that the quality of answers you get from googling has steeply declined
Are you asking for solutions to a specific problem or searching for information about the problem? I still don't have issues with search engines, because I mostly use them for the latter, treating them as an index of the internet (which is what they really are). And for that, AI is a huge step down, because I can't rely on the truthfulness of its replies.
Programming is not just about producing a program, it's about developing a mental model of the problem domain and how all the components interact. You don't get that when Claude is writing all your code, so unless the LLM is flawless (which it will likely never be on novel problems), you won't understand the problem well enough to know how to fix things when they go wrong.
Feels like a bit of a broad statement. I tend to request chunks of work/code that I place into my codebase. I still know what it's for and where it goes, I just care a lot less about the specific implementation details of the chunk, as long as it's typed and passes the smell test. To me it's just one higher rung on the abstraction ladder.
But it goes to show how difficult it is to discuss this stuff, because depending on model, IDE, workflow, experience/seniority, intuition, etc., LLM-assisted development might look very different in each instance.
I mostly use it to get past the drudge-work. Often I have mental blocks doing super mundane things that 'just need to be done'. AI is good at those super defined and self-contained problems ATM, and that's okay with me. Anything that requires deep expertise or a knowledge base, it falls flat, IMO. That could change if the AI gets its own sandbox to try things and learn by experimentation. I don't know, that has its own implications, but it's one way to improve its understanding of a system.
Academic studies are finding the same thing. Although there are a handful of beginners who are great at prompting, when you study beginning programmers at scale, you find that they mostly struggle to write prompts and understand why things go wrong. Here is one of several example studies:
>Error messages that make no sense to normal users
>Edge cases that crash the application
>Confusing UI states that never got cleaned up
>Accessibility completely overlooked
>Performance issues on slower devices
>These aren't just P2 bugs - they're the difference between software people tolerate and software people love.
I wonder if we'll see something like the video game crash of 1983. Market saturation with shoddy games/software, followed by stigmatization: no one is willing to try out new apps anymore, because so many suck.
I absolutely think this is a spot-on analysis, btw, and it ties well into my own experience with LLM-based coding.
However one difference between these tools and previous human developed technologies is these tools are offering direct intelligence sent via the cloud to your environment.
That is unprecedented. It's rather like the first time we started piping energy through wires. Sure, it was clunky then, but give it time. LLMs are just the first phase of this new era.
Currently, this is my favorite test prompt for AI coding tools:
Make a simple HTML page which uses the VideoEncoder API to create a video that the user can download.
So far, not a single AI has managed to create a working solution.
I don't know why. The AIs seem to have an understanding of the VideoEncoder API, so it doesn't seem to be a problem of not having the info they need. But none comes up with something that works.
I'm afraid this is going to be like what happened with calculators. Before electronic calculators, kids at least learned to do basic math mentally. Now we have whole generations incapable of making change without a calculator. I met a graduate who claimed 32 ÷ 2 was too difficult because 32 is too big to work with mentally. I believe code-development AI is going to lead to a whole generation of mediocre coders.
I don't know what happened to that student (I'm guessing not in a field commonly using math ??), but my experience as a student with access to calculators during exams (sometimes even to programmable calculators) is that they were just slowing you down for simple calculations, resulting in a worse grade if abused, especially since you have to lay out all the intermediate steps anyway. (And you just figure out with experience what the cutoff for 'simple' is, and it raises as you keep getting better at it, just as the bar to what is considered a minimum 'intermediate step' raises with the class level.)
But then also, the importance of using a calculator as one of the tools to double-check your work (because they make far fewer mistakes) should not be underestimated... whereas the situation seems to be the opposite with LLMs!
Same. In secondary school, calculators were mostly about the trig functions and operating on big numbers. In primary school, you wouldn't have access to a calculator in exams, so you learn how to do without. And in university, I mostly had to deal with symbols.
Ah the calculator, the strawman of every maths teacher.
And yet, are we worse at math than the previous generations? I'm not so sure. Pretty quickly math becomes toying with letters and not numbers, and unless the calculator has an algebraic engine, it won't help you.
We can focus on more important aspects of math, like actually understanding the concepts. However, to me this will be worse than the calculator: LLMs pretend to understand the problem and can offer solutions. On the other hand, you don't get a calculator with a CAS until late university (if you ever get one). Calculators don't pretend to be more than what they are.
But IMHO we'll still get the benefits of calculators: let the humans focus on the difficult tasks. I don't want to write the full API scaffolding for the tenth time. I don't want to write the boilerplate for that test framework for the 15th time. LLMs are good at those tasks, let them do it!
There is a problem with AI coding where you want to let it write as much as possible, but when it hits a wall and just loops back to the same error, you have to roll up your sleeves and get your hands dirty.
As AI is able to write more complex code, the skill of the engineer must increase so they can go in when necessary and diagnose the code it wrote; if you can't, your app is stuck at the level of the AI.
I agree with the article. I am in the "learning with the AI" camp, and the main task is getting young, foolish Einsteiborg to use simple things and go step by step without jabbering about the next steps or their alternatives, etc. I also have to work in blocks to get a usable whole, and git branch saves the day every time. But it's also really nice and you learn so much.
I think this is a pretty clear-eyed view. I've been a developer for 25 years. I use Copilot every day now exactly the way he describes. I get it to do XYZ, then use it to refactor what it just did or clean it up myself. Every now and then it goes sideways, but on the whole it saves me time and helps me focus on business problems more than ceremony.
If you can read code fast, it's very useful. I think this may be the biggest reason it's more helpful to seniors than juniors.
It's easy for a senior engineer to forget how exhausting it is for juniors to read code. I can glance at a page of code from Claude and tell pretty quickly if it's what I want. So it's useful to me if it's right more than half the time. For a junior this is definitely worse than just trying to write it themselves, they would learn more that way and come out less exhausted.
100% agreed. Much of my time with AI during the course of work is spent delegating for better output and re-evaluating everything it does. 70% sounds right.
And I have even told the AI, don't act like an over-eager 14-year-old with ADHD. Which I was/am still, myself. XD
I find that the irritating, iterative process of watching the AI fail over and over helps me understand the problem I’m trying to solve. But it also led me to believe that none of the promises of Altman et al. are connected to reality.
You break through the 70% barrier by writing detailed spec and architecture documents. Also tests that define behavior. Those go in every request as you build. Don’t ask an LLM to read your mind.
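By "tests that define behavior" I mean something small and executable that goes into the request verbatim; for example, a hypothetical pytest sketch like this (Cart and its rule are purely illustrative, and the stub only stands in for whatever the LLM is asked to generate):

    class Cart:
        # stub standing in for the code the LLM is asked to produce
        def __init__(self):
            self._items = {}

        def add(self, sku, qty=1):
            self._items[sku] = self._items.get(sku, 0) + qty

        def total_items(self):
            return sum(self._items.values())

    def test_adding_same_sku_twice_accumulates_quantity():
        cart = Cart()
        cart.add("sku-1")
        cart.add("sku-1", qty=2)
        assert cart.total_items() == 3  # the behavior the generated code must preserve

The test states the intended behavior explicitly, so the model doesn't have to guess it from prose.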
"the actual software we use daily doesn’t seem like it’s getting noticeably better"
Honestly, this seems like a straw man. Distributed productivity tools like Miro, Figma, Stackblitz, etc. that we all use day-to-day are impressive in terms of what they do, and even more impressive in terms of how they work. Having been a remote worker 15 years ago, the difference in what is available today is light-years ahead of what was available back then.
I expect Engelbart would have been astounded by real-time text generation, faster than he could read, of a first-pass draft of a Supreme Court brief involving a James Bond incident uncovering a pharmaceutical plot to undermine the biosphere via a constitutional loophole, written in Dr. Seuss prose, with the humor of Monty Python.
Or that he could get successive improvements in the form of real time collaboration with the model.
It is true that a tool that isn’t as reliable as an expert won’t impress an expert. Even if it’s better/faster on 99% of varied tasks for any given human, for fast-response output, “it still isn’t great” by an expert's standards.
But as humans, each of us has a span of expertise, or informed-amateur fluency, that is terrifyingly limited compared to the broad awareness of fields and subjects that these models incorporate.
And they are crazy impressive in terms of how much better they have become, qualitatively and quantitatively, inexpensive (to deploy/use) and available, in a few years.
IIRC all the products mentioned have been around since before the AI craze started. Can you explain how bringing up these products is not a straw man by itself?
If engineers have this problem then people with no engineering skill at all that just want to build some app will be hopeless. The days of no longer needing to hire engineers will never come.
Our company never adds comments, because the code speaks for itself.
And with genAI I can have these comments added in very little time, helping me get an overview of what is happening.
But as for the "why are are doing this in the first place" business documentation is usually outside the source code and therefore out of reach of any genAI, for now.
As for what senior devs should do when coding:
> They're constantly:
> Refactoring the generated code into smaller, focused modules
> Adding edge case handling the AI missed
> Strengthening type definitions and interfaces
> Questioning architectural decisions
> Adding comprehensive error handling
Ain't nobody got time for that! The one girl and the other guy who could do this, because they know the codebase, have no time to do it. Everyone else works by doing just enough, which is nearly what TDD dictates.
And we have PR code review to scrape together enough quality to get barely maintainable code. And never overcomplicate things, since writing code that works is good enough. And by the time you want to refactor a module three years later, you'd want to use another data-flow style or library altogether anyway.
Oddly enough, intern recruitment at the company is quite sophisticated (and exceptionally fair), and the interns are also paid. Yet there have been cases of extraordinary interns, some of whom got direct job offers immediately as well.
70%, REALLY? My personal experience was that at least 50% of the time, Copilot backfires, sometimes the proposed code was beyond ridiculous. Thus I had to disable it.
An axiom that was true even before pervasive AI: If you're using the computer to do something you don't have time to do yourself, that's good. If you're using the computer to do something you don't understand, that's bad.
> While engineers report being dramatically more productive with AI, the actual software we use daily doesn’t seem like it’s getting noticeably better.
I would disagree with this. There are many web apps and desktop apps that I’ve been using for years (some open source) and they’ve mostly all gotten noticeably better. I believe this is because the developers can iterate faster with AI.