
I've let people use GPT in coding interviews, provided that they show me how they use it. In the end I'm interested in how a person solves a problem and thinks about it. Do they just accept whatever crap GPT gives them, or can they take a critical approach to it, etc.

So far, everyone that elected to use GPT did much worse. They did not know what to ask, how to ask, and did not "collaborate" with the AI. So far my opinion is that if you have a good interview process, you can clearly see who the good candidates are, with or without AI.






Earlier this week I asked Copilot to generate some Golang tests and it used some obscure assertion library that had a few hundred stars on GitHub. I had to explicitly ask it to generate idiomatic tests, and even then it still didn't test all of the parameters that it should have.

At a previous job I made the mistake of letting it write some repository methods that leveraged SQLAlchemy. Even though I (along with my colleague via PR) reviewed the generated code we ended up with a preprod bug because the LLM used session.flush() instead of session.commit() in exactly one spot for no apparent reason.
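For anyone who hasn't hit this: the two calls look almost interchangeable in a diff, which is exactly why it got past two reviewers. A minimal sketch (illustrative, not the actual repository code):

    from sqlalchemy.orm import Session

    def save_user(session: Session, user) -> None:
        session.add(user)
        # flush() only emits the pending INSERT inside the open transaction;
        # nothing is durable until that transaction commits.
        session.flush()

    def save_user_fixed(session: Session, user) -> None:
        session.add(user)
        # commit() flushes pending changes AND commits the transaction.
        session.commit()

If the surrounding code path never commits the session afterwards, the flush-only version silently loses the write when the session is closed or rolled back.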

LLMs are still not ready for prime-time. They churn out code like an overconfident 25-year-old that just downed three lunch beers with a wet burrito at the Mexican place down the street from the office on a rainy Wednesday.


I feel like I am taking crazy pills that other devs don't feel this way. How bad are the coders that they think these AIs are giving them super powers? The PRs with AI code are so obvious, and when you ask the devs why, they don't even know. They just say, well the AI picked this, as if that means something in and of itself.

AI gives super powers because it saves you an insane amount of typing. I used to be a vim fanatic and I was very efficient, but whenever I changed language there was a period I had to spend getting efficient again: set up some new snippets for boilerplate, maybe tweak some LSP settings, save some new macros.

Now in Cursor I just write "extract this block of code into its own function and set up the unit tests" and it does it, with no configuration on my part. Before, I'd have a snippet for the unit test boilerplate for that specific project, I'd have to figure out the mocks myself, etc.

Yes, if you use AI to just generate new code blindly and check it in without any understanding, you end up with garbage. But those people were most likely copy-pasting from SO before AI; AI just made them faster.


AI saves you an insane amount of typing, but adds an insane amount of reading, which is strictly harder than typing (at least for me).

Hmm, that is interesting; reading is harder? You have to read a lot of code anyway, right? From team members, examples, 3rd party code/libraries? Through decades of programming I at least became very proficient at rapidly spotting 'fishy' code and generally understanding code written by others. AI coding is nice because it is, for me, the opposite of what you describe: reading the code it generates is much faster than writing it myself, even though I am fast at writing it (just not that fast).

I have said it here before: I would love to see some videos from HNers who complain AI gives them crap, as we are getting amazing results on large and complex projects... We treat AI code the same as human code: we read it and recommend or implement fixes.


> Hmm, that is interesting; reading is harder?

Much, much harder. Sure, you can skim large volumes of code very quickly. But the type of close reading that is required to spot logic bugs in the small is quite taxing - which is the reason that we generally don't expect code review processes to catch trivial errors, and instead invest in testing.


But we are not talking about large volumes of code here; we are talking about: the LLM generates something, you check it and close-read it to spot logic bugs, and either fix it yourself, ask the LLM, or approve. It is very puzzling to me how this is more work/taxing than writing it yourself, except in very specific cases:

Examples from everyday reality at my company: writing 1000s of lines of react frontend code is all LLM (in very little time) and reviews catch all the issues, while on the database implementation we are working on we sometimes spend an hour on a few lines, and the LLM suggests things but they never help. Reviewing such a little bit of code is of little use, as it's the result of testing a LOT of scenarios to get the most performance out in the real world (across different environments/settings). However, almost everyone in the world is working on (something like) the former, not the latter, so...


> writing 1000s of lines of react frontend code

Maybe we just located the actual problem in this scenario.


Shame we cannot combine the two threads we are talking about, but our company/client structure does not allow us to do this differently (in short: our clients have existing systems with different frontend tech; they are all large corps with many external and internal devs who built some 'framework' on top of whatever frontend they are using; we cannot abstract/library-fy to reuse across clients). I would if I could. And this is actually not a problem (beyond being a waste, which I agree it is), as we have never delivered more for happier clients in our existence (around 25 years now) than we did in 2024, because of that. Clients see the frontend, and being able to over-deliver there is excellent.

You should use testing and a debugger for this. Don’t just read code, run it and step through it and observe code as it mutates state.

I think this is great advice for folks who work on software that is contained enough that you can run the entire thing on your dev machine, and that happens to be written in the same language/runtime throughout.

Unfortunately I've made some career choices that mean I've very rarely been in that position - weird mobile hardware dependencies and/or massive clouds of micro services both render this technique pretty complicated to employ in practice.


Yeah it’s not really career choices. The code should be properly done so that it can hit multiple automated test targets.

It’s a symptom less of the career choice and more of poor coding practices.


Oh, we have automated tests out the wazoo. Mostly unit tests, or single-service tests with mocked dependencies.

Due to unfortunate constraints of operating in the real world, one can only run integration tests in-situ, as it were (on real hardware, with real dependencies).


When you type the code, you definitely think about it, deepening your mental model of the problem, stopping and going back and changing things.

Reading is massively passive, and in fact much more mentally tiring if the whole read is in detective mode: 'now where the f*ck are the hidden issues this time'. Sure, if your codebase is 90% massive boilerplate then I can see quickly generated code saving a lot of time, but such scenarios were normally easy to tackle before LLMs came along. Or at least those I've encountered in the past few decades.

Do you like debugging by just tracing the code with your eyes, or by actually working on it with data and test code? I've never seen effective use of the former, regardless of seniority. But I have seen, in the past few months, wild claims about the magic of LLMs that were mostly unreproducible by others, and when folks were asked for details they went silent.


Depends ofc on the complexity of the area, but... reading someone's code to me feels a bit like being given a 2D picture of a machine, then having to piece together a 3D model in my head from a single 2D photo of one projection of the machine. Then figuring out if the machine will work.

When I write code, the hard part is already done -- the mental model behind the program is already in my head and I simply dump it to the keyboard. (At least for me, typing speed has never been a limiting factor.)

But when I read code, I have to reassemble the mental model "behind" it in my head from the output artifact of the thought process.

Of course one needs to read co-workers' code and library code -- but it is more draining, at least for me. Skimming it is fast, but reading it thoroughly enough to find bugs requires building the full mental model of the code, which takes more mental effort.

There is a huge difference in how I read code from trusted experienced coworkers and juniors though. AI falls in the latter category.

(AI is still saving me a lot of time. Just saying I agree a lot that writing is easier than reading still.)


Running code in your head is another issue that AI won't solve (yet); we have had different people/scientists working on this, the most famous being Bret Victor, but also Jonathan Edwards [0] and Chris Granger (Light Table). I find the example in [0] the best: you are sitting there with your very logically weak brain trying to think out wtf this code will do, while there is a very powerful computer next to you that could tell you. But doesn't. And yet we are mostly restricted to first thinking out the code to at least some extent before we can see it in action; same for the AI.

[0] https://vimeo.com/140738254


Don’t run code in your head. Run it in reality and step through it with a debugger.

You mean like a blueprint of a machine? Because that is exactly how machines are usually presented in official documentation. To me the skill of understanding how "2d/code" translates to "3d/program execution" is exactly the skill that sets amateurs apart from pros. That said, I consider myself an amateur in code and a professional in mechanical design.

"In the small", it's easy to read code. This code computes this value, and writes it there, etc. The harder part is answering why it does what it does, which is harder for code someone else wrote. I think it is worthwhile expending this effort for code review, design review, or understanding a library. Not for code that I allegedly wrote. Especially weeks removed, loading code I wrote into "working memory" to fix issues or add features is much much easier than code I didn't write.

> The harder part is answering why it does what it does, which is harder for code someone else wrote.

That's a vital part of writing software though.


True. I will save effort by only expending it when needed (when I need to review my coworkers' code, legacy code, or libraries).

Here is a chat transcript from today, I don't know if it'll be interesting to you. You can't see the canvas it's building the code in: https://chatgpt.com/share/67a07afe-3e10-8004-a5ea-cc78676fb6...

Yes, I have to read what it writes, and towards the end it gets slow and starts making dumb mistakes (always; there's some magically bad length at which it always starts to fumble), but I feel like I got the advantages of pairing out of it without actually needing to sit next to another human? I'll finish the script off myself and review it.

I don't know if I've saved actual _time_ here, but I've definitely saved some mental effort on a menial script I didn't actually want to write, which I can now spend on some of the considerably more difficult problems I'll need to solve later today. I wouldn't let it near anything where I didn't understand what every single line of code it wrote was doing, because it does make odd choices, but I'll probably use it again to do something tomorrow. If it needs to be part of a bigger codebase, I'll give it the type-defs from elsewhere in the codebase to start with, or tell it it can assume a certain function exists.


> AI gives super powers because it saves you an insane amount of typing.

I must be very different, as very little of my coding time is spent typing.


I think that spending a lot of time typing is likely an architectural problem. But I do see how AI tools can be used for "oneshot" code where pondering maintainability and structure is wasted time.

For me, getting what's in my head out onto the screen as fast as possible increases my productivity immensely.

Maybe it's because I'm used to working with constant interruptions, but until what I want is on the screen, I can't start thinking about the next thing. E.g. if I'm making a new class, I'm not thinking about the implementation of the inner functions until I've got the skeleton of the class in place. The faster I get each stage done, the faster I work.

It's why I devoted a lot of time getting efficient at vim, setting up snippets for my languages, etc. AI is the next stage of that in my mind.

Maybe you can keep thinking about next steps while stuff is "solved" in your head but not on the screen. It also depends on the type of work you're doing. I've spent many hours to delete a few lines and change one, obviously AI doesn't help there.


It's the same for me if I'm working on something unique or interesting, or a new technology.

Some kinds of coding require very little thinking, though. Converting a design to frontend interface or implementing a CRUD backend are mostly typing.


That's certainly the case for myself, too, though I've got roughly two fewer decades in this than yourself!

But typing throughput has never been my major bottleneck. Refactoring is basically never just straight code transforms, and most of my time is spent thinking, exploring, or teaching these days.


> AI gives super powers because it saves you an insane amount of typing

I feel like I'm going a little bit more insane whenever folks say this, because one of the primary roles of a software engineer is to develop tools and abstractions to reduce or eliminate boilerplate.

What kind of software are you writing, where generating boilerplate is the limiting factor (and why haven't you fixed that)?


> because one of the primary roles of a software engineer is to develop tools and abstractions to reduce or eliminate boilerplate.

Is it? Says who? Not only do I see entire clans of folk appearing who say: DRY sucks, just copy/paste, it's easier to read and less prone to break multiple things with one fix than abstractions that keep functionality restricted to one location; but also, most programmers are there to implement the crap their bosses say, and that crap almost never includes 'create tools & abstractions' to get there.

I agree with you actually, BUT this is really not what the vast majority of people working in programming believe one of their (primary) roles entails.


See THIS is a usage that makes sense to me. Using AI to manipulate existing code like this is great. I save a ton of time by pasting in a JSON response and saying something like "turn this into data classes"; it makes API work so fast. On the other hand, I really don't understand devs that say they are using AI for ALL their code.

Copilot kind of auto completes exactly what I want most of the time. When I want something bigger I will ask Claude to give me that, but I always know what I am going to get, and I could have written it myself, it would have just taken tons of typing. I feel like I am kind of an orchestrator of sorts.

So if you were to do a transformation like that you'd cut the code and paste it into a new function. Then you'd modify that function to make the abstraction work. An LLM will rewrite the code in the new form. It's not cut/paste/edit. It's a rewrite every time with the old code as reference.

Each rewrite is a chance to add subtle bugs, so I take issue with the description of LLMs "working on existing code". They don't use text editors to manipulate code like we do (although it might be interesting if they did) and so will have different issues.


> It's a rewrite every time with the old code as reference.

As far as I'm aware, tools like Aider don't do this; they make targeted changes.

The way I do it ("copy/paste into a chat interface"), I just include the relevant context and change only the parts required.


> vim > LLM

I use both :)

Vim has a way to run shell programs with your selection as standard input as you'd know, and it will replace the selection with stdout.

So I type in my prompt, e.g. "mock js object array with 10 items each having name age and address", do `V!ollama run whatever` for example, and it will fill it in there.

Now this is blocking, so I have a hacky way in my vimrc to run it async and fill the result in later based on marks. Neovim really, since I use jobstart().

This also works with lots of other stuff, like quick code/mock generation e.g sometimes instead of asking an LLM I just write javascript/python inline and `vap!node`/python on it.


I do agree with the filling in of text, but only when the patterns are clear. For any kind of reasoning about logic, or use of libraries, I find it still leads me astray every time.

>I feel like I am taking crazy pills that other devs don't feel this way.

Don't take this the wrong way, but maybe you are.

For example, this weekend I was working on something where I needed to decode a Rails v3 session cookie in Python. I know, roughly, nothing about Rails. In less than 5 minutes ChatGPT gave me some code that took me around 10 minutes to get working.

Without ChatGPT I could have easily spent a couple hours putzing around with tracking down old Rails documentation, possibly involving reading old Rails code and grepping around to find where sessions were generated, hunting for helper libraries, some deadends while I tried to intuit a solution ("Ok, this looks like it's base64 encoded, but base64 decoding kind of works but produces an error. It looks like there's some garbage at the end. Oh, that's a signature, I wonder how it's signed...")

Instead, I asked for an overview of Rails session cookies, a fairly simple question about decoding a Rails session cookie, guided it to Rails v3 when I realized it was producing the wrong code (it was encrypting the cookie, but my cookies were not encrypted). It gave me 75 lines of code that took me ~15 minutes to get working.
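For the curious, the general shape was something like this (a rough sketch, not the actual 75 lines; it assumes you already have the app's secret_token, and the decoded payload is still Ruby Marshal data you'd feed to something like the rubymarshal package):

    import base64
    import hashlib
    import hmac
    from urllib.parse import unquote

    def decode_rails3_session(cookie_value: str, secret_token: str) -> bytes:
        # Rails 3 signed (unencrypted) session cookies look like
        # "<base64 data>--<hex hmac-sha1 over the base64 data>"
        data_b64, signature = unquote(cookie_value).split("--", 1)
        expected = hmac.new(secret_token.encode(), data_b64.encode(),
                            hashlib.sha1).hexdigest()
        if not hmac.compare_digest(expected, signature):
            raise ValueError("session cookie signature mismatch")
        return base64.b64decode(data_b64)  # raw Ruby Marshal bytes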

This is a "spare time" project that I've wanted to do for over 5 years. Quite simply, if I had to spend hours fiddling around with it, I probably wouldn't have done it; it's not at the top of my todo list (hence, spare time project).

I don't understand how people don't see that AI can give them "superpowers" by leveraging a developer's least productive time into providing their highest value.


> I don't understand how people don't see that AI can give them "superpowers" by leveraging a developer's least productive time into providing their highest value.

I'm unwilling to write code^1 that's not correct, or at least as correct as I'm able to make it. The single most frustrating thing in my daily life is dealing with the PoC other devs cast into the world that has more bugs than features, because they can't tell it's awful. I've seen code I've written at my least productive, and it's awful and I'm often ashamed of it. It's the same quality as AI code. If AI code allows me to generate more code that I'm ashamed of, when I'm otherwise too tired myself to write code, how is that actually a good thing?

I get that standards for exclusively personal toy projects and for stuff you want others to be able to use are different. But it doesn't add value to the world if you ship code someone else needs to fix.

^1 I guess I should say commit/ship instead, I write plenty of trash before fixing it.


Have you tried asking LLMs to clean up your code, making it more obvious, adding test harnesses, adding type hints or argument documentation, then reviewing it and finding out what you can learn from what it is suggesting?

Last year I took a piece of code from Ansible that was extremely difficult to understand. It took me the better part of a day to do a simple bug fix in it. So I reimplemented it, using several passes of having it implement it, reviewing it, asking LLMs to make it simpler, more obvious code. For your review: https://github.com/linsomniac/symbolicmode/blob/main/src/sym...


> took me ~15 minutes to get working

You didn't blindly use it, you still used your expertise. You gave it a high quality prompt and you understood the reply enough to adjust it.

In other words, you used it correctly. It complemented your expertise rather than replacing it.


I think you're making my point for me. Isn't it "taking crazy pills" if you use it incorrectly, and then stand on that experience as proof that other devs are insane? :-)

I agree with your assessment in the situation you describe: greenfielding a project with tools you are unfamiliar with. For me LLMs have worked best when my progress is hindered primarily by a lack of knowledge or familiarity. If I'm working on a project where I'm comfortable with my tools and APIs, they can still be useful from time to time but I wouldn't call it a "superpower", more like a regular productivity tool (less impactful than an IDE, I would say). Of course this comment could be outdated in a few months, but that's what it feels like to me in the here and now.

IMHO, still feels like a superpower in the editor when I type "def validate_session(sess" and it says "Are you wanting to type: `ion_data: dict[str, str]`?" Especially as the type annotations get more convoluted.

Devs who don't feel that way aren't talking about the stuff you're talking about.

Look at it this way - a powerful debugger gives you superpowers. That doesn't mean it turns bad devs into good devs, or that devs with a powerful debugger never write bad code! If somebody says a powerful debugger gives them superpowers they're not claiming those things; they're claiming that it makes good devs even better.


The best debugger in the world would make me about 5% more efficient. That's about the percentage of my development time I spend going "WTF? Why is that happening?" That's the best possible improvement from the best possible debugger: about 5%.

The reason is that I almost always have a complete understanding of everything that is happening in all code that I write. Not because I have a mega-brain; only because "understanding everything that is happening all the time" becomes rather easy if all of your code is as simple as you can possibly make it, using clear interfaces, heavily leveraging the type system, keeping things immutable, dependency inversion, and aggressively attacking any unclear parts until you're satisfied with them. So debuggers are generally not involved. It's probably a couple times per week that I enter debug mode at all.

It sounds a little like saying "imagine the driving superpowers you could have if your car could perfectly avoid obstacles for you!" Okay, sure, that'd be life-saving sometimes, but the vast majority of the time, I'm not just randomly dodging obstacles. Planning ahead and paying attention kinda makes that moot.


Now imagine working on a 10+ year old codebase that 100s of developer hands have gone over, with several dependencies that can't even be run locally because they're so out of date. Why work on that in the first place? Sometimes it pays really well.

I would absolutely not be comfortable trusting current AI to work on this sort of codebase and make meaningful improvements

Not AI, but we were talking about a debugger specifically.

Please read things in context - my comment was about what people mean when they talk about a thing giving developers superpowers. I was not making a claim about how much you, personally, benefit from debuggers.

Also: I don't use debuggers much either. Is Superman's heat vision not a superpower because he rarely needs to melt things? :P


It’s different if you work on a large legacy codebase where you have maybe a rough understanding of the high level architecture but nobody at the company has seen the code you’re working on in the last five years. There a debugger is often very helpful.

That is your own code and only your own libs then, no imports? Or you work in a language where imports are small (embedded?) so you know them all and maintain them all? Or maybe vanilla js/c without libs? Because import one npm package and you have 100GB of crap deps you really don't understand, which happens to work at moment t0; but at t1, when one package updates, nothing works anymore, and you can't claim to understand it all, as many/most people don't write 'simple code'.

You've never had to debug code that someone else wrote?

It's pretty uncommon. Most of my teammates' code is close to my standard -- I am referred to as the "technical lead" even though it's not an official title, but teammates do generally try to code to my standard, so it's pretty good code. If I'm doing a one-off bug fix or trying to understand what the code does, usually reading it is sufficient. If I'm taking some level of shared ownership of the code, I'm hopefully talking a lot with the person who wrote it. I'm very choosy about open-source libraries so it's rare that I need to dig into that code at all. I don't use any ecosystems like React that tempt you to just pull in random half-baked components for everything.

The conflux of events requiring me to debug someone else's code would be someone who wrote bad code without much oversight, then left the company, and there's no budget to fix their work properly. Not very common. Since I usually know what the code is supposed to be doing, such code can likely be rewritten properly in a small fraction of the original time.


> They just say, well the AI picked this, as if that means something in and of itself.

In any other professional field that would be grounds for termination for incompetence. It's so weird that we seem to shrug off that kind of behavior so readily in tech.


Nah, we've already had multiple cases of that; one with a lawyer at a big corp, and some others. The story is not straight-up 'AI said so' but more like: 'we use different online and offline tools to aid us in our work; sometimes the results are less than satisfactory, and we try to correct those cases'. It is the same response, just showing vulnerability: we are only human, even with our tools.

I think what you're saying is a bit idealistic. We like to think that people get terminated for incompetence, but the reality is more complicated than that.

I suspect people get away with saying "I don't know why that didn't work, I did what the computer told me to do" a lot more frequently than they get fired for it. "I did what the AI said" will be the natural extension of this


Depending on what language you use and what domain your problem is in, current AIs can vary widely in how useful they are.

I was amazed at how great ChatGPT and DeepSeek and Claude etc are at quickly throwing together some small visualisations in Python or JavaScript. But they struggled a lot with Bird-style literate Haskell. (And just that specific style. Plain Haskell code was much less of a problem.)


Because there are plenty of devs who take the output, actually read if it makes sense, do a code review, iterate back and forth with it a few times, and then finally check in the result. It's just a tool. Shitty devs will make shitty code regardless of the tool. And good devs are insanely productive with good tools. In your example what's the difference with that dev just copy/pasting from StackOverflow?

Because on SO someone real wrote it with a real reason and logic. With AI we still need to double-check that what it gives us makes any sense. And SO also has votes to show the validity.

I agree if devs iterated over the results it could be good, but that has not been what I have been seeing.

It is not a traditional tool, because the tools we had in the past had predictable results.


Agreed!

To go off on a tangent: yes, good developers can produce good code faster. And bad developers can produce bad code faster (and perhaps slightly better than before, because the models are mostly a bit smarter than copy-and-paste is).

Everyone potentially benefits, but it won't suddenly turn all bad programmers into great programmers.


In my experience the lack of correctness and accuracy (I have seen a lot of calls to hallucinated APIs) is made up for by the "eagerness" to fill out boilerplate.

It's indeed like having a super junior, drunk intern working for you if we are using the gps analogy.

Some work is done but you have to go over it and fix a bunch of things.


You are. Only noobs do this; experts using an LLM are way more efficient. (Unless you work like with the same language and libraries for years)

> (Unless you work like with the same language and libraries for years)

I'm not sure what you mean here? Are you saying that if you work with the same language for years you're somehow not proficient with using LLMs?


Sounds like someone who switches jobs and technologies every year, so someone else has to clean up the mess

The advantage of LLMs is smaller if you know everything by heart.

I understand exactly what you mean; it feels like there are buckets of people who are just trying to gaslight every reader.

I am 10x faster with GPT. No regex, no Stack Overflow, no SQL, no boilerplate, no text manipulation.

> well the AI picked this

hey look, we found him!


I don't follow this take. ChatGPT outputted a bug subtle enough to be overlooked by you and your colleague and your test suite, and that means it's not ready for prime time?

The day when generative AI might hope to completely handle a coding task isn't here yet - it doesn't know your full requirements, it can't run your integration tests, etc. For now it's a tool, like a linter or a debugger - useful sometimes and not useful other times, but the responsibility to keep bugs out of prod still rests with the coder, not the tools.


Yes and this means it doesn't replace anyone or make someone who isn't able to code able to code. It just means it's a tool for people who already know how to code.

> an overconfident 25-year-old that just downed three lunch beers with a wet burrito at the Mexican place down the street from the office on a rainy Wednesday

That's...oddly specific.


I'm only 33 and I've worked with at least two of 'em. They're a type :-D

You were one of them... 8 years ago?

(Not the OP.)

The LLMs are much more eager to please and to write lots of code. When I was younger, I would get distracted and play computer games (or comment on HN..), rather than churn out mountains of mediocre code all the time.


> The LLMs are much more eager to please and to write lots of code.

My process right now when working with LLMs is to do the following:

- Create problem and solution statement

- Create requirements and user stories

- Create architecture

- Create skeleton code

- Map the skeleton code

- Create the full code

At every step where I don't need the full code, the LLM will start coding anyway and I need to stop it and add "Do not generate code. The focus is still design".


One of my biggest issues with the LLM is how it always wants to give me a mountain of code. A lot of the time I'm using it for React, and it always gives me a full component no matter how much I specify that I just want the method. It will not remember this for more than one message and will go back to giving me as much code as possible.

Yeah, it almost feels like you are talking to somebody with OCD. The frustrating part is, output tokens are usually a lot more expensive than input tokens, so they are wasting energy and money :-). Also, the more they generate, the greater the chance it will create attention issues as the conversation progresses.

This is why I built my chat app to let me manipulate LLM responses. If I feel it is not worth knowing, I'll just erase parts of it to ensure the conversation doesn't get side tracked. Or I will go back to the original user message and modify it to say

### IMPORTANT

- Do not generate more code than required.

The nice thing about LLM conversations is that every time you chat, the LLM treats it as a first-time conversation, so this trick will work if the model is smart enough.


No this all tracks. Matches my own experience.

I thought it was nicely poetic.

Is that the LLM's fault, or SQLAlchemy's for having that API in the first place? Or was that a gap in your testing strategy? As (if I'm reading it right) flush() sends pending changes to the database but doesn't commit them, and is only intended as an intermediate step (commit() calls flush() under the hood).

I think we're in a period similar to self-driving cars, where the LLMs are pretty good, but not perfect; it's those last few percent that break it.


> At a previous job I made the mistake of letting it write some repository methods that leveraged SQLAlchemy. Even though I (along with my colleague via PR) reviewed the generated code we ended up with a preprod bug because the LLM used session.flush() instead of session.commit() in exactly one spot for no apparent reason.

I've had ChatGPT do the same thing with code involving SQLAlchemy.


>it used some obscure assertion library that had a few hundred stars on GitHub.

That sounds like a lot of developers I've worked with.


You are using it wrong.

Give examples and let it extrapolate.


"You're holding it wrong" doesn't make a small, too light, crooked, and backwards hammer any better.

You can't tell us that LLMs aren't ready for prime time in 2025 after you tried Copilot twice last year.

New better models are coming out almost daily now and it's almost common knowledge that Copilot was and is one of the worst. Especially right now, it doesn't even come close to what better models have to offer.

Also the way to use them is to ask for small chunks of code or questions about code after you gave them tons of context (like in Claude projects for example).

"Not ready for prime time" is also just factually incorrect. It is already being used A LOT. To the point that there are rumors that Cursor is buying so much compute from Anthropic that they are making their product unstable, because nvidia can't supply them hardware fast enough.


I stopped using AI for code a little over a year ago and at that point I'd used Copilot for 8-12 months. I tried Cursor out a couple of weeks ago for very small autocomplete snippets and it was about the same or slightly worse than Copilot, in my opinion.

The integration with the editor was neat, but the quality of the suggestions was no different from what I'd had with Copilot much earlier, and the pathological cases where it just spun off into some useless corner of its behavior (recommending code that was already in the very same file, recommending code that didn't make any sense, etc.) seemed to happen more than with Copilot.

This was a ridiculously simple project for it to work on, to be clear, just a parser for a language I started working on, and the general structure was already there for it to work with when I started trying Cursor out. From prior experience I know the base is pretty easy to work with for people who aren't even familiar with it (or even with parsing in general), so I think, given the difficulties Cursor had putting together even pretty basic things, a user of Cursor might see minimal gains in velocity and end up with less understanding in the medium to long term, at least in this particular case.


The Cursor autocomplete? It's useful, but the big thing is using Sonnet with it.

I tried it with Claude Sonnet 3.5 or whatever the name is, both tab-completed snippets and chat (to see what the workflow was like and to see if it gave access to something special).

> It is already being used A LOT

Which is an argument for quality why? Bad coders are not going to produce better code that way. Just more with less effort.


The claim of the comment I replied to was "LLM's are not ready for prime time" and my opinion is that LLM prime time is already here. Using LLM's to code (or to learn how to code) is obviously super popular.

Who's talking about quality anyway?

Code quality is not and was never the number one most important thing in business, popularity of a product/service or keeping a job. You may find that unfortunate (I agree), but it's just how it is based on my own 15yr+ experience.


Copilot isn't a single model. Copilot is merely a brand and uses OpenAI's and Anthropic's newest models.

I imagine most of the things that would be good uses of AI for seniors aren't great uses in a coding interview anyway.

"Oh, I don't remember how to do parameterized testing in junit, okay, I'll just copy-paste like crazy, or make a big for-loop in this single test case"

"Oh, I don't remember the API call for this one thing, okay, I'll just chat with the interviewer, maybe they remember - or I'll just say 'this function does this' and the interviewer and I will just agree that it does that".

Things more complicated than that that need exact answers shouldn't exist in an interview.


> Things more complicated than that that need exact answers shouldn't exist in an interview.

Agreed, testing for arcane knowledge is pointless in a world where information lookup is instant, and we now have AI librarians at our fingertips.

Critical thinking, capacity to ingest and process new information, fast logic processing, software fundamentals and ability to communicate are attributes I would test for.

An exception, though, is proving their claimed experience; you can usually tease that out with specifics about the tools.


We do the same thing. It's perfectly fine for candidates to use AI-assistive tooling provided that they can edit/maintain the code and not just sit in a prompt the whole time. The heavier a candidate relies on LLMs, the worse they often do. It really comes down to discipline.

Discipline for what?

To me it's the lack of skill. If the LLM spits out junk you should be able to tell. ChatGPT-based interviews could work just as well to determine the ability to understand, review and fix code effectively.


>> If the LLM spits out junk you should be able to tell.

Reading existing code and ensuring correctness is way harder than writing it yourself. How would someone who can't do it in the first place tell if it was incorrect?


Make the model write annotated tests too, verify that the annotations plausibly could match the test code, run the tests, feed the failures back in, and iterate until all tests are green?
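Roughly the loop I have in mind, sketched with a hypothetical ask_llm() helper standing in for whichever model/API you use, and pytest as an example test runner:

    import subprocess

    def iterate_until_green(ask_llm, source_path: str, test_path: str,
                            max_rounds: int = 5) -> bool:
        for _ in range(max_rounds):
            # run the model-written tests against the current implementation
            result = subprocess.run(["pytest", test_path, "-q"],
                                    capture_output=True, text=True)
            if result.returncode == 0:
                return True  # all tests green
            with open(source_path) as f:
                current = f.read()
            # feed the failures back in and ask for a revised implementation
            revised = ask_llm(
                "These tests failed:\n" + result.stdout
                + "\nCurrent code:\n" + current
                + "\nReturn a corrected version of the whole file."
            )
            with open(source_path, "w") as f:
                f.write(revised)
        return False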

This has been my experience as well. The ones that have most heavily relied on GPT not only didn't really know what to ask, but couldn't reason about the outputs at all since it was frequently new information to them. Good candidates use it like a search engine - filling known gaps.

Yea I agree. I don't rely on the AI to generate code for me, I just use it as a glorified search engine. Sure I do some copypasta from time to time, but it almost always needs modification to work correctly... Man does AI get stuff wrong sometimes lol

I can't really imagine it being useful in the way where it writes the logical part of the code for you. If you are not being lousy you still need to think about all the edge cases when it generates the code, which seems harder to me.

> you are not being lousy you still need to think about all the edge cases

This is honestly where I believe LLMs can really shine. I think we like to believe the problems we are solving are unique, but I strongly believe most of us are solving problems that have already been solved. What I've found is, if you provide the LLM with enough information, it will surface edge cases you haven't thought of and implement logic you wouldn't have thought to add.

I'm currently using LLMs to build my file drag and drop component for my chat app. You can view the problem and solution statement at https://beta.gitsense.com/?chat=da8fcd73-6b99-43d6-8ec0-b1ce...

By chatting with the LLM, I created four user stories that I never thought of to improve user experience and security. I don't necessarily think it is about knowing the edge cases, but rather it is about knowing how to describe the problem and your solution. If you can do that, LLMs can really help you surface edge cases and help you write logic.

Obviously what I am working on is really not novel, but I think a lot of the stuff we are doing isn't that novel if we can properly break it down.

So for interviews that allow LLMs, I would honestly spend 5 minutes chatting with it to create a problem and solution statement. I'm sure if you can properly articulate the problem, the LLM can help you devise a solution and a plan.


This makes me feel good because it’s exactly how I use it.

I’m basically pair programming with a wizard all day who periodically does very stupid things.


I like that you’re openminded to allow candidates to be who they are and judge them for the outcome rather than using a prescribed rigid method to evaluate them. Im not looking to interview right now but I’d feel very comfortable interviewing with someone like you, I’d very likely give out my best in such an interview. Id probably choose not to use an LLM during the interview unless I wanted to show how I brainstormed a solution.

Same thing here. The interview is basically representative of what we do, though it also depends on the level of seniority. I ask people just to share their screen with me and use whatever they want / feel comfortable with. Google, ChatGPT, call your mom, I don't care, as long as you walk me through how you're approaching the thing at hand. We've all googled tar xvcxfgzxfzcsadc, what's that permission for a .pem, is it 400, etc. No shame in anything, and we all use all of these things throughout the day. Let's simulate a small task and see where we end up. Similarly, there is a bias where people leaning more on LLMs do worse than those just googling or, gasp, opening documentation.

It took a while for googling during interviews to be accepted

This is the best way to go.

I would love to go through mock interviews for myself with this approach just to have some interview-specific experience.

>> So far, everyone that elected to use GPT did much worse. They did not know what to ask, how to ask, and did not "collaborate" with the AI.

Thanks for sharing your experience! Makes sense actually.


Yes, the current Google search is somehow worse than it was around covid or before. Using ChatGPT as a search engine can save time sometimes, and if you're somewhat knowledgeable you can pinpoint the key info and cross-check it with a Google search.

That's an interesting point about watching how they work with AI. The people I know who are most successful actually engage in back & forth.

i like this. it seems like a good and honest use of time.

Yeah I've been doing the same. Have been pretty stunned at people's inability to use AI tools effectively.

What does effective use look like? I have attempted messing around with a couple of options, but was always disappointed with the output. How do you properly present a problem to a LLM? Requiring an ongoing conversation feels like a tech priest praying to the machine spirit.

In my opinion that's literally what we're aiming for, whether or not intentionally.

Candidates generally use it in one of two ways: either as an advanced autocomplete or like a search engine.

They'll type in things like, "C# read JSON from file"

As opposed to something like:

> I'm working on a software system where ... are represented as arrays of JSON objects with the following properties:

> ...

> I have a file called ... that contains an array of these objects ...

> I have ... installed locally with a new project open and I want to ... How can I do this?

No current LLMs can solve any of the problems we give them so pasting in the raw prompt won't be helpful. But the set up deliberately encourages them to do some basic scaffolding, reading in a file, creating corresponding classes, etc. that an LLM can bang out in about 30 seconds but I've seen candidates spend 30 minutes+ writing it all out themselves instead of just getting the LLM to do it for them.


GitHub Copilot Edit can do the second version of this. It is pretty good at it too. It sometimes gets things wrong, but for your average code (and candidates typing in "C# read JSON from file" are way below average, unless they've never written C#), if you give it all the files for a specific self-contained part of the program, it can extend/modify/test/etc. it impressively well for an LLM.

The difference compared to where we were just 1-2 years ago is staggering.

Edit: the above is with Claude-3.5-Sonnet


High level - having a discussion with the LLM about different approaches and the tradeoffs between each

Low level - I'll write up the structure of what I want in the form of a set of functions with defined inputs and outputs but without the implementation detail. If I care about any specifics within the functions I'll throw some comments in there. And sometimes I'll define the data structures in advance as well.
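To make that concrete, the kind of skeleton I mean looks something like this (a made-up example, not real project code):

    from dataclasses import dataclass

    @dataclass
    class Order:
        order_id: str
        amount_cents: int
        currency: str

    def load_orders(path: str) -> list[Order]:
        """Read the CSV export at `path` and return one Order per row.
        Skip rows with a non-numeric amount instead of raising."""
        ...

    def total_by_currency(orders: list[Order]) -> dict[str, int]:
        """Sum amount_cents per currency code."""
        ...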

Once all this is set up it often spits out something that compiles and works first try. And all the context is established so iteration from that point becomes easier.


> High level - having a discussion with the LLM about different approaches and the tradeoffs between each

I honestly can't imagine this. If the AI says "However, a downside of approach B is that it takes O(n^2) time instead of the optimal O(nlog(n))", what do you think the odds are that it literally made up both of those facts? Because I'd be surprised if they were any lower than 30%. It's an extremely confident bullshitter, and you're going to use it to talk about engineering tradeoffs!?

> Once all this is set up it often spits out something that compiles and works first try

I'm sorry, but I'm *extremely* doubtful that it actually works in any real sense. The fact that you even use "compiles and works first try" as some sort of metric for the code it's producing shows how easily it could slip in awful braindead bugs without you ever knowing. You run it and it appears to work!? The way to know whether something works -- not first try, but every try -- is to understand every character in the code. If that is your standard -- and it must be -- then isn't the AI just slowing you down?


I don't code for a living, and I'm probably worse than a fresh grad would be, but I use:

"Please don't generate or rewrite code, I just want to discuss the general approach."

Bc I don't know any design patterns or idiomatic approaches, being able to discuss them is amazing.

Though quality and consistency of responses is another thing... :)


> I honestly can't imagine this. If the AI says "However, a downside of approach B is that it takes O(n^2) time instead of the optimal O(nlog(n))", what do you think the odds are that it literally made up both of those facts? Because I'd be surprised if they were any lower than 30%. It's an extremely confident bullshitter, and you're going to use it to talk about engineering tradeoffs!?

Being confidently incorrect is not a unique characteristic of AIs, plenty of humans do it too. Being able to spot the bullshit is a core part of the job. If you can't spot the bullshit from AI, I wouldn't trust you to spot the bullshit from a coworker.


But if I have a coworker who bullshits 30% of the time, I get them off my project. Because they too are just slowing everything down.

It can list tradeoffs and approaches you might have forgotten. That's the big use case for me.

When I was last interviewing people (several years ago now), I’d let them use the internet to help them on anything hands on. I was astounded by how bad some people were at using a search engine. Some people wouldn’t even make an attempt.

[flagged]


> I am a 10x developer

lol, okay.


What is x in the equation is the real question


