AI is such a huge expectations dichotomy. For those of us used to the continual disappointment that pre-LLM AI was, the current crop of LLMs are amazing, mind-blowing things. We start raving about them, so other people take a look expecting that modern LLMs are the greatest thing since sliced bread. They're not quite that, so we get HN comments complaining that AI sucks.
> We start raving about them, so other people take a look expecting…
If that’s what you want to call it.
I see hypemen overpromising and product underdelivering. And when pressed about specifics, attempts to drown queries in jargon or an ass-covering retreat to treating it like it’s just a tech demo not intended to be used for anything ever.
It looks like it's capable of anything, so it's easy to point out that it's pretty bad at certain things, but that's an easy way to miss the things it's actually very good at. Also even if it's bad at most things it's still often good enough to be useful.
And it seems likely that with an order of magnitude better hardware it might be good at some things that it seems really bad at now. So yes, it's a tech demo for a lot of things that aren't quite ready, and it's also very useful as it is.
> And management consultants solving every problem in the world by adding a “AI” box in their powerpoint flow diagram.
I work at a large consulting firm everyone knows and am seeing this firsthand. I'm not on the bandwagon until I see real money in the bank from large AI projects succeeding. It's not happening yet. I'm not a naysayer but am still very skeptical.
I think the problem is that they aren’t mind-blowing, but an improvement.
I resent some LLM implementations on principle, but decided to give these code helpers a try. What I found was they’re reasonably bad, and I kept telling them the solution doesn’t work, only to be presented with a little tweak.
So I don’t see the point of outsourcing my thinking, I’d rather remain intelligent and do the search/try/tweak on my own, instead of pretending a half-assed LLM is genius.
That doesn’t mean they don’t have good use cases, or aren’t an improvement on previous tech. But we definitely should stop calling them mind-blowing. Jaron Lanier had long ago predicted we’d willingly downplay human intelligence to pretend AI was… I.
> I think the problem is that they aren’t mind-blowing, but an improvement.
You seem to be doing what GP is pointing out.
GP's claim is that the relative improvement itself is mind-blowing, not that the tech is mind-blowing in an absolute sense.
I tend to agree: much of the detraction hangs on current-state rather than a probable potential-state informed by recent relative advancements.
In other words, many proponents are chuffed because of that potential, not because of the actuality. Likewise, skeptics are reserved because of the actuality, and not because of the potential, for whatever reason (there are at least a few main ones, I think).
See, I'm very skeptical that we can guess at the potential, because so much of it depends on research and figuring out better ways to do things going forward, and successful research programs are very unpredictable.
I feel like people are taking Moore's Law, which is definitely a real thing, and thinking that everything else is going to advance like semiconductors did, and I just am not seeing it in any other field. It's not true in software development (where gains, such as they are, are more linear than exponential), and it's definitely not true in rockets or civil engineering or steel or anything like that. So I am afraid a whole lot of people are expecting Moore's Law type improvements in AI, when really AI advances more like punctuated equilibrium: a sudden dramatic improvement, then a long period of consolidation and stasis, then another sudden dramatic improvement, often in a totally different unpredictable area.
But I've just been keeping tabs on AI since the hot way to do it was Expert Systems back in the 1990s, and I'm aware of its history since Norbert Wiener wrote Cybernetics back in 1948, and this seems to be a repeating pattern: a single major breakthrough (in this case, honestly, the combination of large quantities of data with NNs, with driving and natural language being the two easiest to get, and so the most prominent examples) followed by a lengthy fallow period where not much appreciable progress happens, then another breakthrough, often orthogonal to where earlier breakthroughs happened.
They do suck unconditionally. They're like having a mediocre person working with you that you have to verify and validate all the time that sucks up a ton of money and electricity and generally does bad things to the environment.
Edit: I haven't even started on the social and political impact this has either.
You missed the point. And you're also completely wrong. Copilot is extremely useful and in general a huge timesaver.
Yes if you tell it to write an entire program it will get it wrong and you'll spend some time verifying things. But that's not a sane way to use it. As a very clever auto-complete it's fantastic. It's also pretty great at getting past "blank page syndrome". Even if what it spits out is wrong it's still helpful to get you started.
I don't know if I'm doing something wrong, and admittedly I have only tried the free version of ChatGPT (3.5), but it basically never works for me. Not even for relatively simple things.
From my past history:
"Is "192.168.1.4" included in the subnet "192.168.0.0/16"?" -> No
"Check whether a widget overflows in flutter" -> returns a function that cannot be made to work even with a lot of massaging (uses stuff that does not exist)
"Write a parser for this multiline format in C++ (describe format)" -> parser only read first line
Admittedly a trick one: "Can you give me a C++ function to merge 2 uint32_t and one uint16_t into a unique uint64_t?" -> happily gives an answer
Sometimes it is salvageable, and sometimes it can provide ways I did not consider to solve a problem (though the proposed solution is usually broken), but usually I would have been faster to do it myself than to try to fix whatever it gives me.
I have basically given up on it, except for some generic "how would you solve problem X?", and when I see people talking about it, it feels like a totally different world.
> "Is "192.168.1.4" included in the subnet "192.168.0.0/16"?" -> No
ChatGPT is not good at numbers or complex maths like this.
> Check whether a widget overflows in flutter
I mean this would be closed as unclear on StackOverflow, but again, this is basically asking ChatGPT to write an entire function. It can do a good stab but it's not going to get it correct.
Copilot isn't for that sort of thing. Let me give you a more realistic autocomplete example from my code:
It completes `, cpp).expect("error writing main.cpp");` which is actually exactly what I had. I may have used Copilot to write that; I don't remember. The point is it is 100% correct and saved me writing all that. Traditional autocomplete can't compete with that.
However even for "do it all for me" queries it can still be useful. For example I asked:
> I have a C++ process paused in a debugger (lldb). It is consuming a lot of memory. Is there any way I can see what is using the memory? E.g. a heap profiler that can attach to the process?
It got it wrong and told me to run the process under Valgrind. I said:
> Those instructions aren't for attaching to an existing process.
And it told me about the Heaptrack project which can do exactly what I want. I can see why search engines are integrating this asap. Even though it hallucinates, it does that infrequently enough that it's still really useful. I mean it hallucinates frequently but even so it is very useful.
Perhaps a better example is this:
> Write a C++ program to run a child process and send data to its stdin and stdout while it is running.
After 2 more prompts I got it to output the code below. This code is not correct. But the fixes to make it work are relatively minor, and EASILY less work than writing it all from scratch.
In a few years we will look on people that don't use Copilot (or similar) like people that don't use IDEs. They're hurting their own productivity out of principle.
-------
#include <iostream>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <vector>

class ChildProcess {
public:
    ChildProcess() : pid(-1), pipefd{-1, -1} {}

    ~ChildProcess() {
        if (pid != -1) {
            close(pipefd[1]); // Close write end of the pipe
            waitpid(pid, nullptr, 0);
        }
    }

    bool create(const std::string& command, const std::vector<std::string>& arguments) {
        if (pipe(pipefd) == -1) {
            std::cerr << "Failed to create pipe." << std::endl;
            return false;
        }
        pid = fork();
        if (pid < 0) {
            std::cerr << "Failed to fork process." << std::endl;
            return false;
        } else if (pid == 0) {
            // Child process
            close(pipefd[0]); // Close unused read end of the pipe
            // Redirect stdin and stdout to the pipe
            if (dup2(pipefd[1], STDIN_FILENO) == -1) {
                std::cerr << "Failed to redirect stdin." << std::endl;
                return false;
            }
            if (dup2(pipefd[1], STDOUT_FILENO) == -1) {
                std::cerr << "Failed to redirect stdout." << std::endl;
                return false;
            }
            // Convert arguments to a C-style array
            std::vector<char*> args;
            args.reserve(arguments.size() + 2);
            args.push_back(const_cast<char*>(command.c_str()));
            for (const std::string& arg : arguments) {
                args.push_back(const_cast<char*>(arg.c_str()));
            }
            args.push_back(nullptr);
            // Execute the child process
            execvp(command.c_str(), args.data());
            // execvp() only returns if there's an error
            std::cerr << "Failed to execute child process." << std::endl;
            return false;
        } else {
            // Parent process
            close(pipefd[1]); // Close unused write end of the pipe
        }
        return true;
    }

    void write(const std::string& data) {
        if (pid != -1) {
            ::write(pipefd[1], data.c_str(), data.size());
        }
    }

    std::string read(size_t numBytes) {
        std::string output;
        if (pid != -1) {
            char buffer[numBytes + 1];
            ssize_t bytesRead = ::read(pipefd[0], buffer, numBytes);
            if (bytesRead > 0) {
                buffer[bytesRead] = '\0';
                output = buffer;
            }
        }
        return output;
    }

    std::string readLine() {
        std::string output;
        if (pid != -1) {
            char buffer;
            ssize_t bytesRead;
            while ((bytesRead = ::read(pipefd[0], &buffer, 1)) > 0) {
                output.push_back(buffer);
                if (buffer == '\n') {
                    break;
                }
            }
        }
        return output;
    }

private:
    pid_t pid;
    int pipefd[2];
};

int main() {
    ChildProcess childProcess;
    std::vector<std::string> arguments = {"arg1", "arg2"};
    if (childProcess.create("child_process", arguments)) {
        childProcess.write("Hello, child process!");
        std::string output = childProcess.read(1024);
        std::cout << "Child process output: " << output << std::endl;
        std::string line = childProcess.readLine();
        std::cout << "Child process line: " << line << std::endl;
    }
    return 0;
}
No I didn't miss the point, not even slightly. You didn't read mine and are blinded by the insane virtues.
Do you really think it's fine blowing 400 watts because you can't be arsed to think or do not have the creative intelligence to get over the blank page syndrome and have to lean on a crutch?
> Do you really think it's fine blowing 400 watts because you can't be arsed to think or do not have the creative intelligence to get over the blank page syndrome and have to lean on a crutch?
I’m not sure about this pre-LLM AI being disappointing take. What are we considering as AI? AI encompasses all manner of fields. The face detection in my iPhone and Nest doorbell works fantastically. The DSG in my car learns how I shift, and fraud detection in finance is heavily driven by ML and typically very effective. It may not be helpful to you, but product recommendations and targeted advertising are ML driven and incredibly effective. They may not be exciting uses, but they’re incredibly helpful.
Eh I think people's expectations for things they don't understand are always overblown.
One thing I always find funny is the general expectation that machine learning models are both incredibly generalised and designed based on the way biological systems work, but should also be 100% perfect and never be wrong, just like a machine and NOT like a biological system. Those things are mutually exclusive: even the best, smartest, most physically capable humans will still sometimes spill their coffee, yet we expect coffee-bot 2024 not to do this.
Certainly machines can be much better at a task than humans are, but if that task requires generalisation then it's still gonna fuck up from time to time.
Machine learning programs may think a dog is actually a cat sometimes, but afaik they ain't ever called their teacher "Mum" yet.
>> For those of us used to the continual disappointment that pre-LLM AI was, the current crop of LLMs are amazing, mind-blowing things.
Aren't you overgeneralising a bit? Not even I would say that CNNs for image classification, or Deep-RL for board game-playing are a "continual disappointment" and they certainly predate LLMs. Are you talking about NLP? Even Neural Machine Translation systems were quite capable in language pairs with large parallel corpora (and similar linguistic structure).
Basically, what do you mean by "continual disappointment"? What I'm aware of is an incessant hype crescendo that is crashing over everything like a relentless wave.
Pre-2022 most AI was disappointing not because it was unimpressive but because it near exclusively took the form of blog posts or university papers that announced something interesting but unusable. Either because it was merely an academic curiosity (AlphaGo) or because the tech wasn't being released for public usage at all, and wasn't really easy to replicate externally either, or because there was no obvious way to apply it to normal problems. So we all got used to this constant year-in-year-out stream of "amazing" "breakthroughs" that ended up being a that's-cool followed by a shrug.
RLHF-trained GPT-3 and then DALL-E/Midjourney/Stable Diffusion changed all that. Suddenly AI not only got good, but the field broke loose of the inane and insane obsession with pseudo-safety that had been holding it back. Now the rest of us can use it without dropping $5M on a GPU cluster and hiring a dozen researchers first. AI is no longer a disappointment.
Thus, the price of Github Copilot is going to go up, 2x to 5x.
For all of the criticism of Github Copilot, for a lot of developers (but not all), the value of Github Copilot is incredibly high, much more than $20/month.
The current rock bottom pricing is low compared to the value provided for those users.
As such there is a big opportunity to multiply the price being charged here. Probably an increase of between 3x and 5x.
The excuse is because they are losing money, but the underlying reason is that the value it is providing is so high in terms of developer productivity.
Are we talking about Copilot in particular or AI code assistants in general?
The value, for me, is extremely high.
My teammates feel the same. Our shared opinion/experience is that ChatGPT 4 is better than Copilot in general but Copilot shines in-editor because it's aware of your project. So we use both in tandem. They mostly use ChatGPT and I split about 50%/50%. (Note: I'm using the Copilot X beta which I believe uses GPT-4)
People say they're "only good for boilerplate code" but well, that's the vast majority of what anybody is writing IMO.
If I need to traverse a tree or list or something, I'm letting AI write that code. Could I write it myself faster? No, and it's going to have an off-by-one error some non-zero portion of the time if I write it. I also find it's superior to e.g. memorizing all 10,000 CSS properties along with all the classes that pertain to Bootstrap or Tailwind or whatever.
I see the AI code assistant hate here and it just baffles me. It's so obviously useful to me, and I really can't imagine I'm that atypical.
Edit 1: AI help is especially pertinent if you are a "full stack coder" who is working on everything from database to frontend. Since frontends really multiplied in complexity about 15 years ago, I have not met a single "full stack engineer" who is truly fluent and expert in the entire db->app->frontend stack, because complexity and choice have proliferated at each of those levels.
Edit 2: While most of us are (hopefully) not literally writing tree or list traversals by hand in our actual daily programming lives, I hope my meaning is still clear -- I'm talking about that mundane sort of code, iterating over things, etc.
> If I need to traverse a tree or list or something, I'm letting AI write that code. Could I write it myself faster? No, and it's going to have an off-by-one error some non-zero portion of the time if I write it.
Many languages/companies have existing well understood solutions that _won't_ have errors. Maybe that is the disconnect? I can't remember the last non-interview time I had to write a non-trivial traversal.
> Many languages/companies have existing well understood solutions that _won't_ have errors.
I admit: I chose poor examples in my above post.
In a literal sense it has been years since I wrote a tree or list traversal by hand and I would be very surprised and concerned to see a PR where somebody is doing it by hand rather than using a library.
But, I hope my meaning comes through despite that. I mean the sort of mundane "iterate through a thing, and do a thing with some of the things" sort of code that many/most of us are writing on a regular, hour-to-hour basis.
> Maybe that is the disconnect?
Maybe! Another disconnect might be the level of polyglot one is expected to be.
I'm generally a "full stack" web developer (currently switching between Python and Ruby on the backend) and I don't mind admitting: front end crap changes fast enough that I can't possibly keep up with it. In my experience nobody is expert in the whole stack. Altogether it's just a really big surface area of Shit I Need To Know. AI is very welcome here for me.
Other coders might have a smaller surface area of shit they need to know, and they already know it inside and out, and therefore see no real value add from an AI buddy who is not correct and optimal 100% of the time.
Having used Copilot for a year now I very much doubt this figure. In my experience it only works well in boilerplate kind of situations where most code is copy/paste work anyhow. As soon as the code gets a little complicated it stops working well. It has also gotten quite slow for me lately. So I doubt it increases my work efficiency by more than 5%, but I do like it for reducing strain on the hands. For that I find the price appropriate.
> In my experience it only works well in boilerplate kind of situations where most code is copy/paste work anyhow.
As a data point, this matches my personal observations. But reducing the time spent on that boilerplate, plus not needing to search for where the "copy" portion comes from, may justify $30/month (and probably much more than that). My 2c.
The quote from the article says "more than 1.5 million people have used it and it is helping build nearly half of Copilot users’ code"
Not a native speaker, but to me this sounds much more ambiguous than up to 50% of code is produced by Copilot.
Also, how different from previous solutions is this actually? I use autocomplete and code snippets extensively. Never measured it, but I wouldn't be surprised if my IDE had generated more source code than I myself typed over the last 10 or so years.
It doesn't sound ambiguous to me. It says that those people have Copilot enabled while they write more than half of their code. AKA it's on the editor they use for most things.
I mean you could have made a similar argument about the productivity benefits of smoking cigarettes 100 years ago. Just because a lot of people are doing something doesn't mean it's valuable or that we have an accurate picture of the cost/benefits. The verdict is still very much out on LLMs.
Any new product that gained +1M paying subscribers (Github Copilot) in its first year is a success. You don't have to like it; that is allowed. And it may not help you, but +1M subscribers is a lot of people.
They are definitely here to stay until they are superseded by even better technology or get sued into oblivion.
They don't give any source or anything; the way it's worded sounds ambiguous enough that I suspect it's not that 50% of their code is generated by Copilot. Is there something a bit more convincing elsewhere?
I feel like part of the productivity is just moved to reviewers. Every once in a while, I have to review large PRs and find out which comment is AI-generated nonsense.
LLMs are really good at fooling us into thinking that code comments make sense.
That's why I prefer if devs don't use copilot, but instead cherry pick good output from ChatGPT.
This is why I'm concerned about AI. It enables people who don't value their colleagues' time to value it even less. Be that code or contributing to a discussion with unedited and false LLM output "as a starting point".
Earlier this year, we had this one developer who has worked with us for a while (8 months or so). He was never a top developer, but he was average.
Anyway, I have to review all of his code, and I've been doing this the whole time he worked for us. But, starting around the beginning of the year, I started to notice problem after problem coming through his merge requests (MRs). At first I gave the guy the benefit of the doubt, like I said this was unusual for him. He always needed moderate coaching, but the mistake ratio was way higher than usual.
Anyway, I did some basic coaching and whatever. I talked to him about the increase in errors and he said he was stressed with moving to a new house at the time. It made sense, so I decided not to even record the first performance review with HR.
But the problems kept coming and getting worse. It was getting to the point that he had so many problems, I wouldn't even address them all. I would only kick back the top dozen or so. I was noticing that his code had entirely different styling and voice across methods like it wasn't written by him. I also started to see weird lines that were trying to catch unusual edge cases that would never even apply in our scenario. There was one time where there was this insanely complex filter that fed into an insane regex that I couldn't decipher. I asked him what it was even trying to do and he had no clue; he couldn't even explain the purpose of the line, let alone how it worked. I pushed him on where it came from and he said "StackOverflow". But I reverse-searched it and couldn't find it.
We ended up doing performance reviews almost weekly for a while, and now I was formally writing them up. I told him that he was more of a burden on the company at this point. He was offering a negative value. I was getting stressed out because I was spending so long fixing his code and coaching him, that I was working longer hours because of him. Furthermore, I could have fired him and not even replaced him, just taken over his job myself and it would have been less work than this back and forth of trying to repair his work.
Eventually I did fire him. After he left and I talked to the team about him being let go, one of his friends told me that this guy had been using ChatGPT and Copilot for all of his work. It was a secret because our company doesn't allow AI tools in our codebase right now because of compliance reasons. Which is why he would rather say his mistakes were "copypasta from StackOverflow" as he always put it, than admit it was copilot.
That's not to say that Copilot is bad. But it's not ready to replace developers just yet. Even junior developers still need to wrangle and check the work coming out of the AI. And yes, people do notice. Even after this employee polished the work coming out of the AI model, it still wouldn't work in our codebase.
And to the original point. Yes, these AI tools really are just passing increased burden onto the experienced developers. The juniors are using these tools as crutches to speed things up so they can watch more YouTube videos. But the code gatekeepers who have to defend the application are the ones with increased burden and workloads of fixing the mistakes the AI is generating. I've been saying for a long time that AI will simply separate the good from the bad in software development. For the past 15 years you could make $150k or more a year as a developer that is only capable of producing sub-par code. But those days are gone. The sub-par code is being produced by AI now. So you need to be at least mediocre now.
At some point, the open source models you can run locally are going to start to get competitive. I suspect the cost problem came from upgrading to ChatGPT 4 over ChatGPT 3: right there, their costs went up more than 5x.
Did they? I'm surprised to hear that. Hard to tell for autocomplete of course. For conversational answers, inference speed is _far_ higher than I'm used to from GPT-4, and lower quality. As in, it matches GPT 3.5/ChatGPT.
I don't think so. It helps increase productivity but it's just too inaccurate and generates too much stupid output.
Not that I could do better but it's just not good enough yet.
I doubt the price will go up that much, if at all. Maybe there will be more expensive tiers, but $20 will probably stay.
Between better quantization, pruning, smaller models through better training/distilling/architecture, hardware price drops and purpose-built hardware, it will probably be much cheaper to run such a model in the future.
The moment I can train Copilot on my codebase and improve its predictions, I’d gladly pay more than $20. Even now, I find that I can’t live without it simply as a slightly more intelligent auto-complete.
I've used some prototypes that tried it. One of the things it does is regurgitate the old patterns you want to stop using rather than the new ones you do want. Basically it's a copy pasta tech debt generator.
That's a problem with all LLMs. They will average what they are trained on, without any reflection what is good and what is not. They shouldn't be called intelligent (AI), they're information meatgrinders.
>That's a problem with all LLMs. They will average what they are trained on, without any reflection what is good and what is not. They shouldn't be called intelligent (AI), they're information meatgrinders.
This is an issue with your prompting not the models.
If you tell someone "Do this thing" they'll just do it, LLMs too. And how it is done will probably be terrible.
If you ask someone "come up with 5 ways to accomplish this, compare and contrast between them, list pros and cons, think through best practices, maintainability, readability, security, performance and cost. Come to a final decision and then do it" their response will be different. So too will the LLM's. You can even have the LLM present its opinions and recommendation but pause and wait for your choice on the go-ahead.
Fair enough, let's put aside that in Copilot, prompting is automatic.
Even if you do prompt it correctly, and it responded in the abstract with all these pros and cons, you cannot be reasonably sure that the code it also provided actually follows these best practices. It will just anytime mindlessly wander from "the best experts on the Internet are saying this" territory to "I just made this up" territory.
Compare that with googling a human-written stack overflow answer on the topic, there usually is some good soul who pointed out the inconsistency, if there is one.
Look, I like ground meat. But the fact is, without a detailed analysis, it's hard to tell what's actually in it.
>Even if you do prompt it correctly, and it responded in the abstract with all these pros and cons, you cannot be reasonably sure that the code it also provided actually follows these best practices. It will just anytime mindlessly wander from "the best experts on the Internet are saying this" territory to "I just made this up" territory.
And?
As a Sr. Engineer, this is exactly how all the slop my Jr's and Mids send me looks. Copy pasted, Stack Overflow, and when I read through Jr code my mind is boggled. I want to shake them: "did you even read your own code? do you even know what your code is doing? Why did you do this?" Forget style, best practices, etc.
In my opinion, LLMs produce code that is as good or better than most Jr engineers and as a Sr it is my responsibility to audit, review and test all code. As a Sr level engineer I spend 90% of my time judging/fixing/improving others' code, and less than 10% of my time writing my own.
LLM is just another source, and unlike the Jr, I can quickly ask it why it did what it did or to refine it. You ask the Jr to iterate on the project and you won't hear from them until they mention a blocker at tomorrow's stand-up (or you just pair it out and spend 2 hours teaching them, while the LLM turns it around in 15 seconds. There is value in teaching of course, but we build quickly, too).
They are great information meat grinders, the best. /s
Prompting allows you to direct the model into a different way than the training data. If it were not so, LLMs would never solve problems that were not explicitly in their training set.
I've actually had the opposite happen to me several times - of trying to get Copilot to help me fix something I wrote that isn't quite working, and instead of giving me an alternative solution, it just returns my bad code.
I have mixed feelings on this. On the one hand, sometimes it's able to guess correctly and save me a couple of seconds, up to half a minute I think. On the other hand, very often it's completely wrong, so I needed to constantly verify the suggestions and I found out it interrupted my flow and in the end I'd code faster without it.
I tried Copilot in the early days but stopped using it pretty quickly as it just wasn't that useful in an already established codebase. Oftentimes I'm not writing completely new code but rather implementing a pattern that I've previously defined somewhere else. In this case Copilot just completely falls over because it doesn't know about that existing code. My other gripe is language specific: in VSCode it breaks TypeScript hints. Oftentimes I'm writing something like `MyCustomObject.` and expect TypeScript to help me autofill the field; this breaks with Copilot as it always tries to suggest some other code.
The second issue I could work around but combined with the fact that it doesn't have context of my current codebase, my impression was that it would be a fantastic tool for someone going to school for CS but practically useless in a professional codebase.
Copilot pulls in similar code from your current project as part of its prompt.
If you understand how it works you can sometimes lay out your code in a way that makes it more likely to include the relevant examples to get the effect that you want.
There is no guidance or documentation for this at all!
It's very interesting to me that you say that because it hasn't been my experience. I am working in a ~20k-50k SLOC codebase with a few common patterns, and it seems to have picked up on most of them. It's not always right, but even when it isn't, I can write a comment above an empty line to nudge it in the right direction.
I have found it surprisingly good at some pretty sophisticated “grunt work”. For example, I’m converting some code to TypeScript. Looking at a long and twisty “getSomething” JavaScript function that builds up an object one field at a time, it can complete “interface Something {“ with all those fields, with types! It might duplicate or miss a field sometimes but it makes fewer mistakes than I would.
I’m pretty sure it pulls context from at least the other open files in VS Code, if not the ones on disk. Hard to tell for sure.
It’s kind of like having an eager, well-read, but naive and over-caffeinated intern as a pair programmer.
Could it be the language and/or the editor's integration?
In neovim w/ a medium-sized Go project, I'm finding that it's really good at providing meaningful and at least directionally-accurate contextual suggestions. It seems that the plugin is providing a good bit of context, but maybe the vscode one doesn't?
Yeah, I really, really want to be able to configure copilot to explicitly include directories to scan in addition to its built-in corpus of knowledge. That would increase its worth to me a lot.
It's amazing and all, but can be so subtly wrong. I let it autocomplete a function for generating time zone offset. It sneaked a minus sign in there so central US offset would be +0600 instead of -0600. My manager caught it during code review. We both are proponents of copilot but it was still embarrassing for me.
Sounds like something a few tests would have caught. I've found my GPT-4 coding use to be much more satisfactory when I have it generate a few dozen test cases after finishing (either my code or its code), and then auto-repair based on any errors.
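As a rough sketch of the kind of test that catches a flipped sign: the function below is not the original poster's code, and `format_utc_offset` is a made-up name for illustration. It formats the UTC offset of a timezone-aware datetime and checks it against a few zones whose offsets are well known.

```python
from datetime import datetime, timedelta, timezone


def format_utc_offset(dt: datetime) -> str:
    """Return the UTC offset of an aware datetime as +HHMM or -HHMM."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("naive datetime has no UTC offset")
    total_minutes = int(offset.total_seconds() // 60)
    # Zones behind UTC (e.g. the Americas) have a negative offset.
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}{minutes:02d}"


# Fixed-offset zones used only as test fixtures.
US_CENTRAL_STANDARD = timezone(timedelta(hours=-6))
INDIA = timezone(timedelta(hours=5, minutes=30))

assert format_utc_offset(datetime(2023, 1, 15, 12, 0, tzinfo=US_CENTRAL_STANDARD)) == "-0600"
assert format_utc_offset(datetime(2023, 1, 15, 12, 0, tzinfo=INDIA)) == "+0530"
assert format_utc_offset(datetime(2023, 1, 15, 12, 0, tzinfo=timezone.utc)) == "+0000"
```

A sneaked-in minus sign would fail the first assertion immediately, which is the point: one case west of UTC and one east of it is enough to pin the sign down.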
I would love for Copilot to receive lsp information together with the raw code, I think that could already provide it with enough context, especially in strongly typed codebases.
They just need to keep it going until GPUs get cheaper. The point is to build/protect the monopoly until that happens. Then costs come down and they start printing an absurd amount of money.
The problem is that they don't have a monopoly. It's not hard to see how CodeLlama and other future open source models could stand in here for GPT-4 at dramatically lower cost.
well, and straight caching. If they know that 80% of people ask the same 10,000 questions without a back and forth dialogue, it's not hard to just write a front end for that.
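A minimal sketch of what such a front end might look like, assuming single-turn questions only. `CachedAnswerer`, `normalize`, and the `ask_model` callable are hypothetical names; nothing here reflects how any real provider does it.

```python
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially different phrasings share a key."""
    return " ".join(question.lower().split())


class CachedAnswerer:
    """Serve repeated single-turn questions from a cache instead of the model."""

    def __init__(self, ask_model: Callable[[str], str]) -> None:
        self._ask_model = ask_model      # the expensive call (assumed, injected)
        self._cache: Dict[str, str] = {}
        self.model_calls = 0             # how often we actually paid for inference

    def ask(self, question: str) -> str:
        key = normalize(question)
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._ask_model(question)
        return self._cache[key]


# Stand-in for a real model call, for demonstration only.
answerer = CachedAnswerer(lambda q: "answer to: " + normalize(q))
answerer.ask("How do I reverse a list in Python?")
answerer.ask("  how do I reverse a list in python? ")
print(answerer.model_calls)  # 1
```

Exact-match keys like this only help with literally repeated questions; anything with conversation history or project-specific context would miss the cache.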
They are proving the market to generate demand to invest to vertically integrate, which then drives down costs while revenue remains flat or (hopefully) increases.
I suspect many would pay far higher than $20/month to use an intelligent LLM. Just that too many players are subsidizing free use so it's non-competitive to charge that much.
Eventually reality will force cost to the consumer towards cost of serving. Cost of serving LLMs will decline alongside this, so maybe the magic number stays at $20
Unless even their paid api usage is subsidized (and it very well could be for all I know) they can’t be losing that much.
But it doesn’t matter, I’m sure they never would have gotten the valuations they did if it weren’t for the insane hype around the (free) ChatGPT offering. It more than paid for itself just in that regard.
OpenAI is amassing an ungodly amount of data though. There have been efficiency efforts for sure, but scooping all that data up is what they need to train GPT-5. Those chat logs are worth their proverbial weight in gold.
I think it’s a good bet based on watching inference speed of llama.cpp consistently improving and model ability / size on a similar trajectory. I expect there’s similar room for optimization (probably more) with hosted models. If you don’t care about code/model privacy (I do but it seems like most don’t yet at least) there are even more batching/caching tricks to engineer.
Copilot is useless compared to chat-gpt4. There's really no comparison. As far as the costs, it seems like transmitting things repeatedly would be more expensive than the chat model where you just transmit things when you want them.
The underlying model might be as good, but how you interface with it seems to deliver wildly different results. I can easily get chat-gpt to give me entire blocks of code. Copilot just trickles out small bits. Copilot is also very distracting because I find myself constantly waiting to see if what it's going to do next is helpful or not, whereas just typing into chat gpt I know I'm going to get something in the direction of what I want.
For me it is very different use cases. Copilot often prevents me from needing to write mundane blocks of code, as it will often fill in what I'm looking for after I write the function name.
If I'm still unsure how I might approach a certain problem, or if it isn't immediately clear to me how I'd write the function I want, I might type in a prompt to ChatGPT and see what it comes up with. But it would really slow down my workflow if I had to prompt ChatGPT for every mundane function I plan to write.
I suspect the cost problem arose specifically upgrading to ChatGPT 4 over ChatGPT 3 - right there their costs increased more than 5x. So before it was likely under $5/month on average, but once you changed to ChatGPT 4, it jumped to like $20/month on average.
It's not GPT-4, it's gpt-3.5-turbo (or a variant there-of). Source: I'm sitting in the audience of a talk about it at AI Engineering summit right now, the speaker confirmed it as gpt-3.5-turbo, switched from Codex.
Is that actually true? I remember Copilot X containing the GPT-4 version, and I'm pretty sure that wasn't out back in April. I don't even think it was out this summer.
Interesting that you say that. I had the opposite thought that it doesn’t seem to have improved much over time, but I think my perception might be influenced by my habit of ignoring large suggestions and only looking at the results when it fills the rest of the line I was typing.
I’m also using it with intellij instead of vscode, so for all I know I could be using an old version still.
I disagree. I have both but I use Copilot a lot more. Editing the results of ChatGPT takes as long as it would take for me to write the code myself. Whereas Copilot is autocomplete, you can choose whether to accept it or not line by line.
How are you prompting ChatGPT such that you get useful results in a real world code base? It doesn’t have any context. The context is what makes Copilot work.
By default, ChatGPT can only give a generic answer. Do you just paste the entire file in there?
For example let’s say I want to write a function foo() that calls functions bar() and baz() defined in the same file and uses a library Y that I already imported. If I just write the name of the function, Copilot will often autocomplete a reasonable body for me. If I wanted to use ChatGPT then I would have to first tell it about foo() and bar() and the dependency on Y, and by the time I’ve finished telling it about all of that I could have written the function by hand twice over.
I feel like the centralisation of server resources for something like copilot doesn’t really make sense. Many (most) professional developers are working on beefy laptops. If ever there were a case to run these models client side, a software developer’s laptop is probably the ideal place. What are the specs that are needed to run inference on these models?
There's a huge range of CPU and memory requirements to run these models. The lower end of the range can be run locally but performance is noticeably worse. I'm guessing this will change as more people are looking at efficiency now. Specialized hardware like Apple's Neural Engine on the MacBook Pro may also help.
> I feel like the centralisation of server resources for something like copilot doesn’t really make sense.
For whom? For those selling them it absolutely does, because that's how they get to charge a huge ongoing markup on hardware and electricity costs. That's why everything is a service now.
I don't think most dev laptops have 8GB+ of GPU memory, which (based on ollama requirements) seems like it's on the low-mid end of the requirements. I've tried experimenting with some local models and:
a) They're much slower on my 6GB laptop GPU
b) They seem to not be as good, functionally
c) I can't make use of larger models
I haven't done more than just some experimentation but I can see how this would make sense to put on a server.
MacBooks, especially Pros, are in a very privileged position due to their memory architecture. With some dedicated hardware it may be reasonable to run, perhaps even train, useful LLMs on device.
I guess we’ll see in a year or two. This must be on everyone’s radar now, Apple won’t be the odd man out.
AMD is probably best positioned to make a laptop/desktop SoC with the same kind of unified memory. Though I wouldn't be surprised if Intel's little experiment in budget dedicated GPUs is also a byproduct of similar R&D.
Lisa Su was interviewed at Code recently, and the AI discussion focused on competing against NVIDIA in big datacenters, but I hope they're also thinking about client-side stuff like that.
I would hope developers value their time at least 100USD/hour. So unless the local option is less than 5 hours per year of maintenance, the subscription makes sense.
The main functionality I get out of copilot in vscode is autocomplete on steroids, so it's getting all code as I write it. I don't use the question based generator most of the time.
I'm probably not using it to the fullest. I mostly use it for boilerplate that I know it's good at, like `// Write a function to run the cpu profiler for 30 seconds and write the result to the filename in variable "foo"`. But I guess if it knows your codebase, then you can say `// Write a test to ensure that Foo accepts a nil FooRequest.` or something. I should figure out how to set that up.
All in all it saves me a little bit of typing a couple of times a week.
You should try it on vscode for a moment or try to find a plugin for emacs.
I'm often writing legal rules in code and from the name of my function, it predicts the way the law is written. It's pretty incredible how well it works. Clearly it's just stealing someone else's work from github, but the autocomplete is very awesome.
GitHub Copilot is not using GPT-4 (yet)! I'm seeing a lot of people under the wrong impression.
For code completions it uses an improved version of Codex[1], and for chat beta (part of Copilot X), it uses 3.5-Turbo[2].
However, they're claiming to have "early adoption of OpenAI's GPT-4" for Copilot X on their marketing page[3], which is confusing/deceptive. To be fair, it's still in beta, but they should state outright what model you currently get if you sign up for it.
So the service costs MSFT an average of $30/user/month. So if they want to make a business with decent margins out of this, they're going to need to charge $60/user/month or more.
That's not too much if it's adding a lot of value but it's also definitely a lot of money. It would be the most expensive SaaS my team pays for (and we have a lot of subscriptions).
Also, that is assuming that the group willing to pay $60/month includes many who use the service less than average. It's possible that only heavy users would stick around after a 6x price increase.
As users' codebases grow over time, the average usage per user will probably continue to grow. But a static price increase alone won't be a great model. They need to make their models and infrastructure more efficient first.
Or create a pricing model where the monthly sub covers a certain amount of tokens and then beyond that is pay-as-you-go. Sucks for users, but a scalable model.
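To make the idea concrete, here is a rough sketch of that billing rule. All the numbers (a $10 base price, 1M included tokens, $0.02 per extra 1K tokens) are made up for illustration and are not anyone's actual pricing.

```python
def monthly_bill(
    tokens_used: int,
    base_price: float = 10.0,          # hypothetical flat subscription fee, USD
    included_tokens: int = 1_000_000,  # hypothetical quota covered by the fee
    overage_per_1k: float = 0.02,      # hypothetical pay-as-you-go rate, USD
) -> float:
    """Flat fee up to the quota, then metered pricing for everything beyond it."""
    overage_tokens = max(0, tokens_used - included_tokens)
    return round(base_price + overage_tokens / 1000 * overage_per_1k, 2)


print(monthly_bill(500_000))    # 10.0  (light user stays on the flat fee)
print(monthly_bill(1_500_000))  # 20.0  (heavy user pays for 500K extra tokens)
```

The light user sees the same bill as before, while the heavy user's bill tracks what they actually cost to serve.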
The problem is that the industry hasn't figured out how to properly price LLM applications yet.
Before LLMs, it was perfectly possible to spin up a SaaS on a $5 DigitalOcean VM and charge $4.99 per seat monthly. If you're using low-overhead tech like Go and SQLite you might get pretty far with a decent user base.
But LLMs are inherently costly compared to those traditional apps. Whether you're calling OpenAI or building your own GPU cluster, it's gonna be way more expensive. Running your own GPUs might end up being more expensive because of utilization problems and upfront costs.
The subscription model was kind of the silver-bullet for SaaS but it's probably not going to work well in the AI era.
OpenAI, Elevenlabs, Runway, and Midjourney: they have subscription models but the quotas are strict and tight. The "unlimited" plan is simply pay-as-you-go.
The early wave of LLM products with unlimited subscription models, like GitHub Copilot and Notion AI, are probably priced way too low. $7 or $10 is way too low to support heavy usage.
But charging $50 might scare most users away because it exceeds people's expectations for SaaS. And it would probably still end up losing money. And hobby users may end up paying too much to subsidize the core users, which leads us back to sophisticated pricing tiers like Elevenlabs and Runway.
Are there alternatives? I dunno. Maybe implement bring-your-own-key properly? Like OAuth but for LLMs? It's definitely interesting to see how things will turn out eventually.
I have a small cloud-based AI image processing service that's still running (we did it before it was cool). The processing is currently done locally on a computer in my partner's closet, but scaling it to that point, whilst ensuring it runs smoothly for multiple users with reasonable processing times, was no joke.
The article says Microsoft wants computers to one day have Neural Processing Units (NPU) like most have GPUs. If that led to faster suggestions for copilot, I'd be all for it.
Right now, a GPU is an NPU. Does anyone know how an NPU would differ from a graphics card?
> Does anyone know how an NPU would differ from a graphics card?
Compare NVIDIA to Groq TSP (Tensor Streaming Processor)
This is how an AI chip should be designed, you write the neural net compiler first, then design the chip. These guys have combined a number of smart ideas in their product: synchronous operation makes memory and network operations simpler to plan and optimise; all software defined memory access, no caches; a simple set of primitive operations that can express every model architecture - so they can simply make it pytorch compatible, no need to write kernels, no need to have 100 versions of 2D conv, for all sizes and shapes.
GPUs are a SIMD architecture designed for 4x4 float32 matrix multiplications found in video games.
But NPUs (like Google's TPU) are systolic arrays designed for 16x16 or even 256x256 float16 or even int8 matrix multiplications instead.
-------
NVidia builds larger matrix multiplications out of the 4x4 float16 base that a SM is designed for. After all, a 8x8 matrix multiplication is just four of the smaller 4x4 matrix multiplications.
Yes, it's eight multiplications with the naive algorithm. You can get to seven with a clever trick (Strassen's algorithm). As far as I'm aware, with these precise numbers (8x8 matrix decomposed into four 4x4) it's unknown if we can do better.
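For anyone who wants to check the eight-versus-seven count, here is a small pure-Python sketch. It splits each 8x8 matrix into four 4x4 blocks and counts how many 4x4 products each scheme performs. The helper names (`block_naive`, `block_strassen`, `CountingMul`) are made up for illustration, and this says nothing about how any actual GPU schedules the work.

```python
from typing import Callable, List, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split a square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11: Matrix, c12: Matrix, c21: Matrix, c22: Matrix) -> Matrix:
    return ([l + r for l, r in zip(c11, c12)] +
            [l + r for l, r in zip(c21, c22)])


class CountingMul:
    """Wraps matmul and counts how many block products were requested."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        self.calls += 1
        return matmul(a, b)


Mul = Callable[[Matrix, Matrix], Matrix]


def block_naive(a: Matrix, b: Matrix, mul: Mul) -> Matrix:
    """Schoolbook block product: two block products per output quadrant."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    return join(add(mul(a11, b11), mul(a12, b21)),
                add(mul(a11, b12), mul(a12, b22)),
                add(mul(a21, b11), mul(a22, b21)),
                add(mul(a21, b12), mul(a22, b22)))


def block_strassen(a: Matrix, b: Matrix, mul: Mul) -> Matrix:
    """One level of Strassen: seven block products plus extra additions."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = mul(add(a11, a22), add(b11, b22))
    m2 = mul(add(a21, a22), b11)
    m3 = mul(a11, sub(b12, b22))
    m4 = mul(a22, sub(b21, b11))
    m5 = mul(add(a11, a12), b22)
    m6 = mul(sub(a21, a11), add(b11, b12))
    m7 = mul(sub(a12, a22), add(b21, b22))
    return join(add(sub(add(m1, m4), m5), m7),
                add(m3, m5),
                add(m2, m4),
                add(add(sub(m1, m2), m3), m6))


# Arbitrary integer test matrices so the comparison is exact.
a = [[(3 * i + j) % 7 for j in range(8)] for i in range(8)]
b = [[(i - 2 * j) % 5 for j in range(8)] for i in range(8)]
naive, strassen = CountingMul(), CountingMul()
assert block_naive(a, b, naive) == matmul(a, b)
assert block_strassen(a, b, strassen) == matmul(a, b)
print(naive.calls, strassen.calls)  # 8 7
```

Strassen trades one block multiplication for a handful of extra block additions and subtractions, so whether it wins in practice depends on the hardware.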
I think Meta (facebook) is already working with android phone manufacturers to include chips with specific instruction sets to support the llama architecture.
Intel is probably working on the same. I assume the next generation of mobile chips (laptop or phone) will all include special LLM processors.
I've done my own startup calculations for this. You shouldn't provide a heavily used openai wrapper below 30$/month/user. That will be likely a loss leader. At 46$/month it starts getting profitable for me.
A lot of people are commenting about how gpt4 is better than copilot, both in terms of getting it to do exactly what you want and in the quality of response. I've also found this and so I wrote a simple VS Code coding assistant that uses gpt4 and lets you be direct in asking what you want it to do: https://github.com/jpallen/biggles. Maybe it can be useful to some of you too.
It also supports voice, which is a fun way to code!
I use Copilot... Or rather I pay for it. I pretty much always end up using ChatGPT4. I honestly haven't had a very good experience with Copilot. It never "learns" anything. It doesn't remember where my imports are, it gives me the wrong paths all the time. Sure occasionally it will write a nice comment for me when I make a new prop, but other than that it's not as good for writing code as GPT4, and its auto suggestions are pretty weak. Wondering if others have different opinions.
Does anyone know of a VS Code extension that can use the OpenAI API to perform code completions like Copilot?
There are several ChatGPT-like UIs that you can self host and pay only for API and not for ChatGPT plus. For example, I'm using https://github.com/Yidadaa/ChatGPT-Next-Web. It would be nice to use Copilot in the same way.
continue.dev can complete code as well, but it is not a mere code completion plugin like copilot, it is more like an interactive chatbot where you can edit and talk about parts of the code which you have selected.
FWIW, a GitHub employee just said on stage at the AI Engineer Summit that this claim is "not true" and that it's a roughly $100m ARR product (first keynote speaker in this stream: https://www.youtube.com/watch?v=qw4PrtyvJI0)
Correct, but he also explicitly said the reports that it was losing money were "not true". Given that he's a VP at GitHub, I would consider him to be more authoritative on the matter than the WSJ's "multiple sources."
This is not surprising, the chat function is mind-blowing. It knows about your code, it is helpful, explains concepts, and is rarely wrong. It is the first time I'm genuinely impressed about an AI tool, and makes learning new frameworks very easy.
Are you sure it's the same people? That said, even if it is, there's something to be said for intentionally sharing a constrained set of code vs using something that spies on you in real life.
However. It will eventually, chunk by chunk, upload the full source code of the app you are writing. Complete with all typos and mis-pastes. Like youtube blocks videos with just a hint of copyrighted music, they may be able to detect use of "patented" algos or something like that and block you/ send a lawyer/etc.
Indeed, some folks are so worried about their code being sucked up by an AI bot they forget they ship every single revision and byte to MS via github. If MS wanted to be devious with the data entrusted to them they could have done it already.
I suppose it happens, but I suspect the set of people who distrust github/MS enough to not use them for git hosting but still use copilot is quite small.
Indeed. Our company looked at the IP issues and issued a very strong "do not put any company IP in an AI" edict.
Have you double-checked the terms of service? Are you absolutely certain that there's no risk of your copilot use impairing your intellectual property rights in the future?
Given what happened to Unity, how comfortable are you with becoming dependent on copilot and then having the rules change underneath you?
HN has never been a privacy-minded crowd. It's been a collection of random people, and depending on the threads of topic, you'll get more people speaking out than others. See the Dropbox post from 2007. Nary a privacy concern. HN isn't a single voice.
Why do you think it's a privacy-minded crowd? Because you probably react and respond to similar posts, and you see people with a like-mind there.
> How came that usually privacy-minded hn crowd uses run-by-someone-else models so cheerfully and carelessly?
Let's answer this question: they probably aren't doing it carelessly. Consider that Copilot is run by Microsoft, who owns GitHub and your code already. So, they already have the code. I'm not sending them something they don't already have.
Anyways, when I see something I didn't expect to see on HN, my initial thought isn't how it's wrong, but rather, what am I missing? And rather than post an antagonistic question, I instead try to research more to get a better understanding of what I'm missing.
My own code is on my own gitea instance in my home, exposed via a cheap jump server. My company's code is stored on-premise. Nothing touches github and no, ms does not already have my code.
What’s remarkable to me is that when copilot launched there were a lot of people on HN saying it was not good. Now it seems like everybody has adopted it and can’t live without it.
I would gladly pay 5 USD a month for Copilot (it's a specific-use-case AI), but the current price point of 20 a month is incredibly expensive. It only adds cost if the developer you are hiring requires it to function properly... that's the only value it adds.
IMHO, if you are a code monkey, Copilot makes you a 5x developer, 10x if you have 5+ years of experience in the business.
Maybe selling tokens/searches could be beneficial for Copilot; paying only for what I use would be an interesting approach.
$20 a month is incredibly expensive? If it saves 15 minutes over a month, it pays for itself (assuming your work is ~$100/hr). If it can provide any positive value, it is going to be worth more than $5 a month.
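The break-even point is easy to work out. A quick sketch, where `break_even_minutes` is a made-up helper and the $2.50/hour rate is a hypothetical lower figure added only for comparison:

```python
def break_even_minutes(monthly_price: float, hourly_rate: float) -> float:
    """Minutes of saved work per month needed to cover the subscription."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return monthly_price * 60 / hourly_rate


print(break_even_minutes(20, 100))  # 12.0  (so 15 minutes saved more than covers it)
print(break_even_minutes(20, 2.5))  # 480.0 (a full working day at a much lower rate)
```

The same $20 looks very different depending on the hourly rate plugged in.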
Maybe if you live around the Bay Area, where you get paid absurd money by the hour for a job that can be done overseas for 10 bucks. In the rest of the world that 100 USD is per week, tops. That's way more than I would pay for a service that only barely helps me in my day to day...
If you need to constantly use ChatGPT/Copilot to do your job properly, maybe it's time to learn something else, because your job will be automated sooner or later. Which will be cheaper: the 100/hr developer or the 20/month service?
are we talking about copilot or chatgpt? chatgpt4 provides more value than copilot since i can use it for other things, copilot is just for "code" and doesn't provide any value other than autocomplete prediction to me...
Maybe if you are a website designer... ohh, I'm sorry, I should use what website designers call themselves today, "frontend engineer". Maybe if you are a designer it could help you with all the boilerplate your react/astro/svelte requires just to add a <table> with two columns.
Capitalism cracks me up. We live under this fantasy that everything we do can be turned into a profitable business. Even the government should be run like a business, right? Like somehow educating a child for 12+ years should be a profitable venture. Or housing the elderly. Or providing a basic service like water, sanitation or electricity that everyone uses at about the same rate. Surely we can stand on each other to turn a buck.
I think that AI will be the biggest loss leader ever invented. We'll buy a robot one time for $10,000 that will perform $1 million of mental and physical labor over its lifetime. Except nobody will pay for that $990,000. There will just be this expectation of something for nothing, so the anticipated payment never comes. And it will happen so quickly, so completely, that we'll wonder how it was ever possible to pay people to do all this free stuff.
No, the GitHub Copilot model of charging for AI services isn't going to scale. People are going to open source AI and own it themselves and stop paying for goods and services. Why would they when the AI can provide everything they need? AI is the beginning of the end of money. The headline foretells the end of the era of artificial scarcity, because capitalism can't compete with free.
For some developers for sure, but not all. I am a senior developer with 20+ years of experience and I find GitHub Copilot amazing. It significantly boosts my productivity. I do not expect the world from it, and I do run into limitations, but it helps me tremendously to be productive in languages and tools with which I have low/no familiarity.
I think GitHub Copilot paired with ChatGPT fails the most in idiosyncratic code bases where you are doing maintenance or complex refactors. I think GitHub Copilot with ChatGPT shines in writing greenfield code in areas where you are willing to use the best practices and the open source libraries that ChatGPT recommends. Basically, if you align with where ChatGPT recommends to go, then it will help you a lot more than if you try to fight it all the way.
If you are an expert (or very good) with your tools, these AI products can save tons of time.
Due to hallucinations, beginners can get knee deep very quickly, but experts can see through this and use these tools with skill.
I don't write as much code as I used to, but I still write code daily and appreciate how they can turn what was a 10-60 minute task into 1 minute of prompting plus 30 seconds of generation/reading to make sure it is valid.
It even does an ok job with ‘rewrite this callback heavy code as async/await’ if you tell it which functions to use instead of ones accepting callbacks.
To clarify, I have been paying the 10$ since it was released last year. I used it in neovim, vscode and found it helpful.
Cut to a few months later with the OpenAI API release: so many code completers have been released in many useful form factors, willing to go wherever I code, even if it's Android Studio or Xcode. The product has stagnated completely.
I do expect it to change significantly in the future though, might go back then.
I find that ChatGPT/Github Copilot knows certain types of patterns, libraries better than others. If you get it to suggest the path forward, and then do that path, it will be better at maintaining that code. Basically ChatGPT sort of has happy paths which you want to follow.
Or you could say it in another way: AI designed code / architecture is easier for an AI to maintain.
imo it's the form factor that works. ChatGPT is good for doing standalone things like generating a script. Copilot is good for making incremental changes to an existing code base. Honestly, it's REALLY good at it. One of my favorite parts is that it mimics the style of the surrounding code.