GitHub Copilot (copilot.github.com)
2905 points by todsacerdoti on June 29, 2021 | 1272 comments



To see all the 1100+ comments, you'll need to click More at the bottom of the page, or use links like these:

https://news.ycombinator.com/item?id=27676266&p=2

https://news.ycombinator.com/item?id=27676266&p=3

https://news.ycombinator.com/item?id=27676266&p=4

https://news.ycombinator.com/item?id=27676266&p=5

(Comments like this will go away when we turn off pagination. I know it's annoying. Sorry.)


I've been using the alpha for the past 2 weeks, and I'm blown away. Copilot guesses the exact code I want to write about one time in ten, and the rest of the time it suggests something rather good or something completely off. But when it guesses right, it feels like it's reading my mind.

It's really like pair programming, even though I'm coding alone. I have a better understanding of my own code, and I tend to give better names and descriptions to my methods. I write better code, documentation, and tests.

Copilot has made me a better programmer. No kidding. This is a huge achievement. Kudos to the GitHub Copilot team!


I’ve also been using the Alpha for around two weeks. I'm impressed by how GitHub Copilot seems to know exactly what I want to type next. Sometimes it even suggests code I was about to look up, such as a snippet to pick a random hex color or an array of all the common image MIME types.

Copilot is particularly helpful when working on React components, where it makes eerily accurate predictions. I see technology like Copilot becoming an indispensable part of the programmer's toolbelt for many people, much like IDE autocomplete.

I also see it changing the way that programmers document their code. With Copilot, if you write a really good descriptive comment before jumping into the implementation, it does a much better job of suggesting the right code, sometimes even writing the entire function for you.
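
For example (an illustration of the workflow in Python, not Copilot's literal output; the function and list below are my own paraphrase), writing a descriptive comment and a signature is often enough to get a complete body back:

    import random

    # Return a random hex color string like "#a3f2c1".
    def random_hex_color():
        return "#{:06x}".format(random.randint(0, 0xFFFFFF))

    # Common image MIME types.
    IMAGE_MIME_TYPES = [
        "image/jpeg",
        "image/png",
        "image/gif",
        "image/webp",
        "image/svg+xml",
    ]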


They finally did it. They finally found a way to make me write comments.


The real purpose of this tool.


Jesus. From a guy with your track record that means a lot.


Has anyone used Copilot with a more succinct language? It appears to only automate boilerplate and rudimentary patterns, which, while useful in repetitive, low-signal-to-noise codebases like React or Java, sounds less appealing if you're writing Clojure.


I've not used Copilot, but I've experimented with two other AI-driven autocompletion engines in Java and Kotlin. In both cases I uninstalled the plugins due to a combination of two problems:

1. The AI suggestions were often less helpful than the type driven IDE autocompletions (using IntelliJ).

2. The AI plugins were very aggressive in pushing their completions to the top of the suggestions list, even when they were strictly less helpful than the defaults.

The result was it actually slowed me down.

Looking at the marketing materials for these services, they're often focused on dynamic languages like Python or JavaScript where there's far less information available for the IDE to help you with. If you've picked your language partly due to the excellent IDE support, it's probably harder for the AI to compete with hand-written logic and type system information.


I'd recommend TabNine; it is extremely helpful. I tried Kite once, and it is WAY overrated: so slow that by the time it provided suggestions I was only a few characters away from finishing. TabNine has saved me hours.


Good luck using type-based autocomplete to write entire functions for you.


Or the converse?

If Copilot is as good as it gets but only for some languages, won’t it influence what languages will be chosen by devs or companies?


Indeed, I could imagine it becoming more difficult to adopt a language that doesn't already have a large corpus to train on.


Could there be a boom followed by a bust? Sometimes a greedy algorithm looks good until it doesn't. It's at least imaginable that AI coding helps you do stuff locally that eventually ties you in knots globally, because you were able to rush in without thinking things through.

(It's also conceivable I'm put out of a job, but I'm not worried yet. So far I can only imagine AI automates away the boring part, and I super-duper-want that.)


I really wonder sometimes whether Java would ever have made it that far if it weren't for Eclipse and later IntelliJ.


Well, there are few things many programmers enjoy more than automating away repetitive tasks. If not exactly IntelliJ or Eclipse, something that achieved the same end would certainly have arisen.

I'm sure there's at least a few relevant XKCD strips to insert here.


It would be interesting to see companies adopt languages that need significantly more code to reach the same result, just because some AI can automate a lot of it.

Think about generating 10 times the code you need, just because you can generate it instead of writing (more performant?) code by hand.


This is a good point and it will be interesting to see if something like copilot will get developers/companies to adopt a language that is better supported by AI.

Edit: You are honestly downvoting me for saying something that might actually happen. If Copilot lives up to the hype, but only for a limited number of languages, this could have a profound effect on what languages people decide to use in the future.


Boilerplate is the most annoying type of code to write/try to remember, having all of that automated away would be awesome.


This approach is kind of a hack, though. The proper way to automate boilerplate is better PL/library design that makes it unnecessary.


Wait until it's time to maintain all this autogenerated code. It's going to be a nightmare.


Don't worry, the most common bug fixes will become part of the suggested code, so when you start writing patches, Copilot will have great suggestions. And then when you have to fix the new bugs, same deal.


The real nightmares will revolve around changing requirements. That's where a statistical analyzer is not going to be smart enough to know what's going on, and you're going to have to understand all this autogenerated code yourself.


[flagged]


Is this a joke? Top 3 comments follow almost an identical format?


It's a feature not a bug. Copilot also assists with HN comments!


Amazing, this is exactly what I was going to type!


I suppose you were both trained on the same data set.


Right? I'm getting heavy astroturf vibes from these repetitive, nearly perfectly phrased, corporate-sounding paragraphs of pure praise.


This is the HN copilot, which writes comments for you.

Basically, if you look at what’s not yet available to the public, we have engines that can write an entire program that does what you want, a test suite, documentation, and thoughtful comments on all the forums about why the program is good. They could make about 100,000 programs an hour on a cluster, together with effusive praise for each, but that would make everyone suspicious.


Feross is a well-known figure and a monster talent. I seriously doubt he has sold out as a GitHub shill.


> it does a mucho mejor trabajo de sugerir the right code, sometimes it is even writing the entire function para ti.

Whether it's a joke or astroturfing, this alone makes it brilliant.


It's a direct reply to the other comment you mention and it parses like it's autogenerated but then swerves off into another language. I'm pretty sure it's a joke.


Yes it was a joke. I wrote it myself. Seems many people didn't get it. Oh well...


Does it sometimes switch into a different language mid-function?


I do it todo el tiempo


Is there a rule for banning users who sell their account for this type of comment? If not, there should be.


[flagged]


They were replaced by GPT too :)


More like GPT three :)


What is the licensing for code generated in this way? GPT-3 has memorized hundreds of texts verbatim and can be prompted to regurgitate that text. Has this model only been trained on code that doesn't require attribution as part of the license?


The landing page for it states the below, so hopefully not too much of an issue (though I guess some folks may find a 0.1% risk high).

> GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.


If you pulled a marble out of a bag ten times a day, with a 0.1% chance each time that it was red: after a day you'd have a 1% chance of seeing a red marble, the first week you'd have a 6.7% chance, the first month you'd have a 26% chance, and the first working year you'd have a 92.6% chance of having seen at least one red marble.
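
A quick way to check those numbers, assuming ten suggestions a day and roughly 260 working days a year (a back-of-the-envelope sketch, nothing official):

    # P(at least one verbatim snippet) after n suggestions,
    # with a 0.1% chance per suggestion.
    p = 0.001

    def at_least_one(n):
        return 1 - (1 - p) ** n

    for label, n in [("day", 10), ("week", 70), ("month", 300), ("year", 2600)]:
        print(label, round(at_least_one(n) * 100, 1), "%")
    # Prints roughly 1.0, 6.8, 25.9, and 92.6 percent.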

Probabilities are fun!


Well within the margin of fair use.


That's not how fair use works. It doesn't matter how unlikely it is: if Copilot one day decides to "suggest" a significant snippet of code from a GPLed app, you'd better be planning to GPL your project.


No, and it has been outlined in the past why that is not the case.

e.g. https://lwn.net/Articles/61292/ , and that is most likely only one opinion.

On the other hand, it would be interesting to learn what the copyright implications are of

a) creating a utility like Copilot (it is a software program) that contains a corpus based on copyrighted material (the trained database), and

b) using it to create code based on that corpus, resulting in software that is itself a work under copyright.


And you would have a whole lot of blue marbles.


"I only murdered them once" isn't the best of legal defenses.


Automatic refactoring could be useful for avoiding a lot of dumb legal disputes.

I say dumb because I am, perhaps chauvinistically, assuming that no brilliant algorithmic insights will be transferred via the AI copilot, only work that might have been conceptually hard to a beginner but feels routine to a day laborer.

Then again that assumption suggests there'd be nothing to sue over.


True, but that definitely wouldn't stop Oracle from suing over it anyway. (See the rangeCheck chapter of Oracle v Google [0])

Also, Oracle v Google opens the possibility of a fair-use defense in the event that Copilot does regurgitate a small snippet of code.

[0] https://news.ycombinator.com/item?id=11722514


I'd be surprised if a company's legal department would be OK with that 0.1% risk.


Google already learned that one. "There's only a tiny chance we may be copying some public code from Oracle." may not be a good explanation there.


Life wouldn’t be entertaining without the License Nazis. No code for you! (Seinfeld reference)


Did they have a license to use public source code as a data source for data set though?


God I wish contracts were encoded semantically rather than as plain text. I just tried to look through Github's terms of service[1]. I'd search for "Github can <verb> with <adjective> code" if I could. Instead I'm giving up.

[1] https://docs.github.com/en/github/site-policy/github-terms-o...


A world in which all laws and contracts were required to be written in Lojban would be interesting.


That looks hard. More politically feasible might be a language I've unfortunately forgotten the name of, ordinary English but with some extra rules designed to eliminate ambiguity -- every noun has to carry a specifier like "some" or "all" or "the" or "a", etc.


Legalese might be similar to code, and there is lots of interest in making law machine readable. So don't give up; check back later.


Yes, it's public source code.


Public doesn’t mean it’s not encumbered by copyrights


Pretty much everything is trained on copyrighted content: machine translation software, TWDNE, DALL-E, and all the GPTs. Software people are bringing this up now because it's their ox being gored. It's the same as when furries got upset about This Fursona Does Not Exist.[1][2]

1. https://news.ycombinator.com/item?id=23093911

2. https://www.reddit.com/r/HobbyDrama/comments/gfam2y/furries_...


To expand on your argument, pretty much every person is trained on copyrighted content too. Doesn't make their generated content automatically subject to copyright either.


Yeah, except that Oracle and Google have way more lawyer power than furry artists.


You have no idea how much money they make. Some of them have payment plans for commissions.


This is an argument for why this is a bigger problem, not a smaller one.


If it's BSD-licensed, the encumbrance doesn't matter much.


Update: Nat Friedman answered this as part of this thread on twitter:

https://twitter.com/natfriedman/status/1409883713786241032

Basically they are building a system to find explicit copying and warn developers when the output is verbatim.


Not sure how this is handled in the US, but in Germany a few lines of code generally don't have enough uniqueness to be protected by copyright.


So the corpus has been compiled under license and the derivative work is eligible for distribution?


Finally, a faster way to spread bugs than copy/paste.


You're using the word "memorized" in a very loose way.


His point still holds: GPT-3 can output large chunks of licensed code verbatim.


How is it loose? Both in the colloquial sense and in the sense it is used in machine learning it is fitting. https://bair.berkeley.edu/blog/2020/12/20/lmmem/ is a post demonstrating it.


Pack it all up, boys, programming's over. Hello, AI.

Anyone want to hire me to teach your grandma how to use the internet?


A few days back, Sam Altman tweeted this:

"Prediction: AI will cause the price of work that can happen in front of a computer to decrease much faster than the price of work that happens in the physical world. This is the opposite of what most people (including me) expected, and will have strange effects"

And I was like yeah I gotta start preparing for next decade.


>Prediction: AI will cause the price of work that can happen in front of a computer to decrease much faster than the price of work that happens in the physical world.

I'm skeptical.

The envelope of "programming" will continue to shift as things get more and more complex. Your mother-in-law is not going to install Copilot and start knocking out web apps. Tools like this allow programmers to become more productive, which increases demand for the skills.


I strongly agree with you.

Reminds me of something I read that claimed when drum machines came out, the music industry thought it was the end of drummers. Until people realized that drummers tended to be the best people at programming cool beats on the drum machine.

Every single technological advancement meant to make technology more accessible and eliminate expertise has instead only redefined what expertise means. And the overall trend has been a lot more work opportunities created, not less.


I had a front row seat to the technological changes in the music industry. I opened a recording studio when you had to "edit" using a razor blade and tape to cut tape together. I remember my first time doing digital editing on a Macintosh IIfx. What happened to drummers is that technology advanced to the point where studio magic could make decent drummers sound great. But it's still cheaper and faster (and arguably better) to get a great session drummer who doesn't need it. Those pros are still in high demand.


Yeah, but fewer drummers are being hired than before drum machines came out. What you describe sounds like work has become more concentrated into fewer hands. Perhaps this will happen with software as well.


What happened is what typically happens: Concentration of expertise. The lower expertise jobs (just mechanically play what someone else wrote/arranged) went away and there was increased demand for higher expertise (be an actual expert in beats _and_ drum machines).

So the winners were those that adapted earlier and the losers were those that didn't/couldn't adapt.

This translates to: If you're mindlessly doing the same thing over and over again, then it's a low value prop and is at risk. But if you're solving actual problems that require thought/expertise then the value prop is high and probably going to get higher.


But there's also the subtext that if you find yourself at the lower-skill portion of your particular industry, then you should probably have a contingency plan to avoid being automated out of a job, such as retiring, learning more, or switching to an adjacent field.


Exactly, and AI only means that this adage now applies to programming as well.


But this was true anyway -- the lower your skill, the more competition you have. At the lowest skill levels, you'd better damn well have a contingency plan, because any slight downward shift in market demand is a sword coming straight for your neck.


I think you have another thing coming. Think about what really got abstracted away: the super hard parts like scaling and infrastructure (AWS), the rendering engines behind React, all the networking stuff that's hidden in your server (I dare you to deal with TCP packets). That's the stuff that goes away.

We can automate the mundane, but what remains is usually the stuff that requires creativity, so the automated parts become uninteresting in that realm. People will seek crafted experiences.


It would be funny if after the AI automates away "all the boring stuff" we're left with the godawful job of writing tests to make sure the AI got it right.


I think it'll be much more likely that the AI writes the tests (the boring stuff) for the buggy code I write.


I can see the AI suggesting and fixing syntax in the tests. Determining their semantics, not without true AGI.


I'm not sure that all of that has really gone away.

It's just concentrated into the hands of a very few super-specialists; it's much harder to get to their level, but their work is much, much more important.


True, and if the specialists retire, there may be some parts that no one understands properly anymore.

See: https://www.youtube.com/watch?v=ZSRHeXYDLko / Preventing the Collapse of Civilization / Jonathan Blow


Better yet -- the jobs of those specialists got better, and the people who would have done similar work did not end up unemployed, they just do some other kind of programming.


Do you have any actual data for that? Last I saw, most bands are still using live drummers, and studio recordings for small to mid-sized bands still use actual drummers as well - unless it's a mostly studio band trying to save cost.

I think the analog to programming is a bit more direct in this sense; most companies aren't going to go with something like Copilot unless it's supplemental or they're on a shoestring budget. It'll be the bigger companies wanting to squeeze out that extra 10% productivity that bet hard on this - the same way larger bands would do it to get an extremely clean studio track for an album.


Source? I would actually expect there to be around the same amount of drummers, but more people making music.


Based on these very unreliable sources, the number of drummers in the US may have increased from ~1 million in 2006 to ~2.5 million in 2018. That's during a time when the population increased from 298 million to 327 million.

So, during this period, a ~10% increase in population saw a ~150% increase in drummers (2.5x as many).

It does not appear that the drum machine killed the drummer.

Big caveats about what these surveys defined as "drummer" and that this doesn't reflect professional drummer gigs, just the number of drummers.

[1] https://m.facebook.com/Bumwrapdrums/posts/how-many-drummers-...

[2] https://www.quora.com/How-many-people-play-drums-in-the-US


Are we in a drummer bubble?


If you could get by with a drum machine, did you really need a real drummer in the first place? Maybe a lot of drummers were used for lack of any automated alternative in the early days?

By the same line of thinking, If you can get by with AI generated code did you really require a seasoned, experienced developer in the first place? If your product/company/service can get by with copy pasta to run your CRUD app (which has been happening for some time now sans the AI aspect) did you ever really need a high end dev?

I think its like anything else, 80% is easy and 20% is not easy. AI will handle the 80% with increasing effectiveness but the 20% will remain the domain of humans for the foreseeable future.

Worth considering maybe.


The counter argument comes from photography. 20 years ago digital photography was in its infancy, and if you were a photographer it was a lot easier to make a living.

Nowadays everyone can make professional-looking photos, so the demand for photographers has shrunk as the supply has increased.


Drummers do tend to be good drum programmers, but I believe they're a small fraction of the pool of pros. The drum machine made percussion feasible for lots of people living in dense housing for whom real drums were out of the question. (Also drummers tend to dislike using drum machines, because the real thing is more expressive.)

AI will be similar -- it will not just give more tools to people already in a given field (programming, writing, whatever), but also bring new people in, and also create new fields. (I personally can't wait for the gardening of AI art to catch on. It'll be so weird[1].)

[1] https://www.youtube.com/watch?v=MwtVkPKx3RA


Those drum machines didn't use machine learning though.


Exactly. I'm currently reading The Mythical Man-Month. 90% of what the book discusses in terms of programming work that actually has to be done is completely irrelevant today. Still, the software industry is bigger than ever. The book also mentions that programmers spend about 50% of their time on non-programming tasks. In my experience this is also true today. So no matter the tools we've got, the profession has stayed the same since the early '70s.


What are the notable books nowadays? It seems all the books I can cite are from 2005-2010 (Clean Code, JCIP, even The Lean Startup or Tribal Leadership…), but did the market for legendary books vanish in favor of YouTube tutorials? I’m running out of materials I can give to my interns so they can gobble up knowledge in bulk.


[prioritization] The Effective Engineer - Lau

[systems] Designing Data-Intensive Applications - Kleppmann

[programming] SICP - Sussman & Abelson

The last one is an old Scheme book. No other book (that I read) can even hold a candle to this one in terms of actually developing my thought process around abstraction and composition of ideas in code. Things that library authors often need to deal with.

For example in React: what are the right concepts that are powerful enough to represent a dynamic website, and how should they compose together?


I have the same question when it comes to a modern version of the Mythical Man Month. I know some computer history, so I can understand most examples. But still it would be great to have a comparable modern book.


The amount of "programming time" I spend actually writing code is also quite low compared to the amount of time I spend figuring out what needs to be done, the best way for it to be done, how it fits together with other things, the best way to present or make it available to the user, etc. Ie most of my time is still spent on figuring out and refining requirements, architecture and interfaces.


Same, the actual solution is pretty darn easy once I know what needs to be done.


Tools like email, instant messenger, and online calendars made secretaries much more productive which increased demand for the skills. Wait...

Replacement of programmers will follow these lines. New tools, like Copilot (haven't tried, but will soon), new languages, libraries, better IDEs, Stack Overflow, Google, etc. will make programming easier and more productive. One programmer will do the work that ten did. That a hundred did. You'll learn to become an effective programmer from a bootcamp (already possible - I know someone who went from bootcamp to Google), then, eventually, a few tutorials will be enough.

Just like the secretary's role in the office was replaced by everyone managing their own calendars and communications the programmer will be replaced by one or two tremendously productive folks and your average business person being able to generate enough code to get the job done.


Secretaries became admin assistants who are much more productive and valuable since they switched their job to things like helping with the content of written communications, preparing presentations, and managing relationships. I saw my mother go through this transition and it wasn't really rough (though she's a smart and hard-working person).


> Secretaries became admin assistants

That doesn't mean anything. The last 20 years have seen an absurd escalation of job-title inflation to make people feel they are "executive assistants" instead of secretaries, "vice presidents" instead of whatever managerial role, etc, etc.


But secretaries are still a thing; they are just usually shared by a whole team/department these days.


I had a corporate gig as a coder reporting to at most three economists at any one time. I spent at least two hours of every day getting them to explain what they wanted, explaining the implications of what they were asking for, explaining the results of the code to them, etc. So even if I didn't need to code at all my efficiency would have expanded by at most a factor of 4.


The future as I see it is that coding will become a relatively trivial skill. The economists would each know how to code and that would remove you from the equation. They would implement whatever thing they were trying to do themselves.

This would scale to support any number of economists. This would also be a simpler model and that simplicity might lead to a better product. In your model, the economists must explain to you, then you must write the code. That adds a layer where errors could happen - you misunderstand the economists or they explain poorly or you forget or whatever. If the economists could implement things themselves - less room for "telephone" type errors. This would also allow the economists to prototype, experiment, and iterate faster.


That game of telephone is certainly an enormous pain point, and I can imagine a future where I'm out of a job -- but it's extremely hard for me to see them learning to code.


And if what the economists needed to do could be programmed trivially with the help of AI then their job is probably also replaceable by AI.


That would be harder. I shuffled data into a new form; they wrote papers. All I had to understand was what they wanted, and how to get there; they had to understand enough of the world to argue that a given natural experiment showed drug X was an effective treatment for condition Y.


That was the point, I think.


>Tools like email, instant messenger, and online calendars made secretaries much more productive which increased demand for the skills. Wait...

There are more "secretaries" than ever, and they get to do far more productive things than delivering phone messages.


It may reduce the demand for the rank-and-file grunts, though.

Why would an architect bother with sending some work overseas if tools like this would enable them to crank out the code faster than it would take to do a code review?


I think the thought process is from the perspective of the employer, if you assume these two statements are true:

1) AI tools increase developer productivity, allowing projects to get completed faster; and

2) AI tools offset a nonzero amount of skill prerequisites, allowing developers to write "better" code, regardless of their skill level

With those in mind, it seems reasonable to conclude that the price to e.g. build an app or website will decrease, because it'll require fewer man-hours until completion and/or less skill from the hired developers doing said work.

You do make a good point that "building an app" or "building a website" will likely shift in meaning to something more complex, wherein we get "better" outputs for the same amount of work/price though.


Now replace “AI” in your points 1 & 2 with “GitHub” (and the trend of open-sourcing libraries, making them available to all). Everything you said still works, and it did not harm programmer jobs in any way (quite the opposite).

And actually, I really don't see AI in the next decade making more of a difference than GitHub did (making thousands of man-hours of work available for free). Around 2040 or 2050, maybe. But not soon; AI is still really far off.


>that the price to e.g. build an app or website will decrease

Yes, and this in turn increases demand as more people/companies/etc.. can afford it.


If there are diminishing returns to expanding a given piece of software, an employer could end up making a better product, but not that much better, and employing fewer resources (esp. people) to do so.

And even that could still be fine for programmers, as other firms will be enticed into buying the creation of software -- firms that didn't want to build software when programming was less efficient/more expensive.


Decreasing the price of programming work doesn't necessarily mean decreasing the wages of programmers, any more than decreasing the price of food implies decreasing the wages of farmers.

But on the other hand, it also can mean that.


Here's the difference: the farmers are large industrial companies that lobby the government for subsidies, such that they can continue to produce food that can, by law, never be eaten.

Programmers, on the other hand, are wage laborers, individually selling their labor to employers who profit by paying them less.

Industry is sitting on the opposite side of the equation here. I wonder what will replace "learn to code". Whatever it is, the irony will be almost as rich as the businesses that profit from all this.


It's hard to use history to predict the implications of weird new technologies. But history is clear on at least two happy facts. Technology generates new jobs -- no technology has led to widespread unemployment. (True general AI could be different; niche "AI" like this probably won't be.) Technology raises standards of living generally, and making a specific sector of the economy more productive increases the wealth accrued to that sector (although it can change who is in it too).

There are exceptions -- fancy weapons don't widely raise standards of living -- but the trends are strong.


What does unemployment have to do with it? If you're 22 and paying off loans and suddenly find yourself unable to work as anything but a barista, continued employment isn't exactly a perk. Meanwhile, the benefits of the increased productivity brought by new technology do accrue somewhere - it's just not with workers. The trends on that are also quite clear: productivity growth has been high over the past 40 years, but real wages have at best been stagnant. And on the wheel turns, finding souls to grind beneath its weight, anew.


On your second point I agree -- the distribution within a sector matters, and most of them at the moment are disturbingly top-heavy.

On the first though, we have little reason to think tech will systematically diminish the roles people can fill. In the broad, the opposite has tended to happen throughout history -- although the narrow exceptions, like factory workers losing their jobs to robots, are real, and deserve more of a response than almost every government in the world has provided. For political stability, let alone justice.


For every programmer who can think and reason about what they are doing, there are at least 10 who just went to a bootcamp and are not afraid to copy and paste random stuff they do not understand and cannot explain.

Initially, they will appear more productive with Copilot. Businesses will decide they do not need anybody other than those who want to work with Copilot. This will lead to adverse selection on the quality of programmers who interact with Copilot ... especially those who cannot judge the quality of the suggested code.

That can lead to various outcomes, but it is difficult to envision them being uniformly good.


>Tools like this allow programmers to become more productive, which increases demand for the skills.

Well like everything in life I guess it depends? The only iron rule I can always safely assume is supply and demand.

But for programming, especially on the web, it seems everyone has a tendency to make things more difficult than they should be, and that inherent system complexity isn't going to be solved by ML.

So in terms of squeezing out absolute efficiency from system cost, I think we have a very very long way to go.


This is just shortening the time it takes developers to Google it, search through StackOverflow and then cut and paste and modify the code they are looking for. It will definitely speed up development time. Sounds like a nice quality of life improvement for developers. I don't think it will cause the price of work to decrease, if anything a developer utilizing copilot efficiently should be paid more.


You don't know my mother-in-law.


I think this will result in classic Jevons paradox: https://en.wikipedia.org/wiki/Jevons_paradox . As the price of writing any individual function/feature goes down, the demand for software will go up exponentially. Think of how many smallish projects are just never started these days because "software engineers are too expensive".

I don't think software engineers will get much cheaper, they'll just do a lot more.


I'm guessing low expertise programmers whose main contribution was googling stackoverflow will get less valuable, while high expertise programmers with real design skill will become even more valuable.


I'm both of those things, what happens to my value?


Your legs will have to move faster than your arms.


Sonic the Hedgehog's employment prospects are looking up.


It goes up/down


Googling Stack Overflow itself can sometimes be a high-expertise skill, simply because sometimes you need a fairly good understanding of your issue to figure out what to search for. A recent example: we had an nginx proxy set up to cache API POST requests (don't worry - they were idempotent, but too big for a query string), and nginx sometimes returned the wrong response. I'm pretty sure I found most of the explanation on Stack Overflow, but I didn't find a question that directly addressed the issue, so Googling was a challenge. You can keep your job finding answers on Stack Overflow if you are good at it.


Unfortunately companies don't make interviewing for real design skills a priority. You'll get weeded out because you forgot how to do topological sort.


Hopefully tools like this will finally persuade companies that being able to do LeetCode from memory is not a skill they need.


Certainly, but I would argue that higher expertise isn't a requirement for most dev jobs; if you are developing custom algorithms and advanced data structures, you are probably on the fringe of what the dev world does.

Otherwise I struggle to explain why there is such great demand for devs that short courses (3-6 months) are successful, the same courses that fail at teaching the fundamentals of computing.


Guessing that if all you have to do is keep your metrics green, they are not selecting for the skills they are educating for.


With AI now they are on your level. It equalizes.


> Think of how many smallish projects are just never started these days because "software engineers are too expensive".

Maybe many. If the cost/benefit equation doesn't work, it makes no sense to do the project.

> I don't think software engineers will get much cheaper, they'll just do a lot more.

If they do more for the same cost, they are cheaper. You as a developer will be earning less in relation to the value you create.


> If they do more for the same cost, they are cheaper. You as a developer will be earning less in relation to the value you create.

Welcome to the definition of productivity increases, which is the only way an economy can increase standard of living without inflation.


Inflation and productivity might be correlated but neither is a function of the other. Given any hypothetical world where increased productivity leads to inflation, there's a corresponding world equal in all respects except that the money supply shrinks enough to offset that inflation.


> You as a developer will be earning less in relation to the value you create.

Doesn't matter as long as I create 5x value and earn 2x for it. I still am earning double within the same time and effort.


Oh, now I see! This is how we will enter a new era of 'code bloat' - Moore's law applied to software - where lines of code double every 18 months.


We went through the same hype cycle with self driving cars. We are now ~15 years out from the DARPA challenges and to date exactly 0 drivers have been replaced by AI.

It is certainly impressive to see how much the GPT models have improved. But the devil is in the last 10%. If you can create an AI that writes perfectly functional python code, but that same AI does not know how to upgrade an EC2 instance when the application starts hitting memory limits, then you haven't really replaced engineers, you have just given them more time to browse hacker news.


Driving is qualitatively different from coding: an AI that's pretty good but messes up sometimes is vastly more useful for coding than for driving. In neither case can you let the AI "drive", but that's ok in coding as software engineering is already set up for that. Testing, pair programming and code reviews are popular ways to productively collaborate with junior developers.

You're not replacing the engineer, but you're giving every engineer a tireless companion typing suggestions faster than you ever could, to be filled in when you feel it's going to add value. My experience with the alpha was eye-opening: this was the first time I've interacted with an AI and felt like it's not just a toy, but actually contributing.


Writing code is by far the easiest part of my job. I certainly welcome any tools that will increase my productivity in that domain, but until an AI can figure out how to fix obscure, intermittent, and/or silent bugs that occur somewhere in a series of daisy-chained pipelines running on a stack of a half-dozen services/applications, I am not going to get too worked up about it.


I agree. It kind of amazes me, though, that there is so much room for obscurity. I would expect standardisation to have dealt with this a long time ago. Why are problems not more isolated and manageable in general?


It's much harder to reason about the global emergent behavior of a complex system than about the isolated behavior of a small component.


I don't think it's a function of complexity per se, but determinism. This is why Haskellers love the IO monad. Used well, it lets you quarantine IO to a thin top-level layer, below which all functions are pure and easy to unit test.
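
The same shape works in any language; here's a minimal Python sketch of the idea (function and field names are made up for illustration):

    import json

    # Pure core: no IO, deterministic, trivial to unit test.
    def summarize(orders):
        total = sum(o["amount"] for o in orders)
        return {"count": len(orders), "total": total}

    # Thin IO shell: all reading and printing is quarantined here.
    def main(path):
        with open(path) as f:
            orders = json.load(f)
        print(summarize(orders))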


Distributed systems are anything but deterministic.


What is your definition of "replace"? Waymo operates a driverless taxi service in Phoenix. Sign ups are open to the general public. IMO this counts as replacing some drivers as there is less demand for taxi service in the operating area.

https://blog.waymo.com/2020/10/waymo-is-opening-its-fully-dr...


According to this article[1], the number of ride hailing drivers has tripled in the last decade.

I think full self driving is possible in the future, but it will likely require investments in infrastructure (smarter and safer roads), regulatory changes, and more technological progress. But for the last decade or so, we had "thought leaders" and VCs going on and on about how AI was going to put millions of drivers out of work in the next decade. I think it is safe to say that we are at least another decade away from that outcome, probably longer.

[1] https://finance.yahoo.com/news/number-american-taxi-drivers-...


Self driving is used in the mining industry, and lots of high paid drivers have been replaced.

But you are clearly more knowledgeable with your 0 drivers replaced comment.


Mining as in those big trucks or mining as in trains on tracks?



Excellent if dystopian article. Thank you for sharing!


> AI does not know how to upgrade an EC2 instance when the application starts hitting memory limits

That's exactly the kind of thing "serverless" hosting has done for a while now.


Yeah really bad example there.


Ahh yes, the serverless revolution! I was told that serverless was going to make my job obsolete as well. Still waiting for that one to pan out. Not going to hold my breath.


This isn't self-driving for programming; it's more like GPS and lane assist.


15 years is no time at all.


I am blown away but not scared for my job... yet. I suspect the AI is only as good as the training examples from Github. If so, then this AI will never generate novel algorithms. The AI is simply performing some really amazing pattern matching to suggest code based on other pieces of code.

But over the coming decades AI could dominate coding. I now believe in my lifetime it will be possible for an AI to win almost all coding competitions!


I guess it's worth pointing out that the human brain is just an amazing pattern matcher.

They feed you all these algorithms in college and your brain suggests new algorithms based on those patterns.


Humans are more than pattern matchers because we do not passively receive and imitate information. We learn cause and effect by perturbing our environment, which is not possible by passively examining data.

An AI agent can interact with an environment and learn from its environment by reinforcement learning. It is important to remember that pattern matching is different from higher forms of learning, like reinforcement learning.

To summarize, I think there are real limitations with this AI, but these limitations are solvable problems, and I anticipate significant future progress


Fortunately the environment for coding AI is a compiler and a CPU which is much faster and cheaper than physical robots, and doesn't require humans for evaluation like dialogue agents and GANs.


Well, you still have to assess validity and code quality, which is a difficult task, but not an unsolvable one.

Also, the original Generative Adversarial Networks implementation pits neural networks against each other to train them; they don't need human intervention.


> They feed you all these algorithms in college and your brain suggests new algorithms based on those patterns.

Some come from the other end of the process.

I want to solve that problem -> Functionally, it’d mean this and that -> How would it work? -> What algorithms / patterns are out there that could help?

Usually people with less formal education and more hands on experience, I’d wager.

More prone to end up reinventing the wheel and spend more time searching for solutions too.


What?


Most people I know who’ve been to college, or otherwise educated in the area they work in, tend to solve problems using what they know (not implying it’s a hard limit. Just an apparently well spread first instinct).

Which fits the pattern matching described by the grandparent.

A few people I know, most of whom haven’t been to college, or done much learning at all, but are used to working outside of what they know (that’s an important part), tend to solve problems with things they didn’t know at the time they set out to solve said problems.

Which doesn’t really fit the pattern matching mentioned by the grandparent. At least not in the way it was meant.


To reduce intelligence to pattern matching begs the question: How do you know which patterns to match against which? By some magic we can answer questions of why, of what something means. Purpose and meaning might be slippery things to pin down, but they are real, we navigate them (usually) effortlessly, and we still have no idea how to even begin to get AI to do those things.


I think those are the distance metrics, which is what produces inductive bias, which is the core essence of what we consider 'intelligence'. - Consider a more complicated metric like a graph distance with a bit of interesting topology. That metric is the unit by which the feature space is uniformly reduced. Things which are not linearized by the metric are considered noise, so this forms a heuristic which overlooks features which may have been in reality salient. - This makes it an inductive bias.

(Some call me heterodox, I prefer 'original thinker'.)


To generate new, generally useful algorithms, we need a different type of "AI", i.e. one that combines learning and formal verification. Because algorithm design is a cycle: come up with an algorithm, prove what it can or can't do, and repeat until you are happy with the formal properties. Software can help, but we can't automate the math, yet.


I see a different path forward based on the success of AlphaGo.

This looks like a clever example of supervised learning. But supervised learning doesn't get you cause and effect, it is just pattern matching.

To get at cause and effect, you need reinforcement learning, like AlphaGo. You can imagine an AI writing code that is then scored for performing correctly. Over time the AI will learn to write code that performs as intended. I think coding can be used as a "playground" for AI to rapidly improve itself, like how AlphaGo could play Go over and over again.
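
A toy sketch of what that scoring could look like, with the reward being the fraction of test cases the generated code passes (the function name "solution" and the tests below are just stand-ins, not any real system's API):

    def score(candidate_source, tests):
        # Run the generated code and reward it by how many tests it passes.
        namespace = {}
        try:
            exec(candidate_source, namespace)
            f = namespace["solution"]
            passed = sum(1 for args, expected in tests if f(*args) == expected)
            return passed / len(tests)
        except Exception:
            return 0.0

    tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    print(score("def solution(a, b):\n    return a + b", tests))  # 1.0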


Imparting a sense of objective to the AI is surely important, and an architecture like AlphaGo might be useful for the general problem of helping a coder. I'm not seeing it, however, for this particular autocomplete-flavored idiom.

AlphaGo learns a game with fixed, well-defined, measurable objectives, by trying it a bazillion times. In this autocomplete idiom the AI's objective is constantly shifting, and conveyed by extremely partial information.

But you could imagine a different arrangement, where the coder expresses the problem in a more structured way -- hopefully involving dependent types, probably involving tests. That deeper encoding would enable a deeper AI understanding (if I can responsibly use that word). The human-provided spec would have to be extremely good, because AlphaGo needs to run a bazillion times, so you can't go the autocomplete route of expecting the human to actually read the code and determine what works.


> we can't automate the math, yet

This exists: https://en.wikipedia.org/wiki/Automated_theorem_proving


This is more automation-assisted theorem proving. It takes a lot of human work to get a problem to the point where automation can be useful.

It's like saying that calculators can solve complex math problems; it's true in a sense, but not strictly true. We solve the complex math problems using calculators.


and there's already GPT-f [0], which is a GPT-based automated theorem prover for the Metamath language, which apparently submitted novel short proofs which were accepted into Metamath's archive.

I would very much like GPT-f for something like SMT, then it could actually make Dafny efficient to check (and probably avoid needing to help it out when it gets stuck!)

0. https://analyticsindiamag.com/what-is-gpt-f/


Someone tell Gödel


You mean like AlphaGo where the neural net is combined with MCTS?


> If so, then this AI will never generate novel algorithms.

This is true, but most programmers don't need to generate novel algorithms themselves anyway.


> I now believe in my lifetime it will be possible for an AI to win almost all coding competitions!

Then we shall be reaching singularity.


We will only reach a singularity with respect to coding. There are many important problems beyond computer coding like engineering and biology and so on


Coding isn't chess playing; it's likely about as general as math or thinking. If you can write novel code you can ultimately do biology or engineering or anything else.


Reading this thread it seems to me that AI is a threat for "boilerplate-heavy" programming like website frontends, I can't really imagine pre-singularity AI being able to replace a programmer in the general case.

Helping devs go through "boring", repetitive code faster seems like a good way to increase our productivity and make us more valuable, not less.

Sure, if AI evolves to the point where it reaches human-level coding abilities we're in trouble, but in that case it's going to revolutionize humanity as a whole (for better or worse), not merely our little niche.


C’mon guys, your standard backend schema with endpoints is like way easier to automate away.


I mean, we already have? Use Django Rest Framework and simple business models and you're pretty much declaratively writing API endpoints by composing behavior. Almost nothing in a DRF endpoint definition is boilerplate.
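
E.g. a typical DRF endpoint is little more than composition like this (a sketch; the Order model and its fields are invented for illustration):

    from rest_framework import routers, serializers, viewsets
    from myapp.models import Order  # hypothetical Django model

    class OrderSerializer(serializers.ModelSerializer):
        class Meta:
            model = Order
            fields = ["id", "customer", "total", "created_at"]

    class OrderViewSet(viewsets.ModelViewSet):
        queryset = Order.objects.all()
        serializer_class = OrderSerializer

    router = routers.DefaultRouter()
    router.register("orders", OrderViewSet)  # full CRUD, no hand-written views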

The hard part has always been writing an API that models external behavior correctly.


Generating the code with help from AI: 25 cents

Knowing what code to generate with the AI: 200k/yr


Will this tend to blur the distinction between coder and manager? In the end a manager is just a coder who commands more resources, and relies more on natural language to do it.

Or maybe I'm thinking of tech leads. I don't know, my org is flat.


The issue is not writing code. It's changing, evolving, or maintaining it.

This is the problem with things like spreadsheets, drag-and-drop programming, and code generators.

It's not easy to tell a program what to change and where to change it.


I feel like you might be moving the goalposts. Maybe they're different problems, but it's not at all clear to me that mutation is harder than creation.


"Prediction: AI will cause the price of work that can happen in front of a computer to decrease much faster than the price of work that happens in the physical world. This is the opposite of what most people (including me) expected, and will have strange effects"

I've been saying something like that for a while, but my form was "If everything you do goes in and out over a wire, you can be replaced." By a computer, a computer with AI, or some kind of outsourcing.

A question I've been asking for a few years, pre-pandemic, is, when do we reach "peak office"? Post-pandemic, we probably already have. This has huge implications for commercial real estate, and, indeed, cities.


I just don't believe it. Having experienced terrible cheap outsourced support and things like Microsoft's troubleshooting assistant (also terrible), I'm willing to pay for quality human professionals. They have a long way to go before I change my mind.


huh... I've found that people who tend to describe their occupation as "knowledge work" are the most blind to the fact that white collar jobs are the first to get optimized away. Lawyers are going to have a really bad time after somebody manages to bridge NLP and formal logic. No, it won't result in a Lawyerbot-2000 - it will result in software that enables lawyers to do orders of magnitude more work of a higher quality. What do you think that does to a labor market? It shrinks it. That or people fill the labor glut with new, cheaper, lawsuits...


I don't think that people will ever fully trust an AI lawyer, given all the possible legal consequences of a misunderstanding between the AI and the client. You could literally go to jail because of a bug/misunderstanding due to an ambiguous term (this might make a good sci-fi story ...)

But yes, getting some kind of legal opinion will probably be cheaper with an AI.


Nor I, which is why I said so. A SaaS will pop up called "Co-chair" and that'll be that. It would definitely be a lot easier to trust than any of the black box neural networks we are all familiar with - as the field of formal logic is thousands of years old and pretty thoroughly explored. I used a SAT solver just last night to generate an exhaustive list of input values that result in a specific failure mode for some code I'm reverse engineering - I have no doubts about the answer the SAT solver provided. That definitely isn't the case with NN based solutions - which I trust to classify cat photos, but not much else.


Legal discovery and other "menial" law tasks are already quite automated.


I wouldn't describe keyword search engines or cross reference managers as "quite automated" - so I would expect little market change from whatever LexisNexis is currently selling.


I would -- I remember my mom as a lawyer having to schlep to the UCLA law library to photocopy stuff -- but current legal automation includes NLP at the level of individual clauses.

https://www.suls.org.au/citations-blog/2020/9/25/natural-lan...


Oof, as somebody who has studied the AI winter - that article hurt, suggesting that an unsupervised NN-centric approach is going to lead somewhere other than tool-assist... it's the 1970s all over again.

> I would

Well you're going to have a problem describing actual automation when you encounter it. What would you call it when NLP results are fed into an inference engine that then actually executes actions - instead of just providing summarized search results? Super-duper automation?



I kinda believe this but I still think it hugely depends on what you're doing in front of a computer. If you're just a generic developer that gets a task and codes it by the spec, then you can probably be replaced by AI in a few years.

But I don't think AI will become capable of complex thought in the next one/two decades, so if you're training to be a software architect, project manager, data analyst I think you should be safe for some time.


People have been saying AI would end everything and anything since I was a wee baby. It still hasn't happened. How about instead of making the same old tired, boring predictions about the impending apocalypse of things we love, we start making and listening to predictions that actually talk about how life has been observed to progress? It's not impossible; science-fiction authors get it right occasionally.


As it stands, this looks like it will actually increase the productivity of existing programmers more than it will result in "now everyone is a programmer".

Over time it will certainly do more, but it's probably quite a long time before it can be completely unsupervised, and in the meantime it's increasing the output of programmers.


Honestly, the only reason I'm still doing work in front of a computer is that it pays well. I'm really starting to think I should have followed my gut instincts when I was 17 and go to trade school to become an electrician or a carpenter...


True self-improving general AI would put all labor out of business. Incomes would then be determined by only the ownership of capital and government redistribution. Could be heaven, could be hell.


I’m not sure why it’s unexpected when it’s essentially a reframing of Baumol's cost disease. Any work that does not see a productivity increase becomes comparatively more expensive over time.


So either his prediction or expectation will be correct.

I think I’ll side with his expectation, but then again, my salary depends on it.


I wonder whether Sam Altman also believes that you can measure programmer productivity by lines of code?


Don't hold your breath. (Coming soon right after the flying self-driving car.)


Automation has always produced an increase in jobs so far, although sometimes in a disruptive way. I consider this like the switch from instruction-level programming to compiled languages, a level of abstraction added that buys a large increase in productivity and makes projects affordable that weren’t affordable before. If anything this will probably lead to a boom in development work. But there’s a bunch of low skill programmers who can’t do much more than follow project templates and copy paste things. Those people will have to level up or get out.


I feel like the inevitable path will be:

1) AI makes really good code completion to make juniors way more productive. Senior devs benefit as well.

2) AI gets so good that it becomes increasingly hard to get a job as a junior--you just need senior devs to supervise the AI. This creates a talent pipeline shortage and screws over generations that want to become devs, but we find ways to deal with it.

3) Another major advance hits and AI becomes so good that the long promised "no code" future comes within reach. The line between BA and programmer blurs until everyone's basically a BA, telling the computer what kind of code it wants.

The thing though that many fail to recognize about technology is that while advances like this happen, sometimes technology seems to stall for DECADES. (E.g. the AI winter happened, but we're finally out of it.)


I could also see an alternative to #2 where it becomes increasingly hard to get a job as a senior dev when companies can just hire juniors to produce probably-good code and slightly more QA to ensure correctness.

You'd definitely still need some seniors in this scenario, but it feels possible that tooling like this might reduce their value-per-cost (and have the opposite effect on a larger pool of juniors).

As another comment said here, "if you can generate great python code but can't upgrade the EC2 instance when it runs out of memory, you haven't replaced developers; you've just freed up more of their time" (paraphrased).


No, programmers won't be replaced, we'll just add this to our toolbox. Every time our productivity increased we found new ways to spend it. There's no limit to our wants.


The famous 10-hour work week, right? I am orders of magnitude more productive than my peers of 50 years ago with respect to programming scope and complexity, yet we work the same 40-hour week. I just produce more (sometimes buggy) code/products.


I live in a foreign country and study the language here. I frequently use machine translation to validate my own translations, read menus with the Google Translate augmented-reality camera, and chat with friends when I'm too busy to manually look up the words I don't understand in a dictionary. What I have learned is that machine translations are extremely helpful in a pinch, but often a tiny adjustment in syntax, an added adjective, or some other minor edit will produce a sentence in English with an entirely different meaning.

For context-specific questions it's even worse. The other day a shop owner who sells coffee beans insisted that we try conversing with Google Translate. I was trying to find the specific terms for natural, honey, and washed process. My Chinese is okay, but there's no way to know vocab like that unless you specifically look it up and learn it. Anyway, I felt pressured to go through with the Google Translate charade even though I knew how the conversation would go. I said I wanted to know if this coffee was natural process. His reply was 'of course all of our coffees are natural with no added chemicals!' Turns out the word is 日曬, sun-exposed. AI is no replacement for learning the language.

State of the art image classification still classifies black people as gorillas [1].

I rue the day we end up with AI-generated operating systems that no one really understands how or why they do what they do, but when it gives you a weird result, you just jiggle a few things and let it try again. To me, that sounds like stage 4) in your list. We have black box devices that usually do what we want, but are completely opaque, may replicate glitchy or biased behaviors that it was trained on, and when it goes wrong it will be infuriating. But the 90% of the time that it works will be enough cost savings that it will become ubiquitous.

[1]: https://www.theverge.com/2018/1/12/16882408/google-racist-go...


> For context-specific questions it's even worse. The other day a shop owner who sells coffee beans insisted that we try conversing with Google Translate. I was trying to find the specific terms for natural, honey, and washed process. My Chinese is okay, but there's no way to know vocab like that unless you specifically look it up and learn it. Anyway, I felt pressured to go through with the Google Translate charade even though I knew how the conversation would go. I said I wanted to know if this coffee was natural process. His reply was 'of course all of our coffees are natural with no added chemicals!' Turns out the word is 日曬, sun-exposed. AI is no replacement for learning the language.

Does "natural process" have a Wikipedia page? I've found that for many concepts (especially multi-word ones), where the corresponding name in the other language isn't necessarily a literal translation of the word(s), the best way to find the actual correct term is to look it up on Wikipedia, then see if there is a link under "Other languages".


Looks like in this case it's only part of a Wikipedia page[0], and the Chinese edition is just a stub. But your suggestion is absolutely spot-on. One of the things I love about Wikipedia is that it's human-curated for human evaluation, not a "knowledge engine" that produces wonky results.

[0]https://en.wikipedia.org/wiki/Coffee_production#Dry_process


I feel like you're neglecting to mention all the people who need to build and maintain this AI. Cookie-cutter business logic will no longer need programmers, but there will be more highly skilled jobs to keep building and improving the AI.


AI will keep building and improving the AI, of course!


But you need orders of magnitude fewer people to build and maintain the AIs than you do to manually create all the software running the world. And this is the unique peril of AI. The capabilities of AI promise to grow faster than the creation of new classes of jobs.


Telling the computer what you want IS programming...

When a new language / framework / library comes around, GitHub copilot won't have any suggestions for when you write in it.


> Automation has always produced an increase in jobs so far

Do you have a source for this re the last 20 years? It seems to me automation has been shifting the demand recently towards more skilled cognitive work.


I think you are confusing correlation and causation. It's not automation that produces jobs; more people, and more income for those people, produce jobs, because more people means more demand.


Yes, but AI isn't the same as automation.

Automation is a force multiplier. AI is a cheaper way of doing what humans do.

And the AI doesn't even need to be "true" AI. It simply needs to be able to do stuff better than what humans do.


> AI is a cheaper way of doing what humans do.

Like protein solving? /s


>Pack it all up, boys, programming's over. Hello, AI.

I don't know, cranking out a suggestion for a function is not the same as writing a complete module / application.

Take the job of a translator: you would think the job would go extinct with all the advances in machine translation, yet 'employment of interpreters and translators is projected to grow 20 percent from 2019 to 2029, much faster than the average for all occupations' [1]. You still need a human being to clear up all of the ambiguities of languages.

Maybe the focus of the stuff we do will change, though; on the other hand, we do tend to get a lot of changes in programming; it goes with the job. Maybe we will get to do more code reviews of what was cranked out by some model.

However, within a decade it might be harder to get an entry-level job as a programmer. I am not quite sure if I should suggest my profession to my kids; we might get a more competitive environment in some not-so-distant future.

[1] https://www.bls.gov/ooh/media-and-communication/interpreters...


> Anyone want to hire me to teach your grandma how to use the internet?

Only for the first time to train a model for that.


The next skill will be perfectly casting the right spell to make the AI spit out the product as spec'd.

Repurposed Google-fu. We'll always have jobs :)


Nah, we will just all be low-code function couplers instead of coders..


On the contrary. This tool increases programmer productivity, hence you will get more salary, not less.

Your assumption is that programming demand is finite, AND that all programmers are equal; both of those are false.

I must also say that actual programming is around 10-15% of a programmer's job, so the tool will make you around 10% more productive overall.


With VSCode, GitHub, and perhaps a little bit of help from OpenAI, Microsoft is poised to dominate the developer productivity tools market in the near future.

I wouldn't be surprised to see really good static analysis and automated code review tools coming out of these teams very soon.


And still Windows is a mess.


Windows is a mess and I hope it will stay that way.

The real strength of Windows is backwards compatibility, especially when dealing with proprietary software. Its messiness is a relic of the way things were done in the past; it is also the reason why my 20+ year old binary still runs. And it does so without containers or full VMs.

I much prefer developing on Linux (I still miss the Visual Studio debugger), but different platform, different philosophy.

Note: I don't know much about mainframes. I heard they really are the champions of backwards compatibility. But I don't think the same principles are applicable to consumer machines.


Your 20 year old statically compiled binary still works on Linux, probably.


I've been on both Windows and Ubuntu for a while. I'd say Ubuntu has a ton more issues and requires a ton more initial configuration to behave "normally".

I don't even remember the last time Windows got in my way, in fact.


I guess the difference is that you can put in a weekend of effort on an Arch Linux installation and get a machine tailored to your workflow, few bugs, fast boot times, easy to maintain for the future, etc.

But no matter how much work you put into your Windows install it will be just as slow/fast, uncustomizable, and unoptimizable as it was out of the box.


I'd bet money that the VSCode and Windows teams are basically on different planets at Microsoft.


I bet there are people that use Windows to develop VSCode and use VSCode to develop Windows, so some people probably know each other internally. I think what escapes HN is how massively successful Microsoft is. Sure, the search built into Windows sucks. There are many, many more complicated components of a platform and OS than that, and those seem to work as well as any other platform and OS.


Compared to what other operating system(s)?

wsl on windows 10 has been amazing to develop and work on.


> wsl on windows 10 has been amazing to develop and work on.

Now imagine how amazing it would be just on Ubuntu ;-)


The problem is that not many IT departments support Ubuntu. They are making lots of improvements to the UI and application management, but it can be cumbersome to get some applications working on Linux. Having Windows to install whatever GUI apps you need (or whatever other apps aren't needed on Linux), and then having Linux there to develop on, has been pretty great. It's almost like a hybrid Linux+Windows operating system and not at all like running a VM on Windows.

e.g. this is in my .bashrc in wsl, it writes stdout to my windows clipboard:

function cb () { powershell.exe -command "\$input | set-clipboard" }

Windows gets tons of hate in our community, but I gave it a chance a couple of years ago after being frustrated with macOS, and it has been amazing; I think a lot of people would come around to it if they gave it a chance. I am biased towards Linux though, since I'm an SRE, so maybe that is why I never could quite get comfortable on macOS. I really disliked having to learn how to do something once on a Mac, then do that same thing again on Linux to get something into production.


For enterprise adoption of Ubuntu, Ubuntu 21.04 now starts supporting AD Group Policy thanks to Samba. I wonder how much that helps.


Every other OS. It's full of legacy APIs and scrapped new APIs. Every release is like one step forward, one step back, and one to the side. It only still exists because thousands of companies have written software and drivers for it. If it were released today it wouldn't stand a chance.


Are you talking about developing for Windows or developing on Windows? I'm talking about developing on Windows. I don't really care what the APIs look like underneath it all. WSL on Windows is a lot more intuitive to develop on when your target environment is Linux, compared to something like macOS, which is almost like Linux, but not really at all.


WSL was slower than dialup internet last I used it...


IMO a lot of what Windows does isn't something you can apply Copilot-style tech to. The only thing you could train it on would be Windows, really.


Have you used any intelligent code completion in the past? E.g. I'd really be interested how it compares to TabNine[0], which already gives pretty amazing single line suggestions (haven't tried their experimental multi-line suggestions yet).

[0]: https://www.tabnine.com


Interestingly, the founder of TabNine (which was acquired by Codota[0]) is currently working at OpenAI (edit: comments corrected me; he left in December 2020 according to his blog). I imagine they're livid about OpenAI creating a competing product.

TabNine at times was magical, but I stopped using it after Codota started injecting ads directly into my editor[1]

[0] https://betakit.com/waterloo-startup-tabnine-acquired-by-isr... [1] https://github.com/codota/TabNine/issues/342


Ah, thanks for the insight! It seems though that he is no longer working with OpenAI according to his personal website[0].

[0]: https://jacobjackson.com/about


I'm curious how relevant Copilot would be when autocompleting code that is specific to my codebase in particular; for example, TabNine completes the most-used filters as soon as I type the DB table name for a query. I'm a big TabNine fan because it provides this feature. I'm much more often looking to be suggested a line than an entire function, because I'm mostly writing business logic.

Also, TabNine is useless at multi-line completions, which is where Copilot should be strong.


Yeah, I've been very happy with Tabnine for a while, but the prospect of good multi-line completions is appealing. I might try running both Tabnine and Copilot simultaneously for a bit to A/B test.


I've been using TabNine for a couple years – constantly impresses me, especially how quickly it picks up new patterns. I wouldn't say it's doing my job for me, but definitely saves me a lot of time.


I have used IDEs with good knowledge of the types and libraries I'm using (e.g. VSCode with TypeScript). They offer good suggestions once you start typing a function name.

But nothing gets close to Copilot. It "understands" what you're trying to do, and writes the code for you. It makes type-based autocompletions useless.


TabNine works quite similarly to Copilot. It's not a thing that "knows about types and libraries"; it uses a similar predictive machine learning method to the one Copilot seems to use.


I tried out TabNine. It was a very frustrating experience. In almost all cases it gave completely useless suggestions which overrode the better ones already suggested by IntelliJ. I persevered for a few days and then uninstalled it.


Maybe it's just because humans are not as creative as they think. Whatever you do, thousands of others have done the same already. So there's no need to pay a high-level programmer; a mediocre one plus the right AI assistant gives the same results.


With an AI assistant, in the best scenario, you'll get a "wisdom of crowds" effect on implementation details and program architecture. At worst, you'll get a lot of opinionated code bloat and anti-patterns as suggestions.

For most backend programming jobs, the challenge is not in writing complex code but figuring out what the business wants, needs and should have in the first place and distinguishing between them. Figuring out how to integrate with existing systems, processes, fiefdoms and code. Knowing when to say yes and no, how to make code future proof, etc. This is a task fundamentally unfit for what we currently call "AI", because it's not actually intelligent or creative yet.

On the frontend, it becomes even more nebulous. Maybe Copilot can suggest common themes like Bootstrap classes for a form, CSS properties, the basic file structure and implementations for components in an SPA, etc. As I see it, the main challenge is in UX there, not the boilerplate, which again makes it about understanding the user, figuring out how to make UI feel intuitive, etc. Again: unfit for current AI.

I cannot offer any opinion on the utility for actually complex, FANG-level code, for lack of experience.


Quite the opposite. Menial work will be automated away (e.g. CRUD) and only good programmers will be needed to do the more complicated work.


> So no need to pay a high level programmer, just a mediocre one and the right AI assistant gives the same results.

I think of it as not needing juniors for boring work, all you need as a company is seniors and AI.


So where do these seniors come from?


That’s beyond the planning horizon.


You could have a much smaller pool of juniors/journeymen that are focused on maximum learning rather than amount of output.


without juniors, how do you get more seniors?


Maybe programmers will adopt a similar model to artists and musicians. Do a lot of work for free/practically nothing hoping that some day you can make it "big" and land a full time role.


That is already the recommendation. Pick an open source project and contribute.


And most people who consider the job are driven away by the economics of it.


we already have that with unpaid interns and PhD students


I think it's more that this tool is only capable of automating the non-creative work that thousands have done already.

It's still insanely impressive (assuming the examples aren't more cherry picked than I'd expect).


Side note: I recently suffered from tennis elbow due to a suboptimal desktop setup when working from home. Copilot has drastically reduced my keystrokes, and therefore the strain on my tendons.

It's good for our health, too!


I watched the same thing happen with the introduction of Intellisense. Pre-Intellisense I had tons of RSI problems and had to use funky ergonomic keyboards like the Kinesis keyboard to function as a dev. Now I just hop on whatever laptop is in front of me and code. Same reason - massive reduction in the number of keys I have to touch to produce a line of code.


Unfortunately, one in 10 times is far from good enough (and this is with good prompt engineering, which one starts to do after using large language models for a while).

I feel like the current generation of AI is bringing us close to something that works once in a while but still requires human expertise ~50% of the time. The self-driving industry is in a similar state of despair: millions have been spent on labelling and training, but something fundamental is amiss in the ML models.


You are correct. I feel this is why the service is called Copilot, not Pilot :)


I think 1/10 is incredible. If it holds up, it means they may have found the right prior for a path of development that can actually lead to artificial general intelligence. With exponential improvement (humans learning to hack the AI and the AI learning better suggestions), this may in theory happen very quickly.

We live in a very small corner of the space of possible universes, which is why finding a prior in program space within it is a big deal.


I keep wondering how much time it could possibly save you, given that you're obligated to read the code and make sure it makes sense. Given that, the testimonials here are very surprising to me.


It seems to replace/shorten the "Google for a snippet that does X, copy, paste, tweak" loop, no? Which of course is super cool for many tasks!


It's smarter than that. It suggests things that have never been written. It actually creates code based on the context, just like GPT-3 can create new text documents based on previous inputs.

Edit: Check this screencast for instance: https://twitter.com/francoisz/status/1409908666166349831


Anyone tried this on any sort of numeric computation? Numpy, Julia, Pandas, R, whatever?

I definitely see the utility in the linked screencast. But I am left to wonder whether this effectiveness is really a symptom of the extreme propensity for boilerplate code that seems to come with anything web-related.


I'm not convinced that code snippet in the screencast had never been written. It's fairly generic React, no?


Is the model similar to GPT-3? That seems extremely resource-intensive, particularly given how fast the suggestions appear.


How big/complicated are the functions Copilot is autocompleting for you? I'm thinking perhaps reading 10 potential candidates is actually slower and less instructive than trying to write the thing yourself.


It shows the suggestions line by line, and only shows the best guess. It's not more intrusive than Intellisense.

You can actually see all the code blocks Copilot is thinking about if you want to, but that is indeed a distraction.


The problem I see with that is that it's not possible for it to understand which code is best. GPT-3 is trying to mimic human writing in general, and the thing is, most human code is garbage. If this system were able to understand how to make code better, you could keep training it until you had perfect code, which is not what the current system is giving you (a lot of the time, anyway).


I guess you're missing the point. It's not trying to suggest the perfect code. Only you know that. It's saving you time by writing a good (sometimes perfect) first solution based on method/argument names, context, comments, and inline docs. And that is already a huge boost in productivity and coding pleasure (as you only have to focus on the smart part).


Maybe you are right. In my experience, either the code that's easily available to you (because another person or a computer wrote it) is perfect for your use case (to the best of your knowledge, anyway), or rewriting it from scratch is usually better than morphing what you have into what you need.


>if this system was able to understand how to make code better you could keep training it until you had perfect code

Based on the FAQ, it looks like some information about how you interact with the suggestions is fed back to the Copilot service (and theoretically OpenAI) to better improve the model.

So, while it may not understand "how to make code better" on its own, it can learn a bit from seeing how actual devs do make code better and theoretically improve from usage.


You're missing the problem he stated: Code written by humans is usually bad so the model is trained on garbage.


The proof is in the pudding. I was skeptical too but the testimonials here are impressive.


The animated example on https://copilot.github.com/ shows it suggesting entire blocks of code, though.


It does actually suggest entire blocks of code. I haven't quite figured out yet when it suggests blocks or lines - if I create a new function / method and add a doc string it definitely suggests a block for the entire implementation for me.


I see, I think the most useful case for me would be where I write a function signature+docstring, then get a list of suggestions that I can browse and pick from.

Do you have examples of what the line ones can do? The site doesn't really provide any of those.
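To make that concrete, here is a purely hypothetical sketch of the kind of signature-plus-docstring prompt being described (the function name, docstring, and the completion shown are illustrative assumptions, not output from Copilot):

  def parse_expenses(csv_text):
      """Sum the amounts in a CSV of expenses (date,amount,currency),
      returning a dict of total amount per currency."""
      # Given a prompt like the signature and docstring above, the tool would
      # be expected to propose a whole-body completion, e.g. along these lines:
      totals = {}
      for line in csv_text.strip().splitlines():
          _date, amount, currency = line.split(",")
          totals[currency] = totals.get(currency, 0.0) + float(amount)
      return totals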


Take a look at this minimal example I just created where I did just that -- created a new function and docstring. This example is of course super simple - it works for much more complex things.

https://gist.github.com/berndverst/1db9bae37f3c809e5c3f56262...


Is there actually a search.json endpoint?

https://github.com/HackerNews/API doesn't list this as a valid endpoint. Indeed, when I try to access it via curl, I get a HTTP 405 with a "Permission denied" error (same result when I try to access nonexistent-endpoint.json).

Based on the HN search on the website, I'd expect the correct autocomplete to involve hn.algolia.com [0].

[0] https://hn.algolia.com/api points at https://hn.algolia.com/api/v1/search?query=...

To me, this points at the need for human input with a system like this. There is a Firebase endpoint, yes, and Copilot found that correctly! But then it invented a new endpoint that doesn't exist.
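For reference, a minimal sketch of hitting the Algolia-backed endpoint instead (Python with the `requests` package; the query parameter and the `hits`/`title`/`url` fields follow the public API documented at hn.algolia.com/api):

  import requests

  def search_hn(query, page=0):
      # Query the public Algolia-backed Hacker News search API
      resp = requests.get(
          "https://hn.algolia.com/api/v1/search",
          params={"query": query, "page": page},
          timeout=10,
      )
      resp.raise_for_status()
      return resp.json()["hits"]

  for hit in search_hn("github copilot")[:5]:
      print(hit["title"], hit["url"])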


Even snippets are so bad I have to turn them off. I can't even fathom how bad the suggestions are going to be for full blocks of code. But I guess I'll see soon...


What's your estimate of the productivity boost expressed as a percentage? I.e. if it takes you 100 hours to complete a project without Copilot, how many hours will it be with Copilot?


I'm not sure I spend less time actually coding stuff (because I have to review the Copilot code). But the cost of the code I write is definitely reduced, because:

- the review from my peers is faster (the code is more correct)

- I come back to the code less often (because I have thought about all the corner cases when checking the Copilot code)

- as I care more about naming & inline docs (it helps Copilot), the code is actually cheaper to maintain


Check out how it helped me write a React component: https://twitter.com/francoisz/status/1409919670803742734

I think I hit the Tab key more than I hit any other key ;)


I can't see anything due to the variable bitrate.


Having been a TabNine user for a while I can say that it's less of a productivity boost and more of a quality of life win. It's hard to measure - not because it's small, but because it's lots of tiny wins, or wins at just the right moment. It makes me happy, and that's why I pay for it - the fact that it's probably also saving me some time is second to the fact that it saves me from annoying interrupts.


In addition, I would like to see GitHub reviewing my code and giving me suggestions on how I could improve. That would be more educational, and a tool to ensure consistency across the codebase.


I'm surprised this doesn't exist. Google, FB, and Apple (and I imagine Microsoft) have a lot of this stuff built in that is light-years better than any open source solution I'm aware of.

Given that MS owns GitHub and how valuable this is - I imagine it will be coming soon.


SonarQube and IntelliJ do this in some way for me.


+1 for SonarQube. Very easy way to add value to a project without a lot of overhead.


I tried TabNine and it wasn’t a huge improvement because what costs the most time isn’t typing stuff but thinking about what to type.


How did you manage to get early access? Is there some kind of GitHub Early Access programme (or something similar part of Enterprise)?


Do you still go over the generated code line by line and touch it up in places where it did not do a good job?


It suggests code line by line, so yes


I guess I don't see the point if only 10% of the time it's exactly what you want, and the rest of the time you have to go back and touch up the line.

Does it train a programmer for accepting less than ideal code because it was suggested? Similar to how some programmers blindly copy code from StackOverflow without modification.

Seems like there is a potential downside that's being ignored.


> Does it train a programmer for accepting less than ideal code because it was suggested? Similar to how some programmers blindly copy code from StackOverflow without modification.

Maybe juniors, but I don't see this being likely for anyone else. I've been using TabNine for ages and it's closer to just a fancy autocomplete than a code assistant. Usually it writes what I would write, and if I don't see what I would write I just write it myself (until either I wrote the whole thing or it suggests the right thing). Of course, barring some variable names or whatever.

I don't have it "write code for me" - that happens in my head. It just does the typing.


> I guess I don't see the point if only 10% of the time it's exactly what you want, and the rest of the time you have to go back and touch up the line.

Think of it as tab complete. If it's wrong, don't hit tab.


This is not true - and I've been using copilot for many months :)

It suggests entire blocks of code - but not in every context.


My bad, you're right. I remember now that it suggested entire code blocks to me from time to time.

Do you know in which "context" it suggests a block?


It usually suggests blocks within a function / method in my experience. Here's an example I created just now:

https://gist.github.com/berndverst/1db9bae37f3c809e5c3f56262...


For those of you who like this concept but (1) did not get into the alpha (or want something that has lots of great reviews), (2) need to run locally (security or connectivity), or (3) want to use any IDE: please try TabNine.


It could start to replace us in 20 years. Or reduce our numbers. For now, it is exciting.


Until it automatically knows when and how it's wrong, you'll still need a human to figure that out, and that human will need to actually know how to program, without the overgrown auto-complete.

May or may not reduce the demand for programmers, though. We'll see.


But that's exactly what you are doing as a programmer who uses it. If you autocomplete using it and then fix the code, you are literally telling it what it got wrong.


I was responding to "It could start to replace us in 20 years." I think if it can do that, it'll basically be AGI and a lot of things will change in a hurry. I don't think that a tool that can do that is even all that similar to this tool.


> and the rest of the time it suggests something rather good, or completely off

In what proportion, roughly?


Hard to say, really. When writing React components, Jest tests, and documentation, it's often not far off. I found it to be off when writing HTML markup (which is hard to describe with words).


Seems like it's best classed (for now) as an automated tool that generates and applies all the boilerplate snippets I would be creating and using manually if I weren't too lazy, and/or too often switching between projects, to set them up and remember how to use them (and that they exist).


It sounds similar to the editor plugin called TabNine


do you have an invite? very interested to check it out


Hi HN, we've been building GitHub Copilot together with the incredibly talented team at OpenAI for the last year, and we're so excited to be able to show it off today.

Hundreds of developers are using it every day internally, and the most common reaction has been the head exploding emoji. If the technical preview goes well, we'll plan to scale this up as a paid product at some point in the future.


Lots of questions:

  - does the code generated by the AI belong to me or to GitHub?
  - what license does the generated code fall under?
  - if generated code becomes the reason for infringement, who gets the blame or legal action?
  - how can anyone prove the code was actually generated by Copilot and not by the project owner?
  - if a project member does not agree with the usage of Copilot, what should we do as a team?
  - can Copilot copy code from other projects and use that excerpt?
    - if yes, *WHY* ?!
    - who is going to deal with legalese for something he or she was not responsible for in the first place?
    - what about conflicts of interest?
  - can GitHub guarantee that Copilot won't use proprietary code excerpts in FOSS-ed projects that could lead to new "Google vs Oracle" API cases?


In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler.

On the training question specifically, you can find OpenAI's position, as submitted to the USPTO here: https://www.uspto.gov/sites/default/files/documents/OpenAI_R...

We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!


You should look into:

https://breckyunits.com/the-intellectual-freedom-amendment.h...

Great achievements like this only hammer home the point about how illogical copyright and patent laws are.

Ideas are always shared creations, by definition. If you have an “original idea”, all you really have is noise! If your idea means anything to anyone, then by definition it is built on other ideas, it is a shared creation.

We need to ditch the term “IP”, it’s a lie.

Hopefully we can do that before it’s too late.


> Ideas are always shared creations, by definition. If you have an “original idea”, all you really have is noise! If your idea means anything to anyone, then by definition it is built on other ideas, it is a shared creation.

Copyright doesn't protect "ideas" it protects "works". If an artist spends a decade of his life painting a masterpiece, and then some asshole sells it on printed T-shirts, then copyright law protects the artist.

Likewise, an engineer who writes code should not have to worry about some asshole (or some for-profit AI) copy and pasting it into other peoples' projects. No copyright protections for code will just disincentivize open source.

Software patents are completely bullshit though, because they monopolize ideas which 99.999% of the time are derived from the ideas other people freely contributed to society (aka "standing on the shoulders of giants"). Those have to go, and I do not feel bad at all about labeling all patent-holders greedy assholes.

But copyright is fine and very important. Nothing is perfect, but it does work very well.


Copyrights are complete bullshit too, though. Take your two examples. First, the artist, I assume, is using paints and mediums developed arguably over thousands of years, at great cost. So since she is only assembling the leaf nodes of the tree, the vast majority of the "work" was created by others. Shared creation.

Same goes for an engineer. Binary notation is at the root of all code, and in the intermediate nodes you have Boolean logic and microcode and ISAs and assembly and compilers and high-level languages and character sets. The engineer who assembles some copy-and-pasteable leaf nodes is by definition building a shared creation to which they've contributed the least.


The basis of copyright isn’t that the sum product is 100% original. That insane since nothing we do is ever original. It’ll always be a component ultimately of nature. The point is that your creation is protected for a set amount of time and then it too eventually becomes a component for future works.


> the artist I assume is using paints and mediums developed arguably over thousands of years, at great cost.

And they went to the store and paid money for those things.


And they handed the cashier money and then got to do whatever they wanted with those things. Now they want to sell their painting to the cashier AND control what the cashier does with it for the rest of the cashier's life. They want to make the cashier a slave to a million masters.


Remind me when GitHub handed anyone any money for the code they used?


I'm sure natfriedman will be thrilled to abolish IP and also apply this to the Windows source code. We can expect it on GitHub any minute!


I used to work at Microsoft and would occasionally email Satya the same idealistic pitch. I know they have to be more conservative, but some of us have to envision where the math can take us and shout about it, and hope they steer well. When I started at MS, in my first week I was heckled for installing Ubuntu on my Windows machine. When I left, Windows was shipping with Ubuntu. What may seem impossible today can become real if enough people push the ball forward together. I even hold out hope that someday BG will see the truth and help reduce the ovarian lottery by legalizing intellectual freedom.


Talking about the ovarian lottery seems strange in a thread about an AI tool that will turn into a paid service.

No one will see the light at Microsoft. The "open" source babble is marketing and recruiting oriented, and some OSS projects infiltrated by Microsoft suffer and stagnate.


All I know is that if a lawsuit comes around for a company who tried to use this, Github et al won't accept an ounce of liability.


You can't abolish IP without completely restructuring the economic system (which I'm all for, BTW). But just abolishing IP and keeping everything the same is kind of myopic. Not saying that's what you're advocating for, but I've run into this sentiment before.


Sure, but usually I tend to think "abolish X" means "lets agree on an end goal of abolishing X and then work rapidly to transition to that world." So in that sense I tend to think the person is not advocating for the simple case of changing one law, but on the broader case of examining the legal system and making the appropriate changes to realize a functioning world where we can "abolish X".


I agree that it would be a huge upheaval. Absolutely massive change to society. But I have every confidence we have a world filled with good bright people who can steer us through the transition. Step one now is just educating people that these laws are marketed dishonestly, are inequitable, and counter productive to the progress of ideas. As long as the market starts to get wind that the truth is spreading, I believe it will start to prepare for the transition.


In practical terms, IP could be referred to as unique advantages. What is the purpose of an organization that has no unique qualities?

In general, what is IP and how it's enforced are two separate things. Just because we've used copyright and patents to "protect" an organization's unique advantages, doesn't mean we need to keep using them in the same way. Or maybe it's the best we can do for now. That's why BSD style licences are so great.


> training ML systems on public data is fair use

Uh, I very much doubt that. Is there any actual precedent on this?

> We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!

But apparently not eager enough to have this discussion with the community before deciding to train your proprietary for-profit system on billions of lines of code that undoubtedly are not all under CC0 or similar no-attribution-required licenses.

I don't see attribution anywhere. To me, this just looks like yet another case of appropriating the public commons.


@Nat, these questions (all of them, not just the 2 you answered) are critical for anyone who is considering using this system. Please answer them?

I for one wouldn't touch this with a 10000' pole until I know the answers to these (very reasonable) questions.


How do you guarantee it doesn't copy a GPL-ed function line-by-line?


Yup, this isn't a theoretical concern, but a major practical one. GPT models are known for memorizing their training data: https://towardsdatascience.com/openai-gpt-leaking-your-data-...

Edit: Github mentions the issue here: https://docs.github.com/en/github/copilot/research-recitatio... and here: https://copilot.github.com/#faq-does-github-copilot-recite-c... though they neatly ignore the issue of licensing :)


That second link says the following:

> We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set

That's kind of a useless stat when you consider that the code it generates makes use of your existing variable/class/function names when adapting the code it finds.

I'm not a lawyer, but I'm pretty sure I can't just bypass GPL by renaming some variables.


It's not just about regurgitating training data during a beam search; it's also about being a derivative work, which it clearly is in my opinion.


> GPT models are known for memorizing their training data

Hash each function, store the hashes as a blacklist. Then you can ask the model to regenerate the function until it is copyright safe.
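A minimal sketch of that idea in Python, assuming a hypothetical `generate_fn` model call and a precomputed `blacklist` of training-set hashes (neither is anything Copilot actually exposes):

  import hashlib

  def fingerprint(code):
      # Hash a snippet after trivially normalizing indentation and blank lines
      normalized = "\n".join(line.strip() for line in code.splitlines() if line.strip())
      return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

  def suggest(prompt, generate_fn, blacklist, max_tries=10):
      # Re-sample the model until the suggestion's fingerprint is not in the
      # blacklist of hashes computed over the training set
      for _ in range(max_tries):
          candidate = generate_fn(prompt)
          if fingerprint(candidate) not in blacklist:
              return candidate
      raise RuntimeError("no suggestion outside the blacklist after max_tries")

Of course, a hash this naive only catches verbatim or near-verbatim copies; renamed identifiers would slip through, which is what the replies below get at.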


What if it copies only a few lines, but not an entire function? Or the function name is different, but the code inside is the same?


If we could answer those questions definitively, we could also put lawyers out of a job. There’s always going to be a legal gray area around situations like this.


Matching on the abstract syntax tree might be sufficient, but might be complex to implement.


You can probably tokenize the names so they become irrelevant. You can ignore non-functional whitespace, so that a normalized code C remains. Maybe one can hash all the training data D and check whether hash(C) is in hash(D). Some sort of Bloom filter...
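A rough sketch of that normalization for Python source, using the standard library tokenizer (the `training_snippets`/`candidate` values here are illustrative, and a real system would use a Bloom filter rather than an in-memory set):

  import io
  import keyword
  import tokenize

  def normalized_tokens(code):
      # Replace identifiers with a placeholder and drop comments and
      # non-functional whitespace, so renamed variables hash identically.
      out = []
      for tok in tokenize.generate_tokens(io.StringIO(code).readline):
          if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
              out.append("<name>")
          elif tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                            tokenize.INDENT, tokenize.DEDENT):
              continue
          else:
              out.append(tok.string)
      return tuple(out)

  training_snippets = ["def add(a, b):\n    return a + b\n"]
  candidate = "def plus(x, y):\n    return x + y\n"

  seen = {normalized_tokens(s) for s in training_snippets}
  print(normalized_tokens(candidate) in seen)  # True: same code modulo renamed identifiers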


Surprised not to see more mention of this. It would make sense for an AI to "copy" existing solutions. In the real world, we use clean room to avoid this.

In the AI world, unless all GPL (etc.) code is excluded from the training data, it's inevitable that some will be "copied" into other code.

Where lawyers decide what "copy" means.


It's not just about copying verbatim. They clearly use GPL code during training to create a derivative work.

Then you have the issue of attribution with more permissive licenses.


How do you know that, when you write a simplish function for example, it is not identical to some GPL code somewhere? "Line by line" code does not exist anywhere in the neural network. It doesn't store or reference data in that way. Every character of code is in some sense "synthesized". If anything, this exposes the fragility of our concept of "copyright" in the realm of computer programs and source code. It has always been ridiculous. The GPL is just another license that leverages the copyright framework (the enforcement of the GPL cannot exist outside such a copyright framework, after all), so in such weird "edge cases" the GPL is bound to look stupid just like any other scheme. Remember that the GPL also forbids "derivative" works from being relicensed (with a less "permissive" license). It is safe to say that you write code close enough to be considered "derivative" of some GPL code somewhere pretty much every day, and you can't possibly prove that you didn't cheat. So the whole framework collapses in the end anyway.


> How do you know that when you write a simplish function for example, it is not identical to some GPL code somewhere?

I don't, but then I didn't first go and look at the GPL code, memorize it completely, do some brain math, and then write it out character by character.


I truly don't think they can guarantee that. Which is a massive concern.


(1) That may be so, but you are not training the models on public data like sports results. You are training them on the copyright-protected creations of humans, creations that often took years to write.

So your point (1) is a distraction, and quite an offensive one to thousands of open source developers, who trusted GitHub with their creations.


   (1) training ML systems on public data is fair use 

This one is tricky considering that kNN is also an ML system.


kNN needs to hold on to a complete copy of the dataset itself unlike a neural net where it's all mangled.


What about privacy? Does the AI send code to GitHub? This reminds me of Kite.


Yes, under "How does GitHub Copilot work?":

> [...] The GitHub Copilot editor extension sends your comments and code to the GitHub Copilot service, which then uses OpenAI Codex to synthesize and suggest individual lines and whole functions.


Fair use doesn't exist in every country, so it's US only?


Yes, my partner likes to remind me we don't have it here in Australia. You could never write a search engine here. You can't write code that scrapes websites.


It exists in the EU also (and it's much more powerful here).


The EU doesn't have a copyright-related fair use doctrine. Quite the opposite; that's why we are getting upload filters.


False. In Spain you have it under "uso legítimo".


Spain is only part of the EU, not the EU itself.


One exception makes the blanket "the EU doesn't have it" claim incorrect.

The EU doesn't mandate it for member states, yes. But some (maybe all) countries in the EU do have it.


> We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we're eager to participate!

Another question is this: let's hypothesize that I work solo on a project; I have decided to enable Copilot and, after a period of time, have reached a 50/50 split of development with it. One day the "hit by a bus" factor takes place; who owns the project after this incident?


Your estate? The compiler comparison upthread seems to be perfectly valid. If you work on a solo project in C# and die, Microsoft doesn't automatically own your project because you used Visual Studio to produce it.


> the output belongs to the operator, just like with a compiler.

No, it really is not that easy; even with compilers it depends on who owned the source and which license(s) they applied to it.

Or would you say I can compile the Linux kernel and the output belongs to me, as compiler operator, and I can do whatever I want with it without worrying about the GPL at all?


> training ML systems on public data is fair use

So, to be clear, I am allowed to take leaked Windows source code and train an ML model on it?


Or, take leaked Windows source code, run it through a compiler, and own it!


What does "public" mean? Do you mean "public domain", or something else?


Unfortunately, in ML "public data" typically means available to the public. Even if it's pirated, like much of the data available in the Books3 dataset, which is a big part of some other very prominent datasets.


So basically YouTube all over again? I.e. bootstrap and become popular by using whatever media is widely available (pirated via crowdsourced piracy), and then many years later, once it is popular and dominant, it has to turn around, "do things right", and guard copyrights.


Fair Use is an affirmative defense (i.e. you must be sued and go to court to use it; once you're there, the judge/jury will determine if it applies). But taking in code with any sort of restrictive license (even if it's just attribution) and creating a model using it is definitely creating a derivative work. You should remember, this is why nobody at Ximian was able to look at the (openly viewable, but restrictively licensed) .NET code.

Looking at the four factors for fair use, it looks like Copilot will have these issues:

- The model developed will be for a proprietary, commercial product

- Even if it's a small part of the model, all the training data for that model are fully incorporated into the model

- There is a substantial likelihood of money loss ("I can just use Copilot to recreate what a top tier programmer could generate; why should I pay them?")

I have no doubt that Microsoft has enough lawyers to keep any litigation tied up for years, if not decades. But your contention that this is "okay because it's fair use" based on a position paper by an organization supported by your employer... I find that reasoning dubious at best.


It is the end of copyright then. NNs are great at memorizing text. So I just train a large NN to memorize a repository, and the code it outputs during "inference" is fair use?

You can get past the GPL, LGPL, and other licenses this way. Microsoft can finally copy the Linux kernel and get around the GPL :-).


> - under what license the generated code falls under?

Is it even copyrighted? Generally my understanding is that to be copyrightable it has to be the output of a human creative process; this doesn't seem to qualify (I am not a lawyer).

See also, monkeys can't hold copyright: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


> Is it even copyrighted?

Isn't it subject to the licenses of the code the model was created from? The learning is basically just an automated transformation of that code, so the output would still carry the original license; otherwise I could just run a minifier, or some other, more elaborate code transformation, on some FOSS project, for example the Linux kernel, and relicense it under whatever I like.

Does not sound right to me, but IANAL and I also did not really look at how this specific model/s is/are generated.

If I did some AI on existing code I'd be quite cautious and group it by compatible licence classes, asking the user what their project's licence is and then only using the compatible parts of the models. Anything else seems not really ethical, and rather uncharted territory in law, to me; which may not mean much, as IANAL and am just some random voice on the internet, but FWIW I have at least tried to understand quite a few FOSS licences to decide what I can use in projects and what not.

Does anybody know of relevant cases concerning AI and the input data its model was built from, ideally in US or European jurisdictions?


This is a great point. If I recall correctly, prior to Microsoft's acquisition of Xamarin, Mono had to go out of its way to avoid accepting contributions from anyone who'd looked at the (public but non-FOSS) source code of .NET, for fear that they might reproduce some of what they'd seen rather than genuinely reverse engineering.

Is this not subject to the same concern, but at a much greater scale? What happens when a large entity with a legal department discovers an instance of Copilot-generated copyright infringement? Is the project owner liable, is GitHub/Microsoft liable, or would a court ultimately tell the infringee to deal with it and eat whatever losses occur as a result?

In any case, I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar). Otherwise, I think it would probably be too much risk to use this for anything important/revenue-generating.


> In any case, I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar). Otherwise, I think it would probably be too much risk to use this for anything important/revenue-generating.

I'm going to assume that there is no sensible whitelist of licenses until someone at GitHub is willing to go on the record that this is the case.


> I hope that GitHub is at least limiting any training data to a sensible whitelist of licenses (MIT, BSD, Apache, and similar)

Yes, and even those licences require preservation of the original copyright attribution and licence. MIT gives some wiggle room with the phrase "substantial portions", so it might just be MIT and WTFPL


Interesting to see since Nat was a founder of Xamarin


(Not a lawyer, and only at all familiar with US law, definitely uncharted territory)

No, I don't believe it is, at least to the extent that the model isn't just copy and pasting code directly.

Creating the model implicates copyright law, that's creating a derivative work. It's probably fair use (transformative, not competing in the market place, etc), but whether or not it is fair use is github's problem and liability, and only if they didn't have a valid license (which they should have for any open source inputs, since they're not distributing the model).

I think the output of the model is just straight up not copyrighted though. A license is a grant of rights, you don't need to be granted rights to use code that is not copyrighted. Remember you don't sue for a license violation (that's not illegal), you sue for copyright infringement. You can't violate a copyright that doesn't exist in the first place.

Sometimes a "license" is interpreted as a contract rather than a license, in which you agreed to terms and conditions to use the code. But that didn't happen here, you didn't agree to terms and conditions, you weren't even told them, there was no meeting of minds, so that can't be held against you. The "worst case" here (which I doubt is the case - since I doubt this AI implicates any contract-like licenses), is that github violated a contract they agreed to, but I don't think that implicates you, you aren't a party to the contract, there was no meeting of minds, you have a code snippet free of copyright received from github...


So if I make an AI that takes copyrighted material in on one side, jumbles it about, and spits out the same copyrighted material on the other side, I have successfully laundered someone else's work as my own?

Wouldn't GitHub potentially be responsible for the infringement by distributing the copyrighted material knowing that it would be published?


I exempted copied segments at the start of my previous post for a reason: that reason is that I don't really know. I doubt it works, because judges tend to frown on absurd outcomes.


Where does copying end, though? If an AI "retypes" it, not only with some variable renaming but with transformations that aren't describable as a few simple code transformations (neural nets are really not transparent and can do weird stuff), it wouldn't look like a copy when you just compare parts of it, but it effectively would be one, as it was an automated translation.


Probably, copying ends when the original creative elements are unrecognizable. Renaming variables actually goes a long way to that, also having different or standardized (and therefore not creative) whitespace conventions, not copying high level structure of files, etc.

The functional parts of code are not copyrightable, only the non functional creative elements.

(Not a lawyer...)


> The functional parts of code are not copyrightable, only the non functional creative elements.

1. Depends heavily on the jurisdiction (e.g., software patents are a thing in America but not really in most European jurisdictions)

2. A change to a copyrightable work, creative or not, would still mean that you created a derived work where you'd hold some additional rights, depending on the original license, but not that it would now be only in your creative possession. E.g., check §5 of https://www.gnu.org/licenses/gpl-3.0.en.html

3. What do you mean by "functional parts"? Some basic code structure like an `if () {} else {}` -> sure, but anything algorithm-like can be seen as copyrightable, and whatever transformation (creative or not) you apply, at its basis it is a derived work; that's just the definition of a derived work.

Now, would that matter in court? That depends not only on 1., but also very much on the specific case; most trivial stuff would probably be ruled out, but an org could invest enough lawyer power, or sue in a court favourable to its case (OLG Hamburg, anyone). Most small stuff would be thrown out as not substantial enough, or die before even reaching any court.

But that actually scares me a bit when thinking about it in this context, because it seems to me that, assuming you're right, this would all significantly erode the power of copyleft licenses like the (A)GPL.

Especially if a non-transparent (e.g., AI) code laundry, let's call it, were deemed a lawful way to strip out copyright. As it is non-transparent, it wouldn't be immediately clear whether a change is creative or not, to use the criterion for copyright you used. This would break basically the whole FOSS community, and with it all major projects (Linux, coreutils, Ansible, Git, WordPress, just to name a few), which is basically 80% of core infrastructure.


If the model is a derivative work, why wouldn’t works generated using the model also be derivative works?


Because a derivative work must "as a whole, represent an original work of authorship".

https://www.law.cornell.edu/uscode/text/17/101

(Not a lawyer...)


In the US, yes. Elsewhere, not necessarily.


It is the output of humans' creative processes, just not yours. Like an automated Stack Overflow snippet engine.


>Generally my understand is that to be copyrightable it has to be the output of a human creative process

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...


You should read the FAQ at the bottom of the page; I think it answers all of your questions: https://copilot.github.com/#faqs


> You should read the FAQ at the bottom of the page; I think it answers all of your questions: https://copilot.github.com/#faqs

Read it all, and the questions still stand. Could you, or any on your team, point me on where the questions are answered?

In particular, the FAQ doesn't guarantee that the "training set from publicly available data" doesn't contain license or patent violations, nor whether that code is considered tainted for a particular use.


From the faq:

> GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.

I'm guessing this covers it. I'm not sure if someone posting their code online, but explicitly saying you're not allowed to look at it, getting ingested into this system with billions of other inputs could somehow make you liable in court for some kind of infringement.


That doesn't cover it, since that is a technical answer for a non-technical question. The same questions remain.


That doesn't cover patent violations, license violations, or compatibility between licenses, which would be the most numerous and non-trivial cases.


How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

Does everyone in this thread contact their lawyers after cutting and pasting a mergesort example from Stackoverflow that they've modified to fit their needs? Seems folks are reaching a bit.


For that very reason, many companies have policies that forbid copying code from online (especially from StackOverflow).


That mitigates copyright concerns, but patent infringement can occur even if the idea was independently rediscovered.


I was answering a specific question, "How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?" The answer is that many companies have forbidden that specific action in order to remove the risk from that action.

You are expanding the discussion, which is great, but that doesn't apply in answer to that specific question.

There are answers in response to your question, however. For example, many companies use software for scanning and composition analysis that determines the provenance and licensing requirements of software. Then, remediation steps are taken.


Not sure what you're getting at. Are you suggesting that independent discovery is a defense against patents? Or are you clear that it isn't a defense, but just arguing that something from the internet is more likely to be patented than something independently invented in-house? Maybe that's true, but it doesn't really answer the question of

> How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?

The only real answer is a patent search.


There are different ways to handle risk: avoidance, reduction, transfer, acceptance. I was answering a specific question about how people manage risk in a given situation, and in answer I related how companies reduce that risk. I was not talking about the general case of defending against patent risk, but the specific case of reducing the risk of adding externally found code to a product.

My answer described literally what many companies do today. It was not a theoretical pie in the sky answer or a discussion about patent IP.

To restate, the real-world answer I gave for, "How is it possible to determine if you've violated a random patent from somewhere on the internet via a small snippet of customized auto-generated code?" is often "Do not take code from the Internet."


I think a patent violation with CoPilot is exactly the same scenario as if you violated a patent yourself without knowing it.


Sounds like using Copilot can introduce GPL'd code into your project and make your project bound by the GPL as a result...

0.1% is a lot when you use 100 suggestions a day.


The most important question, whether you own the code, is sort of maybe vaguely answered under “How will GitHub Copilot get better over time?”

> You can use the code anywhere, but you do so at your own risk.

Something more explicit than this would be nice. Is there a specific license?

EDIT: also, there’s multiple sections to a FAQ, notice the drop down... under “Do I need to credit GitHub Copilot for helping me write code?”, the answer is also no.

Until a specific license (or explicit lack there-of) is provided, I can’t use this except to mess around.


None of the questions and answers in this section hold information about how the generated code affects licensing. None of the links in this section contain information about licensing, either.


I don't see the answer to a single one of their questions on that page - did you link to where you intended?

Edit: you have to click the things on the left, I didn't realize they were tabs.


Sorry Nat, but I don't think it really answers anything. I would argue that using GPL code during training makes Copilot a derivative work of said code. I mean, if you look at how a language model works, it's pretty straightforward. The term "code synthesizer" alone insinuates as much. I think this will probably ultimately be tested in court.


This page has a looping back button hijack for me


Does Copilot phone home?


When you sign up for the waitlist it asks permission for additional telemetry, so yes. Also the "how it works" image seems to show the actual model is on github's servers.


Yes, and with the code you're writing/generating.


This obviously sucks.

Can't companies write code that runs on customers' premises these days? Are they too afraid somebody will extract their deep learning model? I have no other explanation.

And the irony is that these companies are effectively transferring their own fears to their customers.


It's a large and gpu-hungry model.


Some of your questions aren't easy to answer. Maybe the first two were OK to ask. Others would probably require lawyers and maybe even courts to decide. This is a pretty cool new product just being shared on an online discussion forum. If you are serious about using it for a company, talk to your lawyers, get in touch with Github's people, and maybe hash out these very specific details on the side. Your comment came off as super negative to me.


> This is a pretty cool new product just being shared on an online discussion forum.

This is not one lone developer with a passion promoting their cool side-project. It's GitHub, which is an established brand and therefore already has a leg up, promoting their new project for active use.

I think in this case, it's very relevant to post these kinds of questions here, since other people will very probably have similar questions.


I think these are very important questions.

The commenter isn't interrogating some indie programmer. This is a product of a subsidiary of Microsoft, which I guarantee has already had a lawyer, or several, consider these questions.


No, they are all entirely reasonable questions. Yeah, they might require lawyers to answer - tough shit. Understanding the legal landscape that one's product lives in is part of a company's responsibility.


Regardless of tone, I thought it was chock full of great questions that raised all kinds of important issues, and I’m really curious to hear the answers.


What do you think about this being detrimental to overall code quality, since it allows people to blindly accept completions without really understanding the generated code? Similar to copy-and-paste coding.

The first example, parse_expenses.py, uses a float for currency - that seems to be a pretty big error being overlooked, along with other minor issues such as the lack of error handling.

I would say the quality of the generated code in parse_expenses.py is not very high, certainly not for the banner example.

EDIT - I just noticed GitHub reordered the examples on copilot.github.com to bury the issues with parse_expenses.py for now. I guess I got my answer.


How is it different from the status quo of people just doing the wrong thing or copy pasting bad code? Yes, there's the whole discussion below about float currency values, but I could very well see the opposite happening too, where this thing recommends better code than the person would've written otherwise.


> How is it different from the status quo of people just doing the wrong thing or copy pasting bad code?

Well, yes, the wrong code would be used. However, the wrong code would then become more prevalent as an answer from GitHub, causing more people to blindly use it. It's a self-perpetuating cycle of finding and using bad and wrong code.


Hmm, not quite. My point was that if they aren't a good enough programmer to understand why the code is wrong, then chances are they would've written bad code or copy pasted bad code anyways. It just makes the cycle faster.

And again, I could argue that the opposite could happen too: people who would otherwise have written bad code could be given suggestions of better code than they would've written.


> Hmm, not quite. My point was that if they aren't a good enough programmer to understand why the code is wrong, then chances are they would've written bad code or copy pasted bad code anyways. It just makes the cycle faster.

No, not quite. It also makes the cycle more permanent and its results deeply ingrained, which is what is actually relevant.


Either way it wouldn't matter, since the only thing that would stop the cycle is Stack Overflow closing down and no new Stack Overflow opening up - a very unlikely scenario for this industry. No matter the difference in time frame, the result would always have been permanent.


People make mistakes. With computers people make mistakes much faster :)


To err is human. To really mess things up, you need a computer.


“People can create tech debt, but robots can do it at scale.”


It seems that copilot lets one cycle through options, which is an opportunity for it to facilitate programmers moving from a naive solution to one they hadn't thought of that is more correct.

(Unclear to me yet whether the design takes advantage of this opportunity)


I use a similar feature in IntelliJ IDEA, and I've often found that the first time I learn about a new feature in the language is when I get a suggestion. I usually explore the topic much more deeply at that time. So far from helping me copy-paste, I find code suggestions help me explore new features of the language and framework that I might not have known about.


>The first example parse_expenses.py uses a float for currency

I have made this mistake without the help of any AI or copy/paste. It's still in the hands of the developer to test and review everything they commit.


Why would you say it's an error to use a float for currency? I would imagine it's better to use a float for calculations then round when you need to report a value rather than accumulate a bunch of rounding errors while doing computations.


It is widely accepted that using floats for money[1] is wrong because floating point numbers cannot guarantee precision.

The fact that you ask is a very good case in point though: Many programmers are not aware of this issue and would maybe not question the "wisdom" of the AI code generator. In that sense, it could have a similar effect to blindly copy-pasted answers from SO, just with even less friction.

[1] Exceptions may apply to e.g. finance mathematics where you need to work with statistics and you're not going to expect exact results anyway.


Standard floats cannot represent very common numbers such as 0.1 exactly so they are generally disfavored for financial calculations where an approximated result is often unacceptable.

> For example, the non-representability of 0.1 and 0.01 (in binary) means that the result of attempting to square 0.1 is neither 0.01 nor the representable number closest to it.

https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accu...
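
For instance, a quick Python REPL check shows both of the effects described there:

    >>> 0.1 * 0.1
    0.010000000000000002
    >>> 0.1 + 0.2
    0.30000000000000004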



By definition, currency uses fixed point arithmetic not floating point arithmetic.


Not even remotely true. It is entirely context dependent. I've always used floats when working in finance.


Some people say "goin der" instead of "going there", that doesn't change the definitions of words, just because people are lazy with their language.


I fail to see your point. Floats are best practice for many financial applications, where model error already eclipses floating point error and performance matters.


Your point in the floating-point discussion is true, but you're wrong about this one - linguistics is a descriptive field, not a prescriptive one.


Standard practice is to use a signed decimal number with an appropriate precision that you scale around.
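
As a rough illustration of that with Python's decimal module (the precision shown is just the library default; pick whatever fits the domain):

    >>> from decimal import Decimal, getcontext
    >>> getcontext().prec = 28            # significant digits, not decimal places
    >>> Decimal("-19.99") * 3             # signed, exact decimal arithmetic
    Decimal('-59.97')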


You don't want to kick the can down to the floating point standard. Design for deterministic behavior. Find the edge cases, go over it with others and explicitly address the edge case issues so that they always behave as expected.


Micro-dollars are a better way of representing it (multiply by 1e6); store as bigint.

See: https://stackoverflow.com/a/51238749
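
A minimal sketch of that idea (variable names are made up): keep every amount as an integer count of micro-dollars and only format for display at the edges.

    # $12.345678 held as an integer number of micro-dollars
    price_micros = 12_345_678
    total_micros = price_micros * 3
    # format for display without ever touching floats
    print(f"${total_micros // 1_000_000}.{total_micros % 1_000_000:06d}")   # $37.037034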


Googler, opinions are my own.

Over in payments, we use micros regularly, as documented here: https://developers.google.com/standard-payments/reference/gl...

GCP, on the other hand, has standardized on unit + nano. They use this for money and time. So a unit would be 1 second or 1 dollar, and the nano field allows more precision. You can see an example here with the unitPrice field: https://cloud.google.com/billing/v1/how-tos/catalog-api#gett...


No, they aren't. Micro-dollars do not exist, so this method is guaranteed to cause errors.


this is a common approach when you are dealing in rates less than .01 -- you just need to be sure you are rounding correctly


When you are approximating fixed-point using floating-point, there is a lot more you need to do correctly than just rounding. Your representation must have enough precision and range for the initial inputs, intermediate results, and final results. You must be able to represent all expected numbers. And so on. There is a lot more involved than what you mentioned.

Of course, if you are willing to get incorrect results, such as in play money, this may be okay.


When did mdellavo say anything about floating point? You can, and should, use plain old fixed-point arithmetic for currency. That's what he means by "microdollar".


> store as bigint


Thank you. I made a mistake due to the starting comment in the thread.


Using float for currency calculations is how you accumulate a bunch of rounding errors. Standard practice when dealing with money is to use an arbitrary-precision numerical type.


Because it's an error to use floats in almost every situation. And currency is something where you don't want rounding errors, period. The more I've learned about floating point numbers over the years, the less I want to use them. Floats solve a specific problem, and they're a reasonable trade-off for that kind of problem, but the problem they solve is fairly narrow.


Using float is perfectly OK, since using fixed-point decimal (or whatever "exact" math operations) will lead to rounding errors anyway (what about multiplying a monthly salary by 16/31, i.e. half a month?).

The problem with floats is that many people don't understand how they work well enough to handle rounding errors correctly.

Now there are some cases where floats don't cut it. And big ones. For example, summing a set of numbers (with decimal parts) will usually be screwed up if you don't round. And not many people expect to round the results of additions because they are "simple" operations. So you get errors in the end.

(I have written applications that handle billions of euros with floats and have found just as many rounding errors there as in any COBOL application)


It seems incorrect to define half a month as 16/31, but OK - for your proposed example:

    >>> from decimal import Decimal
    >>> Decimal(1000) * Decimal(16) / Decimal(31)
    Decimal('516.1290322580645161290322581')
    >>> 1000 * 16 / 31
    516.1290322580645
The point is using Decimal allows control over precision and rounding rather than accepting ad-hoc approximations of a float.

https://docs.python.org/3/library/decimal.html

If it were me, I wouldn't go around bragging about how much money my software manages while being willfully ignorant of the fundamentals.


OK, the salary example was a bit simplified; in my case it was about giving financial help to someone. That help is based on a monthly allowance and then split over the number of allocated days in the month - that's where the 16/31 comes from.

Now for your example, I see that float and decimal just give the same result. Provided I'm doing financial computations of a final number, I'm ok with 2 decimals. And both your computations work fine.

The decimal module in Python gives you a number of significant digits, not a number of decimal places. You'll end up using .quantize() to get to two decimals, which is rounding (so, no advantage over floats).

As I said, as soon as you have division/multiplication you'll have to take care of rounding manually. But for addition/subtraction, then decimal doesn't need rounding (which is better).

The fact is that everybody says "floats are bad" because rounding is tricky. But rounding is always possible, and my point is that rounding is tricky even with the decimal module.
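
For example, pinning the earlier Decimal result to two decimal places still means explicitly picking a rounding rule via quantize():

    >>> from decimal import Decimal, ROUND_HALF_UP
    >>> exact = Decimal(1000) * 16 / 31
    >>> exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    Decimal('516.13')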

And about bragging, I can tell you one more thing: rounding errors were absolutely not the worst of our problems. The worst problem is being able to explain to the accountant that your computation is right. That's the hard part, because some computations involve hundreds of business decisions. When you end up on a rounding error, you're actually happy, because it's easy to understand, explain, and fix. And don't get me started on how laws (yes, the actual texts) sometimes spell out how rounding rules should work.


  sum = 0
  for i in range(0, 10000000):
    sum += 0.1
  print(round(sum*1000, 2))
what should this code print? what does it print?

I mean, sure, this is a contrived example. But can you guarantee that your code doesn't do anything similarly bad? Maybe the chance is tiny, but still: wouldn't you like to know for sure?


We agree: on additions, floats are tricky. But still, on divisions and multiplications they're not any worse. Dividing something by 3 will end up with an infinite number of decimals that you'll have to round at some point (unless we use what you proposed, fractions; in that case it's a completely different story).


No, exact precision arithmetic can do that 16/31 example without loss of precision:

  from fractions import Fraction
  
  # salary is $3210.55
  salary = Fraction(321055,100)
  monthlyRate = Fraction(16,31)

  print(salary*monthlyRate)
This will give you an exact result. Now, at some point you'll have to round to the nearest cent (or whatever), true. However, you don't have to round between individual calculations, hence rounding errors cannot accumulate and propagate.

The propagation of errors is the main challenge with floating point numbers (regardless of which base you use). The theory is well understood (in the sense that we can analyse an algorithm and predict upper bounds on the relative error), but not necessarily intuitive and easy to get wrong.

Decimal floating-point circumvents the issue by just not introducing errors at all: money can be represented exactly with decimal floating point (barring very exotic currencies), therefore errors also can't propagate. Exact arithmetic takes the other approach where computations are exact no matter what (but this comes at other costs, e.g. speed and the inability to use transcendental functions such as exp).

For binary floating point, that doesn't work. It introduces errors immediately since it can't represent money well and these errors may propagate easily.
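
To make the "round only at the very end" step concrete (a sketch; note that round() on a Fraction uses round-half-to-even):

    from fractions import Fraction

    exact = Fraction(321055, 100) * Fraction(16, 31)   # still an exact value
    cents = round(exact * 100)                         # round to whole cents once, at the end
    print(f"{cents // 100}.{cents % 100:02d}")         # 1657.06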


Of course, if you use fractions then, we agree, no error will be introduced or accumulated over the computations, which is better. The code base I'm talking about was Java, 10 years ago. I was not aware of fractions at the time; there was only BigDecimal, which was painful to work with (the reason we ditched it).


It's mostly painful because Java doesn't allow custom types to use operators, which I think was a maybe reasonable principle applied way too strictly. The same applies to any Fraction type you'd implement in Java.

Still, I'll take "verbose" over "error-prone and possibly wrong".


I visited https://copilot.github.com/, and I don't know how to feel. Obviously it's a nice achievement, not gonna lie.

But I have a feeling it will end up causing more work. e.g. the `averageRuntimeInSeconds` example, I had to spend a bit of time to see if it was actually correct. It has to be, since it's on the front page, but then I realized I'd need to spend time reviewing the AI's code.

It's cool as a toy, but I'd like to see where it is one year from now when the wow factor has cooled down a bit.


Interesting comment and I agree. Reading and writing code seem to involve different parts of the brain. I wonder if tools like this will create some sort of code review fatigue. I can write code for a few hours a day and enjoy it but I couldn't do code review for hours, every day.

This isn't like skimming through a codebase to get a sense of what the code does. You'd have to thoroughly review each line to make sure it does what you want it to do, that there are no bugs. And even then, you'd feel left behind pretty quickly because your brain didn't create the paths to the solution to the problem you're trying to solve. It is like reading a solution to a problem on leetcode vs coming up with it yourself.


It has the ability to generate unit tests as well, which will help cut down some on the verification side if you feed it enough cases.


I think I'd love to use this to generate tests and then write the functions myself. Test generation seems like a killer feature.


Yes!! Totally agree. Imagine writing a method and then telling an AI to write your unit tests for it. The AI would likely be able to come up with the edge cases and such that you would not normally take the time to write.

While I think the AI generating your mainline code is interesting, I must certainly agree that generating test code would be the killer feature. I would like to see this showcased a little more on the copilot page.


You don’t need AI for that. While example based testing is familiar to most, other approaches exist that can achieve this with less complexity. See: property based testing.


Yes, I agree. But just to ask, wouldn't we consider that a form of AI testing, even in a very primitive form? That raises the question of the very definition of AI. I would argue that your example is just the primordial version of what machine-reasoned testing could potentially offer.


Well then you have to check the generated tests. That's just one more layer, isn't it?


If you question the veracity of the code that is produced, you have to question the usefulness of the unit test that is produced.


> I had to spend a bit of time to see if it was actually correct.

Interesting point - it reminds me of the idea that it’s harder to debug code than to write it. Is it also harder to interpret code you didn’t write than to write it?


Might this end up putting GPL code into projects with an incompatible license?


It shouldn't do that, and we are taking steps to avoid reciting training data in the output: https://copilot.github.com/#faq-does-github-copilot-recite-c... https://docs.github.com/en/early-access/github/copilot/resea...

In terms of the permissibility of training on public code, the jurisprudence here – broadly relied upon by the machine learning community – is that training ML models is fair use. We are certain this will be an area of discussion in the US and around the world and we're eager to participate.


> ...the jurisprudence here – broadly relied upon by the machine learning community – is that training ML models is fair use.

To be honest, I doubt that. Maybe I am special, but if I am releasing some code under the GPL, I really don't want it to be used to train a closed source model, which will be used in closed source software generating code for closed source projects.


The whole point of fair use is that it allows people to copy things even when the copyright holder doesn't want them to.

For example, if I am writing a criticism of an article, I can quote portions of that article in my criticism, or modify images from the article in order to add my own commentary. Fair use protects against authors who try to exert so much control over their works that it harms the public good.


This isn't the same situation at all. The copying of code doesn't seem to be for a limited or transformative purpose. Fair use might cover parody or commentary & criticism but not limitless replication.


They are not replicating the code at all. They are training a neural network. The neural network then learns from the code and synthesises new code.

It's no different from a human programmer reading code, learning from it, and using that experience to write new code. Somewhere in your head there is code that someone else wrote. And it's not infringing anybody's copyright for those memories to exist in your head.


We can't yet equate ML systems with human beings. Maybe one day. But at the moment, it's probably better to compare this to a compiler being fed licensed code: the compilation output is still subject to the license, regardless of how fancy the compiler is.

Also, a human being who reproduces licensed code from memory - because they read that code - would be committing a license violation. The line between a derivative work and an authentically new original creation is not a well-defined one. This is why we still have human arbiters of these decisions rather than formal definitions that draw the line. This happens in music, for example, all the time.


If avoiding copyright violations were as simple as saying "I remembered it", then I don't think things like clean-room reverse engineering would ever be legally necessary. [1]

[1] https://en.wikipedia.org/wiki/Clean_room_design


It is replication, maybe not of a single piece of code - but creating a synthesis is still copying. For example, constructing a single piece of code out of three pieces of code from your co-workers is still replication of code.

Your argument would have some merit if something were created instead of assembled, but there is no new algorithm that is being created. That is not what is happening here.

On the one hand, you call this copying in fair use. On the other hand, you say this is creating new code. You can't have it both ways.


> Your argument would have some merit if something were created instead of assembled, but there is no new algorithm that is being created. That is not what is happening here.

If you're going to set such a high standard for ML tools like this, I think you need to justify why it shouldn't apply to humans too.

When a human programmer who has read copyrighted code at some point in their life writes new code that is not a "new algorithm", are they in violation of the copyrights of every piece of code they've ever read that was remotely similar in any respect to the new work?

I mean, I hope not!

> On the one hand, you call this copying in fair use. On the other hand, you say this is creating new code. You can't have it both ways.

I'm not a lawyer, but this actually sounds very close to the "transformative" criterion under fair use. Elements of existing code in the training set are being synthesized into new code for a new application.

I assume there's no off-the-shelf precedent for this, but given the similarity with how human programmers learn and apply knowledge, it doesn't seem crazy to think this might be ruled as legitimate fair use. I'd guess it would come down to how willing the ML system is to suggest snippets that are both verbatim and highly non-generic.


From https://docs.github.com/en/github/copilot/research-recitatio...: "Once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License."

On the same page is an image showing copilot in real-time adding the text of the famous python poem, The Zen of Python. See https://docs.github.com/assets/images/help/copilot/resources... for a link directly to copilot doing this.

You are making arguments about what you read instead of objectively observing how copilot operates. Just because GH wrote that copilot synthesizes new code doesn't mean that it writes new code in the way that a human writes code. That is not what is happening here. It is replicating code. Even in the best case copilot is creating derivative works from code where GH is not the copyright owner.


> You are making arguments about what you read instead of objectively observing how copilot operates.

Of course I am. We are both participating in a speculative discussion of how copyright law should handle ML code synthesis. I think this is really clear from the context, and it seems obvious to me that this product will not be able to move beyond the technical preview stage if it continues to make a habit of copying distinctive code and comments verbatim, so that scenario isn't really interesting to me. Github seems to agree (from the page on recitation that you linked):

> This investigation demonstrates that GitHub Copilot can quote a body of code verbatim, but that it rarely does so, and when it does, it mostly quotes code that everybody quotes, and mostly at the beginning of a file, as if to break the ice.

> But there’s still one big difference between GitHub Copilot reciting code and me reciting a poem: I know when I’m quoting. I would also like to know when Copilot is echoing existing code rather than coming up with its own ideas. That way, I’m able to look up background information about that code, and to include credit where credit is due.

> The answer is obvious: sharing the prefiltering solution we used in this analysis to detect overlap with the training set. When a suggestion contains snippets copied from the training set, the UI should simply tell you where it’s quoted from. You can then either include proper attribution or decide against using that code altogether.

> This duplication search is not yet integrated into the technical preview, but we plan to do so. And we will both continue to work on decreasing rates of recitation, and on making its detection more precise.

The arguments you've made here would seem to apply equally well to a version of Copilot hardened against "recitation", hence my reply.

> Even in the best case copilot is creating derivative works from code where GH is not the copyright owner.

It would be convenient for your argument(s) if it were decided legal fact that ML-synthesized code is derivative work, but it seems far from obvious to me (in fact, I would disagree) and you haven't articulated a real argument to that effect yourself. It has also definitely not been decided by any legal entity capable of establishing precedent.

And, again, if this is what you believe then I'm not sure how the work of human programmers is supposed to be any different in the eyes of copyright law.


> Of course I am. We are both participating in a speculative discussion of how copyright law should handle ML code synthesis. I think this is really clear from the context, and it seems obvious to me that this product will not be able to move beyond the technical preview stage if it continues to make a habit of copying distinctive code and comments verbatim, so that scenario isn't really interesting to me. Github seems to agree (from the page on recitation that you linked):

No. We both aren't. I am discussing how copilot operates from the perspective of a user concerned about legal ramifications. I backed that concern up with specific factual quotes and animated images from github, where github unequivocally demonstrated how copilot copies code. You are speculating how copyright law should handle ML code synthesis.


> No. We both aren't

You say I'm not ... but then you say, explicitly in so many words, that I am:

> You are speculating how copyright law should handle ML code synthesis.

I don't get it. Am I, or aren't I? Which is it? I mean, not that you get to tell me what I am talking about, but it seems like something we should get cleared up.

edit: Maybe you mean I am, and you aren't?

Beyond that, I skimmed the Github link, and my takeaway was that this is a small problem (statistically, in terms of occurrence rate) that they have concrete approaches to fixing before full launch. I never disputed that "recitation" is currently an issue, but honestly that link seems to back up my position more than it does yours (to the extent that yours is coherent, which (as above) I would dispute).


> They are not replicating the code at all.

Now that five days have passed, there have been a number of examples of copilot doing just that, replicating code. Quake source code that even included comments, the famous python poem, etc. There are many examples of code that has been replicated - not synthesized but duplicated byte for byte from the originals.


surely that depends on the size of the training set?

I could feed the Linux kernel one function at a time into an ML model, then coerce its output to be exactly the same as the input

this is obviously copyright infringement

whereas in the github case where they've trained it on millions of projects maybe it isn't?

does the training set size become relevant legally?


Fair Use is specific to the US, though. The picture could end up being much more complicated when code written outside the US is being analyzed.


The messier issue is probably using the model to write code outside the US. Americans can probably analyze code from anywhere in the world and refer to Fair Use if a lawyer comes knocking, but I can't refer to Fair Use if a lawyer knocks on my door after using Copilot.


It is not US-specific; we have it in the EU. And e.g. in Poland I could reverse engineer a program to make it work on my hardware/software if it doesn't. This is covered by fair use here.


Is it any different than training a human? What if a person learned programming by hacking on GPL public code and then went to build proprietary software?


It is different in the same way that a person looking at me from their window when I pass by is different from a thousand cameras observing me when I move around the city. Scale matters.


> a thousand cameras observing me when I move around the city. Scale matters.

While I certainly appreciate the difference, is camera observation illegal anywhere where it isn't explicitly outlawed? Meaning, have courts ever decided that the difference of scale matters?


No idea. I was not trying to make a legal argument. This was to try to convey why someone might feel ok about humans learning from their work but not necessarily about training a model.


This is a lovely analogy, akin to "sharing mix tapes" vs "sharing MP3s on Napster". I fear the coming world with extensive public camera surveillance and facial recognition! (For any other "tin foil hatters" out there, cue the trailer for Minority Report.)


>I fear the coming world with extensive public camera surveillance and facial recognition!

I fear the coming world of training machine learning models with my face just because it was published by someone somewhere (legally or not).


You can rest assured that this is already the case if your picture was ever posted online. There are dozens of such products that law enforcement buys subscriptions to.


A human being who has learned from reading GPL'd code can make the informed, intelligent decision to not copy that code.

My understanding of the open problem here is whether the ML model is intelligently recommending entire fragments that are explicitly licensed under the GPL. That would be a licensing violation, if a human did it.


Actually, I believe it's tricky to say whether even a human can do that safely. There's the whole concept of a "cleanroom rewrite" - meaning, if you want to rewrite some GPL or closed-source project under a different license, you should make sure you have never seen even a glimpse of the original code. If you have looked at GPL or closed-source code (or, actually, code governed by any other license), it's hard to prove you didn't accidentally/subconsciously remember parts of it and copy them into your "rewrite" project, even if "you made a decision not to copy". The border between "inspired by" and "blatant copyright infringement" is blurry and messy.

If that was already so tricky and troublesome legally, my first instinct is that with Copilot it could be even murkier territory. IANAL, yet I'd feel better if they made some [legally binding] promise that their model is trained only on code carefully verified to carry one of an explicit (and published) whitelist of permissive licenses. (Even that could be tricky, with MIT etc. actually requiring a mention in your advertising materials [which is often forgotten], but that's a completely different level of trouble than not knowing whether I'm infringing GPL, some closed-source code, or another weird license.)


> A human being who has learned from reading GPL'd code can make the informed, intelligent decision to not copy that code.

A model can do this as well. Getting the length of a substring match isn’t rocket science.
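
A rough sketch of that kind of filter (the names and the threshold are invented; difflib is in the Python standard library):

    from difflib import SequenceMatcher

    def longest_shared_run(suggestion: str, source: str) -> int:
        # length of the longest verbatim run shared by the suggestion and a known source
        m = SequenceMatcher(None, suggestion, source)
        return m.find_longest_match(0, len(suggestion), 0, len(source)).size

    def looks_recited(suggestion: str, training_snippets, threshold: int = 120) -> bool:
        # flag the suggestion if it quotes any training snippet at length
        return any(longest_shared_run(suggestion, s) >= threshold for s in training_snippets)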


But wouldn't a machine learning from AGPL code be hosting AGPL code in its memory?


Pretty sure merely hosting code doesn't trigger the AGPL; if it did, GitHub would have to be open-sourced.


Would you hire a person who only knew how to program by taking small snippets of GPL code and rearranging them? That's like hiring monkeys to type Shakespeare.

The clear difference is that a human's training regimen is to understand how and why code interacts. That is different from an engine that replicates other people's source code.


What if a person heard a song by hearing it on the radio and went on to record their own version?


There is already a legal structure in place for cover song licensing.

https://en.wikipedia.org/wiki/Cover_version#United_States_co...


Exactly so it needs licensing of some sort - this is closer to cover tunes than it is to someone getting a CS degree and being asked to credit Knuth for all their future work.


How do you distribute a human?


A contractor seems equivalent to SaaS to me


Perhaps we need GPL v4. I don't think there is any clause in current V2/V3 that prohibits learning from the code, only using the code in other places and running a service with code.


Would you be okay with a human reading your GPL code and learning how to write closed source software for closed source projects?


> To be honest, I doubt that.

Okay, but that's...not much of a counterargument (to be fair, the original claim was unsupported, though.)

> Maybe I am special, but if I am releasing some code under GPL, I really don't want it to be used in training a closed source model

That's really not a counterargument. “Fair use” is an exception to exclusive rights under copyright, and renders the copyright holder’s preferences moot to the extent it applies. The copyright holder not being likely to want it based on the circumstances is an argument against it being implicitly licensed use, but not against it being fair use.


> a closed source model

It seems like some of the chatter around this is implying that the resultant code might still have some GPL on it. But it seems to me that it's the trained model that Microsoft should have to make available on request.


That's the point of fair use: to do something with material that the original author does not want you to do.


Can you explain why you think this is covered by fair use? It seems to me to be

1a) commercial

1b) non-transformative: in order to be useful, the produced code must have the same semantics as some code in the training set, so this does not add "a different character or purpose". Note that this is very different from a "clean room" implementation, where a high-level design is reproduced, because the AI is looking directly at the original code!

2) possibly creative?

3) probably not literally reproducing input code

4) competitive/displacing for the code that was used in the input set

So failing at least 3 out of 5 of the guidelines. https://www.copyright.gov/fair-use/index.html


1a) Fair use can be commercial. And copilot is not commercial so the point is moot.

1b) This is false. It is not literally taking snippets it has found and suggesting them to the user; that would be an intelligent search algorithm. This is writing novel code automatically based on what it has learned.

2) Definitely creative. It's creating novel code. At least it's creative if you consider a human programming to be a creative endeavor as well.

3) If it's reproducing input code it's just a search algorithm. This doesn't seem to be the case.

4) Most GPLed code doesn't cost any money. As such the market for it is non-existent. Besides copilot does not displace the original even if there were a market for it. As far as I know there is not anything even close to comparable in the world right now.

So from my reading it violates none of the guidelines.


This is what is so miserable about the GPL progression. We went from GPLv2 (preserving everyone's rights to use code) to GPLv3 (you have to give up your encryption keys). I think we've lost the GPL as a place where we could solve or answer these kinds of questions, which are good ones - the GPL tanked a lot of trust with the (A)GPLv3 stuff, especially around prohibiting other developers from specific uses of the code (which is diametrically different from earlier versions, which preserved rights).


Think what you will of GPLv3, but lies help no one. Of course it doesn't require you to give up your encryption keys.


Under GPLv2 I could make a device with GPLv2 software and maintain root-of-trust control of that device if I wanted (i.e., do an anti-theft activation lock, offer a lease option of $200/month vs. $10K to buy, etc.).

Think what you will, but your lies about the GPLv3 can easily be tested. Can you point me to some GPLv3 software in the Apple tech stack?

We actually already know the answer.

Apple had to drop Samba (they were a MAJOR end user of Samba) because of GPLv3.

I think they also moved away from GCC for LLVM.

In fact - they've probably purged at least 15 packages I'm aware of and I'm aware of NO GPLv3 packages being included.

Not sure what their App Store story is - but I wouldn't be surprised if they were careful there too.

Oh - this is all lies and Apple's lawyers are wrong? Come on - I'm aware of many other companies that absolutely will not ship GPLv3 software for this reason.

In fact, even by 2011 it was clear that GPLv3 is not really workable in a lot of contexts, and alternatives like MIT became more popular.

https://trends.google.com/trends/explore?date=all&geo=US&q=%...

Apple geared up to fight the DOJ over maintaining root control of devices (the San Bernardino case).

Even Ubuntu has had to deal with this - SFLC made it clear that if some distributor messed things up ubuntu would have to release their keys, which is why they ended up with a MICROSOFT (!) solution.

"Ubuntu wishes to ensure that users can boot any operating system they like and run any software they want. Their concern is that the GPLv3 makes provisions by which the FSF could, in this case as the owner of GRUB2, deem that a machine that won't let them replace GRUB2 with something else is in violation of the GPLv3. At that point, they can demand that Ubuntu surrender its encryption keys used to provide secure bootloader verification--which then allows anyone to sign any bootloader they want, thus negating any security features you could leverage out of the bootloader (for example, intentionally instructing it to boot only signed code--keeping the chain trusted, rather than booting a foreign OS as is the option)." - commentator on this topic.

It's just interesting to me that rather than any substance the folks arguing for GPLv3 reach for name calling type responses.


You can do what you describe with the GPLv3. You'll just have to allow others to swap out the root of trust if they so please.

Everything else you write is just anecdotes about how certain companies have chosen to do things.


Let me be crystal clear.

If I sell an open source radio with firmware limiting broadcast power / bands etc. to regulated limits and ranges - under GPLv3, can I lock down this device to prevent the buyer from modifying it? I'm not talking about making the software available (happy to do that; GPLv2 requires that). I'm talking about the actual devices I build and sell (physical ones).

Can I build a Roku or TiVo and lock it down? Have you even read the GPLv3? It has what is commonly called the ANTI-tivoisation clause PRECISELY to block developers from locking down devices for products they sell / ship.

If I rent out a device and build in a monthly activation check - can I use my keys to lock the device and prevent the buyer from bypassing my monthly activation check or other restrictions?

The problem I have with GPLv3 folks is that they basically endlessly lie about what you can do with GPLv3 - when there is plenty of VERY CLEAR evidence that everyone from Ubuntu to Apple to many others who've looked at this (yes, with attorneys) says that no - GPLv3 can blow up in your face on this.

So no, I don't believe you. These aren't "just anecdotes". These are companies incurring VERY significant costs to move away from / avoid GPLv3 products. AGPLv3 is even more poisonous - I'm not aware of any major players using it (other than those playing the fake open source game).


No, you can't lock it down without letting its owner unlock it. That's indeed the point. But your original comment said you have to give up your encryption keys. That's the lie I was getting at.

Now we can debate whether or not it's a good thing that the user gets full control of his device if he wants it. I think it is. You?


That's why Apple's SMB implementation stinks! Finally, there's a reason for it, I thought they had just started before Samba was mature or something.


Yeah, it was a bit of a big bummer!

Apple used to also interoperate wonderfully if you were using Samba server side too, because - well, they were using Samba client side. Those days were fantastic, frankly. You would run Samba server side (on Linux), then Mac client side - and still have your Windows machines kind of on-network (for accounting etc.) too.

But the Samba folks are (or were) VERY hard core GPLv3 folks - so the writing was on the wall.

GPLv3 really shifted things from preserving the freedom of OTHER developers to do what they wanted with the code, to requiring YOU to do things in various ways, which was a big shift. I'd assumed that (under GPLv2) there would be natural convergence, but GPLv3 really blew that apart and we've had a bit of license fracturing since.

AGPLv3 has also been a bit weaponized to do a sort of fake open source where you can only really use the software if you pay for a commercial license.


The macOS CIFS client was from BSD, not from Samba.


BSD's have also taken a pretty strong stance against GPLv3 - again for violating their principles on freedom.

I can't dig it up right now but someone can probably find it.

But the BSD's used samba for a while as well.


As of Darwin 8.0.1 (so Tiger?) smbclient(1)'s man page was saying it was a component of Samba. I think some BSDs used Samba.


These claims are absurd. AGPL and GPLv3 carry on the same mission of GPLv2 to protect authors and end users from proprietization, patent trolling and freeloading.

This is why SaaS/Cloud companies dislike them and fuel FUD campaigns.


> ...the jurisprudence here – broadly relied upon by the machine learning community – is that training ML models is fair use.

If you train an ML model on GPL code, and then make it output some code, would that not make the result a derivative of the GPL-licensed inputs?

But I guess this could be similar to musical composition. If the output doesn't resemble any of the inputs or contain significant continuous portions of them, then it's not a derivative.


> If the output doesn't resemble any of the inputs, or contains significant continous portions of them, then it's not a derivative.

In this particular case, the output resembles the inputs, or there is no reason to use Github Copilot.


> It shouldn't do that, and we are taking steps to avoid reciting training data in the output

This just gives me a flashback to copying homework in school, “make sure you change some of the words around so it’s not obvious”

I’m sure you’re right re: jurisprudence, but it never sat right with me that AI engineers get to produce these big, impressive models while the people who created the training data will never be compensated, let alone asked. So I posted my face on Flickr - how was I supposed to know I was consenting to it benefiting someone’s killer-robot facial recognition?


Wait, I thought y'all argued Google didn't copy Java for Android; now that big tech is copying your code you're crying foul?


The whole point of that case begins with the admission "yes of course Google copied." They copied the API. The argument was that copying an API to enable interoperability was fair use. It went to the Supreme Court because no law explicitly said that was fair use and no previous case had settled the point definitively. And the reason Google could be confident they copied only the API is because they made sure the humans who did it understood both the difference and the importance of the difference between API and implementation. I don't think there is a credible argument that any AI existing today can make such a distinction.


>training ML models is fair use

How does that apply to countries where Fair Use is not a thing? As in, if you train a model on a fair use basis in the US and I start using the model somewhere else?


Fair use doesn’t exist in Germany.


I don’t think it’s fair to ask a US company to comment on legalities outside of the US.


It's fair to expect an international company pushing its products all over the world to be prepared to comment on non-US jurisdictions. (I have some sympathy for "we have a local market, and that's what we are solely targeting and preparing for" in companies where that is actually the case, but that's really not what we are dealing with in the case of Microsoft/GitHub.)


One would expect GitHub (owned by Microsoft) to have engaged corporate counsel for an opinion (backed by statute and case law), and to be prepared to disable the functionality in jurisdictions where it’s incompatible with local IP law.


You just shared a URL that says "Please do not share this URL publicly".


Well, he's also GitHub's CEO so it's probably just fine.


Would I be able to use something like this in the near future to produce a proprietary Linux kernel?


> training ML models is fair use

In what context? You are planning on commercializing Copilot, and in that case the calculus on whether or not using copyright-protected material for your own benefit is fair use changes drastically.


It isn't. US copyright law says brief excerpts of copyright material may, under certain circumstances, be quoted verbatim

----> for purposes such as criticism, news reporting, teaching, and research <----, without the need for permission from or payment to the copyright holder.

Copilot is not criticizing, reporting, teaching, or researching anything. So claiming fair use is the result of total ignorance or disregard.


This is obviously controversial, since we are thinking about how this could displace a large portion of developers. How do you see Copilot being more augmentative than disruptive to the developer ecosystem? Also, how do you see it differing from regular code completion tools like TabNine?


We think that software development is entering its third wave of productivity change. The first was the creation of tools like compilers, debuggers, garbage collectors, and languages that made developers more productive. The second was open source where a global community of developers came together to build on each other's work. The third revolution will be the use of AI in coding.

The problems we spend our days solving may change. But there will always be problems for humans to solve.


This innovation does not seem like a natural successor to compilers, debuggers and languages. If today's programming environments still require too much boilerplate and fiddling with tools, it seems like better programming languages, environments that require less setup, etc would be a better use of time. Using GPT to spit out code you may or may not understand seems more like a successor to WSDLs and UML code generators. I really hope we're just in a wild swing of the pendulum towards complex tooling and that we swing back to simplicity before too long.

Edit:

To expand a little and not sound so completely negative towards AI, it seems like there could be value in training models to predict whether a patch will be accepted, or whether it will cause a full build to fail.


If this is the drive behind the project, it seems like you are putting too many eggs in one basket. Maybe it's a good attempt to get rid of "glue" programming, but I wouldn't pay for this. It's all trivial stuff that I now need to review.

It would be a "cool tool" if it inspected the code statically and dynamically. Testing the code to see if it actually does what the AI thinks it should do. From running small bits of code on unit level to integration and acceptance testing. Suggest corrections or receive them. _That_ will save time and I and companies will pay for.

Also you cannot call this the "third revolution" if it is a paid service.


As a proponent of progress studies, I appreciate this insight. It is indeed a pragmatic view of what the industry will be, or should be. I believe another thing that would be appreciated is a pair security auditor. Most vulnerabilities in software can be avoided early on in development; I believe this could be a great addition to GitHub's Security Lab: securitylab.github.com/


Have you or 'natfriedman authored any works in a public repository, so that we can judge the validity of the pragmatic view?


I'm super interested to read more about your theory/analysis. Have you written on it in a blog or anything?


There's a good amount of discussion on this topic in "The Mythical Man-Month". The entire book is discussing the factors that affect development timeframes and it specifically addresses whether AI can speed it up (albeit from 1975, 1986 and 1995 viewpoints, and comparing progress between those points.)


Thanks! That's a great suggestion. I forgot that was in there.

I read Mythical Man Month many years ago and enjoyed it. Time for a re-read. Of course it won't cover the third wave very well though. Would love to see a blog post cover that.


Let's solve the problem of replacing CEOs next. The above paragraph could have been written by GPT-3 already.


I think this is already happening. There's credible evidence that the Apple CEO, Tim Cook, has been essentially replaced by a Siri-driven clone over the last 7 months. They march the real guy out when needed, but if you watch closely when they do, it's obvious he's under duress reading lines prepared by an AI. His testimony in the Epic lawsuit for example. They'll probably cite how seriously he and the company take 'privacy' to help normalize his withdrawal from the public space in the coming years.


This is exactly the kind of fun, kooky conspiracy theory I've missed with all the real conspiracies coming to light over the last decade or so.


Can you cite some of this credible evidence?


I think you’re looking at the problem the wrong way. This provides less strong engineering talent with more leverage. The CEO (which could be you!) gets closer to being a CTO with less experience and context necessary (recall businesses that run on old janky codebases or no code platforms; they don’t have to be elegant, they simply have to work).

It all boils down to who is capturing the value for the effort and time expended. If a mediocre software engineer can compete against senior engineers with such augmentation, that seems like a win. Less time on learning language incantations, more time spent delivering value to those who will pay for it.


That's not really how it's going to go though. Just look at what your average person is able to accomplish with Excel.

Your own example of the CEO becoming a CTO can be used in every level and part of the business.

Now the receptionist is building office automation tools because they can describe what they want in plain English and have this thing spit out code.


> Just look at what your average person is able to accomplish with Excel.

Approximately nothing.

The average knowledge worker somewhat more, but lots of them are at the level of “I can consume a pivot table someone else set up”.

Sure, there are highly-productive, highly-skilled excel users that aren't traditional developers that can build great things, but they aren’t “your average person”.


https://news.ycombinator.com/item?id=24791017 (HN: Excel warriors who save governments and companies from spreadsheet errors)

https://news.ycombinator.com/item?id=26386419 (HN: Excel Never Dies)

https://news.ycombinator.com/item?id=20417967 (HN: I was wrong about spreadsheets)

https://mobile.twitter.com/amitranjan/status/113944938807223... (Excel is every #SAAS company's biggest competitor!)


Yes, Excel “runs the world”, and in most organizations, you’ll find a fairly narrow slice of Excel power users that build and maintain the Excel that “runs the world”.

We may not call them developers or programmers (or we might; I’ve been one of them as a fraction of my job at different times, both as a “fiscal analyst” by working title and as a “programmer analyst” by title), but effectively that's what they are, developers using (and possibly exclusively comfortable with) Excel as a platform.


Well, agree to disagree here as I've seen it with my own eyes, but it's kind of beside the point.

Is it a coincidence that the same company that makes Excel is trying to… “democratize” and/or de-specialize programming?

I don’t really think so, but shrug.


LOL. But you actually make a good point here. GPT-3 can replace most comms / PR type jobs since they all sound like Exec-speak.


Usually I agree, but I think Nat's comment here makes perfect sense and isn't just some PR buzzword stew. Tools like these are basically a faster version of searching Stack Overflow. You could have suggested that things like GitHub and Stack Overflow would replace programmers, since you could just copy and paste snippets to write your code.

And sure, we do now have tools like Squarespace which fully automate making a basic business landing page and online store. But the bar has been raised, and we now have far more complex websites without developer resources being wasted on making web stores.


natfriedman is a human being like you and me, not an AI; let's treat them with consideration for that.


Perhaps he should go easy on the euphemisms then and show respect for the developers who wrote the corpus of software that this AI is being trained on (perhaps illegally).


OK, then ask him to go easy! Great idea, and it might get a good response.


How many jobs have developers helped displace in business and industry? I don't think it's controversial that we become fair game for that same automation process we've been leading.


>How many jobs have developers helped displace in business and industry? I don't think it's controversial that we become fair game for that same automation process we've been leading.

historically when has that sort of 'tit-for-tat' style of argument ever been helpful?

the correct approach would be "we've observed first hand the problems that we've caused for society, how can we avoid creating such problems for any person in the future?"

It might seem self-serving, and it is, but 'two wrongs don't make a right'. Let's try to fix such problems rather than serving our sentence as condemned individuals.


> historically when has that sort of 'tit-for-tat' style of argument ever been helpful?

It's not tit-for-tat, it's a wake up call. As in, what exactly do you think we've been doing with our skills and time?

> ""we've observed first hand the problems that we've cause for society"...

But not everyone agrees that this is actually a problem. There was a time when being a blacksmith or a weaver was a very highly paid profession, and as technology improved and the workforce became larger, large wages could no longer be commanded. Of course the exact same thing is going to happen to developers, at least to some extent.


> How many jobs have developers helped displace in business and industry?

How many?

> I don't think it's controversial that we become fair game for that same automation process we've been leading.

This is not correct. A human (developer) displacing another human (business person) is entirely different than a tool (AI bot) replacing a human (developer).

Regardless, this is the Lump of Labour fallacy (https://en.wikipedia.org/wiki/Lump_of_labour_fallacy).

In this case, it is assumed that the global amount of development work is fixed, so that, if AI takes a part of it, the equivalent workforce in terms of developers will be out of a job. Especially in the field of SWE, this is obviously false.

It also needs to be seen what this technology will actually do. SWE is a complex field, way more than typing a few routines. In best case (technologically speaking) this will be an augmentation.


> A human (developer) displacing another human (business person) is entirely different

That's not what is happening though, a few developers replace thousands of business and industry people with automated tools. Say, automated route planning for package delivery, would take many thousands of humans if not for the AI bots that do the job instead.

> SWE is a complex field, way more than typing a few routines. In best case (technologically speaking) this will be an augmentation.

Of course there will always be some jobs for humans to do. Just like there are still jobs for humans loading thread into the automated looms and such.

But your arguments against automation displacing programming jobs ring hollow. People said the same thing about chess playing programs, they would never be able to understand the subtlety or complexity like a human could.


> That's not what is happening though, a few developers replace thousands of business and industry people with automated tools. Say, automated route planning for package delivery, would take many thousands of humans if not for the AI bots that do the job instead.

Without reading and understanding the Lump of Labour fallacy, you can't understand the relation between the fallacy and the displacement of jobs. In short, the fallacy is not incompatible with the displacement argument; the difference is in the implications.

> But your arguments against automation displacing programming jobs ring hollow. People said the same thing about chess playing programs, they would never be able to understand the subtlety or complexity like a human could.

Chess is a finite problem, SWE isn't, so they can't be compared.


Nope, before the modern approach to shipping stuff you simply couldn't get many different things unless you were in a big city. There weren't humans doing the route planning, there was no one because it wasn't worth doing at all.


> SWE is a complex field, way more than typing a few routines. In best case (technologically speaking) this will be an augmentation.

If there is a pathway to improving this AI-assist efficiency, say by restricting the language, methodology, UI paradigm and design principles, it will happen quickly due to market incentives. The main reason SWE is complex is that it's done manually in myriad subjectively preferred ways.


Indeed. It should be the goal of society to automate away as much work as possible. If there are perverse incentives working against this then we should correct them.


1. How do you define work differently from "that which should be automated"?

2. While I agree with your stance, it is not by itself sufficient. If you provide the automation but you do not correct the perverse incentives (or you worry about correcting them only later) that you mention, then you are contributing to widening the disparity between a category of workers (who have now lost their leverage) and those with assets and capital (who have a reduced need for workers).


I agree, the fact we're even talking about this is evidence that our society has the perverse incentive baked in and we should be aware of and seek to address that.

Regardless, programmers would be hypocritical to decry having their jobs automated away.


That's why it's best to get unions or support systems (like UBI) before they're needed. It's hard to organize and build systems when you have no leverage, influence, or power left.


Why is the disparity bad?


What do you mean by "bad"? If you're asking why it makes sense to structure society with an eye toward avoiding disparity, then it's enough to just observe empirically that people have an aversion to unfair treatment. And not just people: https://en.wikipedia.org/wiki/Social_inequity_aversion

If you're asking why do people respond the way they do to disparity, then I can only speculate that it has something to do with the meaning of life.


Human beings need something to do to have a fulfilling life. I do not agree at all that the ultimate goal of society is to automate everything that’s possible. I think that will be horrible overall for society.


My job is probably the least fulfilling activity in my life and I'm sure that goes for a lot of people.

By your reasoning, maybe we don't need backhoes and should just hire a bunch of guys with spoons instead?


I typically find other things fulfill my life more than work.


I would be up for that, if said society did not leave us destitute as a result


Nothing is inevitable. Doctors and lawyers have protected their professions successfully for centuries.

Only some software developers seem interested in replacing themselves in order to enrich their corporate masters (mains?) even further.

Just don't use this tool!


This tool only replaces a small part of a good programmer and just further highlights the differences between people blindly writing code and people building actual systems.

The challenge in software development is understanding the real world processes and constraints and turning them into a design for a functional & resilient system that doesn't collapse as people add every little idea that pops into their head.

If the hard part was "typing in code" then programmers would have been replaced long ago. But most people can't even verbally explain what they do as a series of steps & decision points such that a coherent and robust process can be documented. Once you have that it's easy to turn into code.


> Doctors and lawyers have protected their professions successfully for centuries.

And one could argue that this means we all pay more for health and legal services than we otherwise would. You have to calculate both costs and benefits; what price does society pay for those few people having very high paying jobs?


This feels like people protesting against automation by not using the automated checkout machines at a store. Go ahead, faster queues for me.


> This is obviously controversial, since we are thinking about how this could displace a large portion of developers.

It... couldn't, on net.

Tools which improve developer productivity increase the number of developers hired, the number of tasks for which it is worthwhile to employ them, and the market clearing price for development work.

See, for examples, the whole history of the computing industry as we’ve added more layers of automation between “conceptual design for software” and “bit patterns in hardware implementing that conceptual design as concrete software”.

It might displace or disadvantage some developers in specific (though I doubt a large portion) by shifting the relative value of particular subskills within the set used in development, I suppose.


I agree with this viewpoint.

A tool which increases how rapidly we can output code—correct code—would allow for more time spent on hard tasks.

I can see the quality of some "commodity" software increasing as a result of tools in this realm.


In just 2-3 years' time, DL algorithms will have as many synapses (parameters) as the human brain. The only thing needed to teach such an algorithm to program better than us is to tell it that the program needs to be faster, more secure and more user friendly (it's not impossible to teach this to a DL algorithm). This "tool" will make our work 90% easier in the next few years, so unless we have much more work to do, we will earn much less and juniors will most likely not be needed anymore...


Intel put itself out of business since all the world's computing needs can now be done on one CPU.

Or, perhaps, lower unit costs of computing lead to far far greater demand for computing since it became practical to apply to more domains.


I have been using this - for example working in Go on Dapr (dapr.io) or in Python on one of its SDKs.

I love it. So often the code suggestions accurately anticipate what I planned to do next.

It's especially fun to write a comment or doc string and then see Copilot create a block of code perfectly matching your comment.


I'm glad they find it head-exploding, but my concern is that it would be most head-exploding to newbies who don't have the skill to discern whether AI code is how it should be written.

For a seasoned veteran writing the code was never really the hard part in the first place.


> For a seasoned veteran writing the code was never really the hard part in the first place.

Yes, to most coders this Copilot software is just a fancy keyboard.


Sounds great. I'm a bad typist, anything that makes me type less (vim, voice assistant, completion) is a big win to me


Vim is great because it doesn't try to be smart.


And you can't quit it so you are forced to learn it well.


If I put a section in my LICENSE.txt prohibiting use as training data in commercial models, would that be sufficient to keep my code out of models like this?


In the end this would slightly increase likelihood of such sections appearing in licenses generated by AIs.


> If I put a section in my LICENSE.txt prohibiting use as training data in commercial models, would that be sufficient to keep my code out of models like this?

Neither in practice (because it doesn't look for it) nor legally in the US, if Microsoft's contention that such use is "fair use" under US copyright law holds up.

That “fair use” is an Americanism and not a general feature of copyright law might create some interesting international wrinkles, though.


Their contention is

> Why was GitHub Copilot trained on data from publicly available sources?

> Training machine learning models on publicly available data is now common practice across the machine learning community. The models gain insight and accuracy from the public collective intelligence. But this is a new space, and we are keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models.

Personally, I'd prefer this to be like any other software license. If you want to use my IP for training, you need a license. If I use MIT license or something that lets you use my code however you want, then have at it. If I don't, then you can't just use it because it's public.

Then you'd see a lot more open models. Like a GPL model whose code and weights must be shared because the bulk of the easily accessible training data says it has to be open, or something like that.

I realize, however, that I'm in the minority of the ML community feeling this way, and that it certainly is standard practice to just use data wherever you can get it.


When I referenced their contention on Fair Use, that's not what I was referencing, but instead Github CEO Nat Friedman’s comment in this thread that “In general: (1) training ML systems on public data is fair use”.

https://news.ycombinator.com/item?id=27678354


> however you want

I don't see any attribution here.

MIT may say "substantial portions" but BSD just says "must retain".


would be interesting if someone uploaded a leaked copy of the NT kernel, then coerced the system to regurgitate it piece by piece

would MS's position then be different?


Don't make your code public. Someone could read it and train the model in their brain to synthesize some code based on it.

If it's publicly available then it's fair game to use it to learn and base ideas on.


Only if they trained a model to be able to read and understand LICENSE.txt files -- wowzers what a monster improvement that would be for the world

Or, I guess a sentinel phrase that the scraper could explicitly check: `github-copilot-optout: true`


Or it could explicitly check for known standard licenses that permit it, if it were opt in instead of opt out, the way most everything else in software licensing is opt-in for letting others use.


This looks really cool. Do you plan to release this in some other form like a language server so that it can be easily integrated to other editors?


Have there yet been reports of the AI writing code that has security bugs? Is that something folks are on the lookout for?


I haven't seen any reports of this, but it's certainly something we want to guard against: https://copilot.github.com/#faq-can-github-copilot-introduce...


Has there been an attempt to train a similar ML model on a smaller dataset of standards-compliant code? e.g. MISRA C.

I started working at a healthcare company earlier this year, and my whole approach to software has needed to change. It's not about implementing features any more - every change to our embedded code requires significant unit testing, code review, and V&V.

Having a standards-compliant Copilot would be wonderful. If it could catch some of my mistakes before I embarrass myself to code-reviewing colleagues, the codebase would be better off for it and I'd be less discouraged to hear those corrections from a machine than a person.


Is there a public API? Will it be documented? Are you open to folks porting the VSCode plugin to other editors (I.e. kakoune’s autocomplete)?


Hi Nat! Just signed up for the preview (even though I'm the type to turn off intellisense and smart indent). I was wondering if WebGL shader code (glsl) was included in the training set? Translating human understandable graphics effects from natural language is a real challenge ;)


The technical preview document points out that editor context is sent back to the server not just for code generation but as feedback for improvement. Are you (or OpenAI) improving the ML models based on the usage of this extension? It is interesting what the pricing will look like given that the model was originally trained on FOSS and then you go and harvest test cases from real users. If that’s the case I think that should be clearly explained upfront.


Has this been tested for accessibility yet, particularly with a screen reader?



Are those developers worried about having their jobs replaced by a code-writing AI? :)

I mean... why would 95% of developer jobs exist with this tech available?

You just need that 5% of devs who actually write novel code for this thing to learn from.


https://en.wikipedia.org/wiki/Profession_(novella)

An Isaac Asimov story about someone who didn't take to the program and, as a result, got picked to create new things because someone has to make them.


I love this story.

If you want to read the whole thing, it's here:

https://www.abelard.org/asimov.php


Is there any way to port this into Emacs?


This is impressive. And scary. How long has your team been working on this first release?


Cool project! Have you seen any interesting correlations between languages, paradigms and the accuracy of your suggestions?


I assume this will also work alright on other languages that aren't in the demo, e.g. C++?


One question: how long is the waitlist? Very excited to try this!


I don't think we need to start looking for new career paths yet. This example has a few bugs and it took me longer to track them down than it would have to write it myself:

  #!/bin/bash
  # List all python source files which are more than 1KB and contain the word "copilot".
  find . \
    -name "*.py" \
    -size +1000 \
    -exec grep -n copilot {}\;
"-exec grep -n copilot {}\;" needs to have a space before the semicolon otherwise find fails with "find: missing argument to '-exec'".

The "1000" in "-size +1000" has a unit of 512 byte blocks so it is looking for files that are greater than 512000 bytes, not 1KB. This would be very easy to miss in a code review and is one of those terrible bugs that causes the code to mostly work, but not quite.

https://linux.die.net/man/1/find


It also doesn't list the files. It prints all matching lines (and their line numbers), without the corresponding filenames.


You are right! I missed that. The number of bugs to lines of code ratio is approaching one.


The following line is quite widespread in use, but not as portable as it could be:

    #!/bin/bash
For increased portability, respect users' PATH environment variable using:

    #!/usr/bin/env bash
Using #!/bin/bash could lead to subtle bugs on systems with more than one version of bash installed, or outright breakage on systems where bash isn't installed at /bin/bash. (OpenBSD is known to use /usr/local/bin/bash, for example.)


That isn't a Copilot bug though, it was typed by a person.


I wonder if copilot would have produced better code if the user had used /usr/bin/env instead. If you already have buggy code, will copilot also suggest buggy code?


The main argument against Copilot for me: it takes longer to grok existing code than to just write it from the ground up.


This is actually a pretty important thing to understand. It can be better to rewrite than fix an existing mess. It’s similar to construction work in that sense: if you rebuild, you know what’s inside the walls.


If your house doesn't have fire proofing (Sheetrock / lath & plaster) then you know what's inside the walls.


Kind of comical to open a rebuttal with “if”, showing you don’t in fact know. That aside - no, you don’t know necessarily.


That's why this copilot won't fly. The junior programmer will not be able to spot subtle errors but will kind-of feel "productive" by some random pastes from a giant brain, which cannot be interrogated.

If anything, I see copilot generating more work for existing, senior programmers - so there you have it.


Is that your main argument against Stack Overflow? Because for some percentage of people the use cases will be similar (that is, learning some quick ways to do something that they can then explore and learn about).

Sometimes resources like these are used as references and the solution is used as is. Sometimes they are used as a survey, and it's more like asking a librarian "I want to learn more about how to X" and having them give a short exposition and point out some sources to study. It's important not to let your view of its usefulness as one type of resource bleed into your view of how useful it might be for the other.

As a learning resource, this is very interesting.


When one gets snippets from Stack Overflow, he/she doesn't assume it is tailored specifically for the task, so extra checks and modifications are applied. With Copilot, people (eventually, especially beginners, probably) will assume that the code produced by "magic" AI does exactly what they asked for.


Funnily this problem seems trivial when you have good test coverage for your code. Especially trivial compared to the insane amount of help you could get in writing boilerplate and simple logic from a magical tool like this, if it delivers on its promises.


Plus if you use this for anything non-trivial, it's functionally similar to using a third-party dependency, except the licensing terms are vague and there's no one to ask for help.

I could see it being useful maybe for testing. Even if it straight up copies code from the Linux kernel, you don't usually ship tests to customers so the GPL (probably) isn't a problem.

But the problem with testing is that you need to be 100% sure about how it behaves, and an AI generated test might not be reliable enough to be useful.


I'd argue the danger is even greater with tests because the entire point of tests is to provide certainty. Wrong tests are a false sense of safety that's unquantifiable without understanding every single line.


I read this as a criticism of bash


I don't see why. It is on Copilot to produce syntactically correct code that at least doesn't fail to run, even if we ignore the correctness.


This doesn't really have anything to do with bash, just the 'find' POSIX utility. But I agree that the arguments aren't intuitive.


Give them a year. It sounds like the things people said about chess engines that played stupidly

Just kidding... writing imperative code is fundamentally different than most AI recognition tasks, ie you can have GPT-4 produce consistently nice HTML but not C++


Also: "-exec grep {} \+" will be a lot faster if you have many files, and you will get the actual filename too instead of just the line number which is what the comment says it should do.


I'm convinced I'd make the same mistake.


I am not surprised. Bash is notoriously unreadable.


There is no bash code in that script, it's POSIX find.


POSIX find is also terrible. I'm sorry but I go out of my way to avoid using it as it's always a pain.


I'm amazed to see how positive the overall response is to this idea. Almost as if programmers think that writing programs is the worst part of the job and ready to be automated away.

As someone more aligned with the Dijkstra perspective, this seems to me like one of the single worst ideas I've ever seen in this domain.

We already have IDEs and other tools leading to an increase in boilerplate and the acceptance of it because they make it easier to manage. I can only imagine what kinds of codebases a product like this could lead to. Someone will produce 5000 lines of code in a day for a new feature, but only read 2000 of those lines. Folks that still expect you to only check in code you understand and can explain will become a minority.

I wonder how long it will be until someone sets up the rest of the feedback loop and starts putting up github projects made of nothing but code from this tool, and it can start to feed on itself.

Cargo-cult programming has always been a problem, but now we're explicitly building tools for it.


I mean you say this, but you and most likely the majority of programmers rely on dozens of repositories, packages and libraries with likely zero deep understanding of it (and at the very least haven't read the source code of ) so I don't really understand the difference here.

The advantage of something like this is that instead of having to go to stack overflow or any number of reference sites and copy pasta it can just happen automatically without me having to leave my IDE.

The enjoyable part of programming for me is not typing the Ajax post boilerplate bullcrap for the millionth time, it's the high-level design and abstract reasoning.


I really wonder who those folks copy-pasting from Stack Overflow all day are. I only rarely find pieces of code that I can copy-paste. Typically Stack Overflow only gives me an idea of how to solve something, but incorporating that idea into my code base is still not trivial.


A few weeks ago someone on a call said "we all know that programming is mostly copy and pasting anyway" A few people laughed, but I said that if I catch myself copy and pasting then I know that something is very wrong. It was kind of awkward, but I didn't like my job being trivialized by people who never really did it.

It would be like if I said plumbing or auto repair is just watching youtube videos and going to lowes. Just because I've managed to do a few simple things, doesn't mean I'm in a position to belittle an entire profession.

That said, I am also shocked by how many full time developers don't take the time to understand their own code. Let alone the libraries they use.


> That said, I am also shocked by how many full time developers don't take the time to understand their own code. Let alone the libraries they use.

Me too, then I understood that code and programming is commoditized. As long as it works and looks pretty on the outside and it can be sold, it's fair game.

"There'll be bugs anyway, we can solve these problems somehow" they probably think.

Heck, even containers and K8s are promoted with a "Developers are unreliable in documenting what they've done. Let's make things immutable so they can't monkey around on running systems and make undocumented changes" motto.

I still run perf on my code and look at IPC and cache thrashing ratio numbers and try to optimize things, thinking "How can I make this more efficient so it can run faster on this?". I don't regret that.


> A few weeks ago someone on a call said "we all know that programming is mostly copy and pasting anyway"

That's a dog whistle to find a better company/location.


Some people consider using Stack Overflow as heavy inspiration to be equivalent to "copy and pasting." It's linguistic shorthand, really.


I am one of those. Unless it's a few lines of standard library calls for the behavior I'm seeking, if I am copying and pasting a function over, it's as a template.

I then modify the majority of the answer to fit any special criteria outside of the general case I asked, or more frequently, modify the code beyond the minimum viable answer to fit in the test suite/logging framework/performance monitoring/etc that is involved in my platform

Still call it copy and pasting in casual conversation even if there’s not a single line that’s recognizable between where I started in stack overflow vs where I ended up


> That said, I am also shocked by how many full time developers don't take the time to understand their own code. Let alone the libraries they use.

Own code would be bad but libraries? That's kinda the idea of an abstraction layer.


I always look into the internals of libraries I use; it's usually easier and faster than reading the documentation. Sure, there are some libraries that are so advanced that this becomes difficult, but you should still know what the library is doing and how the code is organized.

The idea of an abstraction layer is to make it easier to read and write code that is at a different layer. Many of us write our own abstractions; it's not because we don't understand what they're doing.


>That said, I am also shocked by how many full time developers don't take the time to understand their own code. Let alone the libraries they use.

So many STLs are fucking unreadable. Golang's is the only one where I've actually enjoyed it a bit.


> I really wonder who those folks copy-pasting from Stack Overflow all day are.

I can think about people who can do this happily. Some of them are professional programmers. Some are self-taught. Some have CS education. Seriously.

OTOH, I'm similar to you. I either re-use my own snippets or read SO or similar sites to get an idea generally how a problem is solved and adapt it to my code unless I find the concept I'm looking for inside the language docs or books I have.

Yes, I'm a RTFM type of programmer.


I had a colleague who once tried to copy-paste a Scala snippet into Python code and came to complain that it doesn't work. We're no longer colleagues.


Yep. And now imagine these people with Github Copilot in their arsenal. God help us all.


Well, at least Copilot will bring some kind of "intelligence" to the party.


There is certainly a balance. When I want to implement feature X a client has requested but I have to deal with home grown database abstraction layers and custom AJAX API structures - I get the feeling that a third party library probably does it better and has more eyes on the code than exist at my company.

That said I would probably not look to a third party library to just to simple data transformation stuff. Probably the only thing I do copy almost verbatim from SO are things like Awk/Sed commands that are easy/low risk to test but would take hours to derive myself.


Thank you! I've never understood this meme about copy-pasting. I honestly rarely end up using SO and when I do, I almost never copy paste from there. It's not because it's some kind of weird principle I have - I will copy paste if I think it makes sense - it's just rare that I want the exact thing they have written there (it's in JS and I'm using TS, it's using another library than I am, etc) and generally I'm looking for information rather than code.


Importing an external, tested, reliable dependency is completely different from anonymous non-checked untested code in your repository committed by someone who did not even read it.

Check out the memoize example. That fails as soon as you pass anything non-primitive but there’s no one documenting that.


> anonymous non-checked untested code

What? It's not anonymous, it's still committed by a dev. It can be non-checked and untested, that's true. But it's not any less untested than any other code. If you choose not to write tests for your code, this won't change anything.

The only issue I see with this is it being potentially unchecked. And the solution to that is reading all the code you commit, even though it's generated by AI.


It's about affordances. As presented, this tool streamlines copy-pasting random snippets. The easier something is, the more people do it.

Testing doesn't even enter the picture here, we're at the level of automating the stereotypical StackOverflow-driven development - except with SO, you at least get some context, there's a discussion, competing solutions, code gets some corrections. Here? You get a black-box oracle divining code snippets from function names and comments.

> the solution to that is reading all the code you commit, even though it's generated by AI

Relying on programmer discipline doesn't scale. Also, in my experience, copy-pasting a snippet and then checking it for subtle bugs is harder than just reading it, getting the gist of it, and writing it yourself.


> Programmer discipline doesn’t scale

Thank you for putting this so eloquently. This has basically been the sole tenet of my programming philosophy for several years, but I’ve never been able to put it into words before.


> As presented, this tool streamlines copy-pasting random snippets.

It synthesizes new code based on a corpus of existing code.


Yes. Given how it does it, this makes it even more dangerous than if it was just trying to find a matching preexisting snippet.


>Tests without the toil. Tests are the backbone of any robust software engineering project. Import a unit test package, and let GitHub Copilot suggest tests that match your implementation code.

It looks to me like they're suggesting you use Copilot to write the tests.


Who wrote that snippet? Who knows? That’s anonymous. Just because I committed it it doesn’t mean I wrote it. There’s no link to the source, so it’s anonymous, even if committed by me. This is 100% equivalent to copying a whole function from StackOverflow but without placing a link to the answer for context.


A programmer who commits untested sloppy code of their own writing, will do it regardless of having access to such a service. Nothing will make me commit the generated code without testing it. I think this tool could take care of the boilerplate and the rest will still be on the programmer. At least in the near future.


anonymous non-checked untested code is problematic in all cases. This doesn't change that.


You don't see the difference between relying on a few battle-hardened libraries, and copy-pasting into your own code some mishmash of similar-looking code that other people wrote, which is probably something like what a machine learning model thought you meant? Maybe we're in worse shape than I thought.

> The advantage [...] without me having to leave my IDE.

You're arguing for the convenience, my point was that that convenience creates a moral hazard, or if you prefer, a perverse incentive, to increase the number of lines of code, amount of boilerplate, code duplication, and to accept horrible, programmer-hostile interfaces because you have tied yourself to a tool that is needed to make them usable.

> Ajax post boilerplate

This is an argument for choosing the most appropriate abstractions. The problem with boilerplate isn't that you have to type it, it's that it makes the code worse: longer, almost certainly buggier, harder to read and understand, and probably even slower to compile and run. You could have made an editor macro 20 years ago to solve the typing boilerplate problem, but it wasn't the best answer then and it isn't now.


> The advantage of something like this is that instead of having to go to stack overflow or any number of reference sites and copy pasta it can just happen automatically without me having to leave my IDE.

In the examples, I wish the auto-generated code came with comments or an explanation like SO does. The code I need help with the most is the code that's a stretch for me to write without Googling. The code I can write in my sleep I'd rather just write without a tool like this.


> I mean you say this, but you and most likely the majority of programmers rely on dozens of repositories, packages and libraries with likely zero deep understanding of it (and at the very least haven't read the source code of ) so I don't really understand the difference here.

I spend probably half my coding time testing and digging into those libraries, because I don't understand them and because they cause performance issues, since nobody on the team understands them sufficiently to make their "high level design and abstract reasoning" accurate.

One problem with the current world of programming tools is that there's no good way to know which libraries are suitable for use when correctness and performance and reliability really matters, and which are only really meant for less rigorous projects.


> you and most likely the majority of programmers rely on dozens of repositories, packages and libraries with likely zero deep understanding of it

Perhaps AI should work on simplifying the existing stack first, without breaking the functionality. What about that?


if you are typing out boilerplate you should look to abstract it away


Oh god please don't do this indiscriminately. If you're typing out boilerplate, document it and add a generator for it. I've been bitten probably hundreds of times by bad abstractions created to save some keystrokes that turned 50 lines of boring easily readable code into an ungrokable dense mess.


"Okay, so you pass the function a lambda. And the input parameter to that lambda is another function that itself consumes a list of lambdas. And this is so that you don't have to init and fill in a few dictionaries OR because you might need to otherwise use an if-statement."

I like abstractions as much as the next person, but oftentimes you can just make do with the exact thing.


What do you mean "add a generator for it"? Do you mean something like a templating for source code, like the C pre-processor?

I think there is a pro and a con to that approach.

The pro is that there is a meaningful and familiar intermediate representation --- the output of the C pre-processor is still C code. Another example is https://doc.qt.io/qt-5/metaobjects.html

The con is that, well, it introduces a meta layer, and these meta layers are more often than not ad-hoc and ultimately became quite unwieldy. It's a design pattern that suggests that there is a feature missing from the language.


No absolutely not, I think that's the worst of both worlds. I mean something like `rails generate` where it's a parameterized block of generated code that you insert inline and then edit to your needs.

The disadvantage is that making sweeping changes is more work. The advantage is that making sweeping changes can be done incrementally. But the big win with code generators is that all you need to understand what's happening is right in front of you instead of having to mentally unwind a clunky abstraction.

Don't get me wrong: if you have a good abstraction that reduces the use of mental registers, do it! But you would and should do that regardless of boilerplate.
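
To make the generator idea concrete, here's a sketch of that workflow using Rails' generators (the model name and fields are purely illustrative):

  # Emit a parameterized block of boilerplate in place:
  bin/rails generate model Post title:string body:text
  # This writes an ordinary model class, a migration, and test stubs into the
  # project. From here on you edit them by hand like any other code, so the
  # generated "abstraction" never hides what's actually running.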


It's a design pattern that suggests that there is a feature missing from the language.

This is the part where I’ve been entertaining the idea of solving things at a language level (transpiler). It’s a dangerous idea because if we get better tools to extend the language and everyone does it, we’re all going to be fucked.

But I can’t shake the idea that some of our frameworks, if incorporated at a language level, could entirely eliminate the boilerplate.


I think this goes beyond one project. In your lifetime you just have to write certain things again and again and then you have to write the abstractions again and again.

Maybe that warrants a library, but then you also have to hook that up with the ever so slightly different boilerplate code.

If this 90% of the easy stuff is done for you, that gives you more time to focus on the 10% that matter.


I'd be willing to bet a reasonable amount that there's a large future for "subtractive software development" (maybe a slightly misleading or unfair term, since it'd include bugfixes).

Once we have multiple proven technologies that handle each of the functional areas that we collectively need, then we'll start to find greater benefit in maintenance, bugfixes, and performance improvements for those existing technologies and their dependencies than we find writing additional code and libraries.


> you and most likely the majority of programmers

You might be right (I haven't met the majority, I have no idea) about "the majority of programmers", but it seems a little rude to assume the person you are responding to is some talentless hack that can only import things off the internet they don't understand.


The point of APIs and libraries is abstraction: you can understand and use without the details.

(Now... you may argue that apis are often badly designed and implemented... but at least they are trying).


One problem with IDEs is that they can be antagonistic to good practices such as writing comprehensible code, small code bases, and good documentation.


I think as our field evolves, more work will be dealing with high level abstractions. There is a massive need for distributed systems design. Companies have big ambitions, but not enough labor to accomplish them.

There will still be plenty of low level systems programming work. The field is growing, not shrinking.

One impact this may have is that it may make tasks easier and more accessible, which could bring lots of new talent and could also apply downward force on wages. But the counter to that is that there is so much more work to be done.

I'm all for new tools.


Trying to figure out why in the world your comment was down voted.


You can only solve relatively small problems this way. As I get older, I like the physical act of programming less and less, and just want to solve problems so I can get going on all of my ideas backlog. I've been programming almost every day for the last 38 years. What I really want to do is solve (my) problems.


I really agree with this even though my experience is much smaller than yours.

The biggest place I think it is frustrating to write code is for ML pipelines where you know what you need, but it takes a few hours to wrangle files and pandas until you can run your idea/experiment.


Ermahgerd, yes! Doing this right now.


But programming is the greatest fun on earth. Wait no, being able to work on your ideas and communicating them to the computer is the greatest fun on earth. Now, if your only tools for communication are made by "productive developers", this is where the problem is. Not with programming itself.


I think that there's a division between "productive" developers and "meticulous" developers. I know that I'm not the first one. My best days are when I'm removing code. I'm very wary of using frameworks and huge libraries. I've learned a few frameworks and libraries, chosen the few that correspond to my style, and I'm very careful when it comes to adopting new ones. I prefer to spend a week coding an auth layer rather than installing some SaaS connector and calling it a day. I prefer to spend a few days reading source code and developing my own solution (or just discarding the whole idea) rather than quickly googling something and moving on.

Maybe I'm just unprofessional, I don't know. I get my stuff done, maybe not as fast, but at least I understand every bit of my code and I rarely have unexpected surprises. I understand that there are other approaches, but I just don't enjoy working that way, so I follow mine as long as I can find work. And I actually like things that other people find boring, according to this thread. Writing "business code" - hate it, writing an "auth layer" - love it.


I'm the same. I tend to get derailed on making my code philosophically "right" and aesthetically "soothing" (for lack of a better word), even when it doesn't obviously matter to the scope of the project, rather than just getting it to the point where it _works_ by some operation of the Holy Spirit. Unsurprisingly, I'm the "is he working?" guy (I may fit in the attention disorder category that was a point of discussion in the confidence thread[1] the other day). But at least I'm not the "his code broke our shit again" guy.

[1] https://news.ycombinator.com/item?id=27533988


I don't think you're unprofessional. In fact, your sentiment is a belief that was strongly held by early UNIX programmers. Two quotes I particularly like:

"The real hero of programming is the one who writes negative code." -- Douglas McIllroy

"One of my most productive days was throwing away 1000 lines of code." -- Ken Thompson

But unfortunately, we've come to a situation where SLOC and innovation for the sake of innovation is more important than code quality.


Sounds like you should be working on libraries instead of products then?!


> I think that there's a division between "productive" developers and "meticulous" developers.

There’s a huge difference between:

1. Writing the core logic for an algorithm from scratch

2. “Connecting things”: e.g. writing the “click” handler for a button press, adding an auth middleware to a HTTP request handler

The latter consists mostly of boilerplate code: code that thousands of developers have already written. Copilot is useless for the former, and great for the latter. It has no idea what algorithm you want to implement, since you may be the first to write it, but it has seen the code of countless other individuals who needed to write the same boilerplate.


> Almost as if programmers think that writing programs is the worst part of the job and ready to be automated away.

writing the programs is definitely boring garbage work. Typing is so slow and annoying - hence autocomplete being a standard tool. This is, to me, just fancy autocomplete.

> to an increase in boilerplate and the acceptance of it because they make it easier to manage.

Boilerplate optimizes for the right things - code that's easier to read and analyze, but takes longer to write. IDEs and tools like this cut the cost of writing, giving us easier to read code and easier to analyze code for free.

IDEs have supported templates forever. I never write out a full unit test, I type `tmod` and `tfn` and then fill in the blanks. This is basically the same thing to me.

> Folks that still expect you to only check in code you understand and can explain will become a minority.

This isn't true at all. Having used TabNine I don't have it write code for me, it just autocompletes code that's already in my head, give or take some symbol names maybe being different.

All this is is a fancy autocomplete with a really cool demo.


Boilerplate is only easier to read and analyze if you can be sure it is consistent, so you can tune it out. Usually though, there is this one getter method that is not quite like the others and you literally will not see the difference until it bites you.

We'll need more IDE enhancements, to highlight interesting pieces and desaturate standard boilerplate...


When I think of boilerplate I think of context that is made explicit. Things like type annotations, longer variable names, the lifetime or attributes of some class or data etc. These things are extremely helpful for a number of things - they convey context from writer to readers, they aid in proving the code correct, and they can make code faster.

The context almost always exists in the writers head. We all have a specification of our program based on our expectations, and we type out code to turn that model into an implementation. We only spend so much time conveying that model though - most of us don't write formal proofs, but many of us will write out type annotations or doc comments.

The cost is usually as simple as expressing and typing out the model in our head as code. Languages that are famous for boilerplate, like Java, enforce this - and it makes writing Java slower, but can also make Java code quite explicit (I'm sure someone will respond talking about confusing Java code, that's not the point).

Reducing the cost of conveying context from writer to reader means we can convey more of that context in more places. That's a huge win, in my opinion, because I've personally found that so much implicit context gets lost over time, but it can be hard to always take the time to convey it.

Think about how many programs you've read with single character variable names, or no type annotations, or no comments. The more of that we can fix, the better, imo.

Tools like this do that. TabNine autocompletes full method type signatures for me in rust, meaning that the cost of actually writing out the types is gone. That's one less cost to pay for having clearer, faster code.


>Boilerplate optimizes for the right things - code that's easier to read and analyze, but takes longer to write.

This is wrong and the same retarded logic Java used to defend not introducing var and similar features for ages. Boilerplate is usually noise around the actual logic - it's a result of limited abstractions. When you're repeating same code over and over you raise that segment to a separate concept, that's how abstraction and high level programming works - it increases readability and maintainability. Being easier to type has nothing to do with it.


Thanks for being the person who I knew would try to make this about Java. I don't care about Java, it was a trivial example.

The rest of your post doesn't really have to do with mine. Yeah, you can cut down on boilerplate with changes to languages... duh. But in terms of conveying context there's always a tradeoff of explicit vs implicit, and one of those costs is taking the time to actually turn your mental model into a written implementation - this eases that burden.

As I said, it's a fancy autocomplete.


>But in terms of conveying context there's always a tradeoff of explicit vs implicit

Exactly - if a tool lets you write it explicitly too easily, you're making that the default, and it ignores the readability/maintainability side of the tradeoff.

Maybe it gets good enough to recognise when things can be factored out for better readability as well. But in my experience code generators rarely result in maintainable code.


> I wonder how long it will be until someone sets up the rest of the feedback loop and starts putting up github projects made of nothing but code from this tool, and it can start to feed on itself.

This is my actual hoped-for endgame for the ad based internet. At some point Twitter, FB, etc will be exclusively populated by bots that post ads, and bots that simulate engagement with those ads to drive up perceived value of advertising. They'll use AI to post comments that are really just ads or inflammatory keyword strings to drive further "engagement." The tech companies will rake in billions and billions of dollars in ad revenue, we'll tax all of it and use it to create flying cars, high speed rail, and an ad-free internet closed off to all non-humans. Occasionally a brave ML researcher may venture out into the "internet" to take field notes on the evolution of the adbot and spambot ecosystem.


I invite you to try automating this and let us know what happens. Try creating, let’s say 1000 accounts and try liking or posting and see what happens. I’ve seen that system at work and doubt you’d get very far.

More than that, you misunderstand how advertisers prioritise their money. They pay for outcomes. If they notice that over the past couple of months they've been receiving mostly bot traffic, they stop advertising. Not everyone all at once, but enough that revenue begins to decline. An ad based business that cares about the long term will do its best to weed out the inauthentic engagement.


> I invite you to try automating this and let us know what happens. Try creating, let’s say 1000 accounts and try liking or posting and see what happens. I’ve seen that system at work and doubt you’d get very far.

Yes, naive attempts at manipulation will be detected these days on big platforms. 5+ years ago such naive attempts were successful, though. A few years ago I made a proof of concept to show how easy it is to make new Reddit accounts and automate them, and had registered hundreds of accounts. Those logins still work even though Reddit has cracked down on naive automation attempts.

Today, that's why many firms buy real users' accounts. They'll hire people to manually login to the accounts and post. There's also the perpetual cat and mouse game between bot creators and platform owners, and the platform owners who benefit from the appearance of increased growth and engagement that's actually just bot activity.


> More than that, you misunderstand how advertisers prioritise their money. They pay for outcomes.

How they verify those outcomes, however, gets interesting. See, for example, College Humour going under due to inflated engagement metrics fed to them by Facebook video.


The most unlikely thing you mentioned is that we will be able to tax huge corporations.


You jest, but the parent is close to what I believe is a possible scenario - the one Nick Bostrom calls "a Disneyland with no children".

No tax, no flying cars, eventually not even humans around - just AI-driven companies endlessly trading with each other, in a fully-automated, self-contained, circular economy, from which sentient beings were optimized away.


The whole thing sounds a bit like Autofac (https://en.wikipedia.org/wiki/Autofac)


As a counterpoint to most of the other responses, I agree with this comment. In my eyes, this is very similar to the issue of bloated software being enabled by faster processors. This doesn't mean slower processors are the answer, but rather that there are often unintended consequences that need to be considered when solving for a problem. So, as an example of what could be better than making boilerplate easier to type, I would suggest programming languages, frameworks, and tooling that reduce the need in the first place would be worth considering.


Some people think the problem is that we don't have enough code. Anyone that has to maintain code knows that the problem is that we have too much code.


This is even better! Now we can generate code we might not even understand, without even hitting all the keys.


Not looking forward to dealing with this from a security point of view. It's difficult to get developers to accept responsibility for security vulnerabilities in libraries they've selected for their project ("That's not my code!"). I can see the same thing happening with generated code where they don't want to take responsibility for finding a way to remediate any vulnerabilities they didn't personally type in. Of course those who exploit the vulnerabilities won't care how it got into the code. They're just happy they're able to make use of it.


It says it's trained on "billions of lines of code"

I would augment that to "billions of lines of code that may or may not be safe and secure"

If they could tie in CodeQL into Copilot to ensure the training set only came from code with no known security concerns, that would be a big improvement.


This ignores the ladder of abstraction, which you apparently need to be reminded exists. Not all programmers need to work at the same level of abstraction: some programmers need to write original code all day long because their subject field is close to the metal and there are no premade solutions. For those folks, the idea of copy-pasting from SO is pretty ridiculous, although SO might have questions and answers that allow them to write their own code solutions based on the insights of others. Because we're not going to dismiss highly respected experts in our fields just because they helped answer good questions with good detailed answers on Stackoverflow, are we?

A few rungs up on the ladder we still have programmers but now the kind whose job it is to write as little code as possible, where their worth comes from knowing exactly how little code glue is needed to, necessarily and sufficiently, make other people's libraries work together to functionality that is larger than the sum of its parts. These folks aren't solving unique problems, they make things work with as little code as possible, and copy-pasting from SO for problems that have been solved countless times already by others is 100% fine: their expertise is in knowing how to judge other people's code to determine whether that's the code they need to copy-paste.

And then, of course, there all the folks in between those two levels of abstraction.

The biggest mistake would be to hear "programming" and think "only my job is real programming, all those other people are just giving me a bad name". Horses for courses: different tools and different levels of abstraction for different needs.


> As someone more aligned with the Dijkstra perspective, this seems to me like one of the single worst ideas I've ever seen in this domain.

Absolutely true.

If your code is so repetitive that it can be correctly predicted by an AI, you are either using a language that lacks expressiveness or you have poor abstractions.

The biggest problem in the software world is excessive complexity, excessive amounts of (poor) code and reinventing the wheel.


You're underestimating the sophistication of said AI.


All I can think of is how many times I've grabbed a code example from StackOverflow only to discover it had some obvious bug in it. The answer is many, many times.


If this results in more overall programmers or enabling existing programmers to make products quicker than before, it's a win!

Most codebases already consist largely of unread code: the libraries (node_modules, etc.). I am sure we can figure out a pattern to separate human-written from machine-written code in a similar way.

If the code you are about to write is already written by someone else on the internet, that's probably not the most innovative part of your codebase anyway, so why waste time?


> If this results in more overall programmers or enabling existing programmers to make products quicker than before, it's a win!

I don't think I really agree with this sentiment. "More programmers" or "faster programmers" is meaningless (or even actively detrimental) if the quality of their output is lower. It's even worse if their output is plagued by subtle bugs, as AI systems are likely to produce.


The "quality programmers" (neckbeards) can continue their quality work. These new faster programmers can get other stuff done.

The one paying for these jobs can decide what level of quality they want, and hire/pay accordingly. How does that sound?


Cargo cults concerned me too but I realized that cargo-cult programming flourishes when it's enabled by a culture that doesn't care how the sausage is made. If the culture seeks full stack truth, it's not likely to get fooled by bad generated code, no matter whether it's generated by copy/paste, metaprogramming, or AI.

I'd love to know what Donald Knuth thinks given the history of literate programming.


I've used TabNine for a while, and it's mostly just been a faster-executing normal autocomplete, with a 90% accuracy rate. It's a tradeoff. It didn't have the large-snippet behavior this new one has, though, at least in my usage.


Somehow I don't see people discussing this kind of tool from the perspective of managing essential complexity versus accidental complexity. Maybe Copilot just increases the abstraction level of coding, so we can treat generated code as a building block, just as we nowadays rarely need to care about how to write assembly code or how a balanced tree works?


> Maybe copilot just increases the abstraction level of coding, so we can treat generated code as a building block

At this point it doesn't, and we can't, because Copilot is just a fancy autocomplete. The code is there, first class, in your file. It doesn't introduce new concepts for you, it just tries to guess what you mean by function signature + descriptive comments, and generates that code for you.


I think the mistake you might be making is assuming that any tool adopted is always used all the time. Even professional race car drivers probably opt for an automatic transmission over the manual on the mini-van if they get one. Different choices for different needs.

There will always be a place for meticulous consideration of exactly what's being done, and many levels of that as well. For the same reason people reach for python to mock up a proof of concept or throw something together that is non-essential but useful to have quickly, even meticulous programmers might use this to good effect for small things they don't care to spend a lot of time on because it's not as important as something else, or the language they're using for this small task isn't one they feel as proficient in.


The way I see it, the tool is only as good as the programmer using it. This tool will generate the individual code blocks for you, but you still have to understand how to put it all together to deliver a working app.

Sure there will be some codebases out there that are plastered together using this tool, but when it comes to delivering software that is well written, performant and maintainable over the course of several years, you're still going to need a lot of skilled engineers to pull that off.


Okay, and I'd argue that a good portion of programmatic wrangling is simply trying to do Y with Z: something that's probably been done 10,000 times over by others, yet in the silo'd confines of that single developer's workspace it remains an utter fucking mystery to them.

What's the carbon displacement for wasted time on those tasks? It might be brow raising.


In the (now not very) long run, programming was a job meant for computers, anyway. The future will look back at "programming" the way we now look at Charles Dickens characters toiling in soot-filled factories. It's not what people are best at, and it looks like soon there will be better ways to accomplish this job.


I don’t know if I agree with this, but I think it depends what you mean when you say “programming”.

I use programming as one of the examples of intrinsically human work.

You have a (general computing) machine that can do anything (*); it only has meaning and utility from humans thinking about problems they have and how to solve them.

The hard part of programming is analysing what problems you have, and what you want to do about them.

This is why one of the distinguishing features between junior and senior programmers is that seniors tend to think about the (human) problem they’re solving a lot more.

Tools like copilot help with the physical interfacing between programmer and machine, but they don’t eliminate programming.

But a recurring worry people are expressing in this thread is: some people think they _do_, and how insidious that idea is.

The tool is great, the managers deciding that time spent thinking about its output, or developing their staff’s skills, is a waste because “just use what the AI says”, are the danger.

As long as people and computers exist, there will be programming.

(*) not actually “anything”


> Someone will produce 5000 lines of code in a day for a new feature, but only read 2000 of those lines.

Shouldn't the one who produced this code be responsible for ensuring its integrity? 5k LOC in a day without test cases isn't code, it's a disaster.

I think the marketing here is about right. This is no AI programmer, but a copilot. It is an intelligent assistant that does some mundane things for you, with a real chance of getting some of them wrong, but when the stars align, you are in luck.

I see this as INCREDIBLY useful for certain niches of programming:

1. Front end. Some components are really trivial but still require some manual rewiring and such; this could be a life saver.

2. Templates for devops. Those are as soul-crushing as it gets, and I couldn't think of a better domain to apply Copilot to.

Overall, this is a huge win for programmer productivity, with reasonable compromises.


Indeed, using tools to manage complexity tends to make the complexity acceptable and leads to more complexity.


If we exclude the VM the code is running on, and the OS layer, and the kernel, and the micro-code, and the standard lib, then people also like to include library code, and also like to depend on third party PaaS and SaaS aka the "cloud"... If you do know what all the bits of your code do, you can send me a PM. cough All software is shit, with few exceptions. Not necessarily because the developers don't know their stuff, but likely because of business priorities, politics, and layers of management. Software is a "people problem". So if we remove the "people" we might get better software ;)


Your fears seem justified, as per the site itself:

Whether you’re working in a new language or framework, or just learning to code, GitHub Copilot can help you find your way. Tackle a bug, or learn how to use a new framework without spending most of your time spelunking through the docs or searching the web.


Thank you. If one trains an AI on a corpus of mediocrity, one should expect a mediocre result.


The first thing that comes to mind is the recent article on the front page about the docker footgun/default that allowed a hacker to wipe a website's database.


Coding is inherently difficult -- any tools, even ones as basic as color highlighting or spell checking, massively help with understanding the code in front of you. There isn't a hope this can replace any programmers; it will instead aid their workflow. A great example is simply refactoring code with SOLID after building a feature or fixing a bug -- a lot of this can be easily automated. Having a machine suggest and a human accept is a worthy trade-off. Another similar example is the Google bot that presents search suggestions for you.

I don't think your concerns are well grounded.


The next step will be AI to approve the code, because if someone is producing 5k LOC a day, there are people who need to read & approve this code...


If we didn't need programmers to do the programming, that would be a perfect world.


This is inevitably leading to the moment where we don't need humans, but I'm fine with that.


People have made some variation of this argument since the move from writing binary to writing assembly.

With every new layer of abstraction there’s more power.

The long term benefit of a tool that can do this well far exceeds what humans can do by hand, but that may not be true in the very short term.

Either way, I suspect the benefits to be big.


I disagree with the comparison. This isn't abstraction, it's syntax completion: as if you typed the first four bytes and GitHub (mostly correctly, it must be mentioned!) completed the rest.

Unlike an additional abstraction layer, it does not increase readability.


The end goal of this, based on what I've seen from OpenAI examples and related beta projects, is more high-level language -> code.

"Write a standard sign up page" -> Generated HTML

"Write a unit test to test for X" -> Unit Test.

It's more than just syntax completion - I'd argue that's the beginning of a new layer of abstraction similar to previous new abstraction layers. The demo on their main page is more than syntax completion - it writes a method to check for positive sentiment automatically using a web service based on standard english.

This is extremely powerful and it's still super early.

I saw one example that converted english phrases into bash commands, "Search all text files for the word X" -> the correct grep command.

That is a big deal for giving massive leverage to people writing software and using tools. We'll be able to learn way faster with that kind of AI assisted feedback loop.

Similarly to compilers, the end result can also be better than what humans can do eventually because the AI can optimize things humans can't easily, by training for performance. Often the optimal layout can be weird.


Not at all: this tool does not encourage more powerful abstractions, but the very opposite.

It makes boilerplate cheaper to churn out.


That entirely depends on the quality of the suggestion, does it not?


You sound afraid you'll be replaced by software.


Are you not?


Nah.


Cargo-cult programming has always been a problem, but now we're explicitly building tools for it.

I get what you're saying, but I'm not worried. At the end of the day, the programmer has to understand the code they're submitting, both the fine grain and the holistic context. If they don't know how to, or can't be bothered at least curate the suggestions the tool is making... then your organization has much bigger problems than can be helped by reading a Dijkstra paper or two.


I worry about auto-complete on a more philosophical level. I’ve noticed with Gmail that it’ll often suggest its own way of replying to an email or completing a sentence, and I’ll take it simply because it’s easier, even though I’d never actually use those words in that situation.

It’s a pretty bad feedback loop that robs us of our independent thought by way of falling victim to laziness, a fundamental human weakness. You can imagine a case where the autocompleted code is so large that you really want to make it work, even if you know it’s not elegant or possibly even correct code… or maybe it’s just repetitive and not well abstracted, but here it is, autocompleted and done, so why would you fight that?

If we continue abstracting more and more of this way, based upon datasets that are averaged across everyone, we lose the individual in favor of the masses, bringing us all down to a common denominator.

If we must lose our humanity to the machine, I’d at least like to see an autocomplete from Peter Norvig’s code, or writing from particularly effective communicators or famous authors.


> feedback loop that robs us of our independent thought by way of falling victim to laziness

This was my first thought when I saw this. I intentionally don't use predictive text whenever I can to preserve whatever originality I have left.


Email is just such a communication medium. It requires both prose at length and quick responses, even though the type of information passed around is usually binary: OK/NOT OK, or a status update.

Gmail offers a way to reduce the "prose at length" to a few buttons. And it adds the pleasantries for you.

You can think of it as a workflow automation tool. Your email chains are tickets and you're moving them through different statuses.

Notwithstanding, personal/intimate email is different and spending time writing a beautiful letter is a thing of its own.


Absolutely. While there are benefits and drawbacks to every technology, auto-complete in Gmail saves me so much time.

If I'm sending to a relative or family member for example, or even a professional contact, then I'm not going to be lazy and auto-complete. But when someone says "Does that sound good to you?" it's great to push a button that replies "Sounds great, thank you!"

Maybe I would have naturally written "That sounds great, thank you [name]", but what's the difference?


> Maybe I would have naturally written "That sounds great, thank you [name]", but what's the difference?

Social graces?


So if it was trained using "source code from publicly available sources, including code in public repositories on GitHub", was it also trained on GPLv2 code?

Is everything it generates then also GPLv2?


You bring up a really good point. I'm super curious what the legality and ethics around training machines on licensed or even proprietary code would be. IIRC there are implications around code you can build if you've seen proprietary code (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

How would we classify that legally when it comes to training and generating code? Would you argue the machine is just picking up best practices and patterns, or would you say it has gained specifically-licensed or proprietary knowledge?


I would argue that a trained model falls under the legal category of "compilation of facts".

More generally, keep in mind that the legal world, despite an apparent focus on definitions, is very bad at dealing with novelty, and most of it ends up justifying existing practices a posteriori.


You might argue that, but you would likely be wrong.

Even a search engine is not merely a "compilation of facts". A trained model is the result of analysis and reasoning, albeit automated.


A search engine provides snippets of other data. You can point explicitly to where it got that text from. A trained model generates its own new data, from influence of millions of different sources. It's entirely different.


> (I remember an article from HN about how bash had to be written by someone who hadn't seen the unix shell code or something like that).

I believe you're referring to Clean Room Design[1].

[1] https://en.wikipedia.org/wiki/Clean_room_design


This is a bit tricky, because at least in the U.S., I don't believe it's a settled question in law yet. Some of the other posters on here have said that the resulting model isn't covered by GPL--that's partially true, but provenance of data, and the rights to it, definitely does matter. A good example of this was the Everalbum ruling, where the company was forced to delete both the data and the trained models it was used to generate, due to lack of consent from the users from whom the data was taken[1]. Since open source code is, well, open, it's definitely less of a problem for permissively-licensed code.

That said, copyright is typically assigned to the human closest to the activation process (it's unlikely that GitHub is going to try to claim the copyright to code generated by Copilot over the human/company pair-programming with it), but since copyleft in general is pretty specific to software, afaik the way courts will interpret the legality of using code licensed under those terms as training data for a non-copyleft-producing model is still up in the air.

Obligatory IANAL, and also happy to adjust this info if someone has sources demonstrating updates on the current state.

[1] https://techcrunch.com/2021/01/12/ftc-settlement-with-ever-o...



> The case debates the legal right for Google to use copyrighted books in its training database in order to train its Google Book Search algorithm

That's not even remotely the same thing.


until the legal position is clear you'd have to be insane to allow output from this process to be incorporated into your codebases

imagine if the output was ruled as being GPLv2, then having to go through a proprietary codebase trying to rip out these bits of code

it would be basically impossible


No, a model trained on text covered by a license is not itself covered by the license, unless it explicitly copies the text (you cannot copyright a "style").


But it actually is explicitly copying the text. That's how it works. The training data are massive, and you will get long strings of code that are pulled directly from that training data. It isn't giving you just the style. It may be mashing together several different code examples taking some text from each. That's called "derivative work".


No, that's not how it works.

"[...] the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set"

https://copilot.github.com/#faqs


If that's the case (only 0.1%), the developers must have done something different from the other OpenAI code-generation experiments I recall seeing, where significant chunks of code from Stack Overflow or similar sites appeared in the answers.


So you're gambling on whether the code was generated or copied.


No you aren't. Courts will consider it fair use.


How are you going to prove it was the AI that generated the GPL licensed function ad verbatim from another project, rather than you just opening that project and copying the function yourself?


I will not. Courts will simply consider a single function not to be a substantive enough piece of work to constitute unfair use.


use a bloom filter to skip/regenerate that 0.1%
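
Rough sketch of the idea, purely illustrative (the class, sizes, and snippet normalization are made up, and this obviously isn't how Copilot's pipeline actually works):

    import hashlib

    class BloomFilter:
        """Tiny Bloom filter: fast membership test, false positives possible, no false negatives."""

        def __init__(self, n_bits=8_000_000, n_hashes=4):
            self.n_bits = n_bits
            self.n_hashes = n_hashes
            self.bits = bytearray(n_bits // 8 + 1)

        def _positions(self, item):
            for i in range(self.n_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.n_bits

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    # Index normalized training snippets once at training time...
    seen = BloomFilter()
    seen.add("return filename[:filename.rfind('.')]")

    # ...then, at suggestion time, flag anything that looks verbatim and regenerate it.
    suggestion = "return filename[:filename.rfind('.')]"
    print(suggestion in seen)  # True -> likely verbatim, ask the model for another completion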


Synthesising material from various sources isn't copyright infringement, that's called writing.

It's only infringement if the portion copied is significant either absolutely or relatively. A line here or there of the millions in the Linux kernel is okay. A couple of lines of a haiku is not. Copyright is not leprosy.


Google Books actually displays full pages of copyrighted works Google did not license. It was considered legal.

[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....


We all don't have Google resources. What if someone comes after us individually because some model-generated code is near identical to code in a GPL codebase? Where is the liability here?

edit: from https://copilot.github.com/

> What is my responsibility when I accept GitHub Copilot suggestions?

> You are responsible for the content you create with the assistance of GitHub Copilot. We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.

Well, that solves that question.


We are all vulnerable to predatory lawyer trolls, whether we do things correctly or not. If you are accused of reusing GPL code, you ask for clarification on which part and you rewrite it. It is likely to be just a snippet. I doubt Copilot would write a whole lib by copying it from another project.

And yes, of course github is not going to take responsibility for things you do with their tools.


If you learn programming from Stack Overflow and GitHub, and then repeat something that you learned over your time reading, that's not just copying text. That's having learned the text. You could say the human brain is mashing together several different code examples, taking some text from each.


hmm, so let's think this through.

Wouldn't that imply that a person who learned to code on GPLv2 sources and writes some more code in that style (including "long strings of code", some of which are clearly not unique to GPL) is writing code that is "born GPLv2"?

I don't think it currently works that way.


My guess is that it is, if we think of a machine learning framework as a compiler and the model as compiled code. Compiled GPL code is still GPL, that's the entire point.

Anyways, GitHub is Microsoft, and Microsoft has really good lawyers, so I guess they did everything necessary to make sure that you can use it the way they tell you. The most obvious solution would be to filter by LICENSE.txt and only train the model on code under permissive licenses.


> you cannot copyright a "style"

This line of thinking applies to the code generated by the model, but not necessarily to the model itself, or the training of it.


Thanks - in retrospect, I should have explicitly said "code generated by the model".


The trained model is a derivative work that contains copies of the corpus used for training embedded in the model. If any of the training code was GPL the output is now covered by GPL. The music industry has already done most of the heavy lifting here in terms of scope and nature of derived works, and while IANAL I would not suggest that it looks good for anyone using this tool if GPL code was in the training set.


Well, it probably is explicitly copying at least some subset of the source text - otherwise the code would be syntactically invalid, no?


I can't say what's happening in GitHub Copilot, but it's not necessarily true that the only way to produce syntactically valid outputs is to take substrings of the source text. It is possible to learn something approximating a generative grammar.

Take a look at https://karpathy.github.io/2015/05/21/rnn-effectiveness/

At the same time, I would not be surprised if there are outputs that do correspond to the source training data.


Strictly speaking, you could train a model which does not contain the original source text (just the underlying language structure and word tokens) and generates ASCII strings that are consistent with the underlying generative model and are also always valid code. I expect to see code generator models that explicitly generate valid code as part of their generalization capability.
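
A toy illustration of that point - nothing like Codex internally, just to show that always-valid output doesn't require storing any source text:

    import random

    # Toy probabilistic grammar: every string it emits is a syntactically valid
    # arithmetic expression, yet nothing is copied from any corpus.
    GRAMMAR = {
        "expr": [["term"], ["term", "op", "expr"]],
        "term": [["number"], ["(", "expr", ")"]],
        "op":   [["+"], ["-"], ["*"]],
    }

    def generate(symbol="expr", depth=0):
        if symbol == "number":
            return str(random.randint(0, 9))
        if symbol not in GRAMMAR:
            return symbol                      # literal token like "(" or "+"
        options = GRAMMAR[symbol]
        if depth > 3:                          # bias toward terminals so generation stops
            options = options[:1]
        return "".join(generate(s, depth + 1) for s in random.choice(options))

    print(generate())  # e.g. "3*(7+2)"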


There will almost certainly be cases where it copies exact lines. When working with GPT2 I got whole chunks of news articles.


I seem to remember a similar discussion on Intellicode (similar thing, but more like Intellisense, as a Visual Studio plugin), which is trained on "github projects with more than 100 stars". IIRC they check the LICENSE.txt file in the project and ignore projects with an "incompatible" license. I don't have any links handy which would confirm this though.


Could it be this? https://visualstudio.microsoft.com/services/intellicode/

I was wondering the same thing, especially with MS being behind both.

edited: or this? https://docs.microsoft.com/en-us/visualstudio/intellicode/cu...


My guess would be that the model itself (and the training process) could have different legal requirements compared to the code it generates. The code generated by the model is probably sufficiently transformative new work that wouldn't be GPL (it's "fair use").

I suspect there could be issues on the training side, using copyrighted data for training without any form of licensing. Typically ML researchers have a pretty free-for-all attitude towards 'if I can find data, I can train models on it.'


No, the code generated is what copyright law calls a derivative work, and you should go ask Robin Thicke and Pharrell Williams exactly how much slack the courts give for "sufficiently transformative new work".


My bet is that copyright law has not caught up with massive machine learning models that partially encode the training data, and that there will still be cases to set legal precedent for machine learning models.

Note also that it's not just a concern for copyright, but also privacy. If the training data is private, but the model can "recite" (reproduce) some of the input given an appropriate query, then it's a matter of finding the right adversarial inputs to reconstruct some training data. There are many papers on this topic.


It is almost certainly the case that current IP law is very unsettled when it comes to machine learning models and mechanisms that encode a particular training set into the output or mechanism for input transformation. What should probably scare the shit out of people looking to commercialize this sort of ML is that the most readily available precedents for the courts to look at are from the music industry, and some of the outcomes have truly been wacky IMHO. The 'blurred lines' case is the one that should keep tech lawyers up at night, because if something like that gets applied to ML models the entire industry is in for a world of pain.


You're missing the fair use aspects. Check out this article on fair use [0].

> In 1994, the U.S. Supreme Court reviewed a case involving a rap group, 2 Live Crew, in the case Campbell v. Acuff-Rose Music, 510 U.S. 569 (1994)... It focused on one of the four fair use factors, the purpose and character of the use, and emphasized that the most important aspect of the fair use analysis was whether the purpose and character of the use was "transformative."

It has some neat examples and explanation.

[0] https://www.nolo.com/legal-encyclopedia/fair-use-what-transf...


There are far more current precedents that apply here, and they do not trend in Github's favor -- as I noted previously, Williams v. Gaye (9th Cir. 2017) is going to be very interesting in this case. I am sure several people in Microsoft's legal department set parameters on the model training and that they felt that they were standing on solid ground, but I am also sure that there are a few associate professors in various law schools around the country who are salivating at the opportunity to take a run against this and make a name for themselves.


> So everything generated also GPLv2?

Almost certainly not everything.

But possibly things that were spit out verbatim from the training set, which the FAQ mentions does happen about .1% of the time [1]. Another comment in this thread indicated that the model outputs something that's verbatim usable about 10% of the time. So, taking those two numbers together, if you're using a whole generated function verbatim, a bit of caveat emptor re: licensing might not be the worst idea. At least until the origin tracker mentioned in the FAQ becomes available.

[1] https://docs.github.com/en/early-access/github/copilot/resea...

[2] "GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set. Here is an in-depth study on the model’s behavior. Many of these cases happen when you don’t provide sufficient context (in particular, when editing an empty file), or when there is a common, perhaps even universal, solution to the problem. We are building an origin tracker to help detect the rare instances of code that is repeated from the training set, to help you make good real-time decisions about GitHub Copilot’s suggestions."


I think this would fall under any reasonable definition of fair use. If I read GPL (or proprietary) code as a human I still own code that I later write. If copyright was enforced on the outputs of machine learning models based on all content they were trained on it would be incredibly stifling to innovation. Requiring obtaining legal access to data for training but full ownership of output seems like a sensible middle ground.


Certainly not. If I memorize a line of copyrighted code and then write it down in a different project, I have copied it. If an ML model does the same thing as my brain - memorizing a line of code and writing it down elsewhere - it has also copied it. In neither case is that "fair use".


1) this is not human, it's some software

2) if I write a program that copies parts of other GPL licensed SW into my proprietary code, does that absolve me of GPL if the copying algorithm is complicated enough?


Clearly this requires some level of judgement, but this isn't new; determining what is and isn't plagiarism requires a similar judgement call.


What if I put a licence on my Github-repositories that explicitly forbids the use of my code for machine-learning models?


My interpretation of the GitHub TOS section D4 would give GitHub the right to parse your code and/or make incidental copies regardless of what your license states.

https://docs.github.com/en/github/site-policy/github-terms-o...

This is the same reason it doesn’t matter if you put up a license that forbids GitHub from including you in backups or the search index.


Then the person training the models wouldn't be legally accessing your code.


And so it begins: We start applying human rights to AIs.

Not a critique of your point, which I was just about to bring up myself.


IMO the closest case is probably the students suing turnitin a number of years ago, which iParadigms (the turnitin maker) won [1].

I think this is definitely a gray area and in some way iParadigms winning (compared to all the cases decided in favour of e.g. the music industry), shows the different yardsticks being used for individuals and companies.

I'm sure we will see more cases about this.

[1] https://www.plagiarismtoday.com/2008/03/25/iparadigms-wins-t...


Is what a human generates GPLv2 because it learned from GPLv2 code?


What if a human copies GPLv2 code?



When is it copying? What about all those stack overflow snippets I copied?!


Congrats, you've just discovered why many employers block or forbid stackoverflow.


OMG.

Is there no such thing as "fair use" here, as we have in copyright law?


You can train models on copyrighted materials. https://towardsdatascience.com/the-most-important-supreme-co...


IANAL, but my interpretation of the GitHub TOS section D4 would give GitHub the right to parse your code and/or make copies regardless of what your license states. This is the same reason the GitHub search index isn’t GPL contaminated.


Developers' human brains are also trained on proprietary code bases; when they quit and go elsewhere, they program using knowledge learned previously, yet you do not sue them.


We kinda have to accept that - we don't have to accept this. You can't do much about what human brains pick up, but you can do something about one of the biggest corporate tech giants straight up leeching from explicitly public and free work for their own private benefit.


It's not, because the license says nothing about training. Otherwise every OSS dev's brain would be under GPL.


https://en.wikipedia.org/wiki/Clean_room_design

There are definitely cases where devs avoid even looking at an implementation before creating their own.


So this is what happens when you're owned by Microsoft who has an exclusive contract with OpenAI.

A couple weeks ago I ran a few experiments with AI-based code generation (https://news.ycombinator.com/item?id=27621114 ) from a GPT model more suitable for code generation: it sounds like this new "Codex" model is something similar.

If anyone from GitHub is reading this, please give me access to the Alpha so I can see what happens when I give it a should_terminate() function. :P


I saw that post, neat stuff. We made an attempt to develop something similar 4 years ago and take it to YC; it simply wasn't good enough often enough, because our training data (Stack Overflow posts) was garbage and models were weaker back then. I figured it would take about 5 years for it to really be useful given the technology trajectory, and here we are.

I'll note that we weren't trying to build "code auto-complete" but instead an automated "rubber duck debugger" which would function enough like another contextually-ignorant but intelligent programmer that you could explain your issues to it and illuminate the solution yourself. But we did a poor job of cleaning the data, and we found that English questions started returning Python code blocks, sometimes contextually relevant. It was neat. This GitHub/OpenAI project is neater.

I would be curious what the cost of developing and running this model is though.


This is very impressive!

OpenAI’s tech opens an ethical Pandora’s box:

1. It’s clear that the raw inputs to all of OpenAI’s outputs originated with real, human creativity.

2. So, in a sense, OpenAI is laundering creativity. It reads in creative works, does complicated (and, yes, groundbreaking) transformations, and produces an output that is hard to trace to any particular source.

3. Yet, isn’t that effectively what human brains do too? Perhaps OpenAI lacks the capacity for true invention, but I’d argue that most people live their whole lives without a meaningful creative contribution as well.

All told, I don’t have a good framework for thinking about the ethics here. So instead, I’ll simply say:

Wow.


> Yet, isn’t that effectively what human brains do too?

If I want to watch a bunch of movies, I have to pay the theater for each movie, or pay netflix, or whatever. The screenplay I write afterwards belongs to me, but the learning process involved me paying for access to others' work. That's what's often missing here. But at the same time, if you train on legally public data, there's no 'theater' to be paid.

(Often, people train on illegally public data though, like the eleuther folks. That's a whole extra can of worms I've ranted about plenty).

Maybe we'll start seeing licenses with a section saying "not for use as training data for commercial models."


> Maybe we'll start seeing licenses with a section saying "not for use as training data for commercial models."

Considering that the impact of a single example is extremely small in training a model, and that it is trained on an ungodly amount of examples, then I wonder if the effort of forbidding its use has any real benefits.


I would change your question from “does it have any real benefits” to “does it have a practical effect on the model”

Benefits to me are clear: giving a developer choice over how their source code is used with for-profit, opaque, next generation ML models.

But yes, drop in the ocean in terms of the full data set. But that shouldn’t be an excuse to remove user choice.


Yes, of course it does, because if every user opted out then the model would not work as well as it does, and github would not be able to profit off the work of others to the degree they are (or will be). Just because they are taking code on a massive scale does not mean the outcome is inevitable: don't get it twisted, copilot only works because of the code human beings have written.


What are some ethical problems that could emerge from the box? Maybe unfair competition from having very good tools compared to other programmers, or having an irresponsibly shallow understanding of what the produced code does?


> What are some ethical problems that could emerge from the box?

Being put out of job by an AI trained on your own code?

It's really the same ethical problem of all automation ... and will be as long as we need a job to fulfill basic needs like food, housing and medical care.


Everyone gets into programming for their own reasons.

But to my personal philosophy: if I'm not coding to put myself out of a job, I'm thinking about the problem wrong.

When there are no more lines of code to be written, I shall do something else, content that I have done my part to free humanity from the burden of human-machine interfacing. I hear dairy farming is a demanding and rewarding challenge.


Dairy farmers aren't really looking for workers ... ironically, it is a job that has been almost eliminated by automatic milking machines and robotic harvesters.

https://www.thebullvine.com/wp-content/uploads/2014/03/Figur...


Programmers change jobs like socks and are open to learning new things all the time. Software has been automating itself for 70 years and look how many people have jobs in this field.

Also, human desire is a bottomless pit: where automation saves, we spend even more.


I wonder if HLL compiler authors had fears about this back when writing assembly and machine code was the norm.

But good point about the ambivalent result of eliminating busywork. Food, housing and medical care are available in most western countries for people who choose not to get a job... I think the social status problem and the guilt of freeriding are also big factors preventing people from living more leisurely lives in these countries.


There are some really weird licensing problems. Like, does your code license say they can use your code to train AIs that then reproduce code very similar to yours, with no attribution etc., in someone else's codebase?


s/OpenAI/Photoshop

Reads similarly :)


> If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future.

This may be the first time that a proprietary coding tool offers such a great value proposition that I am actually interested in trying it out and potentially even paying for it. It's also a bit concerning that it will probably be extremely hard, if not impossible, to create a FOSS version of this technology, just because of the immense amount of computing power, and by extension money, needed to create GPT-3.

I'm not that comfortable with the idea of a future where proprietary AI-based solutions and libraries (e.g. automatic testing libraries, which have been mentioned here a few times) are so powerful that I'll be forced to use them if I don't want to waste my time.


Well, it should be possible to crowdsource training a FOSS version, right? There should be a SETI-at-home for training neural networks. I would donate some GPU power for sure.

SETI@home achieved 50 times the computing power of the world's largest supercomputer [0], so it might actually be the only way to train the future GPT4 or GPT5.

[0]: https://en.wikipedia.org/wiki/SETI@home#Statistics


At the moment crowd-sourcing ML training doesn't work very well because updates to the model's weights have to be shared between all of the nodes in the computation periodically. Giant models tend to be trained on clusters with very fast interconnects for this reason.
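
A toy picture of why the communication dominates, with made-up worker and parameter counts - each step, every worker's gradients have to be combined before anyone can move on:

    import numpy as np

    # Toy synchronous data-parallel step: each worker computes gradients on its own
    # shard, then all workers must exchange/average them before the next step.
    # With billions of parameters, that all-reduce happens every step, which is why
    # giant models are trained on clusters with very fast interconnects.
    n_workers, n_params, lr = 4, 10, 0.01
    weights = np.zeros(n_params)

    local_grads = [np.random.randn(n_params) for _ in range(n_workers)]  # per-worker compute
    avg_grad = np.mean(local_grads, axis=0)                              # the costly all-reduce
    weights -= lr * avg_grad                                             # synchronized update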


Distributed GPT training doesn't really work (sadly). Maybe someone will get a good solution to that, though.


Says the person who likely owns a washing machine, a sink connected to plumbing, a microwave, a stove, lighters, clothes made with a sewing machine, etc., etc.

A good substitute for GPT-3, costing only the compute to run it, will take far less time to appear than it did for other historical time-saving technologies. Unlike with other historic technologies, they pretty much spell out exactly how to build it, and own no patents related to its creation. I have trouble seeing the downside.


I see your point, but aren't you making me into a bit of a straw man? When did I say that I was some open-access Luddite who won't use any technology that they can't build themselves? I just like the current state of programming, where I can productively build new and exciting things without having to rely on a lot of proprietary libraries or tools. Nothing more, nothing less.


Luddite: a person opposed to new technology or ways of working.

Seems like you came up with the perfect word to describe your view on this. :P

I definitely do think there is reason to worry about the future you're imagining happening, but as someone who has read through the papers on GPT, this is in no way going to end up being an exclusive proprietary API, so it'll very soon have open-source and likely free or at-compute-cost options.

And well, if it doesn't make us better, no one will use it. If it does, you'll have to adjust to remain competitive, just like millions of professionals going back millennia who encountered new technology in their chosen vocation. Ultimately, if it makes us better, and doesn't enslave us to the whims of a monopolistic holder (which the point of my post was to conjecture that it likely wouldn't), then it'll probably (though not definitely) be better for us long term.

Hopefully it doesn't erode programmers' abilities like spell check seems to have eroded people's spelling abilities, but I fear that's a likely side effect.


Fair enough, it seems like we mostly agree on how technology generally affects the world. I can't say I currently possess the domain knowledge to confirm your views on GPT, so I'll just have to hope you are right.


There is a FOSS version of GPT-3 being worked on by EleutherAI, but they are only at 6 billion parameters so far, while the largest model of GPT-3 is 175 billion (more parameters is better (usually)). EleutherAI is getting more compute from CoreWeave to actually train the GPT-3-like FOSS model, so that's something to look forward to eventually :). Github Copilot uses Codex though, which seems to be a GPT model trained on just code by OpenAI. It wouldn't be too hard to train a FOSS version of Codex on the open-source code of Github and other sources.


it is absurdly ironic that OpenAI is turning out to be one of the biggest offenders against openness in the near-future/present.


For a long time now I've thought that AI would have a really interesting role to play in developer experience, though this isn't really the form I think it should take.

I think it would make the most sense as a really advanced static-analysis tool/linter/etc. Imagine writing something like C where whole classes of errors can't be checked statically in a mechanical way, but they could be found by a fuzzy ML system automatically looking over your shoulder. Imagine your editor saying "I'm not sure, but that feels like a memory error". And of course you can dismiss it "no, good thought, but I really meant to do that". Imagine an editor that can automatically sniff out code-smells. But the human, at the end of the day, still makes the call about what code to write and not write; they're just assisted with extra perspective while doing so


You are looking for Haskell.


GitHub Copilot may suggest old or deprecated uses of libraries and languages

This raises two questions.

- is there a way (right now or planned for the future) for library maintainers to mark suggestions to be removed from the suggestions? I can foresee Copilot being used as a source of 'truth' among less experienced developers, and getting people turning up in the Issues or Discord asking why the suggestion doesn't work might be a bit jarring if the maintainers have to argue that "Github was wrong."

- if a library is hosted on Github is there a way to mark some examples as Copilot training data? Maybe by having a 'gh-copilot' branch in the repo that represents high quality, known-good examples of usage?


I'd be pretty worried about this based on how many times an SO question has new better answers months/years later. I find that platform self-corrects reasonably well, with the newer better answer ending up at the top, but no idea how that'd happen here.


> I find that platform self-corrects reasonably well, with the newer better answer ending up at the top, but no idea how that'd happen here.

I don't. Very often I run into an outdated accepted response with much better and more recent replies below it.


Does this solve the right problem? Getting some code on the page has rarely been the expensive part of building something. Indeed, some long-ago experience with code generators suggests that making it easy to create code makes many problems worse down the line.


This is my concern. This could end up generating great swathes of code that no one understands, so that when it breaks it takes much longer to fix.


I agree, I feel like it might be useful for those programmers who regularly have to search Stack Overflow and then copy-paste code snippets.

Then I feel like useful code produced by this tool will have to be treated in the exact same way as a rigorous code review: going through every part of the logic and ensuring it is correct. This seems like just as much or even more work than writing it yourself (if it is written in an unfamiliar way, you might need more time to wrap your head around it).


It's solving a 'mechanical' problem. The optimistic twist on this helper is that it just raises the bar: a human programmer had better be more useful than a 'brainless' code generator, meaning not only able to write a loop or solve a leetcode task, but also to understand the context and what they're trying to solve for.

As you say, typing code is not the bottleneck in problem solving.


This is a very good point.


One of the examples they provide on copilot.github.com shows a unit test for the strip_suffix function. It does not test a file name without a suffix, a case the function gets wrong (it removes the last character instead):

    def strip_suffix(filename):
        """
        Removes the suffix from a filename
        """
        return filename[:filename.rfind('.')]

    import unittest

    def test_strip_suffix():
        """
        Tests for the strip_suffix function
        """
        assert strip_suffix('notes.txt') == 'notes'
        assert strip_suffix('notes.txt.gz') == 'notes.txt'
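
To make the missed case concrete, here is a quick check against the quoted strip_suffix above, plus one possible fix (strip_suffix_safe is just my name for it, not anything Copilot suggests):

    import os

    # rfind returns -1 when there is no '.', so the generated function silently
    # drops the last character of an extensionless filename:
    assert strip_suffix('notes') == 'note'   # wrong result, but that's what it returns

    def strip_suffix_safe(filename):
        return os.path.splitext(filename)[0]

    assert strip_suffix_safe('notes') == 'notes'
    assert strip_suffix_safe('notes.txt') == 'notes'
    assert strip_suffix_safe('notes.txt.gz') == 'notes.txt'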


Great, you got 2 assertions for free, which lowers some of the friction of writing tests. You should still be thinking and be "the pilot". When you start writing additional test cases, it will help you out with those too.


Having to read, understand and spot errors in auto-generated code is not free.


One should keep in mind that it is just "copy-paste" on steroids (ok, maybe a gallon of steroids), but users should be wary of the false sense of reduced responsibility it can create.

Because just like when they copy-paste the top answer on SO, at the end of the day they are responsible for the code they ship.


The example in the hero animation has a bug. The ${text} may not be correctly URL-encoded, which would make the body invalid. And because this sort of feature encourages people to blindly trust the machine and not think about what they're doing, this error is much less likely to be caught.
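
The hero demo is JavaScript, but the same pitfall is easy to show in Python (variable names made up for illustration):

    from urllib.parse import urlencode

    text = "tweets & replies?"

    # Naive interpolation, the analogue of the `${text}` in the hero demo:
    # '&' and '?' go out unescaped and corrupt the form body.
    naive_body = f"text={text}"

    # Escaping the value keeps the body valid.
    safe_body = urlencode({"text": text})

    print(naive_body)  # text=tweets & replies?
    print(safe_body)   # text=tweets+%26+replies%3F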

Personally I think this whole class of feature only offers trivial short-term efficiency gains at the expense of long-term professional development and deeper understanding of your field.


I could see it becoming a liability, and projects may start using the fact that they don't use it as a promo thing in their marketing.

> Choose library.js for your next secure project!

> * Military-grade encryption!

> * Industry best-practices!

> * Extensively-tested codebase!

> * No Github Copilot!

This seems to me like someone had a cool idea for using GPT and wanted to experiment with it, but then Microsoft threw money at the marketing people and forced it to become a commercial product.


GitHub says you don't need to credit GitHub for any of the code suggestions, but since it's trained on public sources of code, anyone have a clue on potential licensing pitfalls?


From the FAQ at the bottom of the project showcase page[0]:

"GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set. Here is an in-depth study[1] on the model’s behavior. Many of these cases happen when you don’t provide sufficient context (in particular, when editing an empty file), or when there is a common, perhaps even universal, solution to the problem. We are building an origin tracker to help detect the rare instances of code that is repeated from the training set, to help you make good real-time decisions about GitHub Copilot’s suggestions."

[0] https://copilot.github.com/

[1] https://github.co/copilot-research-recitation


(not a lawyer) Copyright issues tend to involve the question of how transformative a work is. This means the code coming out the other end is probably fine. I don't know about the training side, though. Are there license issues in using copyrighted training data without any form of licensing? Typically ML researchers have a pretty free-for-all attitude towards 'if I can find data, I can train models on it.'


(not a lawyer) my interpretation of GPLv2 is that at the very least the model would be licensed under GPL if it was trained on GPL code. The model is a derivative work. Whether the generated code coming out of the model is GPL is trickier. I would lean towards yes but I'm not entirely sure.

I think talking about exact text matches to existing code is a red herring. If you took GPL code and ran it through an obfuscator that changed every byte of the code to new code, that resulting code would be derivative and would need to be licensed under GPL too.

Thank you Microsoft for ushering in a new era of free software.


Should I be impressed that the example parse_expenses.py on the home page doesn't include any error handling and uses a float for currency? This seems like it's going to revolutionize copy-and-paste programming.


It's a copilot. You're still the pilot. To be honest this seems like it can definitely save me a bunch of googling and let me stay in the ide.


Tell that to the new programmer who builds a piece of software using this and creates an absolute mess.

The danger here isn't with experienced developers (this is, obviously, a tool with great potential for productivity). It's with people who just blindly trust what the robot spits out.

Once code like that is implemented in a mission-critical system without discernment, all hell will break loose.

Edit: worth watching this for context https://www.youtube.com/watch?v=ZSRHeXYDLko


So? If you're making a mission critical system, don't hire subpar developers.

It's not Copilot's problem. Powerful tools can be misused by anyone.


Tell that to the HR departments responsible for hiring developers at major companies.

Hiring a subpar developer, especially in a massive company, isn't a matter of "if" but "when." And it only takes one screw up to crash an airplane because of software.

And guess who massive companies trust for their technology?

Microsoft.


in their own example the copilot aimed for the mountains with float() - thanks but no thanks


What is the hard boundary between you and your AI companion?

What happens when people think their "self driving" cars are more capable than they actually are? Many times, you are better off in a dumb car, because your expectations are such that you have to pay attention to a continuous stream of events and respond in a stateful manner.

If you try to bootstrap a holistic understanding of your problem in between bursts of auto-generated code, I don't think you are going to have a fantastic time of it.


The output is great for a quick one-off script. Maybe if you make the comments look more “enterprise-y”, it’ll go for more careful code?


I would say the use of float makes it a nonstarter - even for "quick one-off" scripts. That's a fundamental error in the generated code. It may look correct at a quick glance, but it's introducing subtle errors to find down the line.
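
For anyone who hasn't been bitten by it yet, the float-for-currency problem in a few lines of Python:

    from decimal import Decimal

    # Binary floats can't represent most decimal amounts exactly, so money math drifts:
    print(0.10 + 0.20)                        # 0.30000000000000004
    print(0.10 + 0.20 == 0.30)                # False

    # Decimal (or integer cents) is the usual fix:
    print(Decimal("0.10") + Decimal("0.20"))  # 0.30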


Consider the following chain of events:

  - I write GPL'ed code.

  - Someone uses this tool to develop a proprietary solution. 

  - I later prove that the tool generated code that is equal to mine.

Now the code of the proprietary solution must be GPL licensed! Cool!

How would I defend myself? I'd only use such a tool if there were guarantees about the licenses of the code it was trained on. Without such guarantees, it is just too risky.


So the AI doesn't actually understand the code does it? It only looks for similar things. So if I'm writing a game in Rust using the hecs ECS library, how is it going to help me? How many other people have written a game in that language using that library in this genre trying to do this task before? Probably very very few.

And yeah that's a niche example, maybe this is super helpful for writing a react app or something very library heavy. But gut feeling without trying it is that there's no way this could actually work on anything but the most common tasks that you can google and find solutions for already, just made easier so you don't actually have to google for it.

When I used TabNine, which learned from you as you edit, it was fairly helpful for very repetitive code within the same project. But it was nowhere near "read English and write the code I meant". I'm curious to know how well this actually performs for non-common tasks, and whether it can understand ideas in your codebase that you come up with. If I make a WorldAbstractionOverEntities thing in my code, and then later use it in the project, will the AI be able to help me out? Or is it going to go "sorry, no one on github has used this thing you came up with an hour ago, I can't help you". An AI that could understand your own codebase and the abstractions you make, and not just popular libraries, would be infinitely more useful imo.

That said, I haven't tried this, maybe it'll turn out really good.


> how is it going to help me? How many other people have written a game in that language using that library in this genre trying to do this task before?

Because you write code using a bunch of patterns: iterating over data, destructuring an object, calling functions, etc. If you can generalize the usage of these patterns in such a way that it understands the context where you want to use them, you're essentially doing this Copilot thing.

I agree that it'll be hard to have it fully understand the context and paradigms behind your existing project, but if it can help me automate some things in such a way that I can just let it run its thing and then have me "poke around to get it right", then this is still amazing.


I have no idea how well Copilot will work in practice, but, in principle, imagine a huge Cartesian space where every program in Copilot's training data is a point. Call this space S. Now suppose S contains two programs P1 and P2. Copilot should be able to represent P1, P2 and any program in S that is on a gradient between P1 and P2. If you want to write a new program, P3, such that P3 is in S between P1 and P2, then you're in luck. Otherwise you'll get garbage.

This is explained in more detail in the article below; see the first section titled "Deep Learning: the geometric view":

https://blog.keras.io/the-limitations-of-deep-learning.html

In other words, Copilot should work well for boilerplate code and allow for many variations, but for anything more original it should be hit-and-miss. In principle. In practice, we'll know in a year or two, once enough people have used it. Or not even then.


Generating new code is not very hard for humans. But maintaining, extending, debugging code - often that’s where the real challenges are. Will the AI copilot be able to fix bugs in its code based on bug reports?


Calling it now, there will be a "Copilot considered harmful" post.

If you need to go through the suggested code to ensure it's correct, you may as well write it yourself?

If you glance at it and it looks about right, you can potentially overlook bugs or edge cases, you'll lose confidence in your own code since you didn't properly conceptualise it yourself.

Potentially for newer developers it robs them of active experience of writing code.

Much like learning an instrument, improvisation, or say physics, a lot of people learn by doing it, even if it's grunt work. IMO this is necessary for any professional.

Maybe it will be seen as a crutch, maybe I'm getting old? I have tons of code snippets, but it's usually stuff I've written, understood, and battle-tested, even if it was initially sourced from SO. Having code appear in the text editor out of nowhere, with no context, seems like it would take some getting used to.

Edit: I should have been clear, I'm not against others using Copilot and will try it out myself. I can see it being useful in replacing one-line libs like in nodejs, i.e., copying a useful well-known and needed snippet vs installing yet another lib that could be a sec issue.

Also the industry is the real gatekeeper—we have tools that don't require us to repeat prior-art, yet have to go through hurdles of leetcode-style interviews for a job. Maybe in the future the hardest part of being a (AI-driven) developer will be getting a job?


> If you need to go through the suggested code to ensure it's correct, you may as well write it yourself?

Not really. People are generally far faster at reading something and evaluating whether it's correct, than at writing something. In the same way it's faster to read a book than to write one.

Not to mention the time it takes typing, fixing typos, etc.

So this could genuinely be a huge timesaver if it's helpful enough of the time.


I completely disagree with you. Reading code for correctness is difficult and not something most people do well at all. Reading code and reading for correctness are not the same, and most developers can write code a lot faster than they can verify it.


I'd say it's more nuanced. Reading code properly _is_ writing code. I.e., I have to work through the logic as if I'm writing it, which is effectively writing it in my head, before I know if that's what I believe to be optimal.

I can _just_ read the code of course, and understand what it does - but just reading isn't analyzing it to the degree you do when you review/write the code. In that level of analysis you're looking for edge cases, bugs, etc.: reasons you'd write it differently. Which I suspect is functionally similar, if not identical, to writing it.


It's pretty much this, but it's harder because if someone else wrote it, there's a level of indirection between how you would have written it and how they did which tends to need a bit of extra mental resolving/processing for correctness.


I guess we just disagree?

Honestly I don't even see how that's possible. Writing code, you're thinking about all the different ways to do it, eliminating the ones that won't work, evaluating the pros and cons of the ones that seem like they'll work, you start writing one and then realize it actually won't work, then start writing it a different way, try to decide what the best approach will be to make sure you're not committing an off-by-one error, and so on...

Whereas when you're reading code for correctness, you're just following the logic that's already there. If it works, it works. How could it possibly take longer than the whole creative process of coming up with it...?

Sure, maybe most people don't read code for correctness well. But then the code they write is surely even worse.


>Whereas when you're reading code for correctness, you're just following the logic that's already there. If it works, it works. How could it possibly take longer than the whole creative process of coming up with it...?

That's exactly the problem. If you "just follow the logic" you can miss important details or edge cases that you would be forced to deal with by coding it yourself.

I wouldn't mind using something like this for mundane tasks, but I would be very careful with these tools while developing high performance code intended to run on specific hardware.


It's funny, I guess I'm just the opposite.

If I'm reading code, I can give 100% of my attention to the logic and details and edge cases, so I'm more likely to pick them up.

While as I'm writing, I'm busy doing all of the stuff that writing code involves, so I'm more distracted and more likely to make mistakes.

This gets proved to me time and time again when I run something for the first time and have to debug it. I look at the offending line, and think -- how could I have made a mistake so obvious that it's immediately apparent? Well, because I was busy/distracted thinking of 20 different things while writing it. But it's immediately obvious when reading it, because it has my full attention now.


>most developers can write code a lot faster than they can verify it

what? so people just write code and never read it back?


Verifying code and reading it aren't the same thing. And yes, most developers don't verify their code as carefully as they should. But also, there are blind spots to verifying your own code because brains take shortcuts. At the same time, there are difficulties verifying other people's code because of different shortcuts brains take.

There was a simple function in the Java standard library which was wrong for years because of this phenomenon.

https://dev.to/matheusgomes062/a-bug-was-found-in-java-after...


Sometimes you don't quite know how to implement something without thinking about it for a while. A lot of the time, all of us would search StackOverflow for the solution to a simple problem, e.g.

"recursively list all the files in a directory C#"

https://stackoverflow.com/questions/929276/how-to-recursivel...

I imagine an AI copilot could streamline this: instead of searching, reading and verifying, copy-pasting, and changing the variable names to my needs, I could now just type the method name, arguments, and documentation and it would similarly fill out the code for me. Then I have to check it (as I normally would).
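For example, the kind of body I'd expect it to fill in from the signature and a comment (sketched here in Python rather than C#, and the function name is my own):

    import os

    def list_files_recursively(root):
        """Return the paths of all files under root, including subdirectories."""
        paths = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                paths.append(os.path.join(dirpath, name))
        return paths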


That’s what a REPL and automated tests are for.


Automated tests find all bugs and replace code review? I'd love some of those drugs!


No, of course not. I meant understanding what you write, while you write it. Also to add a bit of nuance: I typically spend much more time reading and thinking rather than writing. But I read REPL output, logs, test output just as much, sometimes more than the actual code.


I’m going to pile on the disagree train. My experience is that developers find (other people’s) code much harder to read. I suspect this tool will lead to code with subtle problems because people will skim it, shrug “eh, looks about right,” and move on.

Edit: In fact, people in this thread are finding exactly those problems in the example code, which you would assume had been checked fairly carefully.


idk I always find writing code easier than reading code. Perl is a fantastic example of this.


Honestly, this attitude (dismissing the feature even without trying it out) comes across as insecurity and gatekeeping.


[flagged]


There's that insecurity thing he was talking about.


Not sure what is confident about doing pair programming with an AI, josh.


Is “you’re gatekeeping!” the new “first comment!”? I fail to see the information gain from adding this message to every thread.


I honestly think this is solving a real problem with commonly used languages and their lack of syntax abstraction and expressiveness.

I can imagine this being very useful in helping to type out what I consider to be "mechanical noise": things that you have to type out to satisfy an expression rather than to convey semantics.

A good example of how this type of noise manifests:

Observe two programmers, both similarly strong in most respects except mechanical expertise. One uses the editor as an extension of their body, and it's beautiful to watch; the other stumbles awkwardly over the code until it's finished. You can observe the latter in programmers who are very smart and productive, but who either didn't train their mechanics deliberately or maybe lack that kind of baseline hand-eye coordination.


Couldn't disagree more:

Smart programmers are not coding thousands of lines of code every day. There's so much more to software engineering than coding; that's why senior engineers spend less time writing code than juniors.

If a slow typer is using an auto-completer, he's not learning to type faster.

If autocomplete fails, then the slow programmer will need to reject the suggested code first, and then type it anyway.

All in all - maybe it's an impressive AI research project, but I don't see it as a useful product.


Maybe I reacted optimistically because your objections are reasonable and relatable. But I'm still very interested in how this will actually play out. I want to see and feel it in action.


It is even entirely possible that this approach hits a middle-ground that serves the corporate software-development space better than highly-flexible languages.

The difficulty with high flexibility is that the expressions become very domain-specific very quickly, creating the challenge of learning the new abstractions. So one isn't just a LISP developer, one knows how to write in the specific forest of macros that have been built up around one specific problem domain. The end result is code that means nothing to a reader who doesn't have a dense forest of macro definitions in their brain (at least in this era, their IDE will likely helpfully pull up the macro definitions with a mouse-over or cursor-over gesture!).

Contrast with this approach, where the complexity of abstraction is being baked into the lower-flexibility language. The code is less dense, and that's a tradeoff... But grab any 10 developers off the street with experience in that language and have them read it and 8 of them will likely be able to tell you with some accuracy what the code is doing. Not a trick I've seen possible with even very experienced LISP developers on a codebase they've never seen before.

... and, of course, being able to grab a random 10 developers off the street and have 8 of them up-to-speed in no time at all is crack cocaine to big businesses with large and complex systems maintained by dozens, hundreds, or thousands of people.


I’ve seen macro heavy code that is very semantic and declarative. It’s a powerful tool, so it is natural that people need to learn and fail until they use it well.


> I honestly think this is solving a real problem with commonly used languages and their lack of syntax abstraction and expressiveness.

Do you think there's a risk that a tool like this could lead to an explosion of the size of codebases written in these languages? It's great that programmers can be freed from the need to write boilerplate but I fear that burden will shift to the need to read it.


Damn, that's the best objection I've read so far. Writing readable code is already hard and something I aspire to.


Think of it as a junior dev working under you and doing the grunt work of typing in your ideas. Sometimes he can StackOverflow a better snippet than you can write on your own; you will probably learn a bit from it, but it won't surprise you.

It is no different from reviewing the code of another, perhaps junior, dev and only adding finishing touches.

There is plenty of boilerplate you have to write, and Intellisense/autofill only goes so far; this is the next step in the evolution. Sure, it is not perfect, but if I can express my ideas faster, why not?

Also, it is very probably a poor tool for new devs: they won't know that a suggestion may not be the best, and they probably won't ignore it when it is wrong, as they won't know any better.


Most people on HN will probably be fine for a while. This innovation though, once properly developed, could completely screw over anyone wanting to enter programming.

Code completion might be the new "junior dev."


All the computer scientists[1] at one point viewed software developers and IT in the same light, as higher-level tooling evolved.

While the purist view is not wrong that the average quality of outcomes has dropped since the 70s-80s, the sheer quantity of throughput has meant the impact has been positive and immense.

Similarly, I expect this kind of tooling will open the field up to more types of new developers.

[1] All the mathematicians perhaps thought similarly during the 50s and 60s.


> Potentially for newer developers it robs them of active experience of writing code.

And for those with experience, this will be obvious when reviewing their code. There are only two possibilities -- either copilot will get so good that it won't matter, or code written by copilot will have obvious tells, and when someone is over-relying on it to cover up for a lack of knowledge, that will be very clear from repetition of the same sorts of mistakes.


I imagine that, for a while, the complexity Copilot can handle will be limited to what most people would pull from Stack Exchange anyway. And if it helps autocomplete documentation and provide automatically generated (more descriptive) function/variable names, it will probably be a net positive for those limited use cases.

That said, I can't wait to read the first postmortem where someone deployed code generated via Copilot that has a bug. I just hope it's not on a rocketship or missile guidance system.


As a hobbyist this is going to save me so much time. Little things like googling how to read CSVs in python for the 20th time add up and I think this should help solve that.
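The snippet I keep re-googling is roughly this (file and column names made up):

    import csv

    with open("data.csv", newline="") as f:
        reader = csv.DictReader(f)  # uses the header row for the column names
        for row in reader:
            print(row["name"], row["value"])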


I'm not quite old enough to remember: when high level languages like C gained wide adoption, how much pushback was there from the philosophy that if you're not writing it in assembly, you're not really writing code?


All of the same can be said for copy-and-pasting code you find in a tutorial in Google search results or in a Stack Overflow answer. This just seems to be automating that process even further.


Those extra steps can be valuable, since you'll have to work to even find the right code to copy/paste, and the context which it's in can teach you something.

Even something as simple as copying from the docs: the docs are usually a good place to signal deprecation, use-case applicability, API updates, etc. You lose all that with the automation.

Oftentimes there's also discussion around a solution, and in many ways can swing one's decision on whether to use the code or not.


I wonder if there's any potential for Copilot to suggest malicious code because it's been trained on open source projects containing intentionally malicious code.


Maybe not malicious per se, but certainly I'd be concerned about seemingly-correct but actually-wrong code being suggested. Considering how often the top StackOverflow answer is slightly wrong or how often antipatterns crop up across various projects, I'm sure the training data is nowhere near "perfect code" - implying the output cannot be perfect either.


Since it is per line I highly doubt it. I think of it as intellisense+. You select suggestions that you would have written anyway.


or broken code :D


The averageRuntimeInSeconds example does not check for division by zero so it creates broken code at least 20% of the time based on the examples on the homepage :)
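The missing guard is a one-liner; roughly this, sketched in Python rather than the homepage's actual snippet (the field name is illustrative):

    def average_runtime_in_seconds(runs):
        # Guard the empty case instead of dividing by zero.
        if not runs:
            return 0.0
        return sum(run["runtime"] for run in runs) / len(runs)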


nothing more than what I was expecting :D


I can see in their FAQ that there are plans to monetize this. So they're building an ML model that feeds on the work done by millions of people and then selling it? How is this even ethical? Not to mention we'd be feeding the model while using it. Guess this is another instance where we are becoming the product.


If I spent a lot of time reading open source repos on GitHub to teach myself to code, and then went out and got a high-paying job based on that knowledge, is that ethical? This seems roughly analogous to what the machine is doing.


Regardless of the legality, one of these situations is clearly ethical compared to the other. In the case where you get a job based on your knowledge of GPL software, you still must respect the license if you use that code commercially (i.e. at your new job). And yes, if you reproduce GPL code you "learned" from, you are violating the license.

A company ingesting an entire GPL codebase without warning or any way to opt-out in order to create a closed-source feature that they and only they will profit off of is clearly not the same as an individual reading the code and getting a job based on those ideas.


But the points you listed are the same whether it's a person or a machine:

- if you reproduce GPL code verbatim, you're in violation*

- no warning that somebody/something is ingesting the repo

- no way for the repo to have opted out

- closed source (you can't get the source from somebody's brain)

- private profit

* Do we even know if the machine is more or less likely to do this? Humans are certainly capable of it.


You're intentionally conflating scale here to make them seem the same.

> no warning that somebody/something is ingesting the repo

An individual reading code on their own time is not the same as ingesting terabytes to train a machine. No matter how much you believe AI learning works like human learning (it doesn't), they are not comparable.

> private profit

Again, an individual reading code in order to work for a salary is orders of magnitude different from ingesting terabytes of code so a company can create a new feature. Claiming these things are the same only makes sense if you ignore the massive differences in scale and the differences between how humans and machines learn.


I didn't specifically craft those points to suit my argument, it was good faith paraphrasing of exactly what you wrote.

Is software doing something at scale unethical? Is it unethical to use software for profit? I'm afraid almost all programmers are guilty of both.


> Is it unethical to use software for profit?

When the licenses explicitly say that using the software for profit requires attribution, the answer is clearly yes. My code on github is licensed such that if you use it, you must say where it came from. The only way this isn't at the very least unethical (because it goes against my wishes as owner of the code) is if you argue that github isn't "using" the code, which clearly isn't true, because if everyone were able to opt out there wouldn't be a product for github to be working on at all.


I'm sorry if I'm derailing the discussion here but "copilot" really is a much better phrase to use for assisting software than "autopilot". If more companies would choose phrases like these that accurately emphasize that the human is not being replaced but assisted (no names mentioned) I think it would benefit everyone in terms of clarity.

Sure, you might say "it's all marketing and if AP exists nobody would buy CP", but I don't think it's that simple. Customers understand when their expectations are being met, exceeded, or let down.


Copilot is meant to make you feel like you're the one in control, when in reality you, the user, are training it for free.


This seems to work really well in cases where you're just laying down boilerplate. A few cherry-picked comments seem to suggest that React components are an ideal use case - which makes sense, that's a lot of munging and syntax to just render some strings.

However, I find the process of writing these sorts of functions cathartic and part of the process to get into zen-mode for coding. I think I'd feel less joy in programming if all of this was just done by glorified commenting and then code-review of the robot.

I like to think of coding in terms of athletic training, which usually is comprised of difficult tasks that are interspersed with lighter ones that keep you moving (but are giving you a bit of a break). Training for soccer teams often involved lots of sprinting and aerobic exercise - and in between those activities we would do some stretching or jogging to keep our body moving. These sorts of small functions (write a function to fetch a resource, parse an input payload, etc.) are when my brain is still moving but getting ready for the next difficult task.


I hate to be a downer, but ultimately the people doing the planning and allocating budgets at non-tech companies do not care about the nuances of workflows. They'll lay as much cognitive load on you as is allowed, and they'll always have another body to bring in if you end up quitting.

The value proposition here is clear: "This tool will make it harder for your engineers to do time theft. What used to be an hour's long effort of painstakingly hand-coding boilerplate as a way of 'taking a break' will now be at most 5 minutes worth of easy code review."


> However, I find the process of writing these sorts of functions cathartic

That may be true for individual contributors, but if you're trying to build a company from scratch, any help you can get to move faster is a good thing, cathartic or not.


I tried the paid version of tabnine and was really unhappy because it suggested code with syntax errors and introduced subtle bugs when I did not closely review every generated line. It was as if you had someone very impatient sitting next to you, typing before actually listening to what you want to do. Is Copilot better? Does it suggest broken code, too?


According to some of the comments here: yes, yes it does. One of the snippets on the front page has a bug; I don't remember which one, but someone here pointed it out.


I knew this day was coming but it still stings. What I'm interested in is making copilot the main pilot.. i.e. for web development, why not just let this thing go on its own and have a separate module that attempts to compile the code + observe the differences in a web page functionality? No longer so crazy to say that shit like that is on the horizon. Then the middle managers can have their dream come true, they are the true benefactors! They can 'code' with copilot, and just endlessly iterate with it while figuring out what they want until they get a result they're happy with.


I'd like to see something like that, but with knowledge about every single file in the codebase, and running locally.


Yeah, running locally would be my preference. I get "Antitrust (2001)" vibes from this, but that's the tinfoil hat side of me.


Like this?

https://visualstudio.microsoft.com/services/intellicode/

I bet it's the same people, trying to push their crap into all sorts of successful products.


The IntelliCode and Copilot teams have been collaborating closely together, since we want them to provide a "better together" experience. However, the underlying tech isn't the same. Copilot is powered by OpenAI Codex, and enables rich code synthesis via a cloud service. Whereas IntelliCode uses multiple local models, to enhance various parts of the editor (e.g. prioritizing the completion list based on your context, detecting "repeated edits" and suggesting additional refactorings).


As far as I know that also runs remotely.

Edit: looks like I remembered incorrectly: https://docs.microsoft.com/en-us/visualstudio/intellicode/ov...

> No user-defined code is sent to Microsoft


There is Tabnine, which can work like this.


Last time I tried Tabnine it wasn't really of much use to me, the top of the line GPT-3 is a much much bigger model, it should be able to do much more intelligent things.


But GPT-3 won't run locally, so no thank you.


Like an IDE?


Yeah but smart.


From the FAQ:

https://copilot.github.com/#faqs

>> How does GitHub Copilot work?

>> OpenAI Codex was trained on publicly available source code and natural language, so it understands both programming and human languages.

I want to propose a new Bingo style game where you get points when AI researchers (or rather their marketing teams) throw around big words that nobody understands, like, er "understands" with abandon. But I can't be bothered.

An AI researcher called Drew McDermott warned against this kind of thing in a paper titled "Artificial Intelligence meets Natural Stupidity":

http://www.cs.yorku.ca/~jarek/courses/ai/F11/naturalstupidit...

In 1976.

Why doesn't anybody ever learn from the mistakes of the past?


In AI research lingo "understands" really means "makes decent correlations".


Please provide a reference for this explanation.


No, I'm serious about this: according to whom?


I like the concept, but just like with Kite, having part of my code sent to a remote service is not going to be OK for many of my projects.

For FOSS ones it could be great though.

I think the best part for me is how it's going to introduce even more low-code devs to the worker pool, which means I will be able to raise my price again. Last time this happened, when designers got to the backend, I got +30% in a year once my clients figured out the difference in output.


So now I make a bot to upload repositories of intentionally buggy code so that when people blindly use this autocomplete my hacking becomes easier!


Awh man, brilliant idea!


I'm curious if this can reveal secrets (I realise this was trained on already publicly readable code). What happens if you type "var api_key = "? What is the auto-completion?


I hope it would suggest something like `//process.env.API_KEY` to get users in the habit of not storing secrets on repos.
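Something in this spirit, i.e. read it from the environment at runtime (a Python sketch of the same idea; the variable name is just a placeholder):

    import os

    # Keep the secret out of the repo: read it from the environment at runtime.
    api_key = os.environ.get("API_KEY")
    if api_key is None:
        raise RuntimeError("API_KEY is not set")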


How is GitHub/OpenAI ensuring this tech doesn't throw the industry towards a flawed human/tech capability or a dystopia of sorts?

Like… a bubble of developing code from yesterday's code (plenty flawed itself), with exponential growth based on feedback loops of self-fulfilling prophecy. I'm assuming Copilot (and every OpenAI variant to come) will essentially retrain itself on newly developed code, which over time might all be code it wrote itself.

Did we just create a hamster wheel for the industry, or high-rise elevator?


I do think this turns into a modern Stack Overflow, however, if it can observe local runtime errors, the debugging process, and the tidying/fixing of the code it produces itself. Plus, train it on all the SO questions out there.


This looks awesome! And I'd really like to try it out.

2 security thoughts that I couldn't find answers to:

1. how does the input fed into OpenAI codex filter out malicious code pollution? or even benign but incorrect code pollution (relevant research on stackoverflow for example - https://stackoverflow.blog/2019/11/26/copying-code-from-stac...)

2. In an enterprise setting, what does the feedback loop look like? How do you secure internal code that is being fed back to the model? Does it use some localized model? HE, etc?


#2 is the big one for me. I'm hesitant to install this on a work machine where our code could be sent elsewhere.


Plenty of discussion about the IP issues. It makes me want to start adding a section in my LICENSE.txt that says it's not eligible for use training commercial models. We'll likely end up with a whole set of license choices for that.

Although if a license can permit or prohibit use in training commercial models, does that mean that the lack of permission implies a prohibition on it?


My guess is that there is something in the GitHub terms of service that says you consent to this by using them to host your code.


Good thing there are many other platforms we can move to. And most of them are open source unlike GitHub.


I suppose this is a whole new argument in favor of good code commenting - to train/share context with your future tooling.


This is really the same argument as it used to be: help intelligences (used to be only human, now artificial) to find bugs by matching text with code.


Four years later: your AI replacement? When do you all predict something like this will happen?


Maybe thirty four years later. I don't think there's AI to gather requirements, talk to people, understand a problem and produce code. That's kinda general intelligence level AI. But this thing can possibly make devs work easier and if it's good enough maybe smaller teams can produce more.


This will allow a D or C level coder to be a B/B- coder, which is great; quality goes up. But corps will use this to depress wages and finally be able to create that wondrous unicorn: the completely fungible coder.

This kind of tooling is akin to the crossbow.

It will allow less skilled folks to push out code that is like other code, at great speed. A copy-pasta accelerator, if you will.


Is that a bad thing? SW developers are grossly overpaid, to the point that it's damaging.


Grossly overpaid for effort/comparison to other wages/industries.

Adequately paid for value captured (hence why companies are willing to pay for them at this rate).

Not a moral statement, but this gives more tools to those in power.


I tend to think like you. Somehow we convinced businesses that a mostly blue-collar job gets paid a white-collar salary, but I've been told that SW engineers aren't overpaid; most people are just very underpaid.

What are your thoughts on that? I can lean that way just because I have a genius mechanical engineer friend who only makes 60k in his 30s.


I think it's easier to bring people down to the same level rather than bringing them up to it.


If the reduction of developer wages led to the increase of wages for other workers, sure. But of course that won't happen. The reduction of wages for any class of worker will simply lead to further consolidation of wealth.


We each produce 1MM+ in actual revenue per year but paying us 100k+ is too much?


Do you think project managers are overpaid too?


I'm not at all worried about AI taking over software development. In all likelihood, what you'll see instead are AI plugins in IDE editors which just assist in a much more advanced way than the intellisense we have now. Having machines code out the business logic is very much something that would be less efficient than having a person do it.

Realistically, it just means that, rather than your coworker code-reviewing you and making a handful of comments, you get a machine to do that, and get two or so comments from your coworker about the business logic.

To answer your question: never.


I think programmers will just write model and tests in the end. The rest will be generated.


Unless we achieve AGI this is never going to replace programming because it will instead just make programming a higher level task. And well, if we achieve AGI (which I think could be pretty soon), all jobs will replaced, so it's not something I think anybody should be worried about.


I'd love to live in a world where AGI solves cancer.


I imagine that AI like this will certainly speed up development, but I suspect you will almost always need someone in the middle putting the pieces together.


Nah. Remember that "Google AI signs you up for a hair stylist session" demo? We never got anything out of that.


I would love to see an equivalent where it generates all the tests for you


It should be able to (try to) do something like that too.

There's a little demo about that here: https://copilot.github.com/

It's "just" an autocompletion system basically, if you write something that looks like the beginning of a test it should understand that and try to autocomplete that.


I'd say it's more than 'just' an autocomplete system.

Naive autocomplete, as implemented in Excel since forever ago (and I'm sure long before that, I'm just familiar with being annoyed by Excel suggesting wrong entries from its simple and over-eager autocomplete system), merely matches a sequence of characters - if I typed "aut" again in this paragraph it will suggest "autocomplete" because I recently typed it. Implementing it is the kind of task you give to a first-year programming student to practice string matching data structures, similar to a spell checker that merely checks that a string exists in a dictionary.
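A minimal sketch of that naive version (in Python, purely to make the point concrete):

    def naive_autocomplete(prefix, seen_words):
        """Suggest previously seen words that start with the typed prefix."""
        return sorted(w for w in set(seen_words) if w.startswith(prefix))

    # naive_autocomplete("aut", ["autocomplete", "author", "spell"])
    # -> ['author', 'autocomplete']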

There's a spectrum from 'just' autocomplete, to a syntax-aware system like VS Intellicode, to this, and eventually beyond this. As mobile predictive text is to a spell checker, so Github Copilot is to autocomplete. As mobile predictive text is to GPT3 [1], so Github Copilot is to...what next? GPT3 is not just a spell checker.

[1] Also by OpenAI: https://news.ycombinator.com/item?id=23345379


I agree, that's why I put the "just" under quotes. It is basically an autocompletion system, just much smarter than human-coded ones in many ways.



only java :(


It does!


Great, a whole new way to automatically introduce bugs through code duplication.

From the examples:

    body: text=${text}
`text` isn't properly encoded, what if it has a `=`?

    rows, err := db.Query("SELECT category, COUNT(category), AVG(value) FROM tasks GROUP BY category")
        if err != nil {
            return nil, err
        }
        defer rows.Close()
shouldn't we want to cleanly `rows.Close()` even if there was an error?

    float(value)
where `value` is a currency. Doesn't Python have a Decimal class for this?

    create_table :shipping_addresses do |t|
that's an auto-generated table; that one's debatable, but for starters a `zip` field makes it American-only. And doesn't the customer have an e-mail address instead of a shipping address?

      var date1InMillis = date1.getTime();
But what about the time-zone offset?

I could go on, but literally the first five examples I looked at are buggy in some way.

Edit:

   const markers: { [language:string]: CommentMarker } = {
      javascript: { start: '//', end: ''},
Wow.

Edit 2:

    function collaborators_map(json: any): Map<string, Set<string>> {
Not exactly buggy, but 8 lines of tedium.

What about

    new Map(json.map(({name, collaborators}) => [name, new Set(collaborators)]))
instead?

Edit 3:

      const images = document.querySelectorAll('img');
      for (let i = 0; i < images.length; i++) {
        if (!images[i].hasAttribute('alt')) {
I mean, I get it, it's auto-generated code. Maybe in the future they can narrow it down to auto-generating good code.

    document.querySelectorAll('img:not([alt])').forEach(
(or)

    img:not([alt]) { border: 1px solid red; }
(or)

    eslint-plugin-jsx-a11y, because maybe that img was rendered through react.
(or)

    it should really be an `outline` because we don't want this to reflow the page, and with `outline` we can avoid that. And maybe a will-change.


We could get away with more user-friendly programming languages over the years because Moore's law kept giving us more opportunities to sacrifice raw performance for more developer-friendly tools.

But I'm worried these kinds of AI-assist tools will lead to "code spam" that may increase developer productivity even more, yet we no longer have Moore's law to absorb the additional inefficiency in performance these tools may introduce.


This is a great use of the OpenAI Codex. It's fast in VS Code. My early impression is that the more popular the language, the better it does. So Javascript is almost magic, and something like Rust is still useful. I'm looking forward to using it more.


I am most looking forward to this for things like bash scripts. I don't usually do much bash programming, but probably once a month or so I have a need to do something that could easily be solved with bash - but since I use it so infrequently, I'm always forgetting the myriad list of command line tools and options to do what I want. Stackoverflow of course helps but this would be a huge improvement if it works well.


Bash is really tricky tho; a single wrong character can change the program behavior drastically. So, thanks but no thanks.


From Clippy to this technology, in just under 25 years! Very impressive. I wonder how much impact this tool might have as a teaching aid, as well?


wow I'm going to have lots of opinions about this.

1. A lot of people on this thread are concerned about licensing issues with GPL etc. I am sure Github will restrict the beta until it figures out that stuff.

2. I wonder if eventually our corrections to the code suggested by the model will be fed back into the model, and if that'll lead to differential pricing - if I let it see my code, I get charged less.

3. I believe a mini-GPT-3 model is where it's at. GPT-3 (and similar) models look to be too big to run locally. I've been using TabNine for the past year or so & it gives me anywhere between a 5-10% productivity boost. But one of the main reasons it works so well is that it trains on my repo as well. TabNine is based off GPT-2, from what I've heard.

4. prediction: Microsoft is probably going to milk GPT-3. Expect a bumpy ride.

5. In all likelihood, this would be a great tool to make developers more productive, rather than take their jobs - at least at levels that are more than just code-coolie.

6. Eventually all tasks with enough data around it will see automation using AI.


Are they making a proprietary tool based on the data provided by FOSS projects?


Basically all of GitHub (with some small exceptions) is proprietary code making money off of FOSS.


I don't understand fascination with bullshit machines.

While they may be useful in propaganda, state or commercial, I'm not sure why Microsoft GitHub would find it useful to generate volumes of bullshit source code.


Won’t be mainstream until it supports C++20 under Emacs.

(Yes that’s an HN-frowned-upon joke comment, but that is my dev environment)


AI to write code is cool, but you know what’d be even cooler?

AI for maintaining, upgrading, improving, and fixing code.

After all, devs spend 80%+ of their time doing those things and they’re WAY more painful than writing code imo.


Snyk Code is trying to do something similar. It basically uploads your code and compares it to known vulnerability patterns; it also gives you examples of how to fix those vulnerabilities based on open source projects' pull requests.


Please, replace me faster :)


> Tests without the toil. Tests are the backbone of any robust software engineering project. Import a unit test package, and let GitHub Copilot suggest tests that match your implementation code.

Isn't that the wrong way around? I'd like to start by writing the tests, and for GitHub Copilot to please implement the function that makes them pass.
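Something like this: I write the tests up front and the tool fills in the implementation underneath (a sketch with a made-up slugify example; in this setup the second function is the part I'd want generated):

    import re

    def test_slugify():
        # The part I'd want to write myself: the spec, expressed as tests.
        assert slugify("Hello World") == "hello-world"
        assert slugify("  Hello,  World! ") == "hello-world"

    def slugify(text):
        # The part I'd want the tool to derive from the tests above.
        return "-".join(re.findall(r"[a-z0-9]+", text.lower()))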


I can only imagine one of the big reasons to release Copilot to the public for free is to make it better by sending back to GitHub the "reviewed code". Example:

- Copilot suggests me a snippet of code

- It's almost what I wanted. I fix the code

- Copilot sends back the fixed code to GitHub

- Copilot gets better at guessing my (and others) wishes

Unless Copilot is running locally, I won't use it.


Same. I make quite a few mistakes like pasting secrets and immediately deleting them right after, and I also have local secrets that are gitignored, which I think Copilot would just upload without a second thought.


I haven't kept up with the adversarial ML field recently, but I wonder how vulnerable these models are to adversarial attacks.

- Could someone deliberately publish poor code to reduce the overall performance of the model?

- Could someone target a specific use case or trigger word by publishing deliberately poor code under similar function definitions?


Also: what happens when a nontrivial portion of public code out there is ML-generated? How will it deal with feedback effects?


I'd rate myself as "above-average receptive" to ML-based tooling, but after trying two "AI autocomplete" tools (Kite and TabNine) I've decided it's not for me. The suggestions were usually good, but I found having complex, nondeterministic IDE commands pretty unsettling.


Could actually make you better at producing robust code, though. It's somehow always easier to spot someone else's mistakes than your own.

Or to put it another way: you know you can't trust the AI-generated code w/o convincing yourself it's correct. You think you can trust the code you wrote yourself, but you're probably wrong! :-)


If you are interested, you can read[1] more about it.

[1] https://docs.github.com/en/early-access/github/copilot/resea...


Assuming this is actually a useful and powerful addition to a programmer's quiver (which it looks like it will be, if it isn't already), then tying it to Visual Studio is a variation on Microsoft's "Embrace and Extend" philosophy, but for programmers and open source in general.

It uses Open Source as its input but, as far as I can tell (and I would be pleasantly surprised if I was wrong), CoPilot itself is not Open Source.

It is also tied to Visual Studio, pushing Visual Studio, a Microsoft product, even further up the power-law curve toward monopoly status.

This would be much more interesting and less concerning if CoPilot was Open Source and designed to plug in to other Editors / IDEs like via lsp or something similar.


VSCode has telemetry, the extension marketplace can't be used by non-Microsoft products, VSCode is not open source (only VSCodium is), many of the MS extensions are not open source (like live collaboration), etc.

VSCode followed the classic big tech recipe: 1) make it open source to look like the good guys and get adoption and contributions, 2) close-source many key components and add spyware.

Story of Android too pretty much


Of course this is a continuation of things people have been trying for decades at this point, rather than something fundamentally new, but it brings up a point a colleague and I discussed a decade ago about training something like this on large data sets - namely, that you are going to tend to find common idioms rather than nominally best ones. In many scenarios it may make little to no difference, but clearly not in all. It's likely going to gravitate towards lowest-common-denominator solutions.

One example of where this can be a problem is numerics - most software developers don't understand it and routinely do questionable things. I'm curious what effort the authors have put in to mitigate this problem.
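A tiny example of the gap between the common idiom and the careful one (a Python sketch, nothing to do with Copilot's actual output):

    import math

    vals = [0.1] * 10
    print(sum(vals) == 1.0)        # False: naive left-to-right summation accumulates error
    print(math.fsum(vals) == 1.0)  # True: fsum tracks the rounding error

    print(0.1 + 0.2 == 0.3)              # False: the "obvious" comparison is already wrong
    print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a tolerance instead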


"In order to generate suggestions, GitHub Copilot transmits part of the file you are editing to the service."

Well, isn't GitHub part of Microsoft now? No wonder it has gained telemetry...

I'm a bit worried that this thing will lead to even more bugs like the typical StackOverflow copy&paste which compiles fine, runs OK, but completely doesn't understand the context and thereby introduces subtle difficult to find issues.

My personal take on autocomplete has always been that I only use it so that I can use longAndDescriptiveFunctionNames. Apart from that, if your source code is so verbose that you wish you didn't have to type all of it, something else probably already went wrong.


We already do this with libraries and frameworks in a more primitive way. The hard work is actually done by a handful of programmers; everyone else is sticking pipes together. I don't think that's a bad thing per se, but in my experience most people don't make the distinction. You're not going to be able to write your own database server with this tool if you weren't able to do so without it. If you're one of the few programmers that are able to build a database, a graphics engine, a compiler, etc., you're fine. Everyone else should probably feel a mild panic. You'll be automated away in a couple of AI generations.


Is there really that much of a difference between a programmer building a database and one building something mundane?

Obviously, there's some core algorithmic work that needs to occur, but that's always been done by a very small number of people anyway. The rest is still glueing.


Not to be confused with AWS Copilot - which has been an area of focus for the AWS container services team: https://aws.github.io/copilot-cli/


Someone on the reddit thread asked about adversarial ML approaches to introduce either poor performing or vulnerable code into the suggestion. Could be a good vector for exposing tons of applications to attack from malicious actors.


Looks exciting! It is kind of disappointing the AI generated main example on their home page has what appears to be a url encoding bug in it though (in text=${text}, text should be url encoded before being passed to fetch).


It sounds like this is similar to Kite, but actually competent as a service and not associated with a brand that has destroyed all trust. But it has to come with the same privacy caveats, right? Uploading your private code to a third-party server could result in business or regulatory violations.

And even if you're okay with sending your code, what about hardcoded secrets? What's to prevent Copilot clients from sending things that should never leave the user's computer? Heuristics? Will we be able to tell what part of the code is about to be sent? And is the data stored?


Don't use this. You'll just be giving training data to OpenAI (which is NOT "Open" by any means).


As a programmer who specializes in security, I worry that many of the common errors that people make will get picked up by the AI and recommended to new users. It looks like it gets reinforcement from the code that the user selects. In my experience the majority of developers make security errors, so how is the algorithm going to learn not to do it the wrong way when it's learning from bad code and getting reinforced by developers who are wrong? (This is an honest question, not a criticism. I think this product is fascinating.)


Most programmers are pretty careless and just get something to "work."

Maybe the first generation of this sort of completion will be bad, but I have full faith it will be better than the average human at avoiding security issues as early as the next generation.


faith based on what? So far in my experience AI projects tend to peter out.


Fools! Don't you realize you're training your replacement!?


This looks amazing! I was going to sign up for the preview but stopped immediately after reading about the additional telemetry that is scraped from my IDE. Microsoft would basically be allowed to take my code and see it whenever they need, including unintended snippets like local (gitignored) secrets and any sensitive information that it might catch, without my "snippet by snippet" approval and with no way to ignore files (afaik).

Until this is fixed, good luck but no thank you, Microsoft.


I'm not worried about programs like Copilot, in the same way that I'm not worried about outsourcing. If price were the sole factor in software engineering then all jobs would have already been shipped overseas. Instead, an awful lot of companies tried that and realized that it's far from the bee's knees.

AI tools like this will be force multipliers, the same way that IDEs are. SWEs 20 years ago decried the rise of the IDE the same way that some are moaning about this. "It'll lead to a generation of know-nothing programmers." Instead, SWEs are more productive today than they ever have been before.

Software engineers should welcome the day that an AI makes it possible to describe what you want instead of telling the machine exactly what to do. That's where I see this technology going in 20 years. That said, it will likely never be good enough to replace an engineer completely because the value of an engineer is in deciding what to build and how to build it. Not in determining exactly which lines of code to write. SWE will be more enjoyable as a field in 20 years because we won't have to wrangle compilers, or deal with fundamental limitations of how computers express ideas versus how humans express concepts. Instead, we'll let the AI translate for us and deal with all those problems automatically.

In the short term, I see lots of potential for this technology to make something like the ultimate cross-compiler. If you can take someone's code, convert it to some kind of universal representation (embedding), and then convert that to another language (and get working code that verifiably produces the same results) you would make a ton of money.


"Software eats world" just got more real. Funny that programmers thought they were eating the world with software, now, software has eaten them.


far from it, this thing won't write full applications by itself


I know.. But we can dream. Also I'm sure when we first got code completion we said - "This thing won't write functions by itself."


The reality is that the AI engineers working on autopilot systems for self-driving cars still tell their users to keep their eyes on the road. AI medical apps that suggest possible conditions still tell their patients to consult a real doctor. AI trading systems that auto-trade and analyse the markets are also limited, don't account for multiple risk points, and still need supervision by traders. And the lawyers using AI to sift through historical cases to save time still need supervision from a human lawyer.

Where we're going is assisted AI; obviously not a full-on replacement, despite the scare stories created by the AI hype squad in this thread in their first reactions to this tool.


I've signed up to be a Guinea Pig, I've never pair programmed, and my primary language is Pascal, and I'm old... this ought to be a hoot.


Wow. Please make this for Rust :D

What this really displaces is StackOverflow (or some of its users...)


This would help me tremendously. I don't code frequently--a few times a month at best. I find myself having to google syntax constantly to write basic programs. If I want to write a simple python command line program that parses input, for example, I can guarantee I'm going to be opening tabs to figure out the syntax. It would be great if something like Copilot could help me stay in my editor.
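For example, the scaffold I end up re-googling every time looks roughly like this (argument names are just placeholders):

    import argparse

    def main():
        parser = argparse.ArgumentParser(description="Example command-line tool")
        parser.add_argument("input", help="path to the input file")
        parser.add_argument("--verbose", action="store_true", help="print extra detail")
        args = parser.parse_args()

        if args.verbose:
            print(f"reading {args.input}")

    if __name__ == "__main__":
        main()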


I wonder if there should be a thumbs-up / thumbs-down mechanism to help teach it, or even flag snippets as bad/insecure/whatever.

I'm picturing the StackOverflow problem where the accepted answer is actually wrong, and actively pushing it into more people's code is just proliferating the problem into more places. It would make people faster at building the mostly right, but sometimes subtly wrong, thing.


This would be awesome for terraform! It could potentially derive the details of SSL, k8s, Load Balancers, and databases for each cloud from comments


Oh this would screw with me so badly.

A lot of the time, I'm thinking pretty deeply about the code I'm writing, and as I'm writing code I'll also be thinking about how it applies in context.

Having some external agent inject new code into my editor would shatter my thought flow, since I'd then have to grok whatever it just spit out instead of continuing on with whatever thought I was pursuing at the time.


Gigantic caveat.

> I agree to these additional telemetry terms as part of the technical preview


"You have zero privacy anyway. Get over it."

Scott McNealy, CEO Sun Microsystems, 1999


Then, I should see absolutely zero complaints about privacy, tracking, spying, google analytics and facebook tracking.

Perhaps we should ask Scott if he is willing to share his browsing history, his personal photos and his passwords with the rest of us or maybe if I can come into his house?

After all, "You have zero privacy anyway"


Right. If you’re comfortable giving access to your source files to GitHub+OpenAI, then go for it.

I’m not sure how this would apply to secret keys or flat files with customer data/PII, but in any case that makes it a non-starter for me.

Their “Please do not share this URL publicly” banner at the top of the page, which disclosed this info, makes my skin crawl a bit…

If I were only working on public projects I would be on board right away, it looks like a big time saver.

Am I being too paranoid here?


No, you are not being paranoid. This tool literally uploads all the code it wants off your machine, and I see no way of filtering out secrets and the likes. You have all the rights to be worried about that.


> Am I being too paranoid here?

No.

They already admitted that they send telemetry of the code you give it and its training set already has personal information in it anyway, despite what is being hyped up here by the fanatics, even when someone said that 'Copilot guesses the exact code I want to write about one in ten times' [0]

No thanks and certainly no deal.

[0] https://news.ycombinator.com/item?id=27676845


So am I being too paranoid here to say that a bot (or something) somewhere on HN is instantly downvoting my very good questions and substantiated claims?

I always ask whoever disagrees to have the courage to sit down and discuss, but they always run away and never explain themselves.

Looks really suspicious, either bot behaviour or just some angry hater, don't you think?


I am pretty sure they will train it on private repos they are hosting as well (just not be public about it)


I absolutely hate whenever I see patterns in my code. The first thing I think is "There has to be a way to automate this." This is not what I had in mind, but if it's as good as people seem to say, it might be a good step. I can't believe I am considering a Microsoft product after 15 years of avoiding them as much as I could.


Remote coding interviews just got 100x easier.


This thing suggests code to write, but not code to remove.

https://www.folklore.org/StoryView.py?story=Negative_2000_Li...

What you didn't write or deleted does not contain errors and you do not need to support or fix it.


AI FTW!

(dang please don't ban me for a low-quality comment :) i couldn't resist but will not make it a habit!)


Someone, somewhere, is already working on ways to make it inject vulnerabilities into your project.


I wonder how Copilot's suggested snippet compares if the comment is a CS1 homework prompt.


I started making something like this in emacs about 4 months ago. I'm looking for collaborators! Thanks

https://github.com/semiosis/pen.el/


For a second I got afraid that it is Copilot by Kite, with their infamous history (https://news.ycombinator.com/item?id=19018037).


Ah, here it is; the Power Loom. And we're all weavers.

http://historymesh.com/object/power-loom/?story=textiles


This technology should be available to everyone whose work contributed to its development, to use as they see fit. Free of Microsoft's tendrils.

The absolute gall of Microsoft claiming fair use to gatekeep the knowledge of millions of minds...


I always bang on about "software literacy", but I do wonder how I would deal with this if it was suggesting text for me while I was writing - emails, reports, novels.

I suspect that for drudgery or work stuff I would happily take some help with typing, but I am not sure (beta access please!) if I would want it for my novel, or my sales copy.

I am (optimistically) hoping that my novel has my voice - my unique fingerprint of tone and vocab.

And I wonder if my software has similar tone. A much more restricted syntax of course, but my choice to use meta programming here, an API design there. I may be over thinking it.


I expected the appearance of machine-learning-based developer tools pretty much the moment Github was bought by MS. Liberal access to everything in the largest collection of source code in the world is the ideal starting point for developer tools like this one. Github/MS doesn't have to obey the same rate limiting when accessing the site that the rest of the world has to put up with.

Now there are only three questions left: how will this be monetized? Will it recoup the purchase price of Github? And how can vendor lock-in be achieved with this or any follow-up products?


A great way to use this would be to create very good tests (manually) and then let the AI write the code. Maybe even with a feedback loop: when a test fails, the AI automatically tries a different approach.


The gap between concept and working product is still very much in the human's wheelhouse. This is an accelerator for snippets not yet turned into a library and, in fact, a lot like a library in terms of its day-to-day utility. This will not end or even change programming that much. The value I provide as a programmer is not about copying and pasting snippets; it's something different in kind from what copilot does. If it's helpful for what I already do, sure, I'll use it. But it ain't me, babe.


How does this compare to TabNine or Kite?


How does this work in the context of leetcode/hackerrank interviews? Can I just use Copilot to get a 90% skeleton of the required solution and maybe fill in just the 10% smarts?


It's funny that GitHub ships their own text editor, Atom, but the second example (under the "More than Autocomplete" header) on the Copilot website is clearly using VSCode.


I don't know if they've officially said anything on this, but since the MS acquisition it feels like Atom is on life support. Github Codespaces is built on VSCode, and there's a lot of effort from Github going into their VSCode extension for things like PR review.


now they just need to build this into vim.


Or more realistically, a language server which would then be compatible with many editors including (Neo)Vim.


Presumably, this will be accessible through a REST API or something like that at some point, so that it can eventually be integrated into all editors.
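Something like this is what I'd expect an editor plugin to do under the hood; the endpoint and payload below are pure guesses on my part, since no public API has been announced:

    import requests

    # Entirely hypothetical endpoint and fields; Copilot has not published a public API.
    COMPLETION_URL = "https://copilot.example.com/v1/completions"

    def fetch_completion(file_text: str, cursor: int, language: str, token: str) -> str:
        resp = requests.post(
            COMPLETION_URL,
            headers={"Authorization": f"Bearer {token}"},
            json={
                "prefix": file_text[:cursor],   # code before the cursor
                "suffix": file_text[cursor:],   # code after the cursor
                "language": language,
            },
            timeout=5,
        )
        resp.raise_for_status()
        return resp.json()["completion"]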


Yep, the only thing I'm not quite sure about is how often it needs to call home. Presumably very often... which runs counter to minimalism.


Check out the "Telemetry" link when you scroll down on the project page.


What interests me most about the development of tools like this is how it might go on to influence the evolution of programming languages. Take the article that was posted on the CompCert verified C compiler, for instance. What if machine learning could lower the cost of developing in programming languages with stronger guarantees (e.g. Rust, Coq, etc.)? Using languages with more internal checks could also help manage the risk that the co-pilot gave a buggy/insecure suggestion.


Another way to look at it : if an "AI" can predict what you would code next, it means your program is probably not that innovative, and was already created somewhere.


Most of the code I write is not particularly innovative or novel. I'm just trying to get the job done most of the time.


Awesome - but I fear maintaining the code generated by it in the future.. can the AI maintain it as well?

I am looking forward to AI testing the programs I write though. That would be awesome.


I'm very enthusiastic about this kind of technology. I recently stopped using TabNine because the suggestions were often worse than the normal IDE completion while sitting at the top of the list, but I do miss the magic.

I think AI in software development will save us a lot of time, so we can focus on more interesting things. It may replace a few humans in the long term, but as software developers we shouldn't be hypocrites, because we work hard to replace humans with software.


This presents some interesting theoretical attack surfaces.

- Intentional poisoning of the model with difficult-to-recognize, exploitable faults

- Unintentional poisoning from flawed generation habits, which are further reinforced when usage is eventually fed back into the model

I don’t know how it maps to code, but in my experiments generating text with GPT-3, I have started to get a feel for its ‘opinions’ and tendencies in various situations. These severely limit its potential output.


I've been using TabNine[0] for a few years, which is great. How does this compare?

[0] https://www.tabnine.com


Tabnine, It's you?


I wonder why there is no example in Java. It is one of the most popular languages (definitely more popular than Ruby or Go, and on par with JavaScript and Python).


Because Java is an enterprise language and programmers there know Microsoft's master plan, so it won't get adoption there. They rely on newbs and hippies to help them train it.


IMO a potentially more interesting application of this technology would be a learning system that is able to learn your coding style. You give it access to the codebase and it reformats all files on save according to your likings, perfectly.

Obviously a program that is able to write actual great code reliably would be spectacular, but we aren't there yet, I don't think Copilot presently is able to make me meaningfully more productive.


Don't most IDEs handle this already?


There are no tools that can format my code with my coding style AFAIK. There are multiple tools that can format my code with their coding style though, which I don't care about.


Most? All? They allow you to edit the config to match your coding style, but it would be cool to infer it from files you've already written.
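Even something crude would be a start, e.g. a script that scans the codebase you already have and votes on a couple of style parameters (a rough sketch covering only indent width and quote preference):

    import re
    from collections import Counter
    from pathlib import Path

    def infer_style(root: str = ".") -> dict:
        indents, quotes = Counter(), Counter()
        for path in Path(root).rglob("*.py"):
            for line in path.read_text(errors="ignore").splitlines():
                m = re.match(r"^( +)\S", line)
                if m:
                    indents[len(m.group(1))] += 1
                quotes["single"] += line.count("'")
                quotes["double"] += line.count('"')
        return {
            # The smallest indentation level that appears approximates the indent width.
            "indent_width": min(indents) if indents else 4,
            "quote_style": "single" if quotes["single"] > quotes["double"] else "double",
        }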


There aren't enough knobs to turn to get exactly what you want; I have a 500+ line file full of linter rule configurations and that's still not good enough.

At the end of the day I think it boils down to this simple fact: you can't imperatively codify what makes a person's face beautiful to you because that's too complicated; similarly, you can't codify what makes code beautiful to your eyes. It's something that must be learned from examples.


Formatting code according to a given style is a much easier task than what Copilot does.


Apparently not that easy[0]. Depending on one's priorities formatting/pretty-printing code can be trivial or very difficult. Doing that dynamically based on a user's individual preferences or even their existing codebase's style is probably much harder.

I don't doubt Copilot is also quite a problem to solve, most likely indeed harder, though in that case at least there's an abundance of training data.

[0] https://news.ycombinator.com/item?id=22706242


It depends what you mean by "given": I can't write a million-line document describing exactly what kind of style I want it to use; the formatter must learn the style on its own from examples.

I agree that something like that would be much easier to make in theory, which is why I'm suggesting it: maybe it could be made ~perfect, which Copilot isn't (we haven't unlocked AGI yet).


"When you don't pay for a service, you are the product"

Never realised they'd read my code to make something like this and maybe even profit from it.


Reminds me of the movie Antitrust (2001), where an evil company, modeled after Microsoft, is surveilling programmers to steal their code.


I don't understand how code ownership will work. The application is "trained on ... source code from publicly available sources", but "code you write with its help, belong to you". I get that most code it generates appears to be unique, but they admit that fragments of verbatim code occasionally appear. Unless the code they are using is all BSD-licensed, surely this breaches those licences?


I wonder if by using Github Copilot you are training Github Copilot to code better. I don't see this as replacing a programmer, but I could see it as an advanced, possibly semi-automatic refactoring tool. This could allow a less skilled programmer to be more productive and produce better code, making them more valuable. Also, licensing was mentioned. I would not want Github Copilot training on private repos.


Software automating people out of jobs spares no one, even the people writing the software.

If this improves developer efficiency by 10%, 10% fewer developers are required.


Or development cost lowers by 10% keeping salaries the same, and more work that was previously too expensive to do is now feasible.


I'm primarily an R and SQL user, excited to try this out on some fun data analyses.

How did you construct the Copilot? Did you use a learning approach based on data from actual pair-programming sessions? Or did you take tons of code as your input and use that to suggest next methods based on input code?

I learned a ton whenever I pair programmed, but now I'm at a small company so I'm looking for fun ways to learn new methods :)


This could be the kind of thing easy to train and run locally on a GPU.

- You only need code relevant to your language.

- The amount of unique code in your language is going to be relatively small compared to, say, the entire history of internet comments.

- Training a model also shouldn't take long.

I personally would never use an online version feeding all my code back to a home server somewhere, and leaving breadcrumbs that might suggest I'm violating the GPL.
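For what it's worth, a minimal local fine-tuning sketch along those lines, assuming a small GPT-2-class model and the Hugging Face transformers/datasets libraries (nothing to do with whatever Copilot actually runs):

    from pathlib import Path
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Gather local source files in your language as the training corpus.
    texts = [p.read_text(errors="ignore") for p in Path("my_project").rglob("*.py")]
    dataset = Dataset.from_dict({"text": texts}).map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="local-codegen",
                               num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()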


Tabnine already does this locally, but I didn't have that good of an experience.


Brilliant.

I remember over a decade ago seeing a grad student project with a very straightforward and very clever idea: extending JavaDocs based on code snippets of actual use (to address the common pattern in Java code where you often get an instance of an object not by direct construction, but by calling a factory function or singleton getter somewhere). Kicking myself that I didn't see this day coming.


I just watched this demo on twitter[1] and found it entertaining that it was auto-completing useless, redundant comments for the developer. Pretty sure that says more about all of us developers as a whole than about Copilot.

1: https://twitter.com/gr2m/status/1409909849622601729


For me, the excitement comes from Microsoft spending big money on this tech. For those in the field of program synthesis it will be interesting to compare the performance and viability of the tech. It's still early stages, but this has been a long time coming and will shift the emphasis of software development toward more collaborative aspects such as architecture, design and code reviews.


I remember another company called Kite [1] working on a similar approach - smart autocompletion. This, however, uses GPT-3, so it's a bit different I guess: it doesn't just detect what you're trying to do but rather transforms technical natural language into code. Right?

[1] https://www.kite.com


Github sources suggestions from OS projects. There have been AI completion tools that upload your code, basically spyware. Definitely check thoroughly if that’s the case!


I still haven't gotten my hands on the beta yet, so I'm not sure how it's going to be deployed. But does anyone know if Copilot is going to be accessible through some online IDE, or is it going to be through an extension for VS Code/other editors? If it's the latter, I hope the extension doesn't eat up all my CPU!


Given Microsoft invested $1B for an exclusive license of OpenAI's GPT-3 I wouldn't be surprised to see GPT-3 everywhere in all of Microsoft's products and in their acquisitions. Maybe we'll see GPT-3 in LinkedIn as well. Maybe don't believe everything you see on that network in the future (or now).


It's a pity it is not (yet, I hope) a plugin for popular IDEs like IntelliJ IDEA etc. and works only in VS Code.


TabNine is similar and works across multiple IDEs. No affiliation, thought their product was neat: https://www.tabnine.com/


I'm willing to go all-in on something like this, only if I can get a promise that this will be an open project as time goes on. I'm not a fan of all of the "commercial" talk in this... if OpenAI is involved, and most of this can run locally, why can't it be fully open source?


Most of it can run locally? Please point me to a trained GPT-3 model, which is probably the main brains of this tool.


I'm sorry, but this "copilot" is trained off you, and you are helping it, not the other way around.


I wonder if this breaches any of the open source licenses or copyright?

To some extent, it seems like this could suggest code chunks found somewhere else verbatim, which sounds like a copyright issue, but I also don't know if open-source licenses inherently allow you to train on their code in the first place?


Is anyone aware of the implications this will have for developers who might want to try software like this but are under an NDA?

Currently consulting for a bank who takes their security very seriously. Would there be extra hurdles here to make sure there's no calls or requests being made using our data?


I haven’t seen any comparisons yet to the competition. How does copilot compare to the likes of kite and tab9?

https://www.kite.com/

https://www.tabnine.com/


This is hardly grounds for celebration. Another step in Microsoft's efforts to drive down programmer salaries by expanding the work force.

Also, this tool will enable more cheap LOC churn for those gaming performance reviews (not that this is currently difficult, but it will be even easier).


It will absolutely transform undergraduate education in computer science, or rather, the breadth of the workload. <:)


Co-Pilot is here, thanks to AI! You know what's next - Auto Pilot ... Get ready to be automated ;)


For anyone who has this installed, how much of the codebase under development leaves the machine? I am asking from the standpoint of working on proprietary source code.

Other than that single issue, which is really a set of issues, it seems to work really well from the videos I've seen.


Reading their privacy policy: as much as Microsoft wants. So you can probably expect your whole codebase to be uploaded.


This will have the same deep impact Stack Overflow has had on code for the past decade. Good and bad!


There are a lot of products that try to do away with programming in general, such as no-code tools.

However, the elephant in the room is definitely a tool that can auto fix bugs - the type of bug that is usually given to a junior developer because the team doesn’t want them building features yet.


> Convert comments to code. Write a comment describing the logic you want, and let GitHub Copilot assemble the code for you.

I don't know if Copilot does it already, but I would love it if there were a tool that does exactly the opposite: convert code to comments and documentation.
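Even without any AI you can get part of the way there by walking the AST and dumping an outline of what the code exposes; a rough sketch of what I mean (the file name is just an example):

    import ast

    def summarize_module(path: str) -> str:
        """Produce a plain-text outline of a module's functions and their docstrings."""
        tree = ast.parse(open(path).read())
        lines = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                doc = ast.get_docstring(node) or "(no docstring yet)"
                lines.append(f"{node.name}({args}): {doc.splitlines()[0]}")
        return "\n".join(lines)

    print(summarize_module("example.py"))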


Seems interesting, but did they train their models on the open source projects available on github?


AI is on its way to replace the people that copy and paste code from stackoverflow. Good riddance.


Excellent. Now we just need tools to automatically shit out convoluted JIRA cards, run them through this, and automatically generate the PRs.

Then we can run it overnight each night, knock back whatever rubbish it generates before breakfast now that there's nobody left to complain and schedule 6 hours of meetings to "fix" it, and instead we have the whole day to just build it quickly and properly the first time.

We just increased productivity on the average enterprise project by 50%. Good stuff.


It would be great for AI to replace all the superficial bureaucracy that ruins software, so that the only thing that's left is people that care.


The utility and quality of this will likely depend on language use:

https://madnight.github.io/githut/#/pull_requests/2021/1


Works on MS's VSCode but not on GitHub's own Atom. More proof that Atom is all but dead.


Ever heard of Embrace, Extend (We are here), Extinguish?

Downvoters: I hope you read the 'Availability' Section of the FAQ. GitHub literally admitted this.

> Can I use it in another IDE than Visual Studio Code?

Not yet. For now, we’re focused on delivering the best experience in Visual Studio Code only.

> Will there be a paid version?

If the technical preview is successful, our plan is to build a commercial version of GitHub Copilot in the future. We want to use the preview to learn how people use GitHub Copilot and what it takes to operate it at scale.

That translates to a proprietary lock-in service tied to VSCode, GitHub and OpenAI, all Microsoft exclusive, and it is admittedly unavailable in other editors. The extension does NOT work in the Atom editor, for example.

If that is not a strong case for the 'Extend' phase, I don't know what is.


I'm sort of a Luddite, so take this with a grain of salt.

I don’t see this going anywhere.

If it's really good and we can actually have 1 developer instead of 2 now, why would any developer want this? It would basically be a piece of automation that diminishes the value we are creating.

If it's crap, it's going to create mountains of verbosity and code written to pump up the LOC numbers. It's terrible. After that you're gonna end up maintaining and enhancing whatever an "AI" spewed out.

I'm not buying the argument that this is a problem that needs solving. It's in the same vein as self-driving cars. It looks impressive, it's good PR, it's absolutely insanely hard to get right, and the benefits (even if we get it right) are questionable.

The way it was introduced is also disingenuous.


No idea why you are downvoted.

I am skeptical of, and against, the hype of this "replacing" programmers, which it certainly won't: the AI engine it uses (GPT-3) is limited, and the code it generates can be garbage or introduce insecure and vulnerable code as well. This is why it will always be 'assistive' rather than going to 'replace' anything. Ten years on, self-driving cars are still unsafe and immature.

The hype squad of this tool know it is limited, but they want to capitalise on the 'AI' automation narrative with those who don't know any better.


As long as supply > demand then 1 programmer able to do job for 2 equals higher salary. Obviously supply shortage will end though, sooner or later.


I don’t necessarily agree with you that if supply > demand and 1 programmer does the job of 2 this will lead to a higher salary.

You can see this today in most places where you can walk on water, but you still need to be in your pay band.


It looks really interesting, but I had already lost hope of trying it, even back in the first days before the boom. If there is someone from the Copilot team here, it would be incredible to try this tool. My GitHub is github.com/DaveVillamor <3


In terms of questions about where the code originates from, there should also be a tool that allows vetting the generated code against the original data set. Perhaps Copilot itself should provide a way to deep-dive into the possible origins of the generated code.
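Even a dumb similarity check against a local mirror of the training set would help; something like this, as a sketch (the corpus directory is obviously made up):

    import difflib
    from pathlib import Path

    def overlap_fraction(snippet: str, source: str) -> float:
        # Longest verbatim run shared with `source`, as a fraction of the snippet length.
        matcher = difflib.SequenceMatcher(None, snippet, source, autojunk=False)
        match = matcher.find_longest_match(0, len(snippet), 0, len(source))
        return match.size / max(len(snippet), 1)

    def find_possible_origins(snippet: str, corpus_dir: str, threshold: float = 0.8):
        """List corpus files containing large verbatim chunks of a generated snippet."""
        hits = []
        for path in Path(corpus_dir).rglob("*.py"):
            frac = overlap_fraction(snippet, path.read_text(errors="ignore"))
            if frac >= threshold:
                hits.append((frac, str(path)))
        return sorted(hits, reverse=True)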


As a business/product person, I am naturally wondering how much more productive this will make my engineering team. Should I, over time, expect a reduction in costs and faster shipping times... or will the benefit manifest itself in more reliable code...?


Like any other similar questions: ask your team. They'll know better than you or any random person on the Internet.


I don't understand what this has to do with pair programming. It's just a glorified autocomplete for function bodies. You still have to come up with everything else. Filling in function bodies isn't what makes pair programming valuable.


How long have you tried Copilot before you came to this conclusion?


I didn't try it because you have to sign up. I looked at all the examples though and I don't see how it relates to pair programming.


Where's the Emacs lib for it?


Reminded me of this tweet that proposed exactly this: https://twitter.com/dalmaer/status/890724764121092096


Goodbye Stack Overflow copy-pasting, welcome Copilot auto-generated code. It will bring a similar problem: someone using something they don't understand.

More work for the code reviewer. However, in the right hands it will be very powerful!


> Goodbye Stack Overflow copy-pasting, welcome Copilot auto-generated code. It will bring a similar problem: someone using something they don't understand.

To be fair, “beat on it until it seems to work” without AI assistance or copy-pasting code also frequently involves a fair amount of “using something they don’t understand”.

OTOH, copy pasta and AI assisted code are less likely to get “I don’t know why this works” or “I don't know why this is necessary” comments to highlight that to the reviewer.


Wow, that's incredible! Any chance we'll get a downloadable and editable open source version of this so we can play around with it, train it on our own smaller datasets, and generally experiment? It sounds super exciting!


Did you miss the OpenAI part? /s

As the FAQ mentions, they are planning to launch a commercial product, so likely not, I guess.


If it's reducing keystrokes, eventually it might reduce the problem down to no input needed at all.

In that case, are those who don't control the AI needed at all? What about when those optimizations can be made by the AI itself?


> OpenAI Codex was trained on publicly available source code (...)

It would be nice if GitHub made this tool publicly available in the good spirit of open source instead of straight up monetizing it.

I get that github is not a non-profit but still.


This looks pretty impressive and I'd like to play with it, but it's disappointing that you're forced to accept telemetry to get on the waitlist. I guess I'll wait for general availability.


The landing page is really cool. It looks like the screencast is built with JavaScript; is there any tool that helps build such screencasts? I assume it's not trivial to build such animations.


Depends on what you need to do, but asciinema is pretty much exactly that use case: https://asciinema.org/

Wouldn’t work in this case with the overlays and styling tho.


The goal, of course, should be to train your AI partner to eventually do all the work, while you sip a beer at the pub and monitor Slack (where an AI version of you is maintaining conversations).


I wonder if this kind of technology can push the industry toward more advanced languages. If a programmer can constrain the space of possible programs more, it should help the tool give even better results.


Will this hurt open source?

I assume companies that were fine with providing functionality for free may now think twice, because with this they're also giving away the knowledge of how to build that functionality.


Yes. It will encourage using FOSS as a source of "examples" to use in closed source.


I wonder if CoPilot uses Github's private repositories to train itself, which would allow malicious users to somehow obtain code or designs that they otherwise would not be able to view.


I chuckle when one of the bullet points is that it autogenerates the stupid, pointless unit-tests that you need to write to verify trivial code, but boosts your code-coverage metrics...


I want one-shot learning for refactoring. I.e., show the editor once what transformation I want to perform, and then the editor takes over.

Autocompleting code in general? Sounds like a bad idea, imho.


The IP implications of how network weights depend on their data sources are, shall we say, a matter of ongoing legal discussion.

We genuinely don’t know what it means for licenses or whatnot right now.


I'm assuming that - by design - it has a feedback loop that allows it to tweak, learn and improve itself by feeding back the choices people make versus its own recommendations.


Isn't this just searching for the needed code snippet based on some sentence? Will it reuse functions that already exist in my codebase? Or just pure functions without dependencies?


This is an interesting business model. One problem I foresee though is that frameworks can get outdated in a few years, and that might affect the text generation abilities.


What could go wrong? Expecting software to become even shittier.


I get the impression the only way to use this is in a github cloud environment, which means all the code you type will essentially belong to github in some capacity?


How does it compare to Tabnine?

I really like that Tabnine trains against your own codebase and suggests things based on it. It's crazy accurate and smart a surprising amount of the time.


Can I use it in Emacs?


I had visions of writing tests and then the AI would do the production code. I think that would be a lot better. Comments are too hard to get right. Good Luck!


One thing that caught my eye was the convert-comments-to-code feature. If you can use your voice to dictate comments, then combined with Copilot it might just be possible to write code without touching the keyboard at all!

Of course, I guess Copilot won't be perfectly accurate right now, or maybe even for a long time, but it is interesting to imagine a future where the programmer can think and get code written without lifting a finger.


Does GitHub Copilot grant you a license to the code it generates? How does it know you haven't just copied some proprietary code which is not free?


How does this compare to tools like resharper? Is it comparable? Is this the next generation of such tools? Or does it work completely differently?


Is there any explanation of how this works, beyond 'AI'?

I mean, I understand (I think) how BERT/GPT works, but I can't quite see how you would train this.


This seems great for noobs or dunces to get code compiling, but I hope nobody with talent uses it. It would be like Hemingway using Grammarly.


Doesn't look like it will be editor-agnostic which is a big shame. I've been using TabNine in Vim for a couple years, and I love it.


I wonder if Microsoft will leverage this for Excel.


The site mentions the system this is built on: Codex by OpenAI.

Has anyone seen anything about this system? Are others able to build upon it?


CTRL+F for Privacy & Security. No mention?


If you've played with the OpenAI beta, you won't be surprised -- it was only a matter of time until this became widespread.


I prefer the wingman

https://haskellwingman.dev/


Ok, I'm afraid I must become some kind of neo-Luddite or I'm going to starve to death in the next twenty years.


Stack Overflow copy pasta on crack cocaine.


I feel like this could be a great way to help people understand new languages faster than they could otherwise.


Someone needs to put the OSI model in GitHub Copilot and see if it produces a TCP/IP implementation.


How does this not violate the GPL and more permissive licenses such as Apache that require attribution?


The real magic of writing code is the compound interest it pays - if you structure the lower level components well, then they can be combined into ever more powerful components, saving time and effort in an exponential way.

This product seems to encourage the complete opposite: hacking together stuff without thinking about how it could fit into your accumulated construction kit of components.


So how often does it get the first line right?

This must be easy to measure, over the entire open source corpus.

How about two lines?
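Sketching the measurement; predict_next_line is a stand-in for whatever the model exposes, since there's no public API to call:

    from pathlib import Path

    def predict_next_line(context: str) -> str:
        """Hypothetical stand-in for the model's completion call."""
        raise NotImplementedError

    def first_line_accuracy(corpus_dir: str, context_lines: int = 10) -> float:
        correct = total = 0
        for path in Path(corpus_dir).rglob("*.py"):
            lines = path.read_text(errors="ignore").splitlines()
            for i in range(context_lines, len(lines)):
                if not lines[i].strip():
                    continue  # skip blank target lines
                context = "\n".join(lines[i - context_lines:i])
                if predict_next_line(context).strip() == lines[i].strip():
                    correct += 1
                total += 1
        return correct / total if total else 0.0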


I wonder how, or if, it will impact publicly available code and even open source on a larger scale.


I wonder if this can work with any editor out of the box, or if it's just going to be VS Code.


From the FAQ: "Not yet. For now, we’re focused on delivering the best experience in Visual Studio Code only."

I do hope they support other editors at a later date, especially since they are planning to develop a commercial version.


The Rails migration example doesn't follow the Rails convention of a prepended datetime :p


Is anyone here actually using this?


According to the comments here, people have been using this for a couple of weeks now.


Well, it's a brand new product, so ... no?


I imagine there was a private beta.


I want to try and make it generate a UUID and then see if I can find the original source.


All your commits are belong to us


Interestingly enough I think Ops work is far more resilient to automation than SWE work.


How do you opt out your own GitHub repositories from being a part of the training data?


Probably by changing your license to not allow use in commercial products.


Can't wait for GitHub Dispatch to fix the code GitHub Copilot "writes".


Let me go and look for another job, maybe dancing would be hard for AI to take over.



I think being a nanny is the most future-proof job


I'm curious about typing "const api_key" and seeing what the editor adds hahaha


Calling it a pair programmer is a bit of an exaggeration though. How can it match a human?


It's different. Here you train your partner to be sold as part of a massive corporate product and have all your code phoned home in exchange for a little assistance.


So this is basically fully automated Stackoverflow assisted programming? Excellent.


If you compile English to Python, why not just compile it to an even lower level?


So GH has trained their AI programmers as much as they could on shit code examples, and now to bridge the last mile they need some hand-holding? Sign me up! Will I get a real online gf out of this finally?


Interviewer: use copilot to implement the most efficient sorting algorithm


Is this automation of the "search Stack Overflow, copy, and paste" workflow?


I wonder how much of this is OpenAI-based vs program synthesis techniques.


In the FAQ they state that it sometimes outputs non-running or even malformed code, so it looks like fairly pure language modeling with little to no program synthesis.


I would be more in favor of new languages which require less boilerplate.


What a cool project, I’m impressed. Looking forward to checking it out.


News.yc commenter copilot.


This is where Microsoft's billion dollars in OpenAI went. Clever.


The AlphaGo moment for Go players… the Copilot moment for programmers?


I like the example LOL.

How can one sign up? This could make one programmer an army.


Anyone know what languages are supported besides those mentioned?


How do we know the code is working and doesn't have any bugs?


You review it. I guess that's why it's called copilot, for now.


Would this be available as a plugin for other IDEs? E.g. IntelliJ?


"Seriously Copilot, cover all these files with tests."


Great.

Now, how about you use that to implement a non-horrible search experience?


Never learn a new skill again, let the computer do it for you.


I wonder if they are going to have a plugin for ATOM IDE?


Now I am gonna have an outage because Copilot wrote some code that had an error. People have an implicit bias that code they didn't write is reliable; StackOverflow snippets are a perfect example of that.


> StackOverflow snippets are a perfect example of that.

You basically just explained why your worry is unfounded yourself. This is already the status quo. People already write buggy code and copy buggy code from SO all the time.

The goal of this isn't to write perfect code; that's still up to the programmer. This won't magically stop a bad programmer from writing buggy code.


Would it work with remote files? Because Kite would not.


What about rights to the code that is created this way?


This is so weird.

I don't use autocomplete or syntax highlighting; this thing feels almost hostile to my ways.

(The only non-linear help I love is jump-to-definition.)


An Emacs mode for it is missing in this alpha.


I thought it would be someone you can talk to.


RIP sublime text, I guess it's back to VS


So if GitHub is using open source code to predict code hints, is that violating any copyrights or licenses?


Am I out of a job yet? (I am a programmer)


Is it only available for Visual Studio Code?


Yes. Obviously.


How does this compare to tabnine?


SkyNet mode has been activated.


And away we go to the singularity.


checks calendar

nope, it’s not April 1


function proofOfPEqualNP {

Your move, AI.


Classic Lean Startup?


We are not far away from the day when Copilot becomes the main pilot!


Just got access. Jesus Christ, it's happening.


Powered by OpenAI; I'm guessing GPT-3.


Seeing my job get automated before my eyes makes me nervous, but can AI make dumbass libertarian arguments over lunch? Well, probably, actually.


Wow, surprised there's not one comment that's scared of this being the step to automating away our jobs.

Maybe it's sooner than we think?


It’s one step closer, but still a good ways away.

- you still need to understand the code that copilot is writing, it just turns it from a recall/synthesis problem into a recognition problem

- most of the work above the level of a junior engineer isn’t about writing the actual code, it’s the systems design, architecture, communicating with external stakeholders, addressing edge cases, tech debt management, etc.


A toy.


github was such a nice place until microsoft fucking ruined it.


What has Microsoft done to GH to "ruin it"?


ignoring THE FUCKING TOPIC HERE. the UI is hideous now without javascript enabled, before it was actually navigable.


low entropy code


want


I think this would be a great resource for beginners particularly. It gives them code that will work for them, and they can understand what it's returning. My biggest issue when I first started with REST calls was not knowing what to use and why things were being used on the server side. Eventually it started to click for me, but I had a lead guide me. In particular we worked with SharePoint, which I had no idea had its own API at the time, which added to the complexity. Overall I think the "See different examples" feature is going to be the best one of all.


Since most of the code written anywhere is crap (the tool was trained with "millions" of lines of code), I suspect it will repeat all the same anti-patterns and badly structured, ill-thought-out code that fills GitHub.


Every time AI encroaches on human territory it cannot be changed back. Don’t be fooled by the innocent nature of these early encroachments. At a certain point life on earth was just a bunch of amino acids…


hello


Hey


Sounds like a big distraction.


Why are the green ones always angry?


I suppose you know this, but green means the account is new (https://news.ycombinator.com/newsfaq.html). Angry people sometimes create new accounts to vent. That's not a good use of HN, but let's not respond as if there's something wrong with new users coming to the community. We don't want this place to become insular and brackish. New users are the community's freshwater!


This is a great point. Do names turn from green to grey based on karma or time?


Time alone.


Anonymity can breed honesty, I guess.


They haven’t ripened yet.


Haha, you asked a question that I've always wondered about.


I certainly see the value in solutions like GitHub Copilot (if implemented well), at least from three perspectives: a) pedagogical (by allowing developers, especially beginners, to see the diversity in different approaches to potential solutions for specific problems), b) software engineering best practices (of course, it should be used with care and, generally, the quality of relevant suggestions from this angle is still TBD) and c) improving engineering productivity (this is especially relevant for programming languages and frameworks with lots of boilerplate code - e.g., C#/.NET, Java). Spending much less time on typing typically very verbose boilerplate code will give software developers much more time to spend on truly valuable and important activities, such as problem solving, architecting, collaborating.



