Google warns its own employees: Do not use code generated by Bard (theregister.com)
303 points by Brajeshwar on June 20, 2023 | 157 comments



This is Google, who have a monorepo, who are afraid of the legal risks to the monorepo.

The copyright of the output of these tools is not yet determined, it's a risk.

The advice not to use the tools in a risky way not only directly mitigates the risk; like an "employees must wash hands" sign in a restaurant restroom, it can also be seen as a reasonable step that transfers some of the liability to the staff member.

It's not a verdict on the output of the tools, it's a reflection of risk aversion in a monorepo company that is very protective of what is in the monorepo.


I doubt the fact they use a monorepo has any impact on this thinking. A versioned source repository is no different from a file server. If some "AI copyright-infringed" code made it into the monorepo, it doesn't compromise the whole thing. There are still clear, distinct projects and products.

But exactly, this is a hygiene thing. Staff will still be using these tools anyway.


I agree that monorepo does not make a fundamental difference. Still it makes it easier for some undesirable things to happen.

The tainted code can become a critical dependency. It can get copy-pasted elsewhere. An engineer could look at the code before writing their own (obviously not illegal per se, but it makes it harder to rebut bullshit claims later). An engineer can just write similar code and then have no way to prove they didn't look (ditto).


> It can get copy-pasted elsewhere.

I have no experience working with a monorepo, but I've read Google has "an army" of people maintaining theirs. I would have imagined that with enough support like this, you wouldn't need to copy-paste code everywhere in a monorepo?

I'd appreciate any insight anyone can share on monorepos. I am fascinated by the idea, but I've never had a chance to work with one, and the people I've met who have either don't want to or can't talk about their experience with it.


as a google swe I find bard almost useless. it doesnt understand any of our internal tooling or context.


So you broke your company’s policy and fed code to it?


That's reaching. You don't need to put code in to get code out. And you're allowed to have it generate code, just as long as it's not used.


He said he put code in


Can you quote where he said he fed it internal code?


“ as a google swe I find bard almost useless. it doesnt understand any of our internal tooling or context.”

How would he know it doesn’t understand if he hasn’t used it?


It's really easy to use it without violating policies.

In fact, we were literally asked to.


He likely asked it a question.


*in violation of policy.

That wasn't said, but the goalpost keeps moving in this thread.


...where?


You can’t break a policy which didn’t exist at the time.


Sounds like they got curious and asked a simple “hello world” type of question to see if it had any value.


Is this "hello world", or is this "Hello World, Inc."?

Remember: swift fingertips sink ships.


If they were concerned about that as a risk, the thought that came to my mind was exposing the whole codebase to subpoenas.


Is that one of the big risks of a monorepo? That a request for 'the source' will lead to all code in the monorepo being exposed? I'm sure that's not necessarily the case, but still.

The other one I'm thinking of: I know this isn't the case with Google because their repo is too big, but if a piece of copyrighted code ended up in an average repository, it would quickly be distributed to everyone who has a copy of that repo, so it couldn't be taken back again.


I suspect they have a model trained on their own proprietary data, and would prefer employees use that rather than the Bard model, which is trained on mixed data that they don't own.


Could it also be a reflection of Google's lawyers advising that using LLMs trained on masses of unvetted source-available code is probably tantamount to copyright laundering?


Is OpenAI's business model reliant on copyright laundering?


Undeniably, yes.


> This is Google, who have a monorepo, who are afraid of the legal risks to the monorepo.

Where are you getting any of that from the linked article?


> who are afraid of the legal risks to the monorepo

You're talking about a company which happily pays billions in fines because it's just a fraction of what they otherwise manage to get away with. They're not at all afraid of doing a lot of illegal stuff; I find it highly unlikely they'd be afraid to step into a minor grey area.


Yes their Legal/PR teams may recommend that they quietly settle cases.

But that doesn't mean that the company will tolerate widespread illegality or allow systemic risks to build up. It's simply not how organisations do things, especially those that are highly public. Poor reputation doesn't just manifest in lower sales; it also introduces unnecessary regulatory risks.


The risk-reward balance is totally inverted. Profiteering from privacy breaches: big reward, fixed risk (the fines are basically capped). Using risky code: small reward (some worker time), big risk (suddenly everyone and their dog can sue you for bullshit copyright claims).


I don't think the comparison with "employees must wash hands" signs is apt. Should we see legal action over code produced by large language models of questionable heritage, the consequences will be dire for companies.

Being able to say to their (probably then ex-) employees "I told you so" will gain them nothing. There will not be nearly enough to get from the breaching ex-employee to compensate for any damages.


Let's say I go to a restaurant, get food poisoning, and sue the restaurant. Turns out the employee didn't wash their hands (or at least, that's the legal theory).

If the restaurant has signs like that up, and consistent messaging to their employees, and new employee training, and other "best practices", then I may still be able to sue them and win. But if they don't have all that stuff, I may be able to sue them for treble damages because of negligence.

That is, such measures may not remove liability, but they limit it.

Note well: IANAL. Others who know more, feel free to offer corrections.


I agree with that. I think the fundamental difference in Google's case is that the suing party is different from the addressee of the warning. Moreover, the suing party is powerful and the addressee is completely negligible. Hopefully courts see that, but who knows?


> Should we see legal action over code produced by large language models of questionable heritage, the consequences will be dire for companies.

I think we're already past the point of no return. Almost every active codebase in the world (at least the JS part of it) has been tainted by LLM generated code at this point if dependencies count.

If courts decided the output of LLMs trained on GPL code was subject to GPL, all the active code in the world would need to be released - which seems impossible to enforce.


> transfers some of the liability to the staff member.

As employees, they take zero of the business risk, as no employee is liable for any of the business's risks. It's also why CEOs don't go to jail if they're performing their duties, even if the company is liable for damages or commits a crime.

And this is the way it should be.


I'm waiting for the patent troll equivalent of training data


Sounds like Reddit CEO's next pivot...


next pivot? they ain't never made money off of ads son. datamining and consensus building were always an aim.


> Cautioning its own workers not to directly use code generated by Bard undermines Google's claims its chatbot can help developers become more productive

The definitive definition of irony o_O

This reminds me of two pictures that were circulating on social media a while ago. The first was Zuckerberg holding a book while billions of people waste their time endlessly scrolling through Meta's apps; the second was that Turkish chef known as Burak eating a healthy light salad.

The bottom line is that we sell stuff we aren't convinced enough to consume ourselves.


I asked Bard to suggest improvements to a SQL query I wrote. It suggested that I change my filter to improve performance. The filter I had was:

   WHERE Column1 <> Column2

The suggested improvement:

   WHERE Column1 <> Column2

When I pointed out that it just gave me exactly the same code, it apologised and suggested this improvement instead:

   WHERE Column1 > Column2

I'm trying to understand how this helps me become more productive...


> I'm trying to understand how this helps me become more productive...

LLMs writing code have strange performance characteristics. Sometimes they produce working output many times faster than the fastest of senior developers, other times they can't understand a bug even when you point it out to them.

Try giving your LLM this prompt:

Please write a python script that loads a file named input.png, converts it to greyscale, plots a chart of the intensity of the pixels of the first row, and saves the chart as output.png . The chart should be a line chart with a blue line, and the lowest and highest points on the chart should be marked with a red cross. Thanks!
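
For reference, here's a minimal sketch of what a correct answer to that prompt could look like, assuming Pillow and matplotlib (an LLM might just as well reach for OpenCV or numpy instead):

    from PIL import Image
    import matplotlib.pyplot as plt

    # Load the image and convert it to greyscale ("L" mode).
    img = Image.open("input.png").convert("L")

    # Intensity values of the first row of pixels.
    width = img.size[0]
    row = [img.getpixel((x, 0)) for x in range(width)]

    # Blue line chart of the first-row intensities.
    plt.plot(range(width), row, color="blue")

    # Mark the lowest and highest points with red crosses.
    lo = min(range(width), key=lambda x: row[x])
    hi = max(range(width), key=lambda x: row[x])
    plt.plot([lo, hi], [row[lo], row[hi]], "rx", markersize=10)

    plt.xlabel("Column")
    plt.ylabel("Intensity")
    plt.savefig("output.png")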

So LLMs can sometimes produce working code very fast. Other times, as you've identified, they produce completely broken results.


Wow I can't use bard yet as I'm in the EU. But ChatGPT has never done something like this to me.


ChatGPT is ALWAYS doing stuff like this to me. Suggest code that doesn't compile, suggest SIMD instructions that don't exist, reiterate on incorrect answers with the exact same answer, etc. Granted I use ChatGPT 3.5.


> suggest SIMD instructions that don't exist

well that's one way to make code faster, just require the underlying hardware to compute things for you


Ok I don't use it so much for coding and I always use GPT4 (I got a subscription out because it's so useful).

But every time it gave me a very sensible solution.. Usually not something that worked right out of the box but never something totally stupid.


GPT4 produces dubious Rust code all the time for me, especially when I point out mistakes. It gets into a cycle of doubling down on mistakes and degrading into more and more bizarre solutions and gets into a loop of repeating itself.

It gets the rough outline of a solution OK but very easily gets lost if the problem requires 2 or 3 levels of what I guess I'd call nested analysis. Or rather, it's fine for creating a surface solution, but it gets lost in the weeds quite easily for things like type system or borrow checker problems.

Because it doesn't reason -- it (re)produces patterns.


This is less of a productivity risk than a legal risk. Nobody is denying that these tools enable productivity like we haven’t seen before, but Google is afraid of outputting protected code while the courts haven’t fully decided their stance on things.

Imagine Google Bard outputs a piece of code for a major new feature, making them a shit ton of money. It’s then found out that the code was actually copied, almost directly, from average Joe’s hobby app, and now you’ve got a situation on your hands. Amazing for the average Joe, who’d expect a big payday if the courts ruled in his favour, but less so for the business owner.

Frankly, I can’t wait to see what happens when all copyright is out of the window. Obviously within reason, but wait until you see the innovation that comes about when the gates are open. The free-for-all of ideas is a fun time to live in that’s for sure.


> Obviously within reason

Oh, so you don't really mean "when all copyright is out of the window", I guess. So.... how are we going to define "within reason"?


I don’t know how to answer this other than I’m not smart enough to know what the solution is. The ‘within reason’ is simply to appease any unknowns that I’m not accounting for when I say we should just open the flood gates.

What that looks like, who knows.


based on the code I've seen come out of these tools, I'm denying they enable any productivity gains. The code is either hilariously bad, filled with subtle bugs, or just something I could have googled and then copy-pasted an OSS solution for.

Is anyone's productivity really higher if we have to spend all our time correctness checking the output?


We should test it and your ego at the same time. We might find some interesting results.

Find someone at the same level of competency as you, set a goal, that person gets access to any LLM they want, you have to do it with no assistance other than reading stack or googling. And if you think your ego can handle it, find someone who you know is below you in competency (but not junior) and set the same goal and see if they surpass your own productivity or progress.

We’d have to set up some controls, but you see what I’m getting at. It would be an interesting experiment either way.

It’s all in how you use the tools. If you’re zero shotting and hoping for the best then yeah, not great. But that’s where your domain expertise is supposed to come in.


Imagine Bard outputs code that is licensed under AGPL. Suddenly, whatever services the code gets included in need to have their source code made public.


That alone makes me never want to use any LLM for code specifics, outside of asking conceptual questions when the brain's just not remembering.


I don't think that's an accurate representation at all; and I say this as a person who has also restricted the use of AI in their company.

The last time I read the GitHub Copilot agreement, for example, it used specific language such as "Suggestions" and seemed to indicate that Copilot was merely inspiring developers through suggestion, rather than creating code that could (or should) be copied verbatim.

I think in reality most people are copying code verbatim and not using AI as inspiration, but it seems (legally speaking) that the safest approach is to treat AI-generated code as non-copyable.

That notion is consistent with what Google is stating here.


There is nuance here. Google policy is "do not use code generated by Bard", not "do not use Bard".

There are ways for developers to use chatbots that don't involve copy-pasting code into your own codebase. Just like there is a difference between not using Stack Overflow (which is foolish) and not copy-pasting from Stack Overflow (which is sensible).

I have used ChatGPT a few times as a developer and found it useful, but never directly copied code from it. I had it document my code for me though, so maybe if you count comments as code, I did.


My guess is they don't want their own processes and code secrets leaked through Bard.


Eating your own dog food is an expression commonly used to characterize this.


Drug dealers have figured this out long ago. Never get high on your own stuff... No surprise this is the same with tech.


Smoking crack /= using your own software.

In fact I would recommend every dev use their own stuff; "dogfooding" helps a lot to improve the product.


The thing you're dogfooding isn't exactly what's being sold - it's missing the layers of crack that get added so the product makes more money.


Yeah, Google Bard seems to make stuff up way more than ChatGPT and just gives wrong code. I asked for numpy code to fit a polyline while rejecting outliers, and it just made up a "robust" argument for polyfit. ChatGPT, on the other hand, generated a function that calculated the standard deviation and removed points outside two sigma.
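
For what it's worth, here's my own rough reconstruction of that two-sigma approach (not ChatGPT's actual output, and the function name is made up): fit once, drop points whose residual exceeds two standard deviations, then refit.

    import numpy as np

    def polyfit_reject_outliers(x, y, deg=1, n_sigma=2.0):
        """Fit, drop points more than n_sigma standard deviations from the fit, refit."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        coeffs = np.polyfit(x, y, deg)            # first pass: ordinary least squares
        residuals = y - np.polyval(coeffs, x)
        keep = np.abs(residuals) < n_sigma * residuals.std()
        return np.polyfit(x[keep], y[keep], deg)  # second pass: refit on the inliers

Note there really is no "robust" argument to np.polyfit; something like the above is what you have to write yourself.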


Google Bard shamelessly tells me made-up World War II stories[1]. Bing Chat refuses to answer such a question or says “no information”, depending on its creativity level.

[1] https://twitter.com/esesci/status/1669066574366646274


Although that's garbage for factual tasks, the creative writing behavior is a perfect fit for fleshing out dnd characters for example. On the other hand, Google is a search/ad company... I'd think they'd be the most interested in integrating search/facts into an llm. Hallucinating details about your customers is a good way to cause problems (some lawsuits already filed).


> On the other hand, Google is a search/ad company... I'd think they'd be the most interested in integrating search/facts into an llm.

At its core, an advertisement is made of two components: the subject - a store, a brand, a product, etc. - with which the victim is supposed to develop positive associations, ideally strong enough to motivate a purchase and/or advertising the subject to their acquaintances, and the message, which is meant to create those associations. Only the first part, the subject, has to be factual. The message does not, and in fact it usually isn't - manipulative bullshit performs much, much better, and generally the optimum for advertising seems to be asymptotically close to the line past which it would be legally fraud.

The only factual part, the subject, needs accurate handling, and is best suited for classical database systems - which is exactly how Google, and everyone else, is handling it. The message part - that's a good match for LLMs, which excel at producing convincing-sounding bullshit. For advertising, it seems what you need is to crank up the hallucinations a bit, but have some plausible deniability built into the whole system, so that when the LLM hallucinates in too obvious a way, no one can actually be held responsible.


Wow that's a lot of creativity - not only did it make up a bunch of information, that information doesn't even make sense within the well known and understood historical context of WW2 (Turkey was neutral so there is no scenario under which a Turkish soldier is "sent to the Eastern Front to fight the Soviet Union").


Here’s an example where Bard is making up studies to support RFK’s thesis that drinking water causes transgenderism: https://twitter.com/EbenThurston/status/1670619027608092674?...

It’s flat out lying, yet folks are using its output in arguments.


Here's a real study: https://pubmed.ncbi.nlm.nih.gov/29314190/ . It found when frogs were exposed to atrazine (a herbicide) in wastewater runoff, "female minnows were defeminized, whereas male frogs were feminized". Similarly https://www.pnas.org/doi/10.1073/pnas.0909519107 found that "Atrazine-exposed males were both demasculinized (chemically castrated) and completely feminized as adults. Ten percent of the exposed genetic males developed into functional females that copulated with unexposed males and produced viable eggs." So Atrazine can cause frogs to change their gender, and contaminates drinking waters in part of the US (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6164008/), so it's not beyond the realm of possibility that it also influences gender expression in humans to some degree.


Egg temperature plays a definitive role in the sex of turtles, but no one would claim it's within the realm of possibility for that to affect humans.

It's always good to be aware of how things affect human hormones and absolutely worth study and regulation, but it's a huge jump from frogs are affected to humans are affected. I know right wing people are world class long jumpers so this kind of "it could be!" really isn't helping towards a reasonable debate.


Hadn't realised he was on the General Ripper bandwagon.


Yes. Google Bard often makes up imaginary syntax and commands.


Sorry, I’m sure this was annoying at the time, but making up that argument is hilarious in retrospect, no?


Is this because of the accuracy of the results, or is it the rights to the content that is the problem?

Barring any AI court case setting a precedent, We're still at 'AI generated works are public domain' aren't we?

There's probably going to be some big money arguing for the right to own content generated by their AI. I can't really see that being in the public interest though.


> AI generated works are public domain

That's not the status quo. The biggest concern is that AI can sometimes generate code from the training set verbatim, in which case that code has the copyright of the original training set - which might be GPL or proprietary or whatever else.


> The biggest concern is that AI can sometimes generate code from the training set verbatim

Not just verbatim: AI-generated code is arguably a derivative work of code in the training set even if it doesn't generate verbatim copies of training data, in the same way that any code can be a derivative work even if it doesn't contain bitwise-identical lines.

If you take a piece of Python code, and translate it to C code, without a single line being copied verbatim, the resulting C code is still almost certainly a derivative work of the Python code.


Everything you do or know is a derivative of your own training set. True original thoughts without context don’t exist, well at least in any way we can find out as every human is the product of their own training set.

Just out of curiosity, what makes the code you write more original than an LLM's, whose training set is way bigger than yours and likely has more variance in how to achieve the same goal?

I’m not trying to be facetious but rather stoke the philosophical idea of the reality that humans aren’t as special and unique as we think we are.


> True original thoughts without context don’t exist, well at least in any way we can find out as every human is the product of their own training set.

Someone wrote the first murder mystery. It wasn't something we inherited from amoebas.

Someone wrote the first Regency romance. In fact, we know who it was - Georgette Heyer.

Someone invented calculus. We didn't get that passed down from the cave men via oral history or tribal knowledge.

We've dreamed of flying for millennia, but someone invented the first practical airplane - someone specific. Yes, they built on previous knowledge. But their step was still original. It had never been done before.

And so on, for idea after idea and original creation after original creation. They all originated at some time, with someone.


I’m not denying those achievements of those individuals at all. But I am saying that those individuals didn’t invent or discover those ideas out of the blue. There was information gathered up until that point in their lives that allowed them to make those connections and form ideas from there. No idea exists in a vacuum, yes someone needs to be the first to do anything, but all knowledge is passed on.


LLMs can sometimes come up with original things as well. Also, even the people who invented new things like that took some inspiration or knowledge from things that already existed.


This seems absurdly reductionist to me. Wouldn't all work simply be attributable to the first proto-human who grabbed a stick or rock and used it as a tool? Surely even proto-humans communicated knowledge by demonstrations amongst themselves, even if they lacked complicated language.


Humans are just a collection of experiences, using past data to make future assumptions. So yes, thank you, Mr. Caveman: because of your ingenuity in using a rock and a stick to create a hammer, we now have a full modern society built off the back of tools. But that seems ridiculous, doesn't it?

So at what point do we stop attributing the previous innovations that led to our current innovations? And why do we stop there? Would Mr. Caveman be the only special human to ever figure that out or could the argument be made that eventually someone else would have figured out how to make tools and therefore attribution is just pointless?

What I am trying to get at is that everything you do or create is because of work all of humanity has done. So, pertaining to copyright, why should a single person claim an idea as solely theirs when that idea was not created in a vacuum?

I should also say, I am not discounting anyone's work, but rather asking: if the monetary reasons for creating became secondary, would the need for copyright even exist?


If you can dream up a society that doesn't need money, yes we can do away with copyright.


> Just out of curiosity, what makes the code you write more original than an LLM's

Indeed in many ways LLM code is more original than mine, as I limit myself to calling only functions that exist and producing code that will compile, unlike LLMs which have no such limitations. /s


Maybe, but it sounds like you’re operating within pretty restricted walls. Sometimes there are better ways to do things that you haven’t thought of but it requires experimentation.

That would be like assuming every piece of code you’ve ever written compiled perfectly on every run. Do you hold yourself to the same standard? Or do you give yourself some leeway because you walk through it, talk through it, think through it, and then come to a solution?


While I would say GPT is creative, the law doesn't have to agree with me.

Also, there's the possibility that all things creative are the human form of peacocks' tails, and if that's the case then the cost and difficulty matters more than the outcome, and any discussion based on capitalist incentives is fundamentally flawed.

https://kitsunesoftware.wordpress.com/2022/10/09/an-end-to-c...

(I wrote this 52 days before ChatGPT came out, the reference to GPT-3 is based on the stuff OpenAI released before the chat interface).


That’s not legally true.

I don’t even think it’s factually true.


Are you sure? If it was established that LLMs were illegal I think people would know


This is an interesting thought. How much do we really owe to our teachers? If it wasn't for the school that taught me to read all those years ago I wouldn't have a job!


Yes, reduced to its lowest form, but there are billions of inputs that go into your own training set to produce the outcome you've arrived at.

Along the way, you’ve experienced trauma that nudged you in a direction, you’ve been inspired by other people who left an impression on your mind, you’ve had one person teach you one skill and another person teach you a different skill and you’ve combined them, life experiences that changed your outlook, etc.

It’s less about the specific teachers and more about every single person and event that has happened to you is your own ‘training set’. But sometimes we can definitely point to a specific event or person who influenced us to where we are at today and I like to think of that just like model weights in current LLMs.


I was pointing out that the case where the network produces exact copies of copyrighted code (or very very close to exact copies, such as changing a variable name or removing a comment) is clear, and requires no new litigation.

While I also think there is a good chance that what you are claiming (that any output is a derived work of the training set) is true, this is definitely not settled, and will require someone going to court over it. I expect the proceedings over something like this to take a long time, and very likely reach the Supreme Court. So, we won't know if you're right or not for quite a while.

A good argument against the idea that any output is a derived work of (all) data in the training set is exactly that verbatim copies of snippets from the Linux kernel can't be a derived work of snippets from GCC, even if GCC was also in the training set. So it is arguable that someone intending to claim copyright infringement for a piece of code generated by an LLM has to identify which particular work of theirs from the training set it infringes on, it can't be a blanket finding.


Worse: The US Supreme Court will reach a verdict. The European equivalent will also reach a verdict, which may be different. And the Australians. And the British. And the Chinese. And...

It won't be over when the first top-level court decides.


Hopefully it'll be over when the first top-level court of any country too big to ignore says "no", at which point anyone trying to be safe will stop allowing LLM-generated code in their codebase.


By the same logic all code I have ever written is a derivative work of all code I've ever read.


Companies that bar their employees from even looking at gpl code already follow that logic, yeah.


My understanding was the copyright office indicated you can't get copyright for AI generated works:

https://www.reuters.com/legal/ai-created-images-lose-us-copy...

so it may not be "public domain", but if you can't get copyright protection in the US I don't see a difference.


The difference is an AI generated work can infringe someone's copyright. Suppose I have Stable Diffusion draw Spider-man. While I can't get copyright on the drawing, that doesn't make the drawing public domain in the sense we usually use the term "public domain."


I’m sure there are other ways to do this, but simply asking OpenAI to repeat a letter 100 times will eventually get you back content that looks like its training set.


> Barring any AI court case setting a precedent, We're still at 'AI generated works are public domain' aren't we?

Barring a court case, we’re more in an uncertain place. I think it’s more likely for courts to hold that many model uses aren’t transformative enough to get a fair use exemption. But until a court says either way, it most definitely does NOT just default to public domain.


You’re conflating two issues.

1. Does the output infringe an existing copyright?

2. Is the output copyrightable?


> We're still at

I didn't hear we were ever there?


Were we ever NOT there?

https://builtin.com/artificial-intelligence/ai-copyright

"Can AI Art Be Copyrighted? It has long been the posture of the U.S. Copyright Office that there is no copyright protection for works created by non-humans, including machines. Therefore, the product of a generative AI model cannot be copyrighted."


> Therefore, the product of a generative AI model cannot be copyrighted

Is that last bit from the Copyright Office, or is it the author's interpretation? Because I could just as easily imagine the battle being over who or what was the actual creator of the content (i.e. is it a derivative work), rather than whether the thing the machine created is eligible for copyright.

To remove the AI from the equation for a second, imagine that I took four images of living artists' work, placed them in a 2x2 grid, and called that a new artwork. There are two separate questions to consider: (1) have I infringed upon the original authors' copyrights, and (2) is the new thing I have created eligible for copyright.

The stance that there is "no copyright protection for works created by non-humans" only addresses the second question, not the first question of how it interacts with existing copyrights.


If you make art combining generative AI with manual work, you get copyright over the exact portions of a work you made yourself. You have to indicate which parts. The parts generated by AI are not copyrightable.


You missed the question.


Further reading seems to indicate human control is pivotal in the equation. A human can be granted copyright for created works but the creative part must be human and not machine.

Just like you don't grant copyright on a PDF to Adobe Photoshop... you grant it to the human who guided the program. Photoshop is a tool and the human is the creator.

https://www.theregister.com/2023/03/16/ai_art_copyright_usco...

> "For example, when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the 'traditional elements of authorship' are determined and executed by the technology – not the human user.

> "Instead, these prompts function more like instructions to a commissioned artist – they identify what the prompter wishes to have depicted, but the machine determines how those instructions are implemented in its output."

> "The USCO will consider content created using AI if a human author has crafted something beyond the machine's direct output. A digital artwork that was formed from a prompt, and then edited further using Photoshop, for example, is more likely to be accepted by the office. The initial image created using AI would not be copyrightable, but the final product produced by the artist might be."


Wait, so the US copyright office considers the AI model to be the sole creator of the work? So the prompt - the actual intent behind the use of the tool - is irrelevant?

Then where is the line? If I use a painting program that runs a Math.random() on a brush and I use this brush to draw something then is that also created by a non-human?

If I program a robot arm to draw a picture then is that picture also created by a non-human? What if I use a printer?


> Wait, so the US copyright office considers the AI model to be the sole creator of the work?

From what I gather, they just don't consider it a creation in a way that pertains to copyright laws. Like a rock on the beach.


I read it that a program/computer/robot cannot hold copyright. If someone uses AI to generate art, the AI doesn't hold the copyright. (Notice it doesn't say anything about the user.) Likewise, if Photoshop is used to create art, Photoshop doesn't hold the copyright.

It's a tool, not a conscious being with rights.


That does make sense!


The program for the robot you could copyright. The picture you could not.

Let's be realistic here: there is no oracle indicating something was generated by AI. Most graphic artists have been using partially machine generated work from tools like Photoshop for a decade or more. They just don't mention it.


Thanks for following up. This article is pretty dissatisfying to me.

https://copyright.gov/docs/zarya-of-the-dawn.pdf is the letter that lays out why they don't think Midjourney is a paintbrush rather than a monkey.

I can't tell that this is a long-standing interpretation, rather than a recent ruling, conceivably a blip.


> We're still at 'AI generated works are public domain' aren't we?

As far as the US Copyright Office is concerned. I do not believe other countries have weighed in yet.

Also, when your training data is based on copyleft source code, and in some cases is a direct ripoff of it, will that license be applied to it?


With all those leetcode-style interviews and dynamic programming problems in the interview process, surely the genius "Googlers", the worthy few who make it through, wouldn't need to use AI to generate code.


I guess it's sarcasm, but if you are experienced and have the project loaded in your mind, you also don't need to use code completion. Stuff like Copilot feels to me like a stronger code completion.


But has anyone actually tried Bard for coding in practice? It's so awful, practically unusable. GPT4 feels several generations ahead of Bard.


Agreed. I frequently use GPT-4's code. It is especially useful coding in a domain I'm not familiar with. I recently asked it to write a blender plugin to export certain polygons to a spreadsheet (with coordinates projected and transformed in certain ways), and it did admirably. I had never used blender before, and it would have taken me hours otherwise.


If you can't write the code yourself, you're not a good judge of how competent it is. If you admire any code that simply works at all when you don't know how it should be done, then your admiration isn't a measurement worth considering.

Ask an LLM how to do something you already know very well how to do properly, only then can you see its flaws through its mindless bravado.


On the other hand, somewhat in the style of NP-Completeness, it is often easier to verify that something is correct than it is to generate it from scratch. Even if I can't cook, I can say it's a tasty meal (:


It's usually somewhat easy to verify that a piece of code is not obviously wrong; what's much harder is proving that a piece of code is not subtly wrong. When given a complete piece of code that appears to work, it can be very easy to convince yourself that you understand it well enough to know that it is correct, even when it's not. This problem isn't unique to LLMs; refer to the case of programmers copying binary search from textbooks without understanding how it works in their programming language of choice [1]. The problem is avoided (or at least minimised) by formal verification, which is where I think we should be heading with LLM code generation; this additionally avoids the problems with trying to accurately provide a specification in plain English.

[1] https://ai.googleblog.com/2006/06/extra-extra-read-all-about...
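
To make that concrete (my own illustration, not the integer-overflow case from [1], which needs a fixed-width-integer language): the Python sketch below looks plausible and would pass a casual review, yet it can loop forever because one line is subtly wrong.

    def binary_search(a, target):
        lo, hi = 0, len(a) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < target:
                lo = mid   # subtle bug: should be `mid + 1`; when hi == lo + 1 this makes no progress
            else:
                hi = mid
        return lo if a and a[lo] == target else -1

Try binary_search([1, 3], 3): it never terminates, even though the code "looks about right".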


"this looks about right and has no obvious bugs" is my standard when reviewing human code, and it's my standard for machine-generated code too. no reason to formally verify GPT-4 outputs if I'm not formally verifying my coworker's either.


My standard changes based on which human's code I'm reviewing though.

Unless I'm feeling particularly lazy or the code isn't important.


Well... after fairly long experience, we have discovered that your standard is mostly adequate for human generated code (as long as it's not going into a critical system). That may be based on the (empirically collected) statistics of how human-generated code fails - that if it's wrong, it usually either "looks" wrong or obviously fails.

GPT-produced code may have different failure statistics, and therefore the human heuristic may not work for GPT-produced code. It's too early to tell.


I'm reminded of a friend who worked in radio hardware design. They'd use simulation and fuzzy/genetic algorithms to create a circuit, and then verify its performance with experiments. But they couldn't always say exactly why the circuit worked, just that it met the performance criteria.

It's an interesting divergence in software, between those who manage complexity by adding more human-understandable abstraction, and those who manage it by just verifying the results, letting the complexity fly free. All the ML stuff is definitely taking big steps down the latter path.


I don't have to have built a car to be able to judge the quality of a car to a decent approximation. And with code, surely, skill carry over is very real. Being able to generalize is literally our entire thing.


as someone who worked for a car manufacturer, you are wrong.

What people perceive as a quality car, is often a lack of understanding. Some think they are driving a fine car, but someone who pays attention notices that the door gaps are not uniform, that the plastic on the inside will start to smell bad in summer, that you will get a lot more fatigue because of wind noise and tire noise, that the sound of the door being slammed shut is very hollow instead of a nice THUNK, that the automatic gear box often eats its gears, especially when driving slow, ...

There are many details normal people don't even notice about their cars, and yet they claim it is a quality car. It is not, I am sorry, and it is a perfect analogy for ChatGPT/Bard as a code generator.


I am not entirely certain you aren't joking. Everything you mention is felt by the layman and it is comically daft that you would assume it requires some kind of special power.

Getting these things right takes extra effort. If the customer didn't care, it would not be done.

Certainly, not every layman will be able to tell you in so many words WHY it's nice, and everyone cares about different things in a product (while it's the designer's job to care about all of them).

But that was not really the contention now, was it.


The very definition of a quality product is one that the user is happy with.

If the user doesn't notice the door gaps, then it isn't a quality issue. If the user doesn't drive the car enough to wear out the suboptimal gearbox, then it isn't a quality issue.

Yes, this means that the definition of quality depends on the needs of the user.


That’s very explicitly not the definition of “quality”.


This doesn’t ring true for me. I have ChatGPT write code for me that I could write myself. I’ve had ChatGPT even rewrite code for me to make it more legible. It’s pretty good at it, especially when it comes to more popular languages.


>> If you can't write the code yourself

> that I could write myself

I don't think your comment is very relevant, given how they specified it.


I was replying to the second part of it.


So much of these discussions is pointless argument over things we aren't even bothering to define in the discussion.

One person says "write code" in the context of chatGPT and they mean Common Lisp and chatGPT3.5

Someone else says "write code" and they mean React components and chatGPT4.

It is a mirror of how imprecise and intellectually lazy our own language and minds have become in the context of online discussion. Thinking gets compressed to fit in a Twitter box, and years of being rewarded with attention for muddled thinking make it worse, since the more people disagree with a point, the more responses it generates.

It is like we have been running a giant language-miscommunication RLHF model, and the end result is a system incredibly accurate at creating miscommunication.


I tried asking it to write code to swap columns 1 and 3 of a CSV file, written in only x86 assembly. It refused, claiming that while it was theoretically possible, it would be stupid to do such a thing. It couldn't be persuaded...


It's not wrong...


You are agreeing with OP. You could evaluate the quality of ChatGPT because you could write the code yourself.


I'm disagreeing with the second part of OPs comment.


Yeah it has a pretty solid grip on the python API of blender, which is really surprising given how badly documented it is. There is even an addon now that integrates it directly in Blender:

https://twitter.com/rowancheung/status/1639702313186230272?


I find it good for things where I understand the concept but haven't got to grips with the syntax yet. From the point of view of avoiding googling through endless blog spam and SEO word salad it's great, but it's definitely best to assume it's chatting confident rubbish until proven otherwise.


Discussed 5 days ago:

Google warns staff about chatbots

https://news.ycombinator.com/item?id=36341188 (107 comments)


Does anyone get the feeling this article was written by AI?

> The policy isn't surprising, given the chocolate factory also advised users not to include sensitive information in their conversations with Bard

"Chocolate factory"??


If anything, that would be the sign of the article not being written by an AI.

There is nothing in the context of the article related to chocolate factories. AIs usually don't just put random words in an otherwise mostly fine article.

AI tends to have the opposite problem. The form is good, the words go well together, but it is nonsense when you look at the big picture.


Searching for it, it seems to be a nickname for Google's Mountain View HQ, in reference to Willy Wonka and his busy little workers. It also seems to have been around for about a decade, so it's probably not some trap to troll AIs.


Longstanding Register nickname for Google.


This isn’t surprising. It’s like saying “don’t post sensitive/proprietary info on stackoverflow and don’t blindly paste code from there into your professional work.”


Except they have an offering in the market for other people to do just that. Put stackoverflow aside, that’s not Google.

Google telling staff not to use Bard would be like telling them a decade ago "don't use Android".


> Except they have an offering in the market for other people to do just that.

bard.google.com states right on the front page: "Bard is an experiment and may give inaccurate or inappropriate responses."

Also, the warning is about _all_ LLMs.


And google is nonetheless happy to have countless other coders use their experiment, just not their own.

Leave other LLMs to the side on this issue. There are a variety of different issues that arise when Google employees might use a non-Google LLM, so that prohibition is in a category with different reasons.


> And google is nonetheless happy to have countless other coders use their experiment, just not their own.

What makes you think so? Those LLMs aren't primarily coding tools.

> Leave other LLMs to the side on this issue

https://www.reuters.com/technology/google-one-ais-biggest-ba... indicates that the ban is about "all LLMs, period".

It's ElReg who, true to their style, decided to focus the discussion on Google employees using (or not using) Google's LLM. Even they, however, acknowledge that Google "also advised users not to include sensitive information in their conversations with Bard in an updated privacy notice."

As such, Google seems to apply the same standard to other users (including "countless /other/ coders") that it applies to its employees, no?


>aren't primarily coding tools.

It doesn't have to be a primary use case for it to be one of the use cases Google is putting forward. In which case, Google is still saying they're unwilling to use their own tools in the way they put forward for others to use them.


It’s a very important question. There are of course two obvious readings/real concerns: (1) the untested IP issues concerning models trained on code from outside the firm, and (2) the possibility of unreliable code.

From my experience working with LLMs heavily since January, I would articulate my philosophy as “LLMs cannot be authors”. This is just to say that “authorship” is less about having actually created something or done the work, and more about being legally liable, which is to say having skin in the game.

Assuming for a second that IP issues are not a concern, my rule for developers (or content creators) post-GPT is: “you now have more responsibility for errors in your code than before, use whatever tools allow you to be most certain.” I hope that this is the world we are entering: not one of generalized irresponsibility or the irrelevance of humans in an automated world, but one in which we finally focus on what we decide really needs doing (and thus, of course, on that deciding itself) and take responsibility that it be done as best as possible, however that may be.


The irony being that Google is releasing disruptive AI features and enforcing them on users, all the while telling their employees not to use these features because they are dogshit and inaccurate.


This isn’t referring to Google’s code generation tool DuetAI / Codey though, it is referring to their general chatbot which is not marketed as a code generation tool. If they issued this for DuetAI it would be a bigger deal.


I do wonder if Google included their own monorepo codebase in the training data.


This is a pure publicity stunt, in order to keep Bard in the news.

"Hey world, Bard can generate code so well now, we have to restrain our people from using it!"

Something totally transparent can come out of Google after all.


As we all know, Bard sucks really badly right now, even compared to GPT-3. For those in the know, how far behind OpenAI is Google? How far behind is LLaMA?


Curious to know if Bard regurgitates the Quake code that everyone was up in arms about over ChatGPT a few months ago


> Nuance voice AI startup

Is a 30 year old company with over $1bn in revenue still a startup?


Don't worry, Bard doesn't actually generate useable code.


Do we know Github's policy on internal use of Copilot?


The equivalent would be GitHub’s policy on employees using Bing for code generation, not copilot.


That's nice. What precisely is Github's policy on using Copilot internally?



