Hacker News
Microsoft CEO of AI: Your online content is 'freeware' fodder for training models (theregister.com)
90 points by beardyw 6 months ago | 100 comments



Copyright law is pretty clear that copyright automatically belongs to the creator from the moment they create something. People can choose to transfer those rights to another party, e.g. assigning the copyright to an employer or selling the rights, or by licensing rights to someone else to use the content. But it cannot be assumed that everything is "free" in the absence of a copyright notice; on the contrary, one must assume that everything is copyrighted in the absence of a notice saying it's free.

The following rules are agreed upon by pretty much every country that has an interest in copyright: https://www.wipo.int/treaties/en/ip/berne/summary_berne.html


This misunderstanding is getting out of hand. That's not what he was saying at all! He was talking about sites that say things like "you're not allowed to even scrape our content unless it's for search engine indexing". Without any such restriction, they are giving implied permission to download/scrape their content. That doesn't mean you have the right to redistribute it though, or that it's free, and that's not what he said.

"anyone can copy it, recreate with it, reproduce with it" are his words. Those "with"s are important!


MS "reproducing with" all available content? No thanks.

So if you publish, say, a novel or some expensively researched reporting online, you are now to assume it's legal to have an "AI" paraphrase it and sell it on Kindle? Or MS monetize it on their scummy "Microsoft Start"?

That behavior is a lot more aggressive and unethical than what Google does on its news aggregator (although that has become unusable for me on mobile since the latest redesign, whatever).

Sure, Google would probably like to anonymize all "content creators" too, and drive the value of arts and information to zero.

I don't buy the constant equation of linking to external sources (search engine, link aggregator) with the automated mass-production of uncredited copycat content.

That's what I consider an important difference.

It feels weird when link aggregators link to MS Start, which itself seems to be an AI text generator paraphrasing others' content. Not sure, because I tend to leave early.

Guess they have some licensed content as well.

When people don't value content, it's cynical to complain about the decline of journalism.


> So if you publish, say, a novel or some expensively researched reporting online, you are now to assume it's legal to have an "AI" paraphrase it and sell it on Kindle? Or MS monetize it on their scummy "Microsoft Start"?

Existing copyright law already covers that. It's the same rules.


I was aware of that when commenting. I was responding to this part of the original comment, which claimed an important nuance in this statement:

> "anyone can copy it, recreate with it, reproduce with it" are his words. Those "with"s are important!

So I was a very strong anti-DRM proponent when younger, and I basically still am.

Information is inherently free, yes. And every creation that is understandable to others must be derivative of other works.

But no, I still think that existing copyright law does not cover AI slop generated from copyrighted content.

It covers "remixing" and fair use to a degree (or, more precisely, there is legal precedent about many nuances of copyright).

But copyright law deals with humans.

It is not designed to cover automatically generated copycat "work" at scale.

Look at actual cases about derivative works and the line being drawn: no way this covers the situation with LLMs, it is not remotely practical to have a complicated court case for each piece of AI slop generated.

On the contrary, LLMs are ideal tools to obfuscate the actual source of any original piece of information.

Yes, the existing copyright of the next HBO series won't be affected.

But as an indie author or musician, it could become even harder to achieve any kind of meaningful revenue.

I think this tends to degrade all domains of art to a pastime for people without any material needs.

Not that this is fundamentally new, but the kind of automation that AI allows has the potential to turn this up to eleven.


To help me understand you, imagine if AI were as good as it is but somehow not trained on anyone else's material without their explicit permission. Would everything be OK then? It's still undermining an indie author or musician's ability to earn money from their own work.

That's kind of what other human creators collectively are. They're competition, and they are the reason indie artists have trouble making money. There's too much supply. But that's a good thing for people in general! If AI acted like that, it would be even better, and perhaps being an artist would no longer be a job but more like a hobby - like gardening or making model railways.


The scenario I had in mind is simple:

- publish content (music, book, blog post, whatever).

- content gets automatically plagiarized, so that it's not an exact duplicate anymore

- now you'll have to sue and win in court to claim your rights to the content

So the same scenario as with human plagiarizing, but at a scale that makes it harder for authors to claim their copyright or argue against "fair use".

> There's too much supply. But that's a good thing for people in general

No, I don't want to listen to AI-generated slop "in the style of" my favorite artists.

Music is a great example. For small excerpts and some background listening, I might not notice at first. But after a while, I do.

Same with text: I have not ever read a single piece of original content worth reading that was "created by AI". Have you?

The "supply" here is not getting bigger because of AI. It is diluted.

What would you prefer to read: a StackOverflow answer from 2012 with some edits afterwards, or a translated summarization with subtle errors and no attribution?

To address your question:

> To help me understand you, imagine if AI were as good as it is but somehow not trained on anyone else's material without their explicit permission. Would everything be OK then? It's still undermining an indie author or musician's ability to earn money from their own work.

I have not yet heard any AI output that was interesting music on its own to me. Interesting sounds and loops maybe, compositions: not so much.

I find AI as a tool for creators more interesting.

What I am worried about is that this won't happen. That it will only be used to plagiarize at scale, and replace creativity with a mixture of randomness and interpolation of already present ideas, optimized for commercial exploitation.

In this scenario, everything that you put on the internet will be merged into mass-produced culture products by large companies, before you even have any chance to try and claim any kind of copyright.

In the short term, it's unimportant then if your work would be used as input/context or training data.

> If AI acted like that, it would be even better, and perhaps artist would no longer be a job but more like a hobby - like gardening or making model railways.

I used to argue against copyright in a similar way. Money is a bad motive anyway, etc.

But this is not practical. Cultural progress and interesting works are rarely done as a leisure activity of people who have to work a regular job full-time.


In practice that's not the case. Look at Google Search or Microsoft Bing: they reproduce copyrighted content without asking prior permission, and for commercial use.

This is called a search engine.


One of the reasons Google Image Search is so neutered is because it was sued and lost.

Small extracts or low quality assets are usually permissible, full on scraping is not.


Tell that to all the StackOverflow clones out there.

It's really crazy. If I search for any of the things I have posted on SO, for the last dozen years, I get multiple hits.

There are quite a number of sites that just full-on scrape SO, strip out the attributions, then republish, wrapped in their branding.

I suspect that this is done for many, many other content-rich sites.


I hate to be presumptuous, but odds are those clones are in countries that do not respect copyright anyway. You'll see tons of shameless clones from Chinese sites or companies, and there's not much you can do about it outside of maybe asking the server host to intervene.


They probably do the attribution in a way that just barely complies with the wording of the CC license without being too visible.


Example?


I think for some people, when they search a question, a StackOverflow mirror comes up as the first result. I've seen it too, ~once, some years ago. I was totally ready to switch to it if it could provide /text/ (without JavaScript) faster and leaner, but that did not happen.


"That's the future Suleyman anticipates. "The economics of information are about to radically change because we can reduce the cost of production of knowledge to zero marginal cost," he said."

The cynicism is mind-blowing. Artificial Stupidities have created exactly zero knowledge so far, which is why companies like Microsoft now roll out people like Suleyman who openly admits that the DMCA is only for the rich with lawyers.

And steal and repackage "content" from altruistic creators.

This one might come back to bite Microsoft. If any case goes to the Supreme Court, I'm not sure that they'll be amused by this line of logic.


I'm not saying it's not cynical, but it's surely not wrong. He just said what everyone in the business was already thinking.


It's exactly how democracies backslide into authoritarianism.

First they violate the law. Then they ignore it. Then ignoring it becomes the new norm. Then the new norm becomes the new law.


The Chevron repeal in a nutshell. And it'll keep happening if nothing can keep it in check.


Not saying it's a good thing either. And what you're saying is unfortunately true.

We need people who really value creation, democracy, etc and not pure capitalism, profit optimization, etc.


It's cynical and wrong. What he (seems to be) saying is that it's fine to violate the legal rights of other people if they don't have a large legal department, and you do. It's blindingly obvious that's wrong.

"Everyone in the business" - you mean, the AI training business? Paraphrasing a famous call-girl, "They would say that, wouldn't they?"


>"Everyone in the business" - you mean, the AI training business?

This isn't even limited to tech. Rich people at worst settle with money for breaking the law, and at best completely get off scot-free doing things that would put much pettier offenders behind bars. I steal some bread from a store and go to jail. Some rich dude commits millions in fraud and a cover-up and still runs for president.


Do Mistral and xAI get to freely scrape the output of Microsoft ChatGPT? I for one, think either answer to this question will turn out bad, for all of them.

Drowning out creativity on the internet with remixed word salad will only be profitable for a short while. After which, people will start looking for human-created stuff, and avoid ai as much as possible. And slowly, it will become possible. Just like adblockers slowly advance.


People may think they want non-AI stuff, but they just want things created with intention. AI is just a tool, and the very best creators will leverage it to take art to a new level.


Well, this new level doesn't seem to be here yet, so I'll believe it when I see it. Crypto also "will change the world", except it didn't, for all the confidence of its supporters. Claims about great things in the future are a dime a dozen.


I'm pretty sure you've seen it many times and not realized, because the creator took care to build upon what the AI produced, giving it polish and a human touch, then didn't advertise it as "AI".


Funny how all those "thought leaders" do end up in jail even though everyone seeing them steal all they can is already thinking.


Three C's are missing from this - consultation, consent and compensation.

We post regularly online without any expectation of payment but then we never considered that the output could or would be used for commercial purpose. The value of our collective output is being captured by a very small elite. Not sure what we can do about this, other than support alternative ecosystems, at least to ensure that competition between corporations might keep prices low.


>We post regularly online without any expectation of payment but then we never considered that the output could or would be used for commercial purpose.

We did. If you are posting here, that means you've explicitly stated that you read and agreed to Y Combinator's TOS. Which include this clause:

> you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed

And people are still fine with this apparently, otherwise they wouldn't use the service anymore.


I ask them to take a poem and hold it up to the light like a color slide

or press an ear against its hive.

I say drop a mouse into a poem and watch him probe his way out, or walk inside the poem's room and feel the walls for a light switch.

I want them to waterski across the surface of a poem waving at the author's name on the shore.

But all they want to do is tie the poem to a chair with rope and torture a confession out of it.

They begin beating it with a hose to find out what it really means.


>And people are still fine with this apparently, otherwise they wouldn't use the service anymore.

It's heavily skewed against us, but you can still argue the three C's here: consultation is the license, consent is the agreement, compensation is agreed upon as none for the user.

What's not in that agreement is letting non-affiliated companies do any of the above actions. And I believe the privacy policy and GDPR cover other clauses pertaining to the sale of our data.

>Sale or Sharing of Personal Information. We do not sell or share your Personal Information (as those terms are defined under the CCPA).


These TOS often actually violate the GDPR. It's just that most of the time no one bothers to sue companies, except for small non-profits like noyb, which actually have the resources and expertise to attack high-profile companies in court.


Bullshit. The cost of not participating in the short term is high.


This is not a new development. All your content online hasn't been yours for nearly two decades. People who still don't get this seriously need to get educated and think about what they post - or simply stop posting.


For software engineers it is hard to protest in public since Microsoft controls GitHub and many "free" OSS projects where most real contributors have left after Microsoft installed a small ruling class of otherwise unremarkable software engineers.

For Hollywood writers, artists, musicians, book authors it is much easier to protest. You could go to Redmond for a couple of weekends, camp on "One True Microsoft Way" and protest.

And the NYT would cover this, since its interests are aligned with the protesters.


When the short term cost of non-participation is far, far higher than the long term cost for the individual, is the ToS anything less than coercive? If you don't use github on principle, you're fucked.

We need the public square back, we need it conceptualised as such. The death of twitter was horrifying, apparently you can just BUY it.

Excuse my passionate language, but fuck these predators.


Only decentralized and open source services were ever candidates to being public squares.


A good start might be to start voting with your wallet. Stop using products from companies that have trained their models in this way.


And start voting with your attention too. By participating in open services online (like HN), you are actively participating in building a corpus of text that will be used by AI companies.


"I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use"

And that social contract did not include AI. You put content on the internet because you had certain expectations on how it'll be seen or used.


And the idea that fair use includes training-set fodder is a non-starter. This use was not only unaddressed when fair use standards were created, it involves a set of capabilities that was inconceivable at the time, so it can't have been intended - even the most generous reading of "transformative" can't allow for automated reproduction, however fractal in nature.


It’s funny that everything is fair use when it comes to these big corporations using it. When it comes to someone with limited resources and attorney power and it touches their content, then it’s over.


Right. Training models is qualitatively different to educating or entertaining users.


How so? The models are educating and entertaining.

Those things aren't what makes a copy fair use, though.


Because models do so in competition with humans, in a way that threatens their way of life. At least until the rewards and benefits of ML are shared fairly, people don't want to help accelerate that.


Do people not do it in competition with each other too?


They do, on an equal footing. Even human-level AI will blow human professionals out of the water, let alone superhuman.


It is the kind of attitude that will bring regulation; in the USA it will probably take a decade (as it usually does) for legislators to catch up, after some company skirting copyright laws has become big enough to be another economic powerhouse for the US economy.

The EU is probably already acting on it behind closed doors. The usual vitriol against regulating "innovation", peddled by the American way of doing business, ran pretty high against the AI Act (which is definitely far from perfect, but a step toward regulating this). In the near future I can see more rulings, or even new regulation, addressing the absurdity of AI companies consuming all this data for their own profit with no compensation to its creators.

My personal opinion is that leaving "innovation" as the only guidance for what is "good", without any morality imbued, is stupid. A lot of us have seen the cycle by now: what was innovative before becomes entrenched, the entrenched companies become behemoths, and they obviously start abusing their position of power when consumers have very few options outside the system they created. The downfall of tech from what I experienced in the early 2000s to what it has come to be in the 2020s is just sad; it's the new 80s finance yuppie bullshit. Instead of coke-addicted greedy-as-fuck bros, we have nerdy, blabbing-about-changing-the-world greedy fucks reaping the profits.

This will get ugly, and companies doing it will deserve the retribution if they get fucked.


Current laws are insufficient to address this, but we will end up using them to adjudicate whether or not this is fair use. We will get there in due time, and the pathway will be messy.

I am ambivalent about this overall, but a few things are clear. Someone getting sued does not automatically mean they are wrong and whoever is suing is right. We don't know the rules, hence the lawsuits. I see lawsuits being used as evidence of wrongdoing, which seems plainly wrong. Everything that becomes big ends up being sued (including artists, with allegations that they copied someone's work). That tells us nothing.

(I think this part is clear.) Reproducing content verbatim without permission, and for profit, is plain old plagiarism, whether it's done via AI or by a human. In some cases, with proper citations, it is allowed, but otherwise it's a no. Summarizing content, with or without credit and citations, was always allowed, but never done at this scale, so this "social contract" might need to change.


I'd say your optimism is unfounded - copyright in the digital realm started out as a free-for-all and turned into a sh*tshow that is still kind of unresolved to this day. And these are issues that arose 20, 30 years ago.

If AI follows the digital copyright model, what'll happen is that interest groups with true legal muscle behind them (music, movie industry, etc.) will enforce their interests with draconian laws, and everybody else will be left to fend for themselves.


This is the leading edge of a conversation about the 'data' we emit moment to moment, digital and otherwise, and the right of powerful players to incorporate that into models which will definitely be used against us.

The social contract has never been sacrosanct, it's enforced through social consequences. These companies don't feel social consequences, individuals making these psychopathic decisions are socially rewarded by their peers and allies, and financially rewarded in turn. Money is a flow of power, not of value. Especially post Bretton-Woods.


That is the contentious part. These are not easy questions.

People talk about data, but individual data is as useless as they come for many companies. Data is only useful in aggregate, and that too after reaching a certain scale. Counterintuitively, individual data (with or without PII) is more likely to be used for nefarious purposes compared to aggregated data. Aggregated data can lead to better user experiences, better utilization of our time, and can deliver more value to us (while delivering personalized ads). No one should create a blanket rule banning aggregation, given how it could ban Google Maps and Google Search and Google Ads in the same breath if interpreted broadly.

With generative AI, the paradigm has changed and it needs a rethink. That was my original point. We don't have a clear answer yet.


I guess I own and can do whatever I want with this Windows ISO then.

https://www.microsoft.com/en-us/software-download/windows11/


How about we test this theory by copying all the “freeware” content that MS put out. Heck, as it is freeware and not anything more restrictive, we can make derivative works too!


You missed the point: what Suleyman was saying is that content is freeware, provided the publisher doesn't have a well-staffed legal department. Microsoft does have that, so what they publish isn't freeware.

This is an attitude that has run in Microsoft's veins since the start; they've always been scofflaws, they've always been in the business of stealing other people's inventions. For Microsoft, laws are for protecting their moat.

Frankly, I'm astonished that Suleyman said this on the record.


Fascinating watching Microsoft morphing from intellectual property absolutism to this. "What's mine is mine, what's yours is also mine."


I'm sure all those media outlets and rights organisations are going to love this logic, and it's not going to backfire at all. They're already being sued by multiple news organisations, the Authors Guild, a few celebrities, and likely many other organisations in the coming few months, and I'm sure at least some of those people's lawyers are going to end up using this quote in court.


Could Microsoft just ignore copyright on radio transmissions or broadcast tv too?


No, not easily, because those can afford legal teams that the rest of us can't.


I think people are getting confused between using something and redistributing it. If you train an AI or build a search engine index, or your browser caches a webpage, you're just using it, and it was put on the web so that it can be used, without any restrictions on what for (except perhaps when the permission to even download it is granted conditionally on how you use it as Suleyman mentioned). Copyright restricts copying, not private use. Since it's on the web, there really is an implied permission to use it privately for whatever you want.

The copyright issue only comes up if you publish the output of your model. But if the AI is (somehow) clever enough to never reproduce the source material in any way that counts as copying for the purposes of copyright, then there's no copyright problem making it available to the public.

Some artists assume they have more rights than they really do and that other people aren't even allowed to mimic their style.


Your comment is a traditional understanding of copyright, pre-AI tools. The point is that this old model increasingly seems inadequate when a megacorporation can functionally copy and reproduce your work to an extremely close, but not exact, degree and then claim they did nothing wrong.


Extremely close but not exact copying is already not allowed. Human artists have tried that.

Just because AI threatens to produce equally valuable material for much lower cost doesn't mean that's wrong. It just means those people got out-competed in the market. That's what the free market is for - to serve the consumers, not the producers.


> It just means those people got out-competed in the market. That's what the free market is for - to serve the consumers, not the producers.

For one, if you serve only the consumers at the detriment of the producers, soon you’ll have no producers left (or, more likely, a handful of extremely powerful ones which will stay stale as they have no reason to innovate).

For another, you appear to be arguing as if the “free market” is an unambiguous positive, when it very much is not. It doesn’t even adequately serve consumers, it serves businesses. The free market is what makes companies knowingly produce and sell harmful substances, like cigarettes or lead paint.


If AI stagnates, then some human will out-perform it and be able to sell their work. The AI won't be able to generate competitive work because it's stagnant at a lower level.

I'm not arguing free market is all positive, just pointing out that the whole reason people see free markets as a good thing is because they benefit people (consumers). They have downsides too but those aren't the reason they're popular. Nobody wants free markets so they can get poisoned more easily.


I don't disagree that these tools are useful - I use them myself and agree that they allow creating things cheaply that used to cost a lot.

The reason everyone is irritated at the tech companies is not so much because they made it cheaper to do these things, but because they seem to have such a smarmy, slimy attitude about the whole thing - these comments by the Microsoft CEO of AI as a prime example. They could have just trained their models on old public domain content, Wikipedia, or a million other things, but instead they want to justify raiding content that was shared for free under different expectations. This is only going to result in a locked-down, paywalled web that is pretty much the opposite of 90s net values.


If you use the data to train a commercial model and distribute it how is that not infringement?

If I use a reversed Jay-Z sample in my music, I will still get sued. This is just a couple of orders of magnitude higher abstraction.


I'm not talking about distributing the model, just its output, which won't be something the user can non-creatively transform into a Jay-Z sample.

However, if the model really can't regenerate its training data no matter how hard you try, then that would be fine too. I don't think anyone can really guarantee that now so that might be a problem, but it isn't necessarily a problem.


> and it was put on the web so that it can be used, without any restrictions on what for

No, this is not true at all. A lot of things are put on the web with plenty of “restrictions on what for”. That’s why licenses exist. Creative Commons being an example of licenses with well defined limits on what you can and cannot do with the content.

Furthermore, a significant portion of content on the web was put there without permission: think piracy or revenge porn. AI systems have been trained on tons of pirated books.

> Since it's on the web, there really is an implied permission to use it privately for whatever you want.

There really is not. It’s just that if you’re using it privately then by definition no one else knows and thus no one can complain. It doesn’t mean that the creator condones whatever you’re doing.

> Some artists assume they have more rights than they really do and that other people aren't even allowed to mimic their style.

We’re not talking about “other people”. Another person mimicking your style is, to an extent, harmless because they’re also making a time investment to replicate your style. With the AI tools you can reproduce hundreds of knock offs in minutes.

We have to stop with this comparison of “what these AI systems are doing is fine because humans could also do it by hand”. It is not the same thing. Scale matters.


That material with restrictions on use would fall under the "grey area" he referred to. You're essentially not given permission to download/copy it if you're going to do the prohibited thing with it. But if it's just a random picture, there's an implied permission to download it and use it for yourself, since that's what it was put there for.


There is no such implied permission. You have fair use, that's it.

When something has a stated license, that only increases your rights to use it; the default is nothing allowed except fair use.


I'm trying to say that copyright restricts copying, not using. If you publish something on the internet, you implicitly give people limited permission to copy it so their browser can download it. Once it's on their computer, they're allowed to look at it with one eye closed if they choose, or even to modify it for their personal use or set it as their desktop background.


The only reason you are allowed to modify a copyrighted image for personal use is because that's considered fair use. The fact that you found it on a website is irrelevant.

Copyright literally restricts using the material. If you buy a book, you cannot sell tickets to an event where you read it out loud.


I struggle to see how scale fits into the equation.


> and it was put on the web so that it can be used, without any restrictions on what for

Nope, copyright applies just like it always does.


So I can train a model on all available Windows material, including from MS themselves, and sell it as a support bot?


Probably more useful than all those posts in Microsoft Community.

Make sure to exclude any post with "MVP" in there.


Someone should do this, scrape everything Microsoft (don't forget MSN news), then create an online chat bot trained on all their data. Tout it all over the web. Then sit back and watch how quickly Microsoft moves to get it taken down.

Sheer hypocrisy.


From the article:

> "There's a separate category where a website or publisher or news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me,' so that other people can find that content," he explained. "But that's the gray area. And I think that's going to work its way through the courts."

Microsoft wants rules for thee, not for me.


I don't think you'd be breaking the law doing that. As long as you don't reproduce any of the MS-owned material in your output. Data isn't protected by copyright (in the US at least), so your AI could extract the knowledge from some text and present it in its own different way.


> As long as you don't reproduce any of the MS-owned material in your output.

That is exactly what you should be doing to call them out on their bullshit. From the article:

> "I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use," he opined. "Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding."

Which means that from their logic you can just copy their content and reproduce it.


That won't stop them from suing you into oblivion. They can afford the lawyers; you waste a decade of your life.


Yes



Translation: If it benefits us it's free, if not you get sued.


I have my own small blog, and am more and more tempted to "poison" it with made-up facts about me to see if/when it gets scraped by LLMs.


Oh, do fuck off. Copyright exists so that copyright owners have control over who uses their content and for what. Search brings visitors so that kind of use is beneficial to both parties. Training models that repurpose and recombine other copyright owners' IP and assigning copyright to the patchwork created using LLMs is not beneficial to the original copyright owner and is pure theft. Has Microsoft looked at what Adobe did a couple of weeks ago and thought "we haven't had shit thrown at us for a while, we'll have some of that, please?"


Eat the rich.


No wonder AI gets so much shit wrong. I asked ChatGPT a Pydantic v2 question yesterday and it got it so wrong it would have hurt to watch if it were a person teaching a class. I hope AI doesn't start running operations and building planes...


Too late, people are already using AI to embellish CVs and hiring managers are using AI to screen applications and CVs... with that level of stupidity I expect stuff to start falling out of the sky soon.


I understand that robots.txt is mostly treated as a courtesy guideline. But if large corporations cannot be expected to respect it, more proactive countermeasures may follow.
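To make the "courtesy" point concrete, here is a minimal sketch of how robots.txt works, using Python's standard-library parser. GPTBot and CCBot are real, published crawler user-agent names; everything else (the example rules and URL) is illustrative. Note that nothing technically enforces these rules; a crawler simply chooses whether to check them.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt like this asks AI-training crawlers to stay out
# while still allowing everyone else (e.g. search indexing):
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks before fetching each URL.
print(parser.can_fetch("GPTBot", "https://example.com/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/post"))  # True
```

The whole protocol is opt-in on the crawler's side, which is exactly why comments here suggest "more proactive" measures.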


Seems like it would be easy to structure 'online content' so that it poisons the model with either nonsensical content or offensive content, like what happened with 'santorum'.
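One hypothetical way to structure that: serve a decoy page when the request's User-Agent matches a known AI-training crawler. GPTBot and CCBot are real published user-agent tokens; the function name and pages below are made up for illustration.

```python
# Published user-agent tokens of known AI-training crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot")

def select_content(user_agent: str, real_page: str, decoy_page: str) -> str:
    """Return the decoy page for AI-training crawlers, the real page otherwise."""
    if any(token in user_agent for token in AI_CRAWLER_TOKENS):
        return decoy_page
    return real_page

print(select_content("GPTBot/1.0", "real article", "nonsense"))  # nonsense
print(select_content("Mozilla/5.0", "real article", "nonsense"))  # real article
```

Whether this actually degrades model training at scale is an open question, and crawlers can always spoof their User-Agent.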


See this in the context of corporate and governmental greed aligning into the impoverishment of the masses. This is part of a general movement towards wealth concentration, and war is the next step.



"I think that with respect to content that is already on the open web, the social contract of that content since the 1990s has been it is fair use"

This argument cannot possibly hold in any court. This has never been the 'contract'. I cannot reproduce the content of a newspaper's online outlet, I cannot reproduce the art of another artist on Instagram, I cannot reproduce someone's YouTube video without permission. This same thing sparked the whole fair-use debate some years ago.

The exceptions to these rules have always existed in regulatory grey areas and have been debated for decades now.

This guy is apparently still living in the Napster era, and the amount of gaslighting Microsoft, OpenAI, Google etc. are performing right now to freeload on data is presumptuous.


What about windows source code? Is that free real estate too?


I think there's a 99% certainty that almost all serious media websites in the future are paywalled, especially ones that provide information useful to businesses. There is less and less benefit to making your work public.


"If you don't have enough money to sue us, your copyright doesn't exist."


Yeah, if I don't lock my house and someone takes all the furniture, it's not theft……

What a joke of a person. I hope the courts roll over them harshly and explain that the world doesn't work like that: just because you can do something doesn't make it right.


There are no rules. This is an everyone be damned land grab with every imperialistic instinct justifying itself with bullshit.

Read a history book.


Soooo... any downloadable version of Office or Windows is freeware for my AI to train on and spit out the same code again? Kinda cool, I think. /s


[flagged]


It's not like LinkedIn lets you scrape them today, either. Scraping for me, but not for thee.



