The Mueller Report Can’t Be Copyrighted, Is Flagged by Copyright Bots Anyway (eff.org)
327 points by sohkamyung on April 20, 2019 | 40 comments



While this doesn't seem to be a DMCA matter, a question for the lawyers here about DMCA takedown requests: given that issuing a request requires the requester to make a "good faith" affidavit, if they report something flagged by a bot that a human check would have shown to be obviously wrong, as here, can that be used as evidence of bad faith and of committing perjury?


While filing a false DMCA takedown request can be punished with fines and ultimately jail time, most of the case law on this involves clear malicious intent. These are parties that send out a notice just for articles or posts that are critical of said party. In these cases, you had a human on one side knowingly filing a single false DMCA takedown.

http://www.aaronkellylaw.com/consequences-of-filing-a-false-...

These are cases of programmers creating takedown bots with false positives. Do the programmers know that there will be false positives? Yes. Do they not make a good faith effort to prevent those false positives? Probably. Good luck proving this in court though.


This is an excellent example of why @qntm called AI "money laundering for responsibility":

https://twitter.com/qntm/status/1030846375213379584


Given there is a contentID system for explicitly copyrighted works, it seems insane that nobody involved in these systems has seen fit to do the same for explicitly public works.


These systems are good at picking up small portions of copyrighted works within some larger context, but to guarantee that 100% of a YouTube video or scribd document is in the public domain is a different problem.

For published works like the Mueller report, one wouldn't need contentID. Matching the document's hash would suffice.


A hash would still only work for the exact document. Suppose someone uploads a new version with a better table of contents, or with some added annotations [0]; now your hash no longer matches, while contentID probably will.

[0] Both are things which could be copyrighted, but which I'll assume will not trigger the actual flag.
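To make the exact-match limitation concrete, here is a minimal sketch of a hash allowlist for known public-domain documents. The allowlist, the function names, and the stand-in document bytes are all hypothetical; nothing here reflects how contentID or Scribd actually work.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-in bytes for a published public-domain document (hypothetical content).
original = b"%PDF-1.7 ... report body ..."

# Hypothetical allowlist of digests for documents known to be public domain.
PUBLIC_DOMAIN_HASHES = {sha256_hex(original)}


def is_known_public_domain(data: bytes) -> bool:
    """Exact-match lookup: True only if these exact bytes are on the allowlist."""
    return sha256_hex(data) in PUBLIC_DOMAIN_HASHES


# The same document with an added table of contents or annotation.
annotated = original + b"\n% added table of contents"

print(is_known_public_domain(original))   # True
print(is_known_public_domain(annotated))  # False
```

Any change to the bytes, however small, produces a different digest, so an allowlist like this only covers byte-identical copies. Recognising modified copies would need some form of fuzzy or perceptual matching instead.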


These systems seem to have no concept of non-copyrighted works, which is a fundamental flaw.


There's no profit motive.


There is an excellent profit motive; it is just not as directly obvious, as it is mostly realised in money not spent, and humans seem notoriously prone to preferring small gains over large savings.


Can you clarify the profit motive you're referring to?


Not doing it incurs a fairly large cost across many areas of the economy. Time has to be spent by people faffing around with this nonsense that creates no wealth economically and this essentially creates an overall drag on profitability across most of the economy, parasitically favouring only small sectors such as legal services.

The trouble is, this is a group profit motive and people seem to think of profit motive mostly from an individualistic and zero-sum perspective.


There's a reason people only think of it from an individual and zero-sum perspective - that's the aspect that generally effects changes/results.


I think you may have that back to front.


Sorry - I think the tragedy of the commons would like to disagree: https://en.wikipedia.org/wiki/Tragedy_of_the_commons


The issue seems to be a skewed incentive between false positives and false negatives. If false positives (materials being taken down based on a false claim) carry much lower risk for the platform than false negatives (materials that should be removed but have not been), they will naturally prioritize reducing false negatives.
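A rough way to see the incentive is as an expected-cost comparison. The sketch below assumes the platform assigns one cost to a wrongful takedown and another to leaving infringing material up; the function names and all cost figures are hypothetical and chosen only for illustration.

```python
def removal_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability of infringement above which removal has the lower expected cost.

    Removing costs (1 - p) * cost_false_positive in expectation,
    keeping costs p * cost_false_negative, so removal wins when
    p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_remove(p_infringing: float, cost_fp: float, cost_fn: float) -> bool:
    """Decide by comparing the bot's probability estimate against the threshold."""
    return p_infringing > removal_threshold(cost_fp, cost_fn)


# Symmetric (hypothetical) costs: remove only when infringement is more likely than not.
print(removal_threshold(1.0, 1.0))       # 0.5

# Hypothetical asymmetric costs: a missed infringement is 1000x worse for the platform.
print(should_remove(0.01, 1.0, 1000.0))  # True
print(should_remove(0.01, 1.0, 1.0))     # False
```

With a 1000:1 cost ratio the threshold drops to roughly 0.1%, so even a weak match is removed. Raising the platform's cost for false positives is what would move the threshold back up.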


How about some sort of legal punishment against scribd/youtube etc for false positives?


Who do you think Scribd are? A government agency? They're a private-sector business. If they don't want to host your document for a good reason, a bad reason, a reason you don't agree with, a mistaken reason, a false reason, or no reason at all, then that's their decision (except for protected categories for the purpose of discrimination, which do not apply here.)

They’re not stopping you copying anything - they just don’t want to host documents they’re not sure of the status of in their automated system.

Host your own documents if you don't think they’re right.


These companies aren't violating the law; at most they are violating their own policies regarding the content they host.


Who defines what is a "violation" of their "own" policy? Or what the policy is exactly? Are there no repercussions if their "policy" is not directly reflecting the goals and intentions of original copyright law?


If you read the terms of service agreement for almost any company, you'll find you agree to hold them faultless in these sorts of issues. And you'll likely find that you've also agreed to arbitration in the handling of these matters.

I'm not sure though how you feel they are violating copyright law, can you go into more detail on that?

Because they basically have the right to remove any content they host at anytime, regardless of if it is copyrighted or not.


Isn't the problem here that there is a punishment for not removing copyrighted material? So any time some bot says that there is an X% chance that something is copyrighted, it's safer to just remove the material.

And these are private companies, not public institutions. They are not letting you upload stuff for the sake of some common good; they do it because it's their business model. Why should they let you upload something that has a 0.1% chance of being copyrighted by someone who could then demand money from them?


Curious to see Scribd mentioned. Last time I used anything on Scribd was a decade ago to embed some PDFs that should have been web pages really.

Then they made it all paywalled so that was that.

I think their business has 'mySpaced' and fundamentally does not go with how people consume content. Nobody wants a PDF these days.

So why are they mentioned now?

Well, the report is wrong!!! It is a scanned document. So it is not digital. They could have written it with HTML5 but they are stuck in the past, twenty years out of date. PDF was okay back then but not now. Scanned images in a PDF are not accessible. You can't search it.

People who like PDF because they used it for legal documents back in the day have their arguments about why to use PDF. But if this report came from a single URL and was a tenth the size in HTML then nobody would have problems determining if the page was genuine or a fake - it would be in the URL.

Nobody would need to disperse copies of it over the internets. Just the one URL would do.

Really I think that this Mueller chap and the whole freaking government need to be sacked and prosecuted for not making their work accessible. It is important to democracy.


You'd have to show a harm.

It's about as bad as the mailman losing your letter. So what? If it was important, you would have insured/certified it.


> It's about as bad as the mailman losing your letter

No, it's not. If the mailman loses your letter, one person won't be able to read what you wrote to them. If YouTube blocks your channel, nobody will, and if you depend on YouTube for money, your business defaults.

> If it was important, you would have insured/certified it.

There's no way to insure against copyright strikes as far as I know.


Well, the EU has just passed the Copyright Directive. That makes reading such articles both funny and terrifying.


Is Scribd under any legal obligation to display content that users upload? I cannot find any in the Terms of Use.

I did find some terms that allow Scribd to remove content for any reason and terms which purport to limit any remedy for such removal to "Don't like it? Then don't use it."

12.1 Scribd. You agree that Scribd, in its sole discretion, for any or no reason, and without penalty, may terminate any account (or any part thereof) You may have with Scribd or Your use of the Scribd Platform and remove and discard all or any part of Your account, User profile, and any content, at any time and without notice to You.

12.2 You. Your only remedy with respect to any dissatisfaction with (i) the Scribd Platform, (ii) any term of these Terms, (iii) any policy or practice of Scribd in operating the Scribd Platform, or (iv) any content or information transmitted through the Scribd Platform, is to cancel Your account and stop using Scribd.

https://support.scribd.com/hc/en-us/articles/210129326-Gener...

7. Removal of Content. Regardless of which purchase option You have selected, Scribd reserves the right to modify or withdraw at any time any Scribd Commercial Content from access by You at the request of its publisher or for any other reason.

https://support.scribd.com/hc/en-us/articles/210129486-Scrib...


I don’t understand why people use something like Scribd to host a PDF a few megs in size and then complain about takedowns.

Host yourself. The operational and BW cost for hosting static content is ridiculously low.


I believe that the particular report was 139MB of scanned printouts of a redacted PDF. Depending on your site hosting, I imagine it could actually become somewhat expensive if it turns out to be popular.


This. 10,000 downloads would already be a terabyte of bandwidth


Which is <$100 even using something with overpriced bandwidth like EC2.

And there is always BitTorrent if you don't even want to pay that.
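The arithmetic in this subthread can be checked quickly. The sketch below uses only the figures quoted above (139 MB, 10,000 downloads, a $100 budget) and decimal units; it does not assume any provider's actual price, and the function names are made up for illustration.

```python
def total_transfer_gb(file_size_mb: float, downloads: int) -> float:
    """Total data transferred in GB, using decimal units (1 GB = 1000 MB)."""
    return file_size_mb * downloads / 1000


def max_price_per_gb(budget_usd: float, transfer_gb: float) -> float:
    """Highest per-GB price at which the transfer still fits the budget."""
    return budget_usd / transfer_gb


gb = total_transfer_gb(139, 10_000)
print(f"{gb:.0f} GB")                             # 1390 GB
print(f"{max_price_per_gb(100, gb):.3f} USD/GB")  # 0.072 USD/GB
```

So 10,000 downloads come to about 1.4 TB, and the "<$100" estimate holds only if bandwidth costs less than roughly 7 cents per GB.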


We need a reform. One that would remove the idea that one can collect royalties from the work. Royalties are incompatible with the way information is disseminated.

Authors don't collect royalties. The absolute majority of authors don't live off royalties. Let's just kill this monstrosity.

If your business model comes with huge externalities for the society as whole... your business model must go.


Can we not turn this against the record industry's own videos? Genuine question.


1) Doing this purposely seems like a good way to get on the wrong end of some legal process, and is exactly the sort of thing that would end up being somehow fixed in a way that only works for Hollywood and not anybody else.

2) The major studios don't really use YouTube, and Netflix and Hulu and Spotify don't use ContentID. If the entire site was destroyed along with all of their smaller competitors that do use it, they would only celebrate.


Scribd is a massive piracy site that should be sued into oblivion if there is any justice. Only a very small amount of content there is uploaded with the owner's permission.


It's almost always inferior to any other way to distribute a legit free PDF.

Which is honestly nothing against Scribd--there's just no point to using an e-book service to distribute a file every modern browser can open natively.


> it is impossible for a copy of the Mueller report to be infringement since it cannot be copyrighted

This is specious reasoning. The report could contain independent copyrighted material, like a previously published poem reproduced verbatim and "used with permission". So copies could be automatically flagged for infringement of that material.

The "public domain" document is tainted.


I am not a lawyer - I would appreciate it if a lawyer could confirm or reject my understanding - I think it's actually the opposite (but take this with a grain of salt):

This would be a derivative work of the material it is quoting. Derivative works aren't protected by both the original copyright and the copyright of the person who prepared the derivative work, only the latter. Since the US government does not gain a copyright on derivative works that means the entire derivative work is in the public domain. You can make copies of it, distribute it, prepare further derivative works (such as cutting out everything except the poem! Though maybe not a good idea to test that plan), etc.

The only exception to this would be (I think) if the report was created in violation of copyright law, but there's no way that this would not be found to be fair use. All the standard factors are in its favor. There is a strong public good in its favor.


Firstly, I'm not an attorney, but I have spent the last several years working hand in hand with a large number of them specifically on these issues.

My take away is as follows

Including copyrighted material in its entirety and original form in a different work does not make it derivative. Its original copyright stands.

A quick requirement for derivative work.

The derivative work must demonstrate transformation, modification, or adaptation of the original work, and that change must be substantial and bear its author's personality sufficiently to be original and thus protected by copyright.


If that were the case, reproducing the quoted parts of the copyrighted material embedded in a complete reproduction of the public-domain report would almost certainly fall under the fair use exception. The law is written to be interpreted by reasonable people, and computers are neither reasonable nor people — this sort of automatic detection should only be used to trigger a human review and not directly make enforcement actions.


I don’t think the Mueller team would break the law in the publishing of their own report.



