It seems to me that the content identification works as advertised: it's the metadata that is missing essential information due to inadequate process.
If Youtube knew that the author is Bach and not Sony it would know not to flag the soundtracks featuring Bach songs.
However, the problem is that at some point Sony lied to Youtube and claimed ownership of these songs even though they are in public domain. So there should be reviews or penalty for flagging copyright on someone else's works.
Sony could claim they have copyright on the very Bach recording they uploaded themselves, but if the Content ID system can't differentiate between Sony's Bach and anyone's Bach, then for practical purposes there probably isn't much basis for the recording to be eligible for a copyright by Sony.
Not just that, the AI also can't recognise fair use.
The only legitimate (read: non-overbearing, not punishing individual creators by erring on the side of registered rights holders) implementation of this process would be if the registered rights holders were merely notified (with an easy option of forcing a takedown or demonetization) of every potentially infringing upload but the uploads themselves were not demonetized or otherwise punished.
Automated content filters will always be more restrictive than the letter of the law if they have to make automated judgments and the publishing is time sensitive.
That said, Content ID basically allows rights holders to tap all the power of a DMCA takedown without the legal risk of committing a felony by filing an illegitimate claim.
> However, the problem is that at some point Sony lied to Youtube and claimed ownership of these songs even though they are in public domain. So there should be reviews or penalty for flagging copyright on someone else's works.
There is no technical solution to this problem because it is not a technological problem. It is a social problem. Fascinatingly, there is a great social solution to this problem:
* Monetary penalties - asserting an incorrect ownership should create a very strong bite against the entity making such an assertion. Make it $500k per incident. If, after 10 people upload a slightly different Bach performance that they themselves did, Sony's bank accounts decrease by $5 million, the erroneous assertions would go away as if by magic.
Huh? If Youtube’s black box AI can’t tell the difference between recordings by Glenn Gould and Murray Perahia they should be equivalent in the eyes of copyright law?
Yes! If you can't tell the difference between a true Mickey Mouse and its (slightly different) clone, and yet the clone is not allowed by law because it is not a sufficiently different work, then I would say the guys who just aren't sufficiently different from Bach should not be subject to copyright.
The truth is, copyright on interpretation is about the same category as copyright on a work of a plumber in my apartment. I understand that's how some people make money, though, so it's painful to change.
OK. We need to back up here as there are some conclusions being rendered that don't quite hold up.
For music, there is more than one copyright. There is the copyright on the original composition. There is the copyright on the modification of the original composition (for example to use modern notation, etc). There is the copyright on the recording.
Sony has a copyright on some recordings of Bach music. They may also have a copyright on modifications of the original compositions, but I don't know if they do or not. They do not hold a copyright on the original composition, because that is in the public domain.
It is difficult to record classical music without infringing copyright because the modern printed scores are all modified. You can use the original scores, but they are actually very difficult to understand and are akin to a different language. In fact, as far as I can tell this is the reasoning behind allowing a copyright on the modern scores -- they are essentially translations (and translations of creative works are allowed new copyrights).
There are some scores that are both modern and in the public domain. Kimiko Ishizaka has been doing some good work on this front with her "Open Source Bach" series (and you can even get unmixed versions of her recordings under a CC license!!!). However, initiatives on this front are few and far between (consider throwing some money her way as she is worth supporting).
If you have a score for which you have a license to record (or which is in the public domain), then you can make a recording of it. It doesn't matter one bit if it sounds like a recording of a Sony recording. However, it is quite difficult to find such scores and often performers do not know enough about copyright law when they start doing their own performances for Youtube.
Also, for all those classics it's not that hard to find 100-150 year old editions which, while they may have had a separate copyright in their time, likely also fell into the public domain in the meantime (depending on if it's personal copyright or corporate, and if the former, how long the editor is already dead)
IMSLP contains thousands of scores in the public domain. I've never so far had to pay for a score for a public domain song because I found them all on IMSLP.
Not a musician here, and hadn't thought of that. And I don't recall that recent coverage of this re YouTube and Facebook has addressed copyright on scores vs on recordings.
I mean, how would one even do that, based just on recordings? Can experts tell what score was used for a particular performance?
Definitely. There are some YouTube videos that show the score alongside the music being played. Sometimes you can definitely notice divergences: the video editor took some random score of the same piece, and not the score used by the musicians!
Actually, there is software (not excellent AFAICS), and there are humans, who just transcribe the played music into new scores. This may be easier to do than to take the original scores and modernize them, but then of course, this would be a derived work of the played music...
I had always assumed that some version/edition of "foo" by "bar" (from the author, I mean) is always the same series of notes, with the same timing, played in the same way, etc. Or at least ideally. I knew that instruments differ. And that performers have different styles. And that there are parts of pieces that basically say "improvise here". And that performers occasionally make mistakes, or even consistently make some particular mistakes.
But it never occurred to me that there are variously derived versions. I mean, how can it be referred to as "foo" by "bar", when it's actually a distinct version. Indeed, distinct enough that it can be copyrighted. At that point, isn't it really "baz" by Sony (based on "foo" by "bar")?
A score is a very rough and incomplete model of the composition a classical composer had in his mind.
As Mahler said, the essential things of music are not in the score. Timing, articulation, phrasing, dynamics can make a piece sound totally different. The words alone are not the performed poem.
Non-musician here. Is it possible to create modern scores from the originals? I can engrave using Lilypond or Frescobaldi for choral music to make it print cleaner and super crisp. Is it that hard? Please explain...
Yes you can, and I believe Kimiko Ishizaka is using Musescore. I am not a musician either, but my understanding is that very old scores are hard to read. I don't know if the notation is different or if there is some other problem. It may even be that it's just hard to get your hands on a copy of the old scores that are not under copyright. This blog post describes one of the problems: https://www.kickstarter.com/projects/293573191/open-goldberg...
I was trying to find some links describing the difficulties of transcribing old scores, but unfortunately my google-fu is not up to the task. Bottom line, I don't know how difficult it is, but my understanding is that it isn't trivial.
I've got some experience playing from the original published editions of various pieces. If you want a visual feel for the differences go have a look at IMSLP's scans of period notation against modern editions their contributors have prepared (many of which aren't the pinnacle of modern music engraving, but are still way, way easier to read).
There are a number of reasons for why it's preferable to play from a modern edition if you're a modern musician. While we can of course learn to handle the older notation, we're battling with two different factors - firstly, the printing technology (or handwriting, if we're playing from copies of the composer's autograph score). As one might expect, music printing in the 17th century was nowhere near as clear or easy to read as modern editions can be, because of technical limitations coupled with various ideas they just hadn't had yet. And you get some features, like beamed quavers, which show up in handwritten music before the printers could do them. I'm pretty sure most of the originally printed music was intended to be studied and memorised rather than played at full speed from notation as we expect a modern orchestra to do. Musicians employed in ensembles at various courts around Europe would have been expected to do this, but they would've had handwritten music possibly prepared by the composer themselves (who would also be working for the court) for that occasion. And of course they would have been familiar with the musical handwriting conventions of their era.
But really the thing that gets you is that written musical language has evolved over the centuries. So if you go back and try to play from a facsimile of an original edition you're likely to run into all sorts of fun things, such as:
- music from before the invention of bar lines has no bar lines, and it's amazing how hard it is to learn to play without them these days
- an accidental didn't use to apply for the rest of the bar, as it does today, partly because accidentals were invented before bar lines were
- accidental symbols themselves aren't the same as they once were
- the convention for notating key signatures has changed
- the convention for notating time signatures has changed
- because ledger lines are hard to write by hand neatly, and hard to print as well, there was a much greater diversity of clefs to allow parts to fit more comfortably within the five line stave. Modern musicians are used to playing from one or maybe two clefs on their instrument of choice. Some baroque concert programmes probably took their musicians through three or four of them on the same part between pieces or movements. To read these you need to understand what clefs actually indicate and be able to unmoor your brain from the fixed idea of what the bottom line of the stave represents.
- ornamentation marks have changed, and composers made their own up anyway
- performance conventions have changed (although modern editions generally don't go too far in putting those conventions into the notation because they're way too messy to notate even today, some of them do talk about them in the preface material)
- a lot of music from the time is full of mistakes, which can often be identified by comparing multiple sources of the same piece
A modern editor preparing a new edition of an old work will be rewriting the notation to modern conventions, adding bar lines, reworking key signatures, changing the clefs, fixing mistakes, adding explicit markers for what would have been implicitly understood accidentals in some styles, maybe changing ornamentation marking to something more readily understood by a modern musician... it's a big job.
Which is why the copyright in performance of these editions is defended by their publishers, because someone had to pay for all of that. If you want a copyright-free recording you have to go back to the originals yourself, as well as finding willing musicians and engineers to record it.
And after all that work... the layman, and the lay-algorithm, probably won't be able to tell them apart anyway.
The expense is the reason publishers defend copyright. Technically, what matters for copyright is not the total cost of effort, but the creative work involved.
Aren't most of those changes pre-20th century while anything published before 1922 would have an expired copyright? You wouldn't have to go all the way back to the original to escape copyright, you would just have to go back 96 years.
Wow, great post. This is all very interesting, I love music history. Do you have any recommendations of good books on the subject of evolution of music, notation, compositions etc?
mathw explained most of it well. Read his reply first. One other factor he seems to have missed is that finding the original is a problem.
In some cases they were destroyed. All we have are copies which are already different from each other with no indication of which is most right.
In others they are locked in some library that won't let you look at it because you might make a copy and they want to hide the original for some reason.
In others the composer edited the score after the first performance and we have both. Now you get to choose which edition to use.
That is not how copyright works with music. There are separate copyrights for the written music (the rights to reproduction of the song) and for a particular recording of the song.
If I write a song I own the rights to that song. If I allow Taylor Swift to record it, she owns the rights to her recording of it. She controls how that recording is reproduced, but I can still allow other people to make their own recordings.
So, even though the rights for Bach songs have passed into the public domain, the people who make recordings still own the rights to their recordings of the song, just not to the song itself.
> copyright on interpretation is about the same category as copyright on a work of a plumber in my apartment
I'm going to guess you're not particularly into Bach, if you're going to make a statement devaluing classical music like this. The interpretation of the performer has always been hugely important. Nobody gets into Bach's music and then listens to MIDIs of it.
In fact those customers care about the interpretation a lot. They know they're going to get an exaggeratedly soothing and smooth version, and that's exactly what they're paying for.
The Sony/Google problem seems to be more that the AI is clever enough to recognise a piece, but neither Google's developers nor the AI were clever enough to understand that different interpretations are important.
This isn't artificial intelligence, it's artificial bureaucracy - the automation of procedure for corporate benefit without any understanding of the domain.
> The truth is, copyright on interpretation is about the same category as copyright on a work of a plumber in my apartment
Do you believe this is true for all covers and interpretations of all music, or just classical music? Does this extend to other works of art as well? Does a play interpreting a novel or a novel based on a movie deserve its own copyright?
Constructions like this are living proof that maybe the system we've decided to implement isn't as smart as we thought it was. Humans have a knack for thinking they can control things they don't actually understand -- and we end up with contradictions like this situation. I'll leave these here:
> I am not an advocate for frequent changes in laws and Constitutions. But laws and institutions must go hand in hand with the progress of the human mind. As that becomes more developed, more enlightened, as new discoveries are made, new truths discovered and manners and opinions change, with the change of circumstances, institutions must advance also to keep pace with the times.
-- Jefferson
> That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness.
> ...proof that maybe the system we've decided to implement isn't as smart as we thought it was.
I'd say it's a lot simpler than that - it's proof that systems that don't penalize false positives will have lots of false positives.
I mean, imagine if YouTube announced that content owners would be subject to the same "three strikes" policy as uploaders, and music companies could be kicked off the site for making baseless copyright claims. One suspects that the dumb systems you're talking about would get a lot smarter.
What you suggest would amount to biting off the hand that feeds you, so no, ain't going to happen. The system isn't dumb, it's working as intended - making people in control of it money.
That was Thomas Jefferson originally, actually. It was also intended as a soft limit to the number of laws that would be passed, as it would ensure that only the most important would generally be reaffirmed.
Our (assuming US) federal legislators can barely pass a budget. At best they would simply rubber stamp the existing laws. More likely expiring laws would be used as political hostages.
Granted, that is pretty much the existing status quo. Not sure it would change the behavior in a positive direction.
>every law should come with an expiry date. So that there is debate before it is renewed.
[anti]dovetails with the Kavanaugh confirmation hearings and the fears that important established precedents (Roe, Brown, etc.) will come up for debate/overturn soon once he (or a similar guy) gets the position.
Btw, Constitution is a law too. Should it have an expiry date too?
> Btw, Constitution is a law too. Should it have an expiry date too?
What happens if it's allowed to expire? Does the government just disband, and the country allowed to turn into Somalia while all the existing powerful organizations try to form their own government?
>Btw, Constitution is a law too. Should it have an expiry date too?
Yes? Why would one assume an 18th century framework should be preferred to one considering the modern world? Codifying our ideals seems like a better system than entrusting them to precedent.
> Btw, Constitution is a law too. Should it have an expiry date too?
Not necessarily. A constitution is a foundational law, a set of principles by which the nation is to be governed. Those principles are not expected to change very often, so no expiry date is needed. The detailed implementation of these principles, however, could - and probably should - be done with an expiry date.
Ideally, yes. But some constitutions end up being dumping grounds for laws where the amendment process was considered preferable to that of statutes (e.g. if a constitutional amendment requires approval by popular vote and therefore can't be repealed by legislators alone). The Alabama constitution is infamously over 40 times the length of the US constitution and contains 900+ amendments.
Headline seems demonstrably untrue [0]. Better framing should be, "contentID incorrectly identifies YouTube sample as being part of a totally legally copyrighted recording of a Bach work". (Bracketing for now the rightness of copyright at all, I suppose)
Been there. I was, and still am, surprised that we allow automatic algorithms to make copyright claims. Non-free algorithms deciding the fate of real people sounds like the stuff of dystopian sci-fi novels, but that is what we already allow to happen in the real world.
I'm curious, how much bandwidth does a popular video take, assuming its encoded?
I picked a random Youtube video that was 5 minutes long and downloaded it in 1080p: 53MB in size. The video got 100,000 views, which takes about 5.3TB to transfer. I'm not sure about other cloud platforms, but on DigitalOcean (I haven't found any reputable hosting that's cheaper) the transfer cost is $0.01/GB, so it would cost $53, which doesn't include CDN. If you made at least $0.0005 per video watched, you would roughly make up your bandwidth costs. You can probably 20x this price for CDN/other cloud providers, so if you were making at least a penny per view, you would break even.
Some simplifications made:
1) Didn't include peak bandwidth/if the pipe is big enough at peak hours.
2) Assumed all viewers watched the video from beginning to end, instead of stopping 20% of the way through.
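For anyone who wants to rerun the estimate with their own numbers, here is a rough sketch of the same back-of-the-envelope calculation. The function names and the `watched_fraction` parameter are made up for illustration, it uses decimal gigabytes (1 GB = 1000 MB), and it treats a partial view as transferring a proportional share of the file, which is only an approximation of how streaming behaves. It still ignores peak bandwidth, like the estimate above.

```python
def bandwidth_cost(size_mb, views, price_per_gb, watched_fraction=1.0):
    """Total transfer cost in dollars.

    size_mb          -- size of one full copy of the video, in MB
    views            -- number of views
    price_per_gb     -- transfer price in dollars per GB
    watched_fraction -- hypothetical knob: average share of the video
                        actually transferred per view (1.0 = watched fully)
    """
    total_gb = size_mb * views * watched_fraction / 1000  # decimal GB
    return total_gb * price_per_gb


def break_even_per_view(size_mb, price_per_gb, watched_fraction=1.0):
    """Revenue per view needed to cover the transfer cost of that view."""
    return bandwidth_cost(size_mb, 1, price_per_gb, watched_fraction)


if __name__ == "__main__":
    # Figures from the comment above: 53MB video, 100,000 views, $0.01/GB.
    cost = bandwidth_cost(53, 100_000, 0.01)
    per_view = break_even_per_view(53, 0.01)
    print(f"total: ${cost:.2f}")                      # total: $53.00
    print(f"break-even per view: ${per_view:.5f}")    # break-even per view: $0.00053
    print(f"with a 20x markup: ${per_view * 20:.4f}")  # with a 20x markup: $0.0106
```

Passing `watched_fraction=0.2` gives a feel for simplification 2: if viewers on average only pull a fifth of the file, the bill drops to roughly a fifth as well.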
Peer to peer driven distribution is underutilized. It allows individuals to reach millions. However, if you make your money with advertisements, peer to peer distribution does not make sense.
Yes, it costs money to give things away for free. Now you can have YouTube do it, and play by their rules which help them keep costs low... or you can do it yourself.
Personally, I would put up a snippet, then serve a torrent. If it gets popular, its popularity will support it. Otherwise, I can support my hobby with a seed box for 15 a month.
Is there any penalty for being overzealous regarding the protection of your copyrights? I understand that individuals could litigate, but I would assume that to be a long winded process.
Filing a DMCA takedown notice for stuff you do not own copyright for is perjury. (Filing one that ignores fair use considerations does not fall under that category, though).
Youtube's Content ID system is not related to DMCA takedowns, though, so there are no legal repercussions to making fallacious claims outside of violating whatever terms of service the system has.
Almost perjury. There's a good faith clause, so if someone believes in good faith that their magic black box AI correctly identifies their IP and never issues incorrect takedowns, it's not perjury.
The good faith clause does not apply to the condition I mentioned. Good faith applies to believing that the material in question infringes your copyright; it does not apply to believing that you actually own the copyright on the allegedly infringed work.
17 USC §512 (c)(3)(A)(vi): A statement that the information in the notification is accurate, and under penalty of perjury, that the complaining party is authorized to act on behalf of the owner of an exclusive right that is allegedly infringed.
You wrote "Filing a DMCA takedown notice for stuff you do not own copyright for is perjury" which is ambiguous -- does "stuff" mean the allegedly infringing artifact (not perjury), or the allegedly protected work (perjury)?
If only a judge would rule that such faith in the magic black box is no longer 'good' once it has made, say, 3 mistakes. At that point the user of the AI should understand that it isn't perfect and be obligated to manually check all future results.
You don't understand the purpose of these systems if you believe accuracy is that important. They're designed to pacify copyright owners, to prevent lawsuits against the platform and keep licensed content on the platform... the heavy-handedness and opaqueness are intentional, as is the bias towards false positives.
Content ID is not a DMCA system. The DMCA system is manual.
Content ID is a private system, operated by YouTube, to improve Google's liability risks and relationships with major rights holders. It is not regulated by the DMCA. It is not supposed to be regulated by the DMCA. Google decided to give some companies the right to do the things Content ID does. And Google does that because it saves them money on running a free hosting service and people get what they pay for in that regard.
There is no logic that a judge would touch this scenario. The only argument is that YouTube has near monopoly status, but that isn't an argument for regulating content ID, it's an argument for breaking up YouTube.
Not quite. Sending a DMCA notice on behalf of someone that you do not actually have authorization to send DMCA notices on behalf of is perjury. There is no penalty at all for sending a DMCA notice against something that is not actually the work you say you are sending the DMCA notice for.
That's a hard problem to solve. So every content-id system out there hacks around that - by not solving it, and merely pretending that whoever claims the rights first actually has them. Convenient for the platform and whoever has deeper pockets for lawsuits.
On the other hand YT is full of various Peppa Pig videos (inverted colors, mirror images and all kinds of other weird transformations). YouTube's algorithm apparently can't detect such copyright violations. And the really annoying part is that these videos often rank higher than the original ones.
I have to wonder - isn't the response to this kind of ridiculousness to simply not post on YouTube, for example: http://allofbach.com/en/ ? Yes it's not going to get you the audience YouTube might, but if their platform isn't serving legitimate artists then what good is it anyway?
> I have to wonder - isn't the response to this kind of ridiculousness to simply not post on YouTube
Apparently not. From the article:
In one week, the European Parliament will vote on a proposal to force all online services to implement Content ID-style censorship, but not just for videos -- for audio, text, stills, code, everything.
That's part of it. And more generally with WebRTC, peers know whatever you're doing together. And that's an issue with all P2P stuff that doesn't use overlay networks to proxy connections.
So what I should have said was that it's OK in a VM that reaches the Internet through a nested VPN chain, or whatever, so your ISP-assigned IP address isn't trivially discoverable.
That's... interesting... given that as far as I know the technologies Google uses in this space are proprietary. Even if "all online services" could possibly implement such a system, it's likely they could only do so by committing copyright or patent infringement, if not both.
I simply don't understand how this isn't ordering the impossible.
Eyeballs, basically. The eyeballs are on YouTube already, so if you want an audience and you're off YouTube, you're fighting an uphill battle.
But classical music, sad to say, largely is a niche pursuit. Niche content can always find a home somewhere and users will seek it out.
And I haven't followed the news closely, but if the EU is indeed going to mandate Content-ID style systems, theoretically these niche sites that don't have one will be under some level of legal and regulatory exposure? Not quite sure how any of that would work in practice.
It's not like just anyone can put a video on AllOfBach. That site is entirely the Netherlands Bach Society's project.
If you're saying everyone should make their own AllOfBach... the website clearly has taken tons of development effort, its first iteration was unusably difficult to navigate, it's still not great at accessibility, and if I wanted a portable version of their recordings I have even fewer options than I would with YouTube.
I'm glad AllOfBach exists but there's a reason people want to stick to the music services they are already familiar with.
Yes I’m saying that when you play in a walled garden, you should expect trouble whereas if you publish in your own (ie the whole original point of this www thing) then you play as an equal partner.
Maybe it is a generational thing, as I’ve always found, and continue to find, YouTube nothing more than a cesspool, and I would never think to go there first for something.
The last time I reviewed copyright law, fraudulently claiming ownership of someone else's work is a very big deal. This is something that can be handled with a class action lawsuit.
That's an empirical question, not theoretical, so it's "has been", not "can be". Interesting question. Like the Busy Beaver function, apparently unknown.
The government keeps introducing new laws all the time. The system is becoming bloated and exceedingly restrictive. Maybe they should focus on removing old laws.
Or at least focus on adding more laws to restrict the rights of corporations instead of people.