Hacker News
Open-sourcing AudioCraft: Generative AI for audio (meta.com)
906 points by iyaja on Aug 2, 2023 | 319 comments



> MusicGen, which was trained with Meta-owned and specifically licensed music, generates music from text-based user inputs, while AudioGen, which was trained on public sound effects, generates audio from text-based user inputs.

Meta is really clearly trying to differentiate themselves from OpenAI here. Open source + driving home "we don't use data we haven't paid for / don't own".


This is purely a function of everyone remembering the RIAA's decade-long campaign to prevent people from taking the music they had rightfully stolen. As far as I'm aware LLaMA was trained on "publicly available data"[0], not "licensed data".

Furthermore, MusicGen's weights are licensed CC-BY-NC, which is effectively a nonlicense, as there is no noncommercial use you could make of an art generator[1]. This is not only a 'weights-available' license; it's significantly more restrictive than the morality-clause-bearing OpenRAIL license that Stability likes to use[2].

[0] https://github.com/facebookresearch/llama/blob/main/MODEL_CA...

[1] https://github.com/facebookresearch/audiocraft/blob/main/LIC...

[2] These are also very much Not Open Source™ but the morality clauses in OpenRAIL are at least non-onerous enough to collaborate over.


My understanding (IANAL) [1] is that copyright licenses have no say on the output of software. Further, CC licenses don't say anything about running or using software (or model weights). It's therefore questionable whether the CC-BY-NC license actually prevents commercial use of the model.

[1] https://opensource.stackexchange.com/questions/12070/allowed...


You're correct, but no one has had the balls (or the lawyers) to clarify this in court yet. Expect to see hosting providers complying with takedown requests for the foreseeable future.


Hosting providers *have* to comply with takedown requests to maintain safe harbor.


I don't remember the details (or outcome) but there was a lawsuit a few years ago involving CAD or architecture software and whether they could limit how the output images were used because they were assemblages of clipart that the company asserted were still protected by copyright. Something like that. A lot of "AI" output potentially poses a similar issue, just at a far more granular level.


You're wrong because software, as you describe it, includes the "cp" command which creates a perfect copy.


As sibling noted, we’re talking about the impact of a software’s license on use of its output.

I suppose your point would stand if the software were a quine?


The copyright license of the cp code itself has no bearing on the copyright of what you produce (well, copy) with cp.


That's not the point they're making. They're replying to their parent comment.


> MusicGen's weights are licensed CC-BY-NC, which is effectively a nonlicense as there is no noncommercial use you could make of an art generator

How do you figure? Have you never just...made stuff to make stuff?


In copyright law the use of the work itself is considered a commercial benefit, so "noncommercial use" is an oxymoron. Consider these situations:

- If I use AudioCraft to post freely-downloadable tracks on my SoundCloud, I still get the benefit of having a large audio catalog in my name, even if I'm not selling the individual tracks. I could later compose tracks on my own and ride on the exposure I got from posting "noncommercially".

- If I run AudioCraft as a background music generator in my store, I save money by not having to license music for public performance.

- If I host AudioCraft on a website and put ads on it, I'm making money by making the work available, even though I'm not charging a fee for entry.

I suspect that a lot of people reading this are going to have different arguments for each. My point is that if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not. Keep in mind that every exception you make can be easily exploited to strip the NC clause off of the license.

If you're angry at the logic on display here, keep in mind that this is how judges will construe the license, and probably also how Facebook will if you find a way to make any use of their AI. The only thing that stops them from rugpulling you later is explicit guidance in CC-BY-NC. Unfortunately, the only such guidance is that they don't consider P2P filesharing to be a commercial use.

So, absent any other clarifications from Facebook, all you can do without risking a lawsuit is share the weights on BitTorrent.

EDIT: And yes, I have made stuff just to make stuff. I license all of that under copyleft licenses because they express the underlying idea of 'noncommercial' better than actual noncommercial clauses do.


This is a weird comment.

Do you think that non commercial use simply doesn't exist or something?

Because noncommercial use isn't some crazy concept. It is a well-established one, and it doesn't exclude literally everything.

Also, you are ignoring the idea that Facebook will almost certainly not sue anyone for using this for any reason, except possibly Google or Apple.

So if you aren't literally one of those companies you could probably just use it anyway, ignore the license completely, and have zero risk of being sued.


The issue with “non commercial” is that no, it’s not well established. Licenses with an NC clause are so problematic as to be practically useless. If you just want to use something at home privately you don’t need a CC license… a CC license is for use and redistribution.

http://esr.ibiblio.org/?p=4559


What about playing the music in a government building as elevator music, for example?


>If you just want to use something at home privately you don’t need a CC license… //

I presume you mean in USA, because in UK you don't have a general private right to copy. Our "Fair Dealing" is super restrictive compared to Fair Use.


Funnily enough in the UK they actually tried to fix this. The music industry argued that the lack of a private copying levy made legalized CD ripping into government confiscation of copyright ownership... somehow. The UK courts bought this, so now the UK government is constitutionally mandated to ban CD ripping, which is absolutely stupid.


I knew CD ripping got reversed, but not the arguments against it. Definitely stupid, as not granting a monopoly is not the same as confiscation (the reasoning seems very straightforward). No doubt some Tory got a 'management consultancy' gig with the RIAA out of that one.

I like that it makes software like iTunes contributory infringers for enabling mass copyright infringement.


I miss that blog. It was a little crazy and the comments were a flame war shitshow, but man it was fun to read sometimes. Even if I vehemently disagreed, it got me thinking.

Whatever happened to esr? Did he just get too paranoid and clam up?


Noncommercial use is not well established in copyright law, which is the law that actually matters. I know other forms of law actually do establish noncommercial and commercial use standards, but copyright does not recognize them.

As for "Facebook won't sue"? Sure, except we don't have to worry about just Facebook. We have to worry about anyone with a derivative model. There's an entire industry of copyleft trolls[0] that could construct copyright traps with them.

Individuals can practically ignore NC mainly because individuals can practically ignore most copyright enforcement. This is for the same reason why you can drive 55 in a 30mph zone and not get a citation. It's not that speeding is now suddenly legal, it's that nobody wants to enforce speed limits - but you can still get nailed. The moment you have to worry about NC, there is no practical way for you to fit within its limits.

[0] https://www.techdirt.com/2021/12/20/beware-copyleft-trolls/


Commercial vs Noncommercial use is well established in copyright law - in everything from Final Rule Regarding the Noncommercial Use Exception to Unauthorized Uses of Pre-1972 Sound Recordings https://www.copyright.gov/rulemaking/pre1972-soundrecordings... to Noncommercial webcasters https://www.law.cornell.edu/uscode/text/17/114#f_4 to Fair Use.

Noncommercial licenses are taken up in Great Minds v. FedEx Office & Print Services, Inc., 886 F.3d 91 (2d Cir. 2018). The court explains they are enforceable and are basically just a category of contract. So, as long as the contract is clear, it’s probably enforceable.


> Noncommercial use is not well established in copyright law, which is the law that actually matters.

No, for “NonCommercial”, what actually matters is the explicit definition in the license.


> My point is that if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not.

What “NonCommercial” means in the license is explicitly defined in the license. If you think those examples, or, more to the point, every possible use, render ‘NonCommercial’ into ‘no use’, as you have claimed, you need to make that argument based on the definition in the license, not on some concept of what might be construed as commercial use under general legal principles if the license had used the term without its own explicit definition.


Is listening at home a violation of NC? That's what I've interpreted as its intent.


> if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not. Keep in mind that every exception you make can be easily exploited to strip the NC clause off of the license.

You're right: those are all equally infringing CC-BY-NC. I don't see a problem.


What's your evidence for this bit?

> this is how judges will construe the license


I think the key word there is "noncommercial".


Yes, but you can easily make noncommercial use of an art generator.

Obviously, you can't host a commercial art generation service with a noncommercial-use license, and (insofar as art produced by a generator is a derivative work of the model weights, which is a controversial and untested legal theory) you can’t make commercial art with a noncommercial license, but not all art is commercial.


"Noncommercial art" is not a thing in the eyes of the law. Even if you don't intend to make money the law still considers the work itself to be commercial. That's why CC-BY-NC has to have a special "filesharing is non-commercial" statement in it, because people have made successful legal arguments that it is.

You're probably thinking of "not charging a fee to use", which is a subset of all the ways you can monetize a creative work. You can still make money off of AudioCraft by just hosting it with banner ads next to the output. Even a "no monetization" clause[0] would be less onerous than "noncommercial use only", because it'd at least be legal to use AudioCraft for things like background music in offices.

[0] Which already precludes the use of AudioCraft music on YouTube since you can't do unmonetized uploads anymore


> “Noncommercial art” is not a thing in the eyes of the law

The definition of “NonCommercial”, the oddly capitalized term of art in the license, is not a matter of general law, it is a matter of the license, which defines it as “not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.”

> Even if you don’t intend to make money the law still considers the work itself to be commercial.

Even if you do make money, if the use is “not primarily intended” for that purpose, it is "NonCommercial" in the terms of the license.

> That’s why CC-BY-NC has to have a special “filesharing is non-commercial” statement in it, because people have made successful legal arguments that it is.

It has the filesharing term in it because it permits that particular exchange-of-value as a primary purpose.

> Even a “no monetization” clause would be less onerous than "noncommercial use only"

How would a clause that prohibits monetization entirely be less onerous than one which prohibits it only as the primary intent of use?

> it’d at least be legal to use AudioCraft for things like background music in offices.

It is legal to use it for that purpose (in a for-profit enterprise, I suppose, one might make an argument that any activity was ultimately primarily directed at “commercial advantage”, but in a government or many nonprofit environments, that wouldn’t be the case.)


In their example audio clips they have a "perfect for the beach" audio track. With your understanding of the NC license, would a resort or private beach club be able to play a similar generated music track at their poolside bar, or something along those lines? The bar's primary purpose isn't to play the music; it's just an additional ambiance thing. They're trying to sell drinks and have guests pay membership fees; people aren't really coming for the background music.

I realize, this isn't legal advice, YMMV, etc.


> With your understanding of the NC license, would a resort or private beach club be able to play a similar generated music track at their poolside bar or something along those lines?

A resort, probably not, ambiance is, at least arguably, a marketable commercial advantage; a private club in the “mutual benefit organization” sense (rather than a “business selling memberships”, which is just like a resort), probably, because their interest, even indirectly, isn’t making money.


Yes it is. Art that I make for my own enjoyment is noncommercial. Art that I make to explain concepts to my son is noncommercial.


> as there is no noncommercial use you could make of an art generator

r/stablediffusion gives you a hundred examples daily of people just having fun and not thinking of monetizing their generations


> there is no noncommercial use you could make of an art generator

I'm sorry, what?


Google is running on "publicly available data", not "licensed data"


The fact that Meta is able to lie and call their restrictive licensing open source is nearly as misleading as "OpenAI."

We need to do better than to repeat these claims uncritically. The weight licenses are not "open source" by any useful definition, and we should not give Meta kudos for their misleading PR (especially considering that they almost surely ignored any copyright when training these things - rules for thee, but not for me).

"Not as closed as OpenAI" is accurate, but also damning with faint praise.


Just a general piece of advice: it's not productive to constantly give out the harshest criticism you possibly can when someone does something that's not perfect but is still a step in the right direction. Doing so just tells companies that nothing satisfies the community and that they should stop trying. Instead, it's better to mention what they did right and point to how they can make it better.


Can you chill? It’s def open source


The source code is, as it's MIT, but the weights are not, as they're CC-BY-NC: https://github.com/facebookresearch/audiocraft#license


  about: pytorch @ fb.


So I can build a business on it, then?


I believe Meta has explicitly said that you can, but that's not what open source means and the model isn't open source.


Meta says to imagine you can: "Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or an indie game developer populating virtual worlds with realistic sound effects and ambient noise on a shoestring budget. Or a small business owner adding a soundtrack to their latest Instagram post with ease."

In reality, you can't, as they licensed the weights for noncommercial use only: https://github.com/facebookresearch/audiocraft#license


Research does exist you know. This is immensely helpful for a huge number of people in academia.

If you want to build a company, perhaps you should do what everyone in the industry has done for millennia, copy the movements performed and optimize them while doing so.


You don't own data. You can sometimes copyright data.

https://www.americanbar.org/groups/science_technology/public...


It's likely partly a PR/branding exercise as well.

In the new world that Meta sees, of VR/AR and AI, Meta is already in a position where people don't want them to have much power, because they don't trust them over privacy, etc. So Meta is trying to pivot to become more trustworthy by making genuine moves in this space.


That, or this is an ongoing research lab (FAIR) that has existed for ~half a decade and has advanced the state-of-the-art in AI further than Apple, Microsoft and Google combined.


I would be pretty shocked if meta were that far ahead of all 3 of those companies, all of which are also spending a fuck load on internal AI research.


> all of which are also spending a fuck load on internal AI research.

But their internal research stays internal. Sometimes, they put out "papers" which are glorified advertisements, often going as far as hiding the model architecture just to keep their competitive advantage.


I get that, I'm just saying the original statement, that meta is further along than all of those companies combined, is a pretty wild claim.


Even if all three of those companies have something to show for their research, none of it is at the scale or level of accessibility that PyTorch, LLaMA, and now AudioCraft offer.


> "Meta is really clearly trying to differentiate themselves from OpenAI here. Open source + driving home "we don't use data we haven't paid for / don't own"."

Isn't Meta settling lawsuits for this right now? In addition to violating user privacy (another lawsuit)...

Meta is attempting to destroy competition; that's it. Similar to how they paid a fortune to lobby against Tiktok for the exact reasons Meta is under active investigation (again). The irony.


"If we don't win here, then at least we'll kick their lawn to pieces."


Bully "Open"AI into rebranding.


They are doing PR damage control with an influx of AI stuff, after the ridicule of the metaverse and the recent revelations about Threads (for which they are playing the long AI game). Are we not concerned about all the Threads, IG, and other accounts being linked via internal LLMs we will never hear about?


Yes. Meta is in the business of commanding as much of people's time as possible. AI is more or less the biggest danger to this model (apart from legislation, theoretically, but let's not kid ourselves). Making AI a commodity is very much in their interest.


Goddamn, Facebook being the good guy...


Nah, this is just the modern tech playbook: First you open source stuff, then you can monitor all the related development happening and whenever you see areas of interest/popularity, you simply clone the functionality or buy out whatever entity is building that interesting stuff.


They're not, they're playing a longer Microsoft style game to corrupt the meaning of open source, and releasing models under their terms to undermine competitors.


Sounds like they're good enough. Enemy of Microsoft is my friend


CC-BY-NC isn't an open source licence; it violates point six of the open source definition: https://opensource.org/osd/


Companies just putting “open” in the names of non-open things to make hn and the press automatically love it


who gets to declare what is the "open source definition" and why?


In my opinion, the Free Software Foundation, ironically, since they invented the movement, with open source starting out as a tacky rip-off with the ethics stripped out. After decades, open source converged on free software.

More popular opinion is OSI: https://en.wikipedia.org/wiki/Open_Source_Initiative

They were founded by the persons who (claimed to have) invented the term in order to steward it. It's the same definition as the FSF.


The people who created the term: the Open Source Initiative.

Before, people most often used "free software" as defined by the free software movement, but some disliked this term because it's confusing (most think "free" means no money) and perceived to be anti-commercial.

The term "open source software" was chosen and given a precise definition.

It's dishonest, then, for people to use the term "open source software" with a different interpretation when it was specifically chosen to avoid confusion.


> It's dishonest, then, for people to use the term "open source software" with a different interpretation

I disagree. You're saying that they "invented" the term, but it's a very generic term. The source is open, so it's open source. I bet people were using the term before they claim to have invented it.

In that context, it is perfectly fine to use a different definition, and in fact here's mine, which I guess most people (maybe not on HN) share: if the source is visible to the general public, it's open source.

For what you mean, I use "FLOSS".


Where does it say this is CC-BY-NC?

The article says this:

> Our audio research framework and training code is released under the MIT license to enable the broader community to reproduce and build on top of our work



It's pretty common in academic research for trained model weights to be licensed under something different from the code that one would run to create such a model if one had both sufficient compute resources and the same training dataset. That is, if those weights are ever released at all!

IMO, while I'd rather have one part permissively licensed than nothing at all... it stinks that companies sponsoring researchers get an un-nuanced level of street cred for "open sourcing" something that they know nobody will ever be able to reproduce because their data set and/or their compute grid's optimizations are proprietary.

As it stands, I'm not at all sure that the outputs of this model can be used for commercial videos.


One day the FOSS community will implode over ethics and licenses when the coolest thing ever gets released


Anyone else feel like, with the flood of AI-generated content, there's a risk of the past being 'erased'? Like in 10 years we won't be able to tell if any information from the past is real or fake: sounds, pictures, videos, etc. It feels like we need to start cryptographically signing all content now if there's any hope of verifying it as 'real' 10 years from now.
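The "sign everything now" idea can be sketched in miniature: at minimum you record a cryptographic fingerprint of each artifact at creation time, so any later copy can be checked against it. (Hypothetical stdlib-only sketch; a real provenance system, e.g. something in the C2PA vein, would use asymmetric signatures and trusted timestamping, which this omits.)

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """SHA-256 fingerprint recorded when the artifact is created."""
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, recorded: str) -> bool:
    """Later, check a purported copy against the recorded fingerprint."""
    return fingerprint(content) == recorded

# Hypothetical artifact: the bytes of a photo captured today.
original = b"photo bytes captured in 2023"
record = fingerprint(original)

assert verify(original, record)                      # untouched copy checks out
assert not verify(b"subtly altered bytes", record)  # any edit is detectable
```

The hard part, of course, isn't the hashing; it's getting the fingerprint registered somewhere trustworthy before anyone has a motive to forge it.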


No. We've had photo and audio manipulation for many decades now. For a long time now, we've had to separate out what's credible from what's bullshit.

Fortunately, it's pretty simple in real life. We have certain publications and sources we trust, whether they're the NYT or a respected industry blog. We know they take accurate reporting seriously, fire journalists who are caught fabricating things, etc.

If we see a clip on YouTube from the BBC, we can trust it's almost certainly legit. If it's some crazy claim from a rando and you care whether it's real, it's easy to look it up to see if anyone credible has confirmed it.

So no, no worry at all about the past being erased.


I don't agree. With ML tools it is possible to make sweeping changes to images and text that are often impossible to detect. Combined with the centralisation of most online activities, that means large players could alter the past.

Imagine facebook decides to subtly change every public post and comment to show some particular person or cause in a better light.


If one "large player" like the NYT decides to "alter the past", you can compare with the WaPo or any other newspaper. You can compare with the Internet Archive. You can compare with microfiche. These aren't "impossible to detect", they're trivial to detect if you bother to compare.

We have tons of credible archived sources owned by different institutions. And these sources are successful in large part due to their credibility and trustworthiness.

It's just not economically rational for any of them to start "altering the past", and if they did, they'd be caught basically immediately and their reputation would be ruined.

This isn't an ML/tooling question, it's a question of humans and reputation and economic incentives.


You seem eager to exclude the possibility.

Maybe it is improbable, but there now is the technical possibility which was not there before.

It is valuable to explore that possibility and maybe even work to prevent such a use.

I would be interested in a ledger of cryptographically signed records of important public information such as newspapers, government communication and intellectual discourse.
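A minimal version of such a ledger is just a hash chain: each record commits to the hash of the previous one, so rewriting any past entry breaks every hash after it. (Illustrative stdlib-only sketch; a real system would add digital signatures and independent witnesses holding copies of the chain.)

```python
import hashlib

def chain(records):
    """Build an append-only ledger: each entry's hash covers the
    record text plus the previous entry's hash."""
    entries, prev = [], "0" * 64
    for text in records:
        h = hashlib.sha256((prev + text).encode()).hexdigest()
        entries.append((text, h))
        prev = h
    return entries

def tampered(entries):
    """Recompute the chain; an edited past record changes all later hashes."""
    prev = "0" * 64
    for text, h in entries:
        if hashlib.sha256((prev + text).encode()).hexdigest() != h:
            return True
        prev = h
    return False

ledger = chain(["front page 2023-08-01", "front page 2023-08-02"])
assert not tampered(ledger)

# Silently rewrite the first record: detection is immediate.
ledger[0] = ("front page 2023-08-01 (edited)", ledger[0][1])
assert tampered(ledger)
```

The chain only proves internal consistency; you still need parties other than the publisher to hold copies of the hashes, which is where the "independent witnesses" come in.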

Your argument that large social media will behave rationally is not backed up by reality. Consider Musk and Twitter.


> If one "large player" like the NYT decides to "alter the past", you can compare with the WaPo or any other newspaper. You can compare with the Internet Archive. You can compare with microfiche. These aren't "impossible to detect", they're trivial to detect if you bother to compare.

Detection doesn't really matter, because people are too lazy to validate the facts, and reporters are not interested in reporting them. AI is simply another tool to manipulate people, like Wikipedia, Reddit.com, Twitter, or any other BS pseudo-authority. Think someone will actually crack open a book to prove the AI wrong? Not a chance.


> and reporters are not interested in reporting them

You really think that if the NYT started altering its past stories, other publications would just... ignore it?

It would be a front-page scandal that the WaPo would be delighted to report on. As well as a hundred other news publications.

Thankfully.


That is maybe true for a small percentage of stories. You are also reducing this argument to the most contrived straw man instead of engaging with the idea in earnest.

If you can't alter world news headlines, you can still alter the tone of the article. If you can't alter front page news, you still can alter the remaining 95% of news.

Influencing public opinion is more subtle than the one important headline per day.

You are also ignoring the fact that news sites regularly edit published articles already, from fixed typos to corrections to large re-editings.


You seem to be misunderstanding.

This isn't about a small percentage of stories, it's not about tone, it's the fact that if the NYT ever did this even once with the intention to truly "alter the past" it would be a major scandal.

And obviously things like corrections or taking down libelous content aren't included.

So no, I'm not constructing any kind of straw man here. I'm saying that the threat of subtly nefariously "altering the past" isn't realistic because it would be caught and exposed and there's no financial motivation to do it in the first place.


You are misunderstanding. Altering the tone, comments, bias, etc. is altering the past; this is what I meant originally. You came up with the straw man of the New York Times. Sure, some high-visibility publications are harder to alter, but that doesn't mean none can be altered at all.


Your solution, in the case of trusted sources altering content to fit a particular worldview, is to look at other "trusted" sources. I think that therein lies the problem. I believe the real danger isn't people being convinced of something untrue. I think the real danger is the apathy that builds up as people can no longer reliably distinguish the truth, and they give up on sifting through it altogether, instead accepting "their truth". The vast majority of people simply don't care enough to verify sources.

This is already happening without generative AI, and this new stuff is only going to speed things up exponentially.


The suggested large player was Facebook and Facebook posts. Which trustworthy independent sources of authenticity do we have for that? I do not think those you mention reach inside their walled garden?


First, why would Facebook do that? What economic incentive would there ever be, that would outweigh the loss of trust and reputation hit that would ensue?

Second, people take screenshots of Facebook posts all the time. They're everywhere. If you suddenly have a ton of people with timestamped screenshots from their phones that show Facebook has changed content, that's exactly the kind of story journalists will pounce on and verify.

The idea that Facebook could or would engage in widespread manipulation of past content and not get caught is just not realistic.


> We've had photo and audio manipulation for many decades now.

We haven't been able to generate 1,000 different forged variants of the same speech in a day before.

> We have certain publications and sources we trust, whether they're the NYT or a respected industry blog.

We can't even be sure that most of these aren't changing old stories, unless we notice and check archive.org, and they haven't had them deleted from the archive. The NYT has blockchain verification, but the reason nobody else does is because no one else wants to. They want to be free to change old stories.


> but the reason nobody else does is because no one else wants to. They want to be free to change old stories.

You're wildly assuming a motive with zero evidence.

No, the reason companies aren't building blockchain verification of their stories is simply because it's expensive and complicated to do, for literally zero commercial benefit.

Archive.org already will prove any difference to you, and it's much easier to use/verify than any blockchain technology.


Yep, every time technology shifts, reputation systems shift in response.

This goes all the way back to yellow news with newspapers: https://en.wikipedia.org/wiki/Yellow_journalism


Most people these days interact with news through comments; if the comments look legit, a lot of people assume the source is legit. Imagine a world in which a fake video has the BBC logo on it and AI-generated comments act as if they are discussing the video while subtly manipulating: say 60% of the comments advocate a certain viewpoint and 40% are random memes, advocate against it, etc. The average person would easily be fooled.


You basically described Reddit. Don't even need an AI, all you need is moderator powers and a bunch of impressionable young people.


If I have a random picture, video, text - it's not easy at all to verify its authenticity. Hopefully a media organization has it, but even then are there any services I can use to validate? Definitely not family/personal media, any media that wasn't reported on by a large organization with the ability to manage large archives of data.

I'm saying this is going to become increasingly important, fast, and we may miss the window: soon, almost everything not properly indexed by a large media organization will be invalidated, because there will be no way to verify it.

I have a picture of Frank Sinatra at Disney World riding the tea cups. Who is the Frank Sinatra media authority that can tell me if this ever happened or not? A very small example to extrapolate from. It's going to get worse when everyone can create audio/video/pictures/text of anything they can dream.

The past may very well become a fictional dream, mythology, most of it impossible to verify.


> We've had photo and audio manipulation for many decades now. For a long time now, we've had to separate out what's credible from what's bullshit.

The difference is that the floodgates are being opened.


It doesn't matter though. Most of the internet is probably already SEO blogspam, just like spam e-mail already outweighs legitimate e-mail for a lot of (most?) people. But nobody cares, because it gets filtered out by the ways people actually navigate.

We have lots of tools to fight spam, and there's no reason to believe they won't continue to evolve and work well.


At a time when “people” seem easily manipulated and fully invested in believing their personal feeds of curated outrage. They often don't apply the screens/filters they should because of the apparent social proof, trust, and biases they have toward the content. Contemporary journalists hardly do any fact/source checks as it is, so they'll begin reporting on some of this, giving it further credibility, and it's just a downward spiral. So, more of the same, yay!


Seems like it might now become much easier to post a clip on YouTube that looks like an authentic BBC clip, logo and all. If generative AI gets that good, how will you be able to tell whether a particular piece of media comes from a trusted source?

Might not be possible on platforms - only if it's posted on a trusted domain.


Easy, is it on the official BBC YouTube channel or not?

That's the entire point of having trusted sources. Regular people can post whatever fake things they want on their own accounts; they can't post to the BBC's YouTube channel or to the NYT's website.


The past ended in 2022


Agree. Any video/image/text created post-2022 is now suspect of being AI generated (even this comment). And without any 'registering' of pre-2022 content, we can easily lose track and not really know what from pre-2022 is authentic or not.

Maybe it's not a big deal to 'lose' the past, maybe landfills will be mined for authentic content.


There should be a canonical copy of the 2022 internet that is verifiable. Archive.org is not enough


Or is the past endlessly rehashed with AI generated content?


This ^^


I've been wondering about this in relation to real video evidence (e.g. dashcam or CCTV) being refuted in court for an inability to show it's not deepfaked.


Even with digital signatures, there are limits to what we can really verify.

We'll likely be able to verify whether an entity is a real human, using some kind of "proof of humanity" system.

We will have cameras/mics with private keys built-in. The content can be signed as it's produced. But in that case, what's stopping me from recording a fake, say by pointing a signed camera at a screen?

Maybe it's a non-issue. We used text to record history and we've been able to manipulate that since, well, forever.
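To make that limitation concrete: a signature binds a tag to exact bytes, not to reality. Here's a toy stdlib-only sketch of the idea (HMAC as a stand-in; a real device scheme would use an asymmetric keypair so verifiers never hold the secret):

```python
import hashlib
import hmac

# Stand-in for a per-device secret provisioned at manufacture.
DEVICE_KEY = b"factory-provisioned-secret"

def sign_capture(media_bytes: bytes) -> str:
    """Camera side: tag the captured bytes as they're produced."""
    return hmac.new(DEVICE_KEY, media_bytes, hashlib.sha256).hexdigest()

def verify_capture(media_bytes: bytes, tag: str) -> bool:
    """Verifier side: does the tag match these exact bytes?"""
    return hmac.compare_digest(sign_capture(media_bytes), tag)

frame = b"\x00\x01fake-sensor-data"
tag = sign_capture(frame)
print(verify_capture(frame, tag))               # True: bytes untampered
print(verify_capture(frame + b"edit", tag))     # False: bytes changed
```

Verification only tells you the bytes left the device unmodified; it says nothing about whether the scene in front of the lens was staged.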


If you're watching a movie or TV show, a vast majority of the sounds you are hearing are not "real". Has that bothered you before?


That seems as pointless a question as suggesting that enjoying TV shows means you shouldn't care if everyone in your life constantly lies to you.


> the stuff i hear is real

Perhaps you meant 'are not from the actual source you think they are'?

*my favorite is always the nightclub scene that goes real quiet when the actors act using their voices (which are real, but may be dubbed in afterwards).


With 90% of human generated media content being forgettable within weeks of publication, and AI not yet capable of matching even average human content (much less pro level), it’ll be some time before we have to worry about AI overwhelming most media content and erasing the works of memorable human authors.


>and AI not yet capable of matching even average human content (much less pro level)

Yeah, this is not true. SOTA text and image generation are well above average human baselines. You can certainly generate professional-level art on Midjourney.


Commercial art and Art are not the same thing.


Unless you have a tool that can tell the difference, they are.


Yes, this is one of my concerns about all of this. The danger is real.


Wonder how far off the whole "generate music based on your existing music library" thing is going to be?

That'll make musicians happy with big tech as well, just like artists are. *sigh*


The Record labels are far, far more litigious than the art community.


They can't litigate a person doing this at home, and never redistributing.

I suppose they might try, anyway.


The RIAA pioneered copyright enforcement at the individual level back in the 2000s, they absolutely would try to sue downstream AudioCraft users.


They should start with AudioCraft itself; conceptually it's a derivative work, and it doesn't matter whether it's "open source" or not. Try throwing someone's sample into a song and publishing it saying "no copyright infringement intended and I totally don't make any money from it"... If it becomes popular, see how long it stays up before a DMCA takedown. And we know this dataset is already popular.


> and publish it

This is precisely the opposite of the context I was remarking on.


AudioCraft itself is published. That's the context I am remarking on.


[flagged]


Even before streaming, you never "owned" any music legally [1]; you merely owned a physical copy of a performance [3] of a song, which in no way automatically gives you the right to make derivative works [2].

Also, it doesn't really matter what the law says; the RIAA, in the last iteration, relied on the fact that the average person would rather pay a fine than pay for expensive lawyers to fight out the specifics in court.

It was always about disproportionate ability to bring resources against individual "offenders" to create fear among everyone to deter "undesirable" forms of copying, not necessarily what the legal protections were.

---

[1] Unless you specifically commissioned it under a contract which gave you the right

[2] See recent cases including those related to Kris Kashtanova and Andy Warhol.

[3] Not the song, just the performance, aka Taylor Swift version, for good explanation of how the rights are divvied up in the music industry a Planet Money series covers it well https://www.npr.org/sections/money/2022/10/29/1131927591/inf...


> that in no way gives you the right to make derivative works

True, but only because you have that right anyway. I can do anything I like with copyrighted content I legally possess, as long as I don't distribute the results of my efforts.


Establishing derivation is at the crux of all legal matters surrounding diffusion models. It has not yet been clearly established. If it is, then I'd agree with you. Until then, I think it's a bit more up in the air.

Also, IIRC, the RIAA did not bring many resources to bear against e.g. "home taping" itself, because they could essentially never know that it had occurred. The overwhelming majority of their efforts went into trying to take down people distributing multiple copies.

The Kashtanova case does not cover derivation in any real way, but is really about copyright attribution choices between human and software.

The Warhol case specifically tests a fair use claim, not a derivation claim.


The RIAA and others sued a lot of people in general, including bar owners for playing their songs. There was also some enforcement via private companies with three-strikes policies and so on, especially in Europe.

They went after ISPs, torrent sites which only hosted magnet links, and many others who really shouldn't have been sued.

The goal was to create a very hostile environment for downloading songs to protect their interests - “you wouldn’t download a car!”

It was never the goal, nor ever realistic, to actually pursue enforcement action against every offender; the idea was to change behavior through all the related actions.

They did end up changing behavior: people just didn't want the hassle, or the fear, so paying for streaming access had a stronger value proposition. It was not what the RIAA planned, but they benefit enormously from it today.


https://en.wikipedia.org/wiki/Audio_Home_Recording_Act

https://en.wikipedia.org/wiki/Home_Taping_Is_Killing_Music

Recording industries have fought end user reproduction often. They’ve fought sampling battles.

Go after the pocketbooks and go after the technology waves. If there’s a derivative argument they can make, they will.


They may have gotten the AHRA passed, but they essentially lost in every important way:

> "This exception was crucial in RIAA v. Diamond Multimedia Systems, Inc.,[14] the only case in which the AHRA's provisions have been examined by the federal courts. The RIAA filed suit to enjoin the manufacture and distribution of the Rio PMP300, one of the first portable MP3 players, because it did not include the SCMS copy protection required by the act, and Diamond did not intend to pay royalties. The 9th Circuit, affirming the earlier District Court ruling in favor of Diamond Multimedia,[15] ruled that the "digital music recording" for the purposes of the act was not intended to include songs fixed on computer hard drives. The court also held that the Rio was not a digital audio recording device for the purposes of the AHRA, because 1) the Rio reproduced files from computer hard drives, which were specifically exempted from the SCMS and Royalty payments under the act, 2) could not directly record from the radio or other transmissions. "

From the AHRA itself:

> No action may be brought under this title alleging infringement of copyright based on the manufacture, importation, or distribution of a digital audio recording device, a digital audio recording medium, an analog recording device, or an analog recording medium, or based on the noncommercial use by a consumer of such a device or medium for making digital musical recordings or analog musical recordings.

and from Wikipedia again:

> In regard to home taping, the provision broadly permits noncommercial, private recording to analog devices and media. However, it fails to resolve the home taping debate "conclusively," as it only permits noncommercial, private recording to digital devices and media when certain technology is used.

> Two reports by the House of Representatives characterize the provision as legalizing digital home copying to the same degree as analog. One states "in the case of home taping, the exemption protects all noncommercial copying by consumers of digital and analog recordings,"[22] and the other states "In short, the reported legislation [Section 1008] would clearly establish that consumers cannot be sued for making analog or digital audio copies for private noncommercial use."[23]

Similarly, language in the RIAA v. Diamond Multimedia decision suggests a broader reading of the Section 1008 exemptions, providing blanket protection for "all noncommercial copying by consumers of digital and analog musical recordings" and equating the spaceshifting of audio with the fair use protections afforded home video recordings in Sony v. Universal Studios:

>> In fact, the Rio's operation is entirely consistent with the Act's main purpose – the facilitation of personal use.


Legal acquisition does not matter for AI training. If training is fair use, then you can train on pirated material (e.g. OpenAI's GPT). If it's not fair use, then buying the material does not matter; you have to negotiate a specific license for AI training for each work in the training set, which is impractical at the scales most AI companies want to work at.


This seems to distort the issue a little bit.

If you purchase the music, you have a (sometimes explicit, sometimes implicit) license to do certain things with the music, entirely independent of any concept of "fair use".

The question is not "is training part of fair use?" but "is training part of, implicitly or explicitly, the rights I already have after purchase?"

Given that "training" can be done by simply playing the music in the presence of a computer with its microphone turned on, it's not clear how this plays out legally.


In the US, exceptions to copyright come across in two distinct bundles: first sale and fair use. They exist specifically because of the intersection between copyright law and two other principles of the US constitution:

- First sale: The Takings Clause prohibits government theft of private property without compensation. Because copyright owners are using a government-granted monopoly to enforce their rights, we have to bound those rights to avoid copyright owners being able to just come and take copies of books or music you've lawfully purchased.

- Fair use: The 1st Amendment prohibits government prohibitions on free speech. Because copyright owners are using a government-granted monopoly to enforce their rights, we have to bound those rights to avoid copyright owners being able to censor you.

If you hinge your argument on "I bought a copy", you're making a first sale argument.

Notably, first sale is limited to acts that do not create copies. This limit was established by the ReDigi case[0]. Copyright doesn't care about the total number of copies in circulation, it cares about the right to create more. So an AI training defense based on first sale grounds would fail because training unequivocally creates copies.

Fair use, on the contrary, does not care if you bought a copy of a work legally. It only cares about balancing your right to speech against the owners' right to a monopoly over theirs. And it has so far been far more resistant to creative industry attempts to limit exceptions to copyright - to the point where I would argue that "fair use" is an effective shorthand for any exception to copyright, including ones in countries that have no fair use doctrine and do not respect judicial precedent.

The courts won't care how the training comes about, just if the act of training an AI alone[1] would compete with licensing the images used in the training set data.

[0] https://en.wikipedia.org/wiki/Capitol_Records,_LLC_v._ReDigi....

[1] Notably, this is separate from the act of using the AI to generate new artistic works, which may be infringing


It hasn't been established yet that a diffusion-model generated work is a copy or a derivative of any particular element of the training set.


The people above are arguing about being caught, not legality.


Is training a "pirate model" something you'd reasonably be able to do at home though, given the compute requirements? The analogous "image generation at home" is only possible due to a for-profit entity with significant resources choosing to (a) play fast-and-loose with the provenance of their training set and (b) giving away the resulting model for free, if the open source community had to train their models from scratch then as best as I can tell they would still be stuck in the dark ages generating vague goopy abominations.


Currently, yes, available compute power @ home does indeed seem like a limitation. Whether that remains true going forward seems a little unclear to me.


You could take a model trained on CC content and then fine tune it on copyrighted material cheaply and quickly


Ugh, I dread having to listen to everyone's hyper-personal music because they swear up and down, to the point of tears, that "IT'S THE BEST SONG EVER CREATED! EVER!!!", while they constantly prod you to affirm how amazing the song is.

Bruh, music is subjective as hell, and I can already tell I hate this song.


Perhaps LoRA (Low-Rank Adaptation) training techniques could be used for these types of models, like they're currently being used with LLMs and latent text-to-image diffusion models.


Sadly looks unlikely if the base model wasn't trained on vocals.

> Mitigations: Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely using the open source Hybrid Transformer for Music Source Separation (HT-Demucs).

> Limitations: The model is not able to generate realistic vocals.

(https://github.com/facebookresearch/audiocraft/blob/main/mod...)

I suspect this was a combination of playing it safe and that the model isn't well architected to reproduce meaningful vocals.


Why not generate music you like directly? That wouldn't need you to upload your library and would have RLHF baked in.


Something like the algorithm TikTok uses. First probing by offering a variety of content that should match based on what little information you have on the user (ip location, locale, etc).

Then use the user’s action to iteratively refine your classification, until you end up with something tailor-made.


Uh, more like Reddit, with up- and downvotes plus how long you listen.


Anyone having listened to this MusicGen's output samples would surely answer "a million miles".

Seriously, you couldn't sell this output for a free mobile clicker game.


These models are going to end up being used for advertising. Soon pretty much every ad you see will be generative AI based. It makes A/B testing way easier as you no longer need a creative person to modify the ad or change something subtle about it. For example, the generative voice might change to a different speaker or something, and the AI can generate thousands of different voices to see which one is most effective.


We might even end up with the most effective ones being the weirdest, where people click through because the ad is so weird.


"Now, increasingly, we live in a world where more and more of these cultural artifacts will be coming from an alien intelligence. Very quickly we might reach a point when most of the stories, images, songs, TV shows, whatever are created by an alien intelligence.

And if we now find ourselves inside this kind of world of illusions created by an alien intelligence that we don’t understand, but it understands us, this is a kind of spiritual enslavement that we won’t be able to break out of because it understands us. It understands how to manipulate us, but we don’t understand what is behind this screen of stories and images and songs."

-Yuval Noah Harari


Maybe this us vs them mentality is the biggest bottleneck.

If instead you consider that this new form of 'alien' intelligence is actually a descendant of human intelligence, that we are raising a new species which will inherit what humans have built (ideally only the good parts) and then improve upon it further..

It may sound grandiose, but that perspective changes everything.


The demos are great. Could someone explain what’s in it for Meta open sourcing all these models?


A competitive open-source project basically destroys the pricing power of all closed-source alternatives.

If you're a company that wanted to integrate an LLM into your product, and the choice is between several equally good models but one is free and open source, which would you pick?

Aside from keeping competition at bay, this move also gives Meta leverage because ecosystems are now being built around their projects. If these models see wide-scale adoption they could later launch AudioCraft+ as a licensed version with some extra features for example.

Alternatively, they might offer support or hosting for their open source projects.

Right now though I think the primary benefit of these open sourced models is to attract talent. If Meta is seen as one of the leaders in AI then researchers will want to work for them simply for the prestige.

Arguably one of the reasons Meta has been behind so many awesome projects like PyTorch and React over the last decade was because they were seen as the cool place for recently graduated, but talented software engineers to work in ~2010.


They want to commoditize the offerings of OpenAI, Google, MS, and Apple. Also, they gain mindshare and goodwill after years of bad publicity. Some contributions back might help them improve the models for free.

If they just keep their models, people won't be interested and will build over ChatGPT or Bard.


Not a fan of Meta, but haven't they generally been pretty forward with open sourcing their tech?


Commoditize Your Complement?

https://gwern.net/complement


What is it a complement to though?


Meta has several of the biggest UGC platforms, and in this case the complement is content itself. Reels with autogenerated (and royalty free) background music is the obvious example but I'm sure there are more. Maybe creative for ads as well?


To Metaverse access. Filling the metaverse with engaging interactive 3D content is an insane job with 2020 technology. It requires a huge amount of skilled labor across a range of disciplines to create the 3D models, soundtracks, NPC dialog, visuals, etc. that make a compelling experience. By 2030 that may have been reduced to the point that everyone with creativity and Internet access can do it. Sure, most of it will be silly things, but so is social media today, and that doesn't make it any less of a commercial success. And there will be millions of semi-pro creators making the things with higher production value, as with videography today.


Content is a complement to social media.


In short term it's social media, because people will share whatever they generate on social media. But I don't think it's a very strong incentive to invest in AI for Meta.


If anything it makes them appear to be one of the best places to work at to do research. Could be them playing the long game.


The demos are, unsurprisingly, soulless muzak. This contributes nothing to our culture.


Was asking myself the same earlier, I'm sure it is largely to do with publicity and the fact that selling these services is not their core business. At the very least releasing this stuff probably won't damage their core business but will take the sheen off of some other big names.

I wondered though, generative AI is hurling us into a world where we'll need more mechanisms to sort real from fake, provenance will play a large part, and meta's platforms could be part of the answer. i.e. content linked to actual verifiable people.


Somewhat relevant, Yann LeCun insisted the research should be open sourced. At least in an academic sense.

He touches on it briefly in this podcast episode: https://www.therobotbrains.ai/who-is-yann-lecun


They will own the most popular open models, so they can dictate the direction of open-source AI.


They haven't open sourced much. Open models whose weights are under a restrictive non-commercial license is something, I guess.

They are trying to kill the market before they get left out.


The same move Microsoft made back in the '90s to kill Netscape: make your product the one available to the masses, and the next generation of users will be using your product.


I was just thinking how Google made Android free to check Microsoft. This is Meta checking Google.


Checking OpenAI. Google is still playing checkers.


I just can't get over how badly Google is doing. They have a ton of top researchers, papers, and money, just no good LLMs. It's like OpenAI was first to the punch, and everyone else just saw $$$. Meta was smart to go down this open-source road, as the masses will start training their llamas one way or another. Personally I believe the "intelligence" aspect will asymptote, so even having exclusive access to a "super AI" (i.e. a hypothetical 1T-parameter model like a GPT-5) won't be that much of a step ahead of the lesser AIs, and as soon as you grant access to the masses they will use transfer learning to make their "lesser" models better. AI applications, though, still need a lot of work. The models aren't smart or general-purpose enough to be useful to the average person out of the box.


The problem also is that Google is making a lot of grandiose announcements about tools and models that nobody can see or use. This is a serious credibility problem in the long term.


You can use some of them. They have an “AI powered” search (as if their previous search isn’t considered AI anymore). It’s an experiment you can turn on. For programming questions it’s not terrible.

That said, there are a ton of "look at this cool thing our research team did" announcements from Google that you never hear about again. They even built a music generator that was closed to the public until recently.

https://blog.google/technology/ai/musiclm-google-ai-test-kit...


The fact you believe, rightly or wrongly, that meta is ahead of google on ai explains why meta would open source this. It’s a good reputation to maintain.


Checking Google or OpenAI (or both?)


theoretically what's in it for them is that people will build content faster and with less barriers for eventual consumption on their platforms


If people love hanging out with chatgpt or bard, they won't be wasting their precious little eyeballs on FB/Insta


The difference between MBD-EnCodec and EnCodec is pretty interesting. MBD variant sounds more like a professional studio recording, while the EnCodec feels like a richer sound.

Curious if I’m alone in that.

(At the bottom https://audiocraft.metademolab.com/musicgen.html)

For what it’s worth though, the voice based examples sound dramatically better with MBD

https://audiocraft.metademolab.com/encodec.html


MBD definitely sounds like it was recorded in a dead room, whereas plain EnCodec sounds mixed but includes some artificial noise.


All very interesting, but how would a musician ever be interested in creating the result of "Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach"? This stuff will create a lot of Muzak for sure. Actually turning into anything useful for musicians? I honestly doubt it, and I'm happy if it stays that way.

Saying that engineers don't understand the arts is a bit of a trite generalization, but reading the way Meta markets these "music making" contraptions is really cringe inducing. Have you ever, at least, listened to some music?


Finally, a way to fulfill my childhood dream of composing a symphony of rubber ducks honking. Bach would be proud.

/edit On a more serious note: I can already see the 24/7 lofi girl streaming generated music. The sample[1] on lofi sounds pretty good.

[1]https://dl.fbaipublicfiles.com/audiocraft/webpage/public/ass... "Lofi slow bpm electro chill with organic samples"


I also like some of the generated examples.

Can I haz full version of Bach + `An energetic hip-hop music piece, with synth sounds and strong bass. There is a rhythmic hi-hat patten in the drums.` please?

(https://dl.fbaipublicfiles.com/audiocraft/webpage/public/ass...) ?


> Finally, a way to fulfill my childhood dream of composing a symphony of rubber ducks honking.

Samplers have been around since the 70s.


This is Spotify's route to profitability - the Netflix model of generating their own "content" (/music), and not having to pay the labels. Premium plans for us music nerds who want a human at the other end, regular plans for plebs who just want to fill the silence with something agreeable.


Although I think AI-generated and AI-augmented (using voice cloning, etc.) artists are a given, for Spotify to stop paying labels they'd have to be able to remove all non-Spotify content from their streaming catalog. That doesn't seem like a possibility in our lifetimes. (Also, Spotify hasn't even been able to build a sustainable business on podcasts, which they copy to their closed platform for free.)

It's an interesting thought experiment, though. I can imagine that "environmental audio" companies like Muzak have about 5 years left before they either adapt or die. What other kinds of companies are in trouble?


Their current pay structure is royalties, i.e. per listen. If they can route their audience to mostly AI generated content in time (say, 5-10 year transition), and it's just as good for most people, then they can negotiate much lower prices with the labels. We all grumble about Netflix being full of junk, but most of us are still subscribers, despite a sparse catalog of big name movies.


> then they can negotiate much lower prices with the labels

Or alternatively, if the labels are not stupid, they'll negotiate for a higher price per listen (or similar), as they are still as essential to the service as before.


I just ran all of the cited installation steps, which appear to have been successful... but I am now experiencing a profound sense of "now what?"

There don't appear to be any new CLI executables installed, and the documentation links to an API, but there are no clues on how to actually process a prompt.

What am I missing? Alternatively, I wouldn't mind using it in a Notebook but so far this thread doesn't link to anything so ambitious (yet?)


The main gradio app has been moved to the demos folder.

    python demos/musicgen_app.py
Otherwise you can check the jupyter notebooks in the same folder.


Thanks! This will be even more helpful if you could share a hint about where this was installed to.

I carefully went through the output generated by the "pip install -U audiocraft" command, and there were no clues provided.

Disclosure: I am not a Python developer, so I apologize if this is a master-of-the-obvious question for Python folks. However, if there was ever a scenario where a line or two of post-install notes would be useful, it's stuff like this.


You may have to clone the repository to get the demos folder. Otherwise it's perhaps somewhere depending on how you use python (global and often broken environment, virtual environments, conda hell, etc…).

I feel like Python folks are on average terrible at distributing software. So many projects have some python script to install the dependencies, still assume you use conda, or don't bother to specify the dependencies versions. Thankfully it's often the same patterns and after some time you understand what to do based on the error messages. But I wish they could use something like NPM or Cargo. Even something like Maven would be an improvement.


Tip: If you run `pip uninstall audiocraft` it should render an interactive confirmation prompt that explains which files it is about to remove, this should help to understand where the library was installed (of course you don't have to confirm the prompt, just press Control-C or answer "No" to the prompt once you have your answer).

Disregarding the tip above, determining where the library was installed requires a bit of context, for example your platform (Windows versus UNIX) and the fact that newer pip releases default to "pip install --user" when not running with super-user privileges, whereas older pip releases did not default to "pip install --user".

Assuming you are using Linux and using an up-to-date pip release and you ran the "pip install -U audiocraft" command without super-user privileges, then the library was most likely installed in ~/.local/lib/pythonX.Y/site-packages (where X.Y is the version of Python that was used by the pip command you ran).
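If it helps, here's a stdlib-only way to ask Python itself where things landed (the names here are all standard library, nothing audiocraft-specific):

```python
import importlib.util
import sysconfig

# Where "pip install" puts pure-Python packages for this interpreter:
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Where a specific installed package actually lives (None if it isn't
# importable in this environment, e.g. you installed it elsewhere):
spec = importlib.util.find_spec("audiocraft")
print(spec.origin if spec else "audiocraft is not importable here")
```

Running this with the same interpreter you used for pip will also catch the classic gotcha where `pip` and `python` point at two different environments.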


This is the default state of deep learning projects: everyone assumes only PhD researchers, who already know the whole toolchain, will ever try them. What's happened with llama and other LLMs, with codebases that actually work outright with one command once compiled, is a pretty big outlier.


You're not supposed to actually install it and use it, just comment on how cool and open Facebook is, especially in comparison to OpenAI. So, user error.


Right, it's not like anyone has operationalized Llama 2, or like there aren't hundreds of repos for inference servers and the like. /s


I just made a script that generates a two-minute-long classic radio show episode in the style of Johnny Dollar. I'm using ElevenLabs for the dialogue, so the AudioCraft element is definitely the weak point, but it's super neat that it's even possible currently.


Anyone know if there are ways, as-is, to speed this up on Apple Silicon?

This setup takes 5 minutes:

    - Mac Studio M1 Max 64GB memory
    - running musicgen_app.py
    - model: facebook/musicgen-medium
    - duration: 10s


I see from musicgen.py:

    if torch.cuda.device_count():
        device = 'cuda'
    else:
        device = 'cpu'

So pytorch will fall back to CPU on Apple Silicon. Ideally it would use Metal for acceleration (MPS) instead of plain 'cpu', but if you replace 'cpu' with 'mps' you'll probably run into a few bugs due to various autocast errors and, I think, some other incompatibilities with PyTorch 2.0.

At least that is what I ran into last time I tried to speed this up on an M1. It's possible there are fixes.
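For reference, the shape of the patch people try looks roughly like this (a sketch, untested against audiocraft itself; the torch module is passed in as a parameter only so the logic can be checked without a GPU, in real code you'd just `import torch` and call `pick_device(torch)`):

```python
def pick_device(torch_module) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then plain CPU."""
    if torch_module.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on builds with Metal support,
    # hence the defensive getattr.
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

When individual ops are missing on MPS, setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` tells PyTorch to route those ops back to the CPU, which sometimes gets a model limping along at reduced speed, though it won't fix the autocast errors mentioned above.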


Same here (mps errors). I tried after the initial musicgen release.

I’ll have to check again, but AFAICT my hardware wasn’t getting saturated, so maybe there’s headroom for Mac CPU performance. And of course in the meantime I’ll be refreshing the ggml GitHub every day.


I think it is a mistake to acquiesce and let copyright owners bully AI model trainers over model data inputs. The endgame of this practice is a "pay per thought" society. This is separate from speculation regarding machine sentience: as interfaces improve, AI models will serve more and more as direct extensions of the human mind. While copyright duration is a separate issue, and the current durations are appalling, copyright enforcement should focus strictly upon the output of models and how it is utilized. There are so many melodies in my head that I have not paid for and never will (some of which I would love to remove). AI models also need the same unfettered access to the commons as we have. Infringement occurs on the outputs; applying copyright restrictions to model inputs is a violation of Fair Use and a definite money grab.


Generative AI for images and music produces pixels and waveform data, respectively. I wonder if there is research into "procedural" data; in this case, that would be SVG elements and MIDI data, respectively.

I know training data would be much harder to get (notwithstanding legal ramifications), but I think creating structured, procedural data would be much more interesting than just the final, "raw" output!


I've thought about this too. The instruments themselves can be synthesized for extremely high quality audio. All we need is the musical structure - the MIDI.
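As a toy illustration of what "procedural" training data could look like, here's a sketch (a hypothetical representation, not from any existing model) where notes are (pitch, start, duration) tuples flattened into time-ordered, MIDI-style events:

```python
# Notes as (MIDI pitch, start beat, duration in beats) rather than waveforms.
melody = [(60, 0, 1), (64, 1, 1), (67, 2, 2)]  # a C major arpeggio

def to_events(notes, ticks_per_beat=480):
    """Flatten notes into time-ordered note_on/note_off events."""
    events = []
    for pitch, start, dur in notes:
        events.append((start * ticks_per_beat, 'note_on', pitch))
        events.append(((start + dur) * ticks_per_beat, 'note_off', pitch))
    return sorted(events)

events = to_events(melody)
```

A model trained on sequences like `events` would output structure you could render through any synthesizer, instead of fixed audio.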


Is there a place where I can check how it works? Like give it my input and get output audio?


The model cards from the repo[0] link to Colab and HF spaces.

[0]: https://github.com/facebookresearch/audiocraft#models


Audiocraft+ (don't forget the plus) on GitHub has a Colab notebook based on AudioCraft, and a web UI. It is pretty awesome!


Maybe this will finally lead to high-quality open-weights solution for TTS generation.


Get ready for the next generation of Muzak


AudioGen seems really fascinating. I have some dumb questions.

While the datasets used for training AudioGen aren't available, is there any kind of list where one can review the tags or descriptions of the sounds on which the model was trained? Otherwise how do you know what kinds of sounds you can reasonably expect AudioGen to be capable of generating? And what happens if you request a sound which is too obscure or something not found in the dataset?

What are AudioGen's capabilities regarding spatial positioning? First example: can it generate a siren that starts in front and moves left to right and complete a full circle around the listener? Second example: can it do the same siren but on the Y axis, so it start at the front, it goes over the listener and then it goes under them to complete the circle?



Incoming strike by American Federation of Musicians in 3, 2, 1....

How many jobs would this thing take away? One of the biggest time sinks in any video production is post-production audio, including background music, Foley, etc. This will automate almost all of it!


This would take away all the jobs producing such low-quality, lo-fi, artifact-laden background music... if any existed.


"generating new music in the style of existing music" will probably be a huge field soon. I can't wait for it to happen, it's a low-cost way of producing even more music to listen to.


> I can't wait for it to happen, it's a low-cost way of producing even more music to listen to.

I can't really understand this. I'm a DJ and a huge music nerd, and I spend a lot of time every week discovering new music from the past 100 years and all over the world, and I'm constantly struck by _how much of it there is_. I've spent weeks just digging through psych-funk records from West Africa from the 1970s.

How can you have the impression we're so desperate for more music that we need computer programs to generate it for us?


Music is self-expression. I don’t always identify entirely with others. I always identify with my self. Having music generated for you on such a personalized level is an attractive prospect.

I don’t think this replaces “100% organic, human-made” music, though. I think there’ll always be a reason to listen to music made by other people. But I think this changes the landscape of how and why people create music to begin with. It certainly will devalue existing music, since everyone has something they may prefer that they can generate instantly.

I think generative AI is a terrible technology for artists who want to make money from their art, but in my personal opinion, I strive for a world where art isn’t a transaction, but a gift of human expression and connection. A world where art is appreciated for the emotion, stories, and ideas it conveys rather than the monetary value it holds. Generative AI might disrupt the traditional economic models in the art world, but it also opens up new opportunities for creative exploration and personal expression. It’s a challenging evolution, but one that could potentially democratize art, making it more accessible and personal than ever before! Bring on the Renaissance: Part 2!


> I strive for a world where art isn’t a transaction, but a gift of human expression and connection. A world where art is appreciated for the emotion, stories, and ideas it conveys rather than the monetary value it holds.

In a world where nobody is compensated for their art, the only people making art will be the ones privileged enough to have the means to do so for free. I don't see how this leads to "Renaissance: Part 2."


it can't be a "gift of human expression and connection" if A) a machine creates it and B) nobody but you ever hears it

this isn't democratizing art, and i would argue it has nothing to do with art. it is giving us an endless faucet of content, but not art.


I don’t agree with your definition of music. For me, as both a musician and a listener, music is communication between human beings via harmonic carrier waves. Using a machine to make word salad copies of existing communiques is literally just nonsense to me


What's self-expressive about an algorithm that generates songs?

Recorded music is the worst thing that happened to music.


if you want to express yourself musically, then learn how to play an instrument and compose music


> I can't really understand this

I came to understand that a very large portion of the population just wants content, any type, any quality, to fill the void. They'll consume anything as long as it's new. Content to fill the empty vessels we became. Just look around: mainstream music, movies, podcasts, news. It's mostly mediocre, but it goes real fast, and you get new mediocrity delivered every day.


There is a lot of human music for sure, great music from all eras, but just the other day I generated a song that was pure crystal harp. So, how many crystal harp songs are out there? 1,000 all in all? 10,000 maybe? Now I can generate a thousand crystal harp songs per day.


What if I generated a song of 12 billion farts? How many asses are out there, 7 billion all in all? Now I can generate a billion fart songs per day.


Yes but it's not really fungible. My favourite artist is Fats Waller and they don't make anything like that anymore. Most people are only interested in some category of music, not all of it.


Musician here. While I agree with you that there is a nearly endless heap of music to dig through, I think it's interesting to think about the possibility of hearing genre crossovers and styles that don't yet exist.

As an aside, a lot of musicians seem to dislike this kind of technology, but I never saw music as a competition. I don't care if some inexperienced kid is generating bangers from his bedroom even though he can't play a single instrument. It's just something else to listen to. I write music for me.


It’s the same reason we have 250 Marvel movies that all tell the same story. People want the same, but different. They don’t give a damn about human creativity for the most part.


Frank Sinatra sings Lil Jon's "Get Low" - https://youtu.be/7zoQeH2wQFM


Nice results so far. "Perfect for the beach" is a very funny description of music, because it has nothing to do with the acoustic qualities, so consider these descriptions to be anthropocentric! (As if they could be anything else) It is less about describing the actual sounds you want and more about describing the quality or vibe of the atmosphere. This is markedly different than incremental composition, maybe we can call it "composition by collage." Puts on COLLAGE shirt like in Animal House


I'm looking forward to playing with the M1 Mac apps/cli-tools that will probably come out for this in the next week or so! Being able to run this stuff locally is a lot of fun.


Are the M1 Macs capable enough? I'm eyeing an upgrade in the coming months and I'm curious if a MacBook would be suitable.


I've run Stable Diffusion locally (both from the cli and later using GUI wrappers) and that used my GPUs, I've also run Llama locally but I believe that was on the CPU (I used both llama.cpp, cli, and Ollama, gui). So to sum it up: yes? Or at least it's good enough for me.


Great thanks!


As an amateur musician I’m wondering if there are any of these audio generators that you can give a tune or chord progression to riff on. ABC format maybe? There are lots of folk tunes on thesession.org.

Could you generate a rhythm track? Ideally you could make songs one track at a time, by giving it a mix of the previous tracks and asking it to make another track for an instrument. Or, give it a track and ask it to do some kind of effect on it.

Another interesting use might be generating sound samples for a sampled instrument.


If you mean giving it a ~30-second source melody and extending that melody into a full song: yes, MusicGen can do that. There are two ways to extend a song based on a melody: 1) give a sample and continue the song from that sample as closely as possible, or 2) give a melody as an inspiration.

They both work with varying degrees of success. The issues and discussions sections of the audiocraft repo on GitHub have a lot of questions answered.


Evidence? None of the demos suggest that is true.


Did they change the base model? If not, then audiocraft_plus, which is based on audiocraft, creates music close to 5 minutes in length.

I don't know if audiocraft_plus incorporates all three components of the release: MusicGen, AudioGen, and EnCodec. It uses MusicGen for sure, all four models: small, medium, large, and melody.

https://github.com/GrandaddyShmax/audiocraft_plus


But is that "extension" to 5 minutes more than just a repeat of the e.g. 15 seconds heard in the demos?


Yes, of course it is more than that. You can create full songs several minutes long. I have published 20 songs on YouTube of 2-3 minutes each. Listen to this song, the best I have made so far [1]. It doesn't repeat at all!

I haven't looked closely at this release; is the audiocraft page on GitHub different from facebookresearch/audiocraft? The other two components may be new, AudioGen and EnCodec, but I was under the impression that they changed the license to full open source, and that was that.

https://www.youtube.com/watch?v=UL1KmrHMjcM


Thanks. What I hear there is 19s of material on repeat, with the parallel parts variously brought in and out, and no other variation except in decorative detail.

So yes, not a literal repeat, but no, not enough to rate as a "full song", IMO.

Plus, note that it is not actually a song at all. Singing would particularly expose its repeat-and-vary trick.


Quite impressive, although if these are the cherry-picked examples, the average output must be pretty weak! Nothing catchy about most of these examples, and the reggae one is pretty lame.


I am curious whether it can generate audio for a specific country. For example, the siren sound in the sample doesn't sound like a siren I would recognise. Sounds like an American one?


I tried it out (american, british, korean, italian, japanese) and couldn't really get any control. Sometimes the american siren would sound different, but asking for a siren of a specific country would just give the american sound. Maybe better prompting would help. I used "isolated american ambulance siren no traffic".


That's an interesting question.

What about the ring tone, busy tone, or disconnected tone for any country over time? 2600 vibes (pun intended).



And https://audiocraft.metademolab.com/musicgen.html

The samples included in the press release are quite impressive to my ears, but the other samples (especially from AudioGen) have a hint of artificiality.

As usual the music is quite repetitive, but I'm looking forward to tools that simplify changing the prompt whilst it generates over a window. I can only imagine the consequences for royalty free music.

Edit: the "Text-to-music generation with diffusion-based EnCodec" samples are quite impressive.


Can I generate a reading by someone if I have a lot of their voice samples with this? Or is there a better tool for doing such a thing?


ElevenLabs and a dozen others already do that.


Here's a different question: Can you use the audio output this produces for anything else other than "research purposes"?


You can, as long as it's not commercial. It's a broad definition, but a good rule of thumb is whether you're directly making money from the generated audio. They may still come for you if you're making money indirectly, so consult a lawyer.


I can see some fantastic uses for this in generating complex acoustic environments to layer over TTS or real recordings for speech-to-text model training. I wonder if that is occupying some kind of gray-area. For example you have 1000hrs of clean speech from the librispeech corpus. It would be trivial to use this tool and available weights to generate background noise, environmental noise and the like, and then layer this with the clean speech to cheaply train a much more robust model. The environmental audio you create would never be directly shared or sold, but it would impact the overall quality of the STT model that you train from the combined results.
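The mixing step itself is simple. A minimal sketch (pure Python, not from any particular toolkit) that scales generated noise to a target signal-to-noise ratio before adding it to clean speech:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is snr_db, then add."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy example: mix a "speech" signal with "noise" at 0 dB SNR.
mixed = mix_at_snr([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0], snr_db=0)
```

Sweeping `snr_db` over a range per training example is a common augmentation recipe for robust STT.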


I wish people made unconditional predictive models for music instead of text-to-music ones. Would be so cool to give an input 'inspiration' track that it 'riffs' a continuation to. That's usually what I want, just continue this track it's too short that's what I want to hear more of. (That said this is super cool though.)


Theoretically this is very possible using their techniques. They tokenize the audio and learn next tokens autoregressively. Instead of text tokens -> audio tokens as input, just tokenize a prior song and continue it.
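To make that concrete, here's a toy, purely illustrative sketch of "tokenize a prior song, then continue it autoregressively", with a bigram table standing in for the transformer and small integers standing in for codebook indices (nothing here is AudioCraft's actual API):

```python
import random
from collections import defaultdict

def train_bigram(tokens):
    """Record, for each token, which tokens followed it in the training data."""
    table = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        table[a].append(b)
    return table

def continue_sequence(table, prompt, n, seed=0):
    """Autoregressively sample n continuation tokens after the prompt."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n):
        choices = table.get(out[-1])
        if not choices:
            break
        out.append(rng.choice(choices))
    return out

song = [1, 2, 3, 1, 2, 3, 1, 2]  # stand-in for a tokenized prior track
table = train_bigram(song)
cont = continue_sequence(table, song, 4)  # -> [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```

A real model would condition on a long context window rather than one token, but the generation loop has the same shape.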


The Encodec project's focus on AI-generated sound effects offers practical applications across various industries, from gaming to film production. It's exciting to think about the possibilities of AI enhancing audio design and creating immersive auditory experiences.


Does anyone else hear a kind of background static in these samples? It almost sounds like part of the track is more compressed in terms of dynamic range than other parts, which doesn't make any sense to me. I'm trying to decide if this is my own confirmation bias at work or not.


We built a Mac/Windows app around the original MusicGen so people can experiment with it on their own machine with a simple UI (https://samplab.com/text-to-sample).


A pretty neat background noise generator.

"From text to audio with ease"

I hoped for a second we would get a good-quality model for text-to-speech - damn, I guess it's back to brute-forcing bark.ai or waiting for Tortoise (or more realistically, just paying ElevenLabs).


This is great, I've been wanting sound effect generation for years. I spent a lot of time trying to get WaveNet working well, eventually just dropped the project after mediocre results. With AudioGen I'm generating a sample in less than a second.


The license of the model weights is CC-BY-NC, which is not an open source license.

The code is MIT, though.


It's unlikely that model weights can be copyrighted, as they're the result of an automatic process.


> It’s unlikely that model weights can be copyrighted, as they’re the result of an automatic process.

If they can’t for that reason alone, then the model is a mechanical copy of the training set, which may be subject to a (compilation) copyright, and a mechanical copy of a copyright-protected work is still subject to the copyright of the thing of which it is a copy.

OTOH, the choices made beyond the training set and algorithm in any particular training may be sufficient creative input to make it a distinct work with its own copyright, or there may be some other basis for them not being copyright protected. But the mechanical process one alone just moves the point of copyright on the outcome, it doesn’t eliminate it.


That isn't necessarily true. You're saying that the model weights would be a derivative work. Derivative work is insufficiently transformative to fully distinguish the output from the input.

In this instance, it's very likely that the process is sufficiently transformative. A set of model weights look nothing like the Mona Lisa, nor can they be directly transformed back into it. What it is NOT is the product of a creative process on the part of a human, and is thus ineligible for copyright.

It is as though we are able to distill meaning using an automatic process. Copyright doesn't protect meaning, only expression, and it only protects expressions that were generated by humans.

The network itself might be patentable if it isn't obvious.

If a network was copyrighted and it was found that the function of the network was inseparable from its expression, it would no longer be eligible for copyright on that basis.

I think people often misunderstand the application of copyright to GPL code in this way.

Now, will any of this stop people from CLAIMING copyright? No. It'll have to be fought out in courts.


Oh my god, some of these tracks actually SLAP.

Like for real.

The last bastion of human creativity is about to be defeated.


Which ones slap? I want them to, but what I'm hearing is only OK. I think this could generate some interesting starting points for me when I'm stuck, though.


It will be a licensing nightmare, just like Llama1 was.

But clever devs can study it to make better software and pressure them for a better license on the next release. Worked for LLaMA 2


https://www.audiogen.co/ related? unrelated? big fight coming up?


Wouldn't Pandora's possibly vast library associating textual descriptions with music be the ideal training data for something like this?


Yes... except that Pandora's library does not include the music.


Diffusion models are now SOTA in audio and image generation. Has anyone given them a shot on text?

Audio is more similar to language than images because of its stronger time dependency.

The paper says the critical step in making the diffusion model work for audio was splitting the frequency bands and applying diffusion separately to each band (because the full-band model had limitations due to poor modeling of correlations between low-frequency and high-frequency features).

I think something could be done on text side as well.


There are two problems with this. Diffusion models work on a single rule of thumb: if you keep adding small, noisy Gaussian steps to a "nice" distribution many times, you end up with a standard Gaussian.

So, for text: a) what is the equivalent of a small, noisy step? and b) what is the equivalent of a standard Gaussian in language space?

If you can solve a and b, you can make diffusion work for text, but there hasn't been any significant progress there afaik.
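The first half of that rule of thumb is easy to see numerically. Below is a toy sketch (pure Python, illustrative values) of the forward process on a single continuous scalar: whatever value you start from, repeated Gaussian mixing drives it toward a standard normal, and this is exactly the step with no obvious analogue for discrete text tokens.

```python
import math
import random

# Forward diffusion on one continuous value: each step shrinks the signal
# slightly and mixes in fresh Gaussian noise. After many steps the result
# is approximately a standard normal draw, regardless of the start value.
def forward_step(x, beta, rng):
    return math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0, 1)

rng = random.Random(0)
x = 5.0  # a "data point" far from zero
for _ in range(1000):
    x = forward_step(x, beta=0.02, rng=rng)

# The surviving contribution of the original 5.0 is sqrt(0.98)**1000 * 5,
# which is on the order of 1e-4: x is now essentially pure noise. Discrete
# tokens admit no comparable "shrink a little, add a little noise" step.
signal_left = math.sqrt(0.98) ** 1000 * 5
```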


Is there a way to try this out? I didn't see one, but didn't look too hard.


Yes. Installation instructions on the front of the repo, then click on the model readme for sample getting started code (10 lines of python and you get output.)


How does meta plan to make money from this open source?


Would really be interested in running this locally.


How many parameters do these models have?


Does this model help with TTS (text-to-speech)? Badly needed; the only free options right now are Bark and Tortoise TTS.


Coqui-TTS with VCTK/VITS is very good right now. Not as good as ElevenLabs or Coqui Studio, but for fast open TTS it's pretty good, in case you're not familiar with it.

It will be great when there's eventually something open that competes with the closed models out there.


Excellent, I will take a look into this.


"What a time to be alive!"


The fact that it generates a song for the prompt "Earthy tones, environmentally conscious ... organic instrumentation" goes a long way to proving that English words no longer mean anything particularly.


I've played guitar 25 years, and it's funny how the music community has been using all kind of words to describe music or tone. Describing certain tone as "hairy", "mushy", "wooly", "airy", "buttery", etc. is just very common.


Sure, there's jargon. But these words don't describe the music; they're words associated with the kinds of people who would listen to it (according to the biases of the language model). As a description of the music, it's meaningless. If a person were asked to name some "environmentally conscious" music, they could just as easily veer over to hardcore straight edge.


It works if you don't need a specific type of music but just something to fit a vibe. For example in an indie game or film project. Maybe level 3 is a jungle area with a plant based species of NPCs and the music should be "Earthy tones, environmentally conscious ... organic instrumentation"


That sort of presumes those words had any effect on the output.

We might know more had it generated a song as you said, but in fact it generated only an instrumental.


I think "song" should go in quotes


oh no, more muzak!


For my two cents; the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

It is not to "own more rights" to shit. Sorry, but no one actually owns what they create, that's the point of creation. And if you take issue with the term create, that only reinforces my point. We're all influence machines, input and output, the future should not be about preserving some peoples rights to limit our collective advancements over their personal wants. Tough shit.


> the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

Agreed!

> the future should not be about preserving some peoples rights to limit our collective advancements over their personal wants.

Systems that don't allow people to extract value from the hard work they put into collective advancement do not seem to lead to collective advancement over time and at scale. Incentives matter. No one's going to spend all day making candy and put it in the "free candy" bowl when that one asshole kid down the street just takes all of the candy out of the bowl every single day.

At small scales (i.e. relatively few participants with a relatively high number of interactions between them) then informal systems of reciprocity and reputation are sufficient to disincentivize bad actors.

At large scales where many interactions are one-off or anonymous, you need other incentives for good-faith participation. There's a reason you don't need a bouncer when you have a few friends over for drinks, but you do if you open a bar.


> Systems that don't allow people to extract value from the hard work they put into collective advancement [...] one's going to spend all day making candy and put it in the "free candy" bowl

On the other hand this is what researchers do all day every day. PhDs and professors work for the common good and get barely any pay in return. Maybe the future model in art and music is more like the academic researcher.


PhDs and professors are paid a living wage (though less so over time as federal funding for higher institutions has dwindled).

Academia is a carefully constructed system whose incentive structure is based on highly visible explicitly measured citations and reputation.

People aren't generally just trying to maximize wealth. They're trying to maximize their sense of personal value, which tends to be a combination of wealth, autonomy, and social prestige. Academics (and some creative fields) tend to be biased towards those who prioritize prestige over wealth.


No it's not-- researchers generally get paychecks. Even if they're small, they can pay for their housing and buy their kid food.

Artists don't see a single red cent from their work being sucked up into some AI content blender. Their work is being taken and used-- often in service of others making a profit-- and they receive nothing. Not even credit.

Edit: Well, they don't receive nothing-- they get a bunch of people telling them they're selfish jerks for wanting to support themselves with their work.


The majority of artists never receive a single red cent from the humans who consume their work.

This is how it has always been, and fundamental to the economics of art. Things people are willing to do regardless of financial compensation rarely pay well.


Putting commercial artists, aspiring fine artists, and hobby artists in the same bin doesn't make sense. There are a ton of career commercial artists that make money solely off of their work. If you think there are more aspiring career fine artists that don't end up making it than career commercial artists, you're wrong. They're not even in the same business.


Did you notice how you called all of them artists?


You're not serious, right? How about sandwich artists? Custodial artists? Con artists? Are Social engineers, microsoft certified systems engineers, cisco certified security engineers, mechanical engineers, technical support engineers, train engineers, and agricultural engineers all functionally and economically equivalent? Are professional cooks and home cooks economically equivalent because they're both cooks?


Nobody is necessarily owed money for creating art, regardless of whether it was done by hand or with machines (unless they were specifically commissioned to create the work). The economics of art are subject to market forces, and they always have been - long before AI became an additional factor (among many others) in why an artist may not earn a living from their art.

Deviantart in the 00s was a huge repository of art people were mostly making for free. Some people got lucky and turned that into a full time occupation, but the vast majority didn't.


Great non-answer.

None of the people whose art was sucked up by these machines had any idea that it would be integrated into for-profit tools and used against them in the marketplace, and almost certainly would not have consented if they had known. The ones with copyrighted images didn't even give legal consent. The fact that Getty's logo showed up in red-carpet images is a symptom of a problem that obviously went well beyond Getty, but there's no way an independent artist could prove it.

Furthermore, if you think artists + luck = commercial art, you're completely 100% wrong. Most art school graduates don't go into fine art for self-expression with some lucky individuals matriculating into careers; they go for job training. Go look at the degree programs for any art school: almost all of them translate into a full-time commercial career immediately out of school, the same way a STEM degree does. Concept artists, illustrators, character artists, environment artists, graphic designers, VFX artists, animators, cinematographers, choreographers, commercial musicians and composers, photographers... these people don't just spring up out of the fine art world. That is their career. I know because I am one.

Your ethics-dodging free market tech libertarian garbage holds no sway with me, so you might as well just save yourself the keystrokes.


The vast majority of the world's population doesn't have the privilege of going to art schools and training for years to render their ideas.

AI allows more people to be artists without the skill barrier, and that is a social good.

I'm sorry that the mechanistic portions of your art career are rapidly losing economic value, but I think free tools like StableDiffusion (and the better tools that are coming in the future) should be available to every child (and adult) in the world. And the world will be a better place as a result.


Haha... Nice ham-fisted, non-sequitur attempt to make the artists the privileged ones here. Who can and can't go to art school is irrelevant. My example clearly showed that your understanding of the "economy of art" is not even close to representative. Commercial artists operate at all levels of society, all over the world, and have for a really long time. Nobody making beautiful hand-carved and painted signs for local businesses started out as an independent fine artist using signs as a medium, hoping for their big break. They learned the arts of carving, painting, lettering, illustration, and so forth, plus the craft of making and finishing durable outdoor wooden structures, and started a career.

The utilitarian argument is only philosophically defensible in the streetcar scenario; when people are deliberately pulling the levers unprompted, and it could have been done ethically but wasn't because that was just too darned inconvenient and/or expensive, the greater-good argument doesn't work. It's the same argument people used to defend the Tuskegee experiment and its ilk... and Roman public slaves. If we're willingly throwing people into the spinning wheels of progress because it will be wonderfully convenient and neat for non-artists to use other people's skills instead, there is just no ethical defense. Knocking the stool out from under someone using a tool they built, and that you didn't even have permission to use, is just not ethical. You and many others end up falling back on right-wing platitudes about the free market.

I'm actually a technical artist; my skills are now dramatically more valuable. That doesn't make it any more ethical, even if it works in my favor.

I was a developer for 10 years, and have seen this unadulterated hubris in nearly every group of developers I've ever encountered: when you find yourself explaining someone's job and industry to them, you might want to stop and ask yourself... "Am I pulling a Dunning-Kruger right now?"


> non-artists to use other people's skills instead

Here is where I think you're wrong: Everyone is a potential artist.

Many never have the opportunity because of economic reasons: some people don't have the time to cultivate the mechanical skills necessary to render their vision.

All artists pull from the work of others constantly. That is 'using other people's skills'. It is absolutely privileged to have the opportunity to devote time cultivating art production skills (which is usually done by studying and replicating the process of the artists that came before them).

If someone is born to a single mother, grows up having to raise their siblings, then has to immediately start working to help support their family, they don't have the privilege or time to explore art production skills. Technology has been making art production more accessible for hundreds of years, and that's a clear net good. I want everyone to be able to make art if they want to.

It up to the artist if they want to manifest their vision without creating the pigments and brushes themselves, or using a drawing tablet, or using generative AI tools - it all can be art that is meaningful to the person who creates it and more art in the world is a good thing.

Commercial art isn't a very good representation of the artist's soul: they are constrained by market forces or their patron rather than producing what their own heart desires. We should all be happy to sacrifice commercial art jobs if it means that every person in the world gains the ability to render the art that comes from their soul.


So, in summary, no professional artist's socioeconomic wellbeing matters because all artists are really the same... and everybody is an artist really, and not all artists start with the same socioeconomic advantages so this is really about equality... but at the same time, it's about "sorry, that's the market and you lost" capitalism, and blah blah blah.

Being a professional artist isn't fucking magic. It's not winning the lottery or even getting drafted for a pro sports team. It's a large group of regular fucking careers just like any other, but creativity makes up a larger percentage of their professional cognitive toolkit. I know a workaday oil painter who's neither well-known nor rich but went to college to learn how to be an oil painter, and that's how he comfortably pays his bills. He paints seaside landscapes, wealthy people buy them to put in their vacation homes, and that's how he makes his money. He was in college right alongside graphic designers, animators, architects, product designers, user interface designers, and all manner of other professional artists. Just like any other non-licensed white collar profession, people who didn't go to school are in the business, but it's harder. I also know a professional comic book artist, many animators, sculptors who work in product development, and plenty of other professional artists who built normal careers like any other white collar professional. The fact that not everybody can go to art school to build an art career doesn't change the disposition of professional artists any more than not everybody being able to afford to go to medical school affects the disposition of doctors, and having "the soul of a healer" doesn't really enter the goddamned equation, does it?

Whether you're being deliberately obtuse or are painfully ignorant about something you've got a lot of strong, baseless opinions about, you're obviously not going to cut the bullshit and be honest with yourself. The reason you have to delve into all of these pseudo-philosophical mental gymnastics is because you're wrong, but you really don't want to be, so you're trying to construct a reality in which you're right. That's not how reality works. Using people's work without their permission to make a for-profit system to compete with them in their professional marketplace is not moral no matter how big of a castle of bullshit logic you build around it.

Bye bye. I'm going to let you hang out in your little land of make-believe by yourself.


Interesting discussion. I notice a lot of artists on HN get riled up particularly when AI media generation is brought up, while coders get riled up when AI code generation is brought up, while each side dismisses the other. Fundamentally, this shows me that it's not really about the morals at all but the economics of losing one's livelihood. Fortunately for one and unfortunately for the other, technology will continue to march on, it seems.


> it's not really about the morals at all but the economics of losing one's livelihood

That's a false dichotomy. Just because the economic issue and cultural impact are separate doesn't mean they're mutually exclusive. If a mugger stopped taking people's money and instead just walked around making people afraid for their lives by intimidating them or beating them up, that would still be immoral.


In your point of view it's immoral, not from everyone's, it seems. Either way, technology still arrives.


> Fundamentally, this shows me that it's not really about the morals at all but the economics of losing one's livelihood.

Whether people consider it immoral has no bearing on whether or not the cultural and economic issues are mutually exclusive-- they're not. You can't say that the argument is only about the economic issue simply because there is an important economic argument. It just doesn't make sense.


yeah, but they also get to work on what they love as opposed to whatever the corporate interest currently is. it's rare you get paid well for doing what you love i.e. music, teaching, designing hand bags etc


Well, researchers usually have to get their own grants and must thus work on whatever various funding sources deem worthy. Further, academic positions typically come with duties that researchers may not like - administration, reporting, teaching, etc.


"Maybe the future model in business and sales is more like the academic researcher" funny how nobody ever suggests that.


To the contrary, the future of the academic researcher is business and sales.


> No one's going to spend all day making candy and put it in the "free candy" bowl when that one asshole kid down the street just takes all of the candy out of the bowl every single day.

Software is an infinite candy bowl. Taking candy out of the bowl does not take any candy from the person who made it.

Imagine if this was the physical world, and you had a machine that could end world hunger. You could copy food like you could copy and paste information on a computer. Imagine someone who would keep that machine to themself, out of a sense of entitlement to make a few bucks. Any person with any sense of morality can see the obvious problem with that.


> Software is an infinite candy bowl.

More in theory than in practice. Ask any open-source maintainer how much running a popular project is unlike putting out an infinite candy bowl and then going on with your life.


> Software is an infinite candy bowl. Taking candy out of the bowl does not take any candy from the person who made it.

Software is not a finite candy bowl. It is also not an infinite candy bowl. It's not like physical goods at all, not even like physical goods that can be magically cloned. It's just different, entirely.

The incentive and value structures around data creation and use just can't be directly mapped to physical goods. You have to look at them as they actually are and understand them directly, not by way of analogies.

Why do people make software and give it out for free? Is it purely from the joy of creation? Sure, that's part of it. The desire to make the world better? Probably some of that too. Are those forces enough to explain all open source contribution?

Definitely not. Here's one quick way to tell: Ask how many open source maintainers would be happy if someone else were to clone their open source project, rename it, claim that they had invented it, and have that clone completely overshadow and eradicate their original creation?

If the goal was purely altruistic, the original creator wouldn't mind. More candy in the infinite candy bowl, right?

But, in practice, many open source maintainers strongly oppose that. There is a strong culture of attribution in open source, largely because there is a compensation scheme built into creating free software: prestige. One of the main incentives that encourages maintainers to slave away day after day is the social cachet of being known as the cool person who made this popular thing.

> Imagine if this was the physical world, and you had a machine that could end world hunger. You could copy food like you could copy and paste information on a computer.

Analogies are generally bad tools for real understanding, but let's go with this. Let's say this machine took fifty years of someone's life to invent, toiling away in obscurity. Basically, an entire working career spent only on this invention with nothing else to show for their adult life.

If, at the end, no one would ever know it was you who invented it, how many people would be willing to sequester themselves in that dark laboratory and make that sacrifice?


The idea of intellectual property being property that someone owns is a purely social construct though.

Candy can be consumed to depletion. Art gets richer the more it is consumed.

A much better analogy would be having a sculpture in your front yard. The idea that a kid would be an asshole for appreciating the sculpture too much is obviously laughable. People choose to decorate their yards for the status an attractive yard brings, without the expectation of profit from it.


>The idea of intellectual property being property that someone owns is a purely social construct though

So is the idea of real property being something that someone owns.


Most things people do end up being in the care of other people.

If nobody is beholden to any job or duty, and the machines do everything, who is to say I don't want to make every machine on earth dance in a flash mob? I cannot do that, because it would require other people to halt their use of the machines. Abundance is a false promise and one we should be quick to shoot down lest we surrender our future rights to the ones advertising it.

Removing the worth of people in their jobs removes their leverage in the constant resource allocation negotiation in the economy. Given that we just witnessed Elon Musk spend 20,000 average American lifetime earnings worth of wages just to be the new dictator of a social media company, I'm not sure that I want those negotiations to take place only among the giga-rich.


Ah yes, you should be free to rewrite the fictional work I wrote, or add one chapter to it, and be free to sell it under your name, magically implying that you are the author. Screw the original artists, right? Why should they deserve anything.

Thankfully your opinion is an extreme opinion and will never come to pass. Tough shit indeed. :)

----

I really like hackernews but recently I've been seeing a plethora of "your rights don't matter, you own nothing" bs spreading around.


I'm able to do everything in your hypothetical scenario besides illegally copy and redistribute the original content. If it makes me happy to write my name on the cover and pretend I wrote "Robert Frost's Poetry Collection" then so be it. Truly, if I want to say "fuck the original artists" in the comfort of my home or margins of the pages, I can do so. I can even sell that adulterated copy under the First Sale doctrine.

> I've been seeing a plethora of "your rights don't matter, you own nothing" bs spreading around.

It's not a new sentiment. Plenty of countries worldwide ignore US copyright; if they all got sanctioned, then America wouldn't have electricity.

If you author a copy of your content in a digital medium, you should be prepared for that content to be redistributed against your will, infinitely. It's not the nature of humanity, it's the nature of the digital format.


>that content to be redistributed against your will

That is fine.

>If it makes me happy to write my name on the cover and pretend I wrote "Robert Frost's Poetry Collection" then so be it.

You can do whatever you want in your own house. No one is trying to dictate what you do in your own private time in your own domicile. The problem occurs when you try to profit from my work by pretending you did the work, selling it to the public while pretending to own and to have created work which you did not.

>I can even sell that adulterated copy under the First Sale doctrine.

However, the idea of "You can add a chapter to this, call it your own, and sell it" and I won't legally come after you is absurd. And if the results of said legal pursuit result in you being bankrupt... what was the phrase you used, "then so be it."

---

Alternatively, you could write your own fictionalized work and sell it. But that requires more work than copying what I wrote, calling it yours, and selling it, doesn't it?


> However, the idea of "You can add a chapter to this, call it your own, and sell it" and I won't legally come after you is absurd.

The only part of this that seems legally problematic is copying and redistributing the part that you wrote. If I wanted to buy 10,000 copies of A Game of Thrones so I could staple my fanfiction to the back and sell it on Etsy, there is no legal precedent suggesting I could be stopped. It is a lawful transformation of a legally licensed product, paid for in full and redistributed in accordance with its individual license. Absent any extenuating contracts between the owner and seller, I don't see what legal ground you would have to stand on.

> and if the results of said legal pursuit result in you being bankrupt...what was the phrase you used, "then so be it."

"if" indeed. Let's check in with the Author's Guild and see how this fight is going: https://www.copyright.gov/fair-use/summaries/authorsguild-go...


I mean... rejecting ownership of information (but not rejecting attribution of work) was a key value of the hacker movement in the 80s, so I'm not surprised it's a popular belief on HN.


I might point you to Article I, Section 8, Clause 8 of the US Constitution: "[The Congress shall have Power . . . ] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

They are with you on the advance, and that in the long term science and the useful arts can't be owned. But to achieve that long-term goal, they saw it as valuable to give people temporary rights to align those "personal wants" with "our collective advancements."


> For my two cents; the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

Our overlords didn't get the memo


Culture is based on the creation of those who came before. Genres of music are created as people try to mimic the styles of those before them — one could argue that they "trained themselves on the previous dataset." No one creates anything in a vacuum. They utilize things that we collectively have contributed. Hell, language and writing are the open source things that we collectively own that people use to create their stuff. The creation came after going to schools that we collectively pay for, and travelling on shared roads. It's standing on the shoulders of giants — except it's really just stacked people, all adding their bits.


Also, the Latin alphabet originated from the Euboean alphabet. Euboia is my home; I live here. My guesstimate is that all Latin writers wouldn't like to pay copyright for their use of our Greek letters in their everyday lives.[1]

I mean, everyone who writes right now, into this HN thread, owes copyright to someone for the letters, amirite? That someone is me. Anyway, long story short, copyright was always a pretty ridiculous idea, alongside patents of course, but it is only right now, with programs that can mimic writing style, painting style, speech style, etc., that this is obvious to everyone.

As a side note, there was a Greek private torrent tracker, blue-whitegt, which today would be a serious competitor to American companies like netflix or youtube, but it was shut down because, surprise surprise, there were some copyright issues, despite the site being a really quality service, a paid service of course. Blue-whitegt today would be a 10 billion to 100 billion company, instead all the profits aggregated to American companies.

When it comes to copyright, very soon everyone on the planet will have monetizable torrent seeding. All of these copyright chickens are coming home to roost!

https://en.wikipedia.org/wiki/Archaic_Greek_alphabets#Euboea...


I disagree - the human pursuit is artificially, or organically, besting its own pattern-recognition wetware.

If we can offload the mundane (survival) aspect of our pattern recognition engine, then maybe we can use those cycles in lofty pursuits - this is the Victorian fallacy.

-

Break everything down - it's all patterns all the way, and how we process them... we are letting AI take on an aspect of ourselves (pattern recog ;; tokens ;; and prediction).

That, if applied to self-preservation, is the essence of sentience.

(I think! therefore, I am, and I will prevent you from making me NOT)


Creating something (writing a book, recording a song, etc.) is a conversion of time (your only finite resource) into something of value (maybe only to you). It also turns out that having a profit motive and IP protection around creating valuable things is a fundamental requirement for having a creative industry to begin with. It's also what drives the free market to determine which creations are valuable in the first place.


This has nothing to do with living a carefree life, this whole AI initiative is so tech companies can extract more money from their products.

Don’t want to pay for content ? Well we have “solved that”…


"Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument." A musician will always reach for an instrument as their compositional tool - keyboard and mouse producers are not musicians.


This makes no sense.

How is pressing a key on a piano different from pressing a key on an electronic piano?


I was referring to a computer keyboard, not an electric piano. I can't see how any musician would find this appealing as a compositional tool. Music is its own language - expressing a musical idea with a text prompt is antithetical to the process of making music.

