Hacker News
Open-sourcing AudioCraft: Generative AI for audio (meta.com)
906 points by iyaja on Aug 2, 2023 | 319 comments



> MusicGen, which was trained with Meta-owned and specifically licensed music, generates music from text-based user inputs, while AudioGen, which was trained on public sound effects, generates audio from text-based user inputs.

Meta is really clearly trying to differentiate themselves from OpenAI here. Open source + driving home "we don't use data we haven't paid for / don't own".


This is purely a function of everyone remembering the RIAA's decade-long campaign to prevent people from taking the music they had rightfully stolen. As far as I'm aware LLaMA was trained on "publicly available data"[0], not "licensed data".

Furthermore, MusicGen's weights are licensed CC-BY-NC, which is effectively a nonlicense, as there is no noncommercial use you could make of an art generator[1]. This is not only a 'weights-available' license; it's significantly more restrictive than the morality-clause-bearing OpenRAIL license that Stability likes to use[2].

[0] https://github.com/facebookresearch/llama/blob/main/MODEL_CA...

[1] https://github.com/facebookresearch/audiocraft/blob/main/LIC...

[2] These are also very much Not Open Source™ but the morality clauses in OpenRAIL are at least non-onerous enough to collaborate over.


My understanding (IANAL) [1] is that copyright licenses have no say on the output of software. Further, CC licenses don't say anything about running or using software (or model weights). It's therefore questionable whether the CC-BY-NC license actually prevents commercial use of the model.

[1] https://opensource.stackexchange.com/questions/12070/allowed...


You're correct, but no one has had the balls (or the lawyers) to clarify this in court yet. Expect to see hosting providers complying with takedown requests for the foreseeable future.


Hosting providers *have* to comply with takedown requests to maintain safe harbor.


I don't remember the details (or outcome) but there was a lawsuit a few years ago involving CAD or architecture software and whether they could limit how the output images were used because they were assemblages of clipart that the company asserted were still protected by copyright. Something like that. A lot of "AI" output potentially poses a similar issue, just at a far more granular level.


You're wrong because software, as you describe it, includes the "cp" command which creates a perfect copy.


As sibling noted, we’re talking about the impact of a software’s license on use of its output.

I suppose your point would stand if the software were a quine?


The copyright license of the cp code itself has no bearing on the copyright of what you produce (well, copy) with cp.


That's not the point they're making. They're replying to their parent comment.


> MusicGen's weights are licensed CC-BY-NC, which is effectively a nonlicense as there is no noncommercial use you could make of an art generator

How do you figure? Have you never just...made stuff to make stuff?


In copyright law the use of the work itself is considered a commercial benefit, so "noncommercial use" is an oxymoron. Consider these situations:

- If I use AudioCraft to post freely-downloadable tracks on my SoundCloud, I still get the benefit of having a large audio catalog in my name, even if I'm not selling the individual tracks. I could later compose tracks on my own and ride on the exposure I got from posting "noncommercially".

- If I run AudioCraft as a background music generator in my store, I save money by not having to license music for public performance.

- If I host AudioCraft on a website and put ads on it, I'm making money by making the work available, even though I'm not charging a fee for entry.

I suspect that a lot of people reading this are going to have different arguments for each. My point is that if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not. Keep in mind that every exception you make can be easily exploited to strip the NC clause off of the license.

If you're angry at the logic on display here, keep in mind that this is how judges will construe the license, and probably also how Facebook will if you find a way to make any use of their AI. The only thing that stops them from rugpulling you later is explicit guidance in CC-BY-NC. Unfortunately, the only such guidance is that they don't consider P2P filesharing to be a commercial use.

So, absent any other clarifications from Facebook, all you can do without risking a lawsuit is share the weights on BitTorrent.

EDIT: And yes, I have made stuff just to make stuff. I license all of that under copyleft licenses because they express the underlying idea of 'noncommercial' better than actual noncommercial clauses do.


This is a weird comment.

Do you think that non commercial use simply doesn't exist or something?

Because noncommercial use isn't some crazy concept. It is a well-established one, and it doesn't exclude literally everything.

Also, you are ignoring the idea that Facebook will almost certainly not sue anyone for using this for any reason, except possibly Google or Apple.

So if you aren't literally one of those companies you could probably just use it anyway, ignore the license completely, and have zero risk of being sued.


The issue with “non commercial” is that no, it’s not well established. Licenses with an NC clause are so problematic as to be practically useless. If you just want to use something at home privately you don’t need a CC license… a CC license is for use and redistribution.

http://esr.ibiblio.org/?p=4559


What about playing the music in a government building as elevator music, for example?


>If you just want to use something at home privately you don’t need a CC license… //

I presume you mean in USA, because in UK you don't have a general private right to copy. Our "Fair Dealing" is super restrictive compared to Fair Use.


Funnily enough in the UK they actually tried to fix this. The music industry argued that the lack of a private copying levy made legalized CD ripping into government confiscation of copyright ownership... somehow. The UK courts bought this, so now the UK government is constitutionally mandated to ban CD ripping, which is absolutely stupid.


I knew CD ripping got reversed, but not the arguments against it. Definitely stupid, as not granting a monopoly is not the same as confiscation (the reasoning seems very straightforward). No doubt some Tory got a 'management consultancy' gig with the RIAA out of that one.

I like that it makes software like iTunes contributory infringers for enabling mass copyright infringement.


I miss that blog. It was a little crazy and the comments were a flame war shitshow, but man it was fun to read sometimes. Even if I vehemently disagreed, it got me thinking.

Whatever happened to esr? Did he just get too paranoid and clam up?


Noncommercial use is not well established in copyright law, which is the law that actually matters. I know other forms of law actually do establish noncommercial and commercial use standards, but copyright does not recognize them.

As for "Facebook won't sue"? Sure, except we don't have to worry about just Facebook. We have to worry about anyone with a derivative model. There's an entire industry of copyleft trolls[0] that could construct copyright traps with them.

Individuals can practically ignore NC mainly because individuals can practically ignore most copyright enforcement. This is for the same reason why you can drive 55 in a 30mph zone and not get a citation. It's not that speeding is now suddenly legal, it's that nobody wants to enforce speed limits - but you can still get nailed. The moment you have to worry about NC, there is no practical way for you to fit within its limits.

[0] https://www.techdirt.com/2021/12/20/beware-copyleft-trolls/


Commercial vs Noncommercial use is well established in copyright law - in everything from Final Rule Regarding the Noncommercial Use Exception to Unauthorized Uses of Pre-1972 Sound Recordings https://www.copyright.gov/rulemaking/pre1972-soundrecordings... to Noncommercial webcasters https://www.law.cornell.edu/uscode/text/17/114#f_4 to Fair Use.

Noncommercial licenses are taken up in Great Minds v. FedEx Office & Print Services, Inc., 886 F.3d 91 (2d Cir. 2018). The court explains they are enforceable and are basically just a category of contract. So, as long as the contract is clear, it’s probably enforceable.


> Noncommercial use is not well established in copyright law, which is the law that actually matters.

No, for “NonCommercial”, what actually matters is the explicit definition in the license.


> My point is that if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not.

What “NonCommercial” means in the license is explicitly defined in the license. If you think those examples, or, more to the point, every possible use, render ‘NonCommercial’ into ‘no use’, as you have claimed, you need to make that argument based on the definition in the license, not on some concept of what might be construed as commercial use under general legal principles if the license had used the term without its own explicit definition.


Is listening at home a violation of NC? That's what I've interpreted as its intent.


> if you don't think that all of these situations are equally infringing of CC-BY-NC, then you need to explain why some are commercial and some are not. Keep in mind that every exception you make can be easily exploited to strip the NC clause off of the license.

You're right: those are all equally infringing CC-BY-NC. I don't see a problem.


What's your evidence for this bit?

> this is how judges will construe the license


I think the key word there is "noncommercial".


Yes, but you can easily make noncommercial use of an art generator.

Obviously, you can't host a commercial art generation service with a noncommercial-use license, and (insofar as art produced by a generator is a derivative work of the model weights, which is a controversial and untested legal theory) you can’t make commercial art with a noncommercial license, but not all art is commercial.


"Noncommercial art" is not a thing in the eyes of the law. Even if you don't intend to make money the law still considers the work itself to be commercial. That's why CC-BY-NC has to have a special "filesharing is non-commercial" statement in it, because people have made successful legal arguments that it is.

You're probably thinking of "not charging a fee to use", which is a subset of all the ways you can monetize a creative work. You can still make money off of AudioCraft by just hosting it with banner ads next to the output. Even a "no monetization" clause[0] would be less onerous than "noncommercial use only", because it'd at least be legal to use AudioCraft for things like background music in offices.

[0] Which already precludes the use of AudioCraft music on YouTube since you can't do unmonetized uploads anymore


> “Noncommercial art” is not a thing in the eyes of the law

The definition of “NonCommercial”, the oddly capitalized term of art in the license, is not a matter of general law, it is a matter of the license, which defines it as “not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange.”

> Even if you don’t intend to make money the law still considers the work itself to be commercial.

Even if you do make money, if the use is “not primarily intended” for that purpose, it is "NonCommercial" in the terms of the license.

> That’s why CC-BY-NC has to have a special “filesharing is non-commercial” statement in it, because people have made successful legal arguments that it is.

It has the filesharing term in it because it permits that particular exchange-of-value as a primary purpose.

> Even a “no monetization” clause would be less onerous than "noncommercial use only"

How would a clause that prohibits monetization entirely be less onerous than one which prohibits it only as the primary intent of use?

> it’d at least be legal to use AudioCraft for things like background music in offices.

It is legal to use it for that purpose (in a for-profit enterprise, I suppose, one might make an argument that any activity was ultimately primarily directed at “commercial advantage”, but in a government or many nonprofit environments, that wouldn’t be the case.)


In their example audio clips they have a "perfect for the beach" audio track. With your understanding of the NC license, would a resort or private beach club be able to play a similar generated music track at their poolside bar, or something along those lines? The bar's primary purpose isn't to play the music; it's just an additional ambiance thing. They're trying to sell drinks and have guests pay membership fees; people aren't really coming for the background music.

I realize, this isn't legal advice, YMMV, etc.


> With your understanding of the NC license, would a resort or private beach club be able to play a similar generated music track at their poolside bar or something along those lines?

A resort, probably not, ambiance is, at least arguably, a marketable commercial advantage; a private club in the “mutual benefit organization” sense (rather than a “business selling memberships”, which is just like a resort), probably, because their interest, even indirectly, isn’t making money.


Yes it is. Art that I make for my own enjoyment is noncommercial. Art that I make to explain concepts to my son is noncommercial.


> as there is no noncommercial use you could make of an art generator

r/stablediffusion gives you a hundred examples daily of people just having fun and not thinking of monetizing their generations


> there is no noncommercial use you could make of an art generator

I'm sorry, what?


Google is running on "publicly available data", not "licensed data"


The fact that Meta is able to lie and call their restrictive licensing open source is nearly as misleading as "OpenAI."

We need to do better than to repeat these claims uncritically. The weight licenses are not "open source" by any useful definition, and we should not give Meta kudos for their misleading PR (especially considering that they almost surely ignored any copyright when training these things - rules for thee, but not for me).

"Not as closed as OpenAI" is accurate, but also damning with faint praise.


Just a general piece of advice: it's not productive to constantly give out the harshest criticism you possibly can when someone does something that's not perfect but is still a step in the right direction. Doing so just tells companies that nothing satisfies the community and that they should stop trying. Instead, it's better to mention what they did right and point to how they can make it better.


Can you chill? It’s def open source


The source code is, as it's MIT, but the weights are not, as they're CC-BY-NC: https://github.com/facebookresearch/audiocraft#license


  about: pytorch @ fb.


So I can build a business on it, then?


I believe Meta has explicitly said that you can, but that's not what open source means and the model isn't open source.


Meta says to imagine you can: "Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument. Or an indie game developer populating virtual worlds with realistic sound effects and ambient noise on a shoestring budget. Or a small business owner adding a soundtrack to their latest Instagram post with ease."

In reality, you can't, as they licensed the weights for noncommercial use only: https://github.com/facebookresearch/audiocraft#license


Research does exist you know. This is immensely helpful for a huge number of people in academia.

If you want to build a company, perhaps you should do what everyone in the industry has done for millennia, copy the movements performed and optimize them while doing so.


You don't own data. You can sometimes copyright data.

https://www.americanbar.org/groups/science_technology/public...


It's likely partly a PR/branding exercise as well.

In the new world that Meta sees, of VR/AR and AI, Meta is already in a position where people don't want them to have much power, because they don't trust them over privacy, etc. So Meta is trying to pivot to become more trustworthy by making genuine moves in this space.


That, or this is an ongoing research lab (FAIR) that has existed for ~half a decade and has advanced the state-of-the-art in AI further than Apple, Microsoft and Google combined.


I would be pretty shocked if meta were that far ahead of all 3 of those companies, all of which are also spending a fuck load on internal AI research.


> all of which are also spending a fuck load on internal AI research.

But their internal research stays internal. Sometimes, they put out "papers" which are glorified advertisements, often going as far as hiding the model architecture just to keep their competitive advantage.


I get that, I'm just saying the original statement, that meta is further along than all of those companies combined, is a pretty wild claim.


Even if all three of those companies have something to show for their research, none of it is at the scale or level of accessibility that PyTorch, LLaMA, and now AudioCraft offer.


> "Meta is really clearly trying to differentiate themselves from OpenAI here. Open source + driving home "we don't use data we haven't paid for / don't own"."

Isn't Meta settling lawsuits for this right now? In addition to violating user privacy (another lawsuit)...

Meta is attempting to destroy competition; that's it. Similar to how they paid a fortune to lobby against Tiktok for the exact reasons Meta is under active investigation (again). The irony.


"If we don't win here, then at least we'll kick their lawn to pieces."


Bully "Open"AI into rebranding.


They are doing PR damage control with an influx of AI stuff, after the ridicule of the metaverse and the recent revelations about Threads (for which they are playing the long AI game). Are we not concerned about all the Threads, IG, and other accounts being linked via internal LLMs we will never hear about?


Yes. Meta is in the business of commanding as much of people's time as possible. AI is more or less the biggest danger to this model (apart from legislation, theoretically, but let's not kid ourselves). Making AI a commodity is very much in their interest.


Goddamn, Facebook being the good guy...


Nah, this is just the modern tech playbook: First you open source stuff, then you can monitor all the related development happening and whenever you see areas of interest/popularity, you simply clone the functionality or buy out whatever entity is building that interesting stuff.


They're not, they're playing a longer Microsoft style game to corrupt the meaning of open source, and releasing models under their terms to undermine competitors.


Sounds like they're good enough. Enemy of Microsoft is my friend


CC-BY-NC isn't an open source licence; it violates point six of the open source definition: https://opensource.org/osd/


Companies just putting “open” in the names of non-open things to make hn and the press automatically love it


who gets to declare what is the "open source definition" and why?


In my opinion, the Free Software Foundation, ironically, since they invented the movement, with open source starting out as a tacky rip-off with the ethics stripped out. After decades, open source converged on free software.

More popular opinion is OSI: https://en.wikipedia.org/wiki/Open_Source_Initiative

They were founded by the persons who (claimed to have) invented the term in order to steward it. It's the same definition as the FSF.


The people who created the term: the Open Source Initiative.

Before, people most often used "free software" as defined by the free software movement, but some disliked this term because it's confusing (most think "free" means no money) and perceived to be anti-commercial.

The term "open source software" was chosen and given a precise definition.

It's dishonest, then, for people to use the term "open source software" with a different interpretation when it was specifically chosen to avoid confusion.


> It's dishonest, then, for people to use the term "open source software" with a different interpretation

I disagree. You're saying that they "invented" the term, but it's a very generic term. The source is open, so it's open source. I bet people were using the term before they claim to have invented it.

In that context, it is perfectly fine to use a different definition, and in fact here's mine, which I guess most people (maybe not on HN) share: if the source is visible to the general public, it's open source.

For what you mean, I use "FLOSS".


Where does it say this is CC-BY-NC?

The article says this:

> Our audio research framework and training code is released under the MIT license to enable the broader community to reproduce and build on top of our work



It's pretty common in academic research for trained model weights to be licensed under something different from the code that one would run to create such a model if one had both sufficient compute resources and the same training dataset. That is, if those weights are ever released at all!

IMO, while I'd rather have one part permissively licensed than nothing at all... it stinks that companies sponsoring researchers get an un-nuanced level of street cred for "open sourcing" something that they know nobody will ever be able to reproduce because their data set and/or their compute grid's optimizations are proprietary.

As it stands, I'm not at all sure that the outputs of this model can be used for commercial videos.


One day the FOSS community will implode over ethics and licenses when the coolest thing ever gets released


Anyone else feel like, with the flood of AI-generated content, there's a risk of the past being 'erased'? Like in 10 years we won't be able to tell if any information from the past is real or fake: sounds, pictures, videos, etc. It feels like we need to start cryptographically signing all content now if there's any hope of verifying it as 'real' 10 years from now.
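The "sign everything now" idea can be sketched in miniature: at minimum you record a cryptographic fingerprint of each artifact at creation time, so any later copy can be checked against it. (Hypothetical stdlib-only sketch; a real provenance system, e.g. something in the C2PA vein, would use asymmetric signatures and trusted timestamping, which this omits.)

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """SHA-256 fingerprint recorded when the artifact is created."""
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, recorded: str) -> bool:
    """Later, check a purported copy against the recorded fingerprint."""
    return fingerprint(content) == recorded

# Hypothetical artifact: the bytes of a photo captured today.
original = b"photo bytes captured in 2023"
record = fingerprint(original)

assert verify(original, record)                      # untouched copy checks out
assert not verify(b"subtly altered bytes", record)  # any edit is detectable
```

The hard part, of course, isn't the hashing; it's getting the fingerprint registered somewhere trustworthy before anyone has a motive to forge it.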


No. We've had photo and audio manipulation for many decades now. For a long time now, we've had to separate out what's credible from what's bullshit.

Fortunately, it's pretty simple in real life. We have certain publications and sources we trust, whether they're the NYT or a respected industry blog. We know they take accurate reporting seriously, fire journalists who are caught fabricating things, etc.

If we see a clip on YouTube from the BBC, we can trust it's almost certainly legit. If it's some crazy claim from a rando and you care whether it's real, it's easy to look it up to see if anyone credible has confirmed it.

So no, no worry at all about the past being erased.


I don't agree. With ML tools it is possible to make sweeping changes to images and text that are often impossible to detect. Combined with the centralisation of most online activities, that means large players could alter the past.

Imagine facebook decides to subtly change every public post and comment to show some particular person or cause in a better light.


If one "large player" like the NYT decides to "alter the past", you can compare with the WaPo or any other newspaper. You can compare with the Internet Archive. You can compare with microfiche. These aren't "impossible to detect", they're trivial to detect if you bother to compare.

We have tons of credible archived sources owned by different institutions. And these sources are successful in large part due to their credibility and trustworthiness.

It's just not economically rational for any of them to start "altering the past", and if they did, they'd be caught basically immediately and their reputation would be ruined.

This isn't an ML/tooling question, it's a question of humans and reputation and economic incentives.


You seem eager to exclude the possibility.

Maybe it is improbable, but there now is the technical possibility which was not there before.

It is valuable to explore that possibility and maybe even work to prevent such a use.

I would be interested in a ledger of cryptographically signed records of important public information such as newspapers, government communication and intellectual discourse.
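A minimal version of such a ledger is just a hash chain: each record commits to the hash of the previous one, so rewriting any past entry breaks every hash after it. (Illustrative stdlib-only sketch; a real system would add digital signatures and independent witnesses holding copies of the chain.)

```python
import hashlib

def chain(records):
    """Build an append-only ledger: each entry's hash covers the
    record text plus the previous entry's hash."""
    entries, prev = [], "0" * 64
    for text in records:
        h = hashlib.sha256((prev + text).encode()).hexdigest()
        entries.append((text, h))
        prev = h
    return entries

def tampered(entries):
    """Recompute the chain; an edited past record changes all later hashes."""
    prev = "0" * 64
    for text, h in entries:
        if hashlib.sha256((prev + text).encode()).hexdigest() != h:
            return True
        prev = h
    return False

ledger = chain(["front page 2023-08-01", "front page 2023-08-02"])
assert not tampered(ledger)

# Silently rewrite the first record: detection is immediate.
ledger[0] = ("front page 2023-08-01 (edited)", ledger[0][1])
assert tampered(ledger)
```

The chain only proves internal consistency; you still need parties other than the publisher to hold copies of the hashes, which is where the "independent witnesses" come in.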

Your argument that large social media will behave rationally is not backed up by reality. Consider Musk and Twitter.


> If one "large player" like the NYT decides to "alter the past", you can compare with the WaPo or any other newspaper. You can compare with the Internet Archive. You can compare with microfiche. These aren't "impossible to detect", they're trivial to detect if you bother to compare.

Detection doesn't really matter, because people are too lazy to validate the facts, and reporters are not interested in reporting them. AI is simply another tool to manipulate people, like Wikipedia, Reddit.com, Twitter, or any other BS pseudo-authority. Think someone will actually crack open a book to prove the AI wrong? Not a chance.


> and reporters are not interested in reporting them

You really think that if the NYT started altering its past stories, other publications would just... ignore it?

It would be a front-page scandal that the WaPo would be delighted to report on. As well as a hundred other news publications.

Thankfully.


That is maybe true for a small percentage of stories. You are also reducing this argument to the most contrived straw man instead of engaging with the idea in earnest.

If you can't alter world news headlines, you can still alter the tone of the article. If you can't alter front page news, you still can alter the remaining 95% of news.

Influencing public opinion is more subtle than the one important headline per day.

You are also ignoring the fact that news sites regularly edit published articles already, from fixed typos to corrections to large re-editings.


You seem to be misunderstanding.

This isn't about a small percentage of stories, it's not about tone, it's the fact that if the NYT ever did this even once with the intention to truly "alter the past" it would be a major scandal.

And obviously things like corrections or taking down libelous content aren't included.

So no, I'm not constructing any kind of straw man here. I'm saying that the threat of subtly nefariously "altering the past" isn't realistic because it would be caught and exposed and there's no financial motivation to do it in the first place.


You are misunderstanding. Altering the tone, comments, bias, etc. is altering the past; this is what I meant originally. You came up with the straw man of the New York Times. Sure, some high-visibility publications are harder to alter, but that doesn't mean none can be altered at all.


Your solution, in the case of trusted sources altering content to fit a particular worldview, is to look at other "trusted" sources. I think that therein lies the problem. I believe the real danger isn't people being convinced of something untrue. I think the real danger is the apathy that builds up as people can no longer reliably distinguish the truth, and they give up on sifting through it altogether, instead accepting "their truth". The vast majority of people simply don't care enough to verify sources.

This is already happening without generative AI, and this new stuff is only going to speed things up exponentially.


The suggested large player was Facebook and Facebook posts. Which trustworthy independent sources of authenticity do we have for that? I do not think those you mention reach inside their walled garden?


First, why would Facebook do that? What economic incentive would there ever be, that would outweigh the loss of trust and reputation hit that would ensue?

Second, people take screenshots of Facebook posts all the time. They're everywhere. If you suddenly have a ton of people with timestamped screenshots from their phones that show Facebook has changed content, that's exactly the kind of story journalists will pounce on and verify.

The idea that Facebook could or would engage in widespread manipulation of past content and not get caught is just not realistic.


> We've had photo and audio manipulation for many decades now.

We haven't been able to generate 1,000 different forged variants of the same speech in a day before.

> We have certain publications and sources we trust, whether they're the NYT or a respected industry blog.

We can't even be sure that most of these aren't changing old stories, unless we notice and check archive.org, and they haven't had them deleted from the archive. The NYT has blockchain verification, but the reason nobody else does is because no one else wants to. They want to be free to change old stories.


> but the reason nobody else does is because no one else wants to. They want to be free to change old stories.

You're wildly assuming a motive with zero evidence.

No, the reason companies aren't building blockchain verification of their stories is simply because it's expensive and complicated to do, for literally zero commercial benefit.

Archive.org already will prove any difference to you, and it's much easier to use/verify than any blockchain technology.


Yep, every time technology shifts, reputation systems shift in response.

This goes all the way back to yellow news with newspapers: https://en.wikipedia.org/wiki/Yellow_journalism


Most people these days interact with news through comments; if the comments look legit, a lot of people assume the source is legit. Imagine a world in which a fake video has the BBC logo on it and AI-generated comments act as if they are discussing the video while subtly manipulating: say 60% of the comments advocate a certain viewpoint and 40% are random memes, advocate against it, etc. The average person would easily be fooled.


You basically described Reddit. Don't even need an AI, all you need is moderator powers and a bunch of impressionable young people.


If I have a random picture, video, text - it's not easy at all to verify its authenticity. Hopefully a media organization has it, but even then are there any services I can use to validate? Definitely not family/personal media, any media that wasn't reported on by a large organization with the ability to manage large archives of data.

I'm saying this is going to become increasingly important, fast, and we may miss the window: soon, almost everything not properly indexed by a large media organization will be invalidated, because there will be no way to verify it.

I have a picture of Frank Sinatra at Disney World riding the tea cups. Who is the Frank Sinatra media authority that can tell me if this ever happened or not? A very small example to extrapolate from. It's going to get worse when everyone can create audio/video/pictures/text of anything they can dream.

The past may very well become a fictional dream, mythology, most of it impossible to verify.


> We've had photo and audio manipulation for many decades now. For a long time now, we've had to separate out what's credible from what's bullshit.

The difference is that the floodgates are being opened.


It doesn't matter though. Most of the internet is probably already SEO blogspam, just like spam e-mail already outweighs legitimate e-mail for a lot of (most?) people. But nobody cares, because it gets filtered out by the ways people actually navigate.

We have lots of tools to fight spam, and there's no reason to believe they won't continue to evolve and work well.


At a time when “people” seem easily manipulated and fully invested in believing their personal feeds of curated outrage. They often don't apply the screens/filters they should because of the apparent social proof, trust, and biases they have toward the content. Contemporary journalists hardly do any fact/source checks as it is, so they'll begin reporting on some of this, giving it further credibility, and it's just a downward spiral. So, more of the same, yay!


Seems like it might now become much easier to post a clip on YouTube that looks like an authentic BBC clip, logo and all. If generative AI gets that good, how will you be able to tell whether a particular piece of media comes from a trusted source?

Might not be possible on platforms - only if it's posted on a trusted domain.


Easy, is it on the official BBC YouTube channel or not?

That's the entire point of having trusted sources. Regular people can post whatever fake things they want on their own accounts; they can't post to the BBC's YouTube channel or to the NYT's website.


The past ended in 2022


Agree. Any video/image/text created post-2022 is now suspect of being AI generated (even this comment). And without any 'registering' of pre-2022 content, we can easily lose track and not really know what from pre-2022 is authentic or not.

Maybe it's not a big deal to 'lose' the past, maybe landfills will be mined for authentic content.


There should be a canonical copy of the 2022 internet that is verifiable. Archive.org is not enough


Or is the past endlessly rehashed with AI generated content?


This ^^


I've been wondering about this in relation to real video evidence (e.g. dashcam or CCTV) being refuted in court for an inability to show it's not deepfaked.


Even with digital signatures, there are limits to what we can really verify.

We'll likely be able to verify whether an entity is a real human, using some kind of "proof of humanity" system.

We will have cameras/mics with private keys built-in. The content can be signed as it's produced. But in that case, what's stopping me from recording a fake, say by pointing a signed camera at a screen?

Maybe it's a non-issue. We used text to record history and we've been able to manipulate that since, well, forever.
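To make that limitation concrete: a signature binds a tag to exact bytes, not to reality. Here's a toy stdlib-only sketch of the idea (HMAC as a stand-in; a real device scheme would use an asymmetric keypair so verifiers never hold the secret):

```python
import hashlib
import hmac

# Stand-in for a per-device secret provisioned at manufacture.
DEVICE_KEY = b"factory-provisioned-secret"

def sign_capture(media_bytes: bytes) -> str:
    """Camera side: tag the captured bytes as they're produced."""
    return hmac.new(DEVICE_KEY, media_bytes, hashlib.sha256).hexdigest()

def verify_capture(media_bytes: bytes, tag: str) -> bool:
    """Verifier side: does the tag match these exact bytes?"""
    return hmac.compare_digest(sign_capture(media_bytes), tag)

frame = b"\x00\x01fake-sensor-data"
tag = sign_capture(frame)
print(verify_capture(frame, tag))               # True: bytes untampered
print(verify_capture(frame + b"edit", tag))     # False: bytes changed
```

Verification only tells you the bytes left the device unmodified; it says nothing about whether the scene in front of the lens was staged.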


If you're watching a movie or TV show, a vast majority of the sounds you are hearing are not "real". Has that bothered you before?


That seems as pointless a question as suggesting that enjoying TV shows means you shouldn't care if everyone in your life constantly lies to you.


> the stuff i hear is real

Perhaps you meant 'are not from the actual source you think they are'?

*my favorite is always the nightclub scene that goes real quiet when the actors act using their voices (which are real, but may be dubbed in afterwards).


With 90% of human generated media content being forgettable within weeks of publication, and AI not yet capable of matching even average human content (much less pro level), it’ll be some time before we have to worry about AI overwhelming most media content and erasing the works of memorable human authors.


>and AI not yet capable of matching even average human content (much less pro level)

Yeah, this is not true. SOTA text and image generation are well above average human baselines. You can certainly generate professional-level art on Midjourney.


Commercial art and Art are not the same thing.


Unless you have a tool that can tell the difference, they are.


Yes, this is one of my concerns about all of this. The danger is real.


Wonder how far off the whole "generate music based on your existing music library" thing is going to be?

That'll make musicians happy with big tech as well, just like artists are. *sigh*


The Record labels are far, far more litigious than the art community.


They can't litigate a person doing this at home, and never redistributing.

I suppose they might try, anyway.


The RIAA pioneered copyright enforcement at the individual level back in the 2000s, they absolutely would try to sue downstream AudioCraft users.


They should start with AudioCraft itself; conceptually it's a derivative work, and it doesn't matter whether it's "open source" or not. Try throwing someone's sample into a song and publishing it saying "no copyright infringement intended and I totally don't make any money from it"... If it becomes popular, see how long it stays up before a DMCA takedown. And we know this dataset is already popular.


> and publish it

This is precisely the opposite of the context I was remarking on.


AudioCraft itself is published. That's the context I am remarking on.


[flagged]


Even before streaming, you never "owned" any music legally [1]; you merely owned a physical copy of a performance [3] of a song, which in no way automatically gives you the right to make derivative works [2].

Also, it doesn't really matter what the law says; the RIAA, in the last iteration, relied on the fact that the average person would rather pay a fine than pay for expensive lawyers to fight out the specifics in court.

It was always about disproportionate ability to bring resources against individual "offenders" to create fear among everyone to deter "undesirable" forms of copying, not necessarily what the legal protections were.

---

[1] Unless you specifically commissioned it under a contract which gave you the right

[2] See recent cases including those related to Kris Kashtanova and Andy Warhol.

[3] Not the song, just the performance, aka Taylor Swift version, for good explanation of how the rights are divvied up in the music industry a Planet Money series covers it well https://www.npr.org/sections/money/2022/10/29/1131927591/inf...


> that in no way gives you the right to make derivative works

True, but only because you have that right anyway. I can do anything I like with copyrighted content I legally possess, as long as I don't distribute the results of my efforts.


Establishing derivation is at the crux of all legal matters surrounding diffusion models. It has not yet been clearly established. If it is, then I'd agree with you. Until then, I think it's a bit more up in the air.

Also, IIRC, the RIAA did not bring many resources to bear against e.g. "home taping" itself, because they could essentially never know that it had occurred. The overwhelming majority of their efforts went into trying to take down people distributing multiple copies.

The Kashtanova case does not cover derivation in any real way, but is really about copyright attribution choices between human and software.

The Warhol case specifically tests a fair use claim, not a derivation claim.


The RIAA and others sued a lot of people in general, including bar owners for playing their songs. There was also some enforcement via private companies with three-strikes policies and so on, especially in Europe.

They went after ISPs, torrent sites which only hosted magnet links, and many others who really shouldn't have been sued.

The goal was to create a very hostile environment for downloading songs to protect their interests - “you wouldn’t download a car!”

It was never the goal, nor ever realistic, to actually pursue enforcement action against every offender; the idea was to change behavior through all the related actions.

They did end up changing behavior: people just didn't want the hassle, or the fear, so paying for streaming access had a stronger value proposition. It was not what the RIAA planned, but they benefit enormously from it today.


https://en.wikipedia.org/wiki/Audio_Home_Recording_Act

https://en.wikipedia.org/wiki/Home_Taping_Is_Killing_Music

Recording industries have fought end user reproduction often. They’ve fought sampling battles.

Go after the pocketbooks and go after the technology waves. If there’s a derivative argument they can make, they will.


They may have gotten the AHRA passed, but they essentially lost in every important way:

> "This exception was crucial in RIAA v. Diamond Multimedia Systems, Inc.,[14] the only case in which the AHRA's provisions have been examined by the federal courts. The RIAA filed suit to enjoin the manufacture and distribution of the Rio PMP300, one of the first portable MP3 players, because it did not include the SCMS copy protection required by the act, and Diamond did not intend to pay royalties. The 9th Circuit, affirming the earlier District Court ruling in favor of Diamond Multimedia,[15] ruled that the "digital music recording" for the purposes of the act was not intended to include songs fixed on computer hard drives. The court also held that the Rio was not a digital audio recording device for the purposes of the AHRA, because 1) the Rio reproduced files from computer hard drives, which were specifically exempted from the SCMS and Royalty payments under the act, 2) could not directly record from the radio or other transmissions. "

From the AHRA itself:

> No action may be brought under this title alleging infringement of copyright based on the manufacture, importation, or distribution of a digital audio recording device, a digital audio recording medium, an analog recording device, or an analog recording medium, or based on the noncommercial use by a consumer of such a device or medium for making digital musical recordings or analog musical recordings.

and from Wikipedia again:

> In regard to home taping, the provision broadly permits noncommercial, private recording to analog devices and media. However, it fails to resolve the home taping debate "conclusively," as it only permits noncommercial, private recording to digital devices and media when certain technology is used.

> Two reports by the House of Representatives characterize the provision as legalizing digital home copying to the same degree as analog. One states "in the case of home taping, the exemption protects all noncommercial copying by consumers of digital and analog recordings,"[22] and the other states "In short, the reported legislation [Section 1008] would clearly establish that consumers cannot be sued for making analog or digital audio copies for private noncommercial use."[23]

Similarly, language in the RIAA v. Diamond Multimedia decision suggests a broader reading of the Section 1008 exemptions, providing blanket protection for "all noncommercial copying by consumers of digital and analog musical recordings" and equating the spaceshifting of audio with the fair use protections afforded home video recordings in Sony v. Universal Studios:

>> In fact, the Rio's operation is entirely consistent with the Act's main purpose – the facilitation of personal use.


Legal acquisition does not matter for AI training. If training is fair use, then you can train on pirated material (e.g. OpenAI's GPT). If it's not fair use, then buying the material does not matter; you have to negotiate a specific license for AI training for each work in the training set, which is impractical at the scales most AI companies want to work at.


This seems to distort the issue a little bit.

If you purchase the music, you have a (sometimes explicit, sometimes implicit) license to do certain things with the music, entirely independent of any concept of "fair use".

The question is not "is training part of fair use?" but "is training part of, implicitly or explicitly, the rights I already have after purchase?"

Given that "training" can be done by simply playing the music in the presence of a computer with its microphone turned on, it's not clear how this plays out legally.


In the US, exceptions to copyright come across in two distinct bundles: first sale and fair use. They exist specifically because of the intersection between copyright law and two other principles of the US constitution:

- First sale: The Takings Clause prohibits government theft of private property without compensation. Because copyright owners are using a government-granted monopoly to enforce their rights, we have to bound those rights to avoid copyright owners being able to just come and take copies of books or music you've lawfully purchased.

- Fair use: The 1st Amendment prohibits government prohibitions on free speech. Because copyright owners are using a government-granted monopoly to enforce their rights, we have to bound those rights to avoid copyright owners being able to censor you.

If you hinge your argument on "I bought a copy", you're making a first sale argument.

Notably, first sale is limited to acts that do not create copies. This limit was established by the ReDigi case[0]. Copyright doesn't care about the total number of copies in circulation, it cares about the right to create more. So an AI training defense based on first sale grounds would fail because training unequivocally creates copies.

Fair use, on the contrary, does not care if you bought a copy of a work legally. It only cares about balancing your right to speech against the owners' right to a monopoly over theirs. And it has so far been far more resistant to creative industry attempts to limit exceptions to copyright - to the point where I would argue that "fair use" is an effective shorthand for any exception to copyright, including ones in countries that have no fair use doctrine and do not respect judicial precedent.

The courts won't care how the training comes about, just if the act of training an AI alone[1] would compete with licensing the images used in the training set data.

[0] https://en.wikipedia.org/wiki/Capitol_Records,_LLC_v._ReDigi....

[1] Notably, this is separate from the act of using the AI to generate new artistic works, which may be infringing


It hasn't been established yet that a diffusion-model generated work is a copy or a derivative of any particular element of the training set.


The people above are arguing about being caught, not legality.


Is training a "pirate model" something you'd reasonably be able to do at home though, given the compute requirements? The analogous "image generation at home" is only possible due to a for-profit entity with significant resources choosing to (a) play fast-and-loose with the provenance of their training set and (b) giving away the resulting model for free, if the open source community had to train their models from scratch then as best as I can tell they would still be stuck in the dark ages generating vague goopy abominations.


Currently, yes, available compute power @ home does indeed seem like a limitation. Whether that remains true going forward seems a little unclear to me.


You could take a model trained on CC content and then fine tune it on copyrighted material cheaply and quickly


Ugh, I dread having to listen to everyone's hyper-personal music because they swear up and down, to the point of tears, that "IT'S THE BEST SONG EVER CREATED! EVER!!!", while they constantly prod you to affirm how amazing the song is.

Bruh, music is subjective as hell, and I can already tell I hate this song.


Perhaps LoRA (Low-Rank Adaptation) training techniques could be used for these types of models, like they're currently being used with LLMs and latent text-to-image diffusion models.


Sadly looks unlikely if the base model wasn't trained on vocals.

> Mitigations: Vocals have been removed from the data source using corresponding tags, and then using a state-of-the-art music source separation method, namely using the open source Hybrid Transformer for Music Source Separation (HT-Demucs).

> Limitations: The model is not able to generate realistic vocals.

(https://github.com/facebookresearch/audiocraft/blob/main/mod...)

I suspect this was a combination of playing it safe and that the model isn't well architected to reproduce meaningful vocals.


Why not generate music you like directly? That wouldn't need you to upload your library and would have RLHF baked in.


Something like the algorithm TikTok uses. First probing by offering a variety of content that should match based on what little information you have on the user (ip location, locale, etc).

Then use the user’s action to iteratively refine your classification, until you end up with something tailor-made.


Uh, more like Reddit, with up- and downvotes plus how long you listen.


Anyone having listened to this MusicGen's output samples would surely answer "a million miles".

Seriously, you couldn't sell this output for a free mobile clicker game.


These models are going to end up being used for advertising. Soon pretty much every ad you see will be generative AI based. It makes A/B testing way easier as you no longer need a creative person to modify the ad or change something subtle about it. For example, the generative voice might change to a different speaker or something, and the AI can generate thousands of different voices to see which one is most effective.


We might even end up with the most effective ones being the weirdest, where people click through because the ad is so weird.


"Now, increasingly, we live in a world where more and more of these cultural artifacts will be coming from an alien intelligence. Very quickly we might reach a point when most of the stories, images, songs, TV shows, whatever are created by an alien intelligence.

And if we now find ourselves inside this kind of world of illusions created by an alien intelligence that we don’t understand, but it understands us, this is a kind of spiritual enslavement that we won’t be able to break out of because it understands us. It understands how to manipulate us, but we don’t understand what is behind this screen of stories and images and songs."

-Yuval Noah Harari


Maybe this us vs them mentality is the biggest bottleneck.

If instead you consider that this new form of 'alien' intelligence is actually a descendant of human intelligence, that we are raising a new species which will inherit what humans have built (ideally only the good parts) and then improve upon it further..

It may sound grandiose, but that perspective changes everything.


The demos are great. Could someone explain what’s in it for Meta open sourcing all these models?


A competitive open-source project basically destroys the pricing power of all closed-source alternatives.

If you're a company that wanted to integrate an LLM into your product, and the choice is between several equally good models but one is free and open source, which would you pick?

Aside from keeping competition at bay, this move also gives Meta leverage because ecosystems are now being built around their projects. If these models see wide-scale adoption they could later launch AudioCraft+ as a licensed version with some extra features for example.

Alternatively, they might offer support or hosting for their open source projects.

Right now though I think the primary benefit of these open sourced models is to attract talent. If Meta is seen as one of the leaders in AI then researchers will want to work for them simply for the prestige.

Arguably one of the reasons Meta has been behind so many awesome projects like PyTorch and React over the last decade was because they were seen as the cool place for recently graduated, but talented software engineers to work in ~2010.


They want to commoditize the offerings of OpenAI, Google, MS, and Apple. Also, they gain mindshare and goodwill after years of bad publicity. Some contributions back might help them improve the models for free.

If they just keep their models, people won't be interested and will build over ChatGPT or Bard.


Not a fan of Meta, but haven't they generally been pretty forward with open sourcing their tech?


Commoditize Your Complement?

https://gwern.net/complement


What is it a complement to though?


Meta has several of the biggest UGC platforms, and in this case the complement is content itself. Reels with autogenerated (and royalty free) background music is the obvious example but I'm sure there are more. Maybe creative for ads as well?


To Metaverse access. Filling the metaverse with engaging interactive 3D content is an insane job with 2020 technology. It requires a huge amount of skilled labor across a range of disciplines to create the 3D models, soundtracks, NPC dialog, visuals, etc. that make a compelling experience. By 2030 that may have been reduced to the point that everyone with creativity and Internet access can do it. Sure, most of it will be silly things, but so is social media today, and that doesn't make it any less of a commercial success. And there will be millions of semi-pro creators making the things with higher production value, as with videography today.


Content is a complement to social media.


In short term it's social media, because people will share whatever they generate on social media. But I don't think it's a very strong incentive to invest in AI for Meta.


If anything it makes them appear to be one of the best places to work at to do research. Could be them playing the long game.


The demos are, unsurprisingly, soulless muzak. This contributes nothing to our culture.


Was asking myself the same earlier, I'm sure it is largely to do with publicity and the fact that selling these services is not their core business. At the very least releasing this stuff probably won't damage their core business but will take the sheen off of some other big names.

I wondered though, generative AI is hurling us into a world where we'll need more mechanisms to sort real from fake, provenance will play a large part, and meta's platforms could be part of the answer. i.e. content linked to actual verifiable people.


Somewhat relevant, Yann LeCun insisted the research should be open sourced. At least in an academic sense.

He touches on it briefly in this podcast episode: https://www.therobotbrains.ai/who-is-yann-lecun


They will own the most popular open models, so they can dictate the direction of open-source AI.


They haven't open sourced much. Open models whose weights are under a restrictive non-commercial license is something, I guess.

They are trying to kill the market before they get left out.


The same move Microsoft made back in the '90s to kill Netscape: make your product the one available to the masses, and the next generation of users will be using your product.


I was just thinking how Google made Android free to check Microsoft. This is Meta checking Google.


Checking OpenAI. Google is still playing checkers.


I just can't get over how badly Google is doing. They have a ton of top researchers, papers, and money, just no good LLMs. It's like OpenAI was first to the punch, and everyone else just saw $$$. Meta was smart to go down this open-source road, as the masses will start training their llamas one way or another. Personally I believe the "intelligence" aspect will asymptote, so even having exclusive access to a "super AI" (i.e. a hypothetical 1T-parameter model like a GPT-5) won't be that much of a step ahead of the lesser AIs, and as soon as you grant access to the masses they will use transfer learning to make their "lesser" models better. AI applications, though, still need a lot of work. The models aren't smart or general-purpose enough to be useful to the average person out of the box.


The problem also is that Google is making a lot of grandiose announcements about tools and models that nobody can see or use. This is a serious credibility problem in the long term.


You can use some of them. They have an “AI powered” search (as if their previous search isn’t considered AI anymore). It’s an experiment you can turn on. For programming questions it’s not terrible.

That said, there are a ton of "look at this cool thing our research team did" announcements from Google that you never hear about again. They even built a music generator that was closed to the public until recently.

https://blog.google/technology/ai/musiclm-google-ai-test-kit...


The fact you believe, rightly or wrongly, that meta is ahead of google on ai explains why meta would open source this. It’s a good reputation to maintain.


Checking Google or OpenAI (or both?)


theoretically what's in it for them is that people will build content faster and with less barriers for eventual consumption on their platforms


If people love hanging out with chatgpt or bard, they won't be wasting their precious little eyeballs on FB/Insta


The difference between MBD-EnCodec and EnCodec is pretty interesting. MBD variant sounds more like a professional studio recording, while the EnCodec feels like a richer sound.

Curious if I’m alone in that.

(At the bottom https://audiocraft.metademolab.com/musicgen.html)

For what it’s worth though, the voice based examples sound dramatically better with MBD

https://audiocraft.metademolab.com/encodec.html


MBD definitely sounds like it was recorded in a dead room, whereas plain EnCodec sounds mixed but includes some artificial noise.


All very interesting, but how would a musician ever be interested in creating the result of "Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach"? This stuff will create a lot of Muzak for sure. Actually turning into anything useful for musicians? I honestly doubt it, and I'm happy if it stays that way.

Saying that engineers don't understand the arts is a bit of a trite generalization, but reading the way Meta markets these "music making" contraptions is really cringe inducing. Have you ever, at least, listened to some music?


Finally, a way to fulfill my childhood dream of composing a symphony of rubber ducks honking. Bach would be proud.

/edit On a more serious note: I can already see the 24/7 lofi girl streaming generated music. The sample[1] on lofi sounds pretty good.

[1]https://dl.fbaipublicfiles.com/audiocraft/webpage/public/ass... "Lofi slow bpm electro chill with organic samples"


I also like some of the generated examples.

Can I haz full version of Bach + `An energetic hip-hop music piece, with synth sounds and strong bass. There is a rhythmic hi-hat patten in the drums.` please?

(https://dl.fbaipublicfiles.com/audiocraft/webpage/public/ass...) ?


> Finally, a way to fulfill my childhood dream of composing a symphony of rubber ducks honking.

Samplers have been around since the 70s.


This is Spotify's route to profitability - the Netflix model of generating their own "content" (/music), and not having to pay the labels. Premium plans for us music nerds who want a human at the other end, regular plans for plebs who just want to fill the silence with something agreeable.


Although I think AI-generated and AI-augmented (using voice cloning, etc.) artists are a given, for Spotify to stop paying labels they'd have to be able to remove all non-Spotify content from their streaming catalog. That doesn't seem like a possibility in our lifetimes. (Also, Spotify hasn't even been able to build a sustainable business on podcasts, which they copy to their closed platform for free.)

It's an interesting thought experiment, though. I can imagine that "environmental audio" companies like Muzak have about 5 years left before they either adapt or die. What other kinds of companies are in trouble?


Their current pay structure is royalties, i.e. per listen. If they can route their audience to mostly AI generated content in time (say, 5-10 year transition), and it's just as good for most people, then they can negotiate much lower prices with the labels. We all grumble about Netflix being full of junk, but most of us are still subscribers, despite a sparse catalog of big name movies.


> then they can negotiate much lower prices with the labels

Or alternatively, if the labels are not stupid, they'll negotiate for a higher price per listen (or similar), as they are still as essential to the service as before.


I just ran all of the cited installation steps, which appear to have been successful... but I am now experiencing a profound sense of "now what?"

There don't appear to be any new CLI executables installed, and the documentation links to an API, but there are no clues on how to actually process a prompt.

What am I missing? Alternatively, I wouldn't mind using it in a Notebook but so far this thread doesn't link to anything so ambitious (yet?)


The main gradio app has been moved to the demos folder.

    python demos/musicgen_app.py
Otherwise you can check the jupyter notebooks in the same folder.


Thanks! This will be even more helpful if you could share a hint about where this was installed to.

I carefully went through the output generated by the "pip install -U audiocraft" command, and there were no clues provided.

Disclosure: I am not a Python developer, so I apologize if this is a master-of-the-obvious question for Python folks. However, if there was ever a scenario where a line or two of post-install notes would be useful, it's stuff like this.


You may have to clone the repository to get the demos folder. Otherwise it's perhaps somewhere depending on how you use python (global and often broken environment, virtual environments, conda hell, etc…).

I feel like Python folks are on average terrible at distributing software. So many projects have some python script to install the dependencies, still assume you use conda, or don't bother to specify the dependencies versions. Thankfully it's often the same patterns and after some time you understand what to do based on the error messages. But I wish they could use something like NPM or Cargo. Even something like Maven would be an improvement.


Tip: If you run `pip uninstall audiocraft` it should render an interactive confirmation prompt that explains which files it is about to remove, this should help to understand where the library was installed (of course you don't have to confirm the prompt, just press Control-C or answer "No" to the prompt once you have your answer).

Disregarding the tip above, determining where the library was installed requires a bit of context, for example your platform (Windows versus UNIX) and the fact that newer pip releases default to "pip install --user" when not running with super-user privileges, whereas older pip releases did not default to "pip install --user".

Assuming you are using Linux and using an up-to-date pip release and you ran the "pip install -U audiocraft" command without super-user privileges, then the library was most likely installed in ~/.local/lib/pythonX.Y/site-packages (where X.Y is the version of Python that was used by the pip command you ran).
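If it helps, here's a stdlib-only way to ask Python itself where things landed (the names here are all standard library, nothing audiocraft-specific):

```python
import importlib.util
import sysconfig

# Where "pip install" puts pure-Python packages for this interpreter:
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Where a specific installed package actually lives (None if it isn't
# importable in this environment, e.g. you installed it elsewhere):
spec = importlib.util.find_spec("audiocraft")
print(spec.origin if spec else "audiocraft is not importable here")
```

Running this with the same interpreter you used for pip will also catch the classic gotcha where `pip` and `python` point at two different environments.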


This is the default state of deep learning projects: everyone assumes only PhD researchers, who already know the whole toolchain, will ever try them. What's happened with llama and other LLMs, with codebases that actually work outright with one command once compiled, is a pretty big outlier.


You're not supposed to actually install it and use it, just comment on how cool and open Facebook is, especially in comparison to OpenAI. So, user error.


Right, it's not like anyone has operationalized Llama 2, or like there aren't hundreds of repos for inference servers and the like. /s


I just made a script that generates a two-minute-long classic radio show episode in the style of Johnny Dollar. I'm using ElevenLabs for the dialogue, so the AudioCraft element is definitely the weak point, but it's super neat that it's even possible currently.


Anyone know if there are ways, as-is, to speed this up on Apple Silicon?

This setup takes 5 minutes:

    - Mac Studio M1 Max 64GB memory
    - running musicgen_app.py
    - model: facebook/musicgen-medium
    - duration: 10s


I see from musicgen.py:

    if torch.cuda.device_count():
        device = 'cuda'
    else:
        device = 'cpu'

So pytorch will fall back to CPU on Apple Silicon. Ideally it would use Metal for acceleration (MPS) instead of plain 'cpu', but if you replace 'cpu' with 'mps' you'll probably run into a few bugs due to various autocast errors and, I think, some other incompatibilities with PyTorch 2.0.

At least that is what I ran into last time I tried to speed this up on an M1. It's possible there are fixes.
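For reference, the shape of the patch people try looks roughly like this (a sketch, untested against audiocraft itself; the torch module is passed in as a parameter only so the logic can be checked without a GPU, in real code you'd just `import torch` and call `pick_device(torch)`):

```python
def pick_device(torch_module) -> str:
    """Prefer CUDA, then Apple's Metal backend (MPS), then plain CPU."""
    if torch_module.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists on builds with Metal support,
    # hence the defensive getattr.
    mps = getattr(torch_module.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

When individual ops are missing on MPS, setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` tells PyTorch to route those ops back to the CPU, which sometimes gets a model limping along at reduced speed, though it won't fix the autocast errors mentioned above.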


Same here (mps errors). I tried after the initial musicgen release.

I’ll have to check again, but AFAICT my hardware wasn’t getting saturated, so maybe there’s headroom for Mac CPU performance. And of course in the meantime I’ll be refreshing the ggml GitHub every day.


I think it is a mistake to acquiesce and let copyright owners bully AI model trainers over model data inputs. The endgame of this practice is a "pay per thought" society. This is separate from speculation regarding machine sentience: as interfaces improve, AI models will serve more and more as direct extensions of the human mind. While copyright duration is a separate issue, and the current durations are appalling, copyright enforcement should focus strictly upon the output of models and how it is utilized. There are so many melodies in my head that I have not paid for and never will (some of which I would love to remove). AI models also need the same unfettered access to the commons as we have. Infringement occurs on the outputs; applying copyright restrictions to model inputs is a violation of Fair Use and a definite money grab.


Generative AI for images and music produces pixels and waveform data, respectively. I wonder if there is research into "procedural" data; in this case, that would be SVG elements and MIDI data, respectively.

I know training data would be much harder to get (notwithstanding legal ramifications), but I think creating structured, procedural data would be much more interesting than just the final, "raw" output!


I've thought about this too. The instruments themselves can be synthesized for extremely high quality audio. All we need is the musical structure - the MIDI.
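As a toy illustration of what "procedural" training data could look like, here's a sketch (a hypothetical representation, not from any existing model) where notes are (pitch, start, duration) tuples flattened into time-ordered, MIDI-style events:

```python
# Notes as (MIDI pitch, start beat, duration in beats) rather than waveforms.
melody = [(60, 0, 1), (64, 1, 1), (67, 2, 2)]  # a C major arpeggio

def to_events(notes, ticks_per_beat=480):
    """Flatten notes into time-ordered note_on/note_off events."""
    events = []
    for pitch, start, dur in notes:
        events.append((start * ticks_per_beat, 'note_on', pitch))
        events.append(((start + dur) * ticks_per_beat, 'note_off', pitch))
    return sorted(events)

events = to_events(melody)
```

A model trained on sequences like `events` would output structure you could render through any synthesizer, instead of fixed audio.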


Is there a place where I can check how it works? Like give it my input and get output audio?


The model cards from the repo[0] link to Colab and HF spaces.

[0]: https://github.com/facebookresearch/audiocraft#models


Audiocraft+ (don't forget the plus) on GitHub has a Colab notebook based on AudioCraft, and a web UI. It is pretty awesome!


Maybe this will finally lead to high-quality open-weights solution for TTS generation.


Get ready for the next generation of Muzak


AudioGen seems really fascinating. I have some dumb questions.

While the datasets used for training AudioGen aren't available, is there any kind of list where one can review the tags or descriptions of the sounds on which the model was trained? Otherwise how do you know what kinds of sounds you can reasonably expect AudioGen to be capable of generating? And what happens if you request a sound which is too obscure or something not found in the dataset?

What are AudioGen's capabilities regarding spatial positioning? First example: can it generate a siren that starts in front and moves left to right and complete a full circle around the listener? Second example: can it do the same siren but on the Y axis, so it start at the front, it goes over the listener and then it goes under them to complete the circle?



Incoming strike by American Federation of Musicians in 3, 2, 1....

How many jobs would this thing take away? One of the biggest time sinks in any video production is post-production audio, including background music, Foley, etc. This will automate almost all of it!


This would take away all the jobs producing such low-quality, lo-fi, artifact-laden background music... if any existed.


"generating new music in the style of existing music" will probably be a huge field soon. I can't wait for it to happen, it's a low-cost way of producing even more music to listen to.


> I can't wait for it to happen, it's a low-cost way of producing even more music to listen to.

I can't really understand this. I'm a DJ and a huge music nerd, and I spend a lot of time every week discovering new music from the past 100 years and all over the world, and I'm constantly struck by _how much of it there is_. I've spent weeks just digging through psych-funk records from West Africa from the 1970s.

How can you have the impression we're so desperate for more music that we need computer programs to generate it for us?


Music is self-expression. I don’t always identify entirely with others. I always identify with my self. Having music generated for you on such a personalized level is an attractive prospect.

I don’t think this replaces “100% organic, human-made” music, though. I think there’ll always be a reason to listen to music made by other people. But I think this changes the landscape of how and why people create music to begin with. It certainly will devalue existing music, since everyone has something they may prefer that they can generate instantly.

I think generative AI is a terrible technology for artists who want to make money from their art, but in my personal opinion, I strive for a world where art isn’t a transaction, but a gift of human expression and connection. A world where art is appreciated for the emotion, stories, and ideas it conveys rather than the monetary value it holds. Generative AI might disrupt the traditional economic models in the art world, but it also opens up new opportunities for creative exploration and personal expression. It’s a challenging evolution, but one that could potentially democratize art, making it more accessible and personal than ever before! Bring on the Renaissance: Part 2!


> I strive for a world where art isn’t a transaction, but a gift of human expression and connection. A world where art is appreciated for the emotion, stories, and ideas it conveys rather than the monetary value it holds.

In a world where nobody is compensated for their art, the only people making art will be the ones privileged enough to have the means to do so for free. I don't see how this leads to "Renaissance: Part 2."


it can't be a "gift of human expression and connection" if A) a machine creates it and B) nobody but you ever hears it

this isn't democratizing art, and i would argue it has nothing to do with art. it is giving us an endless faucet of content, but not art.


I don’t agree with your definition of music. For me, as both a musician and a listener, music is communication between human beings via harmonic carrier waves. Using a machine to make word salad copies of existing communiques is literally just nonsense to me


What's self-expressive about an algorithm that generates songs?

Recorded music is the worst thing that happened to music.


if you want to express yourself musically, then learn how to play an instrument and compose music


> I can't really understand this

I came to understand that a very large portion of the population just wants content, any type, any quality, to fill the void. They'll consume anything as long as it's new. Content to fill the empty vessels we became. Just look around: mainstream music, movies, podcasts, news. It's mostly mediocre, but it goes real fast, and you get new mediocrity delivered every day.


There is a lot of human music for sure, great music from all eras, but just the other day I generated a song that was pure crystal harp. So, how many crystal harp songs are out there? 1,000 all in all? 10,000 maybe? Now I can generate a thousand crystal harp songs per day.


What if I generated a song of 12 billion farts? How many asses are out there, 7 billion all in all? Now I can generate a billion fart songs per day.


Yes but it's not really fungible. My favourite artist is Fats Waller and they don't make anything like that anymore. Most people are only interested in some category of music, not all of it.


Musician here. While I agree with you that there is a nearly endless heap of music to dig through, I think it's interesting to think about the possibility of hearing genre crossovers and styles that don't yet exist.

As an aside, a lot of musicians seem to dislike this kind of technology, but I never saw music as a competition. I don't care if some inexperienced kid is generating bangers from his bedroom even though he can't play a single instrument. It's just something else to listen to. I write music for me.


It’s the same reason we have 250 Marvel movies that all tell the same story. People want the same, but different. They don’t give a damn about human creativity for the most part.


Frank Sinatra sings Lil Jon's "Get Low" - https://youtu.be/7zoQeH2wQFM


Nice results so far. "Perfect for the beach" is a very funny description of music, because it has nothing to do with the acoustic qualities, so consider these descriptions to be anthropocentric! (As if they could be anything else) It is less about describing the actual sounds you want and more about describing the quality or vibe of the atmosphere. This is markedly different than incremental composition, maybe we can call it "composition by collage." Puts on COLLAGE shirt like in Animal House


I'm looking forward to playing with the M1 Mac apps/cli-tools that will probably come out for this in the next week or so! Being able to run this stuff locally is a lot of fun.


Are the M1 Macs capable enough? I'm eyeing an upgrade in the coming months and I'm curious if a MacBook would be suitable.


I've run Stable Diffusion locally (both from the cli and later using GUI wrappers) and that used my GPUs, I've also run Llama locally but I believe that was on the CPU (I used both llama.cpp, cli, and Ollama, gui). So to sum it up: yes? Or at least it's good enough for me.


Great thanks!


As an amateur musician I’m wondering if there are any of these audio generators that you can give a tune or chord progression to riff on. ABC format maybe? There are lots of folk tunes on thesession.org.

Could you generate a rhythm track? Ideally you could make songs one track at a time, by giving it a mix of the previous tracks and asking it to make another track for an instrument. Or, give it a track and ask it to do some kind of effect on it.

Another interesting use might be generating sound samples for a sampled instrument.


If you mean giving it a ~30-second source melody and extending that melody into a full song: yes, MusicGen can do that. There are two ways to extend a song based on a melody: 1) give a sample and continue the song from that sample as closely as possible, or 2) give a melody as an inspiration.

They both work with varying degrees of success. The issues and discussions sections of the audiocraft repo on GitHub have a lot of questions answered.


Evidence? None of the demos suggest that is true.


Did they change the base model? If not, then audiocraft_plus, which is based on audiocraft, creates music close to 5 minutes in length.

I don't know if audiocraft_plus incorporates all three components of the release: MusicGen, AudioGen, and EnCodec. It uses MusicGen for sure, all four models: small, medium, large, and melody.

https://github.com/GrandaddyShmax/audiocraft_plus


But is that "extension" to 5 minutes more than just a repeat of the e.g. 15 seconds heard in the demos?


Yes, of course it is more than that. You can create full songs several minutes long. I have published 20 songs on YouTube of 2-3 minutes each. Listen to this song, the best I have made so far [1]. It doesn't repeat at all!

I haven't looked closely at this release; is the audiocraft page on GitHub different from facebookresearch/audiocraft? The other two components may be new, AudioGen and EnCodec, but I was under the impression that they changed the license to full open source, and that was that.

https://www.youtube.com/watch?v=UL1KmrHMjcM


Thanks. What I hear there is 19s of material on repeat, with the parallel parts variously brought in and out, and no other variation except in decorative detail.

So yes, not a literal repeat, but no, not enough to rate as a "full song", IMO.

Plus, note that it is not actually a song at all. Singing would particularly expose its repeat-and-vary trick.


Quite impressive, although if these are the cherry-picked examples, the average output must be pretty weak! Nothing catchy about most of these examples, and the reggae one is pretty lame.


I am curious whether it can generate audio for a specific country. For example, the siren sound in the sample doesn't sound like a siren I would recognise. Sounds like an American one?


I tried it out (american, british, korean, italian, japanese) and couldn't really get any control. Sometimes the american siren would sound different, but asking for a siren of a specific country would just give the american sound. Maybe better prompting would help. I used "isolated american ambulance siren no traffic".


That's an interesting question.

What about the ring tone, busy tone, or disconnected tone for any country over time? 2600 vibes (pun intended).



And https://audiocraft.metademolab.com/musicgen.html

The samples included in the press release are quite impressive to my ears, but the other samples (especially from AudioGen) have a hint of artificiality.

As usual the music is quite repetitive, but I'm looking forward to tools that simplify changing the prompt whilst it generates over a window. I can only imagine the consequences for royalty free music.

Edit: the "Text-to-music generation with diffusion-based EnCodec" samples are quite impressive.


Can I generate a reading by someone if I have a lot of their voice samples with this? Or is there a better tool for doing such a thing?


ElevenLabs and a dozen others already do that.


Here's a different question: Can you use the audio output this produces for anything else other than "research purposes"?


You can, as long as it's not commercial. It's a broad definition, but a good rule of thumb is whether you're directly making money from the generated audio. They may still come for you if you're making money indirectly, so consult a lawyer.


I can see some fantastic uses for this in generating complex acoustic environments to layer over TTS or real recordings for speech-to-text model training. I wonder if that is occupying some kind of gray-area. For example you have 1000hrs of clean speech from the librispeech corpus. It would be trivial to use this tool and available weights to generate background noise, environmental noise and the like, and then layer this with the clean speech to cheaply train a much more robust model. The environmental audio you create would never be directly shared or sold, but it would impact the overall quality of the STT model that you train from the combined results.
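The mixing step itself is simple. A minimal sketch (pure Python, not from any particular toolkit) that scales generated noise to a target signal-to-noise ratio before adding it to clean speech:

```python
import math

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is snr_db, then add."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy example: mix a "speech" signal with "noise" at 0 dB SNR.
mixed = mix_at_snr([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0], snr_db=0)
```

Sweeping `snr_db` over a range per training example is a common augmentation recipe for robust STT.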


I wish people made unconditional predictive models for music instead of text-to-music ones. Would be so cool to give an input 'inspiration' track that it 'riffs' a continuation to. That's usually what I want, just continue this track it's too short that's what I want to hear more of. (That said this is super cool though.)


Theoretically this is very possible using their techniques. They tokenize the audio and learn next tokens autoregressively. Instead of text tokens -> audio tokens as input, just tokenize a prior song and continue it.
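To make that concrete, here's a toy, purely illustrative sketch of "tokenize a prior song, then continue it autoregressively", with a bigram table standing in for the transformer and small integers standing in for codebook indices (nothing here is AudioCraft's actual API):

```python
import random
from collections import defaultdict

def train_bigram(tokens):
    """Record, for each token, which tokens followed it in the training data."""
    table = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        table[a].append(b)
    return table

def continue_sequence(table, prompt, n, seed=0):
    """Autoregressively sample n continuation tokens after the prompt."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n):
        choices = table.get(out[-1])
        if not choices:
            break
        out.append(rng.choice(choices))
    return out

song = [1, 2, 3, 1, 2, 3, 1, 2]  # stand-in for a tokenized prior track
table = train_bigram(song)
cont = continue_sequence(table, song, 4)  # -> [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```

A real model would condition on a long context window rather than one token, but the generation loop has the same shape.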


The Encodec project's focus on AI-generated sound effects offers practical applications across various industries, from gaming to film production. It's exciting to think about the possibilities of AI enhancing audio design and creating immersive auditory experiences.


Does anyone else hear a kind of background static in these samples? It almost sounds like part of the track is more compressed in terms of dynamic range than other parts, which doesn't make any sense to me. I'm trying to decide if this is my own confirmation bias at work or not.


We built a Mac/Windows app around the original MusicGen so people can experiment with it on their own machine with a simple UI (https://samplab.com/text-to-sample).


A pretty neat background noise generator.

"From text to audio with ease"

I hoped for a second we would get a good-quality model for text-to-speech - damn, I guess it's back to brute-forcing bark.ai or waiting for Tortoise (or more realistically, just paying ElevenLabs).


This is great, I've been wanting sound effect generation for years. I spent a lot of time trying to get WaveNet working well, eventually just dropped the project after mediocre results. With AudioGen I'm generating a sample in less than a second.


The license of the model weights is CC-BY-NC, which is not an open source license.

The code is MIT, though.


It's unlikely that model weights can be copyrighted, as they're the result of an automatic process.


> It’s unlikely that model weights can be copyrighted, as they’re the result of an automatic process.

If they can’t for that reason alone, then the model is a mechanical copy of the training set, which may be subject to a (compilation) copyright, and a mechanical copy of a copyright-protected work is still subject to the copyright of the thing of which it is a copy.

OTOH, the choices made beyond the training set and algorithm in any particular training may be sufficient creative input to make it a distinct work with its own copyright, or there may be some other basis for them not being copyright protected. But the mechanical process one alone just moves the point of copyright on the outcome, it doesn’t eliminate it.


That isn't necessarily true. You're saying that the model weights would be a derivative work. Derivative work is insufficiently transformative to fully distinguish the output from the input.

In this instance, it's very likely that the process is sufficiently transformative. A set of model weights look nothing like the Mona Lisa, nor can they be directly transformed back into it. What it is NOT is the product of a creative process on the part of a human, and is thus ineligible for copyright.

It is as though we are able to distill meaning using an automatic process. Copyright doesn't protect meaning, only expression, and it only protects expressions that were generated by humans.

The network itself might be patentable if it isn't obvious.

If a network was copyrighted and it was found that the function of the network was inseparable from its expression, it would no longer be eligible for copyright on that basis.

I think people often misunderstand the application of copyright to GPL code in this way.

Now, will any of this stop people from CLAIMING copyright? No. It'll have to be fought out in courts.


Oh my god, some of these tracks actually SLAP.

Like for real.

The last bastion of human creativity is about to be defeated.


Which ones slap? I want them to, but what I'm hearing is only OK. I think this could generate some interesting starting points for me when I'm stuck, though.


It will be a licensing nightmare, just like Llama1 was.

But clever devs can study it to make better software and pressure them for a better license on the next release. Worked for LLaMA 2


https://www.audiogen.co/ related? unrelated? big fight coming up?


Wouldn't Pandora's possibly vast library associating textual descriptions with music be the ideal training data for something like this?


Yes... except that Pandora's library does not include the music.


Diffusion models are now SOTA in audio and image generation. Has anyone given them a shot on text?

Audio is more similar to language than images because of its stronger time dependency.

The paper says the critical step in making the diffusion model work for audio was splitting the frequency bands and applying diffusion separately to each band (because the full-band model had limitations due to poor modeling of correlations between low-frequency and high-frequency features).

I think something could be done on text side as well.


There are two problems with this. Diffusion models work on a single rule of thumb: if you keep adding small, noisy Gaussian steps to a "nice" distribution many times, you end up with a standard Gaussian.

So, for text: a) what is the equivalent of a small, noisy step? and b) what is the equivalent of a standard Gaussian in language space?

If you can solve a and b, you can make diffusion work for text, but there hasn't been any significant progress there afaik.
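The first half of that rule of thumb is easy to see numerically. Below is a toy sketch (pure Python, illustrative values) of the forward process on a single continuous scalar: whatever value you start from, repeated Gaussian mixing drives it toward a standard normal, and this is exactly the step with no obvious analogue for discrete text tokens.

```python
import math
import random

# Forward diffusion on one continuous value: each step shrinks the signal
# slightly and mixes in fresh Gaussian noise. After many steps the result
# is approximately a standard normal draw, regardless of the start value.
def forward_step(x, beta, rng):
    return math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0, 1)

rng = random.Random(0)
x = 5.0  # a "data point" far from zero
for _ in range(1000):
    x = forward_step(x, beta=0.02, rng=rng)

# The surviving contribution of the original 5.0 is sqrt(0.98)**1000 * 5,
# which is on the order of 1e-4: x is now essentially pure noise. Discrete
# tokens admit no comparable "shrink a little, add a little noise" step.
signal_left = math.sqrt(0.98) ** 1000 * 5
```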


Is there a way to try this out? I didn't see one, but didn't look too hard.


Yes. Installation instructions on the front of the repo, then click on the model readme for sample getting started code (10 lines of python and you get output.)


How does meta plan to make money from this open source?


Would really be interested in running this locally.


How many parameters do these models have?


Does this model help with TTS (text-to-speech)? Badly needed; the only free options right now are Bark and Tortoise TTS.


Coqui-TTS with VCTK/VITS is very good right now. Not as good as ElevenLabs or Coqui Studio, but for fast open TTS it's pretty good, in case you're not familiar with it.

It will be great when there's eventually something open that competes with the closed models out there.


Excellent, I will take a look into this.


"What a time to be alive!"


The fact that it generates a song for the prompt "Earthy tones, environmentally conscious ... organic instrumentation" goes a long way to proving that English words no longer mean anything particularly.


I've played guitar 25 years, and it's funny how the music community has been using all kind of words to describe music or tone. Describing certain tone as "hairy", "mushy", "wooly", "airy", "buttery", etc. is just very common.


Sure, there's jargon. But these words don't describe the music; they're words associated with the kinds of people who would listen to it (according to the biases of the language model). As a description of the music, it's meaningless. If a person were asked to name some "environmentally conscious" music, they could just as easily veer over to hardcore straight edge.


It works if you don't need a specific type of music but just something to fit a vibe. For example in an indie game or film project. Maybe level 3 is a jungle area with a plant based species of NPCs and the music should be "Earthy tones, environmentally conscious ... organic instrumentation"


That sort of presumes those words had any effect on the output.

We might know more had it generated a song as you said, but in fact it generated only an instrumental.


I think "song" should go in quotes


oh no, more muzak!


For my two cents; the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

It is not to "own more rights" to shit. Sorry, but no one actually owns what they create, that's the point of creation. And if you take issue with the term create, that only reinforces my point. We're all influence machines, input and output, the future should not be about preserving some peoples rights to limit our collective advancements over their personal wants. Tough shit.


> the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

Agreed!

> the future should not be about preserving some peoples rights to limit our collective advancements over their personal wants.

Systems that don't allow people to extract value from the hard work they put into collective advancement do not seem to lead to collective advancement over time and at scale. Incentives matter. No one's going to spend all day making candy and put it in the "free candy" bowl when that one asshole kid down the street just takes all of the candy out of the bowl every single day.

At small scales (i.e. relatively few participants with a relatively high number of interactions between them) then informal systems of reciprocity and reputation are sufficient to disincentivize bad actors.

At large scales where many interactions are one-off or anonymous, you need other incentives for good-faith participation. There's a reason you don't need a bouncer when you have a few friends over for drinks, but you do if you open a bar.


> Systems that don't allow people to extract value from the hard work they put into collective advancement [...] one's going to spend all day making candy and put it in the "free candy" bowl

On the other hand this is what researchers do all day every day. PhDs and professors work for the common good and get barely any pay in return. Maybe the future model in art and music is more like the academic researcher.


PhDs and professors are paid a living wage (though less so over time as federal funding for higher institutions has dwindled).

Academia is a carefully constructed system whose incentive structure is based on highly visible explicitly measured citations and reputation.

People aren't generally just trying to maximize wealth. They're trying to maximize their sense of personal value, which tends to be a combination of wealth, autonomy, and social prestige. Academics (and some creative fields) tend to be biased towards those who prioritize prestige over wealth.


No it's not-- researchers generally get paychecks. Even if they're small, they can pay for their housing and buy their kid food.

Artists don't see a single red cent from their work being sucked up into some AI content blender. Their work is being taken and used-- often in service of others making a profit-- and they receive nothing. Not even credit.

Edit: Well, they don't receive nothing-- they get a bunch of people telling them they're selfish jerks for wanting to support themselves with their work.


The majority of artists never receive a single red cent from the humans who consume their work.

This is how it has always been, and fundamental to the economics of art. Things people are willing to do regardless of financial compensation rarely pay well.


Putting commercial artists, aspiring fine artists, and hobby artists in the same bin doesn't make sense. There are a ton of career commercial artists that make money solely off of their work. If you think there are more aspiring career fine artists that don't end up making it than career commercial artists, you're wrong. They're not even in the same business.


Did you notice how you called all of them artists?


You're not serious, right? How about sandwich artists? Custodial artists? Con artists? Are Social engineers, microsoft certified systems engineers, cisco certified security engineers, mechanical engineers, technical support engineers, train engineers, and agricultural engineers all functionally and economically equivalent? Are professional cooks and home cooks economically equivalent because they're both cooks?


Nobody is necessarily owed money for creating art, regardless of whether it was done by hand or with machines (unless they were specifically commissioned to create the work). The economics of art are subject to market forces, and they always have been - long before AI became an additional factor (among many others) in why an artist may not earn a living from their art.

Deviantart in the 00s was a huge repository of art people were mostly making for free. Some people got lucky and turned that into a full time occupation, but the vast majority didn't.


Great non-answer.

None of the people whose art was sucked up by these machines had any idea that it would be integrated into for-profit tools and used against them in the marketplace, and almost certainly would not have consented if they had known. The ones with copyrighted images didn't even give legal consent. The fact that Getty's logo showed up in red-carpet images is a symptom of a problem that obviously went well beyond Getty, but there's no way an independent artist could prove it.

Furthermore, if you think artists + luck = commercial art, you're completely 100% wrong. Most art school graduates don't go into fine art for self-expression with some lucky individuals matriculating into careers; they go for job training. Go look at the degree programs for any art school: almost all of them translate into a full-time commercial career immediately out of school, the same way a STEM degree does. Concept artists, illustrators, character artists, environment artists, graphic designers, VFX artists, animators, cinematographers, choreographers, commercial musicians and composers, photographers... these people don't just spring up out of the fine art world. That is their career. I know because I am one.

Your ethics-dodging free market tech libertarian garbage holds no sway with me, so you might as well just save yourself the keystrokes.


The vast majority of the world's population doesn't have the privilege of going to art schools and training for years to render their ideas.

AI allows more people to be artists without the skill barrier, and that is a social good.

I'm sorry that the mechanistic portions of your art career are rapidly losing economic value, but I think free tools like StableDiffusion (and the better tools that are coming in the future) should be available to every child (and adult) in the world. And the world will be a better place as a result.


Haha... Nice ham-fisted, non-sequitur attempt to make the artists the privileged ones here. Who can and can't go to art school is irrelevant. My example clearly showed that your understanding of the "economy of art" is not even close to representative. Commercial artists operate at all levels of society, all over the world, and have for a really long time. Nobody making beautiful hand-carved and painted signs for local businesses started out as an independent fine artist using signs as a medium, hoping for their big break. They learned the arts of carving, painting, lettering, illustration, and so forth, plus the craft of making and finishing durable outdoor wooden structures, and started a career.

The utilitarian argument is only philosophically defensible in the streetcar scenario; when people are deliberately pulling the levers unprompted, and it could have been done ethically but wasn't because that was just too darned inconvenient and/or expensive, the greater-good argument doesn't work. It's the same argument people used to defend the Tuskegee experiment and its ilk... and Roman public slaves. If we're willingly throwing people into the spinning wheels of progress because it will be wonderfully convenient and neat for non-artists to use other people's skills instead, there is just no ethical defense. Knocking the stool out from under someone using a tool they built, and that you didn't even have permission to use, is just not ethical. You and many others end up falling back on right-wing platitudes about the free market.

I'm actually a technical artist; my skills are now dramatically more valuable. That doesn't make it any more ethical, even if it works in my favor.

I was a developer for 10 years, and have seen this unadulterated hubris in nearly every group of developers I've ever encountered: when you find yourself explaining someone's job and industry to them, you might want to stop and ask yourself... "Am I pulling a Dunning-Kruger right now?"


> non-artists to use other people's skills instead

Here is where I think you're wrong: Everyone is a potential artist.

Many never have the opportunity because of economic reasons: some people don't have the time to cultivate the mechanical skills necessary to render their vision.

All artists pull from the work of others constantly. That is 'using other people's skills'. It is absolutely privileged to have the opportunity to devote time cultivating art production skills (which is usually done by studying and replicating the process of the artists that came before them).

If someone is born to a single mother, grows up having to raise their siblings, then has to immediately start working to help support their family, they don't have the privilege or time to explore art production skills. Technology has been making art production more accessible for hundreds of years, and that's a clear net good. I want everyone to be able to make art if they want to.

It up to the artist if they want to manifest their vision without creating the pigments and brushes themselves, or using a drawing tablet, or using generative AI tools - it all can be art that is meaningful to the person who creates it and more art in the world is a good thing.

Commercial art isn't a very good representation of the artist's soul: they are constrained by market forces or their patron rather than producing what their own heart desires. We should all be happy to sacrifice commercial art jobs if it means that every person in the world gains the ability to render the art that comes from their soul.


So, in summary, no professional artist's socioeconomic wellbeing matters because all artists are really the same... and everybody is an artist really, and not all artists start with the same socioeconomic advantages so this is really about equality... but at the same time, it's about "sorry, that's the market and you lost" capitalism, and blah blah blah.

Being a professional artist isn't fucking magic. It's not winning the lottery or even getting drafted for a pro sports team. It's a large group of regular fucking careers just like any other, but creativity makes up a larger percentage of their professional cognitive toolkit. I know a workaday oil painter who's neither well-known nor rich but went to college to learn how to be an oil painter, and that's how he comfortably pays his bills. He paints seaside landscapes, wealthy people buy them to put in their vacation homes, and that's how he makes his money. He was in college right alongside graphic designers, animators, architects, product designers, user interface designers, and all manner of other professional artists. Just like any other non-licensed white collar profession, people who didn't go to school are in the business, but it's harder. I also know a professional comic book artist, many animators, sculptors who work in product development, and plenty of other professional artists who built normal careers like any other white collar professional. The fact that not everybody can go to art school to build an art career doesn't change the disposition of professional artists any more than not everybody being able to afford to go to medical school affects the disposition of doctors, and having "the soul of a healer" doesn't really enter the goddamned equation, does it?

Whether you're being deliberately obtuse or are painfully ignorant about something you've got a lot of strong, baseless opinions about, you're obviously not going to cut the bullshit and be honest with yourself. The reason you have to delve into all of these pseudo-philosophical mental gymnastics is because you're wrong, but you really don't want to be, so you're trying to construct a reality in which you're right. That's not how reality works. Using people's work without their permission to make a for-profit system to compete with them in their professional marketplace is not moral no matter how big of a castle of bullshit logic you build around it.

Bye bye. I'm going to let you hang out in your little land of make-believe by yourself.


Interesting discussion. I notice a lot of artists on HN get riled up particularly when AI media generation is brought up, while coders get riled up when AI code generation is brought up, while each side dismisses the other. Fundamentally, this shows me that it's not really about the morals at all but the economics of losing one's livelihood. Fortunately for one and unfortunately for the other, technology will continue to march on, it seems.


> it's not really about the morals at all but the economics of losing one's livelihood

That's a false dichotomy. Just because the economic issue and cultural impact are separate doesn't mean they're mutually exclusive. If a mugger stopped taking people's money and instead just walked around making people afraid for their lives by intimidating them or beating them up, that would still be immoral.


In your point of view it's immoral, not from everyone's, it seems. Either way, technology still arrives.


> Fundamentally, this shows me that it's not really about the morals at all but the economics of losing one's livelihood.

Whether people consider it immoral has no bearing on whether or not the cultural and economic issues are mutually exclusive-- they're not. You can't say that the argument is only about the economic issue simply because there is an important economic argument. It just doesn't make sense.


yeah, but they also get to work on what they love as opposed to whatever the corporate interest currently is. it's rare you get paid well for doing what you love i.e. music, teaching, designing hand bags etc


Well, researchers usually have to get their own grants and must thus work on whatever various funding sources deem worthy. Further, academic positions typically come with duties that researchers may not like - administration, reporting, teaching, etc.


"Maybe the future model in business and sales is more like the academic researcher" funny how nobody ever suggests that.


To the contrary, the future of the academic researcher is business and sales.


> No one's going to spend all day making candy and put it in the "free candy" bowl when that one asshole kid down the street just takes all of the candy out of the bowl every single day.

Software is an infinite candy bowl. Taking candy out of the bowl does not take any candy from the person who made it.

Imagine if this was the physical world, and you had a machine that could end world hunger. You could copy food like you could copy and paste information on a computer. Imagine someone who would keep that machine to themself, out of a sense of entitlement to make a few bucks. Any person with any sense of morality can see the obvious problem with that.


> Software is an infinite candy bowl.

More in theory than in practice. Ask any open-source maintainer how much running a popular project is unlike putting out an infinite candy bowl and then going on with your life.


> Software is an infinite candy bowl. Taking candy out of the bowl does not take any candy from the person who made it.

Software is not a finite candy bowl. It is also not an infinite candy bowl. It's not like physical goods at all, not even like physical goods that can be magically cloned. It's just different, entirely.

The incentive and value structures around data creation and use just can't be directly mapped to physical goods. You have to look at them as they actually are and understand them directly, not by way of analogies.

Why do people make software and give it out for free? Is it purely from the joy of creation? Sure, that's part of it. The desire to make the world better? Probably some of that too. Are those forces enough to explain all open source contribution?

Definitely not. Here's one quick way to tell: Ask how many open source maintainers would be happy if someone else were to clone their open source project, rename it, claim that they had invented it, and have that clone completely overshadow and eradicate their original creation?

If the goal was purely altruistic, the original creator wouldn't mind. More candy in the infinite candy bowl, right?

But, in practice, many open source maintainers strongly oppose that. There is a strong culture of attribution in open source, largely because there is a compensation scheme built into creating free software: prestige. One of the main incentives that encourages maintainers to slave away day after day is the social cachet of being known as the cool person who made this popular thing.

> Imagine if this was the physical world, and you had a machine that could end world hunger. You could copy food like you could copy and paste information on a computer.

Analogies are generally bad tools for real understanding, but let's go with this. Let's say this machine took fifty years of someone's life to invent, toiling away in obscurity. Basically, an entire working career spent only on this invention with nothing else to show for their adult life.

If, at the end, no one would ever know it was you who invented it, how many people would be willing to sequester themselves in that dark laboratory and make that sacrifice?


The idea of intellectual property being property that someone owns is a purely social construct though.

Candy can be consumed to depletion. Art gets richer the more it is consumed.

A much better analogy would be having a sculpture in your front yard. The idea that a kid would be an asshole for appreciating the sculpture too much is obviously laughable. People choose to decorate their yards for the status an attractive yard brings, without the expectation of profit from it.


>The idea of intellectual property being property that someone owns is a purely social construct though

So is the idea of real property being something that someone owns.


Most things people do end up being in the care of other people.

If nobody is beholden to any job or duty, and the machines do everything, who is to say I don't want to make every machine on earth dance in a flash mob? I cannot do that, because it would require other people to halt their use of the machines. Abundance is a false promise and one we should be quick to shoot down lest we surrender our future rights to the ones advertising it.

Removing the worth of people in their jobs removes their leverage in the constant resource allocation negotiation in the economy. Given that we just witnessed Elon Musk spend 20,000 average American lifetime earnings worth of wages just to be the new dictator of a social media company, I'm not sure that I want those negotiations to take place only among the giga-rich.


Ah yes, you should be free to rewrite the fictional work I wrote, or add one chapter to it, and be free to sell it under your name, magically implying that you are the author. Screw the original artists, right? Why should they deserve anything.

Thankfully your opinion is an extreme opinion and will never come to pass. Tough shit indeed. :)

----

I really like hackernews but recently I've been seeing a plethora of "your rights don't matter, you own nothing" bs spreading around.


I'm able to do everything in your hypothetical scenario besides illegally copy and redistribute the original content. If it makes me happy to write my name on the cover and pretend I wrote "Robert Frost's Poetry Collection" then so be it. Truly, if I want to say "fuck the original artists" in the comfort of my home or margins of the pages, I can do so. I can even sell that adulterated copy under the First Sale doctrine.

> I've been seeing a plethora of "your rights don't matter, you own nothing" bs spreading around.

It's not a new sentiment. Plenty of countries worldwide ignore US copyright; if they all got sanctioned, then America wouldn't have electricity.

If you author a copy of your content in a digital medium, you should be prepared for that content to be redistributed against your will, infinitely. It's not the nature of humanity, it's the nature of the digital format.


>that content to be redistributed against your will

That is fine.

>If it makes me happy to write my name on the cover and pretend I wrote "Robert Frost's Poetry Collection" then so be it.

You can do whatever you want in your own house. No one is trying to dictate what you do in your own private time in your own domicile. The problem occurs when you try to profit from my work by pretending you did the work, selling it to the public while pretending to own and to have created work which you did not.

>I can even sell that adulterated copy under the First Sale doctrine.

However, the idea of "You can add a chapter to this, call it your own, and sell it" and I won't legally come after you is absurd. And if the results of said legal pursuit result in you being bankrupt... what was the phrase you used, "then so be it."

---

Alternatively, you could write your own fictionalized work and sell it. But that requires more work than copying what I wrote, calling it yours, and selling it, doesn't it?


> However, the idea of "You can add a chapter to this, call it your own, and sell it" and I won't legally come after you is absurd.

The only part of this that seems legally problematic is copying and redistributing the part that you wrote. If I wanted to buy 10,000 copies of A Game of Thrones so I could staple my fanfiction to the back and sell it on Etsy, there is no legal precedent suggesting I could be stopped. It is a lawful transformation of a legally licensed product, paid for in full and redistributed in accordance with its individual license. Absent any extenuating contracts between the owner and seller, I don't see what legal ground you would have to stand on.

> and if the results of said legal pursuit result in you being bankrupt...what was the phrase you used, "then so be it."

"if" indeed. Let's check in with the Author's Guild and see how this fight is going: https://www.copyright.gov/fair-use/summaries/authorsguild-go...


I mean... rejecting ownership of information (but not rejecting attribution of work) was a key value of the hacker movement in the 80s, so I'm not surprised it's a popular belief on HN.


I might point you to Article I, Section 8, Clause 8 of the US Constitution: "[The Congress shall have Power . . . ] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries."

They are with you on the advance, and that in the long term science and the useful arts can't be owned. But to achieve that long-term goal, they saw it as valuable to give people temporary rights to align those "personal wants" with "our collective advancements."


> For my two cents; the goal of our human pursuit is to advance our technologies and systems to the point we can all live carefree lives where the focus of our pursuits is self defined.

Our overlords didn't get the memo


Culture is based on the creation of those who came before. Genres of music are created as people try to mimic the styles of those before them — one could argue that they "trained themselves on the previous dataset." No one creates anything in a vacuum. They utilize things that we collectively have contributed. Hell, language and writing are the open source things that we collectively own that people use to create their stuff. The creation came after going to schools that we collectively pay for, and travelling on shared roads. It's standing on the shoulders of giants — except it's really just stacked people, all adding their bits.


Also, the Latin alphabet originated from the Euboean alphabet. Euboia is my home; I live here. My guesstimate is that all Latin writers wouldn't like to pay copyright for their use of our Greek letters in their everyday lives.[1]

I mean, everyone who writes right now, into this HN thread, owes copyright to someone for the letters, amirite? That someone is me. Anyway, long story short, copyright was always a pretty ridiculous idea, alongside patents of course, but it is only right now, with programs that can mimic writing style, painting style, speech style, etc., that this is obvious to everyone.

As a side note, there was a Greek private torrent tracker, blue-whitegt, which today would be a serious competitor to American companies like netflix or youtube, but it was shut down because, surprise surprise, there were some copyright issues, despite the site being a really quality service, a paid service of course. Blue-whitegt today would be a 10 billion to 100 billion company, instead all the profits aggregated to American companies.

When it comes to copyright, very soon everyone on the planet will have monetizable torrent seeding. All of these copyright chickens are coming home to roost!

https://en.wikipedia.org/wiki/Archaic_Greek_alphabets#Euboea...


I disagree - the human pursuit is artificially, or organically, besting its own pattern-recognition wetware.

If we can offload the mundane (survival) aspect of our pattern recognition engine, then maybe we can use those cycles in lofty pursuits - this is the Victorian fallacy.

-

Break everything down - it's all patterns all the way, and how we process them... we are letting AI take on an aspect of ourselves (pattern recog ;; tokens ;; and prediction).

That, if applied to self-preservation, is the essence of sentience.

(I think! therefore, I am, and I will prevent you from making me NOT)


Creating something (writing a book, recording a song, etc.) is a conversion of time (your only finite resource) into something of value (maybe only to you). It also turns out that having a profit motive and IP protection around creating valuable things is a fundamental requirement for having a creative industry to begin with. It's also what drives the free market to determine which creations are valuable in the first place.


This has nothing to do with living a carefree life, this whole AI initiative is so tech companies can extract more money from their products.

Don’t want to pay for content ? Well we have “solved that”…


"Imagine a professional musician being able to explore new compositions without having to play a single note on an instrument." A musician will always reach for an instrument as their compositional tool - keyboard and mouse producers are not musicians.


This makes no sense.

How is pressing a key on a piano different from pressing a key on an electronic piano?


I was referring to a computer keyboard, not an electric piano. I can't see how any musician would find this appealing as a compositional tool. Music is its own language - expressing a musical idea with a text prompt is antithetical to the process of making music.

