All other factors being equal: Taller person = more cells = more cell divisions = more chances for cell mutation = increased risk of (any) cancer.
(credit/source: Hank Green and possibly SciShow on YouTube)
Wake me up when this can run two stages: TF/OT/Pulumi/etc. to provision infrastructure, and then Ansible/Salt/etc. to configure the resulting hosts.
Glibness aside, this looks promising; I would love an orchestrator for both infrastructure and configuration management, though. From the readme, I'm guessing that's somewhat doable with the `Toolbox` feature; it would be cool to have native integration for more tools.
Maybe I'm misunderstanding the scope/intention of the tool though.
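For illustration, here's a minimal sketch of the two-stage flow I mean, wrapping Terraform and Ansible from a script. The `infra` directory, inventory path, and playbook name are hypothetical:

```python
# Hypothetical two-stage wrapper: provision with Terraform (stage 1),
# then configure the resulting hosts with Ansible (stage 2).
# The `infra` directory, inventory path, and playbook name are made up.
import subprocess
import sys

def run(cmd):
    """Run a command, echoing it first; abort the pipeline on failure."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def main():
    # Stage 1: provision infrastructure.
    run(["terraform", "-chdir=infra", "init"])
    run(["terraform", "-chdir=infra", "apply", "-auto-approve"])

    # Stage 2: configure the hosts created above. Assumes Terraform
    # emitted an inventory file (e.g. via a local_file resource).
    run(["ansible-playbook", "-i", "inventory/terraform.ini", "site.yml"])

if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as exc:
        sys.exit(exc.returncode)
```

A native orchestrator could presumably do the same while also tracking dependencies and passing state (hosts, IPs, credentials) between the two stages, instead of going through an inventory file.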
Guess I'm 'almost nobody'. I see zero issue with AI farming Stack Overflow, Twitter, Reddit, and the like - publicly accessible forums. The value to me was in the discussion. It's happened; I've extracted my value from that engagement. If an AI company can also extract value from that collection of discussions, it costs me nothing and I expect no compensation.
Though AI companies farming copyrighted work, on the other hand, that's a different story.
The article isn't talking about public discussion forums at all; it's talking about WordPress and its owner, Automattic. The article is a blog post on a WordPress-hosted blog. It then goes on to talk about consent in software in general.
Personally, I'm not really okay with what you're calling public forum harvesting either. I've put a lot of work into Stack Exchange answers and I am not okay with a for-profit company recycling and possibly outright regurgitating that work without attribution. (The latter would be a flagrant violation of CC BY-SA, of course.)
I understand that the author is mad and wants things to be opt-in. But I also think the author is smart enough to know that the tech industry understands consent just fine. It just doesn't care.
Twitter and Reddit will soon be bots talking to bots (if they aren't already), so the AI can train on that.
> Though AI companies farming copyrighted work, on the other hand, that's a different story.
Copyright also happens to be opt-out. You have to explicitly say “this is not copyrighted” for copyright to stop applying.
See your comment and my reply? Both copyrighted. Right now. As soon as we hit publish, we each own the copyright. There is an EULA somewhere on HN that probably says we give HN implicit permission to host this content in perpetuity, and even to make it available in APIs, show it to bots, etc. But that's not the same as no copyright. If somebody who is not HN wants to screenshot this comment and publish it in a book, they in theory have to find us and ask for permission.
> Copyright also happens to be opt-out. You have to explicitly say “this is not copyrighted” for copyright to stop applying.
This isn't possible under US copyright law. You can say "this is not copyrighted" all you want, but it's still copyrighted. The closest you can get to voluntarily putting something in the public domain is to refuse to take enforcement actions against violations of your copyright.
Search engines link back to the original sources, making them discoverable, which is the “payment” for allowing it. That's a very different use from an AI that doesn't even know where the original content came from and provides nothing back to the original creator.
Inevitably there will be copyrighted images, audio, and text mixed in with random social updates and discussions. It should be on the LLM builder to seek active consent, rather than everyone else to be vigilant and/or sue to get their work out of the model's data.
> I see zero issue with AI farming Stack Overflow,
> Though AI companies farming copyrighted work, on the other hand, that's a different story.
All posts on Stack Overflow are still the copyright of their respective posters. They are offered publicly under a Creative Commons license that requires attribution.
In the US, everything you write anywhere online is copyrighted by you, unless you sign a copyright assignment agreement. It's automatic any time you put an expression into a fixed form, and there is no way to revoke that copyright.
As I understand it, copyright has failed. Or rather, we are in an age of naked double standards, where courts will enforce the copyright of big tech against you for "stealing" a movie, but will not enforce your copyright against big tech for "stealing" your data for its AI.
Copyright is still deeply important to prevent behemoths from just straight-up taking stuff individuals wrote and profiting from it with no consequences.
For instance, without copyright, traditional publishers could just take everything the authors they currently contract with have written, and every other current author, and publish it without paying the authors a cent.
ML training is a legal gray area right now, because it's a new thing, and we haven't had time to properly sit down and both understand what its effects are, and how it should be treated legally. It is possible that this process, when it ends up happening, will be captured by corruption; it is possible that it won't. But using the current frustrations and anger about ML training as evidence that copyright has "failed" is a vast oversimplification that ignores the very real good that copyright does in our society.
It's failing right now to protect millions/billions of people, because we've decided that it's a "legal gray area right now".
Maybe it should be, I don't know. I mean maybe it's time we said bye bye to copyright?
There could be flip sides. If the world decides that ML sidesteps copyright, then I look forward to the entire corpus of LibGen, SciHub, etc. being legally released as open models, and the overnight demise of Elsevier et al. (I once wrote a fiction about that [0].)
My objection here is to seeing the clear wishes of the majority being ridden roughshod over.
> I mean maybe it's time we said bye bye to copyright?
This is exactly the kind of oversimplified, baby-with-the-bathwater proposal I was talking about.
No, we should not "say bye bye to copyright". We should actually take the harder, more complex steps, requiring actual critical thinking and analysis, to fix the problem, rather than just pretending that a one-step grand gesture will be a magic bullet.
> We should actually take the harder, more complex steps, requiring actual critical thinking and analysis,
Those are fine words. We're all about critical thinking and analysis round here. But the way I see it, folks already did some real hard critical thinking, and their analysis was "bollocks to that!"
And the judges said, "sorry, the law that applies to you doesn't apply when big money is involved". One rule for you, another rule for them.
So I'm kinda thinking we'll maybe have to get a little more critical than you might be comfortable with.
> to fix the problem
It's always a good idea to pause right there. What is the problem? I mean, seriously... what exactly is the problem here? Because from where I see it, the problem is a massive power imbalance.
And it's a structural one, because AI training compute and global crawling/scraping are expensive and in the hands of the few.
I don't think this problem would look the same if every kid was running AI training on a Raspberry Pi and hoovering down JSTOR like Aaron Swartz. People would be getting arrested, no?
Well, yes. The problem you are identifying is primarily a structural power imbalance.
It is not a structural power imbalance that can be fixed by abolishing copyright. Indeed, abolishing copyright is vastly more likely to hugely increase the power imbalance.
You are looking at the problem too narrowly (identifying it as "a problem with copyright", rather than "a problem with the power structures in our society"; "AI training and compute...in the hands of the few" rather than "most of the money and resources in the hands of the few"), and thus coming to counterproductive conclusions about how we might solve it.
It's very satisfying to imagine taking a big hammer to a system we know to be corrupt and serving those without our interests at heart. But just smashing the system does not build a new one in its place. And until you address the power imbalances, any system built to replace one you smash—assuming you can manage to do the smashing, which is highly suspect—is nearly guaranteed to simply be designed to serve the desires of the powerful even more than the one we have now.
Some good thoughts, though maybe you underestimate my bead on the world, and perhaps overestimate my desire for "smashing". A more peaceful, and just, time when we simply take their toys away will come. That is certain.
A question of "intellectual property" remains. In a post-exploitation world, would we still want or need it?
Let's hope we keep living to see how it pans out. Respects.
If you know it's being delivered through USPS, have it delivered to 'general delivery'[1]. The problem is that Amazon and many others don't let you choose your shipper, and most post offices (reasonably so) don't accept general delivery from 3rd-party carriers.
Alternatively, ship it to a UPS Store that offers 'Package Acceptance' service. They charge per package, usually $5-$7, though I've seen some as high as $15. Or you could open a UPS mailbox if you receive enough packages to make it worth the cost.
It might be possible that on an international TECHNOLOGY forum, a significant percentage of users know what 'VPN' (a tech thing) stands for and don't know what 'ICE' (a government agency (off topic) of a single nation (limited scope)) stands for. Context matters. ICE isn't within either the regional or topical domains of common conversation on this forum.
Depending on one's technical background or interests, ICE can refer to many things; in my case, Internal Combustion Engine and In-Circuit Emulator.
I thought this was another article about cars: engines overheating after a bad security update, and nation-state actors (agents) installing apps and VPNs in cars.
A question as old as time (or as old as the existence of code editors). 4k results on Hacker News alone - and surely thousands, if not millions more, across the internet. There is no right answer, just differing preferences.