That's a really weird argument. Copyright is a legal system created by the Constitution and statutes and administrative rules. It cares about whether you are "copying", and it cares about whether you're creating things that compete with the works of the original authors. It doesn't care about potential output spaces.
In this context, I don't see a principled difference between the model weights and really good compression. If I send you a gzipped copy of the latest bestseller, it's still copyright infringement. And it would still be infringement if I shipped it inside a software program that can _also_ reshuffle the words in a bajillion different ways, if there's a "copy" of the original work in there.
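To make the compression point concrete, here's a minimal Python sketch (the filename is hypothetical): lossless compression means the exact original is recoverable from the shipped blob.

    import gzip

    original = open("bestseller.txt", "rb").read()  # hypothetical input file
    blob = gzip.compress(original)                  # what actually gets shipped
    assert gzip.decompress(blob) == original        # a bit-for-bit copy comes back out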
Yes, but Google Books is a search engine. It doesn't write books, it just tells you where a particular phrase might occur in those books. There's explicit caselaw allowing you to do this, extending back before Google was even a thing. For related reasons, Google Books also does not let you read the whole book - just the page the search match came from.
OpenAI and other large language model developers are claiming they have a machine that can write books, but they also fed it shittons of books, and they can't account for where all that text went. At best they can say "well, it doesn't produce exact, verbatim copies of the training set all the time".
Generative AI doesn't write books either; it's a machine spitting out results based on queries entered by a person. It has no personhood and no rights of ownership, in spite of what the more crackpot AI advocates try to claim. It's more like a very complicated pencil.
> Last September, the US Copyright Review Board decided that an image generated using Midjourney’s software could not be copyright due to how it was produced.
> It's the job of a regulatory body to reduce risk.
Sure, and the parent comment's point was that there's a line where further risk reduction doesn't make sense anymore. The agency doesn't have the right incentives to stop at that line.
Plenty of very significant risks aren't regulated to the degree that Viagra and Cialis are. You don't need a note from a doctor or a govt-issued permit to buy kitchen knives or a table saw or a Bic lighter, for example.
> If a worker gets sick, they could sue for damages,
Maybe I’m misreading it, but from the article, comments here, and linked alternative readings (as well as very limited personal experience), the issue could be that the workers themselves are actively choosing not to wear PPE. I know this sounds like blaming the victim, but if that is truly what’s happening (and I’d appreciate information showing otherwise), why should the workers get to sue someone else? Is the PPE not effective enough?
Or are you suggesting that allowing the employee to sue would create incentives for the employers to actually enforce workers' use of PPE?
If you know which type of uuid you have (v1, v4, etc.), then you can take a look at how many bits of randomness it has, how many total items you have, and compute the probability of a collision if you just take a subset of the bits and use that as an ID.
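As a sketch of that computation in Python (the item count below is made up), using the standard birthday-bound approximation:

    import math

    def collision_probability(n_items: int, random_bits: int) -> float:
        """Birthday bound: P(at least one collision) ~ 1 - exp(-n(n-1) / 2^(b+1))."""
        space = 2.0 ** random_bits
        return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * space))

    # e.g. a billion IDs squeezed into 64 random bits:
    print(collision_probability(1_000_000_000, 64))  # ~0.027, about a 2.7% chance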
In theory it's definitely possible. The 128 bits you get in a UUID is a LOT of randomness for an identifier. Postgres BIGINTs are just 64 bits. Instagram's sharded IDs are just 64 bits. (See below.)
You can test it. If you're using uuidv4 (which is 100% random bits, minus a few for the version), you could make a new column in your table in Snowflake, populate it with the first 64 random bits of your existing uuid column, then see if you have any collisions.
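In Python rather than Snowflake SQL, the same experiment looks roughly like this (simulating with fresh v4 UUIDs instead of your real column):

    import uuid

    n = 1_000_000
    # Keep only the top 64 bits of each v4 UUID. Note the fixed version nibble
    # lives in these bits, so it's really ~60 bits of randomness, not 64.
    truncated = {uuid.uuid4().int >> 64 for _ in range(n)}
    print(f"collisions among {n:,} ids: {n - len(truncated)}")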
I think the most important place to start is appreciating the distinction between authentication ("is the person trying to use my application really the person they say they are?", abbreviated "authn") and authorization ("is this person allowed to perform the action they're trying to perform?", abbreviated "authz").
Most of the comments on this page are referring to authentication. It's important to know, but also the piece you're likely to spend far less time on. It's where most of the heavy lifting will be done by some vendor or tool you set up instead of by your own code.
Authorization is far less likely to be something you get off the shelf and far more likely to be where you spend significant time. It can be very intimately connected to your business logic. Active Directory roles and groups are one authorization solution for a particular class of problems but I have only seen them used for controlling business internal assets (mostly file servers); not public-facing applications.
I really like Oso Academy as a resource for authorization topics. It's structured like a progressive course, though I don't know if they have the kind of exercises you mentioned.
It’s true that authZ requires a lot of customization. In my experience, though, authN is the harder one to implement when there is no existing infra to support it. How do we store and distribute credentials? How do we allow user-defined identities? How do we implement session keys? How do we scale out the authN layer? If we decide to use certs for authN, how do we manage the certificate lifecycle? The list is long.
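To give a feel for just two items on that list (credential storage and session keys), a toy stdlib-only sketch; a real deployment would reach for a vetted library like argon2 and a proper session store:

    import hashlib, os, secrets

    def hash_password(password: str) -> tuple[bytes, bytes]:
        """Store a salted, slow hash -- never the password itself."""
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
        return salt, digest

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
        return secrets.compare_digest(candidate, digest)  # constant-time comparison

    def new_session_token() -> str:
        """Opaque session key; the server maps token -> user id, with an expiry."""
        return secrets.token_urlsafe(32)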
> Authorization is far less likely to be something you get off the shelf and far more likely to be where you spend significant time
Agreed. It is business logic, which means that it is harder to do off the shelf.
That said, there are some startups trying to make this work. Here are the ones I'm aware of:
* permit.io
* cerbos.dev
* osohq.com
RBAC (role based access controls) can take you a long way for many applications, but at some point you will be more interested in ABAC (attribute based access control) or PBAC (policy based access control).
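A toy illustration of that progression (all names and rules invented for the example): RBAC answers from the role alone, while ABAC consults attributes of the user, the resource, and the request.

    # RBAC: the role alone decides
    ROLE_PERMS = {"admin": {"read", "write", "delete"}, "viewer": {"read"}}

    def rbac_allowed(role: str, action: str) -> bool:
        return action in ROLE_PERMS.get(role, set())

    # ABAC: attributes of the user and the resource decide
    def abac_allowed(user: dict, doc: dict, action: str) -> bool:
        if action == "read":
            return doc["department"] == user["department"]
        if action == "write":
            return doc["owner_id"] == user["id"]
        return False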
Just to add another AuthZ approach to your great comment:
ReBAC for Fine-Grained Authorization (FGA) is also something that's becoming more common at the moment. Google released their Zanzibar whitepaper explaining how they implement FGA for things like YouTube and Drive, and it's led to a lot of new tooling based upon it.
I'm working on a project at the moment with quite complex document management with various levels of access. Auth0 open sourced their FGA implementation recently as OpenFGA which looks ideal for our use case. As it's all fairly new there isn't much info out there about different ways of implementing it so we're kind of figuring it out as we go.
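For anyone new to it: the Zanzibar model boils down to storing relationship tuples of (object, relation, subject) and answering "check" queries over them, with some relations implied by others. A toy sketch of the idea (OpenFGA's actual API and modeling language differ):

    # Relationship tuples: (object, relation, subject)
    TUPLES = {
        ("doc:readme", "owner", "user:anne"),
        ("doc:readme", "viewer", "user:bob"),
    }

    # Computed relations: owners are implicitly viewers too
    IMPLIED = {"viewer": ("owner",)}

    def check(obj: str, relation: str, subject: str) -> bool:
        if (obj, relation, subject) in TUPLES:
            return True
        return any(check(obj, r, subject) for r in IMPLIED.get(relation, ()))

    assert check("doc:readme", "viewer", "user:anne")  # granted via owner -> viewer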
This is the thing about the "OAuth isn't about authentication" argument... there is quite a bit of overlap between RBAC and authorization. And that in itself is quite confusing.
What most annoys me is that OAuth is also very much about authentication, specifically outsourcing your authentication to a third party. It's not like OAuth has nothing to do with authentication, which is the knee-jerk response you get from people when they attempt to simplify an explanation of what OAuth does and doesn't do.
I think that's the key point really. For someone with even a mild interest in this stuff, the task of running ethernet cables is very approachable and feasible. At that point the time spent on it is a plus instead of a minus, and paying someone else is no longer competitive.
I remember at the time there was a "Silicon Valley rides to the rescue" subtext. Though I'm a Silicon Valley person myself, that framing rubs me the wrong way.
I appreciate the author resisting that framing, and admitting that on that first day, the contribution of the new team was basically to ask "Can we install New Relic on the servers?" The response to that question really illustrates the root of the problem, in my opinion. It's not that the people who built the original healthcare.gov were stupid people. It's that they were operating under constraints and incentives that emphasize rules and compliance over making the whole system work well.
Bit of both, really. We got https://en.wikipedia.org/wiki/United_States_Digital_Service out of the original Healthcare.gov disaster; if you've used login.gov I suspect you've found it one of the nicer government websites out there (for example: it's one of very few to explicitly advise against SMS 2FA, and to offer U2F support), and it owes its existence to the USDS.
> It's not that the people who built the original healthcare.gov were stupid people. It's that they were operating under constraints and incentives that emphasize rules and compliance over making the whole system work well.
Great comment. I spent the first half of my career in classical "Silicon Valley", direct-to-consumer businesses. Lately, I've been working in software in highly regulated industries. The differences between the two cannot be overstated. Even if you look past tiny (/s) issues like "you can't lose people's money, ever", you've got reams of often-conflicting regulations at the local, state, and federal levels. Making a small text change to a webpage can require multiple lawyers.
This kind of well-intentioned stuff sometimes makes it impossible for the greatest engineers to do even simple things quickly. Work at the government level, and it gets worse, because now you have politicians reacting to your every change.
> It's not that the people who built the original healthcare.gov were stupid people
That's very kind of you, but they probably were; if not stupid, then just large companies that routinely do these big government websites, mess them up, only just get them done years late and way over budget, and don't really give a shit or face any consequences.
This NPR article says the original site was built by a bunch of different companies, most of them subcontracting out to other companies, with no one really in charge of the overall architecture apart from some little subsection of the Centers for Medicare and Medicaid Services, who probably weren't ready for the cat herding required.
I can tell you that the difference between a site made the 'Silicon Valley' way and an 'enterprisey' site made by EDF or Accenture or Fujitsu or one of those other big companies is massive. It's a completely different world.
USDS, 18F, and the Digital Services Coalition have made remarkable improvements to how digital services are delivered for government.
Right, and that's because USDS and 18F kick out all the big old 'enterprisey' consultancy companies and bring it more in-house. And presumably the smarter version of in-house, where you hire people who can actually do stuff. The Digital Services Coalition sounds like a sort of in-between, where companies can apply to join, but presumably they are finding ways to avoid the crap old enterprisey ones.
Obama brought in Mikey Dickerson, head of Google SRE, as well as several top people from Facebook (at least those were the people I had directly heard about), and humbly said "we failed, we need your help." From what I remember hearing, he literally brought top valley people into the Oval Office and gave a "your country needs you" speech.
"Install new relic" really minimizes what was actually happening. "Can we install this tool that helps us understand how the system works" sounds a lot more enlightened doesn't it? Understand what's broken is the first step to fixing something and that's exactly what "installing new relic" does. If you don't have a monitoring system, that shows an extremely critical failure of technical leadership. It means no one knows what's happening. No one is measuring how things work. The google SRE book devotes chapters to monitoring, probably 25%+ of the book is explicitly about monitoring.
If you listen to Mikey present his work on healthcare.gov, he does not emphasize the solution, he emphasizes the problem: no one knew how things worked, no one knew how the pieces fit together, no one could measure why the site was performing so poorly.
The step after "installing New Relic" was getting 50+ contractors in the same room and making them justify themselves. The contractors sold themselves to the government, but there wasn't a person on the government side with enough understanding to know that you don't need 3 different load balancers at the same level for the same traffic, and that every system does need monitoring. How many different data stores do you really need?
If you listen to Mikey Dickerson give his presentation live, to the people he hopes to recruit, then it was very poorly managed at the technical level. Were those people stupid? Probably not, but they were giving out lucrative contracts to any company that said it could help. So the makers of the original healthcare.gov (and by proxy the American public and the Obama administration) were being taken advantage of for their ignorance.
> It's that they were operating under constraints and incentives that emphasize rules and compliance over making the whole system work well.
It's that trust was given to American companies who directly benefited from taking advantage of the government, and not to individuals who work on behalf of the public with specific expertise.
Mikey ends his presentation with a powerful call to action. If you aren't going to help the government with your own expertise, and make that sacrifice on behalf of the public, then some contractor will happily fill that role.
> Were those people stupid? Probably not, but they were giving out lucrative contracts to any company that said it could help. So the makers of the original healthcare.gov (and by proxy the American public and the Obama administration) were being taken advantage of for their ignorance.
Good points, but that sounds to me like too simple a story. The people managing healthcare.gov weren't working their first day or managing their first project; you don't get that job without some experience. And if you have some experience, you know what you don't know, and you know how to deal with those situations.
(I'm not implying some other specific story - I don't know what happened. I'm just wary of such simple answers.)
I saw Mikey Dickerson talk about this at the time, and he was very explicit that the engineers on the project were all perfectly capable, but they were hobbled by an IT services culture that made them all terrified to be the person blamed for any mistake.
He also said there was no culture of real-time monitoring. They literally had a TV running CNN, and whenever the chyron said "healthcare.gov is down again" they would spring into action. Amazing.
I can't really fathom that. When I was in sixth grade or something, I had an old PC with Debian in my room as a "server" to play around with and serve some random stuff from. And I had real-time monitoring on that.
Exactly this. It can be so frustrating for teams, when they've been repeatedly trying to tell people what the problem is, to have some outside expert come in, point out the same things, and get them green-lighted.
Potentially valid concerns, but not what copyright law was designed to address. Mazda can easily prohibit their own customers from doing this kind of thing in their terms of service if they want.