I went through this process yesterday, trying to create a city on a floating island in the sky, and it's so fun.
Basically, drawing sketches, editing (rudimentarily) in image editing software, img2img, edit, img2img, and a few more rounds, and you can get to something really, really cool.
What's fascinating is what img2img adds to the creative process. Text to image was pretty cool, but not super interesting to me. But seeding the output with something I've drawn, with a drawing befitting a 3-year old's work of art, really adds to it, because of the larger part you're taking in the output's creation.
It's like that story from the 50's when cake mix was first introduced, with a recipe of water+cake mix to make a cake. It flopped, and was pulled from the market. They reintroduced it with a new recipe of water+egg+cake mix instead, and it was a success. The added egg made it feel just that much more like cooking, and I think the same thing is happening here with img2img.
That was an awesome video. I can't wait for someone to inevitably start a The Joy of pAInting Twitch feed, complete with chill commentary and occasionally ruining the canvas with a daring addition to the picture.
Yeah I've been using the same iterative process using img2img. Using AI removes most of the toil (tedious masking and colour matching and relighting images) involved in this kind of photo manipulation work. As these tools improve it will be interesting to see what professional artists can do with it.
And an example of a step in-between: The base was [2] which I changed into [3]. That was I think the last step before the final generation.
It's not great by any means, but it's miles ahead of what I could hope to achieve myself. The biggest problem was stopping Stable Diffusion from turning my flying island into a pretty standard mountain. It kept trying to connect it to the ground, especially in later iterations.
Back in the 90s I bought a book that came with some floppy disks. The book was about 500 pages of clip art, and the disks were the actual images. At the time, people said such things would put graphic artists out of business.
What actually happened is that it put mediocre graphic artists out of business and highlighted the difference between one that was mediocre and one that was good.
I feel like this will happen again here with digital artists. The mediocre ones will be indistinguishable from AI, but the good ones will still stand out.
I think that's a good way to put it, but there's still a problem.
The path of a digital artist is long and arduous. For a time on this path, the artist may be considered mediocre, or to put it better, they are an apprentice.
Just as in other physical trades, an apprentice who is mediocre at their craft can still practice aspects of that craft well enough to be useful and earn some money. It is also through practice that the apprentice improves their skills. In this way, the apprentice is financially supported and even incentivized to improve at their trade, until one day they become truly good at it.
So what things like DALL-E and GitHub Copilot and your clip art package do is displace the apprentice. With no path of mediocrity for the apprentice to walk, to earn a stipend for training, how then can they receive the financial support necessary to train until they're a master? They would need to already be independently wealthy or receive financial assistance.
In order to train more master artists and programmers, we would need to provide them with financial support while they train without us receiving anything useful in return.
It's interesting that you basically just made Andrew Yang's argument for Universal Basic Income -- that we need to redistribute the wealth of automation to all of society.
This is the perfect example -- with a UBI the apprentice no longer needs to get paid to learn. They can live off of the UBI while learning, until they are good enough to charge for their services.
1) We need people to do low level jobs. So if UBI exists, wages will need to rise until people are willing to do them. This will happen along with price raises until an equilibrium is found where poor people need to work in order to survive. No need for narratives about landlords raising rent, though it is possible. The poor people aren't in an overall worse position here though, because although they're still earning just enough to live, a portion of that minimum is now guaranteed. However:
2) By raising your domestic (or local) wages/prices, you've just given yourself an absolute disadvantage against every other economic entity in the world. Anything that is outsourceable is now more appealing to outsource than before. This removes jobs and puts downwards pressure on wages.
If everyone just "lives off UBI while learning" society won't function because the jobs they do are important.
You're missing the point. At SOME point, even YOU won't be able to find a job, due to robotics and automation, compounded by extremely high unemployment making even basic jobs like plumbing impossible to get. If that happens, we either just let everyone starve until the population drops to equilibrium, or restructure society to support people when there's no jobs for 99% of us.
It tipped decades ago: in the United States, the labor force participation rate for men has been in secular decline since 1950 (and perhaps earlier; the FRED data only goes back to 1948):
There is a large difference between the labor force participation rate and the unemployment rate because the LFPR can’t be gamed. A stark way to think about it is to consider that, if every person currently looking for work were to fail to such an extent that they just gave up, the unemployment rate would drop to zero.
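To make that concrete, here's a rough sketch using the standard definitions (unemployment rate = job seekers / labor force; participation rate = labor force / working-age population). The population numbers are made up purely for illustration.

```python
def unemployment_rate(employed: int, looking: int) -> float:
    """Share of the labor force (employed + actively looking) without a job."""
    labor_force = employed + looking
    return looking / labor_force if labor_force else 0.0


def participation_rate(employed: int, looking: int, population: int) -> float:
    """Share of the working-age population that is in the labor force."""
    return (employed + looking) / population


# Hypothetical numbers: 100 working-age people, 60 employed, 5 looking.
POPULATION, EMPLOYED, LOOKING = 100, 60, 5

before = (unemployment_rate(EMPLOYED, LOOKING),
          participation_rate(EMPLOYED, LOOKING, POPULATION))

# All 5 job seekers give up, so they drop out of the labor force entirely.
after = (unemployment_rate(EMPLOYED, 0),
         participation_rate(EMPLOYED, 0, POPULATION))

print(f"before: unemployment {before[0]:.1%}, participation {before[1]:.1%}")
print(f"after:  unemployment {after[0]:.1%}, participation {after[1]:.1%}")
```

Nobody found a job, yet the unemployment rate falls to zero. The participation rate is the number that actually registers the loss.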
Yes, if you subscribe to the Singularity theory of technological development. The closer we get to the asymptote, the larger each automation jump will be, and we'll have less time to adjust as a society before another even larger jump in automation hits. Granted, at this point we're talking about basically being at AGI, but still, yes, the idea is that it will tip eventually.
If our enormous economic engines were devoted towards efficiency rather than profit motive, I think we would be there.
How many appliances do we build to last for a few years and then break? How many economic resources could we save by building fewer products to last longer? If the economic engine were tilted towards quality rather than churn, we could be much more efficient about our use of resources.
The profit motive promotes efficiency by driving all prices towards marginal cost, as long as competition exists. Highly durable goods are not per se an efficient use of resources, but we probably disagree on the meaning of that word anyway.
If you wait until a post-scarcity society to implement a UBI you will have violent collapse of society long before you reach post-scarcity. We walk a razor's edge of oppression - too little and society doesn't have enough peasants to grind into the machine to make the gears work, too much and the peasants destroy the machine.
And if you implement UBI now, you'll just shoot yourself in the foot and become less economically competitive because you're in a globalized economy. You cannot do this while you're still dependent on trade with other countries, which we are. You can do UBI when it really wouldn't matter if we didn't employ people in low wage jobs.
For all intents and purposes we already live in a post-scarcity society. We're just too shitty to make an effort to distribute the wealth, the food, the products etc.
The idea that high unemployment is likely is just an artifact of bad monetary policy in 2010.
The entire time, automation people like Yang have been claiming unemployment was already happening, and yet it was constantly going down. Unemployment in the US is now lower than it’s been in decades.
It's not so much that it's a bad artifact of something in the past, it's more speculation about what will happen to employment of humans if/when companies can just create human-level AI that operates 1000x faster than fleshy humans for digital/white collar work, and just buy a Boston Dynamics style robot with a similar AI in it for blue collar work
Why would the wages need to increase? UBI is additive to wages. It is not like welfare where one loses the money when one starts working. For the welfare state, you absolutely have to raise the wages to be above whatever the government is giving to those without money. UBI is explicitly intended to do away with that problem. In other words, if someone is willing to work for 20k a year now and we roll out a UBI that gives everyone 12k a year, then the 20k job is still an attractive option and would net them 32k. Now, it may be the case that the wage goes down to 8k which effectively leads to a UBI subsidizing the employers. That would be unfortunate and is a risk, but it certainly does not lead to a disadvantage compared to other countries in terms of employers though it may lead to disadvantages for attracting high earners.
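A quick sketch of the arithmetic above, using the 20k / 12k / 8k figures from the comment. The welfare function is a deliberately simplified assumption (the benefit is lost entirely as soon as there is any wage), which is how the comment describes it, not how any specific real program works.

```python
def income_with_ubi(wage: int, ubi: int) -> int:
    """UBI is additive: it is paid regardless of wages."""
    return wage + ubi


def income_with_welfare(wage: int, benefit: int) -> int:
    """Simplified means-tested benefit: lost entirely once there is any wage."""
    return wage if wage > 0 else benefit


UBI = BENEFIT = 12_000

print("wage    with UBI  with welfare")
for wage in (0, 8_000, 20_000):
    print(f"{wage:>6}  {income_with_ubi(wage, UBI):>8}  "
          f"{income_with_welfare(wage, BENEFIT):>12}")
```

Under the welfare model an 8k job leaves you worse off than not working, so wages have to clear the benefit. Under UBI every job adds to income, including the 8k one that nets the same 20k as before (the employer-subsidy risk mentioned above).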
A UBI also opens up the possibility of removing the minimum wage which not only allows for more people to obtain jobs, but also raises the competitiveness with other countries, potentially (it depends on whether the minimum wage is actually effective in raising wages above the market rate).
Doing a grueling low wage job because you need those wages to survive makes sense. Doing low wage jobs for the extra cash does not, because it's not a lot of cash. Money, like everything, has a diminishing marginal value. The first bunch of money is keeping you alive. If the government provides you that first bunch, your employer is providing much less value to you. Everyone preaches about how UBI will allow people to start businesses and learn skills. Well yeah, but that means they're dropping out of the labor force because they don't need to do those jobs.
How do you incentivize people to work? Pay them more.
In other words: We have to keep the slaves in poor conditions or else they'll quit working. The only thing we are able to provide to make them work, is a threat of death if they don't. No carrots, only sticks.
If you're in a globalized economy that has scarcity, yes. That is the unfortunate truth of it. And you're mostly deluding yourself if you manage to outsource your need for slaves to other countries and think you've done anything particularly good. It's a prisoner's dilemma over and over.
You already figured out the problem. If you agree to work a job for 20k, why would you not agree to work a job for 12k plus 8k? Earnings are the same, except now every net tax payer pays a subsidy for employers to pay pitiful wages.
If mass unemployment was a substantial problem, this may well be an acceptable tradeoff, but in the current economy it is not.
This is backwards and incorrect. If you make $1M per year working very hard and now get $999k for free, are you going to work very hard for the extra $1k? No, because the marginal value of the $1k is trivial, but the labor effort is the same.
If you get just enough to survive from the government, and employers try to reduce wages because you'll net out the same, you probably won't accept the job. It's not worth your time. Even as a dirt poor person, your time is valuable. An employer needs to pay you more so that it's worth your effort again. You have much higher freedom to shop around too.
If the government does not pay you enough to survive (non basic income), you're still in a precarious position, but you are still in a BETTER position than you were without the funds. This will RAISE wages, which will in turn RAISE prices, until the equilibrium is found where the UBI is distinctly not sufficient and you need to work to survive. You will still be poor. You will make more wages, but have about the same level of real wealth. You will be a bit safer due to the guaranteed portion of income.
Until globalization kicks in and makes your specific local circumstances much worse.
The assumption of UBI is that it is a wealth redistribution from automation, so
> We need people to do low level jobs
Is solved through automation and immigration (only citizens get UBI). Of course this is a major downside, because you end up with a slave class unless you make sure those immigrant workers are well protected.
Your point 2 has already happened. But the wealth still remains here in the US. So if that wealth were redistributed to the poor it would actually make things better.
> and immigration (only citizens get UBI). Of course this is a major downside, because you end up with a slave class unless you make sure those immigrant workers are well protected.
... Uh... yeah I would prefer not to have a slave class
> ... Uh... yeah I would prefer not to have a slave class
What do you think we have now, where lower class people need to work to survive? If slavery is having the option between work or death, then I don't see how our current economic setup is not producing slaves. Sure, we don't treat the lower class as property (at least not normally), but they are definitely forced to work under any reasonable definition.
The distinction with slavery is being property. Moreover, it is certainly not the case that people who do not work will die in our society. Being forced to do something to survive is not slavery in any meaningful sense of the word. It is the natural state of affairs.
Whatever wrongs you think exist in the current capitalist / slavery-like conditions for the lower class, they are not nearly as bad as the hypothetical society that relies on large amounts of labor from low wage immigrants who don't have UBI while everyone else does.
It's arguably the most regressive fiscal system one can imagine. A class of people who must earn an entire cost of living paycheck to net zero with their non-working, unskilled citizen equivalents.
The mental contortions people go through to redefine slavery to not mean property when that’s literally the definition is mind boggling. https://en.wikipedia.org/wiki/Slavery
People are not property and they are most certainly not forced to work. Wanting things and needing money to buy the things you want is not “being forced”.
I'm not necessarily a UBI proponent, but the interesting thing to point out here is that if a job is so essential that it is required for society to function, maybe it should be paying a whole lot more.
It is essential that the poor earn as little as possible so as to incentivize their labor, and it is essential that the rich earn as much as possible (by, for example, reducing their tax burden) so as to incentivize their labor.
> We need people to do low level jobs. So if UBI exists, wages will need to rise until people are willing to do them.
Andrew Yang's premise is that those low level jobs are increasingly being automated away anyway - meaning that no, we don't need people to do them.
Even without that premise, this argument presupposes that people receiving UBI will do so in exclusion to working. That doesn't really logically or practically follow; it's just as possible that people will work anyway because extra spending money is extra spending money. They'll work because they want to work, not because they're being actively coerced to work.
> This will happen along with price raises until an equilibrium is found where poor people need to work in order to survive.
With Yang's proposal, the "price raises" part is probably true, yes. However, that has nothing to do with UBI; instead, it has to do with VAT. VAT advocates oft insist that it's somehow "not a sales tax" and therefore "totally not regressive like a sales tax", but at the end of the day consumers are paying more than they otherwise would for goods - and since consumer spending is disproportionately higher (relative to income/wealth) for the working class than the ownership class (or low/middle v. high, if that's the terminology you prefer), that's going to have the same regressive tax effects.
However, a VAT ain't the only way...
> No need for narratives about landlords raising rent, though it is possible.
Not if the UBI is instead funded by taxing the unimproved value of land - a.k.a. a land value tax, or LVT. We Georgists tend to call that a "citizen's dividend", but it's just a special case of UBI: a basic income intended to compensate citizens for occupying less than their equal share of land value within a given jurisdiction. There are a lot of implications of this (I could go on and on about the economic efficiency and ethical justifications), but relevant to this conversation is that the lack of deadweight loss means replacing other taxes with LVT would if anything reduce the consumer-facing cost of goods by reducing the effective tax burden of those producing said goods.
> Anything that is outsourceable is now more appealing to outsource than before.
That has already happened, without UBI. UBI is if anything necessary because of outsourcing - again, because we don't need local people doing those particular low level jobs, because they're now being done overseas.
UBI also might even help correct outsourcing; it's a lot easier to start a business if you know that if it fails (like most businesses do) you won't be homeless and starving as a result, and that's exactly the sort of safety net that UBI enables.
> Andrew Yang's premise is that those low level jobs are increasingly being automated away anyway - meaning that no, we don't need people to do them.
Well Andrew Yang is wrong. That's not what automation does. Automation reduces the amount of skill required to do jobs, reducing both the amount, but also the value. You still need people, and often more people because it becomes economical to employ poor people at a higher scale.
> Not if the UBI is instead funded by taxing the unimproved value of land - a.k.a. a land value tax, or LVT.
A land value tax is a great idea, but irrelevant to what I was saying. We need people to do low wage jobs. If they get some wages for free, we need to pay them more to do the jobs. If we pay them more, then we need to raise prices on the goods in order to not go bankrupt. The natural level of wages/prices is the one where people need to work in order to survive. The tax system and funding of the UBI is a separate problem.
> That has already happened, without UBI. UBI is if anything necessary because of outsourcing - again, because we don't need local people doing those particular low level jobs, because they're now being done overseas.
Economic Comparative and Absolute Advantages are not binary events. Doing things that make domestic businesses less competitive across the board in a globalized international economy is suicidal.
> UBI also might even help correct outsourcing; it's a lot easier to start a business if you know that if it fails (like most businesses do) you won't be homeless and starving as a result, and that's exactly the sort of safety net that UBI enables.
It's just a naive thing to focus on this founder idea.
> Automation reduces the amount of skill required to do jobs, reducing both the amount, but also the value.
Which translates to one worker being able to produce the same output as what required multiple workers previously. And sure, you could hire three entry-level workers at $15/hour for the price of one specialist at $45/hour, but chances are high that the same automation that enables those workers to do the specialist's job at all also enables said specialist to do considerably more than merely triple one's output.
Even ignoring the above, automation doesn't cause demand to materialize out of thin air; if you're a widget manufacturer and your sales team is able to sell 10,000 widgets a day, then multiplying the daily output of each widget factory worker from 10/day to 100/day will necessitate one of four things:
1. Figuring out how to multiply customer demand at the current widget price
2. Slashing widget prices
3. Slashing factory headcount
4. Slashing factory wages
1, 3, and 4 all minimize COGS and thus maximize profit margins. Unfortunately, 3 and 4 are both much easier than 1 (since 1 typically entails considerable effort to execute), so those are the options most companies pick. Both represent a severe loss of worker income - and thus, both necessitate UBI to compensate.
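Here's the widget arithmetic spelled out as a rough sketch. The 10,000/day sales figure and the 10 → 100 per-worker output come from the example above; the function and variable names are just mine.

```python
def workers_needed(daily_demand: int, output_per_worker: int) -> int:
    """Smallest headcount that covers daily demand (ceiling division)."""
    return -(-daily_demand // output_per_worker)


DAILY_SALES = 10_000              # widgets the sales team can sell per day
OLD_OUTPUT, NEW_OUTPUT = 10, 100  # widgets per worker per day

old_headcount = workers_needed(DAILY_SALES, OLD_OUTPUT)
new_headcount = workers_needed(DAILY_SALES, NEW_OUTPUT)

# Option 1: demand required to keep every existing worker fully busy.
demand_to_keep_everyone = old_headcount * NEW_OUTPUT

# Option 3: share of the factory floor that is no longer needed.
share_cut = 1 - new_headcount / old_headcount

print(f"headcount before automation: {old_headcount}")
print(f"headcount after automation:  {new_headcount}")
print(f"demand needed to avoid cuts: {demand_to_keep_everyone} widgets/day")
print(f"headcount reduction otherwise: {share_cut:.0%}")
```

So option 1 means finding ten times the current demand at the same price, while option 3 means letting go of nine workers out of ten. That gap is why 3 looks so much easier to most companies.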
> A land value tax is a great idea, but irrelevant to what I was saying.
Assessing where the tax burden lies - and the impacts of that tax burden on spending ability, and the impacts of that on demanded wages - is pretty darn relevant to what you're saying. If you're paying an extra 10% (or whatever) on everything you buy, then you're going to adjust your wage expectations accordingly.
> We need people to do low wage jobs. If they get some wages for free, we need to pay them more to do the jobs.
Good. We should be paying workers a lot more than they're currently getting. The American (and for that matter, global) working class has been chronically shafted under capitalism for centuries now; God forbid we get shafted a little bit less.
> If we pay them more, then we need to raise prices on the goods in order to not go bankrupt.
Or the management could take a pay cut. I have very little sympathy for the "but what about our profits?" argument when C-level execs of even small businesses are skimming enough money on the output of our labor to be able to afford multi-million dollar homes and fancy cars.
> It's just a naive thing to focus on this founder idea.
Doesn't seem any more naïve than the idea that workers will somehow manage to "pull themselves up by their bootstraps" in a socioeconomic system deliberately designed to ensure we're never able to accumulate enough capital to do so (even at all, let alone without significantly impacting our physical and mental health in the process). Entrepreneurship currently skews hard toward those who already have money. That's a problem which in and of itself needs solved in order for a society to actually have any semblance of that "equality of opportunity" to which "laissez-faire" capitalists pay lip service; that maximizing the ability for working class people to start their own businesses (be it as individuals or cooperatively with others) happens to also at least partially alleviate outsourcing-induced job loss is a nice side benefit.
> Is energy really scarce? Currently the entire world consumes about 165.000 TWh (Tera Watt per hour) which is a lot, but it just a tiny fraction of the energy we receive daily from the sun, which is about 174000 * 0.7 * 3600 TWh = 430.000.000 TWh. On top of this there is all the energy stored inside our planet and atmosphere, which has been subjected for billions of years to the sun’s energy transfer.
By this logic energy has never been scarce. Energy is one of the most important scarce resources. Much of the world is currently undergoing an energy crisis.
> Is food really scarce? No, as it can be obtained by a mix of energy and chemical elements.
We have the sun, and there is soil, ergo food is not scarce.
> Are chemical elements really scarce? Let’s pick gold, an element which is notoriously considered rare. The total gold mined in all human history is 200.000 metric tons. But if we look at the abundance of elements on earth, even though the mass fraction of gold is just 0.16 part per million, knowing earth’s mass we can estimate the total gold on earth to be about kg * * 0.16 = metric tons. If we only consider earth’s crust, that’s about 100 times less, which is still a huge number.
Minerals that require more energy to retrieve than value they provide.
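For a rough sense of scale on the gold estimate quoted above, here's a back-of-the-envelope sketch. It assumes Earth's mass is about 5.97 × 10^24 kg, which is a standard figure but my own input here; the 0.16 ppm mass fraction, the "100 times less" crust factor, and the 200,000 tonnes mined are taken from the quote.

```python
EARTH_MASS_KG = 5.972e24      # assumption: standard figure for Earth's mass
GOLD_MASS_FRACTION = 0.16e-6  # 0.16 parts per million by mass (from the quote)
KG_PER_TONNE = 1_000
CRUST_DIVISOR = 100           # "about 100 times less" in the crust (from the quote)
MINED_TONNES = 200_000        # total mined in human history (from the quote)

total_gold_tonnes = EARTH_MASS_KG * GOLD_MASS_FRACTION / KG_PER_TONNE
crust_gold_tonnes = total_gold_tonnes / CRUST_DIVISOR

print(f"gold in whole Earth: {total_gold_tonnes:.2e} tonnes")
print(f"gold in crust:       {crust_gold_tonnes:.2e} tonnes")
print(f"crust / mined:       {crust_gold_tonnes / MINED_TONNES:.1e}x")
```

The number is huge either way, but it says nothing about how much energy it takes to get any of it out of the ground, which is the actual constraint.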
You state
> Now I’m going to state something which may either hit you as a profound insight or as an obviousness. Basic resources are not scarce per se, what’s limited is the ability to transform them and make them usable. The fact that we need a human to perform the job is what creates scarcity.
It's a really poorly reasoned thesis and your arrogance to call it a profound insight is just bad. If you have one human and you have a water pump that requires two humans' labor to retrieve one human's water, it is nonsense to say you have a labor shortage. You have a water shortage. That one resource can be used to acquire another does not mean there's only one resource on the board. Everything you've said about labor could be restated as usable energy.
Energy, food, materials, labor, land, time. It's all scarce.
I'm going to be level with you: I don't want to pay for someone's food and board so they can draw lines on paper (which won't sell) all day. Likewise, I don't expect anyone to pay for my food and board so I can do fuck all either.
If you want a living, earn it. If you want wealth, earn it. Might not happen with your favorite school of craft, but the vast majority of people don't/can't make money doing something they are passionate about.
It's called a basic income because it's subsistence living. Most people won't just live off it and do nothing. And those that would, well, they aren't really going to do much anyway if you force them to work, other than the bare minimum of the most menial labor.
So far every experiment in UBI has shown that almost everyone getting it does something useful with the money and doesn't just sit on it.
And frankly, I have no problem with paying someone to sit on their ass drawing lines, if it means they aren't starving and homeless.
He clearly doesn't expect everyone to be generous, hence why he advocates for UBI. UBI would be mandated and would therefore force participation from those without such generosity (according to their means, of course). By claiming that he holds a viewpoint which he obviously does not you've utterly failed to refute his argument. Perhaps you should seriously consider why his argument works and yours doesn't. You may come to a surprising change in your point of view.
This is the thing with automation: we're on a path to destroy most jobs that you can earn a living from. Self driving cars, automated kitchens (and ghost kitchens), self checkout, automated bookkeeping and mid level managerial positions; all of those are more or less set to be automated in the near future.
Even if that only kills half the positions, we're still looking at a situation where humans overall don't have anything attractive to offer the market. If you can't earn a living, what would you do?
i don’t think your point is a valid retort; when you’re paying a landlord you are receiving something of value that you want for yourself, just like when you pay for a cheeseburger.
> when you’re paying a landlord you are receiving something of value that you want for yourself
No I don't. I receive a temporary lease to something of value that is fundamentally necessary for meaningful existence in modern society. Landlords are pure middlemen - and while there's a place for middlemen in society to provide initial capital, at some point that value dwindles down to zero as that initial investment is repaid, and then dwindles past zero as the landlord continues to parasitically rent-seek despite contributing nothing that the tenants themselves could accomplish for far cheaper.
Your retort to my retort would be valid (or at least actually equivalent to your cheeseburger analogy) if - in exchange for my rent checks every month - I received ownership stake in the property and/or the company that owns it. Such an arrangement has more in common with a housing cooperative than with a typical landlord/tenant relationship.
This focus on other people "earning it" almost seems religious to me at this point in our evolution, especially as we look forward towards automation potentially creating plenty. If we need people to work jobs, great, but why confabulate jobs just so you can feel good that other people aren't getting their food and board paid for?
I agree with this sentiment, always have, but I always like to probe for issues with it.
When "earning it" takes much more than it used to due to technological shifts or otherwise, the only ones who can afford to walk the path toward mastery are the very well-off. This of course violates the modern western liberal ethos of equality for all, particularly in regards to educational pursuits.
We end up with a McDonald's worker class, their menial profession determined from birth, and their noble masters.
Maybe c'est la vie and there's nothing we should or even can do about it. But it's unpleasant, to say the least, knowing there's an entire class who's destined from birth to perform cheap menial labor their whole lives, without the slightest hope of doing anything else. After all, slavery is necessary for civilization, always has been.
seems like you're missing the main idea behind ubi? if automation gets good enough at enough things, there might not be jobs for everyone to do. if, when, where, and how the above might happen are up for debate - but your post just sounds like typical anti-welfare nonsense
I work in tech, and while it's mostly meetings and leaning on some knowledge of various Java and SQL use cases, as well as some niche knowledge of crappy languages like D, I probably don't work as hard as someone scrubbing the toilets or making the beds in the local Marriott hotel.
I can accrue money doing what I'm doing - they can't.
Perhaps seek recourse in one of the vast lucrative industries created from scratch in your lifetime (video games, b2b software, smartphones, internetworking, robots, ... )?
I don't think UBI can work. I love the idea of it, but as soon as you have a democracy and state provided income, it's too easy to vote yourself a raise. Look at countries like Argentina, Spain, Greece, Venezuela, where there are or were a huge percentage of people on the dole. Their economies collapsed. You can't keep squeezing a small upper middle class to pay for everyone. We're pretty much at the limit there as it is in many countries. I think UBI is fundamentally at odds with human nature in a way that would prevent it from being successful at scale.
If Native Americans in any significant number are receiving UBI already, then that's news to me. They're disproportionately in poverty - and thus disproportionately eligible for various strings-attached welfare programs like food stamps - and tribal members / reservation residents receive some additional pittances of federal funding as a sort of "sorry we stole your lands and attempted to genocide y'all, now shut up about it and take this spare change we're throwing at you", but that's about it. The only thing even vaguely resembling tribal UBI of which I'm aware is annuities from casinos, and last I checked those don't come anywhere close to even meeting the poverty line, let alone sufficient for anyone to live off them - i.e. they don't satisfy the "basic" in "universal basic income".
UBI is the solution, not the problem - and on the topic of Native Americans, funding it can and should start by taxing the unimproved value of the very lands we conquered from them.
The assumption is that most basic needs will be provided by automation, not humans, hence the need for a UBI. Also, immigrants, since they don't get UBI (but hopefully get a lot of protection so as not to become a slave class).
That’s a pretty stupid assumption given how many of the jobs that support basic human needs are labor intensive (home construction, farming, medical attention, infrastructure construction for energy and water, etc, etc).
We aren’t even close to automating basic needs. We can certainly automate the manufacturing of some complex individual items though. Seems like a pretty fundamentally flawed assumption of UBI.
Just to illustrate what the problem is using an extreme example: Oh good, we made it so anyone can turn the whole of the earth's crust into paperclips with a push of a button in a fully automated way that doesn't require any human labor and the energy to do it is completely sustainable. Hmm... Maybe that wasn't such a good idea.
I'm not understanding your objection. Would you be able to fill in some of the steps between "UBI" and "the entire earth's crust has been turned into paper clips"?
I'm guessing that they're imagining we'll give the productivity AI too much leeway, because it prints money so nobody has to work, until it goes unchecked and eventually starts making decisions contrary to typical human interests, because we based its reward function on profit instead of understanding what people really need/want to be happy.
Let's say that we have completely automated fishing boats. They can trap every fish in the sea. We give everyone UBI. They all decide to eat fish. No humans have to work or do anything to completely remove all fish from the sea. Is this a good idea? In previous eras we were constrained by the need for human labor to do all these things, but now AI does it, so we can have as much of it as we want until the natural resources run out. This creates problems with sustainability however, so how is that controlled?
This is a problem we already have to deal with: people got rich enough that they could afford to pay people to overfish the oceans, and we responded by limiting how much people are allowed to fish.
That is, I don't think UBI adds a new problem beyond "how do we make sure that humanity properly accounts for externalities" and "how do we make sure that AI does what we want it to do".
I was using fish to make an obvious example. The answer is regulation, but there are so many things like fish in the world. Do we have to have a regulation for every single one? It seems like it will end with whack-a-mole micromanagement of everything. It almost seems like we'll get communism eventually out of it. Except there is no "all labor is of equal value", because there's no labor. I wish there was some alternative.
I'm sorry, I just still don't see the path by which UBI leads to a dramatic increase in how much regulation we need. Would you be able to give an example that is specific to UBI, and that describes a problem we wouldn't have without it?
Economists are still divided on the subject. So far localized experiments in UBI have not caused localized inflation, but it's hard to tell since it's small scale.
I'm not so sure it would actually happen though. We already give support to a lot of poor people through various programs like EBT and Medicaid. This just converts that help to cash, which gives people more freedom on what they want to spend on.
The problem with UBI experiments is that they haven’t been U. If they’re localized and small scale they’re obviously not Universal, which makes it hard to draw conclusions.
With tools like stable diffusion, mastery means something different than it used to. Now a master is someone who understands style and composition, who knows how to use the tools effectively to produce stylish, well composed images, and who has sufficient editing skills to clean up/paint over tool output in order to produce professional results.
But the problem presented in GP still exists, right? Not everyone starts out with a good understanding of style and composition. They need time to master those skills but also need a line of income to survive till then. If mediocre level work is all automated, someone starting out might not get the time to reliably skill up.
Learning composition and style is a different thing than developing technical skill though. You can learn the principles of composition/framing and get a good survey of art styles in months, compared to technical skills which frequently take years and years to develop. With this tech, you could start out as an enthusiast generating your own art, then get hired as an assistant of sorts to do low level prompt and input "exploration" for a head artist in a sort of apprenticeship.
The general public are not going to be the ones using these tools for the most part. Sure some people will try it for fun and some will become hobbyists but the ones actually producing salable work will be in the industry or attempting to get into it. So now instead of drawing a bunch of crude pictures and selling them for cheap they’ll be generating those images with an AI and selling them cheap.
All of said tools having the most godawful interfaces and documentation known to man if the JS ecosystem is any evidence of the direction things are going.
Was there really an apprentice level for digital artists before AI models? I know somebody who does a lot of digital art as a hobby. They have spent years and years working on stuff for their own enjoyment, and to hear them tell it they're only now reaching the point of marketability.
What's the market for mediocre art today? I long ago worked on tech for magazines, which would sometimes use adequate commissioned art to jazz things up. But that was before the rise of vast stock art collections that were instantly accessible. Looking at some popular web-based magazines, it seems like they still commission the occasional original illustration, but that it's mainly stock photos or photo-composite illustrations.
Learning to play an instrument, to draw, sculpt or basically anything is hard, or at least it takes time.
There was never a market for mediocrity... but people will happily pay for exposure (to play in a bar, rent a space as a gallery and so on).
The problem is that even for good art it's hard, and it has always been. The rise of accessible stock art doesn't help, and AI won't either.
Still, one point is important: if you want to create something new, and not rehash (derive) the same thing, I guess we humans still have a place. At least for now.
I don't know about digital art, but centuries prior, traditional artist training followed the apprenticeship model just like any other trade at the time. Leonardo da Vinci walked this path.
And less than a century ago, scores of people drew background images for Disney animated feature films, with the better ones getting allowed to draw main characters, and the best having final say in accepting or rejecting drawings.
I guess the same happened with those creating and animating 3D models for the likes of Toy Story.
Sure, but we're talking about the near-future impact of AI. My point is that I don't think this is going to make much of an impact on available apprentice positions. I'm not worried about da Vinci; he won't be harmed by this.
The thing is, the job of the digital artist of the future will include Stable Diffusion or whatever comes next. The apprentices of tomorrow will be apprentices not at creating this kind of work by hand but at using these technologies to make art. The problem is there aren't any masters at this, yet.
As an aside, I'm not sure I'd use the term "apprentice" -- I'd maybe say "junior" as in "junior developer" or "junior designer". They're learning how to make good work, but they're still a professional in the field.
I think this argument is a bit like saying that mechanized looms will destroy the art form of weaving. The apprentices won’t be able to practice their craft by slogging away at menial tasks but they will be able to learn the new craft of AI enhanced art and coding. They’ll need to learn a new set of skills along with some of the old skills in order to succeed and what they produce will be magnified by the power of this new technology.
I think the apprentice model will still exist; they’ll just use AI to aid them. Only the very experienced, talented artists will know when AI is hindering them. Same way a really good programmer will understand when not to use a web framework or whatever, but an inexperienced programmer who knows how to make a CRUD app with Django or whatever is still valuable.
While I'm overall of the mind that nobody can really know what to expect, I think this may be a little more accurate of a reality than the aspiring Cassandras are trumpeting about. I'm willing to bet that most of the people who are going to utilize these technologies are people who never could afford a graphic designer to begin with. I'm sure some of the market will be lost to this tech, but a blackbox that shoots out passable images just isn't going to cut it for certain areas of the art market.
I also wouldn't be shocked if a big portion of the market for this technology ends up being the artists themselves. I personally know a painter whose creative process has been overhauled by DALL-E. Brainstorming the next project is easier than ever, and unlike the DALL-E images inspiring them, the resulting paintings actually have the "human-touch" necessary to elicit the deeper emotional response that a good painting should bring about. Adding depth to a model doesn't necessarily add depth to the output.
But like I said, I don't think anyone really knows what's going to happen. We'll see, I guess.
When new technology is introduced people always talk about how existing workflows are cheaper. That's true, but the biggest impacts are always driven by things that were previously cost prohibitive that now become possible.
YouTubers are a perfect example of that: it used to take entire television broadcast studios to do what they do, and now it can be done solo or with just a few people. And YouTubers are exactly the audience for this new tech - the smaller ones who want to put out branded merch cannot currently do so at a decent level of quality. But you hire one artist to go out and make 5-100 pictures of your branding, tell them to take their time and make those few images to perfection, and that can now be molded to anything they want to create.
Indie game devs must be thrilled. Aspiring indie directors looking to make green-screen backgrounds are thrilled. VRchat users looking for 3d models are thrilled.
What’s interesting is the comparison- YouTubers can fit a niche that TV never could - specifically because they can produce content easier without people wandering off.
Though much of YouTube can be summed up as what were the three cheapest kinds of TV show to produce - standup comedy, talk shows, and howto. The number of YouTube sitcoms is much lower.
There was an example on a Stable Diffusion post earlier: someone using SD to generate recipe pictures (https://news.ycombinator.com/item?id=32644800), a case where they almost certainly wouldn't hire an artist.
I think putting some graphics artists out of business is one side of the coin. The other side of the coin is creating new jobs for people with good visual taste and imaginative ideas, who did not pass the skill barrier previously required to actually create good art.
Presumably we need fewer of the new more efficient jobs to displace the work done by the old jobs. But that lowers the price of “mediocre” graphics, and therefore increases demand; maybe we actually end up with more jobs at the more accessible level.
These market dynamics are quite hard to predict but either way, bad time to be an entry level professional artist, great time to be just about anyone else.
Look at music production: historically the barriers to music creation were the dexterity and practice needed to master an instrument, or multiple instruments.
But when you put tools in the hands of more people, without the filter of needing the time, training and skill to coax the sounds in your head from the instrument you have… suddenly you get stuff nobody was making before.
>The other side of the coin is creating new jobs for people with good visual taste and imaginative ideas, who did not pass the skill barrier previously required to actually create good art.
I mean. Do you think that there are people out there with good taste in software and imaginative ideas, who can’t pass the skill barrier currently required to write code?
I think art is a combination of physical skill and conceptual taste/ability, among other things. The skill part is non-trivial.
Software seems like mostly thought-stuff to me, there’s no mechanical skill-based piece. But even so, when AI can generate a full app with the ease of iteration displayed in the OP, then sure, I think you’ll see some people with app ideas generating those apps themselves, instead of having to hire contract developers. Right now just completing stubs of functions doesn’t seem useful enough to allow someone that can’t code to make an app.
What happens is, story tellers get empowered with each and every advancement that makes their process easier. Only the technical people with niche focus get screwed.
Over the history of humanity, the printing press, the photograph, the computer, etc. each destroyed some professions only to make others flourish.
Finally a fair point. I don't know for sure, and some here do, but my feeling is that we are still in a Moore's law phase in this art generation thing. If that's true, it means 10 years from now the AI will be able to mimic the very best to perfection. Picking up a pencil will become hopeless.
20 years from now they'll click a button and you'll get a fully randomly generated Pixar movie that's as good as the originals.
I'm a software engineer and amateur illustrator. I have always welcomed technology. Always felt good about it. Copilot? No worries, please automate my job; if humanity doesn't code anymore, I don't think it's gonna kill us in the slightest, on the contrary. Art? Mark my words, this won't go well with people's souls. This is an obvious evil mistake. I'm still confident some wise people will stop this heresy before civilization collapses (I like to dramatize like that, but still, this is bad imo).
Especially when you realize that the magical images the AI puts out come from the hard work of the fellow artists it aims to replace.
I don't think non-artists will ever understand this pain.
I'm also the type that welcomes new technology into my workflow, always one of those early adopters, but I'm having a hard time with it this time.
...or maybe I'm just overthinking (Life finds a way!, right?)
Highly unlikely, unless by “mimic” you mean “vaguely evoke”. There is no actor behind these models. Recombination is only a very limited form of intelligence.
Maybe we should go even further. There is no creativity in whatever that thing generates; it is always sophisticated plagiarism. Therefore training those models outside of a closely regulated research environment and selling the output should be prohibited.
I actually have zero problems with that happening. Also, your opinion of your own ability to use sarcasm successfully is at the very least highly suspect.
You're right about that. At the end of the day it is a tool in the craft domain of art. Artists take a long-ass time to perfect the craft end of the business, but ultimately what sets great artists apart from mediocre ones is not craft, it's their taste. An artist's own (good or bad) taste pushes them to not stop working on their image/painting until that taste is satisfied. The same will happen no matter the tool. I am actually terribly excited by these tools to expedite sketching, not in a sense of speed as much as volume while hunting for those directions that satisfy that inner taste.
Then I found that the end of the era of handmade digital art is coming. Only transistors are limited and future digital artists will differ in memory size and teraflops like bitcoin miners.
Yes and I think it will especially push the boundaries of art forward like we have never seen before.
Also, it will be interesting to watch whether and how people will be assessing art created by AI. Will there be something like Connoisseurship for AI art?
The cynicism of some people in this crowd never ceases to amaze me. This stuff is nothing short of mind blowing. If you are someone who is about to comment "meh," you probably need to take a step back from the keyboard.
It's funny, but in a different genre, because we should expect Krugman to be clueless, whereas in theory people on geeky tech websites would be better.
Similar to the HN'er who dismissed dropbox as an ftp clone (iirc), some geeks live so close to the tech that they can see how a product is simply an iteration on what they understand/own etc. and miss the marvel of the entire package as a mass market product.
I think it's a pretty natural mindset to have for many people who work as software engineers. If you see something useful or interesting, the first reaction is often to notice what needs to be fixed or improved. It's also a field prone to nitpicking
I'm firmly in the moderate camp of "this stuff is incredibly cool and promising, but we're nowhere near the overblown predictions of human artists becoming obsolete".
The only people who think this are people who have no experience in the field.
The creative process, like many fields applicable to business, is about problem-solving. AI text-to-image generation doesn't replace that function; however, it does form an excellent tool, especially when it comes to rapid conceptualisation.
This will allow more people to be creative problem solvers without needing to possess technical skills in image creation. Much in the same way that graphics apps allowed people to make images without needing to learn studio or art skills. Or DTP tools allowed more people to publish without the tedium and high set up costs.
I will still be hiring illustrators and designers, and this may be one of their tools, and it would be their responsibility to be experts in it, but it doesn't replace them - it makes them better illustrators and better artists.
The right way to think about this is not that it shrinks the field, rather it opens it and accelerates it - for that it's a very welcome addition.
No creative is scared of this - they're looking forward to the next-generation approach, and it's clear that 2D images are not the end point. Soon we'll have 3D (already in progress), soon we'll have music, soon we'll have this for animation and programming.
People fiddling with the technology have noticed some obvious shortcomings, such as difficulty getting consistent results - for example it's currently not possible to develop a series of story boards where the character is obviously the same. Instead some level of reseeding the image with the desired character or outright recomposing the graphics later is needed.
These aren't things that can't be fixed, however; what we're seeing now is definitely an exciting new tool in its infancy.
Travel agents are actually a great weird example of what might happen... we didn't really get rid of 99% - more like 80-90%. They just stuck around as good salesmen who use Expedia etc better than the average person and (in theory) destress the process. Just like artists of the future almost certainly will be quite good with these AI tools and use them primarily - but maybe have a bit of artistic talent themselves.
We've been promised self-driving cars since the mid-2010s, while AGI was supposedly only 5 or 10 years away (nowadays almost nobody mentions it anymore). All that to say that a little skepticism is understandable.
Back to the subject at hand, and as other people have already said, the generated images seem to lack emotion and "feel". They're good to maybe put as wall-art in a rented out AirBnb, but that's pretty much it. Still a cool thing from a technical perspective, though.
I think it's fascinating, and at least to me completely unexpected, how good these image generation models are. What happens when we have a 42gb model, or a 4.2tb model?
For comparison, the human brain is estimated to have a (equivalent to a computer) capacity of about 2.5 petabytes [0].
I think I read in the past that the human mind holds memories in a picture-like way; I wonder if that's why these image based models are so incredible when compared to the text models.
Maybe we are in a new "Moore's Law" like period where the complexity and size of these models is going to double something like every 18 months. It's going to be fascinating what's possible in a few short years time, I fully expect to be continually surprised.
I'm looking forward to seeing a video model trained on 10 second clips, someone somewhere is working on it.
The size of neural nets grew 3000x in 10 years - from the 60M-param AlexNet to the 175B-param GPT-3. That's about 2.23x per year. Moore's law was a doubling every two years, which comes out to 2^(10/2)=32x.
Model size scaled 100x faster than compute over a decade. We are paying for this difference by using more energy and hardware, but it's already too expensive to train except for a few labs, and deployment is restricted.
Can't even load GPT-3 on most computers. Stable Diffusion is an exception, they did a good job and were lucky the model can be so small.
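The arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch using the figures quoted there (60M parameters for AlexNet, 175B for GPT-3, a 10-year span, and Moore's law taken as a doubling every two years); the variable names are made up for illustration.

```python
# Figures as quoted in the comment above; treated here as assumptions.
alexnet_params = 60e6      # ~60M parameters
gpt3_params = 175e9        # ~175B parameters
years = 10                 # span used in the comment

# Total growth in parameter count, and the equivalent yearly factor.
model_growth = gpt3_params / alexnet_params
yearly_factor = model_growth ** (1 / years)

# Moore's law taken as one doubling every two years.
moore_growth = 2 ** (years / 2)

# How much faster model size grew than Moore's-law scaling.
gap = model_growth / moore_growth

print(f"model growth: {model_growth:.0f}x")
print(f"per year:     {yearly_factor:.2f}x")
print(f"Moore's law:  {moore_growth:.0f}x")
print(f"gap:          {gap:.0f}x")
```

With the exact parameter counts the growth comes out a little under 3000x and the gap a little under 100x, so the rounded figures hold up as order-of-magnitude estimates.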
There is evidence that the performance of the models scales linearly with size, so Moore's law scaling is likely to get us some “free” improvement even if no one ever invents a better ML technique.
Wow, somehow I missed this one with all the whirlwind of image models recently. It’s very illuminating how the capability keeps scaling in their examples.
If you read through the research papers, a big section always deals with benchmarks. That’s because the output needs to be quantified in some way in order to improve the models. Several benchmarks have been proposed for text to image models.
That makes sense, but that would imply that there's a limit, right? Once the image is pixel-perfect and outputs the optimal image, what does increasing the model size do? Who can decide, and how: "yes, this is more Picasso looking than that one", or "this one indeed looks more energetic", or "this image does make me sadder than this one". How do you benchmark this?
Yes you are on the right track. Once you get really close to a perfect score on your benchmark you can no longer improve so you need to develop a better benchmark with more headroom. And you have the right idea of how you go about benchmarking subjective quality. A bunch of humans produce output-scoring pairings and the model is judged against that. To train an AI you need a very measurable goal and in this case the measure is “humans like it.”
If you are noticing that this seems to fundamentally limit model performance on certain tasks to aggregate human capability, you are noticing correctly.
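As a rough illustration of judging a model against human ratings, the sketch below aggregates side-by-side preferences into a win rate. The votes, the model names, and the `win_rate` helper are all invented for the example and do not come from any real benchmark.

```python
from collections import Counter

def win_rate(votes: list[str], model: str) -> float:
    """Share of non-tie votes in which raters preferred `model`."""
    counts = Counter(votes)
    decided = sum(n for name, n in counts.items() if name != "tie")
    return counts[model] / decided if decided else 0.0

# Hypothetical votes: each entry is one rater's pick for one prompt.
votes = ["model_a", "model_b", "model_a", "tie", "model_a", "model_b"]

print(f"model_a preferred in {win_rate(votes, 'model_a'):.0%} of decided votes")
```

Once one model wins nearly every comparison, the number stops being informative, which is the headroom problem described above.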
To give you some idea of what these benchmarks look like, here’s the prompt list from DrawBench which Google created as part of training their Imagen model.
Also, after a point the differences will be more given by the specific individual that views the image, not by what the AI can generate, so the AI would have to optimize its output per individual and would need to have a deep understanding of them.
I realize we are not done with it yet. There are new process node launches planned for the next few years and each processor generation continues to improve density, power consumption and price per transistor.
I’ll hold off declaring it dead till it is well and truly dead. And even then we could expect cost improvements as the great wheel of investment into the next node would no longer need to turn and the last node would become a final standard.
As to physical limits, there are plenty of weird quantum particle effects to explore so that seems overstated. We are still just flipping on and off electromagnetic charge. Haven’t even gotten to the quarks yet!
You can draw a straight line right through this log scale plot that goes to 2020. Not sure what definition of Moore's Law you are using, but it doesn’t seem to match the one on Wikipedia.
> I think it's fascinating, and at least to me completely unexpected, how good these image generation models are. What happens when we have a 42gb model, or a 4.2tb model?
First 'how good' is an ill-defined metric -- that it seems in this case is a measurement of how much surprise and wonder they generate in the audience.
Second, it might just be that the real wonder of the models is their compression -- that is, there is a space of mappings of line drawings and simple descriptions into art and this technique was able to lossily compress that space down to 4.2G. If you only compress it down to 42G, you'll be looking at the JPEG that's 90% compressed instead of 99%. Yeah it will be better but incrementally, not necessarily "Wow!" better.
Honestly it's not obvious that it will be better at all.
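To make the JPEG analogy above concrete, here is a rough sketch. It assumes, purely for illustration, that the 4.2 GB model corresponds to the "99% compressed" case; the implied size of the uncompressed space follows from that assumption and is not a measured figure.

```python
def kept_fraction(compressed_gb: float, original_gb: float) -> float:
    """Fraction of the original size that survives compression."""
    return compressed_gb / original_gb

# Assumption: 4.2 GB is what is left after discarding 99% of the space.
small_model_gb = 4.2
original_gb = small_model_gb / (1 - 0.99)   # implied size, about 420 GB

# A 42 GB model over the same hypothetical space.
large_model_gb = 42.0
discarded = 1 - kept_fraction(large_model_gb, original_gb)

print(f"implied original size: {original_gb:.0f} GB")
print(f"42 GB model discards:  {discarded:.0%}")
```

Under that assumption the tenfold larger model moves from discarding 99% to discarding 90%, which is the incremental step the comment describes.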
> For comparison, the human brain is estimated to have a (equivalent to a computer) capacity of about 2.5 petabytes [0].
That's a terrible and basically non-sensical comparison to make.
Here is my perspective on these kinds of images. This kind of 'picture' usually comes from speed-painters who incorporate techniques like photobashing. As in integrating 3D models and RL photos into their composition, or just painting over a 3D picture entirely.
It was already a genre that highly incorporated computer assisted methods. There is a lot of doom and glooming going around, but honestly the modern process of creating 'concept art' was already extremely commodified and efficient. These weren't exactly your idealized vision of some artisan craftsman laboring weeks over a picture, they churned this stuff out in a few hours (if that)
I think you're not grasping the magnitude of the change. Creating even an average quality speed-painting requires a tremendous amount of expertise in drawing, painting techniques, composition, lighting, perspective. It takes years of training.
These models let anyone achieve similar results in minutes. Without any prior learning. It is not even lowering the bar, it is literally dropping the bar to the ground.
Besides, stable diffusion is able to generate not only painterly scenery, but also photographic images that are almost indistinguishable from actual pictures (certainly helped by the fact images have a heavy digital look in our era).
I agree. People aren't grasping the magnitude because they're thinking about jobs. Jobs are a silly way to measure this. Jobs are temporary. Nobody worries about the mechanical stocking frame making socks anymore.
This is more like the literacy/printing press transition.
Used to be, people had to learn to memorize a lifetime of stories and lore. Now nobody learns to make a memory palace or form a mnemonic couplet. Why would you bother? You just write things down.
Today, people learn to draw. In a generation, why would you bother?
There will still be specialist jobs for people generating images, but instead of learning to make them up, the specialists will be very good at picking them, suggesting them, consuming them.
Humans will be the managers and the editors, not the creators.
The same thing will happen to other arts. First (and very easily) to music. Eventually, perhaps, to writing and whole movies...
The only thing stopping that is that the models can't maintain a reality between frames. They can't make an arc. It's all dreamlike.
If we find a way to nail object persistence it will be a singularity-level event. The moment you can say "make another version of this movie, but I want Edgar to be more sarcastic and Lisa should break up with him in the second act" we will close the feedback loop.
I mean it sounds pretty cool to be able to fork a film and create different iterations and mashups. Maybe if you create a cool enough scene the director will merge your PR back in.
I agree. It is more than just "lost jobs", like artist impressionists, court room sketch artists, etc. It is a complete dystopia, and it doesn't help artists at all, but displaces them. At least actual paintings will be more valuable than the abundance of this highly generated digital rubbish.
So given that the technologists have so-called 'democratized' and cheapened digital art, I really can't wait until we get an open-source version of Copilot AI that would create full programs, apps, full stack websites with no-code so that we would be seeing very cheap Co-pilot AI shops in the south east of the world generating software that effectively eliminates the need for a senior full stack engineer.
Easy cheap business solution for the majority of engineering managers on a tight budget who know they need to offshore tech jobs without the need for any skill as it is offloaded to cheap Copilot prompters.
So we will have no problems with that and be happy with that dystopia. Wouldn't we?
People can already use websites to create simple websites but that hasn't really displaced web developers because our needs keep changing. But it has definitely helped people bring their businesses to market much more easily. You don't need a whole lot of IT knowledge anymore to start an online clothing business. But that definitely means jobs have been displaced, though in reality, they tend to move rather than simply disappear.
Being an artist isn't really being able to draw well, it is being able to do a lot more than that in harmony, and so I believe these tools will just get incorporated and some new artists will appear and older artists will adapt.
My only worry with this, and it's not something that I see being pointed out too much, is that due to these models being able to produce art from previous art they've seen we might find it difficult to come up with novel styles. But then again, this might precisely be a new kind of avenue for human artistic expression.
Not disagreeing, just wanted to point out that this is already happening in some niches. It used to be that you had to hire someone to make you a webpage, and they had to use PHP or whatever. Then came WP and themes - and you had your page made by some youngster for peanuts.
But I think society will find a way. Who knows, maybe we'll all work less and enjoy life more? One can hope.
Speed painting also usually involves practicing certain scenes. This method anyone can use to create any new scene that they can imagine and the result looks quite good with some patience. Seems like some people are overly pessimistic but to me this seems like we’re on the cusp of something truly disruptive in the arts space. And it’s not NFTs. Remember that last year this would have sounded mostly like sci-fi unless you were following cutting edge research.
In the realm of “real” art I’m actually very excited since I believe there are hundreds of very imaginative and patient people who just can’t paint well but will be able to create new art with tools like this. It can also synthesize new and alien things.
> This method anyone can use to create any new scene that they can imagine and the result looks quite good with some patience. Seems like some people are overly pessimistic but to me this seems like we’re on the cusp of something truly disruptive in the arts space.
A race to the bottom and the cheapening of 'art' in general for the sake of replacing artists is a shame to see and nothing to celebrate. I was against both the gatekeeping of GPT-3 and DALL-E by Open 'faux' AI. But now it seems that every time an open-source alternative or version was released into the wild, it seems that the uses become even more dystopian; especially with DeepFakes, fake news propaganda / articles and catfishing with generated hyperrealistic faces.
> And it’s not NFTs. Remember that last year this would have sounded mostly like sci-fi unless you were following cutting edge research.
Stable Diffusion is the reason why JPEG NFTs will always be worthless. Both of them will fuel JPEG NFT prices to the floor value of zero. But as NFT proponents cheered in believing that they will help artists, here we are seeing DALL-E 2 and Stable Diffusion fans screaming that it will help artists. No it will not.
> In the realm of “real” art I’m actually very excited since I believe there are hundreds of very imaginative and patient people who just can’t paint well but will be able to create new art with tools like this. It can also synthesize new and alien things.
This isn't the 'democratization of digital art', it is the complete devaluation and displacement of digital artists and it now makes 'real art paintings' much more valuable.
> I was against both the gatekeeping of GPT-3 and DALL-E by Open 'faux' AI. But now it seems that every time an open-source alternative or version was released into the wild, it seems that the uses become even more dystopian;
So are you still against gatekeeping? Are you in favor of releasing AI advances to the wild?
I am still against OpenAI’s gatekeeping, and I gave AI itself a chance to be used more for good and to be significantly less dystopian.
Even with the release of GPT-3, there seems to be very little good in such a system, despite it being generally underwhelming at generating convincing sentences. However, DALL-E 2 has gotten much better, for the worse, at digital images, to the point where even gatekeeping it would just spur on an open-source competitor to supersede DALL-E 2 anyway.
But it was actually the release of Stable Diffusion that did it for me, when most of those hyping it here just want to aid the race to the bottom and at the same time are screaming that it will help artists when (like NFTs) it won’t.
So looking at both DALL-E and Stable Diffusion, it is yet another contribution that advances the dystopian AI industry, which will just be used for fake news, surveillance and catfishing. The worst part is that they haven't built any detectors for this.
Rather than ignoring the several conditions I mentioned, read again what I said:
> I am still against OpenAI’s gatekeeping, and I gave AI itself a chance to be used more for good and to be significantly less dystopian.
> The worst part is that they haven't built any detectors for this.
So it is neither. If a given AI project has no detectors, or no clear indicator that its output was generated by an AI, then the whole project should be scrapped, cancelled or postponed until it has one. It is that simple. And no, DALL-E 2's tiny watermark doesn't count.
'AI researchers' know the dystopian scam that they are creating and they know that they need detectors and analyzers for them to significantly reduce the risk of malicious use. So it doesn't matter if there are others that are more powerful as the conditions are still the same.
I think you're directionally correct, but overstating the case in a few ways.
One, as a not particularly visual person, even this example involves some skills of composition and perspective. If you asked me to do something practical, like creating an illustration to go at the top of a blog post, I would not do nearly as well as somebody with art skills, and I would take a lot longer.
Two, this is the beginning. In the same way that digital artists took tools I could use and got really good at them, I expect the same will happen here. What will a good artist be able to do with a solid workflow and a few years of picking up tricks? Given the opaqueness and quirkiness of models, I expect a person who puts in the time, especially one with a strong command of art styles, composition, and the practical uses of visuals, will be able to run rings around me.
Three, people are quite accepting of AI images right now, but they're novel and exciting and decontextualized from how we normally use images. That's a playing field that advantages the novice. But what happens once these images are no longer fun and novel, but boring and overdone? As we learn to discern novice-grade work from what real artists can do with AI assistance, I think our bar as image consumers will rise.
Spending a lifetime learning img2img and using weeks to create a single artwork will always beat someone without experience who creates an artwork in an hour. There will be only a handful of people who will put in the time to become true masters of img2img. Everything always comes down to how much actual physical time someone is willing to put in. No matter how advanced the tools become, there's always a learning curve to mastery, which only a few people are willing or able to climb.
But it didn't drop the bar on the ground, it raised a new bar. People without computer literacy and/or basic programming skills won't be able to pull this off. Even using Photoshop (which I believe does/will integrate this new technology) is not easy/possible for some who can actually draw. Plus, how many regular people have access to a machine with 12GB of VRAM?
The method shown in this demo was already simple enough to teach someone to do in an afternoon. But we're only a week into the release of SD and we haven't gotten to all the sure-to-come GUIs that will pack the model into an idiot-proof application.
Think about it: give the user a few basic, MS-Paint-level pencil tools, colors and shape makers. Ask for a description; the application can even push you in the right direction for putting together good, detailed prompts, and give you a list of art styles, artists, filtering methods, etc., all with reference images so you don't need to memorize names. You can zoom into sections of the image to work on independently (like the birds in the article), then blend them into the greater image. Drag and drop image files onto the project and iterate on them.
Implementing the glue to simplify the "tough" parts of this process is honestly pretty trivial.
The parts that are inaccessible right now seem incredibly easy to overcome.
Using a CLI-based tool is inaccessible for most people... but building a GUI around this would be very easy. I'm too lazy to google it, but I would bet someone already has a GUI, or is working on one.
12GB of VRAM may not be accessible on most computers, but there's nothing innovative about offloading that task to an EC2 instance. It just requires an opportunistic developer to tie the pieces together.
I would be monumentally surprised if Figma/Canva/InVision/Adobe are not already working on this.
> Plus, how many regular people have access to a machine with 12GB of VRAM?
Probably not many in general, but the RTX 3060 has 12GB of RAM and it is around $350. And I saw an RTX 2060 12GB for $250 the other day. That's a pretty reasonable entry fee IMO.
Few have those high-end cards, but they don't need to anymore. Hugging Face is saying it needs 12GB, but the source was forked with some smart mods to chunk the loading onto the GPU.
It'll comfortably run on 6GB now. GTX 16-series cards need to run in full precision mode to produce output. The HLKY fork has improved the Gradio GUI and integrated Real-ESRGAN and GFPGAN for those with beefier cards.
Someone else also figured out how to load and run it all on a CPU, so pretty much anyone can in theory run the model now.
There is an elaborate Colab notebook linked in the HLKY repo that seems to get more point-and-click user friendly every time I look at it. I think it even launches the Gradio web UI so you can use the Colab instance with a web UI remotely.
You can run the model on your CPU quite easily, and a lot of people have access to 16GB machines. It's much slower, for sure (about 10 minutes for 50 passes on my old gaming PC), but it's still much faster than drawing things of the same quality by hand.
It dropped the bar to the other side of the planet. There are so many computer-literate people who can pull this off. You could pick 5 people off the street who can follow these instructions and 3 of them would. Versus the old way, where you would have to pull 400 people off the street and you probably wouldn't get this result unless you got really lucky.
Pick 5 people at random and you get one who doesn't know what a mouse is, 2 or 3 who can turn on the computer, and maybe one who can get to the CLI. Out of 400 people you will find more natural artists than people who could install this, even if they had the equipment.
Let me update your heuristics on this. Computer mice are practically obsolete. People use cell phones, not computers. No one needs a cli to run stable diffusion because a mobile web interface was released on day one. 6.6 billion people have a smartphone which is 83% of the world’s population (including the infants). This is about the same number of people who are literate.
4 out of 5 people globally would be able to submit a stable diffusion prompt and view a result. Most would have no idea what the hell was going on or even why it was interesting.
There are already web sites that will run the underlying models, requiring no installation.
The neat new applications that have taken over this site for the last couple days sometimes require CLI steps to install because they are in active development and it can be easier to experiment with something local. I'm sure they'll either be moved online or wrapped in nice installers over the next couple weeks.
I am talentless and untrained. Now with a combo of prompts and img2img, I can create awesome results on any topic and in any style that I have the rights to use. That’s a 0 to 1 moment. It didn’t get faster for me, it went from impossible to possible.
"Any style" seems like an enormous stretch. There definitely seem to be some styles that AI favors, ones which I've seen described as "clutter the frame so you don't notice the flaws". It struggles with simpler styles. I have yet to see a flat black & white image generated by an AI that looked even passable.
Have you tried DALL-E or Stable Diffusion yet? I bet you could generate a black and white image that met your standards for being impressive, if you spent a few minutes on it.
Nah, that's not it. I mean flat #000000/#ffffff. Google "stencil image" and you'll see what I mean.
AI really doesn't handle styles with restrictions like that well. I tried the stable diffusion website with variations on "black and white silhouette stencil image of a cat". It kept wanting to give the cat colored eyes, or it used shading, or the cat didn't have a coherent anatomy, along with the typical AI art "duds" that aren't really anything at all.
To be fair, I did get a couple of passable results when I replaced 'cat' with 'dog'. They were simple, but didn't have any obvious errors.
To be fair in the other direction, replacing 'cat' with 'abacus' gave me an (admittedly pretty) grid of numbers and some chainmail, and 'helicopter' suggested a novel design where two helicopter bodies would be stacked vertically, connected by a vertical shaft through the rotor, and which turned into a palm tree trunk above the top unit. Once you get out of the sample data, it starts to fall apart.
I feel like other people here are willing to forgive more errors than I am. They see an incoherent splotch in an image and assume more development can iron out all the problems, and I see an unavoidable artifact of the fact that these systems don't have a real understanding of what they're making.
Is this not an acceptable result to you? Did this on my first try and to my eyes it’s the same thing as I see when I googled “stencil image.” I’m thinking you have just not tried these prompts enough.
I gave this another shot to see if I could get a more complex stencil. This was my very first try again, so truly not cherry picked. Prompt was: “Stencil image of a tiger face. Clip art. Vector art.” This looks like an infinite stencil making machine to me
That first one just sucks. I don't have anything else to say about it.
The second one is representative of the upper end of what I was getting. It's almost passable, but doesn't hold past about five seconds. The left and right half don't look like they belong to the same animal. The blank space in the middle of the face is huge and detracts from any sense of structure, and the whole mouth area is just odd.
It's seriously impressive for AI, but it's not end-of-artists type stuff. I can google "tiger black and white stencil" and get a bunch of tiger faces, and every one of them is noticeably better. People imagine there are plenty of art jobs where discrepancies like this don't matter, but there really aren't.
Yes, that tiger is shit. Here are 35 tiger stencils I generated over a few minutes with Stable Diffusion on my PC with a humble RTX 2070S GPU: https://i.imgur.com/y9oCZIz.jpg
I could run pretty much any of these through Adobe Illustrator's auto trace and end up with an amazing vector image.
I could also leave it generating these for an hour and I'd have over 1000 results to choose from.
I do feel like you just moved the goalposts on me from AI can’t produce this style at all, to can’t produce this style on a level with an experienced human artist. I don’t think anyone is claiming it beats a good human artist. That does not make it useless for stencils.
Again those were my first try and I know nothing about stencil beyond what 2 seconds on Google images could give me. Certainly better than I could produce if you gave me Adobe Illustrator and a weekend. And the image is mine to use as I please, unlike what I could rip off Google Images.
Also, I thought the cat was cute, but there’s really no accounting for taste. Here’s a silly and swirly cat that might be more your thing? This was a cherry picked one of 10 since you have high standards;)
I considered qualifying “any” but decided it required no qualification. I don’t know how many examples are needed in the training data for a given artist to be able to reproduce that artist, but given how many obscure artists I have seen Dall-e and Stable Diffusion able to recognize, it must not require that many examples. And it’s still possible to fine tune a model with additional training if a new artist comes along or you want a bit more capability with a rare style.
Technically true, but I think this is VASTLY understating what has become possible with your average PC over the past 30 years.
Today I can get quick, effortless renders from Blender with a zillion available assets on the internet on my laptop. I can drop that directly into something like Clip Studio and paint right over it.
In the 80s you needed an extraordinarily expensive workstation like the Quantel Paintbox to even do primitive Photoshop type stuff. If you wanted a 3D render you needed a whole farm of servers.
That seems like an overreach. Thirty years ago, object recognition basically didn’t work, except in extremely simple cases. Something like semantic segmentation would have been way out of reach. Computers couldn’t play Go effectively against even modestly skilled human amateurs.
I meant the very technical sense where you could take an object recognition algorithm and compile it to run on an 80386 and it would run fine, although slow to the point of not being practical. Computers brought us more speed (and memory) to enable new classes of uses, but there’s not a single intrinsic operation a modern computer does which an old one can’t replicate.
That's very thoughtful. Maybe I'll get downvoted but I'll say this:
One could just store this on long-term storage tech, or in a temperature-controlled seed farm someplace cold, together with a computer and English instructions on how to use it. Maybe one day a human (or even another being for that matter!) in the far, far future will discover this and wonder what the hell this was about and how it really worked.
It’s a trope in sci-fi - the protagonist encounters a mysterious technological object from an ancient civilization, a trove containing ‘all the knowledge of an extinct race’.
Usually it gets downloaded into someone’s mind, triggering some kind of cascade of baffling imagery.
Feels kind of odd that this model data actually… works sort of like that?
But only so much can be encoded through history. I'd love to see a sci-fi movie combined with the butterfly effect. A somewhat advanced civilization in the past and another (maybe present) civilization where people try to find out stuff about the other one, maybe they're successful at the beginning at depicting how they were but they start to think they know everything and the whole perspective of the civilization changes.
And the size will be less and less until a certain information compression limit:
Emad tweeted that their optimized model is already just 2.1 GB and he hopes to make it smaller, around 100 MB:
This is the first time I see this accessibility issue, pressing the "Down" key (▼) doesn't work to navigate the page (scroll down) and instead goes all the way to the top.
Yeah there's something really screwy going on here - I ended up having to disable JavaScript on the page in order to copy and paste from it without it leaping me up to the top when I tried.
It seems to be because of this: any key other than the left and right arrows closes the modal and resets the hash, and resetting the hash scrolls you to the top:
    function closeModal() {
      let modal = document.getElementById("myModal");
      modal.style.display = "none";
      window.location.hash = '';
    }
    // Left and right arrow keys scrub through the gallery.
    // Everything else closes the modal.
    document.onkeydown = function (e) {
      if (e.keyCode === 37) {
        // Left arrow
        goBackPrevImage();
      } else if (e.keyCode === 39) {
        // Right arrow
        advanceNextImage();
      } else {
        closeModal();
      }
    };
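One possible fix, as a sketch rather than the site's actual code: only react to keys while the modal is showing, and only close it on Escape, so Down, Page Down, Ctrl+C and friends fall through to the browser. The `decideKeyAction` helper is a hypothetical name introduced here; `goBackPrevImage`, `advanceNextImage`, `closeModal` and the `myModal` id are taken from the snippet above. It also reads `e.key` instead of the deprecated `keyCode`.

```javascript
// Hypothetical helper: map a key press to an action name.
// Returns null when the page should leave the key alone.
function decideKeyAction(key, modalOpen) {
  if (!modalOpen) return null;             // never hijack keys when no modal is showing
  if (key === "ArrowLeft") return "prev";  // scrub back through the gallery
  if (key === "ArrowRight") return "next"; // scrub forward through the gallery
  if (key === "Escape") return "close";    // the only key that closes the modal
  return null;                             // everything else keeps its default behavior
}

// Wiring sketch for the browser; skipped when there is no DOM (e.g. under Node).
if (typeof document !== "undefined") {
  document.addEventListener("keydown", function (e) {
    const modal = document.getElementById("myModal");
    const modalOpen =
      modal !== null && window.getComputedStyle(modal).display !== "none";
    const action = decideKeyAction(e.key, modalOpen);
    if (action === "prev") goBackPrevImage();
    else if (action === "next") advanceNextImage();
    else if (action === "close") closeModal();
  });
}
```

Even with this, closeModal would still jump to the top on Escape because it assigns an empty hash. Clearing the hash with history.replaceState(null, "", location.pathname + location.search) instead should avoid the scroll, though I haven't tested that against this page.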
I think this might be the first time I've genuinely seen something and thought this quote applied: “Any sufficiently advanced technology is indistinguishable from magic”.
Don't get me wrong. I'm sure there is a skill and what I'm seeing in demos is the happy path where it all happens to work well. But damn it's impressive.
Looks like visual artists will be rewarded in the future for creating new styles and templates that can feed prompts. Instead of developing a single style and producing from it over a lifetime, developing multiple types and spreading them like young sea turtles.
Companies could be hired to develop a particular keyword over a period of weeks or months to allow for more specific prompts.
I find myself increasingly frustrated by the low resolution of the images coming out of these systems. It all feels like a huge tease, blocking my brain from deciding whether I should be impressed or not. And at other times I find myself not really interested in looking closer at all and just keep on scrolling.
Not really relevant bc it's kinda nitpicky but amazingly it's not even 4GB, it's more like 2GB if you use the float16 version (which has no quality degradation). Quite amazing that so many images fit in that small a package.
hmm -- I tried converting to float16 just using a naive model.half() call and saw some quality degradation in my images compared to just autocasting parts of the model to float16 while leaving others at float32. Curious if anyone else has had the same experience.
Might be that there's some degradation but I think it's pretty close. Anyway I'm using their 'official' fp16 version which they might be doing some extra magic on idk. I.e. via
My best guess is that I just did something wrong lol. Autocast seems to convert most of the model to fp16 anyways and that works great, so I'll just keep using that!
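On the size point, the halving is just bytes-per-weight arithmetic. A rough sketch, assuming the roughly 4.2 GB figure quoted elsewhere in this thread is the float32 checkpoint and that the file is almost entirely weights (both are assumptions, and the function names are made up for illustration):

```javascript
// Bytes used to store one weight at each precision.
const BYTES_PER_WEIGHT = { float32: 4, float16: 2 };

// Estimate the number of weights from a file size, assuming the file is all weights.
function estimateWeightCount(checkpointBytes, dtype) {
  return checkpointBytes / BYTES_PER_WEIGHT[dtype];
}

// Size of the same weights stored at another precision.
function checkpointSize(weightCount, dtype) {
  return weightCount * BYTES_PER_WEIGHT[dtype];
}

const fp32Bytes = 4.2e9; // assumption: ~4.2 GB full-precision file
const weights = estimateWeightCount(fp32Bytes, "float32");
const fp16Bytes = checkpointSize(weights, "float16");

console.log((weights / 1e9).toFixed(2) + " billion weights"); // 1.05 billion weights
console.log((fp16Bytes / 1e9).toFixed(1) + " GB at float16"); // 2.1 GB at float16
```

This says nothing about output quality, only about storage; whether a blanket float16 conversion degrades images is the separate question discussed above.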
Mind blown. I don't often feel like I'm looking straight into the future, but the way Andy describes this process 100% evokes this feeling now.
In some domains this will almost certainly transform the way people create art, and it will feel like it's happening overnight.
Digital artists will surely adapt and use these technologies to their advantage, but transitioning will take time.
Personally, this also feels like merely a first glimpse into a world where humans use AI-based "cultural technologies" (to borrow Alison Gopnik's term) to "fill in the blanks" on ideas across many domains.
What we are seeing now is likely just the first pass at what this technology can do.
I feel like a lot of the discussion is around using these things to directly create new artwork, but these technologies (and GPT-3) feel like a method of digital divination to me. It's basically a more sophisticated form of casting an I Ching or reading tea leaves for artistic inspiration. I personally think that divination is underrated in modern society, so these developments are an interesting trend to me.
I think that’s an interesting connection and way to look at this. I first encountered that sort of artistic inspiration on a Twitter account that randomly generated sheet music snippets. Sure the generation could be garbage, but that gives the artist a starting point to say “this is bad because X, it should be more like Y”. And of course with these AI models, the generation is more likely to be good. Having a divined/generated starting point for creative work can definitely be an invaluable tool.
I remember when I tried to get my classmates to sign up for the AI ethics course at my grad program. Most declined, saying it would be boring, not marketable, or whatever. In the end, a small slice of my cohort took the class. Kinda scary if it is the same at similar programs. I feel like a lot more people need to be thinking about the ethics of this. This particular blog post is quite harmless tbh, but things quickly get out of hand with explosive growth.
Oh I’ve been doing this since they published. I warned a friend (who is a concept artist) that his days are numbered.
Great article about one of the little-known features of stable-diffusion: the img2img.py awesomeness that turns your 4-year-old's finger paintings into Picasso or Monet. It’s just mind-boggling!
The article is from 2015, so yes, you are right, it is easy to say that we have not had insane changes since 2015. And certainly there have not been insane changes between yesterday and today. But if things start changing just 1% more each day it becomes a big deal within a few years.
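To put a number on the 1%-a-day point: a quick sketch, assuming "1% more each day" means multiplying by 1.01 every day (my reading, not necessarily the parent's):

```javascript
// Growth factor after compounding a fixed daily rate for a number of days.
function compound(dailyRate, days) {
  return Math.pow(1 + dailyRate, days);
}

const oneYear = compound(0.01, 365);
const threeYears = compound(0.01, 3 * 365);

console.log(oneYear.toFixed(1) + "x after one year");       // roughly 38x
console.log(Math.round(threeYears) + "x after three years"); // roughly 54,000x
```

So under that assumption "a few years" is a factor in the tens of thousands, which is why small daily changes stop feeling small fairly quickly.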
I suppose that’s how you measure things? 250 years ago we were mostly farmers and our society had barely changed in thousands of years. Now look at us. 25 years ago most people lived in a pre-digital world. Now there are 6 billion smartphone users. 2.5 months ago AI art was a weird curiosity and now John Oliver is doing skits on it.
As far as She-Hulk goes yes. This isn't going to replace Netflix shows, it is going to create entire new industries, just like how when YouTube came out no one thought that some 12 year old kid named Jimmy was going to build a 9 figure business on top of it.
The article starts off with me expecting the twist to be that the image was generated with a single text prompt. That would have been neat, and in line with the other recent sensational coverage of how these new models are BLOWING PEOPLE’S MINDS. But it’s actually a walkthrough where the author goes through a much more tedious process than I could ever imagine going through to get the level of quality of that final result.
Huh. I had almost the exact opposite opinion. As someone who likes to make art, this example leads me to believe that this kind of process can allow me to quickly create images that are far closer to what I have in my head than most of the DallE style examples I’ve seen recently.
My reaction too. I've only superficially played with some of the AI image generators, and while it's easy to get something which looks interesting or even good, getting something which matches a specific idea precisely seemed hard. This walkthrough shows a clear technique which seems endlessly adaptable to using it as a tool to get to a specific goal.
This article demonstrates control via iterative techniques, which is more flexible than trying to encode everything in a sentence. The input image acts as state carrying forward much more information than a sentence from one iteration to the next.
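The "image as state" idea can be sketched as a loop. This is only an illustration: `img2img` and `edit` are hypothetical callbacks supplied by the caller, not a real Stable Diffusion API, and the stand-in model below just records what it was given.

```javascript
// Each round feeds the previous output (optionally hand-edited) back in as the next input.
function iterate(initialImage, rounds, img2img) {
  let image = initialImage;
  const history = [image];
  for (const round of rounds) {
    const edited = round.edit ? round.edit(image) : image; // manual touch-up step
    image = img2img(edited, round.prompt);                  // model refines the edited image
    history.push(image);
  }
  return { final: image, history };
}

// Stand-in "model" that only builds a string, to show state carrying forward.
const fakeImg2img = (image, prompt) => `${image} -> [${prompt}]`;

const result = iterate(
  "sketch",
  [
    { prompt: "floating island" },
    { prompt: "add birds", edit: (img) => `${img} +paint` },
  ],
  fakeImg2img
);

console.log(result.final); // sketch -> [floating island] +paint -> [add birds]
```

The prompt only has to describe the change for the current round; everything decided in earlier rounds rides along in the image itself.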
Like my sister comment, I had the exact opposite reaction. Dall-E is very cool, no doubt, but the idea that I could actually work alongside the AI to produce something that is in my head was much more eye-opening.
I'm very skilled at illustration and fine painting. Compared to what it takes to "manually" make such an image, his process is about as trivial as a single text prompt.
Can someone give this completely ignorant old fellow a description of what the author is doing in this article?
What is the initial drawing being done with?
What is this stable diffusion process? (Amazing is one answer.)
A very brief description of the software used.
Color me astounded. I actually liked (was delighted by) the results in step-6. When I realized it was done programmatically somehow, I had to pick myself up and get back in my chair.
The author is using a text-to-image neural network to generate an image; in this case the author is also conditioning the model on a sketch to guide the generation process.
Okay. Now I'm starting to wonder if folks who worry about AI taking over are onto something. Well, not really... yet. But color me absolutely flabbergasted.
This has been flabbergasting a lot of us. I have been playing with these text to image models for about a month and I still cannot believe it.
Try for yourself a bit. Here’s my current favorite interface to Stable Diffusion. Give it a little sketch, then go wild with the prompt description. Try different style descriptions like oil painting or comic book.
Thanks! I'm gonna try it out. First, though, I think I'm going to add an exorcist to my contacts list. This is honestly the closest thing to magic I've ever encountered.
Like telling an artist "hey, could you add a tree", "now recreate it, but make it appear to be on the moon", only that he's talking to an AI (artificial neural network).
It strikes me that 4.2 gigabytes is comparable to the size of the human genome, which takes between 725 megabytes and 3 gigabytes to store uncompressed, depending on whether you pack 4 base pairs into a byte (https://stackoverflow.com/q/8954571).
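For anyone checking the genome numbers, here is the arithmetic as a small sketch. It assumes about 2.9 billion base pairs, which is what the 725 MB figure implies at 2 bits per base; that count is my assumption, not something stated above.

```javascript
// Bytes needed to store a sequence at a given number of bits per base.
function genomeBytes(basePairs, bitsPerBase) {
  return (basePairs * bitsPerBase) / 8;
}

const basePairs = 2.9e9; // assumption: ~2.9 billion base pairs

const packedMB = genomeBytes(basePairs, 2) / 1e6;   // 4 bases per byte (A, C, G, T in 2 bits each)
const unpackedGB = genomeBytes(basePairs, 8) / 1e9; // one byte (one character) per base

console.log(packedMB + " MB packed");     // 725 MB packed
console.log(unpackedGB + " GB unpacked"); // 2.9 GB unpacked
```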
So this "Stable Diffusion model", it was trained on a bunch of copyrighted data, and anything that comes of it must hope to launder the copyright sufficiently to not constitute a derivative work, somehow?
> these models were trained on image-text pairs from a broad internet scrape
… yep.
I've the same issue with this as with Github Copilot.
I will admit, it is technically impressive, and something I would love to use, as someone who cannot draw worth a darn. And it is that I cannot draw that I do not feel morally comfortable with this: I am using a — complicated, admittedly — tool to just derive art from the unwilling talents of others. (Admittedly, my skill in prompting & editing might matter, but that's true of "normal" derivative works, too!)
All art is derivative and there's no such thing as originality. Every human artist draws inspiration from their visual and emotional experiences, copyrighted or otherwise, how is this different? If I watch Star Wars and then make a space opera film that's aesthetically similar to Star Wars, that's not copyright laundering, it's inspiration! Same principle applies here.
Because the AI doesn't have "experience", it has training data that it's deriving the work from.
People have shown fairly convincing examples of this in the more general sense: e.g., they've had well-known stock image (e.g., iStockPhoto) watermarks get produced in the output from the AI models (when not prompted). An artist with "experience" would not reproduce a watermark. Or in this article[1], where an AI was requested to mimic another artist's style, and the output was (attempting to) reproduce the artist's signature.
(IANAL.) If you make a film that directly incorporates aspects from Star Wars (what I believe to be the more accurate version of what these models do), then yes, I would expect that you will be handed a C&D. "Glowing space swords" aren't copyrighted, but if you include something indistinguishable from a lightsaber & call it a lightsaber? I bet Disney would have something to say about that.
I don't personally see much difference between how I trained myself to be a portrait artist vs. how diffusion models do. In order to learn to draw stylized portraits, I looped over:
1. Find a photo of a person as reference
2. Create portrait
3. See how well the portrait compares to the reference and the stylized art I was drawing inspiration from.
The work I was doing was original in the colloquial sense, but I also see zero reason why the AI's process is fundamentally inferior to mine.
I am pretty sure that children learning to draw will in fact include some of those “copyright markers,” when not explicitly admonished by adults. What humans do is not some magical “experience,” they just have worse memories and better self-censorship.
This Kotaku article is really trying to spread misinformation about this kind of model. The image shown in the article was not trying to imitate anyone, as the author of the image stated: https://twitter.com/illustrata_ai/status/1558559036575911936 (the artist's name was not in the prompt). It is only RJ Palmer who, for no reason, thought this was the case. The signature also does not even come close to the original, as the model is not really trying to copy anything; the signature is, like the rest of the image, completely made up. Also, the article you linked states that there are programs to explicitly remove the signature, which is also not true. Articles like the one you posted are usually full of nonsense, written by people who don't really understand this kind of technology, and I wouldn't use them as a source of any kind. RJ Palmer's reaction to the image in the article: "This literally tried to recreate the signature of Michael Kutsche in the corner. This is extremely fucked up". These people are good at creating controversy, even when it is based on claims that are not true.
It's not entirely settled law, but it seems the US Supreme Court would probably disagree with you. These issues were near the center of the Authors Guild vs Google case that ran from 2005 to 2015. There's a good relevant summary of it here https://towardsdatascience.com/the-most-important-supreme-co...
But broadly the courts have upheld the rights of companies to use copyrighted works as inputs to commercial algorithmic derivative works like neural networks.
Now you might argue this doesn't apply here. A key aspect of the decision rested on the fact that the original copyright holders (book authors & publishers) were not directly harmed by Google's indexing of them, since it probably drove more sales of those books. In this case it's not so clear. Is somebody using a diffusion model doing so instead of buying a piece of commercial art? If they're generating a new piece of art, I'd say probably not. But if they're generating something specifically similar to an existing specific piece of art, perhaps, but if it's deliberately different, it's still a tough argument. If the ML model is being used to deliberately replicate a specific artist's style, then I think you can make that case pretty strongly. But if you're building something that's an aggregate of a bunch of styles (almost always the case unless you specifically prompt it otherwise) then I don't think the courts would find that any damage has been done, and thus nobody taking this to court would have standing.
I think it's likely we will see this end up in the courts somehow. But being able to prove actual harm is critical to the US court system. And it's difficult to see how the courts would rule against the kinds of broad general use that is most common for this kind of generative art.
Thank you — that's at least an argument I've not yet heard and that isn't the trope of "the AI is thinking".
> Now you might argue this doesn't apply here.
Indeed, I would. In particular,
> and the revelations do not provide a significant market substitute for the protected aspects of the originals
I'm not sure if that holds here. In Google's case, the product (a search engine) was completely different from the input (a book). Here … we're replacing art with art, or code with code, admittedly different art. And … uh, maybe? different code. I'm also less certain due to the extreme views on what constitutes de minimis copying the courts have taken.
> I think it's likely we will see this end up in the courts somehow.
I agree.
> But being able to prove actual harm is critical to the US court system. And it's difficult to see how the courts would rule against the kinds of broad general use that is most common for this kind of generative art.
This is a good argument, too, though I'd like to see it tried in court, I think.
> If the ML model is being used to deliberately replicate a specific artist's style, then I think you can make that case pretty strongly.
I'll link the same example I linked in a comment, [1]. Seek to "On the left is a piece by award-winning Hollywood artist Michael Kutsche, while on the right is a piece of AI art that’s claimed to have copied his iconic style, including a blurred, incomplete signature"
Pretty exciting stuff. It feels like there's a new cool model being trained every week, and the probability that some of the upcoming ones won't have a huge impact on the world in the next few decades seems close to 0% to me.
As always, I will suggest adding loading="lazy" attributes to all images on the page (or even the whole website), as loading is currently struggling. Hoping HN won't hug your server to death, as the topic is very popular.
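For anyone wanting to apply the loading="lazy" suggestion to already-generated pages, here is a minimal sketch in Python. It assumes the HTML is available as a string and uses a rough regular expression rather than a real HTML parser, so treat it as an illustration only; the function name `add_lazy_loading` is made up for this example.

```python
import re

# Rough pattern for an opening <img ...> tag. This is an assumption-laden
# shortcut: a real site should use a proper HTML parser or template change.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches a standalone loading= attribute (not data-loading= and the like).
HAS_LOADING = re.compile(r"(?<![\w-])loading\s*=", re.IGNORECASE)

def add_lazy_loading(html):
    """Add loading="lazy" to every <img> tag that has no loading attribute."""
    def patch(match):
        tag = match.group(0)
        if HAS_LOADING.search(tag):
            return tag  # keep an explicit choice already present in the markup
        # tag[:4] is the literal "<img"; insert the attribute right after it
        return tag[:4] + ' loading="lazy"' + tag[4:]
    return IMG_TAG.sub(patch, html)

print(add_lazy_loading('<p><img src="island.png" alt="floating island"></p>'))
```

Images above the fold are usually better left eager, so in practice you might skip the first image or two.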
Vonnegut wrote some about the effect of recording technology & mass media on the value of individual artistic talent—in short, that it all but obliterated both the monetary and, perhaps more importantly, social value of all but (literally) world-class skill. Fewer sing-alongs around the piano at home. No more small-time performance troupes (say, vaudevillians) making enough money to get by. That uncle who's an amazing story-teller just can't compete with radio programs, so the social value of that skill plummets, and it's like that for every medium that puts the ordinary in competition, if you will, with not just the world's most talented people, but, as fields advance, with entire teams dedicated to making those already-top-notch folks seem better-than-human.
The benefits of this are clear, but the problem is that artistic expression and being able to receive small-scale rewards and genuine encouragement—at least in one's family or social circle—for even minor talent seem to be very healthy and fulfilling things for people to do. Taking that away came at an ongoing cost that none of the beneficiaries of that change had to pay. A kind of social negative externality.
Relatedly, consider the sections of Graeber's Bullshit Jobs where he treats of the sort of work people tend to find fulfilling or are otherwise proud to do, or are very interested in doing (sometimes to the point that supply of eager workers badly exceeds demand and pay is through the floor, as in e.g. most roles related to writing or publishing). What kind of work is it? Mainly plainly pro-social work (to take one of Graeber's examples, the disaster-relief side of what the US military does, which is by no coincidence often heavily featured in their recruiting advertising; or teaching, for another one) or: creative, artistic work.
Graeber notes an apparent trend whereby these jobs tend to pay pretty poorly either due to the aforementioned over-supply of interested workers, or because there's some societal expectation that you ought to just be glad to have a job that's obviously-good and accept the sacrifice of poor pay, and that you must be bad at it or otherwise unsuited if you want to make actual money doing it (teaching's a major case of the latter—I've seen that "if you care about being paid well you must be a bad teacher" POV, and the related "if we raised teacher pay it'd result in worse teachers", advanced on this very site, more than once—it's super-common).
People really, really want plainly-good and/or creative jobs, but those don't pay worth a damn unless you're at the tip-top, either of talent level, or of some organization. This seems like another blow to the creative category of desirable jobs, at least.
My point is: I wonder and worry about the effect this latest wave of AI art (in a broad sense—music and writing, too) generation is going to have on already-endangered basic human needs to feel useful and wanted, and to act creatively and be appreciated for it by those they're close to. There's already a gulf between the among-the-best-in-the-world art we actually enjoy and, should our friends present their creations, how we "enjoy" those these days, with the latter being much closer to how a parent enjoys their child's art, and everyone involved knows it. Used to be, hobby-level artistic talent and effort was useful and valuable to others in one's life. Now, that stuff's just for yourself, and others indulge you, at best.
Why, with this tech, you can't even get by doing very-custom art, such that the customization, rather than the already-devalued-to-almost-nothing skill itself, is what delivers the value. Now the customization is practically free, too, and most anyone can do it.
Getting real last-nail-in-the-coffin vibes from all this. I'm sure it'll enable some cool things, but I can't help but think we're exchanging some novelty and a certain kind of improved productivity, for the loss of the last shreds of a fundamental part of our humanity. I wonder if we'd do this (among other things) if we could charge the various players a fair value for harm to social and psychological well-being that happens as a side-effect of their "disruption"—alas, that pool's a free-for-all to piss in all one likes, in the name of profit (see also: advertising)
I found your post very compelling, but ultimately I disagree strongly. Vonnegut's assessment rings true, however the only reason I even heard of him - now one of my favorite authors - is because of the internet. I'm sure he's a much better storyteller than my uncle, but guess what, I can tell stories and thoughts from Vonnegut to all my friends, and that gives me joy. We can comment on movies and games that took enormous teams millions of dollars to make. I can go and be part of that creation under many different roles, even as a reviewer on Youtube. I think many more people have hobbies and produce art now than ever before, and this was enabled mostly by technology.
I feel that this apocalyptic perspective is just us older folks trying to grasp how we would do things we like under different conditions. Kids will find a way to make a career using these models, and maybe now some not-so-skilled but extremely smart person can finally show us their creations in a meaningful way.
And for me personally, I find extreme beauty in these technologies. A fundamental part of our humanity, artistic expression, and we can manipulate it like this? Another unique human trait, language, how enormously fortunate I am to see another part of that puzzle being worked out and reduced to math.
I don't think that effect will persist, and before long it'll be about as impressive as using a "meme generator" site. Already heading that way in some settings, as far as I can tell.
I fed the mid-post prompt into MidJourney (text-only) and got https://imgur.com/a/Xg0Byt1.
Guiding the input with a ms-paint worthy picture really adds to the "I made this!" feeling of img2img.
4.2 gigabytes???? That's insane, especially considering how poorly optimized the models are (with better algorithms and ways to understand and categorize the data, this could be easily reduced 10x or even 100x).
This is very cool. I've been blown away by my dabbling with these text-to-image models, but I love the steps here to help it generate more of what you're envisioning.
I'd love to follow this process to generate several that I have in mind to put up on my walls, but I keep running into the resolution limitation. You need a pretty high resolution to get a crisp image at a poster size. Is there a trick or a setting to get the models to output images suitable for posters?
I usually upscale the images with something like real-esrgan-vulkan. I've been using that to build up a bank of images. I'm considering getting posters made of a few of them, the most notable one is one of Richard Stallman and Bill Gates playing chess that I'm calling "The Good Future We Never Got".
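To put rough numbers on the poster question, here is a small sketch of the arithmetic. The 300 DPI target, the 18x24 inch poster, and the 512x512 source size are all assumptions for illustration, not figures from this thread; plug in your own.

```python
import math

def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions needed to print at the given size and DPI."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)

def upscale_factor(src_px, target_px):
    """Smallest whole-number factor so the source covers the target in both dimensions."""
    return max(
        math.ceil(target_px[0] / src_px[0]),
        math.ceil(target_px[1] / src_px[1]),
    )

target = pixels_needed(18, 24, dpi=300)      # assumed poster size and DPI
factor = upscale_factor((512, 512), target)  # assumed generator output size
print(target, factor)
```

Under those assumptions you would need about 5400x7200 pixels, i.e. roughly a 15x enlargement of a 512x512 image (plus cropping, since the aspect ratios differ). Posters viewed from a distance can often get away with a lower DPI, which shrinks the factor considerably.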
I'm late to the party, but what stops training this SD model on audio spectrograms? Then you'd tell it "some mozart-style violin for 5 seconds, add drums in background." The spectrogram is then translated to sound and suddenly you're a very decent music writer.
With img2txt you could give it an audio file, call it "S" and tell "music in S style, but with flute".
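For readers unfamiliar with the "audio as an image" idea, here is a minimal sketch of turning samples into a magnitude spectrogram, using only the Python standard library and a naive DFT. The frame size, hop, and test tone are arbitrary choices for illustration; a real pipeline would use an FFT library and much larger frames.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frame_size // 2 + 1 magnitudes per frame."""
    # Hann window to reduce leakage at the frame edges
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k (O(N^2) overall, fine for a toy example)
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A pure tone with 8 cycles per 64-sample frame should peak in bin 8.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), len(spec[0]), peak_bins)
```

One caveat for the "translate it back to sound" step: a magnitude spectrogram throws away phase, so going back to audio needs some form of phase estimation (Griffin-Lim is a common choice), and that tends to cost quality.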
mp3 density: 30sec per 1MB (some instrumental music with background). jpg density: 12M pixels per MB (trees and some landscaping). I'd argue music has a lot more information, if we can compare seconds with pixels. Imho, OpenAI didn't do a great job: a small dataset and a limited model.
I know that technology has displaced workers in the past, but it seems to be doing so at a faster and faster pace. It makes me rethink my involvement in the creation of software and its ethical implications.
I think it revolves around just one fundamental question that's become obvious since the start of the industrial revolution: why have workers if you can get work done without them? You can answer this question rather quickly; very often, the benefits of being able to produce something en masse outweigh the initial drop in quality due to adopting an automation process that is not yet mature but evolves over time, and that's been proven many times in the past two centuries.
It's just the speed and the scale that have been following a logistic curve. We're still before the midway point as a global society, but we can certainly see the big speedup as more and more work becomes automated.
If people's work can be automated away as a whole and people somehow become poorer rather than richer, then the benefits of automation are sucked away from them. And here come the flaws you mention - or the flaw, I think, singular. They happen exactly at this one point.
Sometimes it’s just an initial drop in quality, other times it’s permanent. My favourite example is book binding: hot-melt glue has completely supplanted cold glue in mass production, despite being drastically inferior, because it’s (just barely) good enough, and it’s so much faster/easier/cheaper for production.
Many aspects of computer setting of works for printing also show a significant regression in potential quality (prose, musical engraving, &c.), though the right software and proper human tweaking can balance that out—but people seldom actually do as much tweaking as would be done automatically by the experts of old. And as for text presentation on screen… well, that’s just lousy compared to what an expert setter would do. But it does adapt to different media with no or minimal effort required, doing in one second what used to take days, and that’s a rather big deal.
Something I sometimes think about is how capitalists can really benefit from that.
If you can automate your company away, and fire all employees, and if every company is doing that, then workers have no jobs or ways to make money. So the customers for most of the companies start disappearing. It's the ultimate accumulation of capital, to a point that the wheels of economy grind to a halt.
That’s exactly what happened to the Magratheans in “Hitchhiker’s Guide to the Galaxy”:
“Unfortunately, the venture was so successful that Magrathea soon became the richest planet of all time and the rest of the Galaxy was reduced to abject poverty. The Magratheans went into hibernation, awaiting an economic recovery that could afford their services once more.“
This is exactly Marx's thesis. He took capitalists at their word in believing that competition will drive margins towards zero over time, that as a consequence cutting labour is inherently a prerequisite for surviving competition, and that as a result capitalism will eventually, once it runs out of new markets to expand into, start experiencing crises of simultaneous overproduction and under-employment.
If he was right, it can be "patched" and made to work with a sufficient level of redistribution to avoid such crises, or left to fail catastrophically without.
Rapid change is a real problem and not a flaw of society. People cannot change careers quickly without loss of human capital. There is no flaw in society that you can fix to prevent this.
It doesn't mean that we shouldn't make things more efficient, but economies exist to serve people, not the other way around. That sometimes means bailing some people out, sometimes it means phasing something in, and sometimes it means changing the definition of value.
For example, painting used to be how you'd get a portrait. But photos did that much better. Painting shifted: it became much more abstract, and because photos were cheap and easy, they didn't have the same cachet as a portrait. Not many people hang big photo portraits on their walls the way they might have done with paintings.
I suspect authenticity, or some other thing which the machine cannot replicate, may be valued more in future.
I agree. But I also think that meaningful work that provides value to your community and to yourself is something that provides some of the greatest satisfaction available to man. If you remove that ability by either automating someone's job or reducing it to a kitschy "artisan" label which implies that their work is now inefficient and flawed compared to the efficient automated way, you are taking away something important about the human experience that cannot be replaced by leisure activities.
What about women? Yeah, yeah, I know it's uncouth to worry about gendered language. On to the real topic ...
> reducing [a job] to a kitschy "artisan" label
Automation doesn't (only) reduce jobs, it (also) allows people to get meta. They are now free to think about how the job gets done, rather than constrained by time to only do the job the way they did it yesterday. Or, they can do other jobs that they prefer.
I used to live in an apartment without a washing machine, and where the closest laundromat was a 20-minute walk away. I got used to a bucket-and-plunger method for washing my clothes. It was enjoyable in that "I am the salt of the earth" way. When I found a used sink-attachable tiny washing machine on Craigslist, I had a smile on my face the rest of the day.
Well, it highlights that there's some disagreement between people who see maximizing production as inevitably positive, and other people who are unconvinced.
It sounds like you might be so squarely in one of those camps, that you see it as a fundamental flaw that people may disagree with you. To me, that sounds even worse!
Having worked with a lot of artists, I can say they are seeing stable diffusion and other image generators as a way to generate lots of ideas: a base they would then, usually starting from scratch, build their end product off of.
So even though to a casual glance, these images look amazing, if you look closely, you see all the flaws that come with them being generated. Weird artifacts, a lack of symmetry that humans usually add to their creations.
These flaws would not exist (to such a degree) when a professional artist paints the same scene.
Yes - this will make artists hyper productive IMO; and will make a lot more people willing to make art! If you've ever had a little ditty and wondered what it would sound like as a violin concerto in D minor, now you can know! If you ever wanted your wife painted as a Rembrandt, there you have it. If your five year old comes up with an idea for a fun video game called Laser Chicken, now his friends can play it.
Imagine pairing this with bespoke automated clothing output - take a photo of an outfit, verbally describe the changes you want to it ("a little more debonair, dark lapels, 1920's styling"), click a button, preview it as it would look on you in several recent pictures you took, and a week later your tailored suit arrives. Now for sneakers. Hats. Bags. Watches.
The 20th century was about mass production to ensure everyone could have things: food, clothing, transport, entertainment. The 21st century may turn to expression: allowing each person to express themselves however they want in their goods and services. Or just following along to buy whatever your favorite tastemakers recommend. However involved you want to be!
The world is about to get a lot weirder and more interesting.
Making artists hyper productive will cheapen even further their output. If one artist can do the work that you are currently paying 8 to do, you only need to pay one artist that can wrangle these tools.
I would agree that this is like 20th century mass production. To be clear, I don't necessarily think that mass production is a good thing either. In fact, it has been probably the most detrimental thing to our environment that humans have ever done.
They cheapen the entry fees for learning, but the expectation will keep getting higher. It's like games where, even for indies, they won't accept N64 quality these days. Text to speech is quite good now, but people prefer real narrators. Everyone can create music with a cheap laptop and some midi keyboards, but only those really talented will make it.
Maybe, but it hasn’t worked that way for software. I think people might see the opportunity to inject bespoke art in a lot of places it wasn’t previously. College students who could only afford movie posters previously will have art commissioned, every building will have a mural, etc etc.
One of the many things that I loved about Lisbon was that art was hilariously pervasive. It felt like every vertical space was filled with beautiful and original art. The whole city was a gallery. It would be wonderful to see that in more places.
> These flaws would not exist (to such a degree) when a professional artist paints the same scene.
The flaws in AI generated art were 100x as obvious in systems like this only a few years ago. In 5 years I doubt anyone will be able to tell the difference between AI and human art.
When the photograph displaced most portrait painters, we invented a new type of artist - the photographer. I hope we’ll see the same thing here - artists who specialise in using stable diffusion (and friends) to make new art in a new way. This blog post is like one of the world’s first photographers saying “hey look how the photo changes when I move the subject relative to a light source!”. I can’t wait to see what results we get with deep expertise (and better algorithms).
How long before we have filmmakers using AI to cast, direct and shoot their films?
Maybe, maybe not. It didn't happen with CGI, the uncanny valley hasn't been surmounted, car chases aside.
The mulchers, as Bruce Sterling calls them, have a fresh meat problem. They've consumed all the words, and all the pictures, and we already know that feeding them their own mulch gives worse results.
We're not at the scale limit for data but we know where it is. It's not clear that refinements to the mulching process will create mulch good enough to tell apart from creation. It might. But it might not.
> with CGI, the uncanny valley hasn't been surmounted
I'm not so sure about this. While some scenes are still obviously using CGI, I think a lot of CGI in movies now passes unnoticed, even entirely digital characters.
We certainly notice when those characters do things humans can't do, of course, and when budget or schedule or both result in things being pushed out too early, but how would we know when digital characters look natural on-screen? We wouldn't!
I wonder if anyone is gonna take advantage of this point in time where the average person isn't aware of these breakthrough AI models, and set themselves up with an account as a professional-grade artist on Fiverr, offering to draw highly detailed landscapes or whatever where the client can provide reference material and ideas.
You generate the image in 30 minutes (maybe less if you get the process down to a science), then wait around for a few weeks to keep up the illusion that you're actually doing the drawing by hand, and send it off to your satisfied client. You could be charging hundreds of dollars for your "artistic services," and have dozens of clients going on simultaneously.
> Having worked with a lot of artists, they are seeing stable diffusion and other image generators as a way to generate lots of ideas.
But it also dilutes their own ideas. I know of painters painting AI generated stuff, and that is likely a new genre, but a lot of genuine artwork will lose interest on the market. It is what it is, and I don't love or hate it…
You don't just paint a better version of the generated images, you look at a bunch of generated images, and get a better idea of what you want to paint from that.
Say you know you want to do a portrait of a woman in armor, you can generate a dozen of those in around a minute on a 3090, look at the generated armors, the faces (usually all sorts of screwed up), and the composition. It's just a way to kickstart the creation process.
Yes, I get that but if everyone’s doing it it becomes a race to the bottom sort of thing… I hope I am wrong and things turn out in a completely different direction that I can’t see now
You say 'look at the generated armors' etc. but this does not mention what you use to look with: the artist's eye.
People so quickly assume that access to these tools will make everyone an artist, but the raw output is so lacking in a voice and intentionality. If you supply the voice and intentionality through your iterative process and a hybrid visual/text language playing the generator like a violin… you're playing the generator like a violin.
Your artistic skills have been translated to a wholly new set of vocabularies, and it's your eye that is tested most. Can you see/imagine better than the next guy?
The issue is an ever smaller percentage of the population can be successful at ever more difficult opportunities that result from ever better automation.
Manual labor still exists, but it's a vastly smaller percentage of overall jobs. Productivity and automation seem like the same thing on the surface. However, the argument for a long tail of creators in an ever more wealthy society breaks down when AI can start writing niche romance novels, not just barely coherent news articles, etc.
In theory we might have ever more new types of jobs, but automation isn’t just getting better it’s also getting faster.
Meanwhile I've gotten into a lot of very heated arguments with artists who fear this will actively destroy their livelihood. I lean towards seeing these developments as a massive positive for society as a whole, but I find it hard to ignore that. Most of the flaws you see today will be reduced over the next few months, and people will get better at finding ways of working around them. It will cause substantial upheaval for a lot of people, including job losses, especially on the low end. Maybe many, or even all, of those job losses will be compensated with additional jobs elsewhere, but we don't know.
I would argue that the technology will soon be at a point where these flaws are much less visible and will then further cheapen the work done by a human.
All of these AI image-algorithm furnaces are ultimately powered by the raw material shoveled in by humans. If we stop providing the algorithms with unique material the capabilities will decrease.
Further, when you automate one portion of the work you still need the human brain to strategize and orchestrate at a higher level. Job opportunities have only increased as a result of this, not decreased.
This happened before, and this will happen again. There are no more typists, nor data entry personnel. Hardly any human travel agents. No bank tellers. Human translators are next in line.
However, there are many more positions in machine learning and data science.
Software _is_ eating the world. The only viable survival strategy is learning to code. I don't believe that not everyone can learn to code. I teach people to code, and I have yet to meet one that couldn't learn, assuming some general intelligence (about the same that you need to learn a foreign language).
Don't think of "coding" as writing javascript to make websites. Think of "coding" as "talking to computers".
As software eats the world, the value of people who can talk to computers will increase. Even if it takes fewer programmers to make a website, there will be more jobs for programmers to automate concept art production pipelines. You may not be writing javascript or python in fifteen years (I bet you will) but there will be code to tell the automation services what to do.
The interesting question: is the general population becoming more tech savvy? Will this change in work encourage more students to learn how to code (whatever that looks like in twenty years)? Or will the demand for coders rise without a corresponding increase in supply?
Well, in software engineering, like in many modern professions, you don’t get trained once and then work for life. Learning is continuous, and this is not optional anymore.
... will there ever be a (better) Copilot for COBOL or PL1? This may be an escape hatch, just throw yourself at legacy mainframes, this may work for an additional 20-40 years. :D
> I know that technology has displaced workers in the past, but it seems to be doing so at a faster and faster pace.
Technology has always enabled the creation of jobs faster than it displaces workers though. Sure, horse manure shovelers lamented the automobile, but people who became mechanics and petrol pump attendants didn't. The same will likely be true for artists - this will suck for them, but the proliferation of easily generated art assets will likely enable the creation of entirely new jobs we haven't considered yet.
I doubt computers are gonna be good at physically using pencils or paintbrushes. Art will have value long after every commenter here is rotten in the ground.
If I want to replicate what the author did, what drawing tool / software should I get to do the initial drawing? I use Ubuntu, so preferably something compatible with that.
How much of it is `painting` though? It looks like your typical startup: combine two ideas into one. (Image) Search engine + Photoshop. You enter some prompts, it searches its database for matches, and meshes all found results into a single visually pleasing image. Can it draw something that is not a mesh or a variation of its database? Can you?
I don't think that "searches its database for matches" description is a great metaphor for how this really works.
As Andy points out in this article, the model itself is a 4.2GB file. That's way, way too small to work as a "database" of examples it can stitch together.
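A quick back-of-the-envelope version of that size argument, as a sketch. The 4.2GB figure is from the article; the training-set size and the thumbnail size below are hypothetical round numbers I'm assuming for illustration, so substitute whatever figures you trust.

```python
def bytes_per_training_image(model_bytes, n_images):
    """Average number of model bytes available per training image."""
    return model_bytes / n_images

def images_storable(model_bytes, bytes_per_image):
    """How many images of a given file size would literally fit in the model file."""
    return int(model_bytes // bytes_per_image)

MODEL_BYTES = 4.2e9                # the 4.2GB model file mentioned above
ASSUMED_IMAGES = 2_000_000_000     # hypothetical training-set size (assumption)
ASSUMED_THUMBNAIL_BYTES = 10_000   # hypothetical ~10 KB JPEG thumbnail (assumption)

budget = bytes_per_training_image(MODEL_BYTES, ASSUMED_IMAGES)
capacity = images_storable(MODEL_BYTES, ASSUMED_THUMBNAIL_BYTES)
print(f"{budget:.1f} bytes per training image; room for {capacity:,} thumbnails")
```

With those assumed numbers the model has only a couple of bytes per training image, and even stuffed with tiny thumbnails it could hold a few hundred thousand at most, which is hard to square with a lookup-and-stitch picture of how it works.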
I think of it instead as an enormous mass of loosely assembled impressions of concepts - everything from a low-level primitive like a triangle to a Star Destroyer. You can use text prompts to combine those primitives - so you could get it to generate something just from dots and lines and shapes, or you could mix in extremely detailed concepts like the Seattle skyline - or anything in between.
At bare minimum I can think of the 3D space the character is in, and rotate and shift it to any location and position I can dream of, even if I have never seen it before. When we will have 3D aware AI, that would be interesting.
I think pop media will continue to decline while the long tail will become better and better (this has already happened to music, film, tv, etc). If you use google and billboard charts as a discovery engine you’re gonna have a bad time, but people who seek out quality will have more options than ever.
> Basically, drawing sketches, editing (rudimentaly) in image editing software, img2img, edit, img2img, and a few more rounds, and you can get to something really, really cool.
This Photoshop plugin demo blew my mind yesterday: https://www.reddit.com/r/StableDiffusion/comments/wyduk1/sho...