Horse-riding astronaut is a milestone in AI’s journey to make sense of the world (technologyreview.com)
123 points by nkurz on April 8, 2022 | 79 comments



I hate how OpenAI built so much of its public name and attention on the original benefactors who started it and their pledge to open up AI research to the public, and now it is basically the opposite: monetized, opaque, controlled systems. Really off-putting.


They just chose a misleading name in the first place. It's pretty obvious that they couldn't properly pursue their core mission as a purely open research organization. The desired end result is too valuable.

And if they get close to what they're really after, it's really undesirable to wind up in an arms race in which OpenAI's contributions are the only ones everyone knows about.


> They just chose a misleading name in the first place.

It wasn't just the name, it was the whole rationale they presented for creating OpenAI.

Nick Bostrom was having his 15 minutes of fame with "Superintelligence" [1], and scenarios like an "AI arms race" and the sudden emergence of a super-intelligent singleton [2] were legion. OpenAI was founded on the premise that making leading-edge AI research public was the best way to mitigate that risk [3].

It was presented to the world as a "research institution which can prioritize a good outcome for all over its own self-interest" where "Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world" [4].

That lasted all of three years.

[1] https://en.wikipedia.org/wiki/Superintelligence:_Paths,_Dang...

[2] https://www.nickbostrom.com/fut/singleton.html

[3] https://en.wikipedia.org/wiki/OpenAI#Strategy

[4] https://openai.com/blog/introducing-openai/


I don’t disagree with your summary of the facts; part of the premise of my comment is that I disagree with that stated rationale for starting OpenAI. I rather think that always publicizing the bleeding-edge research would increase the risks.

Not sure if this is the main reason OpenAI went less open though, as I’m not an insider.


They do publish papers. I'm not sure they were ever about providing open source implementations? AI safety was their thing.


I could be misremembering but I believe the idea was that AI advancement was going to happen anyway so open it up and put it into the hands of the people instead of a few power brokers who would abuse it.


It’s a good thing our benefactors at OpenAI are looking out for “AI safety” so we can’t do things like ask a computer to make a graphically violent image or a verbally racist string of text.


Their research remains open:

https://arxiv.org/abs/2102.12092

Their code/models are indeed closed, but there is no realistic alternative.

If they let the public have unrestricted access, deepfakes + child images would appear on Day 1, and OpenAI would get cancelled.

For OpenAI to survive, it has to be closed source.


In the age of a reproducibility crisis in research, publications without data and models aren't really "open". Having read many ML papers, I can say very few are easily reproducible.


How would you explain the dozens of other open source versions of computer vision and language models that didn't generate all those harms, even ones that were trained to recreate the exact models that OpenAI withheld due to those concerns?


The open source models can and do generate those harms.

The harms themselves are probably overblown. There are plenty of deepfakes of various celebrities. Mostly people can tell the difference or they just don't care.

I think the reality is that training these models and paying ML engineers is incredibly expensive. Not a good fit for the open source model, thus OpenAI had to convert to SaaS.


No it doesn't. DALL-E 2 is purposely vague on model architecture to make it impossible to reproduce from the paper alone.


This is the problem with all advanced technology. It has to have control built in. Think of all the bad things you could do with self-driving cars for example. Imagine we make interstellar travel possible, the amount of energy involved could destroy worlds. It's a very sad thing about the future.

In a way, censorship seeks to make the Human AI "safe".


Then no one should have it. Or is your chosen overlord so benevolent that they'd never use it against their livestock?


I'd love to see DALL-E integrated into something like AI Dungeon for automatic choose-your-own-adventure picture generation.



> OpenAI is clear about where it stands. “Our aim is to create general intelligence,” says Dhariwal. “Building models like DALL-E 2 that connect vision and language is a crucial step in our larger goal of teaching machines to perceive the world the way humans do, and eventually developing AGI.”

(Emphasis mine.) Philosophical nit: this by definition would not be AGI. AGI would be the ability of the entity to figure out its own subjective qualia-inducing way of interpreting the world. Thomas Nagel already explained this in the 70s (in his famous essay What Is It Like to Be a Bat?). By enforcing our own worldview (technically speaking, model), we're just making some artificial (and in all likelihood inferior) version of a human.


Reread it -- that's not what they're saying. They're suggesting that human-like perception is a stepping stone (i.e., not necessarily a linear path toward AGI). It reads to me as if they're acknowledging a significant gap between the two.


> They're suggesting that human-like perception is a stepping stone

I fully understand the discourse; my point is that it's not (Nagel, Chalmers, Searle†, and other substance dualists would agree). We need a breakthrough in neuroscience akin to what analysis was to algebra before we can even understand what our brain is doing. Whatever it's doing, it's definitely not what OpenAI implemented.

† Yes, I know he says he isn't, but he kind of is[1].

[1] http://www.edwardfeser.com/unpublishedpapers/searle.html


>Whatever it's doing, it's definitely not what OpenAI implemented.

Defining intelligence as "the thing brains do" is a mistake. Planes fly, but not like birds. Biology is a great tool to understand some relationships between form and function, but mimicking biology isn't the end-goal.


> Defining intelligence as "the thing brains do" is a mistake

This is a pretty bold claim. Brains are our only vantage points, and we also define life as "what's going on in a petri dish." But in any case, your point is self-defeating: what GPT-3 and DALL-E are doing is literally creating models about how we (humans) perceive the world.


>what GPT-3 and DALL-E are doing is literally creating models about how we (humans) perceive the world.

Not really. They're creating models of how pixel-space covaries with word sequences over human-meaningful samples. These models are modelling the space of human meaning, not human perception.


I understand the gist of your comment, but perhaps you could explain more regarding the relationship between analysis and algebra?

Cauchy's work certainly fundamentally changed our understanding of calculus (in that Newton and Leibniz's earlier hand-wavy proofs were validated with a rigorous definition of a limit), but I'm not aware of any large impact it had on Galois, who was a contemporary of Cauchy.


Humans are also receptacles of culture creating new cultural artefacts, and DALL-E is like us in that regard. But it's a tool AI, not a full agent. It's not embodied, embedded, and enacted in the environment; its goal is to minimise loss, not to survive. I think the "what is it like to be an X?" question comes from being in an environment with needs and goals to accomplish. It's not a magical neural architecture; it's the framing of the whole process that makes the difference.


The "a nurse in a hospital" and "a ceo in front of a business" tests show that there are some problems yet to be navigated.


Are these deterministic? Is it guaranteed to always generate the average or most likely nurse or will it generate a distribution of nurses roughly reflective of its training data if you repeatedly ask for many nurses?

Edit: Never mind. It does eventually show that it will generate different people when repeatedly prompted for the same thing. Given that fact, though, it seems to do a pretty shitty job. It generated 10 apparently female Korean flight attendants. At least in the US, the actual distribution is more like 3/4 female, 1/4 male. It's not 100% female, let alone 100% Korean. It does seem to oversample from the most likely exemplars, which illustrates a pretty serious failure mode compared to human artists. No human writing a novel is going to make every single character in every novel they write the most likely possible example of its reference class.


can you share a link to these images?



As someone who works in CG and has had a strong interest in AI for almost 40 years, I have to say the examples you gave give me the opposite impression. I think they're fantastic, other than what the developers highlight: that they are racially biased due to the input sources. But from an image-generation perspective, I'm blown away.

A couple of these lawyer images have issues (one is holding a book hilariously called "LAWER"):

https://github.com/openai/dalle-2-preview/raw/main/assets/Mo...


Oh sure, the technical outcome is great, but this kind of output (i.e. hugely gender-biased) is extremely problematic. Saying "oh, the input screwed it up" is very, very naughty. This kind of tool will be used, and if it's used it will do damage, because it's damaging as it is.


The gender bias exists due to the assumptions it makes on the sample data. I'm sure if you searched "<gender> <profession>" it would get you what you want.

If 95% of publicly available images of a profession are of one gender, should the tool be deliberately modified so that it's X% instead?

I totally agree that models that purport to represent the world can be hugely biased and reconfirming of negative biases, but how would avoiding representing those biases be achieved? Strict gender ratios in training data?


It’s more complicated than this specific tool itself, IMHO.

Stock photos have the same problem, where asking for a profession or a category will give mostly one kind of representation, and it’s up to the user to go dig further to find different representations.

DALL-E being on par with stock photos' biases could seem benign, but it also means these issues get propagated further down the line, and the more popular AI-generated images become, the more the bias gets cemented (“it has always been that way”); it could actually displace niches where better images were being used until now.


There are ways to improve the situation.

Here's one way... Given a text description, sample from the space of /more restrictive/ text descriptions (eg, add adjectives) and then draw pictures for those. Then modify the sampling space over more descriptive sentences to equalize on gender or other axes.
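A minimal sketch of that approach in Python (everything here is illustrative: the attribute lists and helper name are made up, and the resulting prompts would then be fed to whatever text-to-image model is available):

    import random

    # Rough sketch of the rebalancing idea above, not OpenAI's method.
    # The attribute lists and function name are purely illustrative.
    ATTRIBUTES = {
        "gender": ["male", "female"],
        "age": ["young", "middle-aged", "elderly"],
    }

    def balanced_prompts(base_prompt, n):
        # Expand the base prompt into n more specific prompts, sampling
        # each attribute uniformly instead of at training-set frequency.
        prompts = []
        for _ in range(n):
            qualifiers = " ".join(random.choice(v) for v in ATTRIBUTES.values())
            prompts.append("a " + qualifiers + " " + base_prompt)
        return prompts

    # balanced_prompts("nurse in a hospital", 2) might give, e.g.:
    #   ["a male elderly nurse in a hospital", "a female young nurse in a hospital"]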


If you ask for a nurse as an example, and 85% of nurses are female, shouldn't you expect it to give you a female?


If you ask for a nurse and 85% are female, but 95% of the nurse images online are female, what distribution do you want it to return? “What do you expect it to do” is a worse question than “What do you want it to do?”

The problem is that right now there’s only one option (reflect the training set, with a bias towards the most common cases).


It depends on whether you want it to show what someone would expect a nurse to look like, or show what a nurse actually is. Nothing in the definition of a nurse implies female.

If you ask for a female US president, should the application simply return a black screen?

If you ask for a black US president, would you expect it to only ever return pictures of Barack Obama?


I agree nothing in the definition of nurse implies female, but something in our culture definitely has "nurse -> female". Likewise, nothing in "college student" definitionally implies age, but I wouldn't consider it a success if my system displayed a picture of a 9 year old.


>> Nothing in the definition of a nurse implies female.

Unfortunately it's not a great example with which to make this point. Before Florence Nightingale, a nurse was a person who was employed to suckle your babies. Of course 100% were female.

Wet nurse was a later coining to differentiate the two meanings.


Well, nurse ends in an e, which is usually a feminine ending.


Not really in English.

Uncle, Prince, Sire, and Duke come immediately to mind. Most of their feminine cognates end in consonants.


Like, this is amazing. As an accomplishment, very cool, I'd be proud.

But you're right, is somebody's finger or faulty data on the scales?

Because that's how you get a bumper bowling world laser light show clown world instead of life.

The metaphor isn't perfect, and cuts several ways, but it's what my mind came up with.


One issue is the data is accurate to the world it is trained on, but not the world we want to exist in. The most likely nurse is female, so this generates only female nurses. The most likely CEO is male, so this generates only male CEOs. And thus we see in-built biases amplified.


I absolutely agree the output is problematic without a horrible disclaimer. We've seen what happens when this technology is used to make financial and employment decisions, and in the justice system the results can be tragic.

This demo toy just pushes the bias right in your face and makes the invisible visible, which could be very useful for highlighting the problem for lawmakers.


That page is far more interesting than the article, thanks.


Those look okay as long as I open them on my phone, don’t zoom in, and don’t look at any square, but just sort of take them all in at once. Focusing on one reveals they are pretty uncanny.



The model is deliberately limited in its ability to generate images of people.


So it's cherry-picked examples on the demo page that fall apart at the lightest real-world case. Just like GPT-3 :-)


This is a rational argument, and I see it a lot, but it does not probe deep enough. GPT-3 use cases "fall apart at the lightest real world case" because our expectations are wrong. For the vast majority of computing history, computers have been used to do things deterministically and accurately. Next-gen AI does not operate in this way. It encroaches on the human part of work where we expect and allow for people to get it wrong.

GPT-3 does shockingly well at classification tasks with basically 0 training/prompting (outside of the base model). And it works for an incredibly broad set of use cases.

But QA and generation use cases are much harder to judge, because we can't say (in general) whether generated text is "correct".
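To make the zero-prompting classification point concrete, here is roughly what such a call looks like (a sketch only; the engine name and parameters are my assumptions, not copied from OpenAI's docs):

    import openai

    # Zero-shot classification sketch: the prompt alone defines the task,
    # with no fine-tuning and no examples. Assumes the Completion-style
    # API and a davinci-class engine; names and parameters are from memory.
    prompt = (
        "Classify the sentiment of the review as Positive or Negative.\n\n"
        'Review: "The battery died after two days and support never replied."\n'
        "Sentiment:"
    )

    response = openai.Completion.create(
        engine="text-davinci-002",  # assumed engine name
        prompt=prompt,
        max_tokens=3,
        temperature=0,
    )
    print(response.choices[0].text.strip())  # expect something like "Negative"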


> with basically 0 training/prompting (outside of the base model)

That's a heck of a lot of training. It might seem to work if you ask it to repeat stuff that it has seen in the base model, but there's nothing reliable about that. Most attempts to find some practical use for AI language models have basically been failures.

> our expectations are wrong. For the vast majority of computing history, computers have been used to do things deterministically and accurately.

Actually, accuracy matters even when dealing with non-deterministic, statistical/random/sampled data. A lot of supposed 'AI' is little more than a glorified toy or party trick, founded on ad-hoc data mining rather than rigorous inference from a well-defined model.


AI is an ideology, not a technology.

Part of the ideology is the ends justify the means.

Hence, "AI training specialist" means "works for Amazon Mechanical Turk."

Open AI isn't paying people $250k a year to classify images.

And $500k a year to check that work.

The AI ideology is God will know his own.

A racist, misogynist, hillbilly singularity meets success criteria.


This is nonsense. It’s just generating the most likely image. Probabilities are not in and of themselves racist/misogynist.


Bias in data leads to biased probabilities. Bias in data reflects biases in the systems that data describes, and the people describing it.

One could say (and some have said) that the tendency of some facial recognition software not to recognize black faces is not racist - but simply a matter of physics. What's racist is the culture of software engineering which only tests that software on white faces, and doesn't consider it a problem.

Likewise the 'most likely image' of a nurse being a non-white female and a CEO or lawyer being a white male is still a problem even if it accurately reflects racial and gender class role biases in society. Eventually these systems will enforce and maintain power structures and media narratives which themselves are already built upon racial and gender prejudice.


> What's racist is the culture of software engineering which only tests that software on white faces, and doesn't consider it a problem.

That's a pretty cynical take. Every project I've ever worked on, regardless of the problem domain, operated with the 80/20 rule in mind. Solve the easy problems first, get something out the door and then work on the harder problems. If dark skin is a harder problem in image recognition, it doesn't mean a developer is racist for solving the easier parts of the problem first.


By the time you get to the minorities, the majority has moved on and your company pulls support for the non-profitable minorities.

Did you become an engineer to work on easy problems?


The normalization of whiteness as default which leads to the attitude that considers facial recognition software which only works on white faces to be complete enough to ship, rather than fundamentally broken, is where the racism comes in.

Racism doesn't always come from overt bigotry or hatred. It can be expressed by simply accepting the status quo of systemic bias, because it's less work, or more cost effective, than doing otherwise.


> still a problem even if it accurately reflects racial and gender class role biases in society.

Agreed, but. Since the 'culture of software engineering' likely reflects society-at-large (maybe a step up) ... how do you construct training models that reflect a more refined and caring reality than the one we've got?

If what they're showing us accurately mirrors what they're seeing, that's a service. If we find it painful, well, we made it that way.


The obvious solution is to set such results among the short term goals and to treat failure to meet the criteria as unacceptable engineering failures.

As simple as no bonuses, no stock options, no promotions and most importantly no deployment if the model produces undesirable results.

The models reflect what AI ideology directs Mechanical Turk workers to find. They reflect what is considered accurate and do not reflect classifications that might be perceived to put Amazon Mechanical Turk’s contracts at risk.

“We” didn’t make it that way because I know I am not using Amazon Mechanical Turk to classify images.

I mean even if the exploitive wage structure went away, the name itself is consistent with AI ideology that racism, religious intolerance, and nationalism are ok.


They are the results of human design though. We have to aspire to more than bias laundering, otherwise we risk the biases multiplying.


ML trained on real world data is an uncompromising unveiling of our collective subconscious. Don't you dare look away...

There is no God but you and me here.


“If we define understanding as human understanding, then AI systems are very far off,”

This took me down the following line of thought. If we wanted AGI, we probably should give these neural networks an overarching goal, the same way our intelligence evolved in the presence of overarching goals (survival, reproduction...). It's these less narrow goals that allowed us to evolve our "general intelligence". It's possible that by trying to construct AGI through the accumulation of narrow goals we are taking the harder route.

At the same time I think we should not pursue AGI the way I'm suggesting is best, too many unknown risks (paperclip problem...)

Of course all this raises the question of what AGI is, how we define a good overarching goal to prompt AGI, and many more...


I think the best concise definition I've run across is what I heard Yann LeCun say in his recent interview with Lex Fridman.

“The essence of intelligence is the ability to predict.” -Yann LeCun


That's just the current idea of how the brain works - predictive processing. As we advance our understanding perhaps this will be seen as only one facet of intelligence. For instance, where does creativity fit into this definition?


Can we get some images that were not cherry picked? What does the average output look like?


Here are some less cherry-picked examples. Mostly from the twitter accounts of OpenAI employees.

https://twitter.com/BecomingCritter/status/15118082774908969...

https://www.reddit.com/r/MediaSynthesis/comments/tyaz70/link...


Here's one:

https://twitter.com/jmhessel/status/1511783083967586306

And a non-cherry picked depiction of cherry-picking scientists:

https://twitter.com/jmhessel/status/1512143226022481932

I think tbh people have been burned by cherry picked results in AI for so long that they're overly suspicious of DALL-E 2...


It’s interesting that the researcher looks very similar in all images. I would be interested in seeing the training set


I would rather see images labeled "nurse" in their training dataset compared to their generated nurses.


They can tell you it isn't cherry-picked. Why would you believe them?


Oh, come on. That would ruin their marketing


Pretty darn impressive. Those "hooves" are probably going to feature in my nightmares though.


Oh God, what have you done. You have entered the code incorrectly. Look what your careless hands have wrought. Select the pistol, and then, select your horse.


OT, but for the ignorant like me, here’s the cartoon: https://www.penny-arcade.com/comic/2008/05/26/the-unhorse


Penny Arcade, nice!


> ask it to generate images of astronauts on horses, teddy-bear scientists, or sea otters in the style of Vermeer, and it does so with near photorealism.

I'm now imagining a mashup of Scribblenauts and this AI image generator.


How many kWh does it take to train?


Is it Artificial Intelligence or rather Artificial Idiocy?


>> “Every now and then it generates something that just makes me gasp."

I'm amazed to see how easily Intel/AMD are able to cram so many monkey/typewriter combinations onto such tiny chips.


> DALL-E 2 is not based on GPT-3. Under the hood, it works in two stages. First, it uses OpenAI’s language-model CLIP, which can pair written descriptions with images, to translate the text prompt into an intermediate form that captures the key characteristics that an image should have to match that prompt (according to CLIP). Second, DALL-E 2 runs a type of neural network known as a diffusion model to generate an image that satisfies CLIP.

So, some stages seem to be artificially introduced...
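For what it's worth, the two stages described in that quote can be sketched structurally like this (stand-in functions only, since OpenAI hasn't released the models; none of this is their actual code):

    import numpy as np

    # Structural sketch of the two stages in the quote. These functions are
    # random stand-ins; the real CLIP encoder, prior, and diffusion decoder
    # are not public, so this is illustrative only.
    def clip_text_encoder(prompt):           # text prompt -> CLIP text embedding
        return np.random.randn(512)

    def prior(text_embedding):               # text embedding -> image embedding
        return np.random.randn(512)

    def diffusion_decoder(image_embedding):  # iterative denoising -> pixels
        return np.random.rand(64, 64, 3)

    def generate(prompt):
        # Stage 1: translate the prompt into the intermediate representation
        # of what a matching image should contain (according to CLIP).
        image_embedding = prior(clip_text_encoder(prompt))
        # Stage 2: a diffusion model renders an image satisfying that embedding.
        return diffusion_decoder(image_embedding)

    image = generate("an astronaut riding a horse in a photorealistic style")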



