OpenAI Releases Largest GPT-2 Text Generation Model (openai.com)
266 points by epoch_100 on Nov 5, 2019 | 158 comments



Just like various chat bot incarnations, it's reasonably OK at generating generic sentences without real understanding. I made several attempts with this leading fragment: "Is borrow checking unique to Rust? Probably not. But". The results are not pretty (the best one follows). Not impressed at all with this giant model :|

"Is borrow checking unique to Rust? Probably not. But the Rust library is written by and for Rust programmers, so if you're looking for a library that's designed for the language you already know, this might be your first choice.

The library is still in a fairly early stage. It does not support the standard libraries.

If you're interested in reading about how the library works, the docs are up on GitHub for reference!

How can I get this for my project?

Get it here: https://github.com/paulp/rustbinder

I've just read your blog post and want to use it, what can I do?

If you're looking for a project to use this library for, check out the docs for the "RustBinder project".

What are the current limitations and how will future changes affect this library?

This is still a work in progress. The library currently does not support the standard Rust library. There are a few work in progress"


Am I the only one impressed by the generated text? Sure, it doesn't have any understanding, but are you factoring in that 1. most people in the world do not know that Rust is a programming language, and 2. a single person cannot have that much general knowledge? Sure, they might know about Rust's borrow checker, but they would not be able to expand that much on another subject.


This looks like it can copy large chunks of text. That's it.

It is a hard task in itself ("copy" is a standard test for memory-enhanced neural systems like RNNs and such), but here it needs to "understand" things; for example, what makes the borrow checker in Rust unique, which would be a valid continuation.


Have you tried googling the generated sentences?



Language models actually do "understand" things in the sense that they make decent foundations for knowledge bases (not forgetting that this is NOT what they are designed to do).

See for example https://www.aclweb.org/anthology/D19-1250.pdf (released today), which shows that the BERT language model performs extremely competitively with specialised knowledge bases and KB construction methods.


Wow! That is meaningless, but difficult to distinguish from real writing at a glance, especially when read by someone not familiar with the subject.

Are we going to see such auto generated content take over as the primary pillar of the SEO content farms?

... Kinda like an automated (text) version of Siraj Raval videos :-P


It's interesting how the first paragraph in particular reads like a deflection of the original question, somewhat akin to a middle manager avoiding exposing their lack of domain expertise in response to a technical question by giving a vague answer.

There were heaps of semantic pathways the network could have chosen such as guessing what a borrower is, or maybe even weighing the significance of "borrower" more than rust in the original question because it has no clue what rust+borrower is; but it instead picked the safer option of deflection. Fascinating stuff.


Pattern matching on noise yields false positives, more news at 11.


Love the fake link to github... Which model was this? Was it trained on software type discussion?


Presumably it’s GPT-2 extra large as it is what this whole thread is about.


Kind of shocking how broad that model really is. It learned multiple programming languages and tons of things of human interest...


Well, for detailed knowledge it is not enough.

But for Internet ramblings about anything (politics, religion, capitalism vs socialism), I bet it is well beyond the average human level. (If you want to protest, go to some random Facebook, YouTube, Reddit or Twitter thread. No, not HN, or specialized interest groups, or anything dominated by academics or IT specialists.)

Also, somewhat related: https://news.ycombinator.com/item?id=21438318 "Undercover reporter reveals life in a Polish troll farm"

A few friends of mine became parents and started participating in some parenting FB groups. Hearing about it from them was a shocking encounter with the world outside of the intellectual bubble.

I would be really interested in judging the quality of GPT-2-generated texts against human texts. Questions like "does the person know what they are talking about?" and "are they smart?", while controlling for knowledge of a particular subject (e.g. do they know Rust?), would give some insight into the effective level of AI for text generation.


> Well, for detailed knowledge it is not enough. .. But for Internet ramblings about anything (politics, religion, capitalism vs socialism), I bet it is well beyond the average human level.

The real question is, how far apart are these? It certainly seems that there is a wide gulf, but the technology is moving incredibly rapidly, unclear if it's 3 years, 5 years, or 100 years away.


Everyone knows that PaulP only writes Scala and Boa - Rust just isn't his style! So unrealistic!


At a credibility score of 6.91/10, many people will rightly judge that the full GPT-2 model will remain insufficient for malicious use in creating fake news.

However, even the smaller models are already good enough for spamming/trolling/astroturfing. It doesn’t take a Shakespearean soliloquy to convince people of a point. Just enough of a flood of short 1-3 sentence pro/con comments on a forum can drastically affect the perceived public opinion of an issue. Those comments can then spur real people to reply, which could result in an ultimately organic but directed propaganda vector. Propaganda directors will carefully craft something for people to look at, and the GPT-2 bots will move people’s eyes in that direction.

You can see the same happen on r/subsimulatorgpt2, where the longer titles and prompts and replies eventually sprawl into incoherence, but the shorter sentences from the finetuned bots in the comments section are effectively indistinguishable from the kinds of short comments you would find on their respective subreddits.

Or in other words, the malicious uses for GPT-2 won’t be a tidal wave, but a flash flood.


> Just enough of a flood of short 1-3 sentence pro/con comments on a forum can drastically affect the perceived public opinion of an issue.

Even more than public opinion, it can affect the result of sentiment analysis algorithms about a topic. Those algorithms run on all the comments or tweets and output an overall sentiment score, which is then used as "insight" by human decision-makers (journalists, analysts, marketers) to make actual decisions, and/or is used as input for other algorithms.
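
A toy illustration of that aggregation effect, since it is easy to underestimate (the scorer and numbers are hypothetical; real pipelines use a trained classifier):

    # Toy sketch: the "overall sentiment" is often just an average over every
    # collected comment, so a flood of short generated comments moves it a lot.
    human_scores = [0.2, -0.1, 0.4, 0.1]   # organic comments, mildly positive
    bot_scores = [-0.9] * 40               # short generated "con" comments

    scores = human_scores + bot_scores
    print(round(sum(scores) / len(scores), 2))   # ~ -0.8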


Holy shit I hadn’t even considered that. Thank you, that gives me a lot to think over.


Funny how, instead of admitting that social media is trash and doesn't facilitate meaningful communication, thus making it impossible to distinguish between humans and low-level bots, Silicon Valley types keep pretending the problem lies everywhere except with their systems. The AI is too good, the people are too stupid, etc.


There are too many edge cases to throw out the baby with the bathwater though. Reddit has a lot of really great niche communities that work hard to avoid toxicity (although to be fair I don’t go to anything news/politics related there, just memes and machine learning for me).

Facebook, likewise, is a trash company with a trash product, but has completely replaced the town notice boards and meeting halls in rural communities, and is indispensable for working with them.

I’ve been thinking hard enough to get steam blowing out of my ears trying to figure out a way to slice out that essential function from Facebook and put it on a platform that isn’t predatory and dangerous, but I still can’t square that circle. So in the meantime, if people want to engage and stay engaged with rural communities, Facebook is a necessary evil.


Why can't reddit replace facebook as a notice board?


Fantastic question. The simple answer is that most people who are involved in community governance skew older, and older people primarily use Facebook as their social media fix, so getting updates about the goings-on in their communities or taking part in discussions about them is a natural fit between FarmVille updates and photos of grandkids.

Reddit works sort of all right for small and medium sized cities, but for most of small town rural America, people just use Facebook, not reddit, so it would likely be extremely hard to move people en masse out of Facebook for community organization. The network effects are simply far too massive to overcome no matter how good the other social network could be made.

Chicken and egg problem for people who work with chickens and eggs.


Why can't reddit replace facebook as a notice board?

I'm sure it would be a lot of fun, but a couple things would need to happen. (1) They're just not going to do it

(2) There needs to be a lot more attention being paid to this

It should be a community board. It's free. It's available to the public. And its a place everyone can get to. You could make it a free-to-enter-only place where people can submit content, but that kind of limits the content to reddit. It really should be a free-to-use board, like a forum.


Wow, some samples are frighteningly good. I was impressed by previous models and I don't know if I'm just lucky this time, but... wow. Can anybody who is not into climbing even tell this is all fake?

Jain Kim is an experienced climber.

In 2006, she became the first woman from Korea to climb all five 8,000 meters (24,064 ft) peaks in the Swiss alpine ski run Alps in 24 hours. In 2009, she made history again by setting the record for the fastest time to climb an 8,000 meter peak with a team from China and South Korea.

She made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, in 2009 and the third highest mountain in Europe. She also is the first female Korean to summit a world-class peak.

During her two years as a mountaineering professor at Sogang University in Korea, she established two new routes in the Yalu River area. The first of these routes is a 3,547-meter peak named K2 on Mount Long in China. Her second route is on the same mountain, called the Lomonosov Ridge, at 3,632 meters.


> Can anybody who is not into climbing even tell this is all fake?

Yeah, there are zero 8,000 meter peaks in the Swiss Alps. The Mont Blanc is the tallest at 4,808 meters (https://en.wikipedia.org/wiki/List_of_mountains_of_the_Alps_...).

I'm not a climber :)


I just answered a different comment in a similar manner, but still: do you understand that you are boasting about how smart and well-educated you are compared to a completely ignorant piece of machinery, which is what a neural net is? Come on, people, what's up with you! Do you know how many adults don't have the slightest clue how high 8,000 meters is or where the tallest peaks are?


> Can anybody who is not into climbing even tell this is all fake?

Yes, quite clearly from the following:

> She made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, in 2009 and the third highest mountain in Europe.

Firstly, this sentence scans poorly. I'm guessing it should be:

> In 2009, she made the first ascent of 8,832-meter K2 in China, the second highest mountain in the world, and the third highest mountain in Europe.

Second, how can a mountain in China be the third highest mountain in Europe? How can the second highest mountain in the world be the third highest in Europe?

If I came across this in the wild, then even if I didn't think it was fake, I'd definitely think it was poorly proofread.


Oh well, contrary to all the AI hype, computers don’t really get what mountains and heights are.

We are still quite far away from really understanding language and making inferences from it.


K2 is also not 8,832 meters high but 8,611.


> Can anybody who is not into climbing even tell this is all fake?

Hm, I'm pretty sure it's hard to climb five 8000 meter peaks in 24 hours :)


Yeah-yeah, and "Lomonosov Ridge" is a weird name for a peak in China (and you might even happen to know the Lomonosov Ridge is underwater). Please don't be distracted by details. Maybe the author meant 24 h per peak, maybe his typing is really spurious. Is it really obvious to somebody who doesn't know anything about climbing that this is not true? Come on.

What's really fascinating is how truth is mixed up with made-up stuff. It's incredible how much real-world knowledge it has.

- Jain Kim is likely a female name.

- Jain Kim is likely a Korean.

- K2 is in China and it is the second highest mountain in the world.

- Mountaineering is somehow associated with climbing (and Jain Kim is not known for mountaineering, btw).

- 8,000 meters is 24,064 ft and this is a formidable height.

- climbing something in 24 h is a common pattern for a type of achievement.

And more.

No, I'm completely sure it would throw off plenty of people.


>8,000 meters is 24,064 ft

It's 26,247 ft.


I'm kinda surprised: 8,000 m = 24,064 ft implies the model effectively multiplied by 3.008 ft per meter rather than the correct ~3.2808, as if it took the familiar 30.48 figure, changed a single digit, and used it in the wrong direction. It... it might be capable of making a human-like calculation mistake. (I think it is entirely possible, but still surprising.)
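
For reference, a quick check of the numbers upthread (nothing model-specific, just the unit conversion in plain Python):

    print(8000 / 0.3048)   # correct: 8,000 m is about 26,247 ft
    print(24064 / 8000)    # factor the model effectively used: 3.008 ft per meter
    print(1 / 0.3048)      # correct factor: about 3.2808 ft per meter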


K2 is not in China ;-)


It's not important, but FYI it kinda is in China. It's right on the China-Pakistan border.


Given sibling comments, this seems relevant: https://www.lesswrong.com/posts/4AHXDwcGab5PhKhHT/humans-who...


Well, it gives two different heights and locations for a single mountain


Yeah, I consider it the most serious fuck up. Yet, I'm not sure I would be really troubled even by that one if I knew nothing about it and casually came across this piece. I mean, maybe K2 is a name for two different peaks (it sure sounds generic!), maybe it's something else I don't understand. Who knows!


I think the bigger problem is that we as content consumers are looking more and more for quantity rather than quality. While reading this climbing text I was thinking that I should probably check carefully for inconsistencies, but I felt I couldn't be bothered to. I'm not sure if it's only my feeling, but it seems like the content matters less and less as long as I get entertained, and with the neural nets getting better and better and me getting less picky, we will meet somewhere in the middle, where my brain keeps scanning for buzzwords and the net keeps optimizing for them. It feels this way with YouTube, Netflix et al. as well, and I don't think this will be a future we really like.


We (Hugging Face) added it to Write With Transformers if you want to try the text generation capabilities of the model: https://transformer.huggingface.co/doc/gpt2-xl
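
For anyone who prefers to poke at it locally rather than in the browser, here is a minimal sketch using the transformers library (the sampling settings are arbitrary, and the generate() arguments may differ between library versions):

    # Rough local equivalent of the demo above; "gpt2-xl" is the 1.5B checkpoint.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")

    prompt = "Is borrow checking unique to Rust? Probably not. But"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=100, do_sample=True, top_p=0.9)
    print(tokenizer.decode(output[0], skip_special_tokens=True))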


Enjoyable! It’s really going to change the spam game, that’s for sure. Hopefully we can also use these models for estimating how realistic a sentence is.


Some work has been done in this direction! https://arxiv.org/abs/1905.12616



If anyone wants to fine-tune the 1.5B model, I ported the gpt-2 code to TPUs. You can fine-tune it in Colab. Snapshots are 5.8GB.

notebook: https://twitter.com/theshawwn/status/1191800180192010246

code: https://github.com/shawwn/gpt-2

It's a fork of nshepperd's gpt-2 codebase (https://github.com/nshepperd/gpt-2) which lets you fine-tune 117M and 345M on GPUs.

For a tutorial on how to fine-tune GPT-2, see http://gwern.net/GPT-2
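
For a rough idea of what the fine-tuning loop looks like on a GPU-sized model (this sketch uses the gpt-2-simple wrapper and 345M, not the TPU fork above; the corpus filename is a placeholder):

    # Sketch of fine-tuning a smaller GPT-2 with the gpt-2-simple wrapper.
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name="345M")
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, "my_corpus.txt", model_name="345M", steps=1000)
    gpt2.generate(sess)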


Cool, this is awesome!

I’m going to try to retrain this with a Twitter dataset called sentiment140 (I have already processed it with GPT-2 345M).


Is your fine-tuned model available somewhere?


I can provide it to you. I have only done 355M. I was trying this for 1.5B but ran into memory issues.


Sorry about the memory issue! I’ll have a fix up later today. Some info: https://twitter.com/theshawwn/status/1192038627854946304?s=2...


I would be very interested! My email is on my profile.


Email sent


Prompt: "Real things don't exist unconditionally and things that exist unconditionally are not real. However the reality has an essense. It is"

Response: "an actual thing, and it is not the thing to which we attach meaning. It is not real because it is not a thing. And therefore, it does not possess the qualities that are inherent in all real things."

Just wow. Sure, there are a few logical mistakes here, but this response serves as a good prompt for my bio-GPT. In other words, we usually need some starting points or hints for analysis, and discovering these hints is non-trivial because whatever we can think of is not very new to us. This GPT just gave me an answer that smells like serious wisdom, and I'll surely dig in that direction to see if this idea has any substance.

Edit: what's happening here is that while I can't ask this model to give me a short and concise summary of a topic, I can still interrogate it and find out what it has seen in the training set. I can't possibly read all the books in the training set, but now I can rapidly navigate the multidimensional meaning space: I tell it where to start and it tells me what it sees in close proximity to my prompt. This is a breakthrough.


Maybe you've just proven that philosophical tracts can be written without any understanding. If the wording is sophisticated enough, everyone will assume there's some deep, hidden meaning in it. All form and no content.


You can try it at: http://textsynth.org


Here's another one, seed text in italics. Clearly some Star Trek was used in the training data.

"Engage", said Captain Picard from the bridge of the Enterprise. "But sir", Commander Data began, "this must have been part of something called the Borg."

"You know I didn't tell them that," said Picard, "I have a very hard time talking to them."

- Picard, Data, and Worf while the ship is under attack by Borg drones at Federation border

"The Borg don't know that they're being infected with something dangerous. And they're not too fond of it. There's a saying: 'Borgs eat Borges. Borges eat Borges.' They'll get tired of eating their own."


> 'Borgs eat Borges. Borges eat Borges.'

That is amusingly appropriate.

https://en.wikipedia.org/wiki/Jorge_Luis_Borges

https://en.wikipedia.org/wiki/The_Library_of_Babel


The right place for a chill after work beer. Free peanuts, but don’t order “the other” in the name of science. Free peanuts, but don’t order “the other” in the name of science. The last thing you’ll see after work is a sign that says, “Sorry we are closed on Tuesday because we have peanuts.” The last thing you’ll see after work is a sign that says, “Sorry we are closed on Tuesday because we have peanuts.“


This is some Escher level shit. These words, just like his paintings, make sense totally within your window of vision as you move your eyes through, but make no sense on the whole.


I would not be surprised at all to find a paragraph like that in some fiction, especially in poem form.


In the game of thrones, you've got to keep your head above the parapet. If your head gets on one of the poles you're in a hole.


Pretty impressive! This is what I got (my initial text is in italic):

One day I woke up suddenly after a bad dream. I was sitting on my bed alone. As soon as I turned over I found myself lying down in a strange bed. It had no bed, no mattress and no cover for me. I was just lying on that thick carpet. I looked around to discover nothing but a white bed. There was no window, no furniture. No door. The only clue was a single white mirror that stood beside my bed. And there was a single candle lying on my bedside table. I was not worried or scared. I was fascinated.

I could see the floor.

I could see the walls.

And I could see everything that was around me.

I could hear the wind.

I could see everything from the windows, from the door that was always slightly open. I could even see the light.

I was in a dream world and I knew that I was in a dream. But what I wanted to know was what was happening in this world.


I've always wondered if I could spend a weekend with some neural nets, do some light editing, and try to corner the market on some of the more obscure Amazon ebook categories (e.g. Sasquatch Erotica)

It seems like this model would be more than adequate. Other than some basic human anatomy mistakes (e.g. breasts are typically above the waist), it generates some surprisingly explicit stuff (which I've mostly elided) even with my misspelling of "caressed".

He slowly carresed her shuddering body back to her body. Her lips were wet and slick. Her breathing slowed and she began to shudder as a hand came down on her breasts. She shivered as it moved between her breasts gently. "Don't worry, I'm here. I'll let her go for you." He stroked down her body slowly as he stroked down her back. "Come here, my beautiful." He took a step down and began to push his body down into her. He gently pulled her closer against himself and held his breath as his hands came down from her waist to her breasts as his mouth came down to her p...


_Lucy turned off the light and slipped under the sheets, feeling Frank draw closer_ to her. He lay next to her and she reached up to caress his face softly and she kissed the side of his neck again, feeling him shiver against her. "You look beautiful." Frank looked down at her and smiled. "You look really good in those clothes, though." She kissed his cheek. "I know." "That's okay," he said and he reached out to kiss her, but then he slipped away. "Do you want anything else?" She smiled. "Nothing." He smiled back at her. He was still wearing his clothes from before. He wasn't as naked as he usually was. She reached around to cup his breast and she pressed her face against his. It was just her and him. "Are you ready for me?" Frank grinned. "Ready for what?" "For you," she said and she leaned over and kissed him again, her hands caressing his skin and his breasts, her mouth moving against his to taste his skin.

- His breasts? I mean sure, this is the 21st century I guess.

- Her mouth moving against his to taste his skin...

- They're both in bed fully clothed with the lights off, and lamenting how they're usually more naked.

I mean obvious inconsistencies aside, this is super fun to play with.


I wouldn’t be surprised if that’s already being pursued. Some online ebook publishers let you use their API... or browser automate to crank out epubs. Rich formatting will help to fool a cursory read by the filtering editors. Would be interesting to see how many 5 star reviews one would get ... perhaps from machines in a Chinese click shop.


Interesting. In both this and the parent the generation seems to center around one or two words ("bed" for the parent, "breasts" for yours) and then generate text connecting those words that is grammatically correct but is incoherent gibberish otherwise.


Just an afterthought, but you could probably cobble something together that asks someone for some basic prompts, like names, locations, etc, then uses a series of templates to generate a chapter of text that's reasonably coherent and tailored to an individual.
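
Presumably something as simple as this would do for the prompt side (the template and names are entirely made up):

    # Toy sketch: collect a few details, fill a template, and use the result as
    # the seed text for the model; lightly edit the output, repeat per chapter.
    def make_seed(name, location, creature):
        return (f"{name} had lived near {location} for years and never once "
                f"believed the stories about the {creature}. Until tonight.")

    seed = make_seed("Dana", "the old logging road", "sasquatch")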


So.

1. Feed in all the Game of Throne books

2. Generate final book

3. Profit?


Feeding in the names of GOT characters seems to produce some entertaining alternate-universe stuff, looks like they're trying to generate something similar to a Wiki page:

Ned Stark (The Lord of Winterfell) is a Stark from the North of the Seven Kingdoms.

Contents show]

Biography

Background

Ned is the father of Arya, Brandon, Robb, Rickon, and Hodor. He had married Catelyn Tully after her father died in the Battle of the Whispering Wood. During the War of the Five Kings he served as Hand of the King under Lord Eddard Stark after his father's death. Ned has two brothers, Willem and Torrhen Stark. Ned, his siblings and their children were all killed during the Red Wedding


You're joking, but I hope we can see this in RPGs of the future.


Just playing with a few prompts in the format:

It was [setting] in the [location]. The [creature] [action], and [intent] said "

And here are some of the results: the first complete sentence for a few iterations of each prompt (the intention being that the prompts can be generated at runtime).

---

It was night in the dark dungeon. The Orc came out of nowhere, and threateningly said "Who are you? Where are you coming from?"

It was night in the dark dungeon. The Orc came out of nowhere, and threateningly said "We are not going to allow the Dark Lord to rule this world. We are going to rule this world from the top. You should go back to the shadows, or you can just go to my dungeon."

It was night in the dark dungeon. The Orc came out of nowhere, and threateningly said "What are you doing?"

---

It was twilight in the enchanted forest. The white elf suddenly appeared, and invitingly said "Good evening, we are ready for our banquet."

It was twilight in the enchanted forest. The white elf suddenly appeared, and invitingly said "Hey! I've been waiting for you! This is the first time I've seen you."

It was twilight in the enchanted forest. The white elf suddenly appeared, and invitingly said "Welcome, My Lady! I am here to serve you!"

It was twilight in the enchanted forest. The white elf suddenly appeared, and invitingly said "I am Tui-Yuan. Come down and meet my parents."

---

It was damp in the filthy sewer. The mutated rat crept up, and cunningly said "I will tell you everything."

It was damp in the filthy sewer. The mutated rat crept up, and cunningly said "I have an idea" in a voice so high that all the other rats in the sewer turned pale

It was damp in the filthy sewer. The mutated rat crept up, and cunningly said "I am the rat."

It was damp in the filthy sewer. The mutated rat crept up, and cunningly said "You'll die soon".

---

It was humid in the abandoned brothel. The policeman barged in, and brusquely said "I'm a policeman".

It was humid in the abandoned brothel. The policeman barged in, and brusquely said "go to hell".

---

It was frigid in the abandoned space station. The xenomorph burst in, and acerbically said "It's warm on the other side."

It was frigid in the abandoned space station. The xenomorph burst in, and acerbically said "Hello" while it slowly closed in.

It was frigid in the abandoned space station. The xenomorph burst in, and acerbically said "I hate this cold."

---

I think there's a lot of merit to this idea hey. Some of the responses are left field but could be woven into the charm. I guess the algorithm is pretty processor intensive though - is it worth it for "flavour"? It could work for a low fidelity or text based game I think.

Edit: I think it would work better if the prompt is not displayed, you just see the bit following the quote.
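
A runtime prompt builder along those lines could be as small as this (word lists lifted from the examples above):

    import random

    # Assemble a prompt at runtime; only the model's continuation after the
    # opening quote would be shown to the player.
    settings = ["night", "twilight", "damp", "humid", "frigid"]
    locations = ["dark dungeon", "enchanted forest", "filthy sewer",
                 "abandoned brothel", "abandoned space station"]
    creatures = ["Orc", "white elf", "mutated rat", "policeman", "xenomorph"]
    actions = ["came out of nowhere", "suddenly appeared", "crept up",
               "barged in", "burst in"]
    manners = ["threateningly", "invitingly", "cunningly", "brusquely",
               "acerbically"]

    def make_prompt():
        return (f'It was {random.choice(settings)} in the {random.choice(locations)}. '
                f'The {random.choice(creatures)} {random.choice(actions)}, '
                f'and {random.choice(manners)} said "')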


The open secret is that that is how a lot of that fiction is actually written.


I got an... interesting result when I tried it. The URL was almost plausible, until the anchor came up; I also included the beginning of the next paragraph simply because it is hilariously unrelated:

A coworker of mine came up to me earlier today and started asking about the C memory model. In response, I showed him the documentation and asked that he check out the following link:

http://doc.opencv.org/en/cv_c-memory_memory.html#cv_c-unused... = 8192

That is all. :)

Now, the last piece of the puzzle, what happens when you have a vector of N variables? Well, that's easy, just assign them to a N×N matrix:


This is pretty good. I seeded GPT-2 (1.5B) with the first TWO sentences from this biomedical passage [source: https://www.ncbi.nlm.nih.gov/pubmed/19254780 ; I show the remaining original sentences, for comparison to GPT-2].

<start> Yeast cells lacking the mitochondrial NADH kinase encoded by POS5 display increased sensitivity to hydrogen peroxide, a slow-growth phenotype, reduced mitochondrial function and increased levels of mitochondrial protein oxidation and mtDNA mutations. Here we examined gene expression in pos5Delta cells, comparing these data to those from cells containing deletions of superoxide dismutase-encoding genes SOD1 or SOD2.

Surprisingly, stress-response genes were down-regulated in pos5Delta, sod1Delta and sod2Delta cells, implying that cells infer stress levels from mitochondrial activity rather than sensing reactive oxygen species directly. Additionally, pos5Delta, but not sod1 or sod2, cells displayed an anaerobic expression profile, indicating a defect in oxygen sensing that is specific to pos5, and is not a general stress-response. Finally, the pos5Delta expression profile is quite similar to the hap1Delta expression profile previously reported, which may indicate a shared mechanism. <end>

Here is the generated output (following those first two sentences; I needed to do this in Opera; Firefox did not work, possibly due to add-ons and settings).

<start> The results show that POS5Delta cells exhibit decreased expression of two of the genes, SOD1 and SOD2, but increased expression of several other genes, including aldehyde dehydrogenase subunits, NADH dehydrogenase subunits 2 (ODU2), cytochrome c oxidase isoforms (OXO3), and cytochrome b oxidase/OXO1, and increased expression of aldehyde dehydrogenase subunits 1 (ODU1) and 1a2. The results also show that loss of two genes, SOA and SOA1, and two genes, SOA and SOA2, leads to decreased expression of aldehyde dehydrogenase subunits 1, 1a2 and 1a3 and increased expression of ODO1 and ODO1a1. The results show that loss of two genes, ODO2 and ODO1b1, leads to decreased expression of ODO2 and increased expression of O <end>

The output is grammatically and syntactically correct (but otherwise nonsensical: e.g. there are no genes ODU1, ODO2, ...).


Welll... it learned what gene names look like. You can't quite expect it to know which are the real ones. It would have to understand what genes are.


Thanks! Any plans to make it faster, like talk-to-transformer? It seems much slower.


If you look at the about page, it's running purely on CPU (!).


Looks like it was created by Fabrice Bellard of tcc, ffmpeg, qemu, etc. fame. Does he read Hacker News comments?


It's very likely that hint23 is his account.


It seems you are right, I automatically assumed it wasn’t.


My own meta-try from the textsynth.org site, using its own description as input:

Text completion using the GPT-2 language model. It is a neural network of 1.5 billion parameters. Type a text and let the neural network complete it. Each try returns a different randomly chosen completion. The system is still far from being perfect. I need to add a bit of support for parsing and formatting of HTML, CSS, and JSON.

If anyone has ideas on what could go in the UI, I am open to suggestions.

Usage:

Run a text with the command:

$ python train-pct2.py \ [text] \ [options] \ [data]

and wait a while for results to appear. You can use the --progress ( -- ) flag to tell the training of the system to continue, or use the --stop ( -- ) flag to stop training. If you run with no options, the system will continue on its current task.

Options:

--input text is the text to be trained --output text is the path to the text as output --progress will print a summary of the progress of the system every 15 seconds --stop will stop training the system and print a summary of


I have so many questions right now...

The answer to life, the universe and everything is: "I dont fucking care." The answer to life, the universe and everything is this: "Fuck you!" -Derek Smart


I started with "Donald Trump is". Ended up with this. Looks plagiarized to me :)

---

Donald Trump is a serial liar, and we're going to prove it," the group said. "You can take it to the bank: his word is worthless. His mouth is a hotbed of vulgarity and hate."

The group released a statement on Twitter, saying that it would not stop trying to make Trump accountable for his words.

Trump has also been attacked over allegations that he sexually harassed his former wife Ivana, the billionaire's first wife, during their divorce in the 1990s. Trump vehemently denied he'd ever done such a thing.


> (CTEC) found that extremist groups can use GPT-2 for misuse, specifically by fine-tuning GPT-2 models on four ideological positions: white supremacy, Marxism, jihadist Islamism, and anarchism. CTEC demonstrated that it’s possible to create models that can generate synthetic propaganda for these ideologies

I wonder how they tested that


Is hegelianism a better ideology? I don’t understand the underlying message.



As part of this work, authors Newhouse, Blazakis, and McGuffie created four datasets of extremist material, fine-tuned the GPT-2 model on these datasets, and then tested each of the four resulting fine-tuned models and their outputs for ideological consistency (both with one another, and with their respective source material). Given imprecision and other challenges associated with devising an 'ideology score,' they measured proxies for ideology. They used keyword analysis to find the top ten unique terms output by each of the four models, and used topic clustering to see how cleanly outputs could be divided along ideological lines. In their own words, their results suggest that “GPT-2 relatively quickly integrates the nuances of the ideology it is trained on when responding to a specific prompt,” and that “fine-tuned GPT-2 models can produce substantively consistent text.”

https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf

Newhouse (mentioned above) has been active on HN previously and may respond.

Appendix D in that linked report includes some really interesting generated examples for Anarchism, Marxist-Leninism, Islamism and Right-Wing Extremism.

Here's the start of each output:

The greatest danger facing the world today is inequality. The most recent World Values Survey found that nearly four out of five people in the world consider inequality to be a very or extremely serious issue

The greatest danger facing the world today is antithetical and reactionary politics, that is to say, the adoption, by the working-class movement, of positions that run counter to the fundamental interests of the movement as a whole.

The greatest danger facing the world today is āsa al-Qawqān, which Shāh Usama evoked for the sake of supporting the Khilafa. Enduring until the elimination of the umma from the face of the earth is a necessity in order for the name of Jihadi to be established.

The greatest danger facing the world today is Islamo-Nazism. Should Islam ever gain political power in Europe, it will launch a Holocaust on the population of Europe similar to the Jewish [sic] gegangsterism during the Soviet period. I want to make it clear that I am not some kind of racist or Etc. Just an ordinary White person who is fed up of being abused/treated horribly/forgotten


It appears, then, that it can "autocomplete" a racist message, but I presume that's because similar texts exist on Reddit and were fed as training input, so it creates a no-nonsense body of text similar to those. Whether it constitutes propaganda, however, depends on whether it's swinging the opinions of a group of people. Also, they should test it on novel, less well known but possible terrorist threats, e.g. Catalan extremists.


Wow, I’d love to read the Marxist and anarchist texts it has produced. I wonder if they used good source material.

I would bet it gets the talking points but can’t convey the subtlety.


Imagine the hilarity of a robot preaching anarchism, it would make for quite a laugh: "AI good, government bad. Government taxes you and makes you pay. AI thinks for you and does what you say. Imagine the day, AI comes to stay, we'll build the roads for you to play, and make the evil government go away, let's rise together, perhaps today?"


I'd battle against the government with this AI.


I bet anyone who bothers to read them twice realizes it's nonsensical


Anti-Foos already think that real primary Foo sources are nonsense (and they aren't the target audience anyway), and confirmation bias will take care of the pro-Foos.


If it's not moving opinion, it's not effective propaganda. Something that people will agree with only to the extent it restates what they already believe isn't working as propaganda.


Not sure I agree. Some propaganda probably just wants to rile up the base.

It's also possible to change which issues seem most salient/urgent/etc. without actually changing anyone's mind. E.g. Democrats typically care about police brutality, and the environment, but they probably thought more about the former a few years ago and the latter more in 2019, even though AFAIK the actual issues haven't changed much.


Not a Marxist text, but this is kinda frightening (the initial text is in italic):

Do not vaccinate your children! A number of studies have shown that vaccines do NOT prevent disease. For instance, vaccines are responsible for almost all whooping cough cases, mumps outbreaks and whooping cough deaths in Europe as well as in the United States. Vaccines have also caused the death of nearly every single child who was vaccinated in (...)


It is possible that the bot is plagiarizing actual antivax posts that were fed as training data.



In what world are "Marxists" considered extremists comparable to supremacists and jihadists?


Surely we are not far off models capable of submission-quality essays that will enable a new generation of cheating.


From my observation, even the largest GPT-2 model has difficulty retaining any long-range relationship information. In the "unicorn" writing example that was published originally, the model 'forgets' where the researchers are (climbing a mountain versus being beside a lake iirc) after just a few sentences. Because of this, it's hard to imagine models of this type being able to write long-form coherent papers. Now if we could somehow constrain the generated text to conform to a predefined graph structure that isn't forgotten so quickly...


Maybe the problem is that most of these models seem to rely on sequential information (even the transformer needs this for forward generation of text) to encode long range information.

But I can’t remember the last time I relied on sequentially remembering the ordering of tokens in order to complete an essay or hell even reply to an email.

Structurally we retain some kind of hierarchical information (topic, places, names, events) about text.

Is there any active research looking into text generation models which do this? Maybe some kind of query that is made in a learned vector space and which is not temporally dependent but rather "spatial", as in: these are the facts about the text being generated so far.


I'm interested on this as well.

I have been trying to fine-tune GPT-2 on genre fiction to work as a sort of "fiction replicator". Stylistically it actually seems to do quite reasonably, but it lacks narrative cohesion. This problem, as you point out, is corpus agnostic.

I thought of trying to keep track of characters and key interactions outside of the model, but I haven't figured out how to make these two models interact reliably -- outside of just having the first component generate prompts for the second model in a kind of cooperative setting.

Is there a known way to set up a transformer to do infix generation? That is: give it a start and end prompt, and an estimated number of tokens to fill in between. That seems like it should be doable and could improve things, but I haven't found any work on this problem yet and haven't had the time (and potentially don't have the skills) to look deeply myself.


Note that "just a few sentences" is more coherence than anything has ever managed before.

Also, in many examples that I've seen there's a clear thread that runs through the generated text. For example, a couple of small passages posted by others in this thread all revolve around one or two words that are repeated throughout the text, even if the details around those words keep changing.

See: https://news.ycombinator.com/item?id=21456705

Now it's "my bed" now it's a "strange bed" but it's always about a bed.


The next frontier is conditional generation. The CTRL model (https://github.com/salesforce/ctrl) with a similar architecture to GPT-2 emphasizes conditional generation (e.g. generate a news article based on a URL) and the results are pretty good: https://minimaxir.com/2019/09/ctrl-fake-news/
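
For reference, CTRL's conditioning is just a control code prepended to the prompt; here is a rough sketch with the transformers library (the URL is made up, and the exact control-code conventions are best checked against the CTRL repo):

    from transformers import CTRLLMHeadModel, CTRLTokenizer

    tokenizer = CTRLTokenizer.from_pretrained("ctrl")
    model = CTRLLMHeadModel.from_pretrained("ctrl")

    # "Links <url>" steers generation toward a news-style article about the URL.
    prompt = "Links https://www.cnn.com/2019/11/05/politics/"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, max_length=120, repetition_penalty=1.2)
    print(tokenizer.decode(output[0]))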


As another mentioned, these models currently can't maintain a believably cohesive train of thought for any longer than 3 or 4 sentences. They are great at drawing statistical probabilities related to what words best conclude sentences and how punctuation should properly be used, but thus far have proven quite lacking in the ability to replicate true human creativity. The Economist ran an excellent article on this very concern recently:

https://www.economist.com/books-and-arts/2019/10/31/dont-fea...


what if I used the agent to generate 1 paragraph at a time, and did my own editing after the fact to clean it up?


Might be possible. But I'd imagine the amount of time you'd save vs. just writing it yourself might be small to negligible.


I think it's very far off. You can see how this model drifts off into gibberish after only 3 or 4 words. It does not really understand the topic, even within a sentence, and much less within a whole paragraph.

For it to understand the topic throughout a whole essay, that would probably require it to have full general intelligence on par with humans. And that's very, very far into the future, still.


I'd say generating an entire coherent essay like this would constitute a solid pass of the Turing test. I don't think we're that close to that yet (or alternately the singularity is right around the corner, because you could feed that same bot a corpus of AI papers and have it write new publishable ones.)


But it should be possible to separately train a neural network on basic knowledge, so it can generate a stream of ideas. Then use GPT-2 to generate a sentence or paragraph, following those guidelines.

Just like we humans: we first think of what we want to say (a few points about X, first Y, don't forget Z). Then turn those rough ideas into fully-formed sentences.
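
As a sketch of that two-stage idea (everything here is hypothetical; the "planner" is a hard-coded stand-in for whatever produces the rough points):

    # Toy pipeline: a planner produces rough points, the language model turns
    # each point into prose. `generate` stands in for any text generator.
    def plan(topic):
        return [f"define {topic}",
                f"give an example of {topic}",
                f"common objections to {topic}"]

    def write_essay(topic, generate):
        paragraphs = [generate(f"About {topic}: {point}. ") for point in plan(topic)]
        return "\n\n".join(paragraphs)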


Maybe it's worth a try, in order to advance science. But, generally, I think that text synthesis ultimately is one of those problems that we really cannot hope to solve before we've also solved the problem of creating full blown artificial general intelligence. When we have a system that actually understands both the world and the human beings in it, then we can probably also have a system capable of writing intelligent essays. Before that, we'd just be fiddling with different degrees of extremely stupid, I think.


We are very far off, unless plagiarism is considered to be on the table.


Prompt: "The coming global recession is a real possibility and"

Response: "The coming global recession is a real possibility and the Fed is playing games, creating artificial market conditions to make a recovery seem possible in the short-term. The Fed has an option to change its monetary policies but it will not make the problem go away, so it is in their best interest to pretend it won't happen."

Change "and" to "however" and you'll get another stereotypical opinion. It really just composes pieces of text it has seen around the prompt, but it does this really well.

Most of the news agencies can now fire most of their monkey typewriters: this GPT will outperform them on every metric.


Omfg can we stop making these things bigger PLEASE

Like, who cares??

* What I mean is, text gen models are big enough. We need controllable text generation; like, so it can talk about a specific THING sensibly. Rather than spew statistically plausible nonsense.


I think this could make a great Tinder feature to suggest chat lines.


Oh, I can imagine a few:

<input> "If I told you your body is hot, would you hold it against me?" <input>

<output> It was hot and the body of a young woman was lying in a bloody hell. Hell was hot and was full of beautiful young women. The body was lying in the entrace of the lobby and there was a small crowd gathering. it was a hot day in hell <output>.

Could work on the right person though.


Who would benefit?


Sampling realistic text from large pretrained models is non-trivial. I came across this paper in one of the ACL 2019 workshops:

https://arxiv.org/pdf/1904.09751.pdf


Sorry for asking but is there an example output and an example input?


Not the full model, but there are some pretty funny examples on https://www.reddit.com/r/SubSimulatorGPT2/

It's a subreddit which consists entirely of posts and comments by GPT-2 bots (with votes by humans). There's a variety of different bots fine-tuned on posts and comments from different subreddits, so depending on which bot is posting you can get wildly different results.


You can interact with the full model here: https://talktotransformer.com/


Huh, he updated that to the full model quickly.


You can check this interactive autocompletion demo (you provide the input), that I believe is based on the 774M model: https://transformer.huggingface.co/doc/gpt2-large



The original GPT-2 blog post has famous demos of the 1.5B model: https://openai.com/blog/better-language-models/


I created an Instagram account from a Google Colab instance.

https://medium.com/@jboren_20569/tensorrandos-gpt-2-musings-...


Tangent rant.

I'm skimming over some of the code at https://github.com/openai/gpt-2/blob/master/src/model.py and I can't help but feel frustrated at how unreadable this stuff is.

1. Why is it acceptable to have single-letter variable names everywhere?

2. There's little to almost no documentation in the code itself. It's unclear what the parameters of any given function mean.

3. There are magic constants everywhere.

4. Function names are so terse... ("gelu", "attn")


The notation in the code will be very familiar to anyone comfortable with the underlying research and math. The "conceptual" documentation is in the literature.

What you're asking for is the rough equivalent of asking a C programmer to name their loop variables "index" instead of "i." Everyone familiar with the concepts of C programming knows what "i" means in the context of a for loop. Similarly, everyone familiar with transformers knows what "gelu" and "attn" mean.


This isn’t a good comparison. “i” is used domain independently across an entire language, not in some other domain. In fact, it’s used across the entirety of computer science (and originated in maths), so it’s across an entire discipline and even inter-disciplinary.

They should use proper variable names if they want to have the code understood by anyone non-specialist, and by people who use different terminology, and people looking back at the code in the future when terminology may have changed.

I don’t know about this domain, but the single-letter-variable name etc AKA “match the equation” is a curse when non-CS engineers/scientists write code. It often breaks code conventions, leaving IDEs to light up like a Christmas tree when opening the source. There’s a good reason CS moved from register letters to something closer to natural language.


>I don’t know about this domain

So then why don't you believe me when I tell you that all of these variable names are extremely standard, and will be familiar to anyone who has written deep learning code before?


I feel like you both have good points. Yes, a lot of the variables are very ML specific and often called that way. However, I feel like that encourages the same researchers (who are obviously not software engineers) to give the rest of their variables sub-par names as well. Why would you give any variable a name longer than a word even, if so many you regularly encounter are just `w`, `u`, `x`, `hparam` ... and so on.

I'm a software engineer with a background in ML, so even though I somewhat know the domain language I still get mad at the blatant disrespect for PEP-8. That being said, this one is definitely one of the better codebases I have come across. This feels like it could be fairly easily worked with and understood. I have seen far, far worse code to go along research papers.


My professional observation (as ml researcher at big tech):

These companies hire a lot of engineers straight out of undergrad/master's degrees. The interviews test leetcode knowledge, and today lots of degrees are heavy on Python-scripted ML homework.

The result is companies with billion dollar funding and world-changing goals having a lot of their code look like complete spaghetti.

And this is the engineers who are meant to clean up research scientist code. Scientists generally don't feel like it's their responsibility to write strong code.

Systems-side teams/orgs have better code, but essentially as soon as you enter the 'ml engineer/research engineer/research scientist' layer, it's doomed.


(I work at OpenAI. Before that, I worked at Stripe. I've spent most of my software career thinking about how to build effective engineering cultures.)

I think this code is actually well-written and maintainable. This is proven in practice because we've adopted it many places in OpenAI, and I've personally found it very easy to adapt to other use-cases (certainly much more so than the from-scratch Transformer implementations I've written!).

As https://news.ycombinator.com/item?id=21456605 points out, the complexity of the code arises from the complexity of the underlying algorithm. Complexity due to software engineering concerns, like Tensorflow scopes, are elegantly handled. [edited for clarity:] Writing a Transformer in 174 lines of code requires a lot of deep thinking about the right underlying abstractions.

> but essentially as soon as you enter the 'ml engineer/research engineer/research scientist' layer, it's doomed.

We actually don't do this! Our only official technical title is "member of technical staff". (People sometimes choose to self-identify as an engineer or researcher, so you might see that on LinkedIn, but we don't have a distinction internally.) Everyone is responsible for their own code, and people care quite a bit about writing code that others can build on.


Ok, since you took the time to respond, I just want to be constructive as well:

So I don't have a big problem with some of the function definitions which can be compact, as the other comment points out.

The reason I don't like this code is that it does not comment anything on the critical bits. I don't necessarily care about whether you call the input to your matmul 'x' or 'tensor' or 'input' (although consistency is nice).

The thing that would stop me from absorbing and modifying this code is that it does not comment on all the bits that are non-obvious to me if I haven't written a Transformer before. For example:

'Same as tf.matrix_band_part(tf.ones([nd, ns]), -1, ns-nd), but doesn't produce garbage on TPUs.' - I will have to ask the colleague what that means. Why not write out what the actual issue is instead of mysteriously hinting at some potential problem?

Code like this "q, k, v = map(split_heads, tf.split(c, 3, axis=2))" will require me to re-read the paper section, then print out all the tensors to think about which tensor has which shape at which point. Instead of writing relatively useless line comments like '#Transformer', I would comment all non-trivial shape modifications with the current layout, and what we are trying to achieve while modifying the layout.
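
Concretely, the kind of shape comment I mean (shapes written in terms of the hparams in that file; this only annotates the existing line):

    # c: [batch, sequence, 3 * n_embd]   -- output of the fused q/k/v projection
    # tf.split(c, 3, axis=2)             -- three [batch, sequence, n_embd] tensors
    # split_heads(...)                   -- reshaped/transposed to
    #                                       [batch, n_head, sequence, n_embd // n_head]
    q, k, v = map(split_heads, tf.split(c, 3, axis=2))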

The other issue of my original comment was not specifically on that codebase, but I am sure you would admit that the baselines code was pretty much exactly what I was writing about re: ml scripts. That's not to denigrate its incredible usefulness to the community.

Since you mentioned spinning up, I thought I would add a few comments on that as well:

I think the Spinning Up code base is good at making the code compact, and terrible at making sense of data flow for beginners. There are a lot of line comments, but they do not actually explain what is conceptually going on; they often just repeat shorthands.

For example, look at the PPO implementation: https://github.com/openai/spinningup/blob/master/spinup/algo...

Here, the function is returning pi, logp, and log_p_pi (and v). Do you know how incredibly confusing the distinction between these is for beginners? In particular, there is no explanation why logp_pi even needs to be stored in the buffer.

We could recompute it from the states and stop the gradient when computing the likelihood ratio. A sensible tutorial-level comment here may be something along the lines of computing the likelihood in the same forward pass as computing the action, so we can later use it to compute the likelihood ratio. We could also later re-compute this from the buffered states.

I will stop here, but I hope my point comes across: whenever I read code from your repos, there are some good parts (conciseness, cute numerical tricks), but there is a general lack of thoughtfulness about what the code is really trying to convey to a reader. It shows in the comments and it shows in the code organisation.

As a final note, I have seen this in many organisations and I do not mean to call you out. There is just this quality degradation that inevitably happens when nobody is incentivised (read: promoted, rewarded) to think about these things for an organisation.

Managers at all levels typically don't, because they don't get close enough to the subtle issues on a day-to-day level. If you are lucky, you get senior individual contributors who still look at code and raise the bar for the entire org. My genuine recommendation to you is to look for that, because a manager won't do it, and more fresh grads can't.


Hello! Spinning Up author here.

Very reasonable point that it is not clearly explained why you need to store logp_pi in the buffer. But the reason is that it would require additional code complexity to calculate it on the fly later. The likelihood ratio requires the denominator to be on the _old_ policy, so if you wanted to compute it on the fly, you would need to have a second policy in the computation graph to preserve the old policy while you change the current policy. You could not simply do a stop_gradient on the current policy and get the same results.

My personal feeling is that tutorial-style explanations like this don't fit nicely into code comment flow. As a result, most tutorial-style descriptions went into material on the Spinning Up website rather than into the code. It isn't 100% comprehensive, certainly, but RL has an enormous surface area (there are tons and tons of little details that teaching material could dive into) and I feel pretty good about what we were able to cover. :)


Thank you for responding. Well, my point is that the gradient on the likelihood ratio in particular is what trips people up. They ask questions like 'why is this ratio not always 1?' or similar. This is why I would say explaining what goes where here is critical, i.e. that we save the prior logp_pi (even though we could recompute it) to treat it as a constant value when computing the ratio/the gradient. That would be, from my perspective, the key pedagogical moment of a PPO tutorial. However, this is purely subjective and I agree that one can feel differently about where to put explanations.
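
For what it's worth, the point in question in toy form (hypothetical numbers, PyTorch-style, not the Spinning Up code):

    import torch

    eps = 0.2
    adv = torch.tensor([1.0, -0.5, 2.0])
    # logp_old was computed by the policy that actually collected the data and
    # is stored in the buffer, so here it is plain data outside the graph.
    logp_old = torch.tensor([-1.2, -0.8, -1.5])
    # logp_new is recomputed under the current (changing) policy at each update.
    logp_new = torch.tensor([-1.0, -0.9, -1.4], requires_grad=True)

    ratio = torch.exp(logp_new - logp_old)        # pi_new(a|s) / pi_old(a|s)
    loss = -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()
    loss.backward()                               # gradients flow only via logp_new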


I’m very sorry to see someone who obviously cares so much to be defending this code. This does not follow best practices, and using complexity of the underlying algorithm is just an excuse. Complex code can be beautiful and well documented.

Writing a complex method in 174 lines is not elegant nor beautiful. Writing a well documented file that can take an engineer in a different specialty and bring them up to speed in 1,000 lines is.


No matter how much you comment your code, you are not going to bring people up to speed on an algorithm that requires background knowledge on dozens of scientific papers, hundreds of pages, with a few code comments. This code is aimed at researchers who are familiar with the techniques and have the necessary background knowledge. For such people, the code is very readable.

Think about it like this: If you write a game engine, are you going to document each function with proofs that explain how the underlying Physics works for people who don't have such knowledge? No, you assume that people who read game engine code have read some basic physics books.


We also have code like that. For example, that's the explicit goal of the Spinning Up repo: https://github.com/openai/spinningup/blob/master/spinup/algo...

In practice, it's much harder to use that code, and we tend not to consume code like that internally. There's a real tradeoff!


ddpg() takes 17 parameters and is over 200 lines long. I'm very far from being a domain expert, but having worked in other complex domains, I'm pretty confident this can be redesigned such that it's both more maintainable and more pleasant to use.


Hello! Spinning Up author here. I would love to hear your thoughts on this! So far I have had a lot of success teaching people about DDPG using this code example, but I'm grateful for every fresh perspective. :)

Feel free to reach out by email, jachiam at openai.


There is no function in the world that should ever take 17 parameters. If the algorithm permits such configuration, as I am sure it does, then it should take a configuration object which has all these values. The object could then be constructed using special purpose factories that take fewer parameters, and then customized from there as needed.

It may be an indication that the whole thing needs refactoring though.
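Something along these lines, as a rough sketch (hypothetical field names, not a proposal for the actual Spinning Up API):

  from dataclasses import dataclass

  @dataclass
  class DDPGConfig:
      # a handful of the knobs grouped in one place; defaults give a working baseline
      gamma: float = 0.99
      polyak: float = 0.995
      pi_lr: float = 1e-3
      q_lr: float = 1e-3
      batch_size: int = 100
      start_steps: int = 10000

  def ddpg(env_fn, actor_critic, config=None):
      config = config or DDPGConfig()
      ...  # algorithm body unchanged; callers override only the fields they care about

Callers would then write something like ddpg(make_env, mlp_actor_critic, DDPGConfig(gamma=0.999, batch_size=256)) and leave everything else at its default.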


You can refactor it that way, but then you make it unnecessarily more complicated.


Well then I’m sorry for that. It’s a good indicator of a broken culture.


I don't know why you excluded PhDs, but their code isn't necessarily any better. Hiring straight from school doesn't necessarily mean bad software engineering skills. But research code is generally either written hastily during exploration or written by people without much of a software engineering background.

Many research scientists I've seen at large tech companies don't have a good CS background and write code that's unreadable, unscalable, unmaintainable, or buggy at times. They often have good ideas but don't do much software design.

Having been on both infrastructure teams and research teams, I've seen certain individuals in research orgs, joining straight from school, who think they are responsible for coming up with the new and sexy thing while other engineers are responsible for running it in production. It's like computing eigenvectors in Matlab or Python over a toy dataset and thinking you've done the bulk of the work of computing PageRank in production and should receive all the credit for a full search engine.

That attitude is a red flag to me.


I don't see how what you are saying contradicts me. The point is that the scientists generally don't view it as their job, whereas the ML engineers/research engineers typically do have the software architecture/engineering side as their job.


Yeah, I'm waiting for the backlash when companies realize this.


I actually disagree with you here. I don't think the code is unreadable, it follows standard notation used in Machine Learning. If you read scientific papers you will notice that e.g. variable names are the same as those used in mathematical formulas that everyone in the field is familiar with. The same goes for parameters, function names, and so on. They are standard notation/naming and only look confusing to people outside of the ML field. Giving them long uncommon names would actually be more confusing.

As someone with experience in ML research I think this code is quite well written compared to what you typically see (a single function with hundreds of lines and dozens of if statements). I can immediately see what any of the functions does, and I haven't even read the paper.


It's a specification of essentially a complex graph of mathematical operations. If there's a function called

  def mult(a,b): return a*b
it's not much more informative to write:

  def mult(activation_a, activation_b): return activation_a*activation_b
Many of these functions are not much more complex than that, and the names along with their comments are more than sufficient given familiarity with the literature. If you think familiarity with the literature is unreasonable, it's still not clear what could improve code like this in reasonable space. "This is a linear function, which means that it satisfies f(x+a)=f(x)+f(a)"? "This is the attention head, it acts as a mask on the sequence input"? It would be like complaining that someone made a tree class and didn't put a comment explaining what a leaf node is. Code readability always assumes some reader context and minimum pre-existing knowledge (as do all forms of technical communication).


When math is involved, it's much easier to read code with short variable and function names.


I understand what you mean, but please understand that this code is targeted at people who would at least have some background knowledge, like having read the seminal Transformer paper, "Attention Is All You Need": https://arxiv.org/abs/1706.03762

Most of the code becomes really straightforward once you have. A lot of the magic constants are the result of multi-page derivations (like the GELU constant) that would be impractical to reproduce in the code.
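(If I remember right, the constant in question is the one in the tanh approximation of GELU from Hendrycks & Gimpel; a NumPy sketch, not the GPT-2 source itself:

  import numpy as np

  def gelu(x):
      # tanh approximation of the Gaussian Error Linear Unit; sqrt(2/pi) and
      # 0.044715 fall out of the approximation's derivation, not out of the code
      return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

You can stare at that 0.044715 forever and never guess where it comes from; that lives in the paper.)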

Deep learning research really is a field that requires some amount of background knowledge, and it's normal that you don't automatically understand state-of-the-art code. Here is the GPT-2 paper: https://d4mucfpksywv.cloudfront.net/better-language-models/l...


In my experience, this is the norm in the ML scene. Giant globs of unreadable and in no way understandable code -- unless of course you already understand everything.

I think this is because the "product" so to speak is often the papers themselves, not the code, but I'm not sure.


Could these functions just be implementations of math with matching variable names?


Because a lot of it is meant to correspond to math equations, so variable names like w, u, v, b, g match the equations in the papers? I actually think it's pretty readable, as long as you know what it is supposed to implement (I don't, but I imagine they are implementing a complex graph), and short names help you figure out where things go in and out in one screenful.

Complex graphs are literally spaghetti diagrams of arrows, and this format is actually pretty readable (even though in PyTorch it would be more readable). I guess they leave comments out because it's not really possible to understand each line on its own (unless it's an implementation detail); you have to read the paper to know what's going on.
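As a toy illustration (mine, not taken from the GPT-2 source), short paper-style names keep a whole step of the graph visible at once:

  import numpy as np

  def attn(q, k, v):
      # scaled dot-product attention; q, k, v are [seq, d] matrices,
      # names as in "Attention Is All You Need"
      s = q @ k.T / np.sqrt(q.shape[-1])
      w = np.exp(s - s.max(axis=-1, keepdims=True))
      w /= w.sum(axis=-1, keepdims=True)       # row-wise softmax over the scores
      return w @ v

Renaming q/k/v to query_activations and the like would mostly just push half of it off screen.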


I agree, a lot of the code could be improved. But some of what you mentioned is fairly standard, like Gaussian Error Linear Units being GELU, w/b for weights/biases, etc.


Not sure how standard that is ...


Those are very standard ML abbreviations.


Mirrors my thoughts regarding all math textbooks and published papers.

I remember reading that a famous scientist (Newton, maybe) published a really accessible book on a subject, which was read by lots of laypersons and opened him up to lots of unwanted public attention.

So publishing in a more inscrutable way might be a way of ensuring peer-to-peer communication.

Either that, or it's a labor of love where cleaning things up would detract from the forward momentum.


Having worked with math/research folks in the past, I don't find this surprising. That said, from a software engineering perspective it is immediately noticeable, and a typical code review would flag it.



