PHP code generated by GPT-2

ndpsr · on Feb 26, 2019

Still better than WordPress core.

lmilcin · on Feb 26, 2019

I especially like how the comments have absolutely nothing in common with the code they are supposedly commenting, just like in real life.

smhg · on Feb 26, 2019

And almost-real naming like 'DbAppAndFNAAppRegistrationService'.

tyingq · on Feb 26, 2019

Looking forward to com.ai.gpt-2.autogen.java.lang.factory.thing.verb

sli · on Feb 26, 2019

Really looking forward to some iOS code, with (real) identifiers like:

outputImageProviderFromBufferWithPixelFormat:pixelsWide:pixelsHigh:baseAddress:bytesPerRow:releaseCallback:releaseContext:colorSpace:shouldColorMatch:

kCMSampleBufferConduitNotificationParameter_UpcomingOutputPTSRangeMayOverlapQueuedOutputPTSRange

ahje · on Feb 26, 2019

That put a smile on my face on a bad day. Thank you! :D

tyingq · on Feb 26, 2019

The tweets that go with this: https://twitter.com/moyix/status/1096255984866082816

cheeko1234 · on Feb 26, 2019

>Anyway, I expect that what I just did – fixing up slightly broken bot-generated code – will be the most common occupation within 10 years

Imagine debugging code all day, every day, for life. Oh wait...

fffrantz · on Feb 26, 2019

"Accidentally picked up" Somehow they must have fed it with some JavaScript and some php, right ?

moyix · on Feb 27, 2019

Yeah, the dataset was 40GB of text from pages linked from Reddit, so I imagine it was quite hard to clean it to just English text. They also noted in their paper that it "accidentally" learned to translate English into French, even though they removed non-English web pages, because of examples like

"I’m not the cleverest man in the world, but like they say in French: Je ne suis pas un imbecile [I’m not a fool]."

tyingq · on Feb 26, 2019

Yes. But that's the sort of hype that seems to be desired. "OMG zombies, we accidentally put some bad samples in the training data"

tiborsaas · on Feb 26, 2019

With sloppy scraping I guess JS was picked up besides text.

PHP should be intentionally used as training material.

lugg · on Feb 26, 2019

Sloppy scraping?

You forget that HTML and JS are perfectly valid syntax to find in a .php file.

tiborsaas · on Feb 26, 2019

Yes, of course. But JS is frontend mostly, so I can imagine it's easier to accidentally scrape some JS with text.

PHP, not so much, only if the server returns source by accident.

yorwba · on Feb 28, 2019

What if the server is GitHub? Or some random blog about PHP development? There are lots of situations where it's very intentional that PHP is contained in HTML.

userbinator · on Feb 26, 2019

Any context on what this is supposed to be...? I can vaguely read PHP, but the code does not appear to be doing anything of much substance.

At first I thought it was something to do with a second revision of GUID Partition Tables.

bitexploder · on Feb 26, 2019

It’s a text generation algorithm. It’s not meant to be real code, just look like it. This is the infamous “too dangerous to release” GPT-2 making this code.

userbinator · on Feb 26, 2019

Reminds me somewhat of https://en.wikipedia.org/wiki/Article_spinning

...and now I hope that searching for code in the future won't become polluted by such spam.

bitexploder · on Feb 26, 2019

I think everyone has written a Markov Chain generator and fed it The Bible plus a random flavor text. Right?
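
For anyone who hasn't, here is a minimal sketch of that kind of word-level Markov chain generator. The function names (`build_chain`, `generate`) and the `order`, `length`, and `seed` parameters are made up for illustration, and the corpus is whatever text you feed it.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting state, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is reproducible
    order = len(state)
    out = list(state)
    for _ in range(length - order):
        followers = chain.get(tuple(out[-order:]))
        if not followers:  # dead end: this state was never followed by anything
            break
        out.append(rng.choice(followers))
    return " ".join(out)
```

Concatenate two corpora into one string before calling `build_chain` and the output will drift between both styles, which is the whole joke.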

corobo · on Feb 26, 2019

The flavour text was always jerkcity.txt in every markov bot I've ever come across

rahimnathwani · on Feb 26, 2019

Context: https://blog.openai.com/better-language-models/

glup · on Feb 26, 2019

I have been working with a group that is trying to clone this dataset and make it publicly available (https://github.com/jcpeterson/openwebtext), and I have noticed quite a bit of code in the scraped dataset. Future releases of our dataset will be pre-filtered with another LSTM language model that will filter sentences by their probability under more conversational / literary datasets.
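
A rough sketch of the filtering idea, not the project's actual code: score each sentence by its average per-token log-probability under a model trained on reference prose, and keep only sentences above a threshold. To stay self-contained, a smoothed unigram model stands in for the LSTM here; the function names and the threshold value are assumptions for illustration.

```python
import math
from collections import Counter


def train_unigram(reference_sentences):
    """Return a log-probability function from add-one-smoothed unigram counts.

    Stand-in for a real language model (e.g. an LSTM); illustration only.
    """
    counts = Counter(w for s in reference_sentences for w in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(token):
        return math.log((counts[token] + 1) / (total + vocab))

    return logprob


def avg_logprob(sentence, logprob):
    """Average per-token log-probability, so long sentences are not penalized."""
    tokens = sentence.lower().split()
    if not tokens:
        return float("-inf")
    return sum(logprob(t) for t in tokens) / len(tokens)


def filter_sentences(sentences, logprob, threshold):
    """Keep sentences the reference model finds sufficiently likely."""
    return [s for s in sentences if avg_logprob(s, logprob) >= threshold]
```

Code-like lines are mostly tokens the prose model has never seen, so they score low and get dropped. The threshold would have to be tuned on held-out data.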

pamparosendo · on Feb 26, 2019

It will be interesting when AI finds out there's no need for her to generate human-readable code.

aboutruby · on Feb 26, 2019

With some automated formatting: https://pastebin.com/7F2Leqy1

Navarr · on Feb 26, 2019

What on earth did you use to format that?

This will be far more readable to PHP devs: https://gist.github.com/navarr/a20284c0533ea6f6ebc0946d62c96...

munk-a · on Feb 26, 2019

Until GPT-2 can participate in a formatting holy war, all our jobs are secure. It's time to get worried when it starts posting opinionated comments on the internet about "how spaces make my code look the same on everyone's machine". That's when it'd be a good idea to invest in a bunker.

cubano · on Feb 26, 2019

Just what I didn't need to see this morning.

I am literally living in the streets, freezing my ass off and hungry, looking for any kind of programming work for the past month, and now I have to see some AI bot generating more inexpensive shit code that I am sure some manager will convince themselves might get them that final career promotion by lowering their labor costs to near zero.

WTG, geniuses, for developing AI that before you know it will have all of us living in the streets and hungry...

I'll save you a spot.

leesalminen · on Feb 26, 2019

I see you’re proficient in a LAMP stack?

Gingrapp.com is looking for a developer. Our parent company owns many other companies as well and is always looking for developers.

My personal email is on my profile. Please feel free to reach out! I’ve helped find work for other devs down on their luck here on HN successfully.

openbasic · on Feb 26, 2019

Where do you live? There's plenty of jobs in my area and my company is also hiring. I would be happy to help you land something.

52-6F-62 · on Feb 26, 2019

People are downvoting your crassness, but I sympathize with your situation.

I don't know what Las Vegas is like, but there is a lot of LAMP/WordPress work here in Toronto. (Frankly, I want nothing to do with it, but there's plenty of it and they pay alright. Some also offer PT or FT remote)

---

The company I'm with is even hiring for such a role: https://twitter.com/scarbiedoll/status/1095714031023714305

---

Best of luck

iamleppert · on Feb 26, 2019

I can sympathize with your situation but I have to ask: where are your friends/family in all this? You don’t have any support system at all? I realize it’s possible but I want to understand how you ended up in your present situation.

My next comment is you are in the wrong industry if something like this scares you. You have the wrong attitude. Instead of lamenting about a new tech replacing your current skills, you should be asking yourself, how can I learn this new technology and put it to work for me?

Some people may say someone in your position has more important things to worry about and I would agree. Get yourself the first job you can find (tech related or not) and get your basic needs in order. Then invest your time in learning a tech with some staying power.

Jumping between short-lived and volatile coding jobs isn’t a long-term solution.

root_axis · on Feb 26, 2019

Welcome to the life of all the non-engineers out there either up to their eyeballs in debt or otherwise unable to earn a living wage. How many secretaries or data entry workers or webmasters were made obsolete because you were paid to destroy their job?

However, you don't have to worry, free open source tools and off the shelf B2B software will make your job obsolete long before AI is actually a time saver when writing code.

PaulHoule · on Feb 26, 2019

They're going to have to make PHP that compiles before we lose our jobs...

jayar95 · on Feb 26, 2019

PHP is interpreted o.o

gambler · on Feb 26, 2019

This is very fishy. You can get code like this by substituting words in identifier names for other words, but how can an algorithm trained on an English dataset "learn" that keywords like 'function' and 'class' are exempt from substitution? I know most people here have unwavering faith in the magic of deep neural networks, but you'd need _a lot_ of examples to deduce this with any certainty, regardless of how you do it.

hnarn · on Feb 26, 2019

> you'd need _a lot_ of examples to deduce this with any certainty

Are you saying that "a lot" isn't almost semi-trivial to obtain, seeing how much code is available online?

gambler · on Feb 26, 2019

Why would a model trained on English texts see "a lot" of PHP code? What was the prompt used for generating this code?

IanCal · on Feb 26, 2019

It was trained on contents of links found on Reddit, wasn't it? Links to sample code or stack overflow posts could be pretty prevalent.

gambler · on Feb 26, 2019

So you're buying the idea that it looked at a bunch of code snippets embedded at various pages, managed to build a sub-model for PHP (separate from all other languages it should have encountered) and managed to generate a long, nearly syntactically correct program uninterrupted by English text?

And while it makes tons of obvious mistakes in English (which is a much more flexible and forgiving language), its PHP is somehow nearly syntactically perfect?

-

Examples from GPT-2 GitHub have a lot of code:

https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-...

To me, this doesn't seem like an argument in favor of this model "understanding" English (or C, or PHP). It seems more like an indication that it memorizes way more information than the paper implies and then does clever word substitution.

moyix · on Feb 27, 2019

Yes, I do think that it learned a model of PHP and JavaScript syntax. 40GB of text data is a lot, and PHP syntax is a lot simpler than English grammar, which it learns quite well.

See also the example in the paper of accidentally learning to translate into French even though they tried to remove French pages from the corpus.

xkapastel · on Feb 26, 2019

What is fishy? This isn't even state of the art for program synthesis. Here's another simple example trained on the Linux kernel: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I'm not sure what point you're trying to make. Do you think a neural net is not capable of generating the code in the gist? Because it's pretty easy to do that. The harder part that we're still trying to figure out is getting that code to do something meaningful.

gambler · on Feb 26, 2019

Because there is a huge difference between generating C code by training on C code and generating PHP code by training on random internet articles.

root_axis · on Feb 26, 2019

Did you read the GPT-2 paper? Frankly, the English examples therein are much more impressive than this, and this certainly seems within the realm of possibility for GPT-2 based on some of the other emergent behavior of the model (e.g. inadvertent French translation skills).

chadbennett · on Feb 26, 2019

Motivated by this post, I decided to test it out. It's impressive how powerful the software is even with the limitations. I made a simple tutorial on how to test GPT-2 out for yourself at https://medium.com/heroic-com/how-to-quickly-generate-full-a...

scrollaway · on Feb 26, 2019

Important note: The AI did not generate that exact version of the code. It was almost syntactically correct. Here's the diff:

https://gist.github.com/moyix/dda9c3180198fcb68ad64c3e6bc7af...

yorwba · on Feb 26, 2019

Also, the original in https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-... has two more lines.

The first

// web/application/handlers/add-full-no-app-directly.md (1071 bytes)

and the last

public function registerPipeHandlerInterceptor

kodablah · on Feb 26, 2019

Can someone shed more light on this? What is the true source? Was it generated via the unreleased full model by an OpenAI employee? Or did someone generate it with the released "smaller model"? Can we, the curious public, see the model and replicate the results?

moyix · on Feb 28, 2019

I found it while browsing the "raw" samples generated by the full-sized model. You can read through them all here:

https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-...

The PHP sample is Sample 195.

beager · on Feb 26, 2019

This makes me think that something like Stack Overflow could be used to train a model that generates code to answer a question—and that software specifications that are decomposed into a series of requirements or "questions" could be fed into this model to produce code that's equivalent to a team of remote contractors.

Your model would be based on NLP/votes of the questions, NLP/votes of the answers, and separating the text from the code in both.

The fact that many markdown/code formatting tools have you select the language for syntax highlighting is useful for classifying code as well.
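
As a minimal sketch of the "separate the text from the code" step, assuming posts arrive as Markdown with fenced, language-tagged code blocks (an assumption; a real Stack Overflow dump may store posts differently), something like this splits a post into prose and `(language, code)` pairs. `split_post` is a hypothetical helper, not part of any existing tool.

```python
import re

TICKS = "`" * 3  # fence marker, built this way to avoid a literal fence in the source

# Opening fence with optional language tag, lazily matched body, closing fence.
FENCE = re.compile(TICKS + r"([\w+-]*)\n(.*?)" + TICKS, re.DOTALL)


def split_post(markdown):
    """Return (prose, [(language_or_None, code), ...]) for one Markdown post."""
    code_blocks = [
        (m.group(1) or None, m.group(2).rstrip("\n"))
        for m in FENCE.finditer(markdown)
    ]
    prose = FENCE.sub("", markdown)
    prose = " ".join(prose.split())  # collapse the whitespace left by removed blocks
    return prose, code_blocks
```

The language tag on each fence then doubles as a free label for classifying the code, and untagged blocks come back with `None` so they can be handled separately.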

scrollaway · on Feb 26, 2019

Finally, StackSort (https://xkcd.com/1185/) could actually be useful in a real world application.

52-6F-62 · on Feb 26, 2019

My god—the JOBINTERVIEWQUICKSORT is just brilliant.

rbrtdrmpc- · on Feb 26, 2019

Look ma, no Laravel

gambler · on Feb 26, 2019

BTW, I would like to point you to MIT's Genesis project as an example of what a rule-based text comprehension system could do almost a decade ago.