PHP code generated by GPT-2 (gist.github.com)
96 points by monort on Feb 26, 2019 | 53 comments



Still better than Wordpress core.


I especially like how the comments have absolutely nothing in common with the code they are supposedly commenting, just like in real life.


And almost-real naming like 'DbAppAndFNAAppRegistrationService'.


Looking forward to com.ai.gpt-2.autogen.java.lang.factory.thing.verb


Really looking forward to some iOS code, with (real) identifiers like:

outputImageProviderFromBufferWithPixelFormat:pixelsWide:pixelsHigh:baseAddress:bytesPerRow:releaseCallback:releaseContext:colorSpace:shouldColorMatch:

kCMSampleBufferConduitNotificationParameter_UpcomingOutputPTSRangeMayOverlapQueuedOutputPTSRange


That put a smile on my face on a bad day. Thank you! :D



>Anyway, I expect that what I just did – fixing up slightly broken bot-generated code – will be the most common occupation within 10 years

Imagine debugging code all day, every day, for life. Oh wait...


"Accidentally picked up" Somehow they must have fed it with some JavaScript and some php, right ?


Yeah, the dataset was 40GB of text from pages linked from Reddit, so I imagine it was quite hard to clean it to just English text. They also noted in their paper that it "accidentally" learned to translate English into French, even though they removed non-English web pages, because of examples like

"I’m not the cleverest man in the world, but like they say in French: Je ne suis pas un imbecile [I’m not a fool]."


Yes. But that's the sort of hype that seems to be desired. "OMG zombies, we accidentally put some bad samples in the training data"


With sloppy scraping, I guess JS was picked up along with the text.

PHP, on the other hand, would have to be included as training material intentionally.


Sloppy scraping?

You forget that HTML and JS are perfectly valid things to find in a .php file.


Yes, of course. But JS is mostly frontend, so I can imagine it's easier to accidentally scrape some JS along with the text.

PHP, not so much; only if the server returns the source by accident.


What if the server is GitHub? Or some random blog about PHP development? There are lots of situations where it's very intentional that PHP is contained in HTML.


Any context on what this is supposed to be...? I can vaguely read PHP, but the code does not appear to be doing anything of much substance.

At first I thought it was something to do with a second revision of GUID Partition Tables.


It’s a text generation algorithm. It’s not meant to be real code, just look like it. This is the infamous “too dangerous to release” GPT-2 making this code.


Reminds me somewhat of https://en.wikipedia.org/wiki/Article_spinning

...and now I hope that searching for code in the future won't become polluted by such spam.


I think everyone has written a Markov Chain generator and fed it The Bible plus a random flavor text. Right?
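
(For anyone who hasn't: the whole trick fits in a dozen lines of Python. A minimal word-level sketch, with the corpus path as a placeholder:)

    import random
    from collections import defaultdict

    # Build a word-level Markov chain: map each word to the words seen after it.
    words = open("bible.txt").read().split()  # placeholder corpus
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)

    # Random-walk the chain; fall back to a random word at dead ends.
    word = random.choice(words)
    out = [word]
    for _ in range(50):
        word = random.choice(chain[word] or words)
        out.append(word)
    print(" ".join(out))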


The flavour text was always jerkcity.txt in every Markov bot I've ever come across.



I have been working with a group that is trying to clone this dataset and make it publicly available (https://github.com/jcpeterson/openwebtext), and I have noticed quite a bit of code in the scraped dataset. Future releases of our dataset will be pre-filtered with another LSTM language model, keeping sentences based on their probability under more conversational/literary datasets.
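
Roughly, that filtering pass looks like the sketch below (lm.score, the threshold, and the variable names are stand-ins here, not our actual pipeline):

    def keep_sentence(sentence, lm, threshold=-4.0):
        # Keep a sentence only if its average per-token log-probability under
        # the conversational/literary language model is high enough, i.e. it
        # reads like natural English rather than code or markup.
        tokens = sentence.split()
        if not tokens:
            return False
        return lm.score(tokens) / len(tokens) > threshold  # hypothetical scorer

    cleaned = [s for s in scraped_sentences if keep_sentence(s, lstm_lm)]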


It will be interesting when AI finds out there's no need for her to generate human-readable code.


With some automated formatting: https://pastebin.com/7F2Leqy1


What on earth did you use to format that?

This will be far more readable to PHP devs: https://gist.github.com/navarr/a20284c0533ea6f6ebc0946d62c96...


Until GPT-2 can participate in a formatting holy war, all our jobs are secure. It's time to get worried when it starts posting opinionated comments on the internet about how "spaces make my code look the same on everyone's machine"; that's when it'd be a good idea to invest in a bunker.


Just what I didn't need to see this morning.

I am literally living in the streets, freezing my ass off and hungry, looking for any kind of programming work for the past month, and now I have to see some AI bot generating more inexpensive shit code that I am sure some manager will convince themselves might get them that final career promotion by lowering their labor costs to near zero.

WTG, geniuses, for developing AI that, before you know it, will have all of us living in the streets and hungry...

I'll save you a spot.


I see you’re proficient in a LAMP stack?

Gingrapp.com is looking for a developer. Our parent company owns many other companies as well and is always looking for developers.

My personal email is on my profile. Please feel free to reach out! I’ve helped find work for other devs down on their luck here on HN successfully.


Where do you live? There's plenty of jobs in my area and my company is also hiring. I would be happy to help you land something.


People are downvoting your crassness, but I sympathize with your situation.

I don't know what Las Vegas is like, but there is a lot of LAMP/WordPress work here in Toronto. (Frankly, I want nothing to do with it, but there's plenty of it and they pay alright. Some also offer PT or FT remote)

---

The company I'm with is even hiring for such a role: https://twitter.com/scarbiedoll/status/1095714031023714305

---

Best of luck


I can sympathize with your situation but I have to ask: where are your friends/family in all this? You don’t have any support system at all? I realize it’s possible but I want to understand how you ended up in your present situation.

My next comment is: you are in the wrong industry if something like this scares you. You have the wrong attitude. Instead of lamenting a new tech replacing your current skills, you should be asking yourself: how can I learn this new technology and put it to work for me?

Some people may say someone in your position has more important things to worry about and I would agree. Get yourself the first job you can find (tech related or not) and get your basic needs in order. Then invest your time in learning a tech with some staying power.

Jumping between short-lived, volatile coding jobs isn't a long-term solution.


Welcome to the life of all the non-engineers out there, either up to their eyeballs in debt or otherwise unable to earn a living wage. How many secretaries, data entry workers, or webmasters were made obsolete because you were paid to destroy their jobs?

However, you don't have to worry, free open source tools and off the shelf B2B software will make your job obsolete long before AI is actually a time saver when writing code.


They're going to have to make PHP that compiles before we lose our jobs...


PHP is interpreted o.o


This is very fishy. You can get code like this by substituting words in identifier names for other words, but how can an algorithm trained on an English dataset "learn" that keywords like 'function' and 'class' are exempt from substitution? I know most people here have unwavering faith in the magic of deep neural networks, but you'd need _a lot_ of examples to deduce this with any certainty, regardless of how you do it.
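
To make the objection concrete: the word-substitution story amounts to something like this toy Python sketch, and note that it only works because the keyword list is hard-coded up front, which is exactly the knowledge in question (keyword list and word pool made up):

    import random
    import re

    PHP_KEYWORDS = {"function", "class", "public", "private", "return", "new", "echo"}
    WORD_POOL = ["App", "Db", "Pipe", "Handler", "Registration", "Service"]

    def respin(code):
        # Swap the words inside identifiers but leave keywords untouched;
        # the catch is needing the keyword list in the first place.
        def swap(m):
            tok = m.group(0)
            if tok.lower() in PHP_KEYWORDS:
                return tok
            parts = re.findall(r"[A-Z][a-z]+|[a-z]+", tok)
            return "".join(random.choice(WORD_POOL) for _ in parts) or tok
        return re.sub(r"[A-Za-z_]\w+", swap, code)

    print(respin("public function registerPipeHandlerInterceptor"))
    # e.g. "public function DbServiceAppRegistration"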


> you'd need _a lot_ of examples to deduce this with any certainty

Are you saying that "a lot" isn't almost semi-trivial to obtain, seeing how much code is available online?


Why would a model trained on English texts see "a lot" of PHP code? What was the prompt used for generating this code?


It was trained on the contents of links found on Reddit, wasn't it? Links to sample code or Stack Overflow posts could be pretty prevalent.


So you're buying the idea that it looked at a bunch of code snippets embedded at various pages, managed to build a sub-model for PHP (separate from all other languages it should have encountered) and managed to generate a long, nearly syntactically correct program uninterrupted by English text?

And while it makes tons of obvious mistakes in English (which is a much more flexible and forgiving language), its PHP is somehow nearly syntactically perfect?

-

Examples from GPT-2 GitHub have a lot of code:

https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-...

To me, this doesn't seem like an argument in favor of this model "understanding" English (or C, or PHP). It seems more like an indication that it memorizes way more information than the paper implies and then does clever word substitution.


Yes, I do think that it learned a model of PHP and JavaScript syntax. 40GB of text data is a lot, and PHP syntax is a lot simpler than English grammar, which it learns quite well.

See also the example in the paper of accidentally learning to translate into French even though they tried to remove French pages from the corpus.


What is fishy? This isn't even state of the art for program synthesis. Here's another simple example trained on the Linux kernel: http://karpathy.github.io/2015/05/21/rnn-effectiveness/

I'm not sure what point you're trying to make. Do you think a neural net is not capable of generating the code in the gist? Because it's pretty easy to do that. The harder part that we're still trying to figure out is getting that code to do something meaningful.


Because there is a huge difference between generating C code by training on C code and generating PHP code by training on random internet articles.


Did you read the GPT-2 paper? Frankly, the English examples therein are much more impressive than this, and this certainly seems within the realm of possibility for GPT-2, given some of the other emergent behavior of the model (e.g., inadvertent French translation skills).


Motivated by this post, I decided to test it out. It's impressive how powerful the software is even with the limitations. I made a simple tutorial on how to test GPT-2 out for yourself at https://medium.com/heroic-com/how-to-quickly-generate-full-a...
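
In short, you can sample the released small model straight from the openai/gpt-2 repo with something like the following (script names and flags are from memory of the repo's README, so double-check there):

    git clone https://github.com/openai/gpt-2 && cd gpt-2
    pip3 install -r requirements.txt
    python3 download_model.py 117M
    python3 src/generate_unconditional_samples.py --top_k 40 | tee samples.txt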


Important note: The AI did not generate that exact version of the code. It was almost syntactically correct. Here's the diff:

https://gist.github.com/moyix/dda9c3180198fcb68ad64c3e6bc7af...


Also, the original in https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-... has two more lines.

The first

// web/application/handlers/add-full-no-app-directly.md (1071 bytes)

and the last

public function registerPipeHandlerInterceptor


Can someone shed more light behind this? What is the true source? Was it generated via the unreleased full model by an OpenAI employee? Or did someone generate it with the released "smaller model"? Can we, the curious public, see the model and replicate the results?


I found it while browsing the "raw" samples generated by the full-sized model. You can read through them all here:

https://raw.githubusercontent.com/openai/gpt-2/master/gpt-2-...

The PHP sample is Sample 195.


This makes me think that something like Stack Overflow could be used to train a model that generates code to answer a question—and that software specifications that are decomposed into a series of requirements or "questions" could be fed into this model to produce code that's equivalent to a team of remote contractors.

Your model would be based on NLP/votes of the questions, NLP/votes of the answers, and separating the text from the code in both.

The fact that many markdown/code formatting tools have you select the language for syntax highlighting is useful for classifying code as well.
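
For the "separating the text from the code" part, a first pass is nearly free whenever a post uses fenced code blocks, since the language tag sits right on the fence. A Python sketch:

    import re

    FENCE = re.compile(r"```(\w+)\n(.*?)```", re.DOTALL)

    def split_post(markdown):
        # Pull out fenced code blocks keyed by their declared language,
        # and return the remaining prose separately.
        code = [(m.group(1), m.group(2)) for m in FENCE.finditer(markdown)]
        prose = FENCE.sub("", markdown)
        return prose, code

    prose, code = split_post("Sort it like this:\n```php\nsort($a);\n```")
    # code == [("php", "sort($a);\n")]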


Finally, StackSort (https://xkcd.com/1185/) could actually be useful in a real world application.


My god—the JOBINTERVIEWQUICKSORT is just brilliant.


Look ma, no Laravel


BTW, I would like to point you to MIT's Genesis project as an example of what a rule-based text comprehension system could do almost a decade ago.



