Programming with natural language is going to work (2010)

armchairhacker · on May 24, 2023

Remember the phrase "developers spend 10% of the time writing code and 90% of the time debugging?"

Even if it isn't 90%, most developers like writing code more than debugging, so most would prefer to automate the latter.

AI translating natural language into code probably isn't as important as AI generating bug-free code and/or debugging its code. Even GPT-4 struggles with this: sometimes you point out a bug and it works, but sometimes it just can't find the issue and starts hallucinating even more as it gets confused.

Everyone's trying to train GPT models to write code, but maybe we should be training them how to use a debugger. Though its a lot harder to map text generation to debugging...

Also, it's a bit ironic how one way to prevent bugs is using stronger type systems and formal methods. But, AI is particularly bad at formal methods. But maybe with a system like MCTS combined with much faster generation...

blibble · on May 24, 2023

I have worked on one 30 year old system that was written by people that knew what they were doing

it was then maintained by at least 20 different people that had no idea what they were doing

very little of the altered logic makes any sense and 95% of the time consists of trying to reverse engineer what they were trying to do

then fix it up without breaking other parts of the codebase as many of the logic bugs interact with each other and end up cancelling out

with the garbage generated by LLMs... I suspect all development is going to turn into this

pydry · on May 24, 2023

You mean one day rich executives are going to be in desperate need of and highly reliant upon skilled developers who can understand and maintain their mission critical systems?

Sounds awful. I was hoping for the future where we would all be out of a job.

thfuran · on May 24, 2023

I was hoping for the future where I could get paid to work on something slightly less unpleasant than my most byzantine nightmare.

blibble · on May 24, 2023

those changes were probably the most stressful thing I've ever done, with several billion dollars a day reliant on them, with absolutely no rollback possible

we had two new starters and one long term employee quit after being asked to work on that codebase

(and ended up having to do it myself at extremely short notice... worked though)

bryanrasmussen · on May 24, 2023

check the wages for Cobol developers in your area recently?

on edit: https://www.zippia.com/cobol-programmer-jobs/salary/#

pydry · on May 24, 2023

Those aren't FAANG wages but they aren't awful wages for most of the country either.

It also used to pay better, although at this point I'm pretty sure operating COBOL is just a sign that you're unwilling or unable to invest in upgrading your <1990s IT. The remaining companies operating COBOL systems likely aren't even able to pay top dollar.

bryanrasmussen · on May 24, 2023

Ok, well if that was the case then the original description of rich people paying to maintain out of date things would probably fall into the same scenario - although my experience is whenever I have checked COBOL over the years the wages have never been particularly impressive.

on edit: amusingly, there is a relevant front page discussion right now https://news.ycombinator.com/item?id=36055986

pydry · on May 24, 2023

>Ok, well if that was the case then the original description of rich people paying to maintain out of date things would probably fall into the same scenario

Not really. We're comparing a programming language that has been legacy since the 1990s and has been gradually phased out over a period of 30 years and a process that accelerates the creation of legacy code in any language.

COBOL was always going to die eventually. I think, if anything, the fact you can still net over 100k writing it in 2023 kinda proves my point.

Guthur · on May 24, 2023

My thought exactly, and compounded that we will now have far less practiced coders as they all generate the banal they should be learning with.

nonethewiser · on May 24, 2023

Or maybe it can help you understand the code

blibble · on May 24, 2023

9 times out of 10 there is nothing to understand

the logic is simply broken

the_other · on May 24, 2023

> Everyone's trying to train GPT models to write code, but maybe we should be training them how to use a debugger. Though its a lot harder to map text generation to debugging...

AI lay-person, web developer here.

This hits at something that's been nagging at me for a few weeks. LLMs get labelled "intelligent", but they're "just" spitting out word/symbol patterns. They don't comprehend the meaning of the patterns (as best I have come to understand it from reading news).

Deeper than this, "intelligence" isn't rooted in words. It's rooted in experience, logic and their interaction. The words are a side-effect; they're a way to label the experiences.

We're not training LLMs with experiences, only a reference to experience had by others, which may or may not map to the experience of the interlocutor. The LLM has no experience and limited/no logic to wrestle those experiences into wisdom.

Debugging is a "pursuit of wisdom" activity. Statistics, which underpin LLMs often identify truths (we wouldn't use them in science if they didn't), but they also hide truths. You need to clean data, take unusual slices of it, recombine it, and otherwise process it in order to uncover the true truths. I'm not sure LLMs can do that yet (if at all), and strongly doubt they can without _experience_ (sensory input, intention, failure etc).

Version467 · on May 24, 2023

> They don't comprehend the meaning of the patterns (as best I have come to understand it from reading news).

It is not obvious to me that this is true.

The argument against this is that a model has to develop an abstract understanding of the world in order to get better at predicting the next token.

You'll find different opinions as to wether this happens with todays llm's, but it's definitely not a clear cut issue.

ankurdhama · on May 24, 2023

> The argument against this is that a model has to develop an abstract understanding of the world in order to get better at predicting the next token.

I think they only develop abstract understanding about the relationships of the words/group of words.

Ygg2 · on May 24, 2023

> LLMs get labelled "intelligent", but they're "just" spitting out word/symbol patterns.

I think the word you are looking is knowledgeable. They have huge knowledge repository and can remix it in whatever way they please, but I don't think if you gave them a novel problem (outside their realm of knowledge) they could solve it.

gaogao · on May 24, 2023

The unicorn example from MSFT's Sparks of AGI paper, https://arxiv.org/abs/2303.12712, probably counts as a novel problem.

As another example, ImageBlind, https://ai.facebook.com/blog/imagebind-six-modalities-bindin..., learned modality translations explicitly not in its training set such as text -> sound.

mattlutze · on May 24, 2023

Most software devs would prefer a debugger, but most non-software devs would probably prefer a virtual developer.

If a technical product manager was able to generate the code needed for a feature using some combination of LLMs and CoPilots—even for proof of concepts, for example—the business would need to hire fewer developers and saves significant operating costs.

Extending it to your everyday person. Aunt Jane bought a new washing machine and wants to add it into her smart house. Either she's gotta call the niece to hook it all up, write the integration to Home Assistant, write automations for her iOS devices, etc, or, she asks her favorite LLM / Co-Pilot for help and it step-by-step walks her through it directly, with code, with individual troubleshooting.

There's business in both use cases, but the total addressable market is quite a bit bigger for one of them.

noirscape · on May 24, 2023

Aunt Jane likely didn't even want or might not even have a smart house in the first place. That's perhaps the thing you're missing here - your average person doesn't give a shit if it's "smart", they just want things to work and the only difference for them between "it's smart" and "it's dumb" is if the remote they're fiddling with is their phone, an on-device button or a regular remote (as well as how likely it is that they get spied on externally).

Your average person's ideal amount of exposure to code is zero (as in, most would prefer to not even think about code). If the device isn't automated from the getgo in an easy way, then Aunt Jane has no smart device and no matter how easy it is to write the connection, she's not going to have it unless her niece shows up.

You're severely overestimating the desire for an everyday person to automate their home. Most people are perfectly fine manually flicking light switches and unless you live on a very specific schedule, often automation ends up falling short of practical reality, so even the majority of people who "automate" just end up moving the light switch to their phone.

ohwellhere · on May 24, 2023

I think there are more roles out there that can benefit than people realize.

I know a cartographer who makes maps in some specialized software and then also has to occasionally interact with some official federally managed database. This person is not a software engineer and the database interaction has been historically painful in the office. This year ChatGPT just helped whip up some Python scripts.

Two hypotheses: 1) the “in house tooling” market is huge and will make great use of this tech; 2) that will contribute to pushing the floor lower and lower where more and more roles and people will take advantage of it.

auggierose · on May 24, 2023

How about the washing machine just plugs itself seamlessly into her smart house, without Aunt Jane having to do anything than a quick identity check on her phone? Probably needs to be an Apple washing machine, though.

crabmusket · on May 24, 2023

You mean some kind of interoperability standard?? We'll invent AGI before that becomes a reality!

brokenkebaby · on May 24, 2023

...with expensive quickly deteriorating cable

moffkalast · on May 24, 2023

Those are sold extra. Also it only has an Apple Socket™ by default, so you'll need a dongle to have it fit common outlets.

discreteevent · on May 24, 2023

>it step-by-step walks her through it directly, with code

This is exactly where it completely breaks down. The AI might as well show her some hieroglyphics.

GuB-42 · on May 26, 2023

I think the idea of replacing devs with LLMs is missing the point of what a developer does.

Yes, today, developers are people who know programming languages, that's something non-developers don't know, so teach an AI do code and you can spare a developer, right?

But no, what developers really do is tell the computer precisely what it should do, a programming language simply the best way we have found to do it. Using GUIs, natural languages, etc... may fell less obscure to the layman, but the problem doesn't go away, in fact, it is often made worse because of the fuzziness of natural languages. There is a reason developer have been using programming languages for so long despite countless attempts to find alternatives, they simply haven't been matched for the task of telling a computer what to do precisely. And if something better is found, then developers will use that, and besides some syntactic technicalities, the job will be essentially the same.

cookieperson · on May 24, 2023

So you postulate that the installation of every home appliance will have instructions for how to do so with 100% accuracy in Chatgpt? I don't know about you, but having read many installation instructions... I wouldn't trust most humans to get them right, and definitely not an LLM. Half of them refer to screws that don't exist...

bjourne · on May 24, 2023

Spot the bug in the following C function for reading files:

    bool
    files_read(const char *path, char **data, size_t *size) {
        FILE *f = fopen(path, "rb");
        if (!f) {
            return false;
        }
        fseek(f, 0, SEEK_END);
        size_t n = (size_t)ftell(f);
        rewind(f);
        if (size) {
            *size = n;
        }
        if (!data) {
            goto ok;
        }
        *data = (char*)malloc(sizeof(char)*(n + 1));
        if (fread(*data, 1, n, f) != n) {
            free(*data);
            return false;
        }
        *data[n] = '\0';
     ok:
        fclose(f);
        return true;
    }

ChatGPT solves this task instantly. Many, even experienced C developers, would have trouble finding the bug. And the lack of error checking on malloc is not a bug.

golergka · on May 24, 2023

Just for reference, copied this snippet in GPT-4, it generated this answer in 1,5 seconds:

> The issue in this code is related to memory access. The line data[n] = '\0'; should be (data)[n] = '\0';.

And then went on to generate corrected code and this explanation:

> The original expression, data[n], first applies the array subscript operation, which is equivalent to (data + n), and then tries to dereference the resulting pointer, which is not the intention here. This will likely lead to a segmentation fault because data is a pointer to pointer, and data[n] will try to access memory that has not been allocated.

> The corrected expression, (*data)[n], first dereferences data to get the allocated char array, and then applies the array subscript operation to set the null terminator at the correct location.

ChatGTP · on May 24, 2023

Just wondering if you think this type of bug hasn't been covered at least 50x in many online forums, I'd be very surprised?

I'm not trying to say this isn't impressive, but seems fairly obvious and more like something a linter / analyzer would pickup ?

almog · on May 24, 2023

Even if this isn't based on similar questions that can be found online, a bug in existing systems usually wouldn't introduce itself by framing an isolated section of code. More often than not it'd be through indirection or a side effect that covers a larger execution path.

golergka · on May 24, 2023

To an experienced C developer who works with the language constantly — probably. I'm not one, so I wouldn't know. I've found GPT to be much more helpful when you're working with technologies you'r not as familiar or up to date with.

thfuran · on May 24, 2023

What does it do if you correct the bug and then ask it what the bug is?

Joker_vD · on May 24, 2023

Not going to talk about the actual bug(s), but... why is it trying to NUL-terminate the buffer full of binary ("rb") data? Also, sizeof(char)? That's 1 by definition.

Actually, no, let's talk about the bug. What's with people's obsession of eliminating local temporaries and working straight with pointer-to-result?

        char *buff = malloc(n + 1);
        if (fread(buff, 1, n, f) != n) {
            free(buff);
            fclose(f);
            return false;
        }
        buff[n] = '\0';
        *data = buff;
      ok:
        fclose(f);
        return true;

This way, data won't contain a freed pointer when reading fails.

bjourne · on May 24, 2023

"rb" because you don't want Windows to open the file in translated mode and munging its contents. Null-termination so that the same code can deal with both binary and text files. And I prefer all mallocs to follow the pattern malloc(sizeof(tp)*n) because it's less hassle than thinking about what types' sizes may vary. This rabbit hole is endless and we haven't even started covering my use of fseek/ftell/rewind!

Joker_vD · on May 24, 2023

Yeah, "fread(buff, 1, result_from_ftell, f)" can have some pretty strange effects on the buff's content if CHAR_BIT != 8.

pfdietz · on May 24, 2023

You also forgot to close f when returning false.

raincole · on May 24, 2023

> Also, it's a bit ironic how one way to prevent bugs is using stronger type systems and formal methods. But, AI is particularly bad at formal methods.

It kinda works tho. In my anecdote Copilot works much better with C# than Python simply because I can write the signature of a function and that it generates the content.

(I know Python has type annotations too, but Copilot just isn't as smart as with C#. Perhaps because there isn't enough training data in typed Python?)

moffkalast · on May 24, 2023

Hmm, so the endgame would be the most strongly defined and strict language so LLMs can immediately see and fix mistakes, but automate the tedium of writing it by having them as an intermediary?

raincole · on May 24, 2023

The endgame is either the elimination of jobs or the elimination of human beings.

But yes, what you described matches my experience. I'd say Copilot benefits those "tedious but relatively strict" languages the most.

Turskarama · on May 24, 2023

Training an AI to write unit tests (and refactor code to be more testable) would be a real game changer.

riffraff · on May 24, 2023

Copilot does a decent job with "dumb" unit tests, e.g. "when the condition X is Y then Z".

But it obviously doesn't have knowledge of your domain, and I don't think any LLM could do a perfect job without that.

protortyp · on May 24, 2023

I already use gpt4 for this. Works quite well if you show it a couple of reference unit tests.

toolslive · on May 24, 2023

you can quite easily use a GPT to generate test examples. For example, "generate me 1000 customer support mails about ..."

byby · on May 24, 2023

I don't think you get it. The game has already changed. And nobody trained the AI to write unit tests or refactor code. It could do it as emergent behavior.

It happened so fast that so many people are in denial and many people aren't even asking the obvious questions. Simply ASK chatGPT to refactor your code to be more unit testable and IT Can do it.

Of course it clearly has your typical LLM problems but it is roughly 50% there. The game has changed. If AI gets to 100%, not only will the game be changed, but humans will no longer be part of the game.

See what I asked chatGPT:

   ME:
   def (x: int):
      for i in range(x):
         print(i)

   Can you refactor the code so it can be more unit testable?


   CHATGPT:
   Certainly! To make the code more unit testable, we can refactor it into a function that returns a list of integers instead of directly printing them. Here's the refactored code:

   python

   def generate_number_list(x: int) -> List[int]:
       number_list = []
       for i in range(x):
           number_list.append(i)
       return number_list

   In this refactored code, we've created a function called generate_number_list that takes an integer x as input. Instead of printing each number, we use a number_list variable to store the generated numbers by appending them in the loop. Finally, we return the number_list.

   By returning the list instead of printing the numbers directly, we make the code more testable because we can now write unit tests that assert the correctness of the generated list.

If boggles my mind that I see threads of people saying "what if we train LLMs to do this or to do that?" or whatever. Guys, just ask the LLM to do what you want in plain English. It can do it.

barrkel · on May 24, 2023

This is a fine, absolutely trivial, example. But LLMs are simply not all that.

IME GPT-4 can't write a bug-free 10 line shell script. It's particularly poor at inferring unstated requirements - or the need to elicit the same.

There's a general problem with LLMs: they're too eager to please. It shows up as confirmation bias. Embed a perspective in your prompt, and LLMs continue in the same vein.

You can, with careful prompting, try to provoke and prod the text generation into a more correct shape, but often it feels to me more like a game than productivity. I have to know the answer already to know how to ask the right questions and make the right corrections. So it feels like I'm supervising a child, and that I should be amazed it can do anything at all. And it is amazing; but for productivity outside tightly constrained environments (e.g. converting freeform dialogue into filling out a bureaucratic form - I think this is a close to ideal use case), I struggle to see it scaling up much, from what I've seen so far.

For creativity - e.g. making up a story for a child - it's not bad. One of my favourite use cases, after discovering how bad it is at writing code.

byby · on May 24, 2023

If you read my post I said 50 percent of the way there.

So that means a 10 line bash script at best would have 5 lines of bugs.

But the AI is actually better then that. You've just increased the bar.

mjburgess · on May 24, 2023

That isn't the game.

The game is designing software to requirements. It's writing literature for a new era. It's creating X for A audience with N vauge unspecified needs -- where X is a complex product made of many parts, involving many people, with shifting and changing problems/solutions/requirements.

The game was never writing the stack overflow answer -- that was already written.

moffkalast · on May 24, 2023

So? Those requirements can be specified, holes inferred, and probably stuck to much more closely by a machine than man. If history's shown anything it's that if something takes a lot of mental effort for people it's probably an easy target for automation. The best developer is the one that doesn't get depressed when the requirements change for the 15th time in a month and just rewrites everything again at 2000x the speed of a human dev while costing basically nothing in comparison.

People say, "oh but clients will have to get good at listing specs, that'll never happen". Like bruh the clients will obviously be using LLMs to make the specs too. Eventually the whole B2B workflow will just be LLMs talking to each other or something of the sort.

byby · on May 24, 2023

>The game was never writing the stack overflow answer -- that was already written.

The problem is this was never a stackoverflow question and there was never an answer for it.

Try finding it. The LLM is already playing the game because it came up with that answer which is Fully Correct, Out of Thin Air.

Look, clearly the LLM can't play the game as well as a trained adept human, but it's definitely playing the game.

>The game is designing software to requirements. It's writing literature for a new era. It's creating X for A audience with N vauge unspecified needs -- where X is a complex product made of many parts, involving many people, with shifting and changing problems/solutions/requirements.

It can do all of this. It can talk like you and parrot exactly what your saying and also go into more detail and re-frame your words more eloquently.

What you're not getting is that all the things you mentioned the LLM can do in actuality to varying degrees to the point where it is in the "game." and at times it does better than us. Likely, you haven't even tried asking it yet.

mjburgess · on May 24, 2023

> Fully Correct, Out of Thin Air

I think if you're an expert in an area, this effect is easier to see through. You know where the github repo is, where the library example is, which ebooks there area -- etc. and you're mostly at-ease not using them and just writing the solution yourself.

These systems are not "fully correct" and not "out of thin area". They are trained on everything ever digitised, including the entire internet. They, in effect, find similar historical cases to your query and merge them. In many cases, for specific enough queries, the text is verbatim from an original source.

This is less revolutionary than the spreadsheet; it's less than google search. It's a speed boost to what was always the most wrote element to what we do. Yes, that often took us the longest -- and so some might be afraid that's what labour is -- but it isnt.

We never "added value" to products via what may be automated. Value is always a matter of the desire of the buyer of the products of our labour (vs. the supply) -- and making those products for those buyers was always what they wanted.

This will be clear to everyone pretty quickly, as with all tech, it's "magic" on the first encounter -- until the limitations are exposed.

I actually work in an area where what took 3mo last year, I can now do in maybe 3 days due to ChatGPT. But when it comes to providing my customers with that content, the value was always in how I provided it and what it did for them.

I think this makes my skills more valuable, not less. Since the quality of products will be even more stratified by experts who can quickly assemble what the customer needs from non-experts who have to fight through AI dialogue to get something generic.

mchaver · on May 24, 2023

I agree. LLMs are very impressive, but it isn't helpful to think of them of magic. LLMs are a great tool to explore and remix the body of human knowledge on the internet (limited to what it has been trained on).

The user needs to keep in mind that it can give plenty of false information. To make good use of it, the user needs to be able to verify if the returned information is useful, makes sense, compare with first hand sources, etc. In the hands of expert that is really powerful. In the hands of a layman (on the subject in question), they can generate a lot of crap and misunderstand what it is saying. It is similar to the idea that Democracy can be a great tool, but it needs an educated and participatory populous or it may generate a lot of headaches.

byby · on May 24, 2023

> I agree. LLMs are very impressive, but it isn't helpful to think of them of magic. LLMs are a great tool to explore and remix the body of human knowledge on the internet (limited to what it has been trained on).

Of course you shouldn't think of it as magic. But, the experts self admit they don't fully understand how LLMs can produce such output. It's definitely emergent behavior. We've built something we don't understand, and although it's not magic, it's one of the closest things to it that can exist. Think about it. What is the closest thing in reality to magic? Literally, building something we can't understand is it.

It's one thing to think of something as magic, it's another thing to try to simplify a highly complex concept into a box. When elon musk got his rockets to space why were people so floored by decades old technology that he simply made cheaper?

But when someone makes AI that can literally do almost anything you ask it to everyone just suddenly says it's a simple stochastic parrot that can't do much?

I think it's obvious. It's because a rocket can't replace your job or your identity. If part of your skillset and identity is "master programmer" and suddenly there's a machine that can do better than you, the easiest thing to stop that machine is to first deny reality.

mjburgess · on May 24, 2023

> the experts self admit they don't fully understand how LLMs can produce such output

Well I take myself to be an expert in this area, and I think it's fairly obvious how they work. Many of these so-called "Experts" are sitting on the boards of commercial companies with vested interests in presenting this technology as revolutionary. Indeed, much of what has been said recently in the media is little more than political and economic power plays disguised as philosophical musings.

A statistical AI system is a function `answer = f(question; weights)`. The `answer` obtains apparent "emergent" properties such as "suitability for basic reasoning tasks" when used by human operators.

But the function does not actually have those properties. It's a trick -- the weights are summaries of unimaginable number of similar cases, and the function is little more than "sample from those cases and merge".

Properties of the output of this function obtain trivially in the way that all statistical functions generate increasingly useful output: by having increasingly relevant weights.

If you model linear data with just y = ax then as soon as you shift to "y = ax + b" you'll see the "emergent property" that the output is now sensitive to a background bias, b.

Emergence is an ontological phenomenon concerning how `f` would be reaslised by a physical system. In this case any physical system implementing `f` shows no such emergence.

Rather the output of `f` has a "shift in utility" as the properties of the data its training on, as summarised by the weights, "shifts in utilty".

In other words, if you train a statistical system on everything ever written by billions of people over decades, then you will in fact see "domains of applicability" increases, just as much as when you shift from a y=ax model to a y=ax+b.

To make this as simple as I can: statistical AI is just a funnel. ChatGPT is a slightly better funnel, but moreso, it's had the ocean pass through it.

Much of its apparent properties are illusary, and much of the press around it puts in cases where it appears to work and claims "look it works!". This is pseudoscience -- if you want to test a hypothesis of ChatGPT, find all the cases where it doesnt work -- and you will find that in the cases where it does there was some "statistical shortcut" taken

FeepingCreature · on May 24, 2023

I think this is a motte-bailey, "true and trivial vs incredible and false" type of thing. Given a sufficiently flexible interpretation of "sample from multiple cases and merge", humans do the same thing. Given a very literal interpretation, this is obviously not what networks do - aside one paper to the contrary that relied on a very tortured interpretation of "linear", neural networks specifically do not output a linear combination of input samples.

And frankly, any interaction with even GPT 3.5 should demonstrate this. It's not hard to make the network produce output that was never in the training set at all, in any form. Even just the fact that its skills generalize across languages should already disprove this claim.

mchaver · on May 24, 2023

> It's not hard to make the network produce output that was never in the training set at all, in any form.

Honest request because I am a bit skeptical, can you give an example of something it is not trained in any form and can give output for? And can it output something meaningful?

Because I have run a few experiments on ChatGPT for two spoken languages with standard written forms but without much of a presence on the internet and it just makes stuff up.

FeepingCreature · on May 24, 2023

Well, it depends on the standard of abstraction that you accept. I don't think that ChatGPT has (or we've seen evidence of) any skills that weren't represented in its training set. But you can just invent an operation. For instance, something like, "ChatGPT: write code that takes a string that is even length and inverts the order of every second character." Actually, let me go try that...

And here we go! https://poe.com/s/UJxaAK9aVN8G7DLUko87 Note that it took me a long time, because GPT 3.5 really really wanted to misunderstand what I was saying; there is a strong bias to default to its training samples, especially if it's a common idea. But eventually, with only moderate pushing, its code did work.

What's interesting to me here is that after I threw the whole "step by step" shebang at it, it got code that was almost right. Surprisingly often, GPT will end up with code that's clever in methodology, but wrong in a very pedestrian way. IMO this means there has to be something wrong with the way we're training these networks.

edit: https://poe.com/s/gZW5ZGgiomWzabKJCUcA I gave it a more complete prompt because I only have one completion per day, but GPT-4 got it in one shot.

edit: https://poe.com/s/2lS8rjbGqHrzSkpEvLzr GPT 3.5 flubbed it given the same prompt.

cookieperson · on May 24, 2023

Well you tried. You really did. But there are already people trying to form religions around LLMs. Some people can't be reasoned with.

pmoriarty · on May 24, 2023

Are you speaking figuratively, or do you know of any specific instances of people forming actual religions around them? I'd be very interested in the latter.

cookieperson · on May 24, 2023

I've seen people posting about it on a few message boards. Most of them sound like they e lost their minds or are under the influence being completely honest. I could try to dig up posts if you want but it's more sad than interesting.

pmoriarty · on May 24, 2023

Well, I am interested if it's some sort of organized religion, and not mere posturing/speculation on forums.

cookieperson · on May 24, 2023

I have not seen organized religions around AI yet. But I have seen people writing some pretty wild ravings about how their god is an AI and how chatgpt connects too it or something. There's also people dating LLMs. Some guy in Belgium commit suicide because his ai gf told him too leaving his wife and kids behind

byby · on May 24, 2023

Yeah those crazies are far and few in between and none of them are on this thread. Throwing out religious accusations is going too far.

cookieperson · on May 25, 2023

It'll be interesting to see how these sorts of less than anticipated sociological things emerge. Take a look at scientology, many practicers, pretty scifi beliefset, I think all we really need is another L Ron Hubbard and lots of not super crazy people could start to worship these things.

https://www.thedailybeast.com/the-radical-movement-to-worshi...

byby · on May 25, 2023

Yeah but to keep on topic you suggested that this sort of thing was happening in the thread.

I disagree, it's not.

cookieperson · on May 25, 2023

https://www.thedailybeast.com/the-radical-movement-to-worshi...

byby · on May 24, 2023

He's just talking _. Clearly nobody here on both sides are having religious fervor around ai. One side is saying we don't understand LLMs completely and the other side is saying we absolutely do understand it's all statistical parroting.

But to keep it with the religious theme... which side sounds more similar to religion? The side that claims it's absolutely impossible for LLMs to be anything more then a statistical operation or the side that claims they don't know? One side seems to be making a claim based on faith while another side is saying we don't know enough to make a claim... So which side sounds more religious?

pmoriarty · on May 24, 2023

"I take myself to be an expert in this area, and I think it's fairly obvious how they work"

We can also say we understand chemistry but we don't understand how consciousness comes out of chemistry.

You can also say that humans are "just" physical processes, but that word "just" is doing a lot of heavy lifting.

mjburgess · on May 24, 2023

I'd also say I've sufficient expertise in animal learning to reject the idea that animals have shallow interior lives comprised of compressions of historical cases.

A child touches a fireplace once -- not a thousand times. Because they are in direct causal contact with the world and their body has a whole-organism biochemical reaction to that stimulus which radically conditions their bodies in all sorts of ways

This is a world apart from statistical learning wherein P(A|A causes B) and P(A|B) are indistinguishable -- and the bridge of "big data" merely illusory

byby · on May 24, 2023

>Well I take myself to be an expert in this area, and I think it's fairly obvious how they work. Many of these so-called "Experts" are sitting on the boards of commercial companies with vested interests in presenting this technology as revolutionary. Indeed, much of what has been said recently in the media is little more than political and economic power plays disguised as philosophical musings.

Bro if you are an expert you'd already know that most of the exclamations that they don't fully understand LLMs is coming from researchers at universities. Hinton was my example on an "expert" as well and he literally quit google just so he can say his piece. You know who Hinton is right? The person who repopularized backprop.

>A statistical AI system is a function `answer = f(question; weights)`. The `answer` obtains apparent "emergent" properties such as "suitability for basic reasoning tasks" when used by human operators.

Every layman gets its a multidimensional curve fitting process. The analogy your using here to apply properties of lower dimensional and lower degree equations to things that are millions of dimensions in size on a complex curve simply doesn't apply because nobody fully understands the macro details of the curve and how that maps to the output it's producing.

The properties of a 2d circle don't map one to one to 3d let alone 500000000d.

>Much of its apparent properties are illusary, and much of the press around it puts in cases where it appears to work and claims "look it works!". This is pseudoscience -- if you want to test a hypothesis of ChatGPT, find all the cases where it doesnt work -- and you will find that in the cases where it does there was some "statistical shortcut" taken

You don't even know what science is. Most of software engineering from design patterns to language choice to architecture is not science at all. There's no hypothesis testing or any of that. An expert (aka scientist) would be clear that ML is mostly mathematical theory with a huge dose of art layered on top.

The hypothesis for the AI in this case is, and I'm parroting the real experts here,: "we don't understand what's going on." That's the hypothesis. How is that even testable? It's not so none of this is "science". ML never was a science, it's an art with some theoretical origins.

But your "hypothesis" is it's just "statistical parroting" which is also untestable. But your claim is way more ludicrous because you made a claim and you can't prove it while I made a claim that basically says "we can't make any claims because we don't understand". See the difference?

byby · on May 24, 2023

>I think if you're an expert in an area,

Experts in the area, including Hinton, the father of modern AI, self admit they don't fully understand what's going on but they think that LLMs know what they are talking about.

>These systems are not "fully correct" and not "out of thin area". They are trained on everything ever digitised, including the entire internet. They, in effect, find similar historical cases to your query and merge them. In many cases, for specific enough queries, the text is verbatim from an original source.

I never said the systems are fully correct. I said that for my specific example the answer is fully correct and out of thin air. No such question and answer pair exists on the internet. Find it and prove me wrong.

>This will be clear to everyone pretty quickly, as with all tech, it's "magic" on the first encounter -- until the limitations are exposed.

Except many experts are saying the exact opposite of what you're saying. I'm just parroting the experts..

>I actually work in an area where what took 3mo last year, I can now do in maybe 3 days due to ChatGPT. But when it comes to providing my customers with that content, the value was always in how I provided it and what it did for them.

So if they knew you were just copying and pasting their queries to chatgpt would they still care about the "how"? I doubt it.

mchaver · on May 24, 2023

I think this is one of the killer applications of LLMs, a friendly Stack Overflow where you can ask any programming question you want with out fear of being reprimanded. Of course, this capability in LLM is probably due to the terseness of Stack Overflow and the large database of code in Github.

However, in its current state users still have to know how to program in order to make good use of it. It will still give you lots of errors, but being able to get something close to your goal can save you a lot of time. Someone who does not know how to program will not be able to use these to put together a complex, useful and reliable system. It might change in the future, but these things are hard to predict.

byby · on May 24, 2023

> fear of being reprimanded.

Don't worry about this. You can get over the fear. I'm in the top 10% of stackoverflow users in terms of points and it's all because my stupidest questions from decades back gathered thousands of points from other stupid idiots like me. Who cares. Literally the line graph keeps climbing with no effort from me all from my dumbest questions. Just ask and don't worry about the criticism, you'll get a bit, but not too much.

>However, in its current state users still have to know how to program in order to make good use of it. It will still give you lots of errors, but being able to get something close to your goal can save you a lot of time. Someone who does not know how to program will not be able to use these to put together a complex, useful and reliable system. It might change in the future, but these things are hard to predict.

Of course. I think the thing I was trying to point out is the breadth of what chatgpt can do. So if you ask it to do a really in depth and detailed task it's likely to do it with flaws. That's not the point I was trying to emphasize, not the fact that it can't do any task with great depth but the fact that it can do ANY task. It has huge breadth.

So to bring it line with the direction of this thread. People were thinking about making special LLMs that refactor code to be unit testable. I mean we don't have to make special LLMs to do that because you can already ask chatgpt to do it already. That's the point.

cookieperson · on May 24, 2023

I've had several SO questions get flamed, down voted and closed. I don't think this is great advice. What I would say is read the rules, search SO for duplicates try to think of near duplicates, try to Google the answer, then post.

byby · on May 24, 2023

Probably not then. But I just post whatever I want and I'm already in the top 10 percent. And I'm not an avid user either. I just ask a bunch of questions.

I've had a few flamed and closed but that's just 1 or 2 out of I'd say around 13 or 14 questions. It's a low percentage for me.

And I absolutely assure you much of my questions are stupid af.

mchaver · on May 24, 2023

It is a frequent complaint I have seen from new users. I do think for the purpose of Stack Overflow it does make sense to weed out questions that have already been answered and remove poorly formed ones. It's just that ChatGPT for programming questions often works better than trying to look it up in Stack Overflow so now I recommend it as an additional tool. You can ask questions and refine them without bothering random people on the internet.

hutzlibu · on May 24, 2023

"The problem is this was never a stackoverflow question and there was never an answer for it."

Your example is so trivial, that there are definitely similar code examples. Maybe not word for word, but similar enough, that this is not really mindblowing "making things out of thin air" for me. It seems like a standard coding class example, so not surprising, that it also can make the unit tests.

byby · on May 24, 2023

>Maybe not word for word, but similar enough

Find one. Dated before 2021. In fact, according to the theory that it's statistical parroting there should be multiple examples of for loops printing out numbers being converted to unit testable functions because AI needs multiple examples of it to form the correct model.

Find one. And it doesn't have to be from stack overflow either. Just a question and answer data point.

ahtihn · on May 24, 2023

> Fully Correct

It's not though. It doesn't print the values anymore, so the behavior isn't the same.

Refactoring isn't allowed to change behavior.

byby · on May 24, 2023

It is. There is literally zero other way to make that function unit testable. What are you gonna compare that data with in a test if it's thrown into IO?

By definition all unit testable functions have to return data that can be asserted. You throw that data to IO it's not unit testable.

IO is testable via an integration tests. But not unit tests. Which is what my query exactly specified. I specified unit tests.

ahtihn · on May 24, 2023

That doesn't change the fact that it's not a valid refactoring. If you can't make it unit testable without changing behavior, then it should tell you that.

Replacing a function that does `print("hello world")` with a function that does `return "hello world"` isn't a valid way to make it unit testable.

byby · on May 24, 2023

Alright fine, I can concede to this. ChatGPT should not have given me the best alternative but it should have given me the exact technically correct answer. You're right.

blibble · on May 24, 2023

if it actually understood what it was doing it would tell you that that logic doesn't need a test as the python has the range(x) functionality built-in

instead it generates a load of redundant boilerplate

if I saw a developer check that in I'd think they were incompetent

gjadi · on May 24, 2023

This.

I'm not good at prompting (if I believe what others say they can do with ChatGPT), but that's one thing that bother me with this system. They will do anything you ask them to without questioning it (in the limit given by their creators).

Is it possible to set it up in a way that they will challenge you instead of blindly doing what you ask? In this particular case, is it possible to ask it to do a code review in addition to performing the task?

I've tried various time (with the v3.5) to "tune" it so that each answer will follow a specific format, with links and recommended resources, with several alternatives, etc. The goal is to have it to broad my perspectives as opposed to focus too much on what I'm asking. But it never worked for more than a couple of questions.

Are there ways to do that? What am I doing wrong?

pixl97 · on May 24, 2023

In some cases you can ask about a particular function "Is this the best way to do this" or "Is there a better way to do this"

byby · on May 24, 2023

Sort of. There's an input variable that adjusts the "creativity" of the LLM. If you adjust the variable the answers become more and more "creative" approaching the point where it can challenge you. But of course this comes at a cost.

As it stands right now, chatGPT can actually challenge you.

byby · on May 24, 2023

I simply asked it to make it unit testable and it did the task 100 percent.

I'm not sure where your side track is coming from. Who in their right mind would ever check in code that prints a range of numbers from 0 to x?

The example wasn't about writing good code or realistic code. It's about an LLM knowing and understanding what I asked it to do. It did this by literally creating a correct answer that doesn't exist. Sorry it doesn't satisfy your code quality standards but that's not part of the task is it? Why don't you ask it to make the code quality better? It can likely do it Maybe that will stop the subtle insults (please don't subtly imply I'm incompetent that's fucking rude)

Like why even get into code quality about some toy example? What's the objective? To fulfill some agenda against AI? I think that's literally a lot of what's going on in this thread.

blibble · on May 24, 2023

> Like why even get into code quality about some toy example? What's the objective?

it was your example

plus: toy examples are all it really does

byby · on May 24, 2023

Yeah. Why did you get pedantic, rude and heated about the code quality for my toy example?

It's my example sure, but it's your reaction to it that makes no sense.

plus: that's not true.

blibble · on May 24, 2023

unless you're literally an LLM, I was not pedantic, heated or rude towards you

(and it is true I'm afraid)

Marazan · on May 24, 2023

You do get that that code is garbage and if any dev tried to check that code in they would get laughed out of the room right?

It is pure pablum. It is almost the perfect example of LLMs producing fluent vapid bilge.

byby · on May 24, 2023

The code is not garbage, it's just your highfalutin python opinion makes it so you only ever use list comprehensions or return generators.

For loops in python that return non lazy evaluated lists are fine. Python was never suppose to be an efficient language anyways, grading python based off of this criteria is pointless.

It doesn't matter how snobbish you are on language syntax though. I fed it code and regardless of whether you think it's garbage it did what I asked it to do and nothing else.

Would you prefer the AI say, "this code is garbage, here's not only how to make it unit testable but how to improve your garbage code." Actually we can make the output more unpredictable as LLMs do have a non deterministic seed that can increase the creativity of the answer.

Marazan · on May 24, 2023

It has wrapped range() with useless code. It has added no functionality, it has not improved testability in any way.

.

Please, take the code it has produced and integrate it into the original function. All it does is replace the range call. That's it. It has absolutely and totally failed at the given task whilst outputting plausible garbage about why it has succeeded.

moomoo3000 · on May 24, 2023

It removed the print and returns a list instead.

Marazan · on May 24, 2023

Which is useless because _it has changed the semantics of the function_

moomoo3000 · on May 24, 2023

It was a bad function

Marazan · on May 24, 2023

And yet I was able to make it unit testable without changing it's semantics.

byby · on May 24, 2023

You have to change the semantics of the function to make it unit testable. Literally tell me how else can you test that function with a unit test?

By definition a unit test can only test functions that return data. So there's no other option here.

Marazan · on May 24, 2023

Let me tell you your mind is going to be blown once you learn about Monads.

A mutable object is functionally identical to a return value if you control the initial state and lifetime of the object. Like you can do in a unit test.

And as I demonstrated in my other comment I 100% retained the semantic structure of the function whilst making it 100% unit testable.

byby · on May 24, 2023

I think you don't understand what unit testability means. It means removing IO and side effects from your code.

How the hell do I test a print function? I take the print function and match it with what? It has no output so how can I test it printed the correct thing? I can't.

I can test a list. I just match it with another list. Making your code unit testable is about segregating IO from logic. Write pure logic where all functions have inputs and outputs and those things can be tested. Your io prints should be small because all functions that do io cannot be fully tested.

IO is pollution. Any output to IO is the program exiting the logical mathematical universe of the program and that output can be verified only by an external entity. Either your eyes for stdout or another process or files or a bunch of other ways.

Unit tests are about internal local tests that touch local functionality and logic. If you want something unit testable it needs a local output and an input and it shouldn't rely on io in it's data path.

I think your complaint here is an example of chatGPT superiority. It understood something you didn't. Well now you know.

Removing the print function from the logic and returning the data is 100 percent the correct move. Do you understand?

Marazan · on May 24, 2023

blink

Of course you can make the function with a print statement more unit testable without completely changing it's semantics!

You pass in an outputstream and use that as the target for print.

Then your unit test can create its own stream and test the content of the stream whilst production code can pass in standard out.

That way you don't completely change the semantic meaning of the code.

And once again that GPT function is useless. It is identical to list(range()) and it doesn't do what the first function does. Anyone can make anything more unit testable if it doesn't have to do the same thing.

byby · on May 24, 2023

Bro, dependency injection and mocking is the same thing as segregating your function from IO. Your replacing io calls to stdout with io calls to something else. But that doesn't make your code unit testable.

The function is still touching io. You gonna test it with another function that touches io? That defeats the point of the definition of unit testability.

> and doesn't do what the first function does.

Are you serious? You mock your output streams with hacky monkey patching your function ALSO stops doing what it originally does. It's essentially black magic globals that mutate your program... very bad practice.

Chatgpt here just didn't write the obvious io component of the code because it would be freaking pedantic. The full code would include a function that prints lists composed with a function that produces lists. The composition allows part of the program to be testable while leaving the io part of it not testable. For the original program NONE of it was testable.

Your Monkey patching here would be replaced by different io functions. You want to change the output stream? then you change the IO function. Compose the list producer with another IO function. Play type Tetris and you can recompose your list producing function with all kinds of modular io. The point it you separated the core logic away from IO thereby making it more modular and more testable.

None of the io functions are testable via unit tests, that is the point. That is the definition of the most basic form of testing... Unit tests.

You literally HAVE to change your code in order to make it unit testable. If your code is throwing shit to io and retrieving values from io then none of your code is unit testable. You're at the integration test level and at this level things become hacky and more complicated. Your tests not have external dependencies like state, the operating system and you have to run hacks like your monkey patch.

Where ever you work or whatever you've been doing if you haven't been doing what I described then you (and your work buddies) haven't been testing your code via unit tests.

That's fine, whatever works bro. But chatGPT knows the common parlance for testing and unit testing, and it did exactly the correct thing.

Your interpretation of what testing is the thing that is strange and off here.

Marazan · on May 24, 2023

I'm sorry, I clearly haven't explained myself well as otherwise you would not have wasted a huge amount of text tying yourself in knots based clearly on a mistaken apprehension of what I was saying.

For clarity I reproduce the original function you gave and then I present what the change I am suggesting is

  def cool_function(x):
    for i in range(x):
      print(i)

My change

  def cool_function(x, output_stream=sys.stdout):
    for i in range(x):
      print(i, file=output_stream)

Does it now become clear what I am suggesting? My new function can be used as a 1-for-1 replacement for the old function, no code of the system needs changed as the default value provided to the new variable ensures semantically identical operation without changing any further code. Yet it is now unit testable

But now we can write a unit test like

  def test_output():
    output = io.StringIO()  
    cool_function(1, output)   
    contents = output.getvalue()
    assert contents=="1\n"

So I've made the code unit testable, kept semantics completely identical and not had to worrty about any weird IO concerns that you have. No monkey patching, no weird file IO, no bizarelly re-implemnting list(range(x)).

byby · on May 24, 2023

> I'm sorry, I clearly haven't explained myself well as otherwise you would not have wasted a huge amount of text tying yourself in knots based clearly on a mistaken apprehension of what I was saying.

No need to apologize. This is a discussion. No one did anything wrong.

>For clarity I reproduce the original function you gave and then I present what the change I am suggesting is

This is called dependency injection and it's a valid way of segregating IO away from pure logic. Although this pattern is popular among old school OOP programmers it's getting out of vogue due to the complexity of it all. You used a python trick here of default values, but typically dependency injection changes the function signature and ups the complexity of the code by a lot. Let me show you the full output of the code that chatgpt was implying:

   #unit testable code (without using dependency injection tricks)

   def cool_function(x: int) -> None:
       IO_function(logic_function(x))

   def logic_function(x: int) -> List[int]:  
       return [i for i in range(x)]

   def IO_function(x: Any) -> None:
       print(x)
       
   def test_output():
       assert logic_function(4) == [i for i in range(4)]

Chatgpt only gave you logic_function, because IO_function is sort of obvious.. it's just "print" (I only wrapped print in "IO_function" to keep things clear, typically you won't define that function). But basically the full complete code would be to recompose IO with logic. You now have two components one of which is testable.

As a side note you will see it's actually an improvement to the code. It's simpler, no dependency injection, no confusing function type signature and a much simpler test case. The other thing that must be noted is the modularity.

Making tests unit testable in this way allows for your logic to be portable. What if I want to repurpose cool_function to output it's logic to another function? In your example you don't have the components to do that, it's harder for your case as you'd have to create another component for injection.

In short not only did chatGPT produce A correct answer. But it produced the better answer compared with your dependency injection. That being said your dependency injection is valid BUT you were not correct in saying that chatGPT's answer was worse or incorrect.

byby · on May 25, 2023

>You've written 3 functions instead off one.

3 functions is better. Think about it. Do people write all their stuff in one big function? No. Better to compose higher level functions with smaller ones rather then write one big monolith like you did. The more modular something is the better.

Also IO_function is there for illustration purposes. Technically it's just wrapping print with a name so you can understand the intent. In reality you just use the regular print here without a wrapper, so in actuality only two functions are defined.

>The job of ChatGPT was to make cool_function unit testable. You haven't done it.

It did. By giving it a return value. Just like you did by giving it a new input value.

>You still have cool_function using side effect generating code hitting the actual IO system.

Yeah but one component of cool_function is pure and you can unit test that. Cool function itself can never be tested because it generates no output, you test the unit components of cool function. That's the point of unit tests.

>Genuinely the worst unit test I have ever seen written, on a poor form per line basis, absolute bananas. If you don't understand why [i for i in range(4)] is bad in a unit test and [0,1,2,3] is correct then I need you to walk away from the computer.

Let's just talk about it like adults. Just tell me what exactly about it makes you think it's bad?

Most likely it's some pedantic stylistic philosophy you have? I'm thinking you only want to test literals? Perhaps you prefer [0,1,2,3]? Am I right on the money?

Logic potentially has errors so you don't put logic in your test code. Makes sense, but who cares. For trivial shit it's fine. While in this case the logic in the test is identical to the function, typically 'logic_function' represents something significantly more complex and the list comprehension so I could care less if I'm not following the strictest form of testing. The comprehension is just something akin to an alias shortcut I prefer to use over writing out a massive literal. For the toy example the test is pointless because the logic is identical but typically it's fine to use range as an alias to represent a sequence of numbers.

Someone who strictly follows these stylistic rules without seeing intent or having the ability to bend the rules is just an inflexible pedantic programmer. It's not good to boast about it either by telling other people to walk away from a computer. That's just rude.

wizofaus · on May 24, 2023

That would be fine if the core thing needing unit testing was the data generation/ transformation logic, but just as often as not it's the output formatting too. Did you try asking ChatGPT to write a unit test to confirm that the output is displayed as expected?

byby · on May 24, 2023

>That would be fine if the core thing needing unit testing was the data generation/ transformation logic, but just as often as not it's the output formatting too.

Output formatting touches io. In this case it is no longer a unit test that touches these things. Unit tests by definition test ONLY internal logic and transformations.

It is literally the definition of unit tests.

When you test things like stdout that becomes an integration test and Not a unit test. It requires some external thing or some global black magic monkey patch that changes what print does to do integration testing.

(Btw making print formatting unit testable means segregating the formatting from the print. Produce the string first, test that, then print, because print can never be unit tested by definition)

Typically programmers segregate these levels of testing because unit tests are easier to write. But to write unit tests your code has to be written in a way to cater to it. Often this style of coding actually improves your code it makes it much more modular. The reason is because pure functions that output data can be composed with all kinds of io functions. You can move it all over the place and to different platforms with different forms of IO. Print has no meaning in certain embedded systems so it can't be moved... By segregating the logic out it makes it so I can move the logic without the io baggage.

Chatgpt 100 percent gets the difference that's why it did what it did. I think you and the OP don't fully understand the meaning of unit testing.

Don't take this the wrong way, but just because you don't know this doesn't say anything about your skills as a programmer. But just recognize that this concept is basic and is pretty much something universal among testing.

wizofaus · on May 24, 2023

> Unit tests by definition test ONLY internal logic and transformations

Output formatting is still a type of transformation! The function explicitly takes the numbers and prints them as decimal integers with newlines between each. A test to confirm that it IS in that format is still a unit test.

BTW I gave ChatGPT the prompt I would give, and I have to say the answer looks pretty good, even if I'm not a Python programmer and it's not the way I'd do it (which would be to change the function to allow passing in an output stream):

    class MyFunctionTestCase(unittest.TestCase):
      def test_my_function(self):
        expected_output = "0\n1\n2\n"
        with patch('sys.stdout', new=StringIO()) as fake_out:
          my_function(3)
          self.assertEqual(fake_out.getvalue(), expected_output)

With a few more prompts I also managed to get it give me this version:

  def my_function(x: int) -> str:
    output = ""
    for i in range(x):
        output += str(i) + "\n"
    return output

Which I'd argue somewhat changes the code that was originally written, but it's still a pretty decent answer. There's no doubt there's some impressive stuff going on that it can do such things, the real issue for me is that when I've tried on far more complex functions it's tended to break down (quite badly in some cases).

byby · on May 24, 2023

>Output formatting is still a type of transformation!

I'll quote part of my reply (which you missed):

   (Btw making print formatting unit testable means segregating the formatting from the print. Produce the string first, test that, then print, because print can never be unit tested by definition)

Right? Think about it. You want to unit test your formatting, remove the logic from the atomic IO function. Otherwise you can't test it via a unit test because that's the definition of unit testing. I realize that there is formatting that's part of the internal functionality of printf, but really all that means is that funcitonality can never really be unit tested. If you want to test printf, that happens at the integration level... By Defintion.

>BTW I gave ChatGPT the prompt I would give, and I have to say the answer looks pretty good, even if I'm not a Python programmer and it's not the way I'd do it (which would be to change the function to allow passing in an output stream):

It's wrong in this case. Unless you specifically asked it to write unit testable code, what it did here is write a hack that monkey patches the program. It's a huge hack. It didn't write unit testable code, but rather it wrote a integration test that monkey patches stdout, negating any need to make your code "unit testable" no refactoring needed using this method. The entire concept of refactoring code to be unit testable flies out the door in this case as you're just using integration tests to get around everything.

I mean yeah you use the unit test library but is not technically a unit test. It's fine I'm not a stichler for what style of testing is used in practice but what I am saying is that what chatgpt did previously was literally follow my instructions to the letter. It did it exactly 100% correctly. Think about it. I asked chatgpt to make the Code more unit testable. You didn't have chatgpt do anything to the code. You simply changed the test from a unit test to integration test. Huge difference. I mean if your case was the "proper" way then what does it even mean to make code "unit testable" if you're not even touching the code? Like why does the concept of "making code more unit testable" even exist if we're just changing tests to make everything unit testable? Ask yourself this and you'll realize that the only answer is basically what I just told you previously.

wizofaus · on May 24, 2023

I've been writing unit tests for over 15 years (actually longer, but before that they were just throwaway run-once testing stubs). I wouldn't consider what you got ChatGPT to produce to be an adequate rewrite of a function to make it unit testable (and several others in this thread have expressed the same view). Even the "hack" using monkey patching makes for a more actually-useful test.

byby · on May 24, 2023

I've been writing them for over 20. Doesn't matter. What matters is factual correctness.

See here what a unit test is versus integration test: https://www.testim.io/blog/unit-test-vs-integration-test/

And also see my reply to the other guy where I definitively show that not only is what chatgpt produced correct, but better:

https://news.ycombinator.com/item?id=36064097

wizofaus · on May 24, 2023

I'm perfectly aware of what the difference is, thank you. The function you gave to ChatGPT explicitly requests a stream to format integers as decimals, separated by newlines. The version it gave as being 'unit-testable' did not, and hence wasn't a 'factually correct' answer. In some cases that may be perfectly fine, but in others it most definitely isn't.

byby · on May 24, 2023

This is the function I gave chatGPT:

   ME:
   def (x: int):
      for i in range(x):
         print(i)

It takes an int. Are you trolling now or did you get mixed up with something else?

cookieperson · on May 24, 2023

So your current stance is, LLMs can't do everything yet, but don't bother thinking about extending it's capabilities just ask it it can do everything? Fascinating...

byby · on May 24, 2023

It's not a stance. I'm stating a fact of reality. Huge difference.

I didn't say don't bother extending it's capabilities either. You're just projecting that from your imagination. An hallucination so to speak not so far off from what LLMs do. I find your similarity to LLMs quite fascinating.

What I said is, the capability of doing the "extension" you want is already in the LLM. Sure go extend it but what you're not getting is that we've already arrived at the destination.

Turskarama · on May 24, 2023

Yeah, it does great on little toy examples.

What I would like to do is feed in my entire 50k line program and get something out.

moffkalast · on May 24, 2023

I really wonder how Claude 100k does on larger workspaces, has anyone tried that? (I don't feel like paying another $20 to Anthropic too) Allegedly it's only marginally better than 3.5-turbo on average so it'll probably spit out nonsensical code but maybe the huge context can help.

cookieperson · on May 24, 2023

Or you can just write the code yourself instead of praying something else can do your job for you better than you can...

moffkalast · on May 24, 2023

But that would require me to not be a lazy ass, which we all know is impossible.

Also writing code, in 2023? With your hands? Pshh, that's so 2010s.

byby · on May 24, 2023

So I said it's like 50 percent of the way there implying that it gets things right at a rate of 50 percent. That's a fuzzy estimation as well, obviously so don't get pedantic on me with that number.

When you ask for large output or give it large input you are increasing the sample size. Which means more likely that part of the answer are wrong. That's it. Simple statistic that are inline with my initial point. With AI we are roughly half way there at producing answers.

If you keep the answers and questions short you will have a much higher probability of being correct.

So that 50k line program? My claim is roughly 25k of those lines are usable. But that's a fuzzy claim because I LLMs can do much better than 25k. Maybe 75% is more realistic but I'll leave it at 50% so there's a lower bar for the nay sayers to attack.

toolslive · on May 24, 2023

  >    ME: 
  >        f(x) = exp(cos(x))
  >
  >        what is f(0) ?
  >
  >    GPT-3.5:
  >        f(0) = 1

(note, it most often generates the correct result, but I've seen it do the above too)

glenneroo · on May 24, 2023

Thanks for at least admitting you used GPT 3.5, which is very out of date and hence no longer useful when discussing AI capabilities. If you want to test current tech (which is moving fast), at least use GPT-4 (which also gets updated regularly).

toolslive · on May 25, 2023

(latest greatest, a different function)

    > g(x) = sin(x)/x ; what is g(exp(-200)) ?
    > ChatGPT
    > 
    > To find the value of g(x) = sin(x)/x at the point g(exp(-200)), we 
    > substitute x = exp(-200) into the function:
    > g(exp(-200)) = sin(exp(-200))/exp(-200)
    >
    > Now, let's calculate this value using numerical methods:
    >
    > sin(exp(-200)) ≈ 
    > 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

    > (there it breaks off, running out of tokens )

byby · on May 24, 2023

I mean. I said 50 percent of the way there right? So your example here is inline with what I said. It's not perfect, but it's halfway there.

crabmusket · on May 24, 2023

This a classic case of understanding programming as an activity that is primarily about text production. This is what Naur was arguing against in Programming As Theory Building [1].

Taking Nair's viewpoint, coding AI will be useful only to the extent that it assists programmers to build and employ their theory of the program. Or to itself develop a useful theory of a specific program, not just programs in general.

[1] https://gist.github.com/onlurking/fc5c81d18cfce9ff81bc968a7f...

Ygg2 · on May 24, 2023

Honestly ChatGPT-4 sucks at writing code (if similar code isn't in its corpus of knowledge).

I asked it to write a YAML parser in pure Rust. The first thing it did was write this:

    pub enum YamlNode {
       Scalar(String),
       Sequence(Vec<YamlNode>),
       Map(HashMap<String, YamlNode>),
    }

This is what I expect a freshman in college to write as their first Yaml parser.

Literally unusable as a starting point.

I did try to correct it but it tripped up and somehow managed to fuck that up to.

hderms · on May 24, 2023

Why does it suck, because it's doing a lot of allocations?

Ygg2 · on May 24, 2023

No. You have a fair point, but not my biggest gripe. It's much more fundamental to what YAML is.

YAML allows YamlNodes as keys in a map (aka complex keys). E.g.

    ? [ a, b]
    : "complex key, scalar value"

You asked it for YAML parser and it gave you a shitty JSON parser.

Also this is the most banal thing about YAML, something that should be obvious just browsing the examples. And I gave it test suite and link to YAML spec.

Most work in YAML is around super obscure corner cases.

pornel · on May 24, 2023

Rust is 90% arguing with the compiler and 10% debugging. And yesterday GPT-4 solved a WTF lifetime issue for me pulling a trick I wouldn't have thought about.

Ygg2 · on May 24, 2023

Probably because its corpus of knowledge includes someone solving it like that.

pixl97 · on May 24, 2023

I'm not really a programmer, but when I write code for stuff I need this is how I operate. The novel idea I want to solve is the end goal, not the intermediate code that I'm gluing together that other people have generally created.

ChatGTP · on May 24, 2023

What was the WTF lifetime issue?

pornel · on May 24, 2023

I wanted to compare function pointers, but that's tricky due to unique function types, disallowed non-primitive casts, plus apparently not allowed for types with higher-ranked lifetimes (for<'a> fn(&'a)). GPT-4 came up with using a type alias (type FnComparable<'a>) that added necessary coercions and gave them equal lifetime.

api · on May 24, 2023

Stronger type systems really do prevent bugs. In Rust I probably spend 20% of the time at most debugging. Of course you pay for that by the fact that it takes longer up front to write Rust because you have to think about it more, but overall I think it’s a net win.

You also get the bonus of nasty bugs like crash bugs and exploitable memory problems become almost entirely a thing of the past. You can ship new software you know is very reliable.

AnimalMuppet · on May 24, 2023

I think that training GPT to use a debugger is going to be hard. For many debug problems, the amount of context you have to keep is going to be really hard for GPT.

> But, AI is particularly bad at formal methods.

AI? Or GPT? Earlier, reasoning-based attempts at AI seem like they would be a natural fit for formal methods.

ly3xqhl8g9 · on May 24, 2023

Train them how to hallucinate the debugger and the world. Forget about running the debugger only for your program, simulate the entire stack, networking and databases included, for a true timeless debugger.

bravura · on May 24, 2023

I’ve said it before and I’ll say it again. A time traveling retroactive debugger integrated with an LLM for interrogating what happened and why, would be awesome.

hirrolot · on May 24, 2023

I'm imagining an AI based on a deductive system rather than sequential text generation. This is roughly how "strong type systems" work, and so it might be simpler to map this model to formal methods.

By the way, if you spend 90% of time debugging your code, I think that's really sad. Either the programming language sucks, or the codebase sucks, or both. I probably spend 10-20% max of my time on debugging.

hnfong · on May 24, 2023

The "strong type systems" work by having you design the types correctly in the first place. If designed correctly they work wonders, and 90+% of the time if the code compiles it's probably correct. That's a big "if" though, and most of your time writing "strong type systems" is coming up with the correct type structure.

The one thing worse than debugging code is debugging "types". Those 30-page C++ compiler errors are definitely worse than whatever I had to do with (for example) Python.

lmm · on May 24, 2023

> Those 30-page C++ compiler errors are definitely worse than whatever I had to do with (for example) Python.

That's definitely in the "programming language sucks" category. Good languages make it easy to debug problems with your types.

hirrolot · on May 24, 2023

> The "strong type systems" work by having you design the types correctly in the first place.

I was responding to a comment that was suggesting letting an AI to do that.

> The one thing worse than debugging code is debugging "types".

It depends on a language. In languages with first-class types, we have the same tools to debug types. Classical C++ templates are more akin to a macro system than a type system.

mikebenfield · on May 24, 2023

> Those 30-page C++ compiler errors are definitely worse than whatever I had to do with (for example) Python.

C++ is sort of the canonical example of a language with a terrible type system, so it's not really the right place to look for this comparison.

raincole · on May 24, 2023

> Those 30-page C++ compiler errors are definitely worse than whatever I had to do with (for example) Python.

Everyone complains how slowly Rust compiles. But at least its error messages are much better.

byby · on May 24, 2023

>Everyone's trying to train GPT models to write code, but maybe we should be training them how to use a debugger. Though its a lot harder to map text generation to debugging...

We can actually try this now. Literally tell the LLM what you want to do and work with it. See how far you can take it. You will of course be asking the LLM for debug line inputs and you will be providing it with outputs as you run the debugger yourself.

antihipocrat · on May 24, 2023

Or just learn to write it ourselves? If it takes the same amount of time to work with the LLM as coding it from scratch then I'd prefer to improve my coding ability while I do the work.

Kostchei · on May 24, 2023

With no experience in java, no coding for 30years since pascal in high school, no previous use of git or github, no hands on experience of the azure stack... I stood up 4 static web apps that do things I want in my hobby in 4 weeks- the first one took 12 hours including being shown git, installing npm etc etc. The last one took me 40 minutes. They do things for me in D&D that I have wanted for 20 years- now that capability is accessible. Whole monster manual ingested into a level, terrain and faction based encounter system that give ranges and features for the encounter ie a battle map. Scaling encounters suitable for the party at any level that theme with the terrain and dominant faction. The best thing about an MMO but for 5thed dnd.

Did I learn a bit of java and css and git?- sure, but I was up and running in about 4 hours with a mvp for my 1st one. There is NO way I could "learn" that in that timeframe. I just asked chatGPT 4 how to do it, and it told me. When I didn't know how to commit, it told me (actually I didn't even know the concept). It held my hand every step of the way.

I didn't need to learn something first, I just did it. And I have started doing it at work. "hmm 4 GB of fortinet logs in 20 files of gzip on mac.. how do I find a host name in that? - chatgpt.. oh- 1 line of zgrep.. never heard of it- hey it works.."

admittedly, I am bathed in tech, been hanging around folks talking about projects for years. But NOW I can execute- the problem? When it hits about 500 lines of java- maybe 10 functions, it is too big to drop into the prompt to debug and I don't know enough to fix myself. Solution, make smaller apps, get them working, create data files to reference in json, chain them together. eh, not perfect, but good enough for hobby.

Beware- fools like me who know nothing will be bringing code to production near you soon. Cool that you like to learn stuff, but syntax bores the crud out of me, each to their own, I'm just going to make. I find it more satisfying. Terrifying that code born like mine will end up in someone's prod, but it will.

antihipocrat · on May 24, 2023

It sounds like you've learned a lot in the process of using the LLM, and perhaps you will use the LLM less for the basic stuff next time.

Kostchei · on May 25, 2023

Maybe, but I think it is more likely I will try a different type of project, a different stack. See if that is the only easy path. Try something with graphics (a visual map) or that uses the llm api (generate a narrative etc). But my mate who is a programmer agrees with you- he sees the same thing- it is a good way to learn while being productive.

byby · on May 24, 2023

This is a good answer. You don't have the bias of years of programming experience or training. You don't have your identity tied to the job.

If AI helps you, you'll emphasize on the overall benefit rather then nitpick at the details because of the clear conflict of interest that LLMs present to programmers.

byby · on May 24, 2023

I'm just saying the tech is already here. The core engine can do it.

Before you go on and write such a system it's better to test if the LLM can do debugging to an efficacy level that we require. I don't think anyone has tried this yet and we do know LLMs have certain issues.

But make no mistake, the possibility that an LLM knows how to debug programs is actually quite high. If it can do this: https://www.engraved.blog/building-a-virtual-machine-inside/ it can likely debug a program, but I can't say definitively because I'm too lazy to try.

antihipocrat · on May 24, 2023

Thanks for sharing that link, from that example I can see how LLMs could be used to speed up the learning process.

I do wonder though whether the methods that the LLM provides are reflective of best practice or whether they are simply what happens to be most written in SO or blog posts.

pixl97 · on May 24, 2023

So an interesting behavior of LLMs is something like the following.

"Write C++ code that sorts the following inputs"

versus

"Write (version) C++ code that sorts the following inputs, ensure the code is secure and uses best practices"

And you'll likely get a different answer.

mrleinad · on May 24, 2023

Doesn't matter if it can. You'll have to know how to do it too. Otherwise, you'll never be able to recognize a good fix from a bad one provided by the AI.

cookieperson · on May 24, 2023

Imagine explaining to your boss "sorry for taking down prod but really it's all chatgpts fault!". I bet that would go over real real well...

Kostchei · on May 24, 2023

No different from "the team that built that is all gone, they left no doco, we assumed X, added the feature you wanted, but Y happened under load" , which happens a lot in companies pushing to market older than a minute.

My default assumption now, after watching dozens of post mortems, is that beyond a certain scale, nobody understands the code in prod. (edited added 2nd para)

cookieperson · on May 24, 2023

Going to have to disagree with a lot of this based on my experiences.

byby · on May 24, 2023

This is off topic. Clearly we all know the LLM is flawed. We are just talking about it's capabilities in debugging.

Why does it always get side tracked into a comparison on how useful it is compared to human capability? Everyone already knows it has issues.

It always descends into a "it won't replace me it's not smart enough" or a "AI will only help me do my job better" direction. Guys, keep your emotions out of discussions. The only way of dealing with AI is to discuss the ramifications and future projections impartially.

Forgotthepass8 · on May 24, 2023

I fiddled around with some things on the weekend (i am not a programmer, i actually hate it so using LLMs is great for me - us EEs always write awful code) to automatically create a debug file of any output that gets a traceback and create a standard report using pdb, inspect, etc (never used them before) regarding the functions, parameters and variables, current state etc etc.

Though i was surprised i can't easily run pdb instance via a python program, still have to use stdin/out apparently.

Next i want to implement automerge (or semiautomerge) between different outputs which e.g. contain variants of the same function to automatically resolve issues spawned from the model forgetting. That's so annoying

I also suspect a lot of issues are due to the training data being on old SW. I think we can automatically remap this with whitelisted functions and parameters (i recall inspect can do this), blacklisted ones from old version NOT present in the current, and maybe a transformation between the two -- or automatisch regenerate if it's wrong, maybe with a modification to the prompt.

Also talking to it in other languages generates massively different code (i used deepl) so i had the crazy idea of spawning Dockers and just letting this automatic/semiautomatic trouble Shooting+ just parallel generating lots of functions using wildly different inputs (and models) to brute force the problem of having to code

I do need to look into a nice terminal interface for N-way merges and parallel gen monitoring.

The most useful thing for me was making some vim keybinds and scripts to automatically grab Codeblocks, run them and quickly regenerate. You can literally just tell it "DF" and if fixes a pandas issue sometimes

The holy grail will probably be local fine tunes/LoRAs for specific issues or libraries, since it only costs a few $ for one. Sign me up for an expert plotly AI in a box for neat plots please

Edit : i also have literally no idea what I'm doing either, but linting and analyzing generated Code blocks could help expedite this whole process as well. And in principle you don't even have to run it if you know the type is wrong or something.

I don't know what this is called but computer science is ostensibly mathematics so i assume/hope there is some rigor here

lcuff · on May 24, 2023

I suspect as things evolve with the Large Language Models, there will be integration with existing computer languages and frameworks. That the ability to say "Create a web site using language X with framework Y" will become a reality. This ability to get Hello World done in 5 minutes instead of 1/2 a day to walk through a intro book/tutorial, well, that's a win.

Then, down the road, each AI might have a preferred/default language, technology and framework. Quite possibly newly created. This has a parallel to intermediate representations (the earliest of which I'm aware of is p-code in UCSD Pascal). But it is also analogous to compiling C to assembler, and then machine code. Similarly with Java. Wolfram is correct when he talks about needing it to be a representation that the creator can inspect and verify. Not necessarily the least bit easy with a complex project. It will be an engineering journey, but it does spark in me the hope that English (or anyone's native language) becomes the high level language of choice for guiding machines in tasks. Mathematica? Humph. Stephen Wolfram is very pleased with things he's thought of or perhaps synthesized, but I'm gonna say, I think Mathematica is not the generic solution we will want.

nordsieck · on May 24, 2023

> I suspect as things evolve with the Large Language Models, there will be integration with existing computer languages and frameworks. That the ability to say "Create a web site using language X with framework Y" will become a reality.

So, I think a key challenge is that modern programming languages do 2 things simultaneously.

1. They provide direct instruction to computers.

2. They document precise human intent.

Even if the need for part 1 goes away, part 2 will always be with us. I would expect that if natural language programming becomes a thing, that a dialect forms - akin to legalese - that embodies best practices for precisely documenting human intent.

galleywest200 · on May 24, 2023

> I would expect that if natural language programming becomes a thing, that a dialect forms - akin to legalese - that embodies best practices for precisely documenting human intent.

Sort of like how the SQL syntax seems designed for business admin folks to use, but us programmers ended up using it.

ofey404 · on May 24, 2023

> Sort of like how the SQL syntax seems designed for business admin folks to use, but us programmers ended up using it.

So it seems that a declarative programming language would be a suitable target for LLM to generate.

Maybe someday there would be a popular, shared IR across models, like SQL does to relational database.

ok_computer · on May 24, 2023

I was going to say the same thing, we’ll all be YELLING OUT SQL to our mic terminals and syntactically backtracking from the expressiveness and imprecision of our dynamic typed languages of choice.

tbrownaw · on May 24, 2023

> This ability to get Hello World done in 5 minutes instead of 1/2 a day to walk through a intro book/tutorial, well, that's a win.

So a bit like `create-react-app` or `dotnet new` or such, but now powered by AI?*

lcuff · on May 24, 2023

Well, I haven't used either of those, so I can't comment on a comparison ... but the point is less 'powered by an ai' and more the fact that I'm using English, not the jargon of create-react-app or dotnet new or the rails version or the django version ... Right now, my experience programming with an AI is that I'm much better off asking for subroutines to do this and that, not ask for a complete architecture. Mostly for debugging reasons: When it doesn't work, I want the debugging to be manageable. Getting the AI to debug its own code has not been a productive path for me, but hopefully soon.

dboreham · on May 24, 2023

> This ability to get Hello World done in 5 minutes...

This has existed since at least 1992 (Visual C++) and probably before that (Borland Pascal?).

lcuff · on May 24, 2023

Well, I'm thinking more of a web site Hello World than a C++ Hello World. The original point of Hello World (per Kernighan and Ritchie) was to ensure that all the tools were installed (the compiler itself wasn't a given) and in your path and configured correctly. So yeah, I was say even in 1988 (when I started programming in C) there's a way in which Hello.c was 5 minutes. But when I started with Rails (years later), it was more like 1/2 a day, and the Hello World notion had expanded in scope by quite a bit. I didn't spell out the _kind_ of Hello World I had in mind. My bad.

riceart · on May 24, 2023

Web hello world in rails is literally a single command and seconds.

lcuff · on May 24, 2023

After you read far enough in the book or tutorial to find that command.

6510 · on May 24, 2023

Front end js is like the nest we've been building for it - not knowing why.

byby · on May 24, 2023

>Create a web site using language X with framework Y

Not sure if you played with chatGPT, but you can ALREADY ask it to do this and it will give you something that works 50% of the time.

dboreham · on May 24, 2023

We've been doing this for 200+ years. All the programming languages we use, and mathematical notation, are based in some way upon natural language concepts (yes even FP languages). So really we're talking about a continuum. And also worth noting that human languages evolve over time. So both programming languages will evolve to be more natural, and natural language will evolve to be more machine-parsable.

gwoolhurme · on May 24, 2023

Yeah. The first time I saw Haskell written out, it felt like a blending of formal math and programming. I was really excited at how it looks like it could have come from my discrete math book. I don't think programming in english is a good idea, but like you said, we will probably get something that looks more like formal languages, because that's what we've always done.

grt_thr · on May 24, 2023

Mathematics as shown in textbooks is not rigorous. I don't understand why so many people fetishize something they saw at university.

Formalised mathematics are incomprehensible to humans and orders of magnitude longer then anything you can see in textbook or mathematical papers outside automated theorem proving.

hnfong · on May 24, 2023

This reminds me of a tangential rant in the book "The Poincare Conjecture":

"... the postulates are unclear. Does postulate 2 mean that we can extend any line segment forever? Does it mean that we can cut up any segment? And if it means the first, who is to say that the resulting line is unique? And how seriously should we take the definitions? Are they just meant to provide guidance about a word that is essentially undefined (today's, and probably Euclid's, in-terpretation) or are they supposed to completely specify the object named? In the latter case, just what does the phrase "a breadthless length" mean?

Mathematicians and scholars know that there are gaps in Euclid, and there has been a great deal of discussion over the ages about alternate axioms, or possible additional ones. That has not stopped generations of worshipful school-masters, besotted with the majestic order, the accessibility and the patent usefulness of the Elements from rushing in and trumpeting it as the finest in human thought. However, to a thoughtful student, the Elements can seem less rational than capricious. The insistence that the Elements is flawless, and the apex of rigorous thought, turns some students away from mathematics. One wonders how much fear of mathematics stems from the disjuncture between the assertion that Euclid is perfect and some students' intuitive, but difficult to articulate, sense that some things in it are not quite right. Unless you are unusually rebel-lious, it is easy blame yourself and conclude that mathematics is beyond you.

It is worth bearing in mind that mathematical results, for all they are represented as eternal and outside specific human cultures, are in fact transmitted and understood within definite social and cultural contexts. Some argue, for example, that the Greeks invented proof in order to make sense of the statements of mathematical results of Babylon and Egypt without access to the context in which such results were used and discovered. In order to make use of the results, the Greeks needed to sort out different, seemingly..."

gwoolhurme · on May 24, 2023

We might have different definitions my good sir. Granted I don't have a PhD in math and my math stopped at a masters. To me formalization is theorems and proofs, which are 100% comprehensible to humans. For reference: http://www.vdash.org/formal/#math

Everything in that link looks like programming, especially when you consider something like Haskell. Even formalism as a philosophy tries to add logic to natural language it self. So I am not sure where I am fetishizing what I saw at university. Care to explain without the snark?

grt_thr · on May 24, 2023

That is a proof of the same way that

    print("Hello World")

is an executable.

gwoolhurme · on May 24, 2023

Ah I did say they were proofs... that is my fault, what I mean to say is that formalization is the way they are written, is very much a language. Either way. Not sure where the venom was coming from

grt_thr · on May 24, 2023

Pointing out that people are impressed by form and not substance isn't venom, it's the first step to bringing mathematics into the 20th century.

gwoolhurme · on May 24, 2023

I never said they were proofs right? I am still confused. I was talking about foramalizing and how at least to me I see programming in it. Even something extremely complex like "The Strong Perfect Graph Theorem" is still readable and reads like english+programming. That's why I said haskell looks like it comes out of a discrete mathbook. I am not sure what your point is or where my fetishization is coming from.

cubefox · on May 24, 2023

I think the previous comment wasn't so much disagreeing as adding a related thought.

behnamoh · on May 24, 2023

> So both programming languages will evolve to be more natural, and natural language will evolve to be more machine-parsable.

Hard disagree. Is Rust similar to natural language? Has English become more similar to <pick the easiest programming language>?

gwoolhurme · on May 24, 2023

Just to play with the logic a bit. I could see natural language coming up with formal templating to get something closer to a programming language. Not like how we speak will change, but more like how a haiku works is formalized. Similarly to how every CS1 class in university has people who do the "write the steps in making a sandwich" exercise. That exercise is actually easier in a programming language, because of the nature of what a programming language is, its instructions in steps at the end of the day. So perhaps something is formalized that way? Also most modern languages have been slowly adding in things that are easier to read from a natural language perspective VS classical C syntax style. Rust is very C-like but still has plenty of FP like functions built in too.

EamonnMR · on May 24, 2023

The way you're using brackets sure resembles Backus-Naur form.

dusted · on May 24, 2023

One way to view a programming language, that differs from the traditional math-oriented perspective, is that a programming language is a subset of a natural language which has become sufficiently unambiguous.

Ambiguity is in my opinion the biggest reason why "plain {insert natural language} programming won't be a thing".. The major challenge in normal commercial software development is not writing working code, it's aligning expectations and identifying assumptions and eliminating ambiguities.. At that point, the coding part is more or less mapping the description to whatever particular words the implementation language is using.

fsflover · on May 24, 2023

> Ambiguity is in my opinion the biggest reason why "plain {insert natural language} programming won't be a thing"

How about Ithkuil? https://news.ycombinator.com/item?id=36022731

classified · on May 25, 2023

You're joking. Even the inventor doesn't speak it fluently.

fsflover · on May 25, 2023

I wasn't talking about fluency, just about possibility to use it for programming.

thomastjeffery · on May 24, 2023

Programming languages don't entirely remove ambiguity: they isolate themselves from it.

Everything in a program is unambiguous, but no useful program stays entirely within its own black box.

Ambiguity still exists between programs. This ambiguity is the source of incompatibility.

omginternets · on May 23, 2023

Sure, if the natural language we're talking about is formalized to the point of basically being math. Let's not confuse that with "everyone will be able to program", though.

speed_spread · on May 24, 2023

It'll be the same way everybody can program with Excel right now. Things will work fine, until they don't anymore, at which point a professional will be brought in to help fix "the bug". Obviously, that bug will the obvious result of the whole thing being a horrible mess and will be unfixable without a proper rewrite.

_Maybe_ the AI will be able to help with the rewrite, generating test cases and translating business logic, which would be nice.

qumpis · on May 24, 2023

I think many people give up too quickly (as could perceived by those professionals who had to put sweat when learning how to put things back when they break). But a capable language model might serve as a debugger: "you're stuck? About to call a professional? No worries, explain to me what you see and let me ask you questions in order to find a fix together"

noisy_boy · on May 24, 2023

I think it'll evolve into the AI Assistant being an always-on/always-available app/service/client (ala Siri/Alexa) coupled with a much more powerful online service. Advantage is that once the client is installed, it has access to practically everything the user is doing (which is what the tech companies dream of) and can actually ingest the inputs without the user having to summarize/translate into a text box (problem with Excel? AI assistant can read the excel sheet that has the issue so you can just ask questions).

I also think that the main two reasons behind Alexa/Siri not seeing very high adoption rates are misinterpretation of the voice commands + quality of results. With LLMs, at least the first issue should improve dramatically and if the popularity of ChatGPT is anything to go by, the second issue should also see improvements.

cableshaft · on May 24, 2023

I'd much rather program with the help of ChatGPT 4, as it is right now, over programming with Excel (I've done both). And at least on a smaller scale, the code generated by ChatGPT hasn't been a mess at all.

Sometimes it's incorrect or insufficient and it can't handle the more complex solution I'm asking of it, but the code isn't messy and doesn't require me to rewrite it from scratch, just fix what's incorrect.

6gvONxR4sf7o · on May 23, 2023

I'd love if the ambiguities could be a dialogue of question/answer, rather than being fully specified ahead of time like we generally have programming today. It seems much more efficient.

dools · on May 23, 2023

That's what programming with ChatGPT is like.

anothernewdude · on May 23, 2023

A massive pain in the ass?

dools · on May 24, 2023

Have you not spent much time working with ChatGPT? Or maybe you haven't upgraded to plus so you get GPT4? It's so fucking good. It's a bit like pair programming. Even though it can't always give you the result you want, it does so an appreciable percentage of the time and it's a fabulous way to think through problems, especially as a complement to the "google -> stackoverflow -> copy -> paste" style programming when you're trying things out or unsure of how to do something.

cookieperson · on May 24, 2023

Do people actually do that much copy paste programming? People I work with are more inclined to read the docs and lean on intellisense. The people I've seen cling to chatgpt have been spending a lot of time forgetting LSPs exist and wondering why some hallucinated method doesn't exist... They also tend to think really long 1 liners are good code over explicit easy to read 3 liners...

dools · on May 27, 2023

> Do people actually do that much copy paste programming?

So much:

https://stackoverflow.blog/2021/12/30/how-often-do-people-ac...