(follow-up: Figured this should be a different comment)
I wanted to demonstrate what I said above, so I came up with some examples of things I think a human would have an easy time implementing but that might be hard for the model. BUT a key part is that I expect these to be in the dataset! I just don't expect them to be in hundreds or thousands of GitHub repos, because they're uncommon (but not rare). Also, we'll pretty much ask for few-liners to give the model the biggest advantage we can (errors compound).
Prompt:

    from torch import nn

    class LipSwish(nn.Module):
        """
        The Swish activation function is defined by a gated linear unit,
        where the gate is defined by a sigmoid function and multiplies the input with
        a learnable parameter, beta. Beta is initialized as 0.5.
        The LipSwish function normalizes the output by the upper bound of 1.1.
        """
        def __init__(self):
            super().__init__()
Result: Mostly correct, but missing the division by 1.1. The forward is `return x * F.sigmoid(self.beta * x)`, which is Swish (it also assumes we had `import torch` and it applied type hinting). It did properly set up the beta parameter (and this is just a three-liner).
Discussion: The Swish function should be in the dataset and is a well known activation function (though beta is not in the PyTorch version). Despite LipSwish being in the dataset (it was introduced in 2019 by Residual Flows[0]), it is not common. I could get the model to generate the Swish function (initializing beta and performing the gate) but could not get it to divide the output by 1.1. I would not expect a human to have difficulties with this.
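For reference, here's roughly the three-liner I was hoping for (my own sketch based on the docstring and [0], not model output):

    import torch
    from torch import nn

    class LipSwish(nn.Module):
        """Swish with a learnable beta, normalized by its upper bound of 1.1
        so the activation stays 1-Lipschitz (as in Residual Flows)."""
        def __init__(self):
            super().__init__()
            self.beta = nn.Parameter(torch.tensor(0.5))

        def forward(self, x):
            # Swish is x * sigmoid(beta * x); LipSwish divides by the bound 1.1
            return x * torch.sigmoid(self.beta * x) / 1.1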
Okay, so let's try something else that might be a bit more common and older. The same paper uses a concatenated activation function, and those aren't "uncommon". CReLU was introduced in 2016[1] and there have been plenty of concatenated activations since then. The PyTorch documentation even uses it as an example[2]. There are far more examples of CReLU (3k Python results for "class CReLU" vs 58 for "class LipSwish"; take these numbers as weak hints, because search sucks and isn't always accurate).
Prompt:

    from torch import nn
    from torch.nn import functional as F

    class CReLU(nn.Module):
        """
        Concatenated version of ReLU. The activation is applied to both the positive and
        negative of our input and the result is concatenated.
        """
Result: `return torch.cat([x.clamp(min=0), -x.clamp(min=0)], 1)`. This is correct, but not the one-liner result we expected.
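For contrast, the one-liner I had in mind (my own sketch, using the functional import the prompt hints at):

    import torch
    from torch import nn
    from torch.nn import functional as F

    class CReLU(nn.Module):
        """Concatenated ReLU: apply ReLU to x and -x, concatenated along the channel dim."""
        def forward(self, x):
            return torch.cat([F.relu(x), F.relu(-x)], dim=1)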
Discussion: This was a bit surprising: it didn't use functional as we might expect (or hinted). But interestingly it will if we change the class name to "ConcatenatedReLU". I found exact copies on GitHub with the full name (memorization), but the first page of results for CReLU that I found used functional (when adding "clamp" to the search I did find one that was exactly the above code, but missing the minus sign; there were plenty of errors in the CReLU implementations). Interesting side note: CReLU continues and defines a function CReLU6 which uses the same docstring but clamps the positive input with a max of 6, whereas the ConcatenatedReLU version starts to define a convolutional block (Conv + BatchNorm + ReLU) called Conv2d.
So we have kind of mixed results, and in both cases the outputs are rather odd and probably not what we wanted. We can clearly see that there are failures here where a human would not have much trouble. There's a big tension in these types of problems: the model needs to memorize a lot of information (otherwise it can't even write code or know library calls), but too much memorization prevents creativity. There is a lot of gray area between the _pure_ "stochastic parrot"/"fancy copy machine" and a generalized intelligence (with a broad and flexible definition of intelligence). I'd still call them stochastic parrots, because to me the evidence suggests we're closer to the memorization side than the creation side. But that doesn't mean these tools aren't useful.

We all know a lot of code is boilerplate (otherwise we wouldn't have the joke "copy paste from SO"), and these tools can be very useful for that. But I think the utility is going to depend heavily on what you are coding and how you code. If you're doing standard stuff, this probably has high utility for you and can save you a lot of time, the same way writing macros does, but FAR more powerful. It can also help novices a lot. On the other hand, if your main errors are reading mistakes (e.g. you're dyslexic -- this is my largest problem), then this might make things harder, since you'll have a tendency to gloss over text and miss minor errors. I also don't think these tools help much if you're a researcher or writing optimized or specialized code.

These differences are probably why we see such different reactions to these tools. But they may also be a hint into what people do and how they work, when we see who raves and who rants.
Edit: We can also check whether code is in The Stack[3]. We see that [0] is indeed in the dataset, so we know there is information leakage. Interestingly, the exact copy I found in the previous comment[4] isn't! (The repo isn't, though the user is.)
[0] https://arxiv.org/abs/1906.02735
[1] https://arxiv.org/abs/1603.05201
[2] https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html
[3] https://huggingface.co/spaces/bigcode/in-the-stack
[4] https://github.com/bertomartin/stat4701/blob/ec2b64f629cbbf6...