Are language models deprived of electric sleep? (cbs.dk)
76 points by rahimnathwani on July 23, 2022 | 92 comments



So green text is easier to spoof than HN comments, but my, uh, friend tells me that the HN comment corpus is floating around.

Someone is eventually going to speed run 10k karma on HN and YouTube it. This year? Next? Who knows. But not a decade.

https://youtu.be/efPrtcLdcdM


Nice title. Many levels to the play on the 1968 “Do Androids Dream of Electric Sheep?”[0]

[0] https://en.m.wikipedia.org/wiki/Do_Androids_Dream_of_Electri...


Just a side note: I was showing some friends GPT-3 and asking it some trivial questions, trying to impress them, when one of them asked me: "Is it any different from, like, asking Google for the same information?" I mean, from a purely naive and practical point of view she wasn't that wrong.


I watched Google grow up, and for people whose main experience is with the last decade or so, I’d say they have a very insightful view.

Yes, some have argued that Google can’t make content, and from a technological standpoint that’s right, but here’s the fulcrum: from the perspective of “I asked a question and got a result that seems right, though I’m not sure”, there’s shockingly little difference between GPT-3-generated content and low-effort SEO-optimized content found on the first page of Google.


> low-effort SEO-optimized content found on the first page of Google

Often, this is also mechanically generated.


When you use Google, you expect it to help you to find something tangible that already exists. When you prompt GPT-3, it feels more like synthesising something that didn’t exist, even though all of the words and phrases in the response have been used before.

The line is blurry, though. Google uses machine learning too, so it’s possible that nobody on the planet has ever seen your particular search results for a given query, which feels a bit like synthesis, although the building blocks are bigger. And GPT-3 isn’t really synthesising at all, of course, but it’s a convincing enough illusion that synthesis is a helpful mental model.


Yes, of course it's different. Google can point you to a real Wikipedia article, while GPT-3 will generate links to fake ones (or you could ask it to generate a fake article). It's the difference between truth and fiction.

Or at least, a first-level approximation. Wikipedia articles can be wrong.


Well, Google can also point me to fake links with no real content at all. But I get your point.


Google can also return "People also ask" results that make no sense or give incorrect answers. Sometimes they even contradict the cited article.

E.g.:

Ask "is pi disjunctive" and you'll get the top result of

> π = 3.1415926535897932384626433832795... is a disjunctive number.

But if you click into the article (what percentage even will?) you'll find it's part of a larger statement:

> It is not known whether π = 3.1415926535897932384626433832795... is a disjunctive number.


Earlier this year you could google "is covid airborne" and get a WHO article explaining that it is not airborne above the first result :)


At the level of abstraction I believe your friend was talking about, yes, this is effectively a condensed model of "The Internet" and the results returned by Google.

For those saying it can generate new things: well, it is interpolating between existing data points in the corpus, but mushed through a bunch of math.


Yes, but also, mixing together different articles on the Internet by different authors results in a complicated mixture of truth and fiction. What you get back depends on the query and some amount of randomness (depending on settings).

By not mixing the articles together, a search engine lets you see where the information came from.

These are both useful things to do, but one is more useful for fiction and the other for nonfiction.

I'm not sure "interpolation" is the right word, though, for these mixtures. Transformer output seems more creative than that.


As a creative person, I think creativity is overrated. But don't let me speak for everyone.

It is a straight line in a higher-dimensional space. We vibe with the space, or it chooses us by prior experience. Creativity is, in some respect, in the eye of the beholder. I think the creativity we see in ML models is not unlike the creativity we ascribe to humans.

Interpolation is the right word and the right concept and metaphor. A mixing between two concepts to create a path between the two.


Maybe from a theoretical point of view too — the internet is a generative network, yes?


It can do a lot more than answer questions: it can generate original content, which Google cannot do at all.


This is the Copenhagen Business School (one of the largest universities in Denmark) library homepage and presumably the library blog. Which makes it weird that the author uses "I" when the author's name is not shown.


From the comments section we learn that the author is Thomas Basbøll, the same person mentioned in the prompt ("What about Thomas Basbøll? Didn't he play a part in this process?") in step 8. As this is obviously some kind of inside joke, I guess we are supposed to know that already; regulars of the blog will.


As a blog editor, I can definitely see a scenario where submissions are made by those unaware of details about the final form - especially when students (likely early in their publishing career) are involved.


I’m well aware that this is an unpopular opinion.

When your game plan is to take a multi-head attention transformer and overfit it on as much of the Internet as we have A100s to fit it on, you’re going to get a distribution, and if you sample from it, it’s going to sound like the Internet.

People read too much into this.

The technology is obviously powerful, borderline dangerous, but it’s still just maximizing P(this-sentence-was-on-the-internet).
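
Concretely, "sampling from it" is just repeatedly drawing the next token in proportion to the probability the model assigns it. A minimal sketch using the Hugging Face transformers API (the model name, prompt, and sampling settings here are purely illustrative, not what GPT-3 actually uses):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")            # small stand-in model, not GPT-3
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The thing about Hacker News comments is"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=40,
                         pad_token_id=tok.eos_token_id)    # sample from P(next token | context)
    print(tok.decode(out[0], skip_special_tokens=True))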


I completely agree.

Nonetheless, this approach of taking a fairly complex writing prompt and simply regurgitating contextually similar text from the internet often yields 100% meaningful results. This suggests that our intelligence is a lot more mimetic than we'd like to believe, which is perhaps an even more unpopular opinion.


Even if a sentence is meaningful standalone in vacuum, no communication between humans ever happens in such a way. It always matters who said it and why; we never take the substance independently from the agenda and the mind behind it, the context in which that mind existed and its relationship with our own, and so on.

This strikes me as a crucial part of what intelligence (whether in humans or other creatures) means.

In case of an algorithm, there is no intent of its own—except the intent and the minds of humans who trained it, ran it and supplied inputs.

IMO the onus is on AGI believers to prove that the material world is somehow the source of consciousness or intelligence (it wouldn’t hurt to define the terms first, either). Otherwise it’s a philosophical position, and while one is entitled to hold one's own, one is not entitled to force it onto others.


>Even if a sentence is meaningful standalone in vacuum, no communication between humans ever happens in such a way.

True, although much less so for (pseudo)anonymous online communication, as we’re having here.

>It always matters who said it and why; we never take the substance independently from the agenda and the mind behind it, the context in which that mind existed and its relationship with our own, and so on.

I have no idea who you are, what your agenda is, or anything about how your mind operates. There is essentially zero relationship between your mind and my own. The only context I have about you is the text of your post.

>This strikes me as a crucial part of what intelligence (whether in humans or other creatures) means.

Totally agreed. This is why GPT-3 can do a fairly good job emulating anonymous online discourse, but cannot convincingly emulate a person we actually know.


Personally, the fact that we are on HN of all places supplies a lot of context and agenda and to me is a meta-filter of sorts (pardon the pun). Perhaps one relationship between our minds is that we both have found our way here. I think there is a shared motive in trying to locate truth with a hint of procrastination[0].

Let’s imagine a slightly more down-to-earth exchange and raise the stakes in a way. For example, say it is still mostly a philosophical discussion here on HN, but in which someone is pushing a position on how much potential a certain company or store of value has; would you not question (at least to yourself) whether that person is invested and seeking to profit from it in short term, which would taint the motive? Or say someone was arguing in support of a controversial policy of a particular government known for its strong control over access to information and freedom of expression; would you not wonder whether the person is in fact a citizen of that country being misled by own government (and/or motivated to support it rather than seek truth)?

Yes, the substance of what they say may or may not be true independently of that context, but if we want to function socially, and exhaustively validating every claim being made is not an option, we have to take shortcuts; I think we do it all the time without even realizing it. (I’m not writing that lightly, since it seems similar to profiling, which is ethically icky, but it is my conclusion upon introspection.)

In these cases we can at least imagine a possible motive mismatch (known unknowns); in the case of a GPT-3-like thing, instead of a motive you get a scary abyss, or the much more obscured motives of its human creators. I can’t imagine it having no impact on how I participate in an exchange.

[0] Even still, you can see how elsewhere in the thread there are warranted accusations of the motive being tainted by human exceptionality bias.


I tend to agree that the math gets a lot simpler if the sun doesn’t rotate around a mystical axiom that humans are “special”.


High-energy physicists are about as serious as it gets, but they’re still dragging their feet around many worlds because it has uncomfortable implications for consciousness and free will.

Homo Sapiens has a serious agenda around being something other than the smartest chimpanzee.


No, they're "dragging their feet" because it's impossible to test, it's unfalsifiable.


And because if every outcome happens the probabilities of those outcomes are meaningless, and those probabilities are the predictive content of a quantum mechanical theory. It's an "interpretation" that requires you to ditch the entire value of the thing you're "interpreting". You don't need to know anything about QM other than that it has been empirically tested and is probabilistic in nature to reject the many worlds interpretation on this basis.

Each time I mention this online I state it more confidently, in the hope that some day someone will Cunningham's law me and change my mind. If I haven't committed some gross misunderstanding it's disappointing that so many physicists fall for such obvious bunk. I've already seen the attempted arguments listed at [1] and none of them are remotely convincing.

[1] https://en.wikipedia.org/wiki/Many-worlds_interpretation#Pro...


Hey if you’ve got David Deutsch dead to rights you should book a meeting and have him hand you his Fellowship in the Royal Society.

I’m not a physicist but I am an interested layman, I’m pretty sure I’ll catch wind of the Nobel Prize you’d win for proving him wrong about everything.


This isn't some unique insight of mine. There's a section about it in the wikipedia page for MWI, which I linked in my comment. Everett himself referenced the problem and attempted, unconvincingly, to argue around it in his original paper.

The mystery to me is why the obvious flaw in the idea hasn't killed it dead for so many smart people, such as David Deutsch. Instead they engage in bizarre logical contortions to try and recover the Born probabilities and some shadow of a meaning for them. If you think there's some actual merit in their attempts, listed at the wiki link I pasted, I'm all ears.


Right, it’s well known that if you’re committed to getting probabilities out of amplitudes things go off the rails. Everett himself couldn’t go the full way.

Fully deterministic, you see the part of the wave function that you see. You’re entangled with the apparatus.

Deutsch, and his protege Marletto go the whole way: no free will, no arrow of time, no subjective human experience at all.

Now? No measurement problem. No interpretations of collapse.

These people have decided that the subjective experience of observing an experiment is of secondary importance to clean math.

I appreciate that “you and I don’t exist in any way we’d recognize it” is a big pill to swallow, but I find the idea that humans looking in microscopes mutates the universe a bigger pill.


Superdeterminism (a bad name for ordinary determinism plus a ridiculous unspecified mechanism that links the very thoughts and intentions of an observer to the system being observed) is even worse. MWI having thrown the baby out with the bathwater, superdeterminism then discards the entire building and everyone living in it. You can "explain" anything with a theory of this nature, just as you can with intelligent design. It has no value as an interpretation of a perfectly well-functioning mathematical framework like QM.

And the question here is one of interpretation. You say that MWI gives you "clean math" but this isn't about the maths. The maths of quantum mechanics is what it is regardless of how you interpret it. Whether the Schrödinger equation describes the evolution of the system state and observation is a state vector collapse, or the Schrödinger equation describes the evolution of a multiverse and the apparent state vector collapse is an artifact of distinct outcomes decohering, the maths is exactly the same. MWI's claim that it's somehow "mathematically cleaner" or "just what the maths is saying" is nonsense. It's there in the name. It's an interpretation.

QM is a highly effective theory (or rather framework for theories) that has been tested to the umpteenth degree. The question of interpretation is a philosophical one more than a mathematical one, and certainly fascinating. To my mind a "good" interpretation would have to give some actual meaning to the functional components of the theory. If QM predicts something will happen 30% of the time and I test it and it happens 30% of the time, an interpretation that fails to give a meaning to that predictable, observed fact is no use to me.

You seem to think the problem others have with your preferred interpretations is that they are too small-minded to accept that reality is the way these ideas imply. This isn't the case at all. I don't find "free will" an interesting concept and I'm perfectly happy to contemplate a multiverse, deterministic or otherwise, if that's what the scientific method leads us to. What I'm not happy to do is to launch off into a world of wacky, unfalsifiable notions that remove meaning from our existing theories rather than adding it.

I could equally well make claims about the psychology of MWI's adherents, suggesting emotional explanations for why they seem bound to make ever more absurd claims in defence of their idea rather than accepting that it's flawed. I won't, though, because doing that is arrogant and presumptuous.


I’m a seriously interested layman at best, so I could be failing to understand some of the subtleties, if so I apologize.

I’ve got no agenda around Everett’s initial idea and monograph being the end state. To your point, there are some known flaws with his initial formulation.

I find the idea that QM needs an interpretation absurd; MWI was a step on the path to that way of thinking. At a high level, the notion that all possible outcomes of an observable-producing operator on the wave function are equally “real” is clarifying.

I tend to agree with Deutsch and Marletto that “that which is admissible is admitted”.

It’s possible we’re in violent agreement.

QM, and more specifically QED and QCD, make wildly accurate predictions. As long as no one is talking about a causal structure unique to “consciousness” or some ridiculous narcissism like that, I’m very satisfied to rely on the falsifiable and well-tested laws.


Well that and the string theory people can kindly die in a fire.


Superposition collapse is impossible to test by definition of not including the observer in the system.

If we just admit the possibility that the math was right all along we win, the math works, and we lose “I am a unique, distinct entity.”

I’m on team math.


That's math, not physics. You need experiments in physics.


Can you explain the experiments and physics that justify physicists' belief in the Copenhagen interpretation of QM?


Experiments justify 'predictions' of certain mathematical models, not 'interpretations'. As for a belief in interpretations, it's more of a philosophy, and Feynman said just 'shut up and calculate'.


I agree that interpretations of QM are currently in the realm of philosophy, but (and, at the risk of being a fool in saying this) I disagree with Feynman when he says “shut up and calculate” [1].

I tend to lean towards Sean Carroll’s approach: the interpretation is important, and we should spend real effort thinking about the different impacts of different interpretations. After all, Einstein didn’t get to general relativity just by calculating. It first required deep thought and consideration to decide what to calculate. And I’m not convinced that the interpretations will forever remain in the realm of philosophy (though I accept I could be totally wrong about this).

[1] I don’t really disagree with Feynman on this. I think it’s perfect advice for people getting started in really learning QM (which is beyond where I am!). Being very familiar with the mechanics and being able to calculate fluently are probably a good starting point for people learning QM. But I do think that it’s worth experts in the field spending some time beyond the mathematical fundamentals.


> we lose “I am a unique, distinct entity.”

I agree with this sentiment. Quantum mechanics (stemming out of statistical mechanics) starts from regarding all particles as indistinguishable.

IMO, this theory is more mathematics than it is physics.

Given a wide enough application, humans end up treating ourselves as particles; essentially, we are commoditizing our individuality.


I think of humans as a lens, mirror or piece of turbulence in a high dimensional space that information about the universe flows through. Maybe the smartest chimp, but not any more or less alive than anything else. Somewhat fortunate in the ways that we get to experience existence.


What this experiment shows is that making stuff up off the top of your head, using memory alone, isn't as hard as it looks.

But when doing actual research, we check our work against the real world. For example, that's how you get a list of real references rather than fake ones.

Suppose we played a guessing game: given a title, does the Wikipedia article exist or not? You could fairly confidently say that "Apple" exists and "wjifdvq" does not, but given a plausible-looking name of a person or place that you don't recognize, you'd have a harder time. It's not a problem in practice though, because you can look it up.


I sometimes fine tune new language models on the HN comment corpus for fun.

It generates comments arguing that ML models are just parrots.


As a bird owner, they are not “just parrots”. There is clear intent to communicate. Even species who can’t say human words use distinct vocalizations for specific purposes and situations.


I didn’t say that fine-tuned language models have enlightened opinions about living things, just that they produce comments indistinguishable to me from the GP :)



1. They aren't overfit.

2. It was far from obvious a priori that this could work at all.

The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.


When you’re churning through terawords of training data and aren’t cross-validating against proportionally sized hold-out sets?

It’s not a certainty that you’re overfitting but the burden is to show otherwise.

Besides, if your corpus is asymptotically everything, why wouldn’t fitting it perfectly be the goal?

It would probably be more accurate to say that the bias/variance tradeoff loses meaning as the training set goes to infinity.

Grapevine is that the big actors are going to 5x their flops at any price in the next three years.

What does validation loss even mean at that scale?
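
For what it's worth, "validation loss" at any scale still reduces to the same measurement: mean per-token cross-entropy on text held out of training, usually reported as perplexity. A minimal sketch (the model name and the held-out snippet are placeholders; whether the snippet was truly unseen is exactly the hard part at internet scale):

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    held_out = "Some text that was (hopefully) never seen during training."
    enc = tok(held_out, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss   # mean per-token cross-entropy
    print("held-out perplexity:", math.exp(loss.item()))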


> Besides, if your corpus is asymptotically everything, why wouldn’t fitting it perfectly be the goal?

That's called Google, and it serves a different purpose. Perfectly learning the distributional properties of the text, by contrast, is not overfitting; that's just fitting.


>The sentiment ‘lol you trained a big model on X so obviously it produces a good probability distribution on X’ only exists because big models proved to be extraordinarily more effective than anyone expected.

No, the "popularity" of that sentiment exists because of their "effectiveness". That sentiment existed and was being voiced 10 years ago.


You’ve stated this point better than I did.

Otherwise brilliant and rational people just go all mystical when we start talking about meaningless words like “intelligence” or “consciousness”.

An animal or a human or a big matrix performs X well on Y task. That’s quantitative and objective.

All this “is it smart” bullshit is thinly-veiled “how does the world still rotate around my subjective experience given this thing writes better Tolkien fan fiction than I do”.

Performance on tasks. Everyone else shuffle over to the Philosophy department.


Worth noting that in the limit the distribution P(human did/said A | context B) is a complete definition of human behavior. If you could model this perfectly, that would be a perfect model of a human, i.e. identical.


Exactly.

And the real problem is the way humans decide sentience in the first place - by how much the machinery acts as if it were sentient. There is no other information - just our perception. If we imagine two agents, A and B, one sentient and one insentient, but both acting identically, we theoretically couldn't decide which one is which.

So whether or not there is sentience in a machine then becomes a question as unanswerable as whether there is something that exists outside the universe we can perceive. We cannot know what we cannot possibly perceive.


Actually more like P(fraction of humans sampled did this | context X). Which deprives the model of viewpoints, cultures, thought patterns, and means of verbal communication not ostensibly on the internet. Given the ills and changes in human behavior that came with the advent of social media, I think at best it would be a distorted model of humanity, and at worst a morally shambolic one.


That's the brute-force approach anyway. Even if you do take an inordinate amount of data to sample from, you'll likely get something that's woefully impractical to operate, even if it produces vaguely human-like responses.

We do on the other hand know for a fact that it's possible to run an instance of consciousness in a volume of about a liter that consumes around 20 watts (aka your average human brain), so there's probably something wrong with our general approach to the matter. GPT-3 already uses about twice as many parameters as our organic counterparts do, with much worse results. And it doesn't even have to process a ridiculously large stream of sensor data and run an entire body of muscle actuators at the same time.


> GPT-3 already uses about twice as many parameters

This isn't accurate. GPT-3 has 175B parameters. The human brain has ~175B cells (neurons, glia, etc.). The analog to GPT-3's parameter count would be synapses, not neurons, where even conservative estimates put the human brain at several orders of magnitude larger. It's likely that >90% of the 175B could be pruned with little change in performance. That changes the synapse ratios, since we know the brain is quite a bit sparser. In addition, the training dataset is likely broader than the majority of Internet users. Basically, it's not an apples-to-apples comparison.
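
Rough numbers behind that, as a back-of-the-envelope sketch (all figures are the commonly cited estimates and vary quite a bit by source):

    gpt3_params    = 175e9   # GPT-3 parameter count
    brain_neurons  = 86e9    # commonly cited human neuron estimate
    brain_synapses = 1e14    # conservative synapse estimate (~100 trillion)

    print(gpt3_params / brain_neurons)    # ~2x: the parameters-vs-neurons comparison above
    print(brain_synapses / gpt3_params)   # ~570x: parameters vs synapses, the better analog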

That said, I agree that simply scaling model and data is the naive approach.


OPT-6.7B is good, but not even close to GPT-3.

If you can get GPT-like performance out of a 17B model, you should publish that.


I’m referring to post-training pruning, not smaller models. This is already well studied, but it’s not as useful as it could be on current hardware. (Deep learning currently works better with the extra parameters at training time.)

Retrieval models (again, lots of published examples: RETRO, etc.) that externalize their data will bring the sizes down by about that order as well.
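
To make "post-training pruning" concrete, here is a toy magnitude-pruning sketch in PyTorch (a single linear layer standing in for a transformer weight matrix, with the 90% figure from upthread; real pruning pipelines are more involved than this):

    import torch
    from torch import nn
    from torch.nn.utils import prune

    layer = nn.Linear(1024, 1024)   # toy stand-in for one transformer weight matrix

    # magnitude (L1) pruning: zero out the 90% of weights with the smallest absolute value
    prune.l1_unstructured(layer, name="weight", amount=0.9)

    sparsity = (layer.weight == 0).float().mean().item()
    print(f"fraction of zeroed weights: {sparsity:.2f}")   # ~0.90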


I agree that RETRO is cool. I think you might be stretching it a bit with the applicability, but I take your point.


Is evolution not brute-force?


Naturally, but it takes a few billion years. Not sure about you but I don't really feel like waiting.


LLMs for the most part learn P(someone on the internet said A | internet-specific context B), given ginormous amounts of data. There’s no other type of A, B with that much training data at hand.


Exactly right.

But extremely serious scientists, very smart people, are still drawing epicycles on blackboards studying “consciousness”.


Pretending consciousness doesn't exist, or that it has no function, I think reflects poorly on someone who studies human behavior.


I’m saying that the idea of a bright line between the emergent behavior of a dolphin and a human is very pre-Copernicus.

Studying, even measuring the capabilities of an animal is science.

Justifying a soul is the purview of spirituality, not science. (Nothing against spirituality, I have a spiritual life, I just don’t confuse it with science).


I strongly disagree. Pure behaviorism is just willful blindness. Consciousness is a real phenomenon, as any person not philosophically committed to denying its existence can tell you. It's front and center of our experience of human cognition. It would be quite strange for it to serve no function in the human mind.

Yeah, it's hard to quantify and isolate and experiment on, but that just speaks to either current limitations of human science, or possibly to limitations that cannot be surpassed. Given how much mileage certain philosophical movements have gotten out of the common intuition that emerged during the Enlightenment that everything is scientifically tractable, I understand the resistance to accepting these limitations and opening the door to all of the philosophical consequences of that intuition failing. But sorry, reality doesn't care about your philosophical attachments.


You strongly disagree that other intelligent, social, creative animals are built along similar lines to Homo sapiens?

You really think that we’re a special case, that a difference in degree has become a difference in kind?

I personally experience a feeling that I’m conscious subjectively, but I have no evidence that I’m any more or less motivated by pleasure or pain or community than a dolphin is.

Where do we draw the line? What’s the acid test for “yup now we’re dealing with consciousness”?


I don't mean to suggest that animals don't also have consciousness, or that it's not important to explaining their behavior too.


Why not go the other way and just admit that Descartes gave us the Cartesian plane (among other things) but was at best a product of his time with: “I think therefore I am”.

Descartes was a genius, but he was no Alan Turing, and Alan-fucking-Turing got it wrong on the most famous thing named after him (among the lay population at least). The Turing Test was a great idea, but it’s now trivially useless.

Humans are special to (mostly) themselves and (substantially) other humans.

They are not special to the universe. We’ve had this argument, it was called the “Inquisition” at least once, and we eventually cleared up once and for all what celestial body rotates around the bigger one.


Your position sounds much more religious and dogmatic than those held by the people you are arguing against.


Is that true? I.e., if you have something that looks too average too often, it's rather unlikely.


Didn't we try behaviorism until, in the '50s, we decided this notion of "context" was either too small or too intractably expansive to be useful, let alone explanatory, and that it was necessary to start thinking about internal cognitive processes instead?


> I’m well aware that this is an unpopular opinion

It’s so unpopular that literally every post on HN/proggit related to language models has the same comment ;)


Haha, well played. I’ll counter your non sequitur with an appeal to authority ;)

I used to do this for a living.


If you’re a language model, congrats on hitting 0.00 perplexity on the HN comments dataset. Your sponsors would be proud :P


If I was going to hook a big language model to HN I wouldn’t do it under my real name.

But… I take your meaning. I think I was subconsciously channeling OPT because that’s what the thread is about.

I agree it’s the kind of thing a fine-tuned HN chat bot would say all the friggin time.


There’s a case to be made for the proposition that diffusion is what makes the kind of “intelligence” that we value. If you cook a great dish and you cannot explain how you did it, you get a lot of credit for being a great cook. If, on the other hand, you can point to exactly one recipe that you followed, you will get a lot less credit. So in a sense, the harder it is to trace down the inspirations from which a composition sprung, the higher we value it.


The whole show in ML these days is in the infra around massively distributed, redundant, tunable training of the same fucking transformer we’ve all been using since 2017.


What about diffusion models? (And all the other things with less publicity that aren’t that)


Sure, I’m obviously being a bit glib.

So it’s transformers plus let’s force some loss into the super-resolution pipeline.

“Residual” sounds a lot fancier than X = f(X) + X, and “Diffusion” sounds a lot fancier than “let’s do deconv repeatedly and jam a loss term in there”.
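
For anyone who hasn't seen it spelled out, the residual trick really is just that one line. A toy sketch (layer sizes arbitrary):

    import torch
    from torch import nn

    class ResidualBlock(nn.Module):
        # the "fancy" residual connection: output = f(x) + x
        def __init__(self, dim: int):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, x):
            return self.f(x) + x

    print(ResidualBlock(16)(torch.randn(2, 16)).shape)   # torch.Size([2, 16])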

My point is all of this stuff is on Huggingface. You/I/We just don’t have 5k A100s for 6 months, so we can’t play.


If you can find a more efficient or effective approach than this “redundant” one you’ll be rich. There’s a lot of smart people working in this space.


I’m not the person to break this logjam of “moar GPUs”. But I agree that someone will, and I agree that person will do quite well.


> but it’s still just maximizing P(this-sentence-was-on-the-internet).

But are we doing any different?


Not even a little!


It seems like the bigger the transformer you have, the less the generated text feels like the speech of somebody with Wernicke's encephalopathy and the more it feels like a normal person, but there is still the feeling of some "brain lesion" in there somewhere.


Part of the evaluation procedure for models like these is checking whether the output they produce is present in the training set. The fraction of samples that are verbatim regurgitations is low unless you prompt-engineer your way to that, Copilot being able to recite the fast inverse square root function being a common example. But the ability to recite some samples is not proof of an inability to create novel outputs.
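
A crude sketch of that kind of memorization check (the function name, whitespace tokenization, and 8-token window are all illustrative; published evaluations use the model's own tokenizer and indexed training shards rather than a plain substring search):

    def has_verbatim_overlap(sample: str, training_text: str, n: int = 8) -> bool:
        # flag a generated sample if any n-token span appears verbatim in the training text
        tokens = sample.split()
        return any(" ".join(tokens[i:i + n]) in training_text
                   for i in range(len(tokens) - n + 1))

    corpus = "... the training corpus would go here ..."
    print(has_verbatim_overlap("some generated sample to check against the training corpus", corpus))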


Oh sure. I didn’t mean to imply that GPT-6 will have memorized Wikipedia.

As you start to get to the corpus size and comprehensiveness that’s going on now, novel model outputs approach being something that might not have been in the training set, but likely will be in the future.


>The fraction of samples that are verbatim regurgitations is low

Copilot spitting out a function from its training data with changed variable names to match those in your file(s) - and no one is actually testing what proportion of results those are - is still regurgitation.


> and no one is actually testing what proportion of results those are

That models could overfit the training data is hardly a new idea. It's standard practice to test for it. Check section 7 of the PaLM paper, for example: https://arxiv.org/abs/2204.02311


Copilot is weird.

The N in the NLP training set and arch implies a fuzzy match.

Why anyone would rather start with a plausible but subtly broken buffer than a blank one is beyond me.


You better watch what you're saying - soon the champion (in the medieval sense) of ML, Gwern, will come to defend his liege's honor. I see the other champion, Veedrac, has already come out :)


Oh, I took a licking from @gwern (who I do respect, but who isn’t as insightful as his fame would imply) a few weeks ago for denying that the Singularity Is Near.

Being Mr Who Ever Heard of This Guy myself, I took it as a compliment. :)

@veedrac, on the other hand, who is smart, pretty clearly got the worse of that friendly little skirmish.


The wake-sleep algorithm is designed to provide generative models with necessary sleep, at least in their developmental stages.

(Sorry, can’t resist.)



