
>The fraction of samples being verbatim regurgitations is low

Copilot spitting out a function from its training data with the variable names changed to match those in your file(s) - and no one is actually testing what proportion of results that is - is still regurgitation.
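
That kind of check isn't hard to sketch, either. A minimal, hypothetical version in Python: normalize every identifier to a positional placeholder before comparing against the corpus, so a copied function with renamed variables still compares equal. (This is just an illustration; it doesn't reflect how Copilot is actually evaluated.)

    import ast

    def normalize_identifiers(source: str) -> str:
        """Rename variables, functions, and args to positional
        placeholders, so two functions that differ only in naming
        normalize to the same string. Needs Python 3.9+ (ast.unparse)."""
        tree = ast.parse(source)
        mapping = {}
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                mapping.setdefault(node.id, f"_v{len(mapping)}")
                node.id = mapping[node.id]
            elif isinstance(node, ast.FunctionDef):
                mapping.setdefault(node.name, f"_v{len(mapping)}")
                node.name = mapping[node.name]
            elif isinstance(node, ast.arg):
                mapping.setdefault(node.arg, f"_v{len(mapping)}")
                node.arg = mapping[node.arg]
        return ast.unparse(tree)

    # A "regurgitated" function with renamed variables still matches:
    a = "def add(x, y):\n    return x + y"
    b = "def plus(p, q):\n    return p + q"
    assert normalize_identifiers(a) == normalize_identifiers(b)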




> and no one is actually testing what proportions of results those are

The possibility that models overfit their training data is hardly a new idea. It's standard practice to test for it. Check section 7 of the PaLM paper for example. https://arxiv.org/abs/2204.02311
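
For reference, that test works roughly like this: take token spans from the training data, prompt the model with the first half, decode greedily, and count how often the continuation reproduces the second half verbatim. A sketch, where model.generate is a hypothetical greedy-decoding API, not any particular library's:

    def memorization_rate(model, training_spans,
                          prompt_len=50, cont_len=50):
        """PaLM-style memorization probe: prompt with the first
        prompt_len tokens of a training span and check whether greedy
        decoding reproduces the next cont_len tokens verbatim.
        model.generate is a hypothetical greedy-decoding API."""
        hits = 0
        for tokens in training_spans:  # each span: a list of token ids
            prompt = tokens[:prompt_len]
            truth = tokens[prompt_len:prompt_len + cont_len]
            generated = model.generate(prompt, max_new_tokens=cont_len)
            hits += generated[:cont_len] == truth
        return hits / len(training_spans)

Note this only catches exact reproduction; the renamed-variables case above slips through, which is the parent's point.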


Copilot is weird.

The N in NLP - natural language, in both the training set and the architecture - implies fuzzy matching.

Why anyone would rather start with a plausible but subtly broken buffer than a blank one is beyond me.



