>The fraction of samples being verbatim regurgitations is low
Copilot spitting out a function from its training data with the variable names changed to match those in your file(s) - and no one is actually testing what proportion of results those are - is still regurgitation.
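To make that concrete, here's a toy sketch (my own illustration using Python's tokenize module, not anything Copilot or the papers actually run): rename every identifier to a positional placeholder, and two snippets that a verbatim check calls "different" become identical.

    import io
    import keyword
    import tokenize

    def canonicalize(src):
        # Replace each identifier with a positional placeholder (v0, v1, ...)
        # so that code differing only in variable names compares equal.
        names, out = {}, []
        for tok in tokenize.generate_tokens(io.StringIO(src).readline):
            if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
                out.append(names.setdefault(tok.string, f"v{len(names)}"))
            else:
                out.append(tok.string)
        return out

    a = "def add(x, y):\n    return x + y\n"
    b = "def total(first, second):\n    return first + second\n"
    print(a == b)                              # False: exact match misses it
    print(canonicalize(a) == canonicalize(b))  # True: same code modulo renaming

An exact-match memorization test only sees the first comparison; the kind of regurgitation described above only shows up under something like the second.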
> and no one is actually testing what proportion of results those are
The idea that models could overfit and memorize their training data is hardly new, and it's standard practice to test for it. Check section 7 of the PaLM paper, for example: https://arxiv.org/abs/2204.02311
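Roughly, that style of probe looks like this sketch (placeholders only: model/tokenizer stand in for any HuggingFace-style causal LM, train_texts for a sample of training documents): prompt with a training-set prefix, decode greedily, and count the result as memorized only if the continuation matches token for token.

    import torch

    def memorization_rate(model, tokenizer, train_texts,
                          prompt_len=50, cont_len=50):
        # For each sample: prompt with the first prompt_len tokens, then
        # check whether the next cont_len generated tokens exactly
        # reproduce the training text.
        hits, total = 0, 0
        for text in train_texts:
            ids = tokenizer.encode(text)
            if len(ids) < prompt_len + cont_len:
                continue  # too short to split into prompt + continuation
            prompt = torch.tensor([ids[:prompt_len]])
            target = ids[prompt_len:prompt_len + cont_len]
            out = model.generate(prompt, max_new_tokens=cont_len,
                                 do_sample=False)  # greedy decoding
            hits += out[0, prompt_len:].tolist() == target
            total += 1
        return hits / total if total else 0.0

Note this is exactly the kind of verbatim test the parent comment is objecting to: a renamed copy scores as "not memorized" here.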