SWE-bench Lite is a subset of extremely simple issues drawn from SWE-bench, which is itself a cherry-picked set of issues from a handful of large (presumably well-run) Python-only projects.
Here are some of the rules the authors used to trim SWE-bench down to SWE-bench Lite:
* We remove instances with images, external hyperlinks, references to specific commit shas and references to other pull requests or issues.
* We remove instances that have fewer than 40 words in the problem statement.
* We remove instances that edit more than 1 file.
* We remove instances where the gold patch has more than 3 edit hunks (see patch).
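To make it concrete how narrow those rules are, here is a rough sketch of them as a filter. This is my own approximation, not the authors' script: I'm assuming each instance is a dict with `problem_statement` and `patch` (unified diff) fields, and the regexes for images, links, shas and issue references are crude guesses at what "references to" means.

```python
import re

# Heuristic patterns (assumptions, not the authors' actual checks).
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(|<img\b", re.IGNORECASE)
URL_RE = re.compile(r"https?://")
# 7-40 hex chars containing at least one letter and one digit.
SHA_RE = re.compile(r"\b(?=[0-9a-f]*[a-f])(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b")
ISSUE_REF_RE = re.compile(r"#\d+")


def count_files(patch: str) -> int:
    """Count files touched by a git-style unified diff."""
    return sum(1 for line in patch.splitlines() if line.startswith("diff --git "))


def count_hunks(patch: str) -> int:
    """Count edit hunks (lines starting with '@@ ') in a unified diff."""
    return sum(1 for line in patch.splitlines() if line.startswith("@@ "))


def keep_instance(inst: dict) -> bool:
    """Return True if the instance survives all four quoted rules."""
    text = inst["problem_statement"]
    patch = inst["patch"]
    # Rule 1: no images, external links, commit shas, or PR/issue references.
    if any(p.search(text) for p in (IMAGE_RE, URL_RE, SHA_RE, ISSUE_REF_RE)):
        return False
    # Rule 2: at least 40 words in the problem statement.
    if len(text.split()) < 40:
        return False
    # Rule 3: the gold patch edits exactly one file.
    if count_files(patch) > 1:
        return False
    # Rule 4: the gold patch has at most 3 edit hunks.
    if count_hunks(patch) > 3:
        return False
    return True
```

Running something like `[i for i in instances if keep_instance(i)]` over the full set would show how much gets thrown away, which is the point: what survives is single-file, few-hunk, self-contained issues.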
You can't demonstrate whether a dataset is representative or not by "an example or two". You need to look at all the data.
And all of this is fine. It's just a benchmark suite and doesn't need to be fully representative. The dataset itself doesn't even claim to be that, as far as I can find. All I'm saying is that the title wasn't really accurate.