The DBIR is an interesting dataset in that it only includes breaches that have been reported in the media.
It leaves out the vast majority of breaches that happen every year and are either reported to federal and state regulatory bodies or posted to cybercrime and ransomware sites.
One of the coolest things is that this process, though flawed, is transparent and semi-open to the public.
Both the dataset and the process by which events are selected live in the open on GitHub.