The section "3.2. Training data collection & preprocessing" answers your question:
"We train Moonshine on a combination of 90K hours from open ASR
datasets and over 100K hours from our own internally-prepared dataset,
totalling around 200K hours. From open datasets, we use Common
Voice 16.1 (Ardila et al., 2020), the AMI corpus (Carletta et al., 2005),
GigaSpeech (Chen et al., 2021), LibriSpeech (Panayotov et al., 2015),
the English subset of multilingual LibriSpeech (Pratap et al., 2020),
and People's Speech (Galvez et al., 2021). We then augment this
training corpus with data that we collect from openly-available sources
on the web. We discuss preparation methods for our self-collected data
in the following."