How can people help? Sounds like what's needed is a global index of sources, with the work of validating those sources parceled out over time. Without something coordinated, I feel like it is futile to even jump in.
I spent a bunch of time on this project feeling like it was futile to jump in and then just jumped in; messing with data is fun even if it turns out someone else has your data. But the government is huge; if you find an interesting report and then poke around for the .gov data catalog or directory index structure or whatever that contains it, you're likely to find a data gathering approach no one else is working on yet.
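For the "poke around the directory index" part, here's a rough sketch of what that can look like, assuming the server exposes a plain Apache/nginx-style HTML listing. The function name, the sample listing, and the `agency.example` URL are all made up for illustration; a real catalog may use a JSON API or a completely different layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_index_links(html, base_url):
    """Return absolute URLs for the entries below base_url in a listing page."""
    parser = _LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Skip column-sort links ("?C=N;O=D") and in-page anchors.
        if href.startswith("?") or href.startswith("#"):
            continue
        url = urljoin(base_url, href)
        # Keep only entries below the index itself (drops "Parent Directory").
        if url.startswith(base_url) and url != base_url:
            links.append(url)
    return links


# Hypothetical listing, shaped like a typical auto-generated index page.
SAMPLE_INDEX = """
<html><body><h1>Index of /data</h1>
<a href="?C=N;O=D">Name</a>
<a href="/">Parent Directory</a>
<a href="reports/">reports/</a>
<a href="2020.csv">2020.csv</a>
</body></html>
"""

if __name__ == "__main__":
    for link in list_index_links(SAMPLE_INDEX, "https://agency.example/data/"):
        print(link)
```

Links ending in `/` are subdirectories you could feed back into the same function; everything else is a candidate file to record before downloading anything.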
There are coordinated efforts starting to come together in a bunch of places: some on r/DataHoarder, some around specific topics like climate data (EDGI) or CDC data, and there are datasets being posted too. I think one way to help is to find a topic or kind of data that seems important and search around for who's already working on it. Eventually maybe there'll be one answer to rule them all, but maybe not; it's just so big.