I don't understand why archive.org keeps so many formats (.ogg and .mp3, .pdf and .jpg) for the same resource. Why not just keep whatever the original format was? Sometimes it seems like there are a dozen or so formats.
There are probably a lot of reasons, but I don't know where to chat about it to learn more.
edit: now that I think about it, if the original is a PDF, then JPEG makes sense for loading one page at a time in the browser. But isn't transcoding mp3 to ogg something that could reasonably be left up to the user?
The derivatives are auto-generated for easier access by various clients. In the metadata, each file is marked as original or not original, so the two can be told apart later and the derivatives can perhaps be re-rendered.
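For the curious, here's a rough sketch of how you might tell originals from derivatives yourself. It assumes the per-file `source` field ("original" / "derivative") and the `original` field on derivatives that show up in the JSON from archive.org's metadata endpoint (`https://archive.org/metadata/<identifier>`). The sample file list and the `split_by_source` helper are made up for illustration, not real item data.

```python
# Hypothetical file entries shaped like the "files" list returned by
# archive.org's metadata endpoint; names and values here are invented.
sample_files = [
    {"name": "book.pdf", "source": "original"},
    {"name": "book_jp2.zip", "source": "derivative", "original": "book.pdf"},
    {"name": "book_djvu.txt", "source": "derivative", "original": "book.pdf"},
    {"name": "book_meta.xml", "source": "metadata"},
]


def split_by_source(files):
    """Return (list of original filenames, {derivative filename: its original})."""
    originals = [f["name"] for f in files if f.get("source") == "original"]
    # Derivatives are assumed to name the file they were generated from.
    derivatives = {
        f["name"]: f.get("original")
        for f in files
        if f.get("source") == "derivative"
    }
    return originals, derivatives


if __name__ == "__main__":
    originals, derivatives = split_by_source(sample_files)
    print("originals:", originals)
    for name, parent in derivatives.items():
        print(f"{name} <- {parent}")
```

To run it against a real item, you'd fetch that JSON (e.g. with `urllib.request` and `json.load`) and pass its `files` list in instead of the sample.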
So, then, the convenience outweighs the disk space cost?
And also, I guess, transcoding to the less popular formats on the fly is still too CPU-intensive for large files (zips of hundreds of jp2 images, etc.).
Guess I'm looking for a magic bullet where there might not be one. I just want to see Archive.org keep doing what it's doing far into the future.
Who knows, maybe we'll stumble on some magic bullet: new compression algos (Zstandard? AVIF? AV1?), or user clouds for compression (like BOINC; Archive.org already lets users help with bandwidth costs via BitTorrent). Just thinking out loud.
Anyway, keep doin' what you're doing. And podcasting. That's good too :)