Note that the $2,400 covers disks alone. You'd obviously also need chassis, power supplies, and racks, though that's only seventeen 12 TB drives.
Factor in redundancy (I'd like to see triple-redundant storage at any given site, though since sites are redundant across each other, this might be forgoable). Access time and high demand are likely the big factors, though caching helps tremendously.
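To put a rough number on the redundancy point, here is a minimal sketch of what triple redundancy does to the disk count and disk cost. It assumes plain replication (three full copies at one site) rather than parity or erasure coding, and it reuses the 200 TB, 12 TB-per-drive, and $12/TB figures from this comment. The function name `replicated_disks` is made up for illustration.

```python
import math

def replicated_disks(total_tb=200, copies=3, drive_tb=12, usd_per_tb=12):
    """Disk-only totals for `copies` full replicas of the collection.

    Assumption: simple replication, no parity/erasure coding, and no
    allowance for chassis, power supplies, or racks.
    """
    raw_tb = total_tb * copies
    return {
        "raw_tb": raw_tb,
        "drives": math.ceil(raw_tb / drive_tb),  # round up to whole drives
        "disk_usd": raw_tb * usd_per_tb,
    }

print(replicated_disks())          # three copies at one site
print(replicated_disks(copies=1))  # the single-copy baseline
```

Under those assumptions, three copies come to 600 TB on 50 drives, or about $7,200 in disks, which is still a small budget.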
My point is that the budget is small and rapidly getting smaller. For one of the largest collections of written human knowledge.
There are some other considerations:
- If original typography and marginalia are significant, full-page scans are necessary. There's some presumption of that built into my 5 MB/book figure. I've yet to find a scanned book of > 200 MB (the largest I've seen is a scan of Charles Lyell's geology text, from Archive.org, at north of 100 MB), though graphics-heavy documents can run larger.
- Access bandwidth may be a concern.
- There's a larger set of all books ever published, with Google's estimate circa 2014 being about 140 million books.
- There are ~300k "conventionally published" books in English annually, and about 1-2 million "nontraditional" (largely self-published), according to Bowker, the US issuer of ISBNs.
- LoC has data on other media types, and its own complete collection is in the realm of 140 million catalogued items (coinciding with Google's alternate estimate of total books, but unrelated). That includes unpublished manuscripts, maps, audio recordings, video, and other materials. The LoC website has an overview of holdings.
At 5 MB per book, this works out to about 200 TB of disk storage.
At about $12/TB, hosting the entire LoC collection would cost roughly $2,400 presently, with prices halving about every three years.
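The arithmetic is easy to check. A minimal sketch, using only the figures above (5 MB/book, 200 TB, $12/TB, prices halving every three years) and decimal units (1 TB = 1,000,000 MB). The book count it prints is just what 200 TB at 5 MB/book implies, and the function names are made up for illustration.

```python
import math

MB_PER_BOOK = 5      # per-book figure from the comment
TOTAL_TB = 200       # storage figure from the comment
USD_PER_TB = 12      # disk price from the comment
HALVING_YEARS = 3    # price-halving period from the comment
DRIVE_TB = 12        # drive size used for the drive count

def implied_books(total_tb=TOTAL_TB, mb_per_book=MB_PER_BOOK):
    # Decimal units: 1 TB = 1,000,000 MB.
    return total_tb * 1_000_000 // mb_per_book

def disk_cost(years_from_now=0.0, total_tb=TOTAL_TB, usd_per_tb=USD_PER_TB):
    # Assumes the halving trend simply continues; disks only.
    return total_tb * usd_per_tb * 0.5 ** (years_from_now / HALVING_YEARS)

def drives_needed(total_tb=TOTAL_TB, drive_tb=DRIVE_TB):
    return math.ceil(total_tb / drive_tb)

print(implied_books())    # books implied by 200 TB at 5 MB each
print(disk_cost())        # disk cost today
print(disk_cost(3))       # disk cost after one halving period
print(drives_needed())    # whole 12 TB drives required
```

That gives 40 million books, $2,400 today, $1,200 three years out if the trend holds, and 17 drives.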