My description was not correct. There are 210k works, but not all of them are in the public domain:
"Here you will find high-quality digitized copies of books, manuscripts and other media from the Staatsbibliothek zu Berlin. Where the originals are in the public domain, we provide them with a public domain license. Currently there are 209,779 works in total."
All of them are scans of the original material, and many (if not most) of the works also have OCRed text available. It would be nice if all of this were fed to an algorithm and an AncientChatGPT were created :-)
They are most likely already in ChatGPT. The question is why those and all the other inputs are not really accessible (via a subscription, for example) to the general public...
I worked on this for the library of Uni Würzburg when I was a student. Pretty cool to see it on HN.
We worked on high-quality scans, processing terabytes of raw image material using Akka (this was around 2012, I think), and also built pipelines to run OCR at scale on these scans. Doing this on Fraktur and medieval minuscule scripts was tricky, and we didn't get really good results during my time there.
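For anyone curious about the shape of such a pipeline: a minimal sketch in today's Akka Streams (Scala) might look roughly like the one below. The directory path and the ocrPage stand-in are purely illustrative assumptions, not what we actually used back then (our 2012 stack predated Akka Streams), and a real setup would plug in an OCR engine such as Tesseract at that step.

    import java.nio.file.{Files, Path, Paths}
    import scala.concurrent.Future
    import scala.jdk.CollectionConverters._
    import akka.actor.ActorSystem
    import akka.stream.scaladsl.{Sink, Source}

    object OcrPipeline extends App {
      implicit val system: ActorSystem = ActorSystem("ocr-pipeline")
      import system.dispatcher

      // Stand-in for a real OCR engine call (e.g. Tesseract) -- illustrative only.
      def ocrPage(scan: Path): Future[String] =
        Future(s"<ocr output for ${scan.getFileName}>")

      // Stream scan files from disk and OCR several pages in parallel.
      Source
        .fromIterator(() => Files.walk(Paths.get("/data/scans")).iterator().asScala)
        .filter(_.toString.endsWith(".tif"))
        .mapAsyncUnordered(parallelism = 8)(ocrPage)
        .runWith(Sink.foreach(text => println(text.take(80))))
        .onComplete(_ => system.terminate())
    }

The interesting knob is the parallelism of the OCR stage, which bounds how many pages are processed concurrently while backpressure keeps the file-reading side from flooding memory.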
"Here you will find high-quality digitized copies of books, manuscripts and other media from the Staatsbibliothek zu Berlin. Where the originals are in the public domain, we provide them with a public domain license. Currently there are 209,779 works in total."