210k works in the Public Domain, searchable (staatsbibliothek-berlin.de)
174 points by doener on May 7, 2023 | 10 comments



My description was not correct. There are 210k works, but not all of them are in the Public Domain:

"Here you will find high-quality digitized copies of books, manuscripts and other media from the Staatsbibliothek zu Berlin. Where the originals are in the public domain, we provide them with a public domain license. Currently there are 209,779 works in total."


No, the German clearly states the 210k works as the number of those being in the public domain.


Ah, interesting – yes.


All of them are scans of the old material, but many/most of the works also have the OCRed text available. It would be nice if all these were fed to an algorithm and an AncientChatGPT was created :-)


They are most likely in ChatGPT already. The question is why those and all the other inputs are not really accessible to the general public (by subscription, for example)...


See also https://digitale-sammlungen.de, 1.2 million works from the Bavarian collections. Unfortunately not Public Domain :-/


I worked on this for the library of Uni Würzburg when I was a student. Pretty cool to see it on HN.

We worked on high-quality scans, processing terabytes of raw image material using Akka (this was around 2012, I think), and also created pipelines for high-performance OCR at scale on these scans. Doing this on Fraktur and medieval minuscule scripts was tricky, and we didn't get really good results during my time there.


Another thing I worked on was creating interesting visualizations of those books, e.g. http://vb.uni-wuerzburg.de/ub/lskd/regal.html

It was first created using Adobe Flash and then later ported to JS with WebGL/Three.js. God, this looks silly in retrospect.



Related: searchable digitized manuscripts at https://handschriftenportal.de



