The British Library puts 1M newspaper pages online for free (ianvisits.co.uk)
324 points by aries1980 on Aug 11, 2021 | 53 comments



Interesting facts about the British Library:

- It requires every physical book published in the UK to be collected by the Library (since 1662)

- It has 60 million individual newspaper editions

- In 1999, the Library earmarked 60,000 volumes of non-British newspapers for disposal because it was running out of storage space (inviting criticism)

- The newspapers were offered to overseas museums, or put up for auction. But the short notice given to museums meant many were unable to accept them (they also needed time to free up physical space)

- The American writer Nicholson Baker used his own retirement money to purchase "2000 bound volumes of American newspapers - the last remaining copies in the world - including a complete run of the Chicago Tribune from 1888 to 1958 and hundreds of editions of Joseph Pulitzer's ground-breaking colour broadsheet of the 1890s, the New York World." [1]

- The physical copies of the American newspapers were saved and became part of the American Newspaper Repository [2], a non-profit organisation which Baker founded. In 2004, the collection moved to Duke University.

- Baker went on to publish a book about the whole affair in 2001 called Double Fold: Libraries and the Assault on Paper. The Guardian published an interview with him in 2002 (below)

[1] Paper Chase: https://www.theguardian.com/education/2002/mar/22/museums.re...

[2] From an archived copy of the American Newspaper Repository website: "Research libraries everywhere, including the Library of Congress, the New York Public Library, and the Center for Research Libraries, have replaced most of their often richly illustrated sets of late 19th and 20th century newspapers with black and white microfilm."


> It requires every physical book published in the UK to be collected by the Library

Legal deposit is more the rule than the exception for national libraries. Many national libraries are also saving copies of the nationally relevant web.


This is great. Historical newspapers are one of the largest corpora of information that has yet to be adequately brought on line.

In the U.S. the Library of Congress has digitized a fair number, but at the state and local level it's really hit or miss. Some states such as California and New York have put quite a bit on line, but many others rely on individual towns and historical societies.

Different pay services cover various papers, but there has never been concerted effort to digitize the staggering amount of microfilm that is out there.


> Historical newspapers are one of the largest corpora of information that has yet to be adequately brought on line.

Not just information, but works of art as well!

A few years ago, I trained an image classifier to help me find Krazy Kat comics in newspaper archives. In the process of doing that, I came across a shocking amount of other comics and artwork. I was honestly surprised to see how many amazing illustrations and comics are just sitting in newspaper archives, waiting to be rediscovered.


Fascinating. Fantagraphics (https://www.fantagraphics.com/collections/newspaper-comics) have done great work restoring and publishing such classics, but there might be much more to discover.


I wonder what ever happened to all the newspapers that were fed into services like CompuServe in the 80's.

I dated a newspaper reporter during that era, and all of her stuff went into the online services. But her newspaper's current online archive only goes back to about 2005, even for subscribers.



Thanks, it's nicely parametrised and easy to browse; shame they decided to require free registration in order to actually view anything though.


You only get to see three pages for free. The original submission title is really misleading:

  >Try the British Newspaper Archive for FREE

  >View 3 pages FREE when you register to help you get started


Presumably that copy refers to how the site used to work prior to today, when they announced the new million free pages: https://blog.britishnewspaperarchive.co.uk/2021/08/09/introd...

That blog post has a screenshot of an interface where you can choose to search either in "Free To View" or "Subscriber Access" - so presumably you now also get 3 free "subscriber access" pages.

Confirmed: I created an account and I'm able to view unlimited pages with the "Free To View" filter.


I just came back from - actually, physically - visiting the British Library last week to do some research in the newsroom with microfilm and all.

The title piqued my interest, but looking at the details, the selection of papers they're adding seems kinda meh. They are mostly the sort of local papers that British Newspaper Archive always had, and it still can't compete with the horrendously proprietary Gale and ProQuest archives, which have national papers (Guardian, Observer, etc.) and require physically turning up to the library to use.

I used to have a pay-as-you-go subscription to BNA, instead of the monthly pay option that I figured I wouldn't make the most of, but it quite scandalously "expired".


Trove does this for Australian newspapers

https://trove.nla.gov.au/


And because the newspapers were syndicated they also have the big stories around the world.

Like the Carrington Event & Krakatoa, or Alexander Graham Bell's firewall to stop people stealing electricity, or pirate attacks on junks off Hong Kong.


Latvia has its own digital library of periodicals. It was digitized over a few projects and made available to general readers (via periodika.lv) for free, except for the last 60 or so years, which were subject to copyright. However, when the pandemic came, the digital library became available to everyone without any copyright restrictions.


I wish more institutions 'tested the waters' with these copyright laws.

Put a newspaper up from 100 years ago and see if anyone complains; if they do, take it down. Subtract 10 years every year until someone does complain.
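To make that probing strategy concrete, here is a rough Python sketch. The function name and the `draws_complaint` callback are made up for illustration (in reality that would be "wait a year and see whether anyone objects"); only the 100-year start and the 10-year step come from the comment.

```python
def find_cutoff_age(draws_complaint, start_age=100, step=10):
    """Return the youngest age (in years) that was published without complaint.

    draws_complaint is a hypothetical callback: given the age of the
    material just put up, it returns True if someone complained.
    Returns None if even the first attempt drew a complaint.
    """
    age = start_age
    youngest_safe = None
    while age > 0:
        if draws_complaint(age):
            # Someone complained: take this batch down and stop probing.
            break
        youngest_safe = age
        age -= step  # next year, try material that is `step` years younger
    return youngest_safe


if __name__ == "__main__":
    # Pretend rights holders only care about material younger than 70 years.
    print(find_cutoff_age(lambda age: age < 70))
```

With one step per year this takes a decade to get anywhere, which is arguably the point: it never publishes more than one step beyond what has already been tolerated.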


I think you underestimate how assiduous copyright lawyers are. The automated systems would spot the post and prompt a complaint within minutes.


Do they really have automated systems searching for 50+ year old newspapers? Probably a lot of them haven't even been digitised before, so it would be impossible to search for them in an automated way.


From personal experience, I can tell you that there are automated systems searching for such pieces of art.

As for newspapers, all it takes is one copyright troll to realize it's happening, and suddenly there are lawsuit settlements everywhere, making him rich.


The trolls can always try. OTOH I would be a little surprised if a) one could intimidate The British Library by just sending some silly nastygrams, or b) The British Library either doesn’t have in-house counsel or didn’t consult competent law firms before publishing these papers.


For that to happen they'd have to be the owners of the Copyright, they're not exactly patent trolls who can use the law creatively.

The newspapers are largely regional (and mostly defunct) papers. Who at the Derby Evening Telegraph is going to waste lawyers' fees getting a library to remove 100-year-old content?


The Derby Evening Telegraph may have sold the rights of their archives to an entity (like ProQuest) which has a financial interest in finding and shutting down competition.


Patent trolls own the patent.


Yes but they can creatively say it applies to a whole manner of tenuous things. A sneaky newspaper owner can’t claim copyright on anything other than what they own


Hm… I’m not convinced about the likeliness of that outcome. Automated systems like that are mostly used on YouTube/Facebook, aren’t they?


Does anyone know of any existing effort to convert these scanned images into a text corpus (a new OCR model would probably need to be developed for this old text)? I think they would be more usable in text form for search and research purposes.


Well when Apple releases the next OS it will automatically OCR all images, so one possibility is just downloading them all on an Apple device.


It already is. I do text searches all the time here and have paid for a subscription for a while now.


It's fascinating to read the British account of the American Revolutionary War in their newspapers. Many people were sympathetic, and it wasn't the top story most days. Just some trouble in the colonies.

See: https://foreignpolicy.com/2012/07/03/how-did-the-british-pre...


> The British Library keeps to a ‘safe date’ when determining when a newspaper can be considered to be entirely out-of-copyright, which is 140 years after the date of publication.

It's depressing that copyright has been extended so far it's now longer than any single individual's possible lifetime.


This needs to change, big time.

There is almost no cash value to an article one day later, yet we completely impoverish the public domain for its sake. Not only are creators of valuable works usually pretty distant from direct ownership anyway, there's no possible way for them to profit directly from this work. The only way a Spotify-like deal works is because copyright ownership is conglomerated.

IMO, public (especially free-as-in-beer) access to newspaper archives could be pretty liberally justified on fair use grounds.


> There is almost no cash value to an article one day later, yet we completely impoverish the public domain for its sake.

We're not imposing publishing restrictions on past works in order to preserve their cash value. We're imposing publishing restrictions on past works in order to stop them from competing with present works. It keeps the cash value of present works up.


That makes even less sense, though. Yesterday's news does not really compete with today's news, they're effectively in separate markets.


So? Yesterday's books, movies, and music compete with today's books, movies, and music. Nobody cares about the news one way or the other. So the news gets treated just like everything else.


There are virtually unlimited quantities of free to consume media - TV, Books, Movies. People don't pay more money for new creations because the older works aren't published or available.

Relaxing copyright restrictions after a more reasonable period (seven years) doesn't take value away from new work, it enriches the public commons in a way that encourages the creation of new culture.

A tiny portion of cultural works remain profitable after the first few years. Grant seven years of copyright automatically, and let creators purchase extensions in five year blocks that increase in price. $100 for five additional years, $1000 for five years after that, $10000 for five more, $100000 for five more. It makes no sense to barricade all cultural works behind a copyright wall to protect the tiny slice of properties that continue to pay dividends after a generation.
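For a sense of how that fee schedule adds up, here is a small Python sketch. The 7 automatic years, the 5-year blocks and the $100 starting fee are the figures from the comment; the function names are made up, and continuing the 10x pattern past the four blocks listed is an assumption.

```python
AUTOMATIC_YEARS = 7      # granted for free
BLOCK_YEARS = 5          # length of each purchased extension
FIRST_BLOCK_FEE = 100    # dollars for the first extension
FEE_MULTIPLIER = 10      # assumption: every block costs 10x the previous one


def block_fee(n):
    """Fee in dollars for the n-th five-year extension (n starts at 1)."""
    if n < 1:
        raise ValueError("block number must be 1 or greater")
    return FIRST_BLOCK_FEE * FEE_MULTIPLIER ** (n - 1)


def total_cost(blocks):
    """Cumulative fee for buying the first `blocks` extensions."""
    return sum(block_fee(n) for n in range(1, blocks + 1))


def total_term(blocks):
    """Years of protection after buying `blocks` extensions."""
    return AUTOMATIC_YEARS + BLOCK_YEARS * blocks


if __name__ == "__main__":
    for blocks in range(5):
        print(f"{total_term(blocks):>2} years of protection: ${total_cost(blocks):,}")
```

Under these numbers, holding a work for 27 years costs $111,100 in total, which is trivial for a property that is still earning and prohibitive for almost everything else.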


Well... the fair use case and/or the public good case are pretty strong for news.


This.

Even if the works are not economically valuable, people's attention is, as is gatekeeping control over archives (allowing one to set narratives and agendas and control context).

The time I spend going through an archive (I immediately hit the 3-article registration wall at the BLNA, so ... little time) is time I'm not spending consuming the present-moment adverts-laden infotainment stream.


> There is almost no cash value to an article one day later, yet we completely impoverish the public domain for its sake.

If there is no value to an article one day later, it’s literally impossible that restricting it impoverishes anyone.

Copyright law is messed up, but demanding access to something by asserting it’s worthless makes no sense.


>If there is no value to an article one day later, it’s literally impossible that restricting it impoverishes anyone.

You have mistakenly generalized "cash value" to "value" without reason. I find a ton of value reading old media, because it allows one to "play the historian", you get a unique and magical ability to experience an era just like its natives have done, except not with their outlook on life or assumptions. It's the closest you can get to traveling to a far foreign country.

This is value, but not cash value. Releasing this media impoverishes no one (in terms of cash value), but withholding it massively impoverishes people like me (in the sense of value).

>Copyright law is messed up,

10^grahams_number amen to that, it's bloody ridiculous to withhold something decades after its creator died and their family is now in the 5th generation and probably don't know the thing ever existed.

Copyrights, patents and intellectual property are a blight upon the earth, a massive bug in civilization's conception of ideas and knowledge that no one is willing to report.


>This is value, but not cash value.

I agree then that copyright on the news should expire a lot sooner, but really if people (as a general category) find value in something they (as a general rule) will pay some amount to have it. It may not be a great amount, and you may not be one of the people who will pay for it despite valuing it (many people who value music won't pay for it, but there are obviously people who do value it and will pay).

I am not arguing that there should be copyright on this media, just that what the parent commenter said is correct: you can't claim that it holds some form of value without also admitting that someone will pay (another form of value) for it.

In point of fact, now that I think of it, I have some old hardbound police gazette collections that were republished sometime in the last 20 years from 100+ years ago. I paid for those.


Economically, numbers close enough to zero are zero. Basically every method of value extraction has overhead, so something can both be valued and have no economic value.


> every method of value extraction has overhead so something can both be valued and have no economic value

Right. Which is why I don't bother selling on ebay or amazon, it isn't worth the time, postage, or trip to the post office. It just goes to the thrift store, as they have a business model that enables them to eke out a profit on such items.


Yet books that compile publications from previous centuries are published and bought, but evidently the ledgers are not debited or credited for the parties to these transactions. Basically, if a method of value extraction has so much overhead that it has no economic value, I would expect that method to disappear; thus, when methods of value extraction are still in use, I think that someone has managed to derive economic value from that method - perhaps a very paltry economic value in comparison to some of the companies we discuss here quite frequently - but still some value.


The National Library of Sweden has a similar website, but for newspapers printed in Sweden. They write "Copyright protection is valid for 115 years on the day. The free material is moved forward by one day every day." I get that the likes of Disney have an interest in extending it. Maybe a middle ground could be the ability to apply for extensions of individual works instead of a universal blanket extension? This long period really hampers my ability to do research on 1700s literature - a lot of such research was done in the early 1900s. But that content, even if it's 100 years old, just isn't indexed anywhere (e.g. Google Books) due to copyright, even if it is already digitized. https://tidningar.kb.se
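The rolling 115-year window quoted above is easy to express as date arithmetic. A minimal Python sketch, with made-up function names; it assumes an issue becomes free on the 115th anniversary itself and that a 29 February issue rolls over to 1 March, neither of which the library's wording settles.

```python
from datetime import date

PROTECTION_YEARS = 115  # figure quoted from the library's site


def add_years(d, years):
    """Return the same calendar day `years` later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February with no counterpart in the target year.
        # Assumption: treat the anniversary as 1 March.
        return date(d.year + years, 3, 1)


def is_free_to_view(published, today, years=PROTECTION_YEARS):
    """True once the protection window for an issue has elapsed."""
    return add_years(published, years) <= today


if __name__ == "__main__":
    today = date(2021, 8, 11)
    print(is_free_to_view(date(1906, 8, 11), today))  # anniversary reached
    print(is_free_to_view(date(1906, 8, 12), today))  # one day short
```

Swapping `years` for 140 gives the British Library's more conservative 'safe date' mentioned elsewhere in the thread.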


Wasn't it Disney and other media companies that pushed copyright duration towards this 140 year value? I vaguely recall that as a driving factor, though I may be wrong and it may be one of those unproven theories or meme'd news of times past.

It would be ironic if so (a children-focused company like Disney pushing through a change in law that in the end harms children, since it keeps works locked under copyright for their entire lifetime, which shows little thought for the children), and it would sure add a whole new spin to the "think of the children" sound bite often used to push through some change in laws or rules.

Still, somewhat sad. More so as we construct buildings today to a standard that, by design, does not last that many years.


AFAIK the 140 years is just an arbitrary value the British Library have used; it isn't part of copyright law. The UK has used a life + X[1] years system since the 1840s[2]. I guess they assume 140 years is enough to ensure any unknown / untraceable authors have been dead for long enough for a work to be near unquestionably in the public domain.

[1] Where X has steadily increased with each copyright law revision.

[2] The US was a weird outlier on this for a long time after most of the world settled on this system.


It's even worse than that, from the failed registration page:

    View 3 pages FREE when you register to help you get started
    Explore hundreds of national, regional and local titles dating from the 1700s-2000s
    Search, save and organise your favourite topics


Oh. It's worse than that. Copyright apparently extends back to the middle ages:

https://stiobhart.net/2021-04-8-bubble-trouble/


The article does not say that. It conflates copyright and trademarks in a couple of places, but it's clear that this is a combination of Redbubble applying their own over-cautious policies and trademark registrations, not copyright.

There are plenty of issues with trademarks too, but they are very different to copyright.


I think the nub of the article is that the artist had his artwork banned just for referring to the fact that his designs were inspired by a mediaeval manuscript, i.e. 'The Book of Kells', because Trinity College have trademarked 'The Book of Kells' to use as a brand name on products they sell.

As it says in the article

  >...So, are [Trinity College] actually saying that no one is even allowed to mention "The Book of Kells", for fear of violating their copyright? If so, that’s really going to fuck up a lot of History books!...

I take on board what you mean about the distinction between Copyright and Trademark, but I think it still serves to illustrate just how ridiculous the legal minefield around both issues has become.


It's almost as if they're slowly declassifying documents over time.


A funny thing is that, under actual US classification guidance, the default declassification length is 10 years from publication, and the maximum length without a special appeal is 25 years: https://ustr.gov/sites/default/files/foia/Classification%20G...


The irony is that they gave money to the digitisation of newspapers, but this outfit then charged money for it and cut out the general public.

Hopefully this will be like Australia’s Trove.


Only 3 pages are free. To be able to view more you need to pay a monthly subscription.



