DC was massive in museums in the 90s. I think there are still remnant uses today - stuff like schema.org is a good reason for thinking about on-page metadata, and DC was a part of that.
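For anyone who never saw it in the wild, a minimal sketch of that on-page markup, using the classic `DC.` meta-tag convention (all the values here are invented for illustration):

```
<!-- Dublin Core embedded in an HTML <head>; values are invented -->
<head>
  <meta name="DC.title" content="Difference Engine No. 2" />
  <meta name="DC.creator" content="Science Museum Group" />
  <meta name="DC.date" content="1991" />
  <meta name="DC.type" content="PhysicalObject" />
  <meta name="DC.identifier" content="https://example.org/objects/1862-89" />
</head>
```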
I was involved with a gov / lottery funded series of projects called “New Opportunities Fund” [0] which mandated DC markup. The exciting idea for us at the time was that we’d be able to create simple cross-site searchable assets. So we (I was at The Science Museum at the time) could create our project with 30k museum records in it, the NHM could make theirs and then ultimately someone could make a “portal” (ah, the 90s phrases are flooding back…) where users could search across all the NOF funded sites.
To the best of my knowledge the portal part was never made - we all did the dc bit but nothing global emerged from NOF.
There is (to this day) a conversation about how to allow this sort of interop across museum data. On the one side are SemWeb types, on the other lightweight microformat types. I’m massively oversimplifying - but this is sort of how it goes.
I’ve always been fairly much in the latter camp. DC and microformats are incredibly crude - when you say an object has a “date” for example it clearly needs qualifying. Date it was made? Found? Used? Bought? Etc… - BUT to me it’s better to have this crude description than the alternative which is some kind of deeply complex (and thus never to be agreed / implemented across tens/hundreds/thousands of museums) sort of “perfect” standard.
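To be fair, the later qualified /terms/ vocabulary does refine "date" a little (created, issued, modified and so on), though it still has nothing for "found", "used" or "bought". A sketch in Turtle, with an invented object URI and dates:

```
# Qualified DC date properties; the URI and values are invented
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://example.org/object/42>
    dcterms:created "1802" ;         # date the object was made
    dcterms:issued  "1832" ;         # date it was formally made available
    dcterms:date    "19th century" . # the crude catch-all when nothing fits
```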
Of course nowadays much of this is made irrelevant by good search, and the way the majority of people search the web. A general audience doesn’t actually want to search for all paintings made by x artist on y date - and when they do they’re content (for good or bad) to settle with Google results. And I guess AI will help. Maybe.
There are still a lot of data interop projects in museums and cultural heritage - stuff like the Museums Data Service [1] is brand new, TANC [2] has been going a while - and many more out there.
Perhaps we're seeing a resurgence of interest in good metadata as a result of the decline in quality of search?
> A general audience doesn’t actually want to search for all paintings made by x artist on y date
This is a common fate of cataloging systems: they are made by archivists, for archivists. Most people aren't archivists; they are looking for something in the context of their particular use for the information.
Sönke Ahrens in his book "How to Take Smart Notes" says,
"Do they wonder where to store a note or how to retrieve it? The archivist asks: Which keyword is the most fitting? A writer asks: In which circumstances will I want to stumble upon this note, even if I forget about it? It is a crucial difference."
My disappointment with Dublin Core was that it was more of what you'd expect from an elementary school library as opposed to a university library. Specifically, it never implemented a way to (1) use authority records and (2) specify the order of the authors. It was the kind of thing that wrecked people's perception of "the semantic web" before it even got started.
At the same time, it seems like in some circles Dublin Core is what got metadata standardization started.
I encountered it as an older format in the mid-2010s when working on data discoverability for geospatial datasets. At the time, we were emitting Dublin Core as one format for our datasets in the archive. We were actively transitioning to supporting a more fully featured FGDC format as well as diving into the ISO 19115 standard and translating to that.
In all, I think Dublin Core was, at one point, useful. However, metadata standards have moved forward with more specialized schemas that are more useful for discoverability.
I also was massively disappointed. I think it was the "we don't want to argue, so let's put the least argumentative join over everything said in the room that one time" outcome.
If you ever play with date-time in images and dive into EXIF date-time encoding, you enter the door of: "yes, we know people need to say 'around 1800', but we've decided not to make a canonical system for indicating approximate dates, ante- or post-dates, or the necessary mapping into YYYY:MM:DD HH:MM:SS, so instead you can come up with your own non-standard".
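For contrast, the Library of Congress's Extended Date/Time Format (EDTF, since folded into ISO 8601-2) is one attempt at exactly such a canonical system. A few illustrative strings:

```
EXIF DateTimeOriginal    2024:06:15 14:32:07    (one fixed, fully specified format)

EDTF (ISO 8601-2)
  1800~                  approximately 1800
  1800?                  possibly 1800, uncertain
  ../1800                interval with an open start, i.e. 1800 or earlier
  18XX                   some year between 1800 and 1899
```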
I only knew of Dublin Core as it relates to image metadata. I was told it was popular among photo-journalists (it has tags: publisher, keyword, creator). The more nerdy EXIF metadata is what the camera often provides and tells you about how the image was taken (f-stop, shutter speed), not who or what.
Now if there were some universal standard for embedding alt-texts into a picture that would also survive right-click "copy image" and paste-uploading... not just for social media (where you'll get blasted for forgetting one even just one time) but also for websites and newspapers where the image description has to be edited manually in a CMS and is, again, not attached to the picture itself...
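There is at least a candidate: IPTC added accessibility fields (Alt Text and Extended Description) to its photo metadata standard in 2021, carried in the image's embedded XMP packet - whether a given platform's copy-paste pipeline preserves them is another question. A sketch of the XMP, assuming I have the property name right, with invented text:

```
<!-- XMP fragment using IPTC's Alt Text (Accessibility) property; text invented -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:Iptc4xmpCore="http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/">
  <rdf:Description rdf:about="">
    <Iptc4xmpCore:AltTextAccessibility>
      <rdf:Alt>
        <rdf:li xml:lang="x-default">A brass orrery on a wooden desk.</rdf:li>
      </rdf:Alt>
    </Iptc4xmpCore:AltTextAccessibility>
  </rdf:Description>
</rdf:RDF>
```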
I work with alt text a lot. I used a hinted LLM prompt for the first pass, and then I edit if needs be. Sometimes it misses obvious details, other times it sees things (most) human eyes would never see. Like all tools, just have to be careful how you use it.
OCLC’s research staff were instrumental in the development of the initiative, starting with a hallway conversation at the 2nd International World Wide Web Conference in late 1994. OCLC researchers Stuart Weibel and Eric Miller, OCLC Office of Research Director Terry Noreault, Joseph Hardin of the National Center for Supercomputing Applications (NCSA), and the late Yuri Rubinsky of SoftQuad, were remarking about the difficulty of finding resources on the Web. Their discussion provided the impetus for development of the Dublin Core Metadata Element Set, now known simply as “Dublin Core,” now an international metadata standard.
Not really Dublin Core itself - it is one of the more sedate ontologies. But to keep up with the pack they did, err, refine and expand into "terms" and "elements", and not cleanly, so we will forever have the same labels for nodes in different Dublin Core namespaces.
You can have academic reasons till the cows come home but the bottom line is:
You had one job.
```
The four DCMI namespaces are:
http://purl.org/dc/elements/1.1/ The /elements/1.1/ namespace was created in 2000 for the RDF representation of the fifteen-element Dublin Core and has been widely used in data for more than twenty years. This namespace corresponds to the original scope of ISO 15836, which was published first in 2003 and last revised in 2017 as ISO 15836-1:2017 [ISO 15836-1:2017].
http://purl.org/dc/terms/ The /terms/ namespace was originally created in 2001 for identifying new terms coined outside of the original fifteen-element Dublin Core. In 2008, in the context of defining formal semantic constraints for DCMI metadata terms in support of RDF applications, the original fifteen elements themselves were mirrored in the /terms/ namespace. As a result, there exists both a dc:date (http://purl.org/dc/elements/1.1/date) with no formal range and a corresponding dcterms:date (http://purl.org/dc/terms/date) with a formal range of "literal". While these distinctions are significant for creators of RDF applications, most users can safely treat the fifteen parallel properties as equivalent. The most useful properties and classes of DCMI Metadata Terms have now been published as ISO 15836-2:2019 [ISO 15836-2:2019]. While the /elements/1.1/ namespace will be supported indefinitely, DCMI gently encourages use of the /terms/ namespace.
http://purl.org/dc/dcmitype/ The /dcmitype/ namespace was created in 2001 for the DCMI Type Vocabulary, which defines classes for basic types of thing that can be described using DCMI metadata terms.
http://purl.org/dc/dcam/ The /dcam/ namespace was created in 2008 for terms used in the description of DCMI metadata terms.
```
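The practical upshot of that split, sketched in Turtle (the resource URI and value are invented):

```
# The same "date" in both DCMI namespaces; resource and value invented
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<https://example.org/doc/1>
    dc:date      "2003" ;    # /elements/1.1/ - no formal range declared
    dcterms:date "2003" .    # /terms/ - formal range of rdfs:Literal
```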
Dublin Core is the main metadata schema for many institutional repositories, for example the DSpace platform https://github.com/DSpace/dspace. The schema essentially only covers basic bibliographic metadata and has a strong pre-digital library feel to it. We end up augmenting with other custom schemas to be able to describe content in our repository, for example podcasts and journal articles with different issue and online dates, as well as extra metadata like author affiliations, funders, internal programs etc.
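For readers who haven't touched DSpace: its default registry is a qualified flavour of DC, and the custom additions mentioned above typically live in a separate schema alongside it. A few typical field names (the `local.` ones are invented examples of such custom fields, not DSpace defaults):

```
dc.title                   # stock qualified-DC registry
dc.contributor.author
dc.date.issued
dc.identifier.uri
local.author.affiliation   # invented: custom schema for affiliations
local.funder.name          # invented: custom schema for funders
```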
A summary: it's an idea that has repeatedly failed - that one day we could reach something like semantic search thanks to carefully written metadata, classifying everything so that any single information atom (a book, a report, a map) is easy to retrieve, or even so that we can find just the bit of information we're after not in a single atom but across many.
Another Library of Babel / Bibliotheca universalis, like Conrad Gessner's (~1545). Unfortunately, while in theory the system could work, in practice we can't ensure anyone uses it well, and some metadata to classify everything coherently is far from enough.
DC was an immense diplomatic effort in library science, still with a damn limited practical outcome.
EPUB (even v3) uses some Dublin Core metadata. Unlike the page the thread refers to, it's presented as `<dc:title>` rather than `DC.Title`, but other than that it's the same thing. Only a handful of tags are used.
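A minimal sketch of how that looks in an EPUB 3 package document (the identifier and title values are invented; `dcterms:modified` is the one required property that isn't a `<dc:*>` element):

```
<!-- content.opf fragment; identifier and title values are invented -->
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:12345678-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>An Example Title</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2024-01-01T00:00:00Z</meta>
  </metadata>
</package>
```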
The article gives a simple reason: some things support it. If you want nice snippets in Google search, Instagram, Zotero etc., you can use it.
The real reason was always "so you can publish data people can use programmatically", but it turns out this doesn't work because the interests of publishers ("visit my site") and the interests of consumers ("I want an answer") are not aligned.
It seemed people in the 2000s had some sort of mystical belief about computers: that if you took all your data and put it in some kind of XML format, it would give you good karma.
[0] https://www.gov.uk/government/organisations/new-opportunitie... [1] https://museumdata.uk/ [2] https://www.nationalcollection.org.uk/