SeMuSe: The Future of Semantic Museum Data

Executive Summary

SeMuSe is an open, collaborative, community-based project that works toward a Semantic Museum vision and provides a forum for discussing the future of applied cultural and natural heritage data management. Its members can benefit greatly from advances made in the Semantic Technology community. The goal of SeMuSe is to help organizations and practitioners introduce Semantic Technologies and concepts into cultural and natural heritage data management, and to capitalize on the results of more than a decade of Semantic Technology research. Emerging technology standards such as RDF, RDFS and OWL, combined with domain-specific vocabularies such as museumdat and the CIDOC CRM ontology specification, are a marriage made in Semantic Technology heaven: together they allow semantic cultural and natural heritage data management to reach its full potential. That is the aim of SeMuSe.



SeMuSe [1] is an open, collaborative, community-based project that works toward a Semantic Museum vision and provides a forum for discussing the future of applied cultural and natural heritage data management. The goal of SeMuSe is to help organizations and practitioners introduce Semantic Web technologies and concepts into cultural and natural heritage data management.

In 2000, as part of my PhD research, I started work on a project to bring a significant event in Irish cultural heritage, the Easter Rising of 1916, to the web. At the time Enterprise Ireland, an Irish industry and research funding agency, was ahead of the game in recognizing the potential of the web and what global information and communication technology could bring to the marketplace and to the preservation of cultural identity in the 21st century. Early in the project it became clear that the multifaceted relationships between data for cultural heritage objects and different media artifacts resembled a network-like data structure and would require a new approach to managing distributed data on the web. Around the same time, the Resource Description Framework (RDF) was being envisioned in W3C, European Union and DARPA research projects. By 2008 the growing number of available semantic technology services and suitable tools made it reasonable to predict that RDF, RDFS and OWL would have a tremendous impact on the cultural and natural heritage data management landscape. This was when SeMuSe was formed, to help organizations capitalize on the results of more than a decade of Semantic Technology research.

In the past, institutions like the Getty Foundation acted in many ways as gatekeepers to the riches of our cultural and natural heritage documentation through their proprietary vocabularies, such as the Art & Architecture Thesaurus (AAT), the Getty Thesaurus of Geographic Names (TGN) and the Union List of Artist Names (ULAN). But documentation standards for museums did not exist in the way they had evolved over centuries for archives and libraries in the field of Library and Information Science. This is a result of the short history of museums as independent organizations. In their beginnings and heyday in the 19th century, museums were mostly privately owned organizations, in particular in the US, and many still are. There was no organizational need to define and align a common vocabulary, let alone an ontology, across institutions. Most transactions, such as acquisitions, accessions, loans and exhibitions, were managed with paper documentation and loan slips. This situation is changing with the increasing digitization of archives and collections and the electronic transfer of information between institutions. We are now at the point where domain ontologies meet the web and the open world assumption. This will fundamentally change future content management practice in the field of professional documentation.

On the technical side, the dominance of the relational database in the museum field led to the application of an inappropriate data model for managing network-like data. The not unexpected solution to the problem comes via the CIDOC CRM [2], which was initially conceived as an implementation-agnostic conceptual reference model and originated in the object-oriented research domain. In the 1990s this coincided with the maturing of object-oriented databases and led to a focus on object-oriented implementation solutions. The Semantic Web and its fundamental building block, RDF, introduce a number of data modeling concepts that are not compatible with either conventional relational or object-oriented database approaches. To represent hierarchical data in a relational database, it is common to store path expressions in database records, even though these are not part of the relational database logic. Furthermore, you will find the adjacency list model, data dictionaries, tree traversal algorithms and similar techniques used to deal with linked lists and metadata in database tables. The domain logic was typically customized to accommodate local requirements, which remains an obstacle to data integration and reusability. The introduction of XML eased the situation by providing a common transport syntax for data, taxonomies and classification hierarchies, but it is a far cry from a solution for semantic interoperability. XML vocabularies such as museumdat [3] and CDWA Lite [4] emerged as so-called harvesting formats to help optimize retrieval and publication, and the automatic delivery of core data to museum portals. Museumdat is largely built on CDWA Lite, which was developed by the Getty and others with a specific arts focus. Museumdat now applies to all kinds of object classes, such as cultural, technology or natural heritage objects, is compatible with the CIDOC CRM, and is an outcome of the Documentation working group of the German Museums Association (DMB).
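The difference between a parent-pointer table and a triple-based model can be made concrete with a minimal sketch in plain Python. It is only an illustration under stated assumptions: every identifier in it (`ex:poster_1916`, `ex:depicts` and so on) is invented for the example and comes from no published vocabulary, and a real system would use an RDF library and a triple store rather than Python lists.

```python
from collections import defaultdict, deque

# Adjacency list model: one parent pointer per row, so only a tree fits.
# (record id, parent id); hypothetical sample rows.
adjacency_rows = [
    ("poster_1916", "prints"),
    ("prints", "collection"),
]

# Triple model: any number of typed links between any two resources.
# All names below are hypothetical examples, not a real vocabulary.
triples = [
    ("ex:poster_1916", "ex:partOf", "ex:prints"),
    ("ex:prints", "ex:partOf", "ex:collection"),
    ("ex:poster_1916", "ex:depicts", "ex:easter_rising"),
    ("ex:photo_42", "ex:depicts", "ex:easter_rising"),
    ("ex:photo_42", "ex:digitizedAs", "ex:image_42"),
]


def objects(triples, subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def connected(triples, start):
    """Return every resource reachable from `start`, ignoring link direction."""
    index = defaultdict(set)
    for s, _, o in triples:
        index[s].add(o)
        index[o].add(s)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal of the graph
        node = queue.popleft()
        for nxt in index[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

In this sketch the poster and the photograph are related through the event they both depict, a cross-link that the single parent column of the adjacency list has no place for. Calling `connected(triples, "ex:poster_1916")` walks from the poster to the digitized image without any schema change.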

The tools have now reached a level of maturity that makes it possible to implement the CIDOC CRM appropriately and to make it actionable in a Semantic Technology framework. The following extract gives background information and describes the objective of the current CIDOC CRM specification [5], which provides an ontology with an RDFS-compliant serialization. The CIDOC CRM facilitates the integration, mediation and exchange of heterogeneous cultural heritage data and is the result of work by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). Work on the CIDOC CRM began in 1996 with the endorsement of the ICOM-CIDOC Documentation Standards Working Group. In 2000, the development of the CIDOC CRM was officially delegated by ICOM CIDOC to the CIDOC CRM Special Interest Group, which collaborates with the ISO working group ISO/TC46/SC4/WG9 to bring the CIDOC CRM to the form and status of an international standard. The CIDOC CRM aims to provide the semantic definitions and clarifications needed to transform heterogeneous, localized information sources into a coherent global resource. Its perspective is supra-institutional and abstracted from any specific local context. This goal determines the constructs and level of detail of the CIDOC CRM. More specifically, it defines, and is restricted to, the underlying semantics of database schemas and document structures used in cultural heritage and museum documentation, expressed in terms of a formal ontology. It does not define any of the terminology that typically appears as data in the respective data structures; however, it foresees the characteristic relationships for its use. It does not aim to propose what cultural institutions should document. Rather, it explains the logic of what is actually documented at present, and thereby enables semantic interoperability. It intends to provide an optimal analysis of the intellectual structure of cultural documentation in logical terms. As such, it is not optimized for implementation-specific storage and processing aspects. The aim of the CIDOC CRM [5] is to inform developers of information systems, as a guide to good practice in conceptual modeling, so that they can effectively structure and relate the information assets of cultural documentation. The CIDOC CRM serves as a common language in which domain experts and IT developers can formulate requirements and agree on system functionality with respect to the correct handling of cultural content. It also serves as a formal language for identifying common information in different data formats, in particular to support the implementation of automatic data transformation algorithms from local to global data structures without loss of meaning. Such transformations are useful for data exchange, data migration from legacy systems, information integration and the mediation of heterogeneous sources.
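A transformation from a local data structure to a shared one can be sketched as a small mapping function. The sketch below is an assumption-laden illustration, not the method prescribed by the specification: the local field names (`inventory_no`, `title`, `maker`), the `ex:` identifiers and the `crm:` prefix are hypothetical, and the class and property names follow the naming of the version 5 releases of the CIDOC CRM (for example `E22_Man-Made_Object`), so they should be checked against the release actually in use.

```python
CRM = "crm:"  # placeholder prefix; the real namespace IRI depends on the serialization used


def record_to_triples(record, base="ex:"):
    """Map one flat local record (hypothetical field names) to CRM-style triples."""
    obj = f"{base}object/{record['inventory_no']}"
    title = f"{obj}/title"
    production = f"{obj}/production"
    maker = f"{base}actor/{record['maker'].replace(' ', '_')}"
    return [
        # The catalogued item itself.
        (obj, "rdf:type", CRM + "E22_Man-Made_Object"),
        # The title becomes a resource of its own instead of a column value.
        (obj, CRM + "P102_has_title", title),
        (title, "rdf:type", CRM + "E35_Title"),
        (title, "rdfs:label", record["title"]),
        # The maker is linked through a production event, not directly.
        (production, "rdf:type", CRM + "E12_Production"),
        (production, CRM + "P108_has_produced", obj),
        (production, CRM + "P14_carried_out_by", maker),
        (maker, "rdf:type", CRM + "E39_Actor"),
        (maker, "rdfs:label", record["maker"]),
    ]


# Hypothetical local record, invented for this example.
sample = {"inventory_no": "1916-001", "title": "Proclamation", "maker": "Unknown Printer"}
sample_triples = record_to_triples(sample)
```

The design point is that the event (`E12_Production`) carries the relationship between object and maker. Two institutions with different column layouts can map to the same event-centred pattern, which is what allows their records to be merged without loss of meaning.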

The CIDOC CRM is extensible, and users are encouraged to create extensions for the needs of more specialized communities and applications. The most recent implementation of the CIDOC CRM, the Erlangen CRM [6], is an OWL DL 1.0 implementation, that is, it stays within the Description Logic subset of OWL. The Erlangen CRM is available for download and can readily be used to annotate cultural and natural heritage data. The CIDOC CRM, Erlangen CRM and museumdat are endorsed and actively discussed in the SeMuSe community.
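How an extension plugs into the existing class hierarchy can be sketched with a few subclass statements. This is a simplified illustration in plain Python, not how an OWL reasoner or the Erlangen CRM works internally: the `ext:Herbarium_Sheet` class and the `ex:` instances are hypothetical, the `crm:` prefix is a placeholder, and only a three-class slice of the CRM hierarchy is included.

```python
# Child class -> list of direct parent classes (a class may have several).
SUBCLASS_OF = {
    "crm:E20_Biological_Object": ["crm:E19_Physical_Object"],
    "crm:E19_Physical_Object": ["crm:E18_Physical_Thing"],
    # Hypothetical extension class for a natural history collection.
    "ext:Herbarium_Sheet": ["crm:E20_Biological_Object"],
}


def superclasses(cls, hierarchy=SUBCLASS_OF):
    """Return all direct and inherited superclasses of `cls`."""
    found = []
    stack = list(hierarchy.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(hierarchy.get(parent, []))
    return found


def instances_of(cls, types, hierarchy=SUBCLASS_OF):
    """Return instances typed with `cls` or with any of its subclasses."""
    return sorted(
        inst for inst, t in types.items()
        if t == cls or cls in superclasses(t, hierarchy)
    )


# Hypothetical annotations: instance -> asserted class.
types = {
    "ex:sheet_7": "ext:Herbarium_Sheet",
    "ex:title_1": "crm:E35_Title",
}
```

Under these assumptions, a query for `crm:E18_Physical_Thing` via `instances_of` also finds `ex:sheet_7`, although that instance was only ever annotated with the extension class. A specialized community can therefore add its own classes while generic CRM-level queries keep working.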