From Athenawiki

General context

In this part we present the general context of a museum dealing with its own collection. This background will help you understand our general approach and benefit fully from our recommendations. To make them easier for museum staff to follow, we address the reader directly, as someone who works in a museum and has no strong background in information engineering or linguistics. So from now on: "you are a museum representative". At each step of the presentation we also illustrate our recommendations with examples, to make the information more concrete.

The following information was gathered and consolidated throughout the WP4 activity. Here we summarise the basic knowledge in five sub-parts, each answering a simple question:

  • What technological reality are you working in?
  • What do we call "terminology management" in the case of museums?
  • What is a datamodel in relation to terminology management?
  • What are the different types of terminology you can use?
  • How are a terminology and a datamodel connected?


Technological reality

Social Web

Nowadays you are certainly aware of, and probably familiar with, the so-called Social Web or Web 2.0. As an evolution of the original Web, Web 2.0 has allowed networks of people to emerge, meeting and exchanging instantly online on platforms such as Facebook, Twitter or LinkedIn. Having first offered access to information spread around the world, the Web then made new kinds of social relationships possible.

Semantic Web

In recent years a new trend has appeared: the Semantic Web, also known as Web 3.0. This new version of the Web is the environment in which your digital resources will be exploited. They now live in a world of connected pieces of knowledge rather than on a network of pieces of information. Roughly speaking, yesterday your digital resources were simply and blindly connected; today their relations with the network can carry an explicit meaning. The hyperlink is becoming semantic.

More technically, as we presented it in the deliverable D4.2, the Semantic Web (part of Web 3.0) is "the Web of data with meaning in the sense that a computer program can learn enough about what the data means to process it". It provides "a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by World Wide Web Consortium (W3C) with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming. It was proposed by World Wide Web inventor Tim Berners-Lee".
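To make the idea of "data with meaning" more concrete, here is a minimal sketch of a single RDF statement (a "triple") written by hand in the N-Triples syntax. It uses no RDF library; the resource URI and the helper names `iri`, `literal` and `triple` are hypothetical, chosen only for illustration, while the predicate is the Dublin Core "title" element.

```python
# Minimal sketch: one RDF statement ("triple") about a digital resource,
# serialized by hand in the N-Triples syntax. No RDF library is used.

def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, lang: str = "") -> str:
    """Quote a plain literal, with an optional language tag."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join subject, predicate and object into one N-Triples line."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical resource URI, for illustration only.
RESOURCE = "http://example.org/museum/object/42"
# Dublin Core "title" element, used here as the predicate.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"

statement = triple(
    iri(RESOURCE),
    iri(DC_TITLE),
    literal("Saint Louis, Roi de France", "fr"),
)
print(statement)
```

The subject and the predicate are both URIs, which is what lets a program on the Web recognise that the link between the resource and its title has an explicit meaning.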

If you want more technical information about Semantic Web, please see the dedicated section...

Linked Open Data

In the world of the Semantic Web, a new "philosophy" is becoming the reference you should know when you want to link your digital resources with those already available online. This initiative, called Linked Open Data (LOD), is so important that Europeana considers it critical to the success of its policy. More generally, LOD contributes to the evolution of the Web, which is no longer a flat list of data but a structured means of access to all available resources.

Linked Open Data offers content providers (such as museums) a set of rules, tools and recommendations. Above all, keep in mind that all the data you want to provide to Europeana has to be named and linked. Our recommendations below help you complete these required actions before your data is ingested on that platform.

Formats

To be part of the Linked Data "cloud" and use Semantic Web technologies, the terminology of an institution has to be in a compliant format. When you want to represent or model your terminology and exploit it on the Web, you have to use a standard format. The most commonly used standards are SKOS, OWL, RDF and XML. Some of them can be combined, and some can be wrapped by others. Using a standard format ensures that the metadata expressed with your terminology is represented in a way that Web technologies can recognise and interpret.

Below are brief descriptions of these format standards with the aim of a better understanding of their connections.

XML



Museums case

You are a European museum and you intend to make your collections available on Europeana. Some of your collections are natively digital and others are not. Many European projects, such as Minerva or MICHAEL, have stressed the importance of digitisation, as much for access by the general public all over Europe as for the preservation of cultural heritage. Europeana, as the European digital library, gives access only to digital items. Digitisation is the step that enables you to integrate your physical collection into your digital one. But this is not enough. After digitisation, you need to manage your documentation (descriptions of objects, references...) and transform it into a set of metadata related to all your digital objects. You then have a collection of digital resources that you can manage in your own database or system and make available to Europeana.

Image:digitization.jpg

Ex: Your museum holds collections of paintings, sculptures and manuscripts. To make these collections available on the Web, and especially on Europeana, you digitise them by producing photographs, 3D renderings and OCR-generated texts, and by filling in digital records for the complete catalogue. All these elements are your digital resources intended to be available on the Web.

More precisely, you intend to make your collection of digital resources available and retrievable on Europeana. To reach that goal, you first have to prepare your digital resources, because they are not natively compliant with Europeana's requirements. The Athena project aims to help you specifically at this stage by offering tools, guidelines and recommendations.


Image:ensemble_simple.jpg

To prepare your digital resources for ingestion into Europeana, you have to take care of two general aspects. The first is technical: it consists in guaranteeing access to every digital resource as an object within a collection of objects. The second is semantic: it consists in exploiting the content of each digital resource as a meaningful element of a collection. This aspect, the semantic re-use of digital resources, is developed further in the document.


Ex: Your collection of paintings notably includes a work by Greco: Saint Louis, Roi de France. You want to make one particular digital photograph of that work compliant with Europeana requirements: you are going to prepare that picture both technically and semantically.
Image:ensemble_double.jpg
Image:ensemble_double_ex.jpg


Technical preparation

As you can read in the parallel recommendations jointly provided by Athena WP3 and WP7, the technical preparation requires identifiers and referencing of the collection items. Indeed, the technical access to the digital resource on Europeana implies its identification as a singular object among a mass of items, and its cataloguing as an element of the collection.

You can refer to the documents and guides produced by Athena WP3 for more information on this technical preparation.

The identifier naming system you can use depends on the type of works you manage and on your administrative constraints (e.g. if your information system is based on ARK instead of PEARL, a subscription step in the identification process affects the identifier name).

Such an identification is used by the datamodel which enables you to declare any item of your collection as a singular element.

Ex: You are technically preparing the digital photograph of Saint Louis. This means you register the digital file in your file management system. Your ARK-based identification system provides a unique identifier for that picture: http://gallica.bnf.fr/ark:/12148/btv1b6904167m

You then use this identifier in your LIDO datamodel, which references your collection ("French Collection"), the work title ("Saint Louis, Roi de France"), its type ("Painting") and the related classification ("My Italian Paintings").
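As a rough illustration of the example above, the following sketch gathers the identifier and the referenced information in one record. The class `ObjectRecord`, its field names and the helper `ark_name` are hypothetical and are not LIDO element names; the sketch only shows how a unique identifier lets a datamodel declare one item as a singular element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical record; field names are illustrative, not LIDO elements."""
    identifier: str
    collection: str
    title: str
    object_type: str
    classification: str


def ark_name(identifier: str) -> str:
    """Return the ARK part of an identifier URL, or "" if there is none."""
    start = identifier.find("ark:/")
    return identifier[start:] if start != -1 else ""


record = ObjectRecord(
    identifier="http://gallica.bnf.fr/ark:/12148/btv1b6904167m",
    collection="French Collection",
    title="Saint Louis, Roi de France",
    object_type="Painting",
    classification="My Italian Paintings",
)

# Index records by identifier: each item is declared exactly once.
catalogue = {record.identifier: record}

print(ark_name(record.identifier))
print(catalogue[record.identifier].title)
```

Keying the catalogue on the identifier is a design choice made for this sketch: it makes the "singular element" requirement visible, since two records cannot share the same key.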

Image:Ensemble_datamodel.jpg Image:Ensemble_datamodel_ex.jpg


Semantic preparation

The semantic preparation requires a description of the digital resource as a meaningful element in an editorial collection. It implies the use of a terminology that lets you characterise your digital resources with terms, or even concepts and relations, in order to contextualise them. We call terminology management all the activities involved in handling the semantic description of digital resources.


Ex: You are going to semantically prepare your digital picture of Saint Louis. You want to attach metadata to that file to express that the author is Il Greco, that the work was painted in the 16th century in Italy, and that it belongs to the artistic period called "Renaissance".

You therefore use your usual terminology to fill in your metadata schema correctly, giving the value "Il Greco" in the field Author, "16th century" in the field Creation Date, "Italy" in the field Creation Country and "Renaissance" in the field Artistic Period.
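The example above can be sketched as a small check that every metadata value comes from your terminology. The dictionary `VOCABULARY` and the function `invalid_fields` are assumptions made for this illustration; a real terminology would contain many more terms per field.

```python
# Hypothetical controlled vocabulary: the allowed terms for each metadata field.
VOCABULARY = {
    "Author": {"Il Greco"},
    "Creation Date": {"16th century"},
    "Creation Country": {"Italy"},
    "Artistic Period": {"Renaissance"},
}


def invalid_fields(metadata: dict) -> list:
    """Return the sorted names of fields whose value is not in the vocabulary."""
    return sorted(
        field
        for field, value in metadata.items()
        if value not in VOCABULARY.get(field, set())
    )


metadata = {
    "Author": "Il Greco",
    "Creation Date": "16th century",
    "Creation Country": "Italy",
    "Artistic Period": "Renaissance",
}

# An empty list means every value was taken from the terminology.
print(invalid_fields(metadata))
```

A field that is unknown to the vocabulary is reported as invalid too, which seems the safer default when the goal is a controlled description.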

Image:Ensemble_terminology.jpg Image:Ensemble_terminology_ex.jpg


Datamodel presentation

Here we give a general presentation of what a datamodel is and, in particular, what LIDO is. All of this information comes from the documents produced in Athena by WP3.

General presentation

As introduced in the WP3 report "Standards landscape for European museums, archives, libraries", a datamodel in general helps identify a collection object by giving a core set of information that follows the Dublin Core (DC) format. Specifically, 9 of the 15 DC elements are used in the descriptions.

These elements are:

  • Title: The name (or names) under which the standard is known. In most cases both the abbreviated and the full name are listed.
  • Creator: The name of the organisation or individual who originally created the standard.
  • Publisher: The name of the organisation that makes the standard publicly available.
  • Date: The date on which the standard was originally published.
  • Identifier: A number or other identifier under which the standard is published, or a URL which points to the definition of the standard.
  • Rights: Whether rights restrictions, e.g. patents, apply to the standard.
  • Description: A textual description explaining the standard and its usage.
  • Subject: Keywords that identify the nature of the standard.
  • Relation: Other standards that this standard relates to, and associated websites.

The Dublin Core is a simple metadata element set intended to facilitate discovery of electronic resources. Elements can be grouped into those having data on: Content - Coverage, Description, Type, Relation, Source, Subject, Title; Intellectual Property - Contributor, Creator, Publisher, Rights; Instantiation - Date, Format, Identifier, Language. Its use has been mandated by several governments in Europe (e.g. UK) and throughout the world (e.g. Australia).
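The grouping described above can be written down as a small lookup table. This is only a sketch: the names `DC_GROUPS`, `USED_ELEMENTS` and `group_of` are chosen for the illustration, and the groups and elements are exactly those listed in the text.

```python
# The 15 Dublin Core elements, grouped as described in the text above.
DC_GROUPS = {
    "Content": ["Coverage", "Description", "Type", "Relation",
                "Source", "Subject", "Title"],
    "Intellectual Property": ["Contributor", "Creator", "Publisher", "Rights"],
    "Instantiation": ["Date", "Format", "Identifier", "Language"],
}

# The 9 elements used in the descriptions, in the order listed above.
USED_ELEMENTS = ["Title", "Creator", "Publisher", "Date", "Identifier",
                 "Rights", "Description", "Subject", "Relation"]


def group_of(element: str) -> str:
    """Return the name of the group an element belongs to."""
    for group, elements in DC_GROUPS.items():
        if element in elements:
            return group
    raise KeyError(f"not a Dublin Core element: {element}")


total = sum(len(elements) for elements in DC_GROUPS.values())
print(total, len(USED_ELEMENTS))

for element in USED_ELEMENTS:
    print(f"{element}: {group_of(element)}")
```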

LIDO

Among all the existing datamodel standards, we particularly recommend LIDO (Lightweight Information Describing Objects) to European museums, for four main reasons.

First of all, this datamodel has been defined by Athena WP3 specifically for the museums. Mixing elements coming from Spectrum, MuseumDat and DC, LIDO takes into account the specificities of your situation.

Second, LIDO is already mapped to the Europeana datamodel ESE (Europeana Semantic Elements) and available on the data ingestion platform (Athena Ingester). So if you use LIDO, you should not have to worry about compliance with Europeana today.

Moreover, LIDO offers more possibilities than DC for describing your digital objects efficiently, since it is conceived as a set of classes gathering fields. These classes are: Object Classifications, Object Identifications, Events, Relations, Administrative Metadata.

Finally, LIDO with its classes will be easy to map to the next Europeana datamodel. Europeana is currently releasing a new datamodel, EDM (Europeana Data Model), which will progressively replace ESE. EDM offers a class-based structure which is close to the structure of LIDO and fully compliant with Linked Open Data. If you already use LIDO to be compliant with ESE today, the transition to EDM tomorrow will be easy.

Types of terminology resources

For the semantic description of your digital resources, different types of terminology are available. We have presented them in detail in the section Definitions. Here we propose a very schematic graph as a short, synoptic reminder.

Image:terminologies_types.jpg

Connexion terminology <--> datamodel

In this deliverable we focus on the semantic aspect, so we provide recommendations about terminology management rather than datamodel management. However, the two aspects are not totally separate: there is a connexion between them, which we describe briefly here. This connexion enables you to link the semantic description of an element to the technical identification of the object. The datamodel can carry the semantic descriptions only if these descriptions are compliant with its features, and the connexion ensures that compliance.

Ex: Since you want to provide the type of the work Saint Louis in your datamodel, you have to connect the LIDO description field Type with your related list of terms, structured as follows:
  • Peinture (as preferred label in French)
    • Painting (as equivalent label in English)
    • Dipinto (as equivalent label in Italian)
  • Sculpture (as preferred label in French)
    • Sculpture (as equivalent label in English)
    • Scultura (as equivalent label in Italian)
  • Manuscrit (as preferred label in French)
    • Manuscript (as equivalent label in English)
    • Manoscritto (as equivalent label in Italian)
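The term list above can be sketched as a small multilingual lookup, so that whatever label is typed, the value stored in the Type field is always the preferred one. The structure `TERMS` and the functions `preferred_label` and `translate` are hypothetical names used only for this illustration; they are not part of LIDO.

```python
from typing import Optional

# Each term maps a language code to its label; French holds the preferred label.
TERMS = [
    {"fr": "Peinture", "en": "Painting", "it": "Dipinto"},
    {"fr": "Sculpture", "en": "Sculpture", "it": "Scultura"},
    {"fr": "Manuscrit", "en": "Manuscript", "it": "Manoscritto"},
]
PREFERRED_LANG = "fr"


def find_term(label: str) -> Optional[dict]:
    """Find the term that carries this label in any language (case-insensitive)."""
    wanted = label.casefold()
    for term in TERMS:
        if any(value.casefold() == wanted for value in term.values()):
            return term
    return None


def preferred_label(label: str) -> Optional[str]:
    """Return the preferred (French) label for any known label, else None."""
    term = find_term(label)
    return term[PREFERRED_LANG] if term else None


def translate(label: str, lang: str) -> Optional[str]:
    """Return the equivalent label in another language, else None."""
    term = find_term(label)
    return term.get(lang) if term else None


print(preferred_label("painting"))
print(translate("Manuscrit", "it"))
```

A label that is not in the list returns `None`, which signals that the value cannot be carried by the Type field until the terminology is extended.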

From the theoretical point of view, such a connexion is the link between your grammar (the datamodel) and your vocabulary (the terminology). From the point of view of implementation, it means that whatever the input format of your terminology, there are formats that make your semantic descriptions exploitable by an engine.

The general context we have presented should help you understand and apply the recommendations given in the next part. Since these recommendations concern only terminology management, we now focus on the specific part of the semantic exploitation of your descriptions.

Image:Ensemble_terminology_focus.jpg

This page was last modified on 28 June 2011, at 13:03.