From Athenawiki


Athena Format

Presentation

The Athena Format is the format in which the Athena Thesaurus is expressed. It is proposed here to museums that want to map their own terminology to the Athena Thesaurus: they must use the Athena Format to structure their descriptions before mapping. Because the Athena Format is SKOS-compliant, it guarantees that the museums' descriptions meet the corresponding Europeana requirement regarding SKOS.

Although SKOS is a basic structure for the formal representation of controlled vocabularies, it can easily be extended and customized to describe terms more precisely and to include lexical elements related to these terms. The ATHENA Format is mainly based on the SKOS core data model and draws on the museumvok format in order to include some specific details.


Format

Metadata

The metadata part of the ATHENA Format is intended to provide administrative information on the terminology that has been converted to SKOS.

Image:Athena_Format_Metadata.jpg

These elements are borrowed from the Dublin Core data model (with the prefix "dc:") and provide details about the terminology. Expressing the metadata of the terminology in Dublin Core could eventually enable OAI harvesting in the context of a repository or database of lexical or terminology resources.

The last element is not part of the Dublin Core data model but may be useful to check if the SKOS version of the terminology has a 'valid', 'in validation' or 'draft' status. This set of elements is defined in order to get the same information for all the terminology resources that will be transformed into SKOS within the ATHENA project.
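A minimal Python sketch of how such a metadata block might be checked before conversion. The three status values come from the text above; the element name "status" and the list of required Dublin Core elements are assumptions made for illustration, since the actual elements are only given in the table image.

```python
# Status values named in the text above.
ALLOWED_STATUS = {"valid", "in validation", "draft"}


def check_metadata(metadata, required=("dc:title", "dc:creator", "dc:date")):
    """Return a list of problems found in a terminology metadata block.

    `required` is a hypothetical selection of Dublin Core elements;
    "status" is a hypothetical name for the non-Dublin-Core element.
    """
    problems = [f"missing {name}" for name in required if not metadata.get(name)]
    status = metadata.get("status")
    if status not in ALLOWED_STATUS:
        problems.append(f"unexpected status: {status!r}")
    return problems


complete = {
    "dc:title": "Example museum terminology",
    "dc:creator": "Example museum",
    "dc:date": "2011-04-18",
    "status": "draft",
}
incomplete = {"dc:title": "Example museum terminology", "status": "final"}

complete_problems = check_metadata(complete)
incomplete_problems = check_metadata(incomplete)
```

Here `complete_problems` is empty, while `incomplete_problems` reports the two missing elements and the unexpected status value.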

Concept

Image:Athena_Format_Concept.jpg

As already noted, the concept is the central element of the SKOS data model. The data model makes a distinction between classes and properties. The first items of the table above, skos:Concept and skos:ConceptScheme, are classes, whereas the next items (skos:inScheme, skos:hasTopConcept, skos:topConceptOf) are properties.

The property skos:topConceptOf is set in italics because it is the inverse property of skos:hasTopConcept, so stating both properties to link the same two resources is redundant. This property is therefore optional. When one property is the inverse of another, only the subject or the object of an assertion needs to carry the property. Both properties should not be stated at the same time for the same pair of resources. For example:

A skos:hasTopConcept B
B skos:topConceptOf A

These two assertions express the same information, so it is possible to use only one of them and avoid duplication.
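The redundancy between the two inverse properties can be illustrated with a small Python sketch. It assumes triples are held as plain (subject, predicate, object) tuples; the "ex:" identifiers and the function name are hypothetical and not part of the ATHENA Format.

```python
HAS_TOP = "skos:hasTopConcept"
TOP_OF = "skos:topConceptOf"


def drop_redundant_inverses(triples):
    """Remove skos:topConceptOf triples already implied by skos:hasTopConcept."""
    stated = set(triples)
    kept = []
    for s, p, o in triples:
        # (B topConceptOf A) adds nothing when (A hasTopConcept B) is stated.
        if p == TOP_OF and (o, HAS_TOP, s) in stated:
            continue
        kept.append((s, p, o))
    return kept


triples = [
    ("ex:scheme", HAS_TOP, "ex:painting"),
    ("ex:painting", TOP_OF, "ex:scheme"),   # duplicate of the line above
    ("ex:sculpture", TOP_OF, "ex:scheme"),  # only stated in this direction
]
reduced = drop_redundant_inverses(triples)
```

Only the second triple is dropped: the third one is kept because its inverse is not stated anywhere else.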

Collection

Image:Athena_Format_Collection.jpg

The class for ordered collections and the corresponding property are set in italics to highlight that this is a possibility offered by the SKOS data model, but one that should be used only if the order of the concepts within the collection is really relevant.

As we intend to bring together very different terminologies with very different scopes, the notion of collection may be useful to arrange these concepts into groups within the ATHENA Thesaurus. Indeed, some terminologies are only used for indexing, while others are designed to improve information retrieval. Some terminologies are aimed at professionals, whereas others are accessible to the general public. The notion of collection can help bring consistency to this diversity and makes it easier to create thematic groups.

Description

Image:Athena_Format_Description.jpg

In this description block, we include the three different types of lexical labels. The preferred label is set in bold font because we define it as a mandatory property for the ATHENA Thesaurus. As we saw in the SKOS section, the SKOS data model does not force the use of labels for expressing a concept, since a concept can be defined only through its semantic relations. But in the context of the ATHENA Thesaurus, which is built from existing thesauri, the migration from descriptors to labels should be done carefully. We therefore consider the use of a preferred term to be mandatory. We define the skos:notation property as optional since we gathered very few classifications during our inventory phase, and we therefore favour the use of labels over notations.

skos:note is the most generic type of note; to encourage a more precise description of terms, we set this property as optional in the ATHENA Format. skos:historyNote is mainly dedicated to keeping track of the diachronic evolution of terms. As the terminology resources gathered for the ATHENA Thesaurus do not provide this information in most cases, we set this property as optional. There may also be confusion between skos:historyNote and skos:changeNote; skos:changeNote is mainly used to keep track of the evolution of the description of a concept, e.g. a change in the labels used to express this concept or a change in its semantic relations.

Almost all the documentation properties have been included in the ATHENA Format since it is important to keep as much as possible of the information from the source terminology in order to keep track of the versions and changes of concepts.

As recommended by the SKOS data model, the language tags, introduced in RDF by the @xml:lang attribute, are set as mandatory in the ATHENA Format in order to enable multilingualism and highlight the linguistic richness of the resources that will compose the ATHENA Thesaurus. This attribute will be used for the labels as well as for the documentation properties.
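The two mandatory rules of the description block (a preferred label, and a language tag on labels and documentation properties) can be sketched as a simple check in Python. The dictionary layout, the function name and the exact list of checked properties are assumptions made for this illustration. The last check reflects the SKOS rule that a resource has at most one preferred label per language.

```python
# Properties whose values are expected to carry a language tag.
# This selection is an assumption; the full list is in the table image.
TAGGED_PROPERTIES = (
    "skos:prefLabel",
    "skos:altLabel",
    "skos:hiddenLabel",
    "skos:definition",
    "skos:scopeNote",
)


def check_concept(concept):
    """Return a list of problems found in one concept description.

    `concept` maps a property name to a list of (text, language) pairs.
    """
    problems = []
    pref = concept.get("skos:prefLabel", [])
    if not pref:
        problems.append("missing skos:prefLabel")
    for prop in TAGGED_PROPERTIES:
        for text, lang in concept.get(prop, []):
            if not lang:
                problems.append(f"{prop} '{text}' has no language tag")
    # SKOS allows no more than one preferred label per language.
    langs = [lang for _, lang in pref if lang]
    if len(langs) != len(set(langs)):
        problems.append("more than one skos:prefLabel for the same language")
    return problems


good = {
    "skos:prefLabel": [("painting", "en"), ("peinture", "fr")],
    "skos:altLabel": [("picture", "en")],
}
bad = {"skos:altLabel": [("picture", None)]}

good_problems = check_concept(good)
bad_problems = check_concept(bad)
```

The first description passes, while the second is reported as lacking both a preferred label and a language tag on its alternative label.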

Relation

Image:Athena_Format_Relation.jpg

These semantic relations constitute the core and the strength of the SKOS data model, so it is logical to emphasize them in the ATHENA Format. The transitive properties skos:broaderTransitive and skos:narrowerTransitive are set in italics: they may be useful for making transitive assertions, but their use is optional.
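To show what a transitive assertion adds, the following Python sketch derives skos:broaderTransitive links from direct skos:broader links. The input layout (a dictionary from a concept to the set of its direct broader concepts), the function name and the "ex:" identifiers are assumptions made for this illustration.

```python
def broader_transitive(broader):
    """Derive skos:broaderTransitive links from direct skos:broader links.

    `broader` maps each concept to the set of its direct broader concepts.
    The result maps each concept to all of its ancestors.
    """
    closure = {}
    for concept in broader:
        seen = set()
        stack = list(broader[concept])
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue  # already visited; also protects against cycles
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
        closure[concept] = seen
    return closure


broader = {
    "ex:oil_painting": {"ex:painting"},
    "ex:painting": {"ex:visual_art"},
}
closure = broader_transitive(broader)
```

In this example "ex:oil_painting" is linked to both "ex:painting" and "ex:visual_art", although only the first link was stated directly. The skos:narrowerTransitive links are the same pairs read in the opposite direction.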

Mapping

Image:Athena_Format_Mapping.jpg

Like the semantic relations, the mapping properties constitute the essence of the SKOS data model. These properties will therefore be used in the ATHENA Format to create alignment links between concepts from different concept schemes.
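Since mapping properties are meant to link concepts from different concept schemes, a converter might flag mapping links that stay within a single scheme. The Python sketch below assumes triples held as tuples and a lookup from concept to scheme; the five property names are the standard SKOS mapping properties, but which of them appear in the ATHENA table is not stated in the text, and all other identifiers are hypothetical.

```python
MAPPING_PROPERTIES = {
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "skos:relatedMatch",
}


def same_scheme_mappings(triples, in_scheme):
    """Return mapping triples whose subject and object share a concept scheme.

    `in_scheme` maps a concept to its concept scheme (skos:inScheme).
    Concepts with an unknown scheme are never flagged.
    """
    flagged = []
    for s, p, o in triples:
        if p not in MAPPING_PROPERTIES:
            continue
        scheme = in_scheme.get(s)
        if scheme is not None and scheme == in_scheme.get(o):
            flagged.append((s, p, o))
    return flagged


in_scheme = {
    "museumA:vase": "museumA:scheme",
    "museumA:urn": "museumA:scheme",
    "athena:vessel": "athena:scheme",
}
triples = [
    ("museumA:vase", "skos:broadMatch", "athena:vessel"),
    ("museumA:vase", "skos:closeMatch", "museumA:urn"),
]
flagged = same_scheme_mappings(triples, in_scheme)
```

Only the second link is flagged: it connects two concepts of the same scheme, where a semantic relation such as skos:related would normally be expected instead of a mapping property.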

This section presented the format that will be used for all the terminology resources gathered during the inventory phase initiated for the first WP4 deliverable in order to constitute the ATHENA Thesaurus. We wanted to emphasize that the main features of the SKOS data model are reused in this format, although some of these features are made mandatory or optional in the framework of the ATHENA Thesaurus in order to obtain a homogeneous description of very heterogeneous terminologies.


SKOS: some examples

RAMEAU

RAMEAU concepts are described here according to the SKOS model, which is used to represent knowledge organization systems on the semantic web. Each concept comes with labelling information (either preferred or alternative labels), semantic relations to other concepts (broader, related) and various kinds of notes.

EuroVoc

The EuroVoc thesaurus is a multilingual, polythematic thesaurus focusing on the law and legislation of the European Union (EU). It is maintained by the Office for Official Publications of the European Communities. Within the EU, the EuroVoc thesaurus is used in the Library of the European Parliament, the Publication Office and other information institutions of the EU. Moreover, it is used in the libraries and documentation centers of national parliaments (e.g. the Spanish Senate) as well as by other governmental and private organizations of member (and non-member) countries of the EU. EuroVoc exists in 22 official languages of the European Union (Bulgarian, Spanish, Czech, Danish, German, Estonian, Greek, English, French, Italian, Latvian, Lithuanian, Maltese, Hungarian, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene, Finnish and Swedish) and one other language (Croatian).

This page was last modified on 18 April 2011, at 20:24.