B4. IDENTIFY YOUR CONCEPTS AND VALIDATE THE STRUCTURE
Now that you have refined the SKOSified version of your thesaurus by specifying its labels precisely, you can go further by formally identifying your concepts and mapping them to the data model. To do so, we advise you to follow the 5-star scheme proposed by Tim Berners-Lee:
* make your stuff available on the Web (whatever format) under an open license
** make it available as structured data (e.g., Excel instead of an image scan of a table)
*** use non-proprietary formats (e.g., CSV instead of Excel)
**** use URIs to identify things, so that people can point at your stuff
***** link your data to other data to provide context
The W3C defines two main steps for identifying concepts:
- Creating (or reusing) a Uniform Resource Identifier (URI) to uniquely identify the concept
- Asserting in RDF using the rdf:type property that the resource identified by this URI is of type skos:Concept
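The two steps above can be sketched in a few lines of Python. The sketch uses only the standard library and writes the assertion as an N-Triples statement, which is equivalent to the Turtle shorthand `ex:c0001 a skos:Concept .`. The namespace `http://example.org/thesaurus/` and the local identifier `c0001` are hypothetical placeholders, not part of the W3C recommendation. In practice you would replace them with your own persistent base URI and generate the statements with an RDF library.

```python
# Minimal sketch of the two W3C steps, assuming a hypothetical namespace.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS_CONCEPT = "http://www.w3.org/2004/02/skos/core#Concept"
BASE = "http://example.org/thesaurus/"  # hypothetical base URI


def concept_uri(local_id: str) -> str:
    """Step 1: create (or reuse) an HTTP URI that uniquely identifies the concept."""
    return BASE + local_id


def type_triple(uri: str) -> str:
    """Step 2: assert with rdf:type that the resource is a skos:Concept (N-Triples)."""
    return f"<{uri}> <{RDF_TYPE}> <{SKOS_CONCEPT}> ."


if __name__ == "__main__":
    print(type_triple(concept_uri("c0001")))
```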
Use of a persistent identification system for defining the URIs
As described above, we recommend using standards to identify concepts. Since concepts are identified by HTTP URIs, these URIs should be registered with a persistent identification system such as PURL. This brings a further benefit: such identifiers are location-independent. If the terminology is moved from one location (hosting server) to another, the URIs identifying its concepts do not have to be changed.
Use of non-explicit URIs
We strongly recommend non-explicit (opaque) URIs, such as numeric or alphanumeric codes, over URIs built from labels, so that the same URI is never reused to identify two different concepts. Natural languages are by definition ambiguous and polysemous, so two different concepts may well have similar labels. Explicit URIs also presuppose that one specific natural language was chosen when the terminology was defined or migrated, which is inconvenient in a multilingual context.
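To make the difference between explicit and non-explicit URIs concrete, here is a small Python sketch. It assumes a hypothetical namespace and a simple counter-based minting scheme (one option among many; random or checksum-based codes work too). Two different concepts share the English label "bank": URIs derived from the English label collide, URIs derived from the French labels would not, and the opaque URIs stay distinct whatever the language.

```python
from itertools import count

BASE = "http://example.org/thesaurus/"  # hypothetical namespace


def explicit_uri(label: str) -> str:
    """Label-derived ("explicit") URI: depends on the wording and language of the label."""
    return BASE + label.strip().lower().replace(" ", "_")


class OpaqueMinter:
    """Mints non-explicit URIs from a counter, independent of any label or language."""

    def __init__(self, prefix: str = "c") -> None:
        self._prefix = prefix
        self._counter = count(1)

    def mint(self) -> str:
        # Zero-padded sequence number, e.g. c000001; it carries no meaning in any language.
        return f"{BASE}{self._prefix}{next(self._counter):06d}"


minter = OpaqueMinter()
# Two different concepts sharing the English label "bank".
financial_bank = {"prefLabel": {"en": "bank", "fr": "banque"}, "uri": minter.mint()}
river_bank = {"prefLabel": {"en": "bank", "fr": "rive"}, "uri": minter.mint()}

# Explicit URIs built from the English labels collide ...
print(explicit_uri(financial_bank["prefLabel"]["en"]) == explicit_uri(river_bank["prefLabel"]["en"]))
# ... while the opaque URIs stay distinct.
print(financial_bank["uri"], river_bank["uri"])
```

In a real deployment the counter state would have to be stored persistently, so that a number is never handed out twice after a restart.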
Suppose your terminology is hosted and managed by your institution but used by several other institutions. You have to define your identifiers so that they state the origin of the concepts (domain name) while being flexible enough that the other institutions do not have to change anything if your identification system changes. Here too, non-explicit URIs are preferable because they avoid the ambiguity of natural languages. The Bibliothèque nationale de France (BnF), the French National Library, for example, uses the ARK persistent identifier system (see details below).
Here is an example of URI with ARK from the BnF.
Methods and tools:
Several persistent identifier systems are in use. Here is some information on the main ones:
PURL: A PURL (Persistent Uniform Resource Locators) consists of a URL; instead of pointing directly to the location of a digital object, the PURL points to a resolver, which looks up the appropriate URL for that resource and returns it to the client as an HTTP redirect, which then proceeds as normal to retrieve the resource. PURLs are compatible with other document identification standards such as the URN.
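The redirect mechanism described above can be illustrated with a toy resolver in Python. This is a hypothetical in-memory sketch, not the software of any actual PURL service: it maps a persistent path to the current location and answers with an HTTP status and a `Location` value, as a redirecting server would. It also shows the location independence mentioned earlier: when the terminology moves, only the resolver entry is updated.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Hypothetical in-memory PURL resolver: persistent path -> current location."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, purl_path: str, target_url: str) -> None:
        # Registering an existing path again simply updates its target.
        self._table[purl_path] = target_url

    def resolve(self, purl_path: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header value), as a redirecting server would."""
        target = self._table.get(purl_path)
        if target is None:
            return 404, None
        return 302, target


resolver = PurlResolver()
purl = "/thesaurus/c000001"  # hypothetical PURL path, cited by other institutions
resolver.register(purl, "http://old-server.example.org/skos/c000001")
print(resolver.resolve(purl))
# The terminology moves to a new server: only the resolver entry changes, not the PURL.
resolver.register(purl, "http://new-server.example.org/skos/c000001")
print(resolver.resolve(purl))
```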
URN: The URN (Uniform Resource Name) is designed to describe an identity rather than a location; for example, a URN may contain an ISBN (International Standard Book Number, used as a unique, commercial book identifier).
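A URN has the general form `urn:<namespace identifier>:<namespace-specific string>` (RFC 8141). NBNs, described next, use the `nbn` namespace in the same way. The Python sketch below splits a URN into these two parts. It is deliberately simplified: it does not handle the optional r-, q- and f-components of RFC 8141, and the ISBN-based URN is only an illustration.

```python
import re
from typing import NamedTuple

# Simplified from RFC 8141: "urn:" NID ":" NSS (optional components not handled).
URN_RE = re.compile(r"^urn:([A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]):(.+)$", re.IGNORECASE)


class Urn(NamedTuple):
    nid: str  # namespace identifier, e.g. "isbn" or "nbn"
    nss: str  # namespace-specific string, e.g. the ISBN itself


def parse_urn(value: str) -> Urn:
    match = URN_RE.match(value)
    if match is None:
        raise ValueError(f"not a URN: {value!r}")
    # The NID is case-insensitive, so normalise it to lower case.
    return Urn(nid=match.group(1).lower(), nss=match.group(2))


print(parse_urn("urn:isbn:0451450523"))
```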
NBN: National Bibliography Numbers (NBNs) are a URN namespace used solely by national libraries to identify deposited publications that lack an identifier, or to reference the descriptive (cataloguing) metadata that describes the resources. They can be used both for objects with a digital representation and for purely physical objects, in which case the available bibliographic data is provided instead.
ARK: The Archival Resource Key (ARK) is a URL scheme developed at the US National Library of Medicine and maintained by the California Digital Library. ARKs are designed to identify objects of any type, both digital and physical. The ARK scheme encourages semantically opaque identifiers for core objects. Unlike an ordinary URL, an ARK can be used to retrieve three things: the object itself, its metadata, and a commitment statement from its current provider.
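An ARK URL is made of a replaceable resolver host followed by `ark:/`, the Name Assigning Authority Number (NAAN) and an opaque name. The Python sketch below parses that structure and derives the three URLs mentioned above. It assumes the convention of earlier ARK specification drafts, where appending `?` requests metadata and `??` requests the commitment statement. Check the current specification and your resolver's documentation before relying on it. The host, NAAN and name used here are hypothetical.

```python
import re
from typing import Dict, NamedTuple

# Accepts the classic "ark:/NAAN/Name" form and the newer "ark:NAAN/Name" form.
ARK_URL_RE = re.compile(r"^(https?://[^/]+)/ark:/?([0-9A-Za-z]+)/(.+)$")


class Ark(NamedTuple):
    resolver: str  # host part; can change without changing the ARK itself
    naan: str      # Name Assigning Authority Number
    name: str      # opaque name assigned by that authority


def parse_ark(url: str) -> Ark:
    match = ARK_URL_RE.match(url)
    if match is None:
        raise ValueError(f"not an ARK URL: {url!r}")
    return Ark(*match.groups())


def inflections(ark: Ark) -> Dict[str, str]:
    """URLs for object, metadata and commitment (assumed '?'/'??' convention)."""
    base = f"{ark.resolver}/ark:/{ark.naan}/{ark.name}"
    return {"object": base, "metadata": base + "?", "commitment": base + "??"}


# Hypothetical resolver host, NAAN and name, for illustration only.
ark = parse_ark("https://ark.example.org/ark:/99999/x0b1c2d3")
for kind, url in inflections(ark).items():
    print(kind, url)
```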
OpenURL: An OpenURL contains resource metadata encoded within a URL and is designed to support mediated linking between information resources and library services. This standard is not primarily designed as a persistent identifier or resolver; it is described instead as a metadata transport protocol.
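The following Python sketch shows what "metadata encoded within a URL" means in practice. It builds an OpenURL 1.0 link in key/encoded-value (KEV) form for a book and then reads the metadata back, as a link resolver would. The resolver base URL, title and ISBN are hypothetical. Note that the link describes the resource rather than naming it persistently, which is why OpenURL is not a substitute for PURL, ARK or DOI.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

RESOLVER = "https://resolver.example.org/openurl"  # hypothetical link resolver base URL


def book_openurl(title: str, isbn: str) -> str:
    """Encode book metadata as OpenURL 1.0 key/encoded-value (KEV) pairs."""
    params = {
        "url_ver": "Z39.88-2004",
        "url_ctx_fmt": "info:ofi/fmt:kev:mtx:ctx",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.btitle": title,
        "rft.isbn": isbn,
    }
    return RESOLVER + "?" + urlencode(params)


link = book_openurl("Example title", "0451450523")
print(link)
# A resolver receiving the link reads the metadata back out of the query string.
print(parse_qs(urlsplit(link).query)["rft.btitle"])
```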
DOI: The Digital Object Identifier (DOI) is an indirect identifier for electronic documents based on the Handle System resolvers. According to the International DOI Foundation (IDF), formed in October 1997 to be responsible for governance of the DOI System, it is a "mechanism for permanent identification of digital content".
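A DOI consists of a prefix, which always starts with `10.`, and a suffix, separated by the first `/`. It becomes an actionable HTTP URI when appended to the public resolver `https://doi.org/`. The Python sketch below validates that structure and builds the resolvable URI. The function names are ours, and `10.1000/182` is used only as an example value.

```python
from typing import Tuple
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy in front of the Handle System


def parse_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into its prefix (always starting with "10.") and its suffix."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build the resolvable HTTP URI for a DOI, percent-encoding unsafe characters."""
    parse_doi(doi)  # raises ValueError on malformed input
    return DOI_RESOLVER + quote(doi, safe="/")


print(parse_doi("10.1000/182"))
print(doi_url("10.1000/182"))
```

Because the suffix may itself contain `/`, only the first slash is treated as the separator.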
These short introductions show that some of these standards are better suited to specific fields (for instance, URN and NBN are mainly used by libraries), whereas standards such as PURL or DOI can be used to define the URIs of your concepts. You can also have a look at the booklet "Persistent identifiers: Recommendations for institutions" produced by ATHENA Work Package 3 (WP3).
We invite you to continue the step-by-step process with the next step: B5: Ensure the documentation of concepts.
The different tasks we are going to detail are:
- B1: Evaluate how far SKOS is compliant with your terminology features
- B2: Roughly SKOSify your terminology
- B3: Define with precision the labels expressing concepts
- B4: Identify your concepts and validate the structure
- B5: Ensure the documentation of concepts
- B6: Map your concepts
- B7: Map your (multilingual) terms
- B8: Validate your SKOSification