1. Data Model of the EEA Terminology Database

Independent of any particular implementation (see chapter 4), a data model for the terminology database (TDB) of the EEA needs to be specified. For this purpose ISO DIS 12620 "Terminology - Computer Applications - Data categories" (ISO 1995) is applied. Because this document will have to be used in its entirety, ISEP will make it available to the project partners upon request. The data model specified here is based on a "pragmatic simplification" of the categorization of ISO 12620, but is fully compatible with it.

As far as user needs and requirements are concerned, this "pragmatic" view is governed by the multidimensional criteria described in chapter 2.

Since the data model is independent of any implementation, it is actually a meta-model that can be transformed into different data models for implementation in specific software environments and within specific computer paradigms (relational, object-oriented, etc.). Thus the presentation of the meta-model is deliberately in free form and does not reflect any particular modelling technique. This meta-model is governed by the methodology of terminology management, since it focuses on what is contained in a single entry (information on a particular concept and its representations in a language, equivalents in other languages, conceptual relationships, etc.).

The components of the meta-model are data categories and the content-driven relationships among them.

meta-language: the language that is used to present, name and describe terminological information, i.e. the language of a particular field name in an entry; e.g. "synonym" is the English name of the data field that contains synonyms of main entry terms.

object language: the language that is used to represent the data to be recorded. An entry includes several language sections, e.g. equivalent terms in 5 languages for a particular concept. These language sections are independent of the meta-language, i.e. the language that is used to present this information.
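
The distinction between meta-language and object language can be illustrated with a minimal Python sketch. The label table, the language codes and the function name `render` are hypothetical and serve only as illustration; they are not part of the data model itself.

```python
# Hypothetical field labels in two meta-languages (illustration only).
FIELD_LABELS = {
    "en": {"term": "term", "synonym": "synonym"},
    "de": {"term": "Benennung", "synonym": "Synonym"},
}

# Object-language data: one concept with one term per language section.
entry = {
    "en": {"term": "groundwater"},
    "de": {"term": "Grundwasser"},
    "fr": {"term": "eaux souterraines"},
}


def render(entry, meta_language):
    """Present the same object-language data with labels in the chosen meta-language."""
    labels = FIELD_LABELS[meta_language]
    lines = []
    for object_language, fields in sorted(entry.items()):
        for field_name, value in fields.items():
            lines.append(f"{labels[field_name]} [{object_language}]: {value}")
    return lines
```

Calling `render(entry, "en")` and `render(entry, "de")` yields the same recorded terms; only the field labels change, which is the independence described above.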

entry: in a terminology database an entry contains all information on a particular concept. The conceptual relationships are expressed in cross-references among entries.

subject field: area of human knowledge to which an entry is assigned. May be assigned freely by the originator or according to a common classification scheme or a thesaurus. Is used for retrieval operations with filters, so that all entries on a particular topic can be retrieved by a simple operation.

The value is chosen from the following list, which is implemented as a pick list. The list is only a proposal for the demonstration version of the system. It goes well beyond the scope of environment as such and includes related subject fields as well; a different list of subject fields can be agreed upon.

  • Environment (general)
      • Environmental Protection
      • Environmental Policy
      • Environmental Science
      • Environmental Legislation
  • Biology
      • Zoology
      • Botany
      • Microbiology
  • Genetics, Genetic Engineering
  • Medicine, Health
      • Health Hazards
  • Chemistry
  • Biochemistry
  • Hydrology
  • Earth Sciences
      • Soil
      • Geology
  • Agriculture
  • Forestry
  • Fisheries
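
How a pick list of subject fields could support filtered retrieval is sketched below in Python. This is only an illustration under stated assumptions: the dictionary holds an excerpt of the proposed list, the sample entries are invented, and the rule that a filter on a main subject field also returns entries assigned to its subfields is an assumption, not a requirement of the data model.

```python
# Excerpt of the proposed pick list: main subject field -> subfields.
SUBJECT_FIELDS = {
    "Environment (general)": [
        "Environmental Protection",
        "Environmental Policy",
        "Environmental Science",
        "Environmental Legislation",
    ],
    "Biology": ["Zoology", "Botany", "Microbiology"],
    "Earth Sciences": ["Soil", "Geology"],
    "Hydrology": [],
}


def allowed_values():
    """All values a user may pick: main subject fields and their subfields."""
    values = set(SUBJECT_FIELDS)
    for subfields in SUBJECT_FIELDS.values():
        values.update(subfields)
    return values


def filter_by_subject(entries, subject):
    """Return entries assigned to a subject field or (assumption) to one of its subfields."""
    if subject not in allowed_values():
        raise ValueError(f"not in pick list: {subject}")
    wanted = {subject, *SUBJECT_FIELDS.get(subject, [])}
    return [e for e in entries if e["subject_field"] in wanted]


# Invented sample entries, reduced to two fields.
entries = [
    {"term": "aquifer", "subject_field": "Hydrology"},
    {"term": "humus", "subject_field": "Soil"},
]
```

With these sample entries, `filter_by_subject(entries, "Earth Sciences")` returns only the entry for "humus", while a value outside the pick list is rejected.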

languages (this list is open to extension; for this document it contains the official working languages of the EU plus Norwegian):

  • English
  • German
  • French
  • Italian
  • Spanish
  • Danish
  • Dutch
  • Swedish
  • Finnish
  • Norwegian
  • Portuguese
  • Greek

term: designation of a concept in a particular language. A term may also be a multiword expression.

definition: statement that describes a concept and permits its differentiation from other concepts.

context: a text or part of a text where the term is used. Usually gives important information on the conceptual meaning of the term and how it is used in discourse.

synonym: any term that represents the same or a very similar concept as the main entry term (= term). It is often useful to specify the degree of synonymy in a note.

short form (as opposed to full form): an abbreviated form results from omitting part of the term, while the full form is the complete representation of a term that has an abbreviated form. Short forms are useful information because they are frequent in discourse, but rarely explained.

restriction on term: this category covers geographical usage. Many terms have variants within a language that are used only in particular countries (see the following list).

  • French
      • France
      • Belgium
      • Switzerland
      • Canada
  • Italian
      • Italy
      • Switzerland
  • German
      • Germany
      • Austria
      • Switzerland
  • English
      • United Kingdom
      • Ireland
      • United States
      • Canada
      • Australia
      • New Zealand

Note: in the data model for the thesaurus under development, descriptors with such a geographical restriction are called "national variants".

term status: indicates normative authorization as well as temporal, frequency and other qualifiers:

standardized (officially recommended term contained in an international, European or national standard document or documents of equivalent normative nature)

deprecated (term whose further use is not recommended; this may have different reasons, e.g. a misleading term, ambiguity or multiple meanings)

suggested (proposal of a new term - in most cases for new concepts)

obsolete (term that has fallen into disuse in professional discourse)

neologism (newly coined term recently introduced in professional discourse)

vernacular (word from general language used as a term denoting a specific concept)

in-house (term introduced and used only within a certain institution (company, etc.) or a network of institutions or persons). Note: in such cases the name of this institution should be given in the source field.

trade-mark (registered trade mark or trade name, important for copyright reasons)

international scientific term: a term that is part of an official international scientific nomenclature, such as zoological and botanical nomenclatures, chemical nomenclatures and medical nomenclatures. In many cases they co-exist with a vernacular term that is used in non-scientific contexts. Since a terminology database is used by different user groups, it is important to record both terms.
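
The term status values listed above lend themselves to a closed set of values. The following Python sketch is one possible rendering, not a prescribed implementation: the names `TermStatus` and `status_warnings` are hypothetical, and treating the note on in-house terms as a warning rather than a hard error is an assumption.

```python
from enum import Enum


class TermStatus(Enum):
    """Values of the 'term status' data category as listed above."""
    STANDARDIZED = "standardized"
    DEPRECATED = "deprecated"
    SUGGESTED = "suggested"
    OBSOLETE = "obsolete"
    NEOLOGISM = "neologism"
    VERNACULAR = "vernacular"
    IN_HOUSE = "in-house"
    TRADE_MARK = "trade-mark"
    INTERNATIONAL_SCIENTIFIC_TERM = "international scientific term"


def status_warnings(status, source=None):
    """Return warnings for a status; an in-house term should name its institution."""
    warnings = []
    if status is TermStatus.IN_HOUSE and not source:
        warnings.append("in-house term: name the institution in the source field")
    return warnings


# A scientific name and a vernacular name for the same concept, recorded side by side.
terms = [
    ("Fagus sylvatica", TermStatus.INTERNATIONAL_SCIENTIFIC_TERM),
    ("European beech", TermStatus.VERNACULAR),
]
```

Recording both names in one entry, each with its own status, reflects the point that different user groups look for different terms.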

source: bibliographic information on the source of any piece of information in an entry

Note: in one and the same entry different sources might be listed, e.g. one for a term, another source for its definition.

note: remark on any piece of information in an entry, e.g. a warning to users that the conceptual meaning of a particular term overlaps with that of another term.

equivalent: term in a language other than the original object language that denotes the same concept. It is important to note that in many cases equivalence is only partial rather than full, since the conceptual meaning of one term differs from that of the other. In discourse this means that a term is equivalent to a term in another language only in specific contexts. An indication of the concrete equivalence relationships is especially needed by translators, but it is useful to all users of a terminology database.

date of entry: administrative information, usually automatically recorded, includes dates of revision

originator: author of any piece of information included in an entry.

relation to other concepts: a cross-reference to other (conceptual) entries. Different types of conceptual relations may be distinguished (see the list of values in the category "concept position" below). In this field the "other" entry (target entry) is explicitly mentioned.

concept position: indicates the kind of relation of the entry in question to the target entry that is recorded in the field described above (relation to other concepts).

Thesaurus information

(Information on thesaurus data or terminological information designed for inclusion in the environmental thesaurus, motivated by the task of providing terminological support to CB5 of the ETC/CDS project work)

thesaurus name: identifier of a thesaurus that is either the source of terminological information or the target of information contained in the respective entry in the terminology database. Note: do not confuse this with the source field described above.

descriptor: preferred term in a thesaurus that represents a concept and is used for indexing any kind of information and for retrieving it in search operations in the respective databases.

non-descriptor: synonym of the descriptor that refers the user to it. Non-descriptors may not be used for indexing and searching.

top term: descriptor that has the highest possible position in a conceptual hierarchy of a thesaurus.

broader term: descriptor that is higher in the conceptual hierarchy than another descriptor. Abbreviated: BT

narrower term: descriptor that is lower in the conceptual hierarchy than another descriptor. Abbreviated: NT

related term: descriptor that is related in some way to another descriptor. Abbreviated: RT

Note: The type of relationship is not indicated; such information is usually only collected in terminology databases.
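
The hierarchical thesaurus categories above (top term, BT, NT) can be related to each other in a short Python sketch. It rests on assumptions that the text does not state: each descriptor has at most one broader term, the hierarchy contains no cycles, and narrower terms are derived as the inverse of the stored BT links. The descriptors themselves are invented examples.

```python
# Invented descriptors; only broader-term (BT) links are stored.
BROADER = {
    "groundwater": "water",
    "surface water": "water",
    "river water": "surface water",
}


def narrower_terms(descriptor):
    """Derive NT as the inverse of BT (assumption: the two relations are reciprocal)."""
    return sorted(d for d, bt in BROADER.items() if bt == descriptor)


def top_term(descriptor):
    """Follow BT links upwards until a descriptor without a broader term is reached."""
    while descriptor in BROADER:
        descriptor = BROADER[descriptor]
    return descriptor
```

Storing only one direction and deriving the other keeps BT and NT consistent; a thesaurus that allows several broader terms per descriptor would need a list of BT links instead of a single value.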

scope note: brief conceptual explanation of a descriptor.

Note: definitions as they are found in terminological dictionaries, terminology standards and terminology databases are stricter in their formal structure and often more detailed. Thus scope notes for inclusion in a thesaurus can be derived from terminological definitions.

Scope notes are only included in a thesaurus if the concept in question is expected by the thesaurus managers to be difficult to understand by the users. Terminological definitions, however, are recorded in terminology databases whenever they are available.

Remarks

It must be noted that this list of information categories might give the impression that terminology work is too complicated and detailed. In reality each entry only contains a specific and relevant selection of certain categories: not all terms have synonyms or short forms, for instance. The abstract data model, on the other hand, must contain all possible categories, even if some of them are rarely used in real terminology work.

The consistent application of these categories in all terminological activities in EIONET is a major prerequisite for all users to derive maximum benefit from actively contributing terminological entries and retrieving them for use in their actual work. This consistency is also needed for the exchange of terminological information as a tertium comparationis, since existing databases containing such information are usually modelled in very different ways with (seemingly) incompatible categorizations of terminological information.

In Figure 2 the lines between the nodes represent a specific relationship that is explicitly given: a single entry represents one concept and may have links to any other entry. The cross-reference indicates the type of relationship. While some categories will have only one value in each entry (date of entry, subject field, etc.), others may have several values: they are repeatable by language (such as term) or repeatable within a language (e.g. several synonyms of a term, or conflicting definitions found in the literature). Information on sources and notes can be combined with any other information. The diagram shows the logical relationships among all data categories.

Figure 2 - Meta-Data Model of the Terminology Database in free form presentation
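
The cardinalities described for Figure 2 can be made concrete in a small Python sketch. It is one possible transformation of the meta-model into an object-oriented data model, not the model itself: all class and field names are hypothetical, only a few data categories are covered, and the identifiers, the date and the relation label in the usage lines are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Item:
    """One piece of information; a source and a note may be attached to any item."""
    value: str
    source: Optional[str] = None
    note: Optional[str] = None


@dataclass
class LanguageSection:
    term: Item                                             # main entry term of this language
    synonyms: List[Item] = field(default_factory=list)     # repeatable within a language
    definitions: List[Item] = field(default_factory=list)  # conflicting definitions may coexist


@dataclass
class CrossReference:
    target_entry: str      # identifier of the target entry (hypothetical format)
    concept_position: str  # kind of relation to the target entry


@dataclass
class Entry:
    entry_id: str
    subject_field: str     # one value per entry
    date_of_entry: date    # one value per entry
    sections: Dict[str, LanguageSection] = field(default_factory=dict)  # repeatable by language
    relations: List[CrossReference] = field(default_factory=list)


# Usage with placeholder values (identifiers, date and relation label are invented).
entry = Entry(entry_id="E1", subject_field="Hydrology", date_of_entry=date(2000, 1, 1))
entry.sections["English"] = LanguageSection(
    term=Item("groundwater"),
    synonyms=[Item("ground water", note="spelling variant")],
)
entry.sections["German"] = LanguageSection(term=Item("Grundwasser"))
entry.relations.append(CrossReference(target_entry="E2", concept_position="broader concept"))
```

Wrapping every value in `Item` is a design choice that lets a source or note accompany a term, a synonym or a definition individually, in line with the remark that different sources may be listed within one entry.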

