Copyright on terminology and similar data
(contributed by Christian Galinski, Infoterm, Vienna)
Abstract:
This summary report and its recommendations are based on a number of documents prepared by high-level expert groups as well as on the experience gained since 1988 with organizing the preparation of the "Guide to Terminology Agreements" in close cooperation with a number of international organisations of the UN system and many pertinent institutions/organisations as well as with organizing or coorganizing the [...]
The report focusses primarily on the copyright aspect, and only secondarily on other intellectual property rights (IPRs) and neighbouring rights, although any of them can - under certain circumstances and for certain aspects or types of data - also apply to terminology. The contents of the report also apply to data similar to terminology, such as documentation languages (e.g. documentation thesauri) and - to a minor extent - bibliographic and factual data, if expressed largely by terminology or textual data.
Emphasis is further laid on legal, technical and ethical aspects as well as on solutions to immediate practical problems rather than on wider dimensions, such as the protection of intellectual property and the respective intellectual property rights (IPRs) as a major socio-economic and R&D-political issue of the future European Information Infrastructure, not to mention a major area of conflict between "information rich" and "information poor" countries on a global scale, as implied in the slogan "free flow of information".
By means of digitalization any representation of information and knowledge (IKR), whether simple or highly complex,
Today traditional "analogue copying" is the exception rather than the rule; most of today's copying processes already require conversion into a digitalized representation from the outset. The very concept of "copy" has therefore undergone substantial changes. The means (or elements of these means, viz. computer hardware and software) for converting and processing data are themselves also subject to various kinds of IPRs. This digitalization, accompanied by the gradual convergence of information and communication technologies (ICTs), is transforming the hitherto largely static and linear IKRs into dynamic (e.g. by applying hypermedia and multimedia) and spatial (multi-dimensional) representations.
Increasingly unrestricted possibilities to manipulate (viz. modify, convert or transform) IKRs make it difficult to distinguish clearly between the "original" (i.e. a "work" meeting the legal requirements with regard to "originality" constituting copyright) and its offspring. Moreover, the transfer of information via wide-area information networks (WANs) - and all the more so via future information super-highways - linking together information producers, re-users (i.e. modifiers, converters, transformers etc.) and users creates a global socio-economic situation in which information ultimately becomes a "raw material" that can be further processed - in principle freely - into a fully marketable commodity on the one hand, and into value-added products and services on the other (including hitherto unknown kinds of exploitation).
The subject matter of the international protection of literary and artistic property, better known as copyright, covers works represented in the form of words, music, pictures, three-dimensional objects, or combinations thereof. Practically all national copyright laws provide for the protection among others of:
IPRs are closely linked to the "originator" or "creator" of a "work" ("author" in the case of copyright), who "owns" the respective IPR as soon as the work is created or registered. IPRs can be - and as a rule are - transferred to an "exploiter" for commercial or non-commercial exploitation. Users, too, have certain rights (e.g. citation right) and obligations (e.g. to pay fees).
Specialized information and knowledge (including terminology) can be represented by a variety of linguistic and non-linguistic symbols. In addition there are different IKR levels, such as the basic level of conceptual knowledge (represented by terms etc.) and higher levels for propositional knowledge, sets of propositions, theories etc., expressed by texts, formulae etc. IKRs form IKR units, of which those of the higher levels are as a rule covered by copyright in any case. Any such IKR unit (e.g. when contained in a database) can be decomposed into smaller and less complex units or elements. In this connection the "smallest unit" that can constitute an IPR unit poses problems of definition, identification and law.
A terminology database (TDB) is a peculiar kind of database for factual data on concepts (represented by linguistic symbols, such as terms, definitions etc., and non-linguistic symbols, such as graphical symbols, images, formulae etc.). Depending on the data model, TDBs (and other databases having similar characteristics from a copyright perspective) are composed of entries and/or records, which in turn are composed of fields. In TDBs fields and their values (data elements) can be linked in many ways, both from a formal point of view and from a contents point of view (among others by hyperlinks). Entries/records, too, can and often must have formal or other links, sometimes across different files. In most cases the links have to be established in the course of recording, otherwise the complexity of conceptual knowledge is not sufficiently retained. These links, therefore, in a way also represent intellectual property.
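The entry/record, field and link structure described above can be sketched as follows. This is a minimal illustration only, not the data model of any actual TDB: the class names `DataField`, `Link` and `Record`, the helper `linked_records` and the field and relation names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    """One field of an entry/record: a named data element (names are hypothetical)."""
    name: str            # e.g. "term" or "definition"
    value: str
    language: str = "en"


@dataclass
class Link:
    """A typed link to another record, established in the course of recording."""
    relation: str        # e.g. "broader" (hypothetical relation name)
    target_id: str


@dataclass
class Record:
    """An entry/record composed of fields, plus links to other records."""
    record_id: str
    fields: list[DataField] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)


def linked_records(record: Record, database: dict[str, Record]) -> list[Record]:
    """Resolve a record's links to the records they point at, skipping dangling links."""
    return [database[link.target_id] for link in record.links
            if link.target_id in database]


# Two illustrative records; the first links to the second as its broader concept.
db = {
    "c1": Record("c1", [DataField("term", "copyright")], [Link("broader", "c2")]),
    "c2": Record("c2", [DataField("term", "intellectual property right")]),
}
print([r.record_id for r in linked_records(db["c1"], db)])
```

In this sketch the links are stored as data in their own right, which reflects the point made above that they carry part of the recorded conceptual knowledge.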
Moreover, the fields of individual entries/records can be taken from different sources (and may therefore belong to different copyright owners). Last but not least, research efforts to find methods for creating new databases by "automatically" re-using individual data stemming from a multitude of existing records - possibly from different databases, selected and transformed according to "intelligent" routines, thus creating "new knowledge" - will further complicate the copyright situation.
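As a small illustration of how a single record can draw on several sources, the following sketch groups the fields of one record by the source each was taken from. The record contents, the source names and the function `fields_by_source` are hypothetical and serve only to make the point concrete.

```python
from collections import defaultdict

# One record whose fields were taken from different (hypothetical) sources.
record_fields = [
    {"field": "term", "value": "descriptor", "source": "Standard A"},
    {"field": "definition", "value": "term used for indexing", "source": "Dictionary B"},
    {"field": "context", "value": "each descriptor is assigned ...", "source": "Dictionary B"},
]


def fields_by_source(fields):
    """Group field names by the source they were taken from."""
    grouped = defaultdict(list)
    for item in fields:
        grouped[item["source"]].append(item["field"])
    return dict(grouped)


grouped = fields_by_source(record_fields)
for source, names in grouped.items():
    print(source, "->", names)
```

Each key of the result stands for a potentially distinct rights holder whose contribution to the record would have to be tracked separately.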
The types of data that may - not today, but maybe in the future - qualify as IPR units in advanced TDBs are among others:
TDBs as a rule consist of several different files, such as
The documentation language entries are applied for indexing and retrieval purposes to both terminological data and bibliographic data. In addition there may be phraseology files (containing phraseological units together with contexts etc.), factual data files (containing numerical, graphical or textual information) etc. All files and entries/records also comprise data for administrative and other formal purposes, which are less subject to copyright, but indispensable for data management, identification, data security and other purposes not related to contents. Any field containing linguistic representations can as a rule occur with different language equivalents or translations (which may stem from different sources or originators) for the respective field contents. In the case of different writing systems such foreign language elements can also occur in transliterated or transcribed form.
Bibliographic data published in hardcopy form are under copyright in the same way as any monograph or periodical. Individual entries can be used for quotation purposes and other kinds of individual use or re-use. In the case of high-level bibliographies with comprehensive entries that also comprise keywords, descriptors and/or abstracts, individual entries may be subject to copyright. Lengthy extractions of entries are in any case subject to copyright. The same applies to bibliographic data available or accessible in electronic form.
Terminological data can be found among others in:
According to existing copyright provisions and jurisdictional practice only the data collection as a whole is subject to copyright in some countries. If individual entries/records contain sentences representing full statements these could in principle be protected under copyright. This situation may radically change with new legislation at European level formulating a sui generis law for the protection of databases.
Strictly speaking, any selection and import/input of data from a given publication or database into a(nother) database normally involves a different data model, which inevitably requires conversion processes, in other words "manipulation" of data. The original copyright is thus easily superseded by the new "original", although this may be largely derived from the original work.
Scanning of data on hardcopy, which is still fairly close to copying, is possible without major problems only in cases where meticulously designed and unambiguous rules with respect to representation and layout of data were strictly applied. It is already difficult with bibliographic data (which have a comparatively unsophisticated structure) and has so far proved next to impossible with terminological and lexicographical data, where in addition to the "visible" data much information, especially links between fields etc., is hidden (and more or less only "retrievable" via the human eye by the human brain). Scanning, therefore, will normally not be feasible for quite some time to come for economic reasons (i.e. costs due to post-editing etc.).
In such a scenario, which is becoming common in many subject fields, a detailed and highly consistent methodology (at least for the group coordinator and database manager) needs to be designed, which allows the storage of a maximum of additional data (e.g. source, originator, dates etc.) until the final stage of a given project is reached. Most of these data can then be dropped according to defined rules. Thereafter these entries/records and the respective set of mutually related entries/records constitute new copyright.
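The idea of keeping additional data during a project and dropping most of them at the final stage can be sketched as below. This is only one possible reading of "defined rules": the set `WORKING_ONLY_FIELDS`, the field names and the function `finalize` are assumptions made for this illustration, not part of any described methodology.

```python
# Assumed rule set: fields kept only while the project is in progress.
WORKING_ONLY_FIELDS = {"originator", "contributed_on", "working_note"}


def finalize(record: dict) -> dict:
    """Return a copy of the record without the working-stage fields."""
    return {name: value for name, value in record.items()
            if name not in WORKING_ONLY_FIELDS}


# A hypothetical record as it might look during the project.
working_record = {
    "term": "descriptor",
    "definition": "term used for indexing",
    "source_code": "STD-A",          # retained after finalization
    "originator": "Expert 3",        # dropped at the final stage
    "contributed_on": "1994-05-02",  # dropped at the final stage
    "working_note": "check against draft 2",
}

final_record = finalize(working_record)
print(final_record)
```

Because `finalize` returns a new dictionary, the working record with its full documentation remains available for the project files even after the final entries have been produced.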
As a rule, the individual contributions of the experts will be amalgamated into a joint copyright, whose "owner" is the agency holding the secretariat or the authority from which it receives its mandate. Authoritative sources for some of the data (especially for standardized terms and definitions) should be identified by means of an unambiguous source code (which, if necessary, would refer the user to a more detailed source indication, e.g. a bibliographic record).
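A source code of this kind could, for instance, be managed in a small registry that refuses to assign the same code twice and resolves a code to its fuller source indication. The class `SourceRegistry`, the code "STD-A" and the bibliographic details are hypothetical placeholders chosen for this sketch.

```python
class SourceRegistry:
    """Maps short, unambiguous source codes to fuller source indications."""

    def __init__(self):
        self._sources = {}

    def register(self, code: str, bibliographic_record: dict) -> None:
        """Add a source; reject a code that is already in use to keep codes unambiguous."""
        if code in self._sources:
            raise ValueError(f"source code {code!r} is already in use")
        self._sources[code] = bibliographic_record

    def resolve(self, code: str) -> dict:
        """Return the detailed source indication for a source code."""
        return self._sources[code]


registry = SourceRegistry()
registry.register("STD-A", {"title": "Example terminology standard",
                            "issuer": "Example standards body"})

# A terminological entry only carries the short code (hypothetical field name).
entry = {"term": "descriptor", "definition_source": "STD-A"}
print(registry.resolve(entry["definition_source"])["title"])
```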
Highly elaborate bibliographical records (including indexing terms and abstracts) are copyright protected in some cases or countries, but not in others. It is rather difficult, however, to ensure and enforce this protection.
Textual information in terminology and lexicography can be protected by copyright, if explicit or implicit statements are included. Definitions as a rule are not protected so far.
If external data are input into a database, as a rule new copyright is generated due to changes of layout, different data structure (incl. links within the record and between records), need for additional information etc.
If data from various sources are merged, even more changes, additional data and other human interventions are required. This further reduces the copyright protection of the original data.
In cases of highly authoritative data, such as standardized terminology, the copyright situation is not clear at all. In some countries standards, including terminology standards, are not considered regular publications available through normal book market channels. From a formal point of view they can, therefore, often be considered as "grey literature".
If taken to the extreme the following additional features might become necessary in terminology management systems in the future:
In the case of re-use of terminological data of a lower degree of authoritativeness, the originator of the data could be provided with feedback in the form of well-documented changes (modifications, revisions, etc.) to all records to which this is applicable. Although the work on the respective data generates a new copyright, it would be useful if some sort of acknowledgement were attached to the new records.
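Feedback of this kind could be produced as a field-by-field list of changes between the original and the revised record, with an acknowledgement added to the new record. The sketch below assumes records are simple field/value mappings; the functions `documented_changes` and `with_acknowledgement`, the field names and the wording of the acknowledgement are illustrative assumptions only.

```python
def documented_changes(original: dict, revised: dict) -> list:
    """List every field whose value differs between the original and the revised record."""
    changes = []
    for name in sorted(set(original) | set(revised)):
        before, after = original.get(name), revised.get(name)
        if before != after:
            changes.append({"field": name, "before": before, "after": after})
    return changes


def with_acknowledgement(revised: dict, originator: str) -> dict:
    """Return a copy of the revised record with an acknowledgement of the originator."""
    return {**revised, "acknowledgement": f"Based on data originated by {originator}"}


# Hypothetical original record and its revision by the re-user.
original = {"term": "descriptor", "definition": "term used for indexing"}
revised = {"term": "descriptor",
           "definition": "preferred term used for indexing",
           "note": "see also non-descriptor"}

for change in documented_changes(original, revised):
    print(change)
new_record = with_acknowledgement(revised, "Originator X")
```

A field that was added or removed appears in the list with `None` as its former or new value, so the originator can see additions and deletions as well as modifications.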