DRAFT - To appear in "High Energy Physics Libraries Webzine"

Subject Classifications in the Scientific
and Overall Digital World

Antonella De Robbio (*), Dario Maguolo (*), Alberto Marini (**)

ABSTRACT

In the present work we point out opportunities, problems, tools and techniques for interconnecting discipline-specific subject classifications, primarily organized as search devices in bibliographic databases, with general classifications originally devised for book shelving in public libraries.
First of all, we state the fundamental distinction between topical (or subject) classifications and object classifications, according to the way, respectively mediated or immediate, in which classifications refer to objects.
Then we trace the story of the structural limitations that have constrained subject classifications since their library origins, and of the devices that were devised to overcome the gap with genuine knowledge representation.
After recalling some general notions on structure, dynamics and interferences of subject classifications and of the objects they refer to, we sketch a concise overview of discipline-specific classifications in Mathematics, Computing and Physics, on the one hand, and of general classifications on the other.
In this setting we present The Scientific Classifications Page, which collects:
1. groups of Web pages produced by a pool of software tools for developing hypertextual presentations of single or paired subject classifications from sequential source files;
2. facilities for grasping information from KWIC lists of classification descriptions.
Further we propose a concept-oriented methodology for interconnecting subject classifications, which is based on object identification and description in a relational system acting as a common conceptual pole among the classifications. Such a system should be formalized in a suitable way to allow object descriptions to be plugged into, and searchable in, metadata that are managed by search engines, whatever the particular language or classification scheme indexers and users work with.
The methodology provides for four phases:
1. recognizing the tree-based structure of the classification number space
2. recognizing the structure of a space of buses, i.e. abstract (mobile across the classification numbers) nodes in time, which get identified via (permanence or limited variation of) textual descriptions
3. identifying object envelopes as connected sets of buses
4. extracting conceptual elements from the bus-carried descriptions, interactively with topologically minded examinations over the relation between envelopes and conceptual elements, and relational analyses to be performed by means of a suitable representation language.
The methodology is illustrated with the concrete support of a relational analysis of the whole Mathematics Subject Classification, along its evolution since 1959, as is available for online searches in the MathSci database.
Finally, we recall a very basic method for interconnection provided by coreference in bibliographic records among index elements from different systems, and point out the advantages of establishing the conditions for a more widespread application of such a method.

Part of these contents has been presented under the title Mathematics Subject Classification and related Classifications in the Digital World at the Eighth International Conference “Crimea 2001”, "Libraries and Associations in the Transient World: New Technologies and New Forms of Cooperation", Sudak, Ukraine, June 9-17, 2001, in a special session on electronic libraries, electronic publishing and electronic information in science chaired by Bernd Wegner, Editor-in-Chief of Zentralblatt MATH.

Connecting classifications in the digital world
Subject classifications and object classifications
To tree or not to tree: the question between
partitioning space and representing knowledge
Descriptions and addresses:
visiting a subject classification space
Subject classifications in Mathematics, Computing, Physics
General library subject classifications
Displaying classification schemes:
The Scientific Classifications Page
Buses in the classification space-time
The space-time of Mathematics Subject Classification
Envelopes and objects
Inside the metadata machinery

-- TEXT --

Connecting classifications in the digital world

Users in different settings, with different demands and expectations, want to fulfil their information needs wherever information is available, cutting costs and time as much as possible, regardless of the heterogeneity of sources: from quite specialized databases or dedicated portals to general online library catalogues or Web search engines, from reference (metadata) databases to full-text or hypermedia digital libraries, from e-journal aggregators to preprint servers and authors' self-archives (commonly called e-print systems).
The organization, the functionalities and the interaction modes exploited by networked digital libraries may be completely different from those generally met in traditional paper-based libraries. Moreover, precisely along the line of e-print systems, the development of technical mechanisms and organizational structures to support their interoperability, which is promoted by the Open Archives Initiative (OAI) [1], is making them evolve into genuine building blocks of a transformed scholarly communication model, radically different from the traditional one, which is dominated by the heavy mediation business of scholarly publishing companies.

On the other hand, users do not want to change their mind to meet the particular way of storing, indexing and presenting information for any source they face: this should be automatically worked out by the system. But such a task is not trivial.
As for subject indexing, different classifications, thesauri or otherwise structured terminologies, or even ontologies, while covering the same area, can keep presenting strong linguistic (which cannot be resolved by mere translation), structural and semantic disagreements, in spite of any effort at harmonization. Dramatic disagreements become evident in passing from the specialized world of discipline-oriented classifications to general classifications widely used in public, school or even general academic libraries, such as Dewey Decimal Classification, Universal Decimal Classification or Library of Congress Classification.

Misinterpretations occur easily when the same words are used in different contexts or for different purposes. Moreover, even in using one and the same classification, differences and inconsistencies are normal practice, either among different applications or inside the same application. One might reasonably expect that good interconnections among classifications are at the basis of good retrieval across classifications, but this seems not to be the common case.
While a number of approaches to the issues of connecting classifications or thesauri exploit statistical methods or neural network techniques, a different trend is oriented towards the analysis, modeling and support of conceptual organization by humans. The former can be very helpful, even in view of the latter; a well defined integration seems to be the recipe for the near future [see D01].
Actually, it is still worthwhile, and not only for educational purposes, to work out well defined connections between classifications or the like, provided that the objects each of them refers to are identified unambiguously by means of a suitable representation language. With the knowledge representation languages currently being designed and implemented in computer applications, this task is becoming feasible.

Subject classifications and object classifications

classification

topical, or subject classifications: abstract structured spaces, or models for arranging material spaces, where respectively immaterial or material objects can get a location according to selected characteristics, so that objects of interest can be found just by choosing and moving along the paths provided by the space structure or the concrete arrangement defined by the model. Typical material objects being located by means of a classification are books shelved in a library, or even bibliographic entries in printed indexes; as for the immaterial, we can think of fields or disciplines of human knowledge or activity, of concepts and objects of a certain field or discipline, or generally of subjects of documents abstractly taken as information units [WJ99].
object classifications: partial, and possibly very entangled, ordered sets of concepts, or intensional objects, or descriptions, which make reference to objects, or have extension, in some domain. Thesauri and ontologies are seemingly oriented towards this pattern, although very often they can be shown to behave in the topical mode.

A (topical or object) classification can be given an appropriate semantics in terms of some notion of space, possibly less constrained and more complex than usual material ones, even if it is not involved in physical space arrangement, but acts as a pure information device, e.g. in computer-managed bibliographic records.
The space of topical classifications is the form of a container, a grossly operative space for concept packaging and package linking; it is quite different from the space of objects as they are actually intended by classification users, a space that can be more or less definitely detached from the classification, like a conceptual space of true effective meanings.

Objects, be they material or not, exist in temporal and relational arrangements: they can begin and end, split and merge, exchange parts with other objects, increase or decrease their extent, scope or complexity, change characteristics and relations with other objects, and can be perceived and managed differently in time.
Classifications evolve too, as objects' clothes (topical classifications) or description structures (object classifications), through different versions that come into use successively, according to different perceptions, or awareness, of objects and their environments, or to changes in tools and techniques for representing them, and for managing such representations. A single version of a classification offers a snapshot of a system of objects; a sound knowledge of objects, in their intrinsically changing nature and in their changing contexts, can be acquired by looking through classifications which refer to them, provided that a comprehensive follow-up of versions is observed.

To tree or not to tree: the question between
partitioning space and representing knowledge

At the crossroad of Artificial Intelligence, Computational Linguistics and Database Theory, such complex structures can be represented with good effectiveness in the frame of reference of Formal Ontology [2], by means of formalisms like Conceptual Graphs (CG) [3], Description Logics (DL) [4], and the Unified Modeling Language (UML) [BJR98], which comes from the field of software engineering and is proposed as an approach for modeling ontologies and encoding the knowledge content of Web pages [C01].
Metadata formats for document representation are being defined progressively along this way; the draft for the Academic Metadata Format [KW01], which is being defined in the scope of the Open Archives Initiative, is a clear example of such a trend [5].

When we deal with subject classifications, structured representations (which were conceivable even in times when formal languages for expressing them were lacking) still have to be cut down to comply with the tree-like forms to which subject classifications constrain their operability. Although this reduction unavoidably entails serious information losses, subject classifications have been provided with more or less effective devices to remedy this gap.
From the pioneering work of Ranganathan since 1933 with Colon Classification, through the elaborations of the British Classification Research Group in the '50s and '60s, the addition of Auxiliary Tables to the Dewey Decimal Classification since its 18th edition, published in 1971 [see CCMS96], the development of the Preserved Context Index System (PRECIS) in the '70s, and the publication in 1986 of the standard ISO 2788 (BS 5723) Guidelines for the establishment and development of monolingual thesauri, a compositional approach to subject analysis, named facet analysis, has been progressively established [F96]. Within facet analysis, complex concepts are decomposed into combinations, specified by means of role indicators, of atomic elements, which belong to homogeneous, mutually exclusive classes, the facets [AGB97].

Turning back to subject classifications, an organization of the classification space (named pre-coordination) which permits complex objects to be recovered via suitably compound addresses, and a more or less rich and organized apparatus of cross-references between places, are useful means especially if objects may be located in one place only.
If a subject classification is used in settings that allow the simultaneous employment of different classification codes for the same object, mechanisms and directions for post-coordination are provided in order to partially recover complex meaning by listing addresses together in suitable ways, either in databases that offer information or in queries that ask for it.
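
Post-coordination at query time can be pictured with a minimal sketch. It assumes that each bibliographic record simply carries a list of classification codes and that a query lists the addresses that must all be present, a shorter address standing for the whole branch beneath it. The record identifiers, the codes assigned to them and the function names are invented for illustration only.

```python
def matches(record_codes, query_codes):
    """True if every queried address (or a more specific one under it)
    is assigned to the record."""
    return all(
        any(code.startswith(query) for code in record_codes)
        for query in query_codes
    )


def search(records, query_codes):
    """Return the identifiers of the records matching all queried addresses."""
    return sorted(doc for doc, codes in records.items()
                  if matches(codes, query_codes))


# Hypothetical records, each indexed with one or more codes.
records = {
    "doc1": ["68Q25", "05C85"],
    "doc2": ["68Q25"],
    "doc3": ["05C85", "90C35"],
}

print(search(records, ["68Q25", "05C85"]))  # records carrying both addresses
print(search(records, ["05C"]))             # records anywhere under branch 05C
```

Listing the addresses together recovers only part of the complex meaning: the sketch knows that two codes co-occur in a record, not how the corresponding concepts are related.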

By definition, subject classifications are plainly rougher, or less fine-grained, than thesauri and modern ontologies; moreover, the standard relationships that are used to connect classification codes are rather blurred and their intrinsic significance is very loose. But this is also the case for thesauri and even for a good many ontologies. In order to recover capabilities for sound conceptual representation of intended entities and relationships, and so for effective accomplishment of retrieval tasks, a careful disambiguation of the expressed relationships is called for in both thesauri and ontologies [6], by introducing well defined and ontologically grounded specializations. As for subject classifications, we shall instead advance a topological argument to try to account for such relationships.

Descriptions and addresses:
visiting a subject classification space

Subject classification descriptions, be they textual or otherwise performed, are means to orientate the user in the classification space. They refer to objects through the mediation of places that gather them, or channels that convey them, in order to meet some external specifications or constraints (human readability, manageability for use). So one description may refer to a collection of objects that are intended distinctly by the user, but are collected according to the classification organization. On the other hand, one object or place may be represented in different forms, still observing the linguistic or semiotic conventions of the classification.
Thesauri and lists of subject headings, on the contrary, are concerned with maintaining a tight correspondence between objects and descriptions, at the price of having to deal with preferred and non-preferred forms: but this amounts to constraining the variety of natural language to pass through the cog-wheel of machine identifiers. The addition of more or less free-text scope notes is a further sign of this blurring.

It's the role of addresses to guide the travel machinery: for this work there is no need to know why the traveller wants to reach a certain place, and to find what. So in subject classifications addresses (commonly named classification codes or numbers) are fundamental in their very form for material document shelving in material libraries, and lists of addresses are major means for subject indexing in bibliographic databases and online library catalogues; addresses encode and display the space structure, but they act as mere linking elements, without any real semantic content.
The real carriers of semantic content are descriptions, and the classification organizes them inside a structure that exists independently from the actual forms of the addresses, i.e. from the forms that are fixed for representing the classification space structure in view of external reference and linking.
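
How an address can encode the space structure while carrying no semantic content of its own can be shown with a small sketch. It assumes MSC2000-style five-character addresses (for example "11B39" under "11Bxx" under "11-XX"); this notation is not spelled out in the text above, and the function names are hypothetical.

```python
import re


def msc_parent(code):
    """Return the parent address of an MSC-style code,
    or None for a top-level section."""
    if re.fullmatch(r"\d\d-XX", code):
        return None                      # top level: major section
    if re.fullmatch(r"\d\d[A-Z]xx", code):
        return code[:2] + "-XX"          # second level -> section
    if re.fullmatch(r"\d\d-\d\d", code):
        return code[:2] + "-XX"          # e.g. 11-01 -> section
    if re.fullmatch(r"\d\d[A-Z]\d\d", code):
        return code[:3] + "xx"           # third level -> second level
    raise ValueError(f"not an MSC-style address: {code!r}")


def path_to_root(code):
    """List the addresses met when climbing from a code to its section."""
    path = [code]
    while (parent := msc_parent(path[-1])) is not None:
        path.append(parent)
    return path


print(path_to_root("11B39"))
```

The path is computed from the form of the address alone, without looking at any description: the address works as a linking element, while what the place is about has to be read from the descriptions attached to it.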

Moreover, while both descriptions and addresses can change, in time or across different linguistic, semiotic or encoding conventions, they need not change in dependence on one another, or on the changes, transformations, births and deaths among the objects, the spaces and the ways objects and spaces are organized and perceived. Addresses may change while descriptions remain the same, or space structure, at least locally, is preserved; descriptions may change while objects remain the same; objects may change while addresses remain the same, and so on.
Different classifications that cover overlapping areas can influence one another over time, especially in structure and descriptions, in order to reach similar or compatible views of the same objects, even if these are seen from different viewpoints or on different scales, and different groupings can be kept within each classification.

Subject classifications in Mathematics, Computing, Physics

Mathematics Subject Classification (MSC)
MSC is compiled and updated by the editorial offices of the world's most important bibliographical directories for mathematical research: MathSci and Zentralblatt MATH.
The classification covers all branches of pure and applied mathematics, including probability and statistics, numerical analysis and computing, mathematical physics and economics, systems theory and control, information and communication theory.

The MathSci database
MathSci is produced by the American Mathematical Society (AMS).
The paper version consists of the journals Mathematical Reviews (MR), published since 1940, and Current Mathematical Publications (CMP).
MSC, compiled since 1959 (by AMS alone until the early '70s), was very unstable in the first years of its existence. So, for the part which appeared in print from 1940 to 1972, the MathSci database got new classification data, which are stable over relatively long periods (1940-1958, 1959-1972) and therefore more suitable for database search than the frequently varying ones of the print version. Starting with 1973 the database is indexed with the same classification codes that appear in the print version. [7]

The Zentralblatt MATH database
Zentralblatt MATH is edited by the European Mathematical Society (EMS), the Fachinformationszentrum (FIZ) Karlsruhe and the Heidelberger Akademie der Wissenschaften (Germany); it is established in cooperation with the Cellule de Coordination Documentaire Nationale pour les Mathématiques (Math Doc Cell, France). Several European Editorial Units cooperate with the Editorial Office in Berlin.
The paper version consists of the journal Zentralblatt MATH (with this title since 1999), founded as Zentralblatt für Mathematik und ihre Grenzgebiete in 1931, formerly issued by Deutschen Akademie der Wissenschaften zu Berlin; published since 1931 by Springer.
The database is indexed with the 1991 and 2000 MSC versions; some superseded classification codes from preceding versions are also present. [8]

The evolving structure of MSC
After 1973 major MSC revisions came into use in 1980, 1985, 1986, 1991, 2000.
From 1959 to 1985 the MathSci version of MSC counts 60 major sections; 61 from 1986 to 1999 and 63 since 2000.
Until 1972 the classification was issued in two levels; an intermediate level became available in 1973, and is progressively being exploited as MSC increases in detail and so grows in size.
Having started with 1436 numbers in 1959, MSC counted 4895 numbers in 1999 and has counted 5590 since 2000.
A consistent and ever growing apparatus of cross references helps in understanding connections between different branches of mathematics.

The EULER project
Mathematics Subject Classification is one of the classification systems provided for by the Dublin Core (DC) metadata format, and is used inside DC metadata for the search engine developed in the European Union project European Libraries and Electronic Resources in Mathematical Science (EULER) [9].
The main objective of EULER was the realization of a "one-stop shop" for research on mathematics information resources such as books, pre-prints, Web pages, abstracts, collections of articles and reviews, periodicals, technical reports and theses.
The result is a Web meta-interface for parallel simultaneous queries to a heterogeneous collection of databases.

Let us look at other classifications in the field of Mathematics:

Referativnyj zhurnal: Matematika. Classification Scheme
It was prepared as a piece of the Universal Decimal Classification (UDC) for Referativnyj zhurnal: Matematika. An English translation is provided by the AMS site [10].

Zentralblatt für Didaktik der Mathematik Classification Scheme (ZDM) [11]
This scheme is used for the bibliographic database on mathematics education and related fields MATHDI, active since 1976, which can be accessed through the sites of the European Mathematical Information Service (EMIS).
The paper version of the database is Zentralblatt für Didaktik der Mathematik.

In the field of Computing we start with:

ACM Computing Classification System (CCS)
This classification is issued by the Association for Computing Machinery (ACM) in the USA, for the directories Computing Reviews (CR) and Guide to Computing Literature (GCL).
Moreover, it is adopted by the bibliographic database CompuScience, produced by Fachinformationszentrum (FIZ) Karlsruhe, Department of Mathematics & Computer Science Berlin, which contains references from CR since 1976, from GCL since 1977 and from Section 68 Computer Science of MSC in ZM/MA.
ACM's first classification system for the computing field was published in 1964. Then, in 1982, the ACM published an entirely new system. New versions based on the 1982 system followed, in 1983, 1987, 1991, and 1998 [12].

Moving into the field of Physics we find:

Physics and Astronomy Classification Scheme (PACS)
PACS is prepared by the American Institute of Physics (AIP) in collaboration with certain other members of the International Council on Scientific and Technical Information (ICSTI) having an interest in physics and astronomy classification. The most recent internationally agreed scheme was published by ICSTI in 1991.
Revised editions of PACS are published biennially, or as necessary, by AIP.
PACS contains 10 broad categories subdivided into 66 major topics [13].

INSPEC Classification [14]
INSPEC is an English-language bibliographic information service providing access to the world's scientific and technical literature in physics, electrical engineering, electronics, communications, control engineering, computers and computing, and information technology.
INSPEC was formed in 1967, based on the Science Abstracts service, which has been provided by the Institution of Electrical Engineers (UK) since 1898. Still today Physics Abstracts, Electrical & Electronics Abstracts and Computer & Control Abstracts together form the Science Abstracts series of journals, which is the paper version of the INSPEC database.
INSPEC Classification is divided into four major sections:

Section A: Physics
Section B: Electrical & Electronic Engineering
Section C: Computer & Control
Section D: Information Technology

General library subject classifications

Dewey Decimal Classification [15]
The Dewey Decimal Classification (DDC) system was conceived by Melvil Dewey in 1873 and first published in 1876. The latest (21st) edition was released in 1996, so on average six years elapse between one edition and the next.
The Dewey Decimal Classification is published in two editions, full and abridged.
The Classification is kept up to date through electronic versions: Dewey for Windows, a CD-ROM product that is updated annually; and WebDewey in CORC, a Web-based product that is updated quarterly.
The DDC is published by Forest Press, a division of OCLC Online Computer Library Center, Inc.

DDC is widely used all over the world, not only for book shelving in libraries, especially in public, school and general academic ones, but also for subject indexing and browsing in general online document retrieval tools, such as bibliographic databases (including the national bibliographies of sixty countries), online library catalogues (including WorldCat, the OCLC Online Union Catalog), digital libraries, Web search engines.
The DDC has been translated into over thirty languages.

The classification is developed and maintained in the US national bibliographic agency, the Library of Congress.
The Dewey editorial office is located in the Decimal Classification Division of the Library of Congress, where annually the classification specialists assign over 110,000 DDC numbers to records for works cataloged by the Library. Having the editorial office within the Decimal Classification Division enables the editors to detect trends in the literature that must be incorporated into the Classification. The editors prepare proposed schedule revisions and expansions, and forward the proposals to the Decimal Classification Editorial Policy Committee (EPC) for review and recommended action.

The print version of Edition 21 is composed of nine major parts in four volumes as follows:

Volume 1:

New Features: A brief explanation of the special features and changes in Edition 21
Introduction: A description of the DDC and how to use it
Glossary: Short definitions of terms used in the DDC
Index to the Introduction and Glossary
Tables: Seven numbered tables of notation that can be added to class numbers to provide greater specificity. Except for notation from Table 1 (which may be added to any number unless there is an instruction in the schedules or tables to the contrary), table notation may be added only as instructed in the schedules and tables
Tables, together with the very structure of the hierarchy in some areas of the classification, make up an effective approximation to facet analysis.
Lists that compare the previous edition with the new edition:
- Relocations and Reductions;
- Comparative and Equivalence Tables;
- Reused Numbers.

Volumes 2 and 3:

Schedules: The DDC numbers arranged in their hierarchical organization, presented with descriptions, links, etc.

Volume 4:

Relative Index: An alphabetical list of subjects with the disciplines in which they are treated subarranged alphabetically under each entry
Manual: A guide to classifying in difficult areas, information on new schedules, and an explanation of the policies and practices of the Decimal Classification Division at the Library of Congress. Information in the Manual is arranged by the numbers in the tables and schedules.

Universal Decimal Classification
UDC was created towards the end of the nineteenth century by Paul Otlet and Henri La Fontaine as an adaptation of DDC in view of the preparation of a universal bibliography.
Until recently responsibility for the scheme belonged to the FID (Fédération Internationale de Documentation); this responsibility was passed to a consortium of publishers (the UDC Consortium) in 1992.
The scheme consists of 60,000 classes (divisions and sub-divisions) as well as a number of auxiliary tables.

Library of Congress Classification
In 1899 the Librarian of Congress Dr. Herbert Putnam and his Chief Cataloguer Charles Martel decided to start a new classification system for the collections of the Library of Congress (established 1800). Basic features were taken from Charles Ammi Cutter's Expansive Classification.
LCC is an enumerative system built on 21 major classes, each class being assigned an arbitrary capital letter from A to Z, with 5 exceptions: I, O, W, X, Y.
After this was decided, Putnam delegated the further development of different parts of the system to subject specialists, cataloguers and classifiers.
Initially and intentionally the system was, and has remained, decentralized, and the different classes and subclasses were published for the first time between 1899 and 1940.
This has led to schedules that often differ very much in the number and the kinds of revisions accomplished.

Displaying classification schemes: The Scientific Classifications Page

H-volumes

Various tools for exploring subject classifications have been realized in this way and are collected in The Scientific Classifications Page
http://www.math.unipd.it/~biblio/math/eng.htm.
Besides hypertextual presentations of subject classifications, the page collects some H-volumes presenting KWIC (Key-Word-In-Context) lists extracted from the descriptions of one or more combined classifications. Descriptions are circularly permuted on significant words, i.e. words not in a stop-word list; the very long list of resulting strings is displayed on the right, subdivided into smaller manageable lists, which can be accessed through an index appearing in the left frame. This redundant but properly paginated presentation allows the rapid exploration of lexical similarities among descriptions to obtain suggestions about possible affinities of contents.
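
The circular permutation of descriptions can be illustrated with a minimal sketch. It is not the software behind the H-volumes: the stop-word list, the "/" marker for the wrap-around point, the sample codes and descriptions, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to", "with"}


def kwic_entries(code, description):
    """Return one (keyword, rotated description, code) entry
    for each significant word of the description."""
    words = description.split()
    entries = []
    for i, word in enumerate(words):
        key = re.sub(r"[^\w-]", "", word).lower()   # strip punctuation
        if not key or key in STOP_WORDS:
            continue
        # Rotate so that the keyword comes first; "/" marks the original start.
        tail = ["/"] + words[:i] if i else []
        entries.append((key, " ".join(words[i:] + tail), code))
    return entries


def kwic_list(descriptions):
    """Merge the entries of all descriptions into one alphabetical list."""
    entries = []
    for code, text in descriptions.items():
        entries.extend(kwic_entries(code, text))
    return sorted(entries)


# Invented descriptions from two imaginary schemes.
sample = {"A1": "Theory of graphs", "B2": "Graphs and surfaces"}
for keyword, rotated, code in kwic_list(sample):
    print(f"{keyword:10} {rotated:25} {code}")
```

Sorting on the keyword brings together descriptions from different schemes that share a significant word, which is what makes lexical similarities visible at a glance.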
The Scientific Classifications Page includes:

The Mathematics Classification Page
http://www.math.unipd.it/~biblio/math/engmsc.htm
which collects six hypertextual frame presentations of the latest version of Mathematics Subject Classification, MSC2000.

From a sequential ASCII file containing the whole MSC2000, two H-volumes were obtained, respectively

MSC2000b H-volume, simple frame presentation:
http://www.math.unipd.it/~biblio/math/mainb/mhbmain.htm
MSC2000d H-volume, double view presentation:
http://www.math.unipd.it/~biblio/math/doppiaeng/mhdmain.htm

The same process being worked out on a file containing an Italian translation of MSC2000, we obtained the simple frame

MSC2000id H-volume, Italian translation:
http://www.math.unipd.it/~biblio/math/italiana/mhimain.htm

while instead of the double-view one, we processed the two files in combination with the first file, to obtain the simple frame

MSC2000l H-volume, interleaved English and Italian texts:
http://www.math.unipd.it/~biblio/math/it+eng/mhlmain.htm

From the combination of the first ASCII file with other ones, containing collections of specific data, we obtained other H-volumes:

From a file resulting from a comparison of MSC2000 with the 1991 version, we obtained
MSC2000c H-volume, simple frame presentation, including changes from MSC 1991:
http://www.math.unipd.it/~biblio/math/complexc/mhcmain.htm
From a file containing data about subject specific pages of relevant Websites, we obtained a true Virtual Reference Desk for Mathematics,
MSC2000w H-volume, simple frame presentation, with guide pages linking to subject specific pages of relevant Websites
http://www.math.unipd.it/~biblio/math/travel/mhwmain.htm

Mathematics Subject Classification MSC and Dewey Decimal Classification DDC
http://www.math.unipd.it/~biblio/math/engddc.htm

We advanced along this line by working out connections between classification numbers from the DDC 21 and MSC2000 schemes; a draft page in double view presentation was then produced:

Connections between the classification schemes DDC21 and MSC2000
http://www.math.unipd.it/~biblio/msc-cdd/index.html.

In view of the revision of the 510 section of DDC, Mathematics, we are updating such a draft along the proposal presented by Giles Martin, Assistant Editor of the Dewey Decimal Classification [16].
Meanwhile, we have put together the descriptions of:
- the proposed revision of the 510 DDC section
- MSC2000
- the sections E - N of the ZDM classification, encoded as 97E - 97N in the MSC style
to produce the KWIC list H-volume

Lexical connections between the classification schemes DDC22 510 and MSC2000 + ZDM E-N

http://www.math.unipd.it/~biblio/kwic/msc-cdd/index.html

KWIC (KeyWords In Context) lists for Scientific Subject Classification Descriptions
http://www.math.unipd.it/~biblio/math/engkwic.htm.

The following H-volumes have been produced:

KWIC list of phrases of MSC2000 classification scheme
http://www.math.unipd.it/~biblio/kwic/msc/
KWIC list of phrases of PACS 2001 classification scheme
http://www.math.unipd.it/~biblio/kwic/pacs/
KWIC list of phrases of ACM Computing Classification System (1998)
http://www.math.unipd.it/~biblio/kwic/acm/
Combined KWIC list of phrases of MSC2000 and
PACS 2001 classification schemes
http://www.math.unipd.it/~biblio/kwic/msc-pacs/
Combined KWIC list of phrases of MSC2000 and
ACM Computing Classification System (1998)
http://www.math.unipd.it/~biblio/kwic/msc-acm/.

This kind of preliminary lexical support will be worked out to investigate the connections among other groups of classification schemes.
Furthermore, we will investigate the improvements obtainable by discriminating homonyms, synonyms and secondary terms.
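
As an illustration of how a combined KWIC list can be derived from classification descriptions, the following is a minimal sketch in Python. It is not the software used to produce the H-volumes above; the stop list, the function name kwic and the sample entries are assumptions made only for this example.

```python
import re

# Hypothetical stop list; a real one would be tuned to the schemes involved.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def kwic(entries, stopwords=STOPWORDS):
    """Build a KWIC list from (scheme, code, description) triples.

    Returns (keyword, left context, right context, scheme, code) tuples,
    sorted by keyword, then by scheme and code.
    """
    rows = []
    for scheme, code, description in entries:
        # Words are runs of letters, optionally joined by hyphens.
        words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", description)
        for i, word in enumerate(words):
            key = word.lower()
            if key in stopwords:
                continue
            left = " ".join(words[:i])
            right = " ".join(words[i + 1:])
            rows.append((key, left, right, scheme, code))
    return sorted(rows, key=lambda r: (r[0], r[3], r[4], r[1]))

# Illustrative entries only; scheme names, codes and wording are placeholders.
entries = [
    ("SCHEME-A", "A1", "Theory of formal languages"),
    ("SCHEME-B", "B7", "Formal methods in software"),
]
for key, left, right, scheme, code in kwic(entries):
    print(f"{left:>20} [{key}] {right:<20} {scheme} {code}")
```

Keywords shared by two schemes end up on adjacent rows, which is where a lexical connection (or a homonym to be discriminated) becomes visible.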

Buses in the classification space-time

So the first step in the process of getting objects out of the classification space is to recognize the buses that carry objects in time, through a course of succeeding versions of the classification, moving across the addresses that mark the (possibly changing) paths and places in the classification space. Each bus during its trip passes through one or more places; the addresses of such places, with the indication of the period of passage, set up the schedule for that bus.
Consistent sequences of descriptions have to be identified; such sequences make up the description of buses in the classification space-time. A good grasp of the subject matter is needed at this stage; the step can also be carried out with the help of conversion tables, which are generally provided by the classification editorial agencies, especially in the case of deep or extensive changes in the classification.
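
The notion of a bus with its schedule can be sketched as a small data structure. The Python fragment below is only an illustration under our own assumptions: the class names Stop and Bus, the field names, and the codes and years are placeholders, not data from any real scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Stop:
    """One place visited by a bus: an address and its period of passage."""
    address: str
    first_year: int
    last_year: Optional[int]  # None means "still valid"

@dataclass
class Bus:
    """A topic followed through succeeding versions of a classification."""
    name: str
    schedule: List[Stop] = field(default_factory=list)

    def addresses_in(self, year: int) -> List[str]:
        """Addresses at which the bus is found in the given year."""
        return [
            s.address
            for s in self.schedule
            if s.first_year <= year and (s.last_year is None or year <= s.last_year)
        ]

# Placeholder codes and years, not taken from any real scheme.
bus = Bus("example topic", [
    Stop("A10", 1980, 1990),
    Stop("B25", 1991, None),
])
print(bus.addresses_in(1985))
print(bus.addresses_in(2000))
```

A conversion table would typically supply the pairs of addresses that belong to the same schedule; deciding that they describe the same topic remains a matter of subject knowledge.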

Even if any synchronic slice of the classification space-time is tree-like, the whole structure may not be tree-like, as nodes or subtrees can migrate from one branch to another.
Besides the main hierarchical structure, cross-references and explicitly stated pre-coordination and post-coordination mechanisms, taken dynamically as well, give substantial contributions to the definition of the classification space-time.
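
The remark that every synchronic slice can be tree-like while the whole space-time is not can be checked mechanically. The sketch below assumes a hypothetical list of dated parent links with placeholder codes; it only tests the single-parent condition, which is necessary (not sufficient) for a tree.

```python
from collections import defaultdict

# (child, parent, first_year, last_year) with placeholder codes and years.
EDGES = [
    ("A11", "A1", 1980, 1990),
    ("A11", "B2", 1991, 2000),   # the node migrates to another branch
    ("A12", "A1", 1980, 2000),
]

def parents_in(edges, year):
    """Parent sets per child in the synchronic slice for one year."""
    parents = defaultdict(set)
    for child, parent, first, last in edges:
        if first <= year <= last:
            parents[child].add(parent)
    return dict(parents)

def parents_overall(edges):
    """Parent sets per child over the whole lifetime of the scheme."""
    parents = defaultdict(set)
    for child, parent, _first, _last in edges:
        parents[child].add(parent)
    return dict(parents)

def single_parent(parents):
    """True when no node has more than one parent (necessary for a tree)."""
    return all(len(p) <= 1 for p in parents.values())

print(single_parent(parents_in(EDGES, 1985)))
print(single_parent(parents_in(EDGES, 1995)))
print(single_parent(parents_overall(EDGES)))
```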

The space-time of Mathematics Subject Classification

[17]

The database consists of 25 tables, which can be conceptually arranged in two layers, each of 11 tables, and 3 tables that account for relationships between corresponding entities represented in the two layers. Every table provides data for the beginning and end years of the period of existence of the object or validity of the relation represented in each record.
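
To make the role of the beginning and end years concrete, here is a minimal sketch of one such table and of a query for a synchronic slice, using Python's sqlite3 module. The table name, column names and sample rows are hypothetical and do not reproduce the actual database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and column names.
    CREATE TABLE topical_address (
        code        TEXT NOT NULL,
        upper_code  TEXT,              -- NULL at top level or when not unique
        first_year  INTEGER NOT NULL,
        last_year   INTEGER            -- NULL while the address is still valid
    );
""")
conn.executemany(
    "INSERT INTO topical_address VALUES (?, ?, ?, ?)",
    [
        ("A1", None, 1980, None),
        ("A11", "A1", 1980, 1990),
        ("A12", "A1", 1991, None),
    ],
)

def slice_at(connection, year):
    """Codes of the addresses valid in the given year, in code order."""
    rows = connection.execute(
        """
        SELECT code FROM topical_address
        WHERE first_year <= ? AND (last_year IS NULL OR last_year >= ?)
        ORDER BY code
        """,
        (year, year),
    )
    return [code for (code,) in rows]

print(slice_at(conn, 1985))
print(slice_at(conn, 1995))
```

Keeping the period of validity in every record is what allows a single database to answer questions about any version of the scheme.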

The first layer: classification places

The simple entities are:

Topical addresses, with indication of the upper address if unique during address lifetime (7039 units)
Address elements for document form/genre (9 units)
Address elements for residual areas after all topic specifications of the upper address (2 units)

The compound entities and relationships are:

Addresses for topic + document form/genre (472 units)
Addresses for residual areas at upper addresses (639 units)
Compound addresses by pre-coordination, or post-coordination relationships between topical addresses (491 records)
Polyhierarchical relationships in time from topical addresses (190 records)
Polyhierarchical relationships in time from residual area addresses (16 records)
Non-hierarchical relationships between a topical address and a (list of) topical addresses (2719 records)
Non-hierarchical relationships between a topical address and an address for topic + document form/genre (132 records)
Post-coordination relationships between an address for topic + document form/genre and a topical address (63 records)

The second layer: classification buses

The simple entities are:

Topical buses, with indication of the upper bus if unique during bus lifetime (5791 units)
Bus elements for document form/genre (9 units)
Bus elements for residual areas after all topic specifications of the upper bus (4 units)

The compound entities and relationships are:

Buses for topic + document form/genre (455 units)
Buses for residual areas at upper buses (585 units)
Bus descriptions (7162 records)
Post-coordination relationships between topical buses (18 records)
Polyhierarchical relationships in time from topical buses (291 records)
Non-hierarchical relationships between a topical bus and a (list of) topical buses (2501 records)
Non-hierarchical relationships between a topical bus and a bus for topic + document form/genre (126 records)
Post-coordination relationships between a bus for topic + document form/genre and a topical bus (62 records)

The cross-layer relationships

A relationship between topical addresses and topical buses (7300 records)
A relationship between topical address elements and bus elements for document form/genre (10 records)
A relationship between topical address elements and bus elements for residual areas (4 records)

Envelopes and objects

Merging buses into envelopes

Getting objects out of their envelopes
The next step of the object identification process is the extraction and refinement of conceptual elements from the descriptions, by means of text analysis techniques guided by subject matter knowledge.
Conceptual elements coming from different envelopes can be unified if their contents turn out to be the same; in any case, each conceptual element maintains a relationship with each envelope it comes from.
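
The bookkeeping involved in unification can be sketched as follows. This is only an illustration: plain string normalisation stands in for the text analysis and subject knowledge that the real step requires, and the function names, envelope identifiers and phrases are assumptions.

```python
from collections import defaultdict

def normalise(phrase):
    """Crude stand-in for text analysis: lower-case and collapse spaces."""
    return " ".join(phrase.lower().split())

def unify(envelopes):
    """Map each conceptual element to the set of envelopes it comes from.

    `envelopes` maps an envelope identifier to the phrases extracted from
    its descriptions. Elements with the same normalised content are unified.
    """
    comes_from = defaultdict(set)
    for envelope_id, phrases in envelopes.items():
        for phrase in phrases:
            comes_from[normalise(phrase)].add(envelope_id)
    return dict(comes_from)

# Placeholder envelope identifiers and phrases.
envelopes = {
    "env-1": ["Formal languages", "Automata"],
    "env-2": ["formal  languages", "Grammars"],
}
links = unify(envelopes)
print(sorted(links["formal languages"]))
```

The unified element keeps a link to every envelope that contributed it, so no provenance is lost when two elements are merged.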

The interplay of external and internal relational reasoning
At this point, topological reasoning plays a very significant role: we can accommodate envelopes as formal neighborhoods and conceptual elements as concrete points inside the basic pair that is envisaged in the Basic Picture perspective on Constructive, or Formal, Topology [18]; the forcing relation accounts for the "comes from" links.
Following a methodology that interleaves topologically minded relational examinations on the space of conceptual elements [V] and relational analyses to be performed by means of a suitable representation language, objects can be identified and described in formats suited for applications.

Inside the metadata machinery

Object descriptions in the metadata machine

The CARMEN project
This concludes the outline of a deeply principled methodology for interconnecting subject classifications. A more practical (and shallower) approach is being worked out within the German project "Content Analysis, Retrieval and Metadata: Effective Networking" (CARMEN), running from October 1999 to February 2002 [19]. The CARMEN project aims to bridge the gap between the physical accessibility of networked information resources and their effective availability, a gap due to content heterogeneity, by approaching content analysis with developments and prototype implementations in three fields:

Metadata
Treatment of (remaining) heterogeneity
Retrieval for structured documents and heterogeneous data types.

Within Working Package 12, Cross concordances of classifications and thesauri, programs for interconnecting general classifications such as DDC and discipline-specific ones (MSC, PACS, and the classification for social sciences) are being developed in Java on a relational database system, with an abstract intermediate level that allows migration to database software from different producers.

Coreference interconnections
In any case, the most effective (and obvious) way of interconnecting subject classifications, thesauri or lists of subject headings is provided by bibliographic records, when more than one system is used for subject indexing inside the same records.
In practice, the same documents are mostly represented, in different bibliographic utilities or catalogues, with indexing data from different systems. While general library OPACs rely on DDC and national lists of subject headings, specialized bibliographic databases each rely on their own discipline-specific classification or thesaurus. It would suffice to put these data together for matching records to create the bridge.
In this way, browsing inside one subject indexing system can be integrated either with direct access to document metadata (or possibly documents), or with passage to another subject indexing system for further navigation. Suitable metadata for identifying versions of subject indexing systems should be required for effective navigation tracking, but a metadata format for such objects has yet to be defined.
Work on defining a metadata format for subject classifications and their versions, within the framework of metadata formats for documents, is a pressing issue at present.
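
The bridge through matching records can be sketched as a co-occurrence count. The fragment below is a minimal illustration that assumes records from two sources can already be matched on a shared document identifier; the function name bridge, the identifiers and the codes are placeholders.

```python
from collections import Counter

def bridge(records_a, records_b):
    """Count co-occurrences of subject codes on matching records.

    Each argument maps a document identifier to the set of codes assigned
    to that document in one indexing system.
    """
    pairs = Counter()
    # Only documents present in both sources contribute to the bridge.
    for doc_id in records_a.keys() & records_b.keys():
        for code_a in records_a[doc_id]:
            for code_b in records_b[doc_id]:
                pairs[(code_a, code_b)] += 1
    return pairs

# Placeholder identifiers and codes.
opac = {"doc1": {"G-510"}, "doc2": {"G-510"}, "doc3": {"G-530"}}
database = {"doc1": {"S-03"}, "doc2": {"S-03", "S-68"}, "doc4": {"S-81"}}

for (code_a, code_b), n in sorted(bridge(opac, database).items()):
    print(code_a, code_b, n)
```

Pairs with high counts suggest candidate connections between the two systems; matching the records themselves is the hard part and is not addressed here.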

While supporting such developments, our work on displaying subject classifications is intended to demonstrate how library OPACs could integrate their functionalities with discipline-specific environments for document search and retrieval.
Moreover, our approach could be exploited in the development of gateways and portals pointing to e-print servers. By means of our KWIC list displays for descriptions of single or combined classifications, words or phrases used to describe places in different classification spaces could be turned into addresses of communicating sites in different environments. Through the metadata that match the identified codes in the discipline-specific classifications, an OAI compatible service provider could transform these abstract addresses into actual full-text documents available from discipline-specific servers.

In the near future, the keywords that will index a cooperative effort on scientific classifications will be
OPAC, OAI compatible e-print server, metadata.

ACKNOWLEDGEMENTS

We are grateful to Martin Doerr, Nicola Guarino, Silvio Valentini, Chris Welty for enlightening conversations on the topics of this work.

NOTES

[1]	See the Website at http://www.openarchives.org
[2]	See the Website of the Ontology group at LADSEB-CNR (Padova, Italy), at http://www.ladseb.pd.cnr.it/infor/ontology/ontology.html; for Formal Ontology in information systems, see [FOIS98]; for a more librarianship-oriented perspective, see [S00]
[3]	[S84]; for an application of a variant of CG, see [GMV99]
[4]	[JoLC99, CDLNR98]; see The DL Website at http://www.ida.liu.se/labs/iislab/people/patla/DL/index.html
[5]	Further information on metadata, in relation to thesauri, can be found in [H01]
[6]	For thesauri see [D01, TAJ01]. For ontologies, an evolving line of thought is displayed in [G98, G99, GW00, GW01]
[7]	The 1995 and 2000 versions are available in hypertextual presentation at http://www.ams.org/msc/
[8]	Math Doc Cell issues a multilingual (French, English, Italian) Web presentation of the 2000 MSC version, available at http://www-mathdoc.ujf-grenoble.fr/MSC2000/db.html The English data have been taken from the AMS site (http://www.ams.org/msc/); the Italian data from the site we set up at http://www.math.unipd.it/~biblio/math/.
[9]	See the EULER site: http://www.emis.de/projects/EULER/
[10]	http://www.ams.org/mathweb/Classif/RZhClassification.html
[11]	A Web presentation of the ZDM classification is available at: http://www.mathematik.uni-osnabrueck.de/projects/zdm
[12]	Web presentations of the 1964, 1991 and 1998 versions are available at: http://www.acm.org/class/1998
[13]	http://www.aip.org/pubservs/pacs.html
[14]	http://www.iee.org.uk/publish/inspec/docs/classif.html
[15]	http://www.oclc.org/dewey/products/index.htm
[16]	The DDC 510 revision proposal presented by Giles Martin is visible at http://www.oclc.org/dewey/updates/discussion/doc/request_for_comment.htm
[17]	[RSG99]; for related topics in the field of hypertext functionality, see the whole special issue of "Journal of Digital Information", [JoDI99]
[18]	[SG99, CSSV]; see the homepage of the Padua Logic Group, at http://www.math.unipd.it/~logic/
[19]	http://www.mathematik.uni-osnabrueck.de/projects/carmen/CARMEN.htm

REFERENCES

AGB97	J. Aitchison, A. Gilchrist, D. Bawden "Thesaurus construction: a practical manual", 3rd ed., ASLIB, 1997
BJR98	G. Booch, L. Jacobson, J. Rumbaugh "The Unified Modeling Language User Guide", Addison-Wesley, 1998
C01	S. Cranefield Networked Knowledge Representation and Exchange using UML and RDF "Journal of Digital Information", 1(8), 2001 http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Cranefield/
CCMS96	L.M. Chan, J.P. Comaromi, J.S. Mitchell, M.P. Satija "Dewey Decimal Classification: a practical guide. 2nd ed., revised for DDC 21", OCLC Online Computer Library Center, 1996
CDLNR98	D. Calvanese, G. De Giacomo, M. Lenzerini, D. Nardi, R. Rosati Description logic framework for information integration "Proceedings of the 6th International Conference on the Principles of Knowledge Representation and Reasoning (KR'98)", Morgan Kaufmann, 1998. p. 2-13
CSSV	T. Coquand, G. Sambin, J. Smith, S. Valentini Inductively generated formal topologies to appear
D01	M. Doerr Semantic Problems of Thesaurus Mapping "Journal of Digital Information", 1(8), 2001 http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/
F96	A.C. Foskett "The Subject Approach to Information", 5th ed., Library Association Publishing, 1996
FOIS98	"Formal Ontology in Information Systems: proceedings of FOIS'98", N. Guarino (ed.), IOS Press, 1998
G98	N. Guarino Some Ontological Principles for Designing Upper Level Lexical Resources "Proceedings of the First International Conference on Lexical Resources and Evaluation, Granada, Spain, 28-30 May 1998" http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/LREC98.pdf
G99	N. Guarino The role of Identity Conditions in Ontology Design "Proceedings of the IJCAI-99 Workshop on Ontology and Problem Solving Methods (KRRS), Stockholm, Sweden, August 2, 1999" Republished in "Spatial Information Theory: Cognitive and Computational Foundations of Geographic Information Science", C. Freksa and D. M. Mark (eds.), Springer Verlag, 1999 http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/IJCAI99.pdf
GMV99	N. Guarino, C. Masolo, G. Vetere Ontoseek: Content-based Access to the Web "IEEE Intelligent Systems", 14(3), 1999. p. 70-80 http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/OntoSeek.pdf
GW00	N. Guarino, C. Welty Ontological Analysis of Taxonomic Relationships "Proceedings of ER-2000: The 19th International Conference on Conceptual Modeling", A. Laender, V. Storey (eds.), Springer Lecture Notes in Computer Science, 2000 http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/LADSEB05-2000.pdf
GW01	N. Guarino, C. Welty Identity and Subsumption Ladseb Internal Report 01/2001 http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/Identity&Subsumption.pdf
H01	J. Hunter MetaNet - A Metadata Term Thesaurus to Enable Semantic Interoperability Between Metadata Domains "Journal of Digital Information", 1(8), 2001 http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Hunter/
JoDI99	"Journal of Digital Information" - Vol. 1, Issue 4: "Hypermedia Systems" http://jodi.ecs.soton.ac.uk/?vol=1&iss=4
JoDI01	"Journal of Digital Information" - Vol. 1, Issue 8: "Networked Knowledge Organization Systems" http://jodi.ecs.soton.ac.uk/?vol=1&iss=8
JoLC99	"Journal of Logic and Computation" - Vol. 9, No. 3: "Special Issue on Description Logics"
KW01	T. Krichel, S. Warner Vocabulary for the Academic Metadata Format, draft http://openlib.org/amf/doc/ebisu.html
RSG99	G. Rossi, D. Schwabe, A. Garrido Designing Computational Hypermedia Applications "Journal of Digital Information", 1(4), 1999 http://jodi.ecs.soton.ac.uk/Articles/v01/i04/Rossi/
S84	J. Sowa "Conceptual Structures: Information Processing in Minds and Machines", Addison-Wesley, 1984
S00	E. Svenonius "The Intellectual Foundations of Information Organization", MIT Press, 2000
SG99	G. Sambin, S. Gebellato A preview of the basic picture: a new perspective on formal topology Proceedings of "Types '98", T. Altenkirch, W. Naraschewski and B. Reus (eds.), Springer Lecture Notes in Computer Science, 1999 http://www.math.unipd.it/~logic/ftp/BPP.ps
TAJ01	D. Tudhope, H. Alani, C. Jones Augmenting Thesaurus Relationships: Possibilities for Retrieval "Journal of Digital Information", 1(8), 2001 http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Tudhope/
V	S. Valentini Formal Topology and Search Engine in preparation
WJ99	C. Welty, J. Jenkins Formal Ontology for Subject "Data and Knowledge Engineering", 31(2), 1999. p. 155-182 http://www.cs.vassar.edu/faculty/welty/papers/subjects/subject.html http://www.cs.vassar.edu/faculty/welty/papers/subjects/subject.pdf

AUTHORS' DETAILS

Antonella De Robbio
e-mail: derobbio@math.unipd.it
Home Page: http://www.math.unipd.it/~derobbio/home/antohp.htm
Dario Maguolo
e-mail: dario@math.unipd.it
Biblioteca del Seminario Matematico
Università degli Studi di Padova

Alberto Marini
e-mail: alberto@iami.mi.cnr.it
Home Page: http://www.iami.mi.cnr.it/~alberto/
Istituto per le Applicazioni della Matematica e dell'Informatica
Consiglio Nazionale delle Ricerche (IAMI-CNR), Milano

Last modified 30th May 2001
