Subject Classifications in the Scientific and Overall Digital World

SLIDE INDEX

0.0 - Title Page
0.1 - Contents

1.0 - Connecting classifications in the digital world
1.1 - Users in different settings
1.2 - The organization, the functionalities and the interaction modes
1.3 - Users do not want to change their mind
1.4 - Approaches to the issues of connecting classifications
1.5 - Encoded in the metadata

2.1 - Mathematics Subject Classification
2.1.1 - The database MathSci
2.1.2 - The database Zentralblatt MATH
2.1.3 - The evolving structure of MSC
2.1.4 - The EULER project
2.2 - Referativnyj zhurnal: Matematika. Classification Scheme
2.3 - Zentralblatt für Didaktik der Mathematik Classification Scheme
2.4 - ACM Computing Classification System
2.5 - Physics and Astronomy Classification Scheme
2.6 - INSPEC Classification
2.6.1 - INSPEC Classification Sections

3.1 - Dewey Decimal Classification
3.2 - The CARMEN project
3.3 - Universal Decimal Classification
3.4 - Library of Congress Classification

4.0.0 - Displaying classifications: our achievements
4.0.1 - The Scientific Classifications Page
4.1 - The Mathematics Classification Page
4.2 - Mathematics Subject Classification MSC and Dewey Decimal Classification DDC
4.3 - KWIC (KeyWords In Context) lists for Scientific Subject Classification Descriptions

5.0 - Conclusions


Eighth International Conference "Crimea 2001"
"Libraries and Associations in the Transient World: New Technologies and New Forms of Cooperation"
*Sudak, Ukraine, June 9-17, 2001*
*Section 4. Digital Libraries*

Mathematics Subject Classification and related classifications in the digital world

*Antonella De Robbio*
e-mail: derobbio@math.unipd.it
Home Page: http://www.math.unipd.it/~derobbio/home/antohp.htm
*Dario Maguolo*
e-mail: dario@math.unipd.it
Biblioteca del Seminario Matematico, Università degli Studi di Padova

*Alberto Marini*
e-mail: alberto@iami.mi.cnr.it
Home Page:
Istituto per le Applicazioni della Matematica e dell'Informatica, Consiglio Nazionale delle Ricerche (IAMI-CNR), Milano


CONTENTS

Connecting classifications in the digital world
In the present work we point out opportunities, problems, tools and techniques for interconnecting discipline-specific subject classifications, primarily organized as search devices in bibliographic databases, with general classifications originally devised for book shelving in public libraries. In the proceedings paper we trace the basis of a methodology for interconnecting subject classifications, which is based on object identification and description.

Subject classifications in Mathematics, Computing, Physics
Here we take a look at discipline-specific classifications in Mathematics, Computing and Physics, on the one hand,

General subject classifications
and at general classifications on the other.

Displaying classifications: our achievements
In this setting we show a pool of hypertextual presentations of subject classifications, in single or paired view, that we produced by means of dedicated software tools for developing highly linked groups of Web pages from sequential source files. Furthermore, we show presentations of KWIC lists extracted from the descriptions in one or more classifications, which allow the rapid exploration of lexical similarities among descriptions to obtain suggestions about possible affinities of contents.

Conclusions


Connecting classifications in the digital world
Reliable connections among knowledge representation, information retrieval and lexical tools such as classifications, lists of subject headings, thesauri, terminological collections and ontologies are a necessity in the ever more pervasive world of networked knowledge-based activities.


Users in different settings, with different demands and expectations, want to fulfil their information needs wherever information is available, cutting costs and time as much as possible, regardless of the heterogeneity of sources: from highly specialized databases or dedicated portals to general online library catalogues or Web search engines, from reference (metadata) databases to full-text or hypermedia digital libraries, from e-journal aggregators to preprint servers and authors' self-archives (commonly called e-print systems).


The organization, the functionalities and the interaction modes exploited by networked digital libraries may be completely different from those generally found in traditional paper-based libraries. Moreover, precisely in the case of e-print systems, the development of technical mechanisms and organizational structures to support their interoperability, which is promoted by the Open Archives Initiative (OAI), is making them evolve into genuine building blocks of a transformed scholarly communication model, radically different from the traditional one, which is dominated by the heavy mediation business of scholarly publishing companies. See the OAI Website at http://www.openarchives.org


On the other hand, users do not want to change their mind to fit the particular way each source stores, indexes and presents information: this should be worked out automatically by the system. But such a task is not trivial. As for subject indexing, different classifications, thesauri or otherwise structured terminologies, or even ontologies, while covering the same area, can still present strong linguistic disagreements (which cannot be resolved by mere translation) as well as structural and semantic ones, in spite of any effort at harmonization. Dramatic disagreements appear in passing from the specialized world of discipline-oriented classifications to general classifications widely used in public, school or even general academic libraries, such as Dewey Decimal Classification, Universal Decimal Classification or Library of Congress Classification. Misinterpretations occur easily when the same words are used in different contexts or for different purposes. Moreover, even within one and the same classification, differences and inconsistencies are normal practice, either among different applications or inside the same application. One might reasonably expect good interconnections among classifications to underlie good retrieval across classifications, but this seems not to be commonly the case.


While a number of approaches to the issues of connecting classifications or thesauri exploit statistical methods or neural network techniques, a different trend is oriented towards the analysis, modeling and support of conceptual organization by humans. The former can be very helpful, even in view of the latter; a well-defined integration seems to be the recipe for the near future. Actually, it is still worthwhile, and not only for educational purposes, to work out well-defined connections between classifications and similar tools, provided that the objects each of them refers to are identified unambiguously by means of a suitable representation language. With the knowledge representation languages currently being designed and implemented in computer applications, this task is becoming feasible.


The set of structured descriptions so obtained is the pivot of the connections: through it, correct reference links can be established between such descriptions and items in the classifications. Starting from one classification, users can choose the structured description(s) that correctly represent the intended meaning, and hence pass to the corresponding items in another classification. Moreover, such descriptions should be encoded in the metadata that are managed by search engines. Frequently such standard format metadata sets are the result of conversions from heterogeneous databases, and serve as indexes for queries in the original databases.
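
The pivot role of the structured descriptions can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the description identifiers, the scheme names, the codes and the function names are invented for illustration and do not come from any actual concordance.

```python
from collections import defaultdict

# Hypothetical links between structured descriptions (the pivot) and
# items in two classification schemes. All values are invented.
LINKS = [
    ("desc:prime-numbers", "SchemeA", "A-10"),
    ("desc:prime-numbers", "SchemeB", "B-512"),
    ("desc:number-theory-general", "SchemeA", "A-10"),
    ("desc:number-theory-general", "SchemeB", "B-510"),
]

def build_indexes(links):
    """Index the links both ways: by classification item and by description."""
    by_item = defaultdict(set)   # (scheme, code) -> description ids
    by_desc = defaultdict(set)   # description id -> (scheme, code) pairs
    for desc, scheme, code in links:
        by_item[(scheme, code)].add(desc)
        by_desc[desc].add((scheme, code))
    return by_item, by_desc

def candidate_descriptions(by_item, scheme, code):
    """Descriptions the user may choose from, starting at one classification item."""
    return sorted(by_item.get((scheme, code), ()))

def translate(by_desc, desc, target_scheme):
    """Items of the target scheme linked to the chosen description."""
    return sorted(code for scheme, code in by_desc.get(desc, ())
                  if scheme == target_scheme)
```

Starting from item `A-10` the user is offered two descriptions; choosing one of them selects a single item in the other scheme, which is the disambiguation step that a direct code-to-code table cannot express.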

Now we turn to an overview of subject classifications. We begin with subject classifications in Mathematics, Computing and Physics, first of all:


Mathematics Subject Classification (MSC)
MSC is compiled and updated by the editorial offices of the world's two most important bibliographical directories for mathematical research: MathSci and Zentralblatt MATH. The classification covers all branches of pure and applied mathematics, including probability and statistics, numerical analysis and computing, mathematical physics and economics, systems theory and control, information and communication theory.


The MathSci database
MathSci is produced by the American Mathematical Society (AMS). The paper version consists of the journals Mathematical Reviews (MR), published since 1940, and Current Mathematical Publications (CMP). MSC, compiled since 1959 (by AMS alone until the early 1970s), was very unstable in the first years of its existence. So, for the part which appeared in print from 1940 to 1972, the MathSci database was given new classification data, which are stable over relatively long periods (1940-1958, 1959-1972) and therefore more suitable for database search than the frequently varying ones of the print version. Starting with 1973 the database is indexed with the same classification codes that appear in the print version. The 1995 and 2000 versions are available in hypertextual presentation at http://www.ams.org/msc/


The Zentralblatt MATH database
Zentralblatt MATH is edited by the European Mathematical Society (EMS), the Fachinformationszentrum (FIZ) Karlsruhe and the Heidelberger Akademie der Wissenschaften (Germany); it is produced in cooperation with the Cellule de Coordination Documentaire Nationale pour les Mathématiques (Math Doc Cell, France). Several European Editorial Units cooperate with the Editorial Office in Berlin. The paper version consists of the journal Zentralblatt für Mathematik und ihre Grenzgebiete / Mathematics Abstracts (ZM/MA), issued since 1931, formerly by the Deutsche Akademie der Wissenschaften zu Berlin, and published by Springer-Verlag. The database is indexed with the 1991 and 2000 MSC versions; some superseded classification codes from preceding versions are also present. Math Doc Cell issues a multilingual (French, English, Italian) Web presentation of the 2000 MSC version, available at http://www-mathdoc.ujf-grenoble.fr/MSC2000/db.html The English data have been taken from the AMS site (http://www.ams.org/msc/); the Italian data from the site we set up at http://www.math.unipd.it/~biblio/math/, which we present later in this talk.


The evolving structure of MSC
After 1973, major MSC revisions came into use in 1980, 1985, 1986, 1991 and 2000. From 1959 to 1985 the MathSci version of MSC counted 60 major sections; 61 from 1986 to 1999, and 63 since 2000. Until 1972 the classification was issued in two levels; an intermediate level became available in 1973, and is progressively being exploited as MSC increases in detail and so grows in size. Having started with 1436 numbers in 1959, MSC counted 4895 numbers in 1999 and counts 5590 since 2000. A consistent and ever-growing apparatus of cross references helps in understanding connections between different branches of mathematics.
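
The three-level structure lends itself to simple mechanical handling. The sketch below assumes the code shape used in the published MSC2000 lists (for example 11-XX, 11Axx, 11A41), which the slide itself does not spell out; the function name `msc_levels` is hypothetical.

```python
import re

# Assumed code shape (not stated in the text): two digits, then a capital
# letter or a hyphen, then two digits or the placeholder "xx"/"XX".
_MSC = re.compile(r"^(\d{2})([A-Z-])(\d{2}|xx|XX)$")

def msc_levels(code):
    """Return the chain of levels a code belongs to, from major section down."""
    m = _MSC.match(code.strip())
    if m is None:
        raise ValueError(f"not an MSC-style code: {code!r}")
    section, middle, tail = m.groups()
    levels = [section]                      # major section, e.g. "11"
    if middle != "-":
        levels.append(section + middle)     # intermediate level, e.g. "11A"
    if tail.lower() != "xx":
        levels.append(section + middle + tail)  # full code, e.g. "11A41"
    return levels
```

For instance, `msc_levels("11A41")` yields the major section, the intermediate level and the full code, while a placeholder code such as `11Axx` stops at the intermediate level.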


The EULER project
Mathematics Subject Classification is one of the classification systems provided for by the Dublin Core (DC) metadata format, and is used inside DC metadata for the search engine developed in the European Union project European Libraries and Electronic Resources in Mathematical Science (EULER). The main objective of EULER was the realization of a "one-stop shop" for searching mathematical information resources such as books, pre-prints, Web pages, abstracts, collections of articles and reviews, periodicals, technical reports and theses. The result is a Web meta-interface for simultaneous parallel queries to a heterogeneous collection of databases. See the EULER site: http://www.emis.de/projects/EULER/
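
The idea of a meta-interface that queries heterogeneous sources in parallel and returns uniform records can be sketched as follows. This is not EULER's implementation: the two sources, their record layouts, the sample records and the function names are all invented for illustration, and the sources are plain in-memory functions rather than real databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical sources with different native record layouts.
# Each adapter converts its hits to one common, DC-like dictionary.
def search_catalogue(msc_code):
    rows = [{"titolo": "Teoria dei numeri", "msc": ["11A41"]},
            {"titolo": "Analisi", "msc": ["26A03"]}]
    return [{"title": r["titolo"], "subject": r["msc"], "source": "catalogue"}
            for r in rows if msc_code in r["msc"]]

def search_preprints(msc_code):
    rows = [("Primes in short intervals", "11A41 11N05")]
    return [{"title": t, "subject": s.split(), "source": "preprints"}
            for t, s in rows if msc_code in s.split()]

def meta_search(msc_code, sources):
    """Send the same query to every source at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = pool.map(lambda search: search(msc_code), sources)
        return [rec for results in result_lists for rec in results]
```

Calling `meta_search("11A41", [search_catalogue, search_preprints])` returns the hits of both sources in one list, each tagged with its origin.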

Let's look at other classifications in the field of Mathematics:


The Referativnyj zhurnal: Matematika classification scheme
It is prepared as a piece of the Universal Decimal Classification (UDC) for Referativnyj zhurnal: Matematika. An English translation is provided by the AMS site in textual form, at the address: http://www.ams.org/mathweb/Classif/RZhClassification.html


Zentralblatt für Didaktik der Mathematik Classification Scheme (ZDM)
This scheme is used for MATHDI, the bibliographic database on mathematics education and related fields, active since 1976, which can be accessed through the sites of the European Mathematical Information Service (EMIS). The paper version of the database is Zentralblatt für Didaktik der Mathematik. A Web presentation of the ZDM classification is available at: http://www.mathematik.uni-osnabrueck.de/projects/zdm

In the field of Computing we start with:


ACM Computing Classification System (CCS)
This classification is issued by the Association for Computing Machinery (ACM) in the USA, for the directories Computing Reviews (CR) and Guide to Computing Literature (GCL). Moreover, it is adopted by the bibliographic database CompuScience, produced by Fachinformationszentrum (FIZ) Karlsruhe, Department of Mathematics & Computer Science Berlin, which contains references from CR since 1976, from GCL since 1977 and from Section 68 Computer Science of MSC in ZM/MA. ACM's first classification system for the computing field was published in 1964. Then, in 1982, the ACM published an entirely new system. New versions based on the 1982 system followed, in 1983, 1987, 1991, and 1998. Web presentations of the 1964, 1991 and 1998 versions are available at: http://www.acm.org/class/1998

Moving into the field of Physics we find:


Physics and Astronomy Classification Scheme (PACS)
PACS is prepared by the American Institute of Physics (AIP) in collaboration with certain other members of the International Council on Scientific and Technical Information (ICSTI) having an interest in physics and astronomy classification. The most recent internationally agreed scheme was published by ICSTI in 1991. Revised editions of PACS are published biennially, or as necessary, by AIP. PACS contains 10 broad categories subdivided into 66 major topics. See http://www.aip.org/pubservs/pacs.html


INSPEC Classification
INSPEC is an English-language bibliographic information service providing access to the world's scientific and technical literature in physics, electrical engineering, electronics, communications, control engineering, computers and computing, and information technology. INSPEC was formed in 1967, based on the Science Abstracts service, which has been provided by the Institution of Electrical Engineers (UK) since 1898. Still today Physics Abstracts, Electrical & Electronics Abstracts and Computer & Control Abstracts together form the Science Abstracts series of journals, which is the paper version of the INSPEC database.


INSPEC Classification Sections
INSPEC Classification is divided into four major sections:
- Section A: Physics (a version of PACS)
- Section B: Electrical & Electronic Engineering
- Section C: Computer & Control
- Section D: Information Technology

Now we come to the general subject classifications; we start with:


Dewey Decimal Classification
The Dewey Decimal Classification (DDC) system was conceived by Melvil Dewey in 1873 and first published in 1876. The latest (21st) edition was released in 1996, so on average six years have elapsed between one edition and the next (120 years over 20 intervals). The DDC is published in two editions, full and abridged. It is kept up to date through electronic versions: Dewey for Windows, a CD-ROM product that is updated annually; and WebDewey in CORC, a Web-based product that is updated quarterly. The DDC is published by Forest Press, a division of OCLC Online Computer Library Center, Inc.

DDC is widely used all over the world, not only for book shelving in libraries, especially in public, school and general academic ones, but also for subject indexing and browsing in general online document retrieval tools, such as bibliographic databases (including the national bibliographies of sixty countries), online library catalogues (including WorldCat, the OCLC Online Union Catalog), digital libraries and Web search engines. The DDC has been translated into over thirty languages.

The classification is developed and maintained in the US national bibliographic agency, the Library of Congress. The Dewey editorial office is located in the Decimal Classification Division of the Library of Congress, where every year the classification specialists assign over 110,000 DDC numbers to records for works cataloged by the Library. Having the editorial office within the Decimal Classification Division enables the editors to detect trends in the literature that must be incorporated into the Classification. The editors prepare proposed schedule revisions and expansions, and forward the proposals to the Decimal Classification Editorial Policy Committee (EPC) for review and recommended action.
The print version of Edition 21 is composed of nine major parts in four volumes, as follows:

Volume 1:
- New Features: A brief explanation of the special features and changes in Edition 21
- Introduction: A description of the DDC and how to use it
- Glossary: Short definitions of terms used in the DDC
- Index to the Introduction and Glossary
- Tables: Seven numbered tables of notation that can be added to class numbers to provide greater specificity. Except for notation from Table 1 (which may be added to any number unless there is an instruction in the schedules or tables to the contrary), table notation may be added only as instructed in the schedules and tables. Tables, together with the very structure of the hierarchy in some areas of the classification, make up an effective approximation to facet analysis.
- Lists that compare the previous edition with the new edition: Relocations and Reductions; Comparative and Equivalence Tables; Reused Numbers.

Volumes 2 and 3:
- Schedules: The DDC numbers arranged in their hierarchical organization, presented with descriptions, links, etc.

Volume 4:
- Relative Index: An alphabetical list of subjects with the disciplines in which they are treated subarranged alphabetically under each entry
- Manual: A guide to classifying in difficult areas, information on new schedules, and an explanation of the policies and practices of the Decimal Classification Division at the Library of Congress. Information in the Manual is arranged by the numbers in the tables and schedules.


The CARMEN project
To overcome the gap between the physical possibility of accessing networked information resources and their effective availability, a gap due to the heterogeneity of content, the German project "Content Analysis, Retrieval and Metadata: Effective Networking" (CARMEN), lasting from October 1999 to February 2002, is approaching content analysis with developments and prototypical implementations in three fields:
- Metadata
- Treatment of (remaining) heterogeneity
- Retrieval for structured documents and heterogeneous data types.
Within Working Package 12, "Cross concordances of classifications and thesauri", programs for interconnecting general classifications such as DDC with discipline-specific ones (MSC, PACS, and the classification for social sciences) are being developed in Java on a relational database system, with an abstract intermediate level to allow migration among database products from different vendors.


Universal Decimal Classification (UDC)
UDC was created towards the end of the nineteenth century by Paul Otlet and Henri La Fontaine as an adaptation of DDC in view of the preparation of a universal bibliography. Until recently responsibility for the scheme belonged to the FID (Fédération Internationale de Documentation); this responsibility was passed to a consortium of publishers (the UDC Consortium) in 1992. The scheme consists of 60,000 classes (divisions and sub-divisions) as well as a number of auxiliary tables.


Library of Congress Classification
In 1899 the Librarian of Congress, Dr. Herbert Putnam, and his Chief Cataloguer, Charles Martel, decided to start a new classification system for the collections of the Library of Congress (established 1800). Basic features were taken from Charles Ammi Cutter's Expansive Classification. LCC is an enumerative system built on 21 major classes, each class being given an arbitrary capital letter from A to Z, with 5 exceptions: I, O, W, X, Y. After this was decided, Putnam delegated the further development of different parts of the system to subject specialists, cataloguers and classifiers. The system was intentionally decentralized from the start, and has remained so; the different classes and subclasses were published for the first time between 1899 and 1940. As a result, schedules often differ greatly in the number and kinds of revisions they have undergone.
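
As a quick consistency check of the figures just given, the following short Python sketch derives the main class letters from the alphabet and the five exceptions named above. It only restates the arithmetic of the slide (26 letters minus 5 unused ones); the variable names are illustrative.

```python
import string

UNUSED = set("IOWXY")  # the five letters named in the text as exceptions
main_class_letters = [c for c in string.ascii_uppercase if c not in UNUSED]
count = len(main_class_letters)  # 26 letters minus the 5 exceptions
```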


Displaying subject classifications: our achievements
Our work with subject classifications has been directed to the generation of highly portable hypertexts, suitable to facilitate readability and discovery of meaning by humans in a wide range of complex documentation structures such as classification schemes, terminologies, metadata collections, etc. We are especially exploiting a presentation mode (double view) that allows moving back and forth between parallel views of the same or similar structures along links inside or between the structures; this proves very useful in our setting. Such hypertexts are produced mainly by a pool of standard C programs, which operate only on sequential ASCII files and are aimed at the analysis and transformation of specific texts and at the generation of groups of syntactically simple but highly connected, JavaScript-enriched HTML pages (H-volumes).
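
The general idea of turning a sequential file into a group of simple, mutually linked HTML pages can be sketched in a few lines. This is a Python illustration, not the authors' C programs: the input layout ("CODE description" per line), the grouping by two-character prefix, the file naming and the function name `build_pages` are all assumptions made for the example.

```python
from html import escape

def build_pages(lines):
    """Group 'CODE description' lines into one HTML page per 2-character prefix.

    Returns a dict mapping a page file name to its HTML text. Each entry gets
    an anchor (its code) and each page links to the previous and next page.
    """
    groups = {}
    for line in lines:
        code, _, text = line.strip().partition(" ")
        if code:  # skip empty lines
            groups.setdefault(code[:2], []).append((code, text))
    prefixes = sorted(groups)
    pages = {}
    for n, prefix in enumerate(prefixes):
        items = "\n".join(
            f'<li id="{escape(code)}">{escape(code)} {escape(text)}</li>'
            for code, text in groups[prefix])
        nav = []
        if n > 0:
            nav.append(f'<a href="{prefixes[n - 1]}.html">previous</a>')
        if n + 1 < len(prefixes):
            nav.append(f'<a href="{prefixes[n + 1]}.html">next</a>')
        pages[f"{prefix}.html"] = (
            f"<html><body><p>{' | '.join(nav)}</p>\n<ul>\n{items}\n</ul></body></html>")
    return pages
```

Because every entry carries its code as an anchor, other pages (for instance a second view of another classification) could link directly to `11.html#11A41`; that is the kind of dense cross-linking a double view relies on.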


*The Scientific Classifications Page*
Various tools for exploring subject classifications have been realized in this way and are collected in The Scientific Classifications Page, http://www.math.unipd.it/~biblio/math/eng.htm. Besides hypertextual presentations of subject classifications, the page collects some H-volumes presenting KWIC (Key-Word-In-Context) lists extracted from the descriptions of one or more combined classifications. Descriptions are circularly permuted on significant words, i.e. words not in a stop-word list; the very long list of resulting strings is displayed on the right, subdivided into smaller manageable lists, which can be accessed through an index appearing in the left frame. This redundant but properly paginated presentation allows the rapid exploration of lexical similarities among descriptions to obtain suggestions about possible affinities of contents.
The Scientific Classifications Page includes:
- *The Mathematics Classification Page*
- *Mathematics Subject Classification MSC and Dewey Decimal Classification DDC*
- *KWIC (KeyWords In Context) lists for Scientific Subject Classification Descriptions*
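
The construction of such a KWIC list (circular permutation on significant words, sorting, and subdivision into smaller lists) can be sketched as follows. This is a minimal illustration, not the authors' tool: the stop-word list, the "/" separator marking the wrap-around point, the page size and the function names are assumptions, and the sample codes used below are invented.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}  # illustrative only

def kwic_entries(descriptions, stop_words=STOP_WORDS):
    """Rotate each description so that every significant word comes first once.

    descriptions: iterable of (code, text) pairs.
    Returns (rotated_text, code) pairs sorted alphabetically, ignoring case.
    """
    entries = []
    for code, text in descriptions:
        words = text.split()
        for i, word in enumerate(words):
            if word.lower().strip(",;:.()") in stop_words:
                continue  # stop words never lead an entry
            # "/" marks where the original description wrapped around
            rotated = words[i:] + ["/"] + words[:i] if i else words
            entries.append((" ".join(rotated), code))
    return sorted(entries, key=lambda e: e[0].lower())

def paginate(entries, size):
    """Split the long sorted list into smaller lists of at most `size` entries."""
    return [entries[k:k + size] for k in range(0, len(entries), size)]
```

With the two invented descriptions `("X1", "Theory of numbers")` and `("Y2", "Numbers in computing")`, the entries led by "numbers" from both schemes end up next to each other in the sorted list, which is exactly the lexical proximity the presentation is meant to expose.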


*The Mathematics Classification Page*
http://www.math.unipd.it/~biblio/math/engmsc.htm
This page collects six hypertextual frame presentations of the latest version of Mathematics Subject Classification, MSC2000.

From a sequential ASCII file containing the whole MSC2000, two H-volumes were obtained:
- MSC2000b H-volume, simple frame presentation: http://www.math.unipd.it/~biblio/math/mainb/mhbmain.htm
- MSC2000d H-volume, double view presentation: http://www.math.unipd.it/~biblio/math/doppiaeng/mhdmain.htm

Applying the same process to a file containing an Italian translation of MSC2000, we obtained the simple frame
- MSC2000id H-volume, Italian translation: http://www.math.unipd.it/~biblio/math/italiana/mhimain.htm

while, instead of a double-view presentation, we processed the two files in combination to obtain the simple frame
- MSC2000l H-volume, interleaved English and Italian texts: http://www.math.unipd.it/~biblio/math/it+eng/mhlmain.htm

By combining the first ASCII file with other files containing collections of specific data, we obtained further H-volumes:
- From a file resulting from a comparison of MSC2000 with the 1991 version: MSC2000d H-volume, simple frame presentation, including changes from MSC 1991: http://www.math.unipd.it/~biblio/math/complexc/mhcmain.htm
- From a file containing data about subject-specific pages of relevant Websites, a true Virtual Reference Desk for Mathematics: MSC2000w H-volume, simple frame presentation, with guide pages linking to subject-specific pages of relevant Websites: http://www.math.unipd.it/~biblio/math/travel/mhwmain.htm


*Mathematics Subject Classification MSC and Dewey Decimal Classification DDC*
http://www.math.unipd.it/~biblio/math/engddc.htm
We advanced along this line by sketching out connections between classification numbers from the DDC 21 and MSC2000 schemes; a draft page in double view presentation was then produced:
*Connections between the classification schemes DDC21 and MSC2000*
http://www.math.unipd.it/~biblio/msc-cdd/index.html
In view of the revision of the 510 section of DDC, Mathematics, we are updating this draft following the proposal presented by Giles Martin, Assistant Editor of the Dewey Decimal Classification. Meanwhile, we have put together the descriptions of:
- the proposed revision of the 510 DDC section
- MSC2000
- the sections E - N of the ZDM classification, encoded as 97E - 97N in the MSC style
to produce the KWIC list H-volume
*Lexical connections between the classification schemes DDC22 510 and MSC2000 + ZDM E-N*
http://www.math.unipd.it/~biblio/kwic/msc-cdd/index.html


*KWIC (KeyWords In Context) lists for Scientific Subject Classification Descriptions*
http://www.math.unipd.it/~biblio/math/engkwic.htm
The following H-volumes have been produced:
- *KWIC list of phrases of MSC2000 classification scheme* http://www.math.unipd.it/~biblio/kwic/msc/
- *KWIC list of phrases of PACS 2001 classification scheme* http://www.math.unipd.it/~biblio/kwic/pacs/
- *KWIC list of phrases of ACM Computing Classification System (1998)* http://www.math.unipd.it/~biblio/kwic/acm/
- *Combined KWIC list of phrases of MSC2000 and PACS 2001 classification schemes* http://www.math.unipd.it/~biblio/kwic/msc-pacs/
- *Combined KWIC list of phrases of MSC2000 and ACM Computing Classification System (1998)* http://www.math.unipd.it/~biblio/kwic/msc-acm/
This kind of preliminary lexical support will be developed for investigating the connections among other groups of classification schemes. Furthermore, some improvements obtainable by discriminating homonyms, synonyms and secondary terms will be investigated.


Conclusions
The most effective (and obvious) way of interconnecting subject classifications, thesauri or lists of subject headings is provided by bibliographic records, when more than one system is used for subject indexing inside the same records. In practice, the same documents are mostly represented, in different bibliographic utilities or catalogues, with indexing data from different systems. While general library OPACs rely on DDC and national lists of subject headings, specialized bibliographic databases each rely on their own discipline-specific classification or thesaurus. It would suffice to put these data together for matching records to create the bridge. In this way, browsing inside one subject indexing system can be integrated either with direct access to document metadata (or possibly documents), or with passage to another subject indexing system for further navigation.

Suitable metadata for identifying versions of subject indexing systems would be required for effective navigation tracking, but a metadata format for such objects has yet to be defined. Defining a metadata format for subject classifications and their versions, in the framework of metadata formats for documents, is now a pressing issue.

While backing such developments, our work on displaying subject classifications is intended to demonstrate possibilities for library OPACs to integrate their functionalities with discipline-specific environments for document search and retrieval. Moreover, our approach could be exploited in the development of gateways and portals pointing to e-print servers. By means of our KWIC list displays for descriptions of single or combined classifications, words or phrases used to describe places in different classification spaces could be turned into addresses of communicating sites in different environments. Through the metadata that match the identified codes in the discipline-specific classifications, an OAI-compatible service provider could transform these abstract addresses into actual full-text documents available from discipline-specific servers. In the near future, the keywords that will index a cooperative effort on scientific classifications will be: OPAC, OAI-compatible e-print server, metadata.
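
The bridge through bibliographic records can be sketched as a simple co-occurrence count. The sketch assumes that records for the same document can be matched by a shared identifier (how that matching is done is outside its scope); the identifiers, the sample records and the function name `code_bridge` are invented for illustration.

```python
from collections import Counter

def code_bridge(records_a, records_b):
    """Count co-occurrences of codes from two subject indexing systems.

    records_a, records_b: dicts mapping a shared document identifier to the
    list of codes assigned to that document in system A and in system B.
    Returns a Counter keyed by (code_a, code_b).
    """
    pairs = Counter()
    for doc_id, codes_a in records_a.items():
        for a in codes_a:
            for b in records_b.get(doc_id, ()):  # only matching records count
                pairs[(a, b)] += 1
    return pairs
```

For example, with invented OPAC records `{"doc1": ["510"], "doc2": ["510", "004"], "doc3": ["530"]}` and invented database records `{"doc1": ["11A41"], "doc2": ["11A41", "68Q25"]}`, the pair `("510", "11A41")` is counted twice, while `doc3`, which has no matching record, contributes nothing. High counts suggest candidate connections that a human editor would still need to confirm.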

Last modified May 26 2001
