Digital libraries and digital archives: new
distribution models in the information chain
John Mackenzie Owen
TICER / Tilburg University
The ongoing move
towards digital distribution of information through the global network
infrastructure is creating a shift from the traditional role of the library as
a ‘clearing house’ for printed
publications to a role as a supplier of networked services for digital
information resources. The library of the future can be characterised as
follows (Mackenzie Owen, J.S. and Wiercx, A., 1996):
·
Services will be
based on digital, networked
information resources;
·
User interaction
with the library will be from the desktop (distance access) instead of by physically visiting the library
(on-site access);
·
Emphasis will be
on access to networked resources instead of on storing materials in the
library;
·
The traditional
library catalogue will evolve into a networked
resource discovery mechanism;
·
Bibliographic
data included in library systems will be extended to include non-document resources (e.g. persons,
organisations, datasets etc.);
·
New organisational models and distributed functions will arise, based on
co-operation and domain-based services.
Libraries belong to
the so-called ‘memory organisations’, together with archives and museums. This
reflects the fact that the global library system acts as the collective memory
of the world’s cultural and scientific heritage as recorded in the printed
word. In the world of printed publications only libraries perform a memory
function which guarantees to a certain extent that publications are not lost
after immediate use (fig. 1).
The long-term storage
of publications is more a (fortunate) outcome of librarians’ reluctance to discard
infrequently used publications than the outcome of sound management. In fact,
very few libraries have an explicitly stated responsibility with respect to
long-term storage and preservation. A legal responsibility for this exists only
for the national libraries of Europe through deposit legislation.
However, the memory
function in the publication chain is far from perfect. It is highly selective and random. What is preserved for
future generations depends on a large number of decisions, and any publication
stands a chance of being lost from the collective memory. To give a few
examples:
·
What enters into
the library system depends on what authors and publishers decide to publish.
There are many examples of cultural and scientific works which are inaccessible
because they have remained unpublished;
·
All libraries
have an acquisitions policy that determines which publications enter the
library collection. There is no system which guarantees that each publication
will be acquired by at least one library, with the exception of publications in
countries with a well-organised legal deposit system;
·
Libraries do not
always store indefinitely all publications they acquire. Although there is a
natural tendency amongst libraries not to discard items from the collection,
this sometimes is necessary, e.g. for economic reasons;
·
Publications
stored in libraries are sometimes lost due to media deterioration, either
caused by inadequate storage conditions or other factors such as the chemical
self-destruction of publications printed on chlorine paper;
·
Many kinds of
disasters, such as fire or flooding, can lead to loss of publications;
·
Finally,
political factors and censorship frequently prevent publications from being
acquired by libraries, or lead to their removal from the collection.
However, in the
printed world preservation of the intellectual record is enhanced by large
print-runs, which means that publications are usually produced in at least
hundreds of copies and are acquired by many different libraries, often distributed
over the entire world. The chance that at least a single copy of a publication
is preserved once it has been published is usually quite large.
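The protective effect of large print-runs can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every copy is lost independently and with the same probability; the function name and the figures are hypothetical and do not come from the studies cited in this paper.

```python
def survival_probability(copies: int, loss_probability: float) -> float:
    """Probability that at least one of `copies` copies survives.

    Assumption (not from the paper): each copy is lost independently
    with the same probability `loss_probability`.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must lie between 0 and 1")
    # All copies are lost with probability loss_probability ** copies,
    # so at least one survives with the complementary probability.
    return 1.0 - loss_probability ** copies


# Hypothetical figures: a print-run held in 300 libraries versus a
# single networked copy, each with a 90% chance of eventually being lost.
print_case = survival_probability(300, 0.9)
digital_case = survival_probability(1, 0.9)
print(f"print: {print_case:.6f}, digital: {digital_case:.6f}")
```

Under these assumptions the single digital copy is far more exposed than the widely held print publication, even though each individual copy is equally fragile.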
In the world of
digital publications the collective memory is as selective and random as it is
in the world of print. In fact, the situation is far worse when we consider the
following:
·
Digital
publications are produced and archived in a far smaller number of copies – in
most cases only a single copy is made available and stored on the network;
·
The cost per
access for digital archiving is higher than that of print archiving; since
budgets are limited, fewer copies will eventually be archived;
·
Digital materials
periodically need to be migrated to new storage media, data formats and system
environments; the future cost of migration is uncertain and it is likely that
many digital archives will not be maintained in a way which guarantees that all
materials will remain accessible;
·
In general there
is a lack of understanding of digital archiving issues, which at least
initially could lead to data loss;
·
Libraries are
still focused on print publications, and tend to neglect their memory function
for digital publications;
·
Finally, the
dynamic, interactive, distributed document types which are now emerging are
extremely difficult, if not impossible, to archive in comparison with
the current text- and image-based documents.
The nature of digital publications
makes the archival task for libraries more difficult. But without adequate
measures, there will be no archiving by libraries at all, and as a consequence
the collective memory of science will disappear. This becomes clear if we look
at the various models for digital publishing that are now beginning to emerge.
These models all imply distribution directly from the creator or publisher to
the end-user over the network, with no direct involvement from intermediary
organisations such as libraries. The immediate consequence of this is that the
distribution channel no longer has a memory function performed by organisations
that have long-term archiving as their implicit (most libraries) or explicit
(deposit libraries) responsibility. Consider the three publishing models
described in fig. 2:
·
Self-publishing, i.e. by individual authors or their parent
organisations. There is no guarantee that they will have the inclination or the
resources to maintain long-term availability. The archives (such as WWW and
FTP sites) they set up on the network will be subject to frequent changes and
will usually have a short life-span, as is already noticeable to anybody trying
to access materials put onto the Internet more than a year ago.
·
Publisher archives. Many large, international publishers are now
creating so-called ‘archives’ or repositories for distributing their
publications in digital form. Although some now also distribute journals in
digital form to libraries, they most certainly will not continue to do so. Moreover,
it is clear (and some publishers have already explicitly stated this) that
materials will only be available through these repositories for as long as
there is sufficiently frequent demand to justify the cost of storage. After a
certain period (probably 2 to 5 years) publications will be removed from the
repository and will no longer be available. When a publication goes ‘out of
print’ in this way, there will be no copies stored in libraries as is the case
with printed publications.
·
Push technology. The current publication model is based on the
‘pull’-concept: users interested in a publication go to a library or digital
repository and pull the document out of the files for personal use. This is
precisely the reason why (short-term) storage is required: to hold the
information in a file until a user comes and asks for it. In certain areas of
publishing – and perhaps in future in science publishing too – this model is
being replaced by the ‘push’-concept: the user indicates the type of materials
he or she is interested in, and relevant materials are immediately sent to the
user when they are created or published. In this model, there is no need for a
memory function anywhere in the distribution channel.
From our analysis it
becomes clear that digital archiving, i.e. maintaining accessibility of
publications for future use, is a function that needs to be organised in an
explicit way. It is highly unlikely that creators and publishers of digital
information will be able to provide a coherent and persistent memory system.
They have no commercial interest in long-term archiving, and they will not have
the technical skills and funds to maintain digital collections indefinitely.
What is needed for
digital archiving is a system which gives the responsibility for digital
archiving to organisations which have a specific archival function, which can
develop the highly specialised skills required for long-term storage and
preservation, and which can guarantee global accessibility to archival
materials over the network. The European approach, which can serve as a model
for other geographic areas, is the system of national deposit libraries
(Mackenzie Owen, J.S. & Walle, J. v.d., 1996). These have a legal responsibility
for archiving print materials that is currently being extended to cover digital
publications. This system could well be supplemented by other archival
organisations in specific subject domains, e.g. scientific institutes and
emerging virtual libraries operating on a global scale.
Digital deposit
libraries could interconnect to form a comprehensive archival backbone for
other libraries to provide service to users. In this way, there is no need for
these other libraries to maintain their own digital collections (other than
very frequently used current materials). Although the cost of digital archiving
is higher than that of print archiving, this system would create enormous
savings as compared to the current system. In the current system, the same publication
is stored in a large number of libraries, each creating its own archival cost.
In the system proposed here, only one storage location is required (or at least
an extremely limited number for reasons of security and network efficiency). On
a global scale the reduction in archival cost could be very large.
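The cost argument can be made concrete with a rough sketch. All figures below are hypothetical placeholders chosen only to show the shape of the comparison; the paper gives no actual cost data, and the function and variable names are illustrative.

```python
def total_archival_cost(cost_per_copy: float, storage_locations: int) -> float:
    """Total cost of archiving one publication at every storage location."""
    if cost_per_copy < 0 or storage_locations < 0:
        raise ValueError("cost and number of locations must be non-negative")
    return cost_per_copy * storage_locations


# Hypothetical assumptions (not from the paper):
PRINT_COST_PER_COPY = 1.0     # unit cost of keeping one printed copy
DIGITAL_COST_PER_COPY = 10.0  # digital archiving assumed ten times dearer
PRINT_LOCATIONS = 200         # libraries each holding the printed copy
DIGITAL_LOCATIONS = 3         # a few deposit sites for security and efficiency

print_total = total_archival_cost(PRINT_COST_PER_COPY, PRINT_LOCATIONS)
digital_total = total_archival_cost(DIGITAL_COST_PER_COPY, DIGITAL_LOCATIONS)
saving = print_total - digital_total

print(f"print: {print_total}, digital: {digital_total}, saving: {saving}")
```

With these placeholder values the higher cost per digital copy is outweighed by the much smaller number of storage locations; the conclusion depends entirely on the ratio between the two, which would have to be established empirically.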
The system of archival
deposit libraries for digital materials is based on two simple principles, viz.
that publishers are willing (or legally obliged) to deposit a copy of digital
materials on publication, and that the deposit library is allowed to provide
global access to these materials as soon as they are no longer accessible from
a repository under control of the publisher.
What does this mean
for libraries in general? The ongoing move towards digital distribution of
information through the global network infrastructure described at the
beginning of this paper has major consequences for the traditional archival
function. In the networked world a single location is sufficient. There is no
need for the traditional ‘many copies, many libraries’ approach. In addition,
publishers will not allow libraries to store digital publications because they
wish to control access and maintain direct relationships with their customers,
i.e. the end users. Therefore, publishers will set up digital repositories as
short-term archives (possibly through outsourcing to subscription agents).
However, publishers will not take on the responsibility for long-term
archiving. Moreover, long-term digital archiving is expensive and requires
specialised skills and infrastructure. Therefore, digital archives can only be
maintained by national libraries and/or large, specialised, international,
domain-based virtual libraries (Mackenzie Owen, J.S., 1996).
The large national
deposit libraries are, at least in Europe, beginning to perform the long-term
archival function to maintain access to digital information. It is therefore
essential that they obtain a legal basis that extends their responsibilities to
include digital materials. However, archiving on a national scale is not
sufficient in a globally networked environment. It is also necessary that the
digital deposit libraries join forces to create a globally interconnected
archival system, together with specialised digital archives, e.g. for specific
areas of science.
Recently there has
been some discussion on the use of the word ‘(digital) archive’ for what many
librarians would regard as the digital library collection. Traditionally, the
distinction between libraries and archives is based on the following
characteristics:
·
Libraries collect
items which are 'published' (either by official publishers or as grey
literature by other organisations or individuals), whereas archives collect items
related to 'work processes' (e.g. the work carried out by a specific
organisation) and organisational entities or individuals.
·
Libraries collect
items in anticipation of their primary use (reading, studying); archives
collect items after their primary use (the 'work process' in which they were
used).
·
Libraries collect
items which are available in multiple copies; archives collect items which are,
in the majority of cases, unique (e.g. correspondence).
·
The value of
library items is in their content as such; the value of archival items is not
in their intrinsic content, but in what they tell us about the work process in
which they were used and/or the organisation or individual by which/whom they
were used. That is the reason why archival items lose their meaning if they are
not stored 'in context', i.e. in relation to other items from the same process
or originator.
In the traditional
sense, therefore, digital archives are not archives. However, in the context of
digital information the term 'archive' is acquiring a rather different meaning.
It is now being used merely to refer to a storage location for digital objects.
Publishers in particular tend to use the term 'digital archive' to refer to
depositories of digital publications on the Internet. There seems to be a need
for a term to describe these repositories. Of the various functions of a
library, the storage function is becoming isolated from the rest of library
services, and is indeed shifting from libraries to
publishers and other organisations (cf. the Los Alamos pre-print site). In
fact, it is becoming clear that digital libraries will provide many types of
useful services, but will themselves not maintain digital collections (the
'storage versus access' debate). Therefore, the term 'digital archive' refers to a 'collection' (of
digital documents) which is not part of a library.
For digital libraries
without an explicit archival and preservation responsibility, the digital
collection will be relatively unimportant. Local storage of digital materials
will have the function of a short-term cache (e.g. to improve the efficiency of
access to frequently requested materials), not of a long-term archive. This
means that digital libraries will be able to – and have to – concentrate on
their key functions: providing access to materials stored in large digital
archives, handling license agreements for end-user access to copyright
materials, providing a coherent set of access and delivery tools and
procedures, and offering service and support to users. In addition, digital
libraries could develop a role in end-user digital publishing and as an
intermediary between authors and digital archives.
Mackenzie Owen, J.S.
(1996) - Preservation of digital materials for libraries. In: European research
libraries co-operation; the LIBER quarterly, 6(1996)4, p. 435-451.
Mackenzie Owen, J.S.
& Walle, J. v.d. (1996) - A study of issues faced by national libraries in
the field of deposit collections of electronic publications: final report. -
Luxembourg: European Commission.
Mackenzie Owen, J.S.
and Wiercx, A. (1996) - Knowledge models for networked library services. -
Luxembourg: European Commission.
Task Force (1996) -
Preserving digital information: report of the Task Force on Archiving of
Digital Information commissioned by the CPA and the RLG: final report and
recommendations.
(This paper is a
revised and expanded version of a paper for the Academia Europaea Workshop ‘The
impact of electronic publishing on the academic community’, Stockholm, April
1997)