received 4 January 2001
Nevertheless, we witness a sometimes heated debate on the value of such "electronic documents". In my view, we have to distinguish between documents that look, smell and sound like a paper document but are stored and transmitted by electronic means, and documents that are created from the outset for an electronic environment and hence are new animals in the zoo of scientific communication.
The discussion on the value of electronic documents is often hampered by the fact that one starts from what one is accustomed to in the paper world and attempts to impose that on an electronic environment. The scientific paper as we know it is a paper-based object that obviously can be cast into various technical forms, but intrinsically remains a paper object.
In order to grasp the impact of the current electronic revolution, and to set out a policy for the future, we have to abstract from the presentation form and start with the aims and content of scientific communication before we zoom in on any particular presentation form.
We have to step back and analyse what it means to write for an electronic medium and what it means to read material that is stored electronically. In a paper world, writing and reading are very close. Writing for an electronic medium means understanding the full capabilities of the medium. Reading electronic articles, on the other hand, does not necessarily mean reading from a screen. The presentation becomes flexible! In contrast to paper, electronic media allow the author's favoured presentation to differ distinctly from the form in which the reader actually consumes the work.
An electronic document is not the electronic version of a traditional paper document with embellishments such as hyperlinks, colour pictures and illustrative animations. An electronic document is a document comprising a variety of different types of information presentations that are brought together by an author in order to present a comprehensive scientific argument. Or to put it in other terms: in an electronic publication, images, animations and so on cease to be illuminating illustrations to the text, but are now semi-independent knowledge representations that together with the text comprise the scientific argument communicated to peer scientists.
In order to develop new insights into an editorial policy that maintains
the essential virtues of the paper document while incorporating all
the exciting new features, I will first discuss the scientific paper
as we know it. Subsequently, new forms of knowledge expression are dealt
with. In the concluding section, I try to set out some guidelines for the
coming period.
The need for a clear understanding of what a scientific publication actually is has been well formulated by the International Working Group (IWG99): "Publication is the hard currency of science. It is the primary yardstick for establishing priority of discovery, making the status of a publication a critical factor in resolving priority disputes or intellectual property claims. Academic tenure and promotion decisions are based in large part on publication in peer-reviewed journals or scholarly books. To make these decisions fairly and with confidence, scientists and their institutions need assurance of what counts as a legitimate electronic publication."
Thus, the challenge is to ensure that, independent of the technology used, the use and exchange value of this type of currency can be established universally for all participants in the world of science.
The Working Group proposes a list of minimum characteristics that qualify a document as a "publication". It is worthwhile to compare this list with, on the one hand, the expansion of the concept of a document to all coherent knowledge presentations, whether textual, non-textual or a mixture, and, on the other hand, the list of communication needs presented by Kircz and Roosendaal (Kir96). This list of communication needs reads: 1) awareness of knowledge, 2) awareness of new research outcomes, 3) specific information, 4) scientific standards, 5) platform of communication, and 6) ownership protection. It is immediately clear that scientific communication needs as such encompass a much wider range of interaction between scientists than formal publications do.
The Working Group makes a useful distinction between an informal notification, a first publication and a definitive publication. It recommends four main characteristics that apply to all publications; we will discuss them now.
This means that the demand for fixedness must be translated into a demand for the unalterability of the content of the object in question. In other words, we have to interpret the demand for fixedness as a demand for a well-defined descriptive standard for the content of the document: a standard that enables the storage and maintenance of the integrity of the information independent of the carrier of that information, be it a clay tablet or a future DNA chip. It goes without saying that the current developments in descriptive languages such as the Standard Generalized Markup Language (SGML) and its successor, the Extensible Markup Language (XML), are of the utmost importance. If, finally, all information in a document is properly coded according to such a language, we are dealing with simple ASCII, or better Unicode, strings that can be handled in all conceivable material memory structures. For integrity reasons, such a file can be endowed with an electronic watermark. For the future user of the once-stored document, only the capability to read it again from the then popular medium is of importance. For the immediate future, an interesting initiative is the NCSA Astronomy Digital Image Library (ADIL), a repository providing astronomers with research-quality images straight from the telescope to their desk over the Web (Pla99).
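The text mentions an electronic watermark as an integrity device. As a minimal sketch of a simpler, related idea, a fixity check, the Python below records a cryptographic digest (SHA-256, swapped in here for a watermark; the two are not the same thing) of a Unicode-normalised marked-up string and later verifies a stored copy against it. The function names and the sample fragment are hypothetical.

```python
import hashlib
import unicodedata


def fixity_digest(marked_up_text: str) -> str:
    """SHA-256 fingerprint of a marked-up (e.g. XML) document string.

    The string is normalised to Unicode NFC first, so that a copy whose
    characters were re-encoded during migration (for instance a precomposed
    'é' stored as 'e' plus a combining accent) still yields the same digest.
    """
    canonical = unicodedata.normalize("NFC", marked_up_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_unaltered(marked_up_text: str, recorded_digest: str) -> bool:
    """Compare a stored copy with the digest recorded at publication time."""
    return fixity_digest(marked_up_text) == recorded_digest


# Hypothetical XML fragment standing in for a fixed publication.
original = "<article><title>Caf\u00e9 spectra</title></article>"
digest = fixity_digest(original)

# Copy migrated to a new carrier with decomposed accents: same content.
migrated = "<article><title>Cafe\u0301 spectra</title></article>"
# Copy in which one character of the content was altered.
tampered = "<article><title>Cafe spectra</title></article>"

print(is_unaltered(migrated, digest))   # True
print(is_unaltered(tampered, digest))   # False
```

Normalising before hashing is a design choice: it ties the digest to the content rather than to one particular byte encoding, which matches the carrier-independence argued for above.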
Secondly, we have the aspect of internal integrity and coherence. This is typically an XML issue. This persistence aspect can be covered by introducing a complete list or map of contents as an integral part of every document. Not only must the bitstreams of every component of the document maintain their integrity, but so must the mutual relations between the various components. We also need a mechanism to check that all components are present. This last demand can become a serious problem in the future, as more and more documents will be rendered from components residing in different databases. Think of an astronomy article that calls for data from a huge database filled with satellite measurements. As an electronic publication is, in principle, a modular entity and not an essay (Kir98a), the persistence demand requires that a publication guarantee that all its components remain available. This demand is closely linked to the problem of dead hyperlinks. All this converges on the discussion of the Digital Object Identifier initiative. The International DOI Foundation was "created in 1998 and supports the need of the intellectual property community in the digital environment, by the development and promotion of the Digital Object Identifier system as a common infrastructure for content management" (Doi00a, Doi00b). The DOI Foundation is supported by almost all major (commercial) publishers and societies. The idea behind DOI is that every item that has an assigned copyright (hence also books) will get a unique identifier. As the system develops, this identifier will be endowed with metadata such as bibliographic information and genre, but also publishers' information and price. In the first round, as experimented with in Crossref (Cro00), DOI is limited to a one-to-one link with the URL of scientific articles in a publisher's database.
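The map of contents proposed in the paragraph above can be sketched as a manifest that lists every component and the relations between them, together with a check that all components can actually be retrieved. This is a hypothetical Python illustration, not an existing standard; the class, identifiers and relation names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class MapOfContents:
    """Hypothetical map of contents shipped as an integral part of a document."""
    components: set[str]  # ids of all components: text, images, data sets
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)


def check_completeness(toc: MapOfContents, available: set[str]) -> list[str]:
    """Return the problems found; an empty list means the document is complete."""
    problems = []
    # Every listed component must actually be retrievable right now.
    for cid in sorted(toc.components - available):
        problems.append(f"missing component: {cid}")
    # Every relation must connect components that the map itself lists.
    for source, relation, target in toc.relations:
        for end in (source, target):
            if end not in toc.components:
                problems.append(f"relation '{relation}' points outside the map: {end}")
    return problems


toc = MapOfContents(
    components={"text:intro", "fig:spectrum", "data:satellite-run-7"},
    relations=[("text:intro", "illustrated-by", "fig:spectrum"),
               ("fig:spectrum", "derived-from", "data:satellite-run-7")],
)

# Suppose the remote satellite database is currently unreachable.
print(check_completeness(toc, available={"text:intro", "fig:spectrum"}))
# ['missing component: data:satellite-run-7']
```

Checking the relations as well as the components reflects the point that integrity covers the mutual relations, not only the individual bitstreams.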
In the full implementation, it is envisaged that DOI will also allow choices, e.g., to go to a copy of the identified entity, to a metadata record about the entity, or to an identical copy of the same entity at a different location (mirror site). Adding metadata to DOIs will allow the reader to choose which type of realisation of a particular document is wanted, e.g., a PDF file, an XML file or whatever other storage types are available. Clearly, the DOI approach is a strong attempt to ensure the integrity of information entities, seen not only as intellectual property containers but also as a step towards electronic commerce and trade in intellectual property rights.
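To illustrate the multiple-resolution idea, the following hypothetical Python sketch maps one identifier to several realisations (copies in different formats, a metadata record, a mirror) and returns the one a reader asks for. It does not use the real DOI or Handle System interfaces; the identifier, URLs and table layout are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Realisation:
    kind: str  # "copy", "metadata" or "mirror"
    fmt: str   # e.g. "PDF", "XML", "record"
    url: str


# Hypothetical resolution table: one identifier, several realisations.
# The identifier and URLs are invented placeholders, not registered DOIs.
RESOLVER = {
    "10.1234/example.123": [
        Realisation("copy", "PDF", "https://publisher.example/a123.pdf"),
        Realisation("copy", "XML", "https://publisher.example/a123.xml"),
        Realisation("metadata", "record", "https://publisher.example/a123/meta"),
        Realisation("mirror", "XML", "https://mirror.example/a123.xml"),
    ]
}


def resolve(identifier: str, kind: str = "copy",
            preferred_formats: tuple[str, ...] = ("XML", "PDF")) -> str:
    """Return the URL of the first realisation matching the reader's choice."""
    options = [r for r in RESOLVER.get(identifier, []) if r.kind == kind]
    for fmt in preferred_formats:  # formats in the reader's order of preference
        for r in options:
            if r.fmt == fmt:
                return r.url
    raise LookupError(f"no {kind} realisation of {identifier} in {preferred_formats}")


print(resolve("10.1234/example.123"))                            # XML copy
print(resolve("10.1234/example.123", preferred_formats=("PDF",)))  # PDF copy
print(resolve("10.1234/example.123", kind="mirror"))             # mirror site
```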
A competing scheme for reference linking, emerging from the scientists engaged in the world of pre-print servers, is the Open Archives Initiative. Its goal reads: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication." (OAi00). The aim of this initiative is to support those active authors for whom self-archiving is the preferred option. Interoperability between such archives becomes the prime research issue (Som00). Though DOI and OAI approach the problem from two antagonistic philosophical backgrounds, in the end both schemes must ensure the integrity and quality demands that are at the basis of proper scientific discourse.
A bibliographic record (metadata) is essential for fulfilling this recommendation. The issue of metadata, which entails much more than traditional bibliographic information, is also dealt with in section 3.
Depending on the availability and affordability of the technical means, the Working Group recommends:
For a first publication, next to version control, the Working Group recommends:
Another issue here is of a more archival nature, namely that at least one copy of every serious document must be stored safely in an archive. This is an important and strong demand in a period when paper is on the way out and a plethora of digital media, each with its own way of handling data, is emerging. This point is closely related to the first point on fixation, and also to the metadata issue. It implies that a central organisation such as the Library of Congress in the USA must maintain a legal deposit of all items, each with a unique storage code, in perpetuity.
For the definitive publication, the Working Group recommends, alongside persistence and version control, assignment and persistence of a web address.
Out of all this discussion, one thing becomes crystal clear: the issue is very much domain-dependent. Whilst in theoretical physics the pace of research is such that every new idea is immediately broadcast via pre-print servers, although often only after internal peer review at the researcher's institute, in more experimental fields the tempo is more relaxed. After all, it is easier to steal an idea than to redo an experiment. In medicine, the question is intrinsically more sensitive, as new medical information is often catapulted into the public imagination. In this field, ethics and misconduct are a permanent concern (Hud00). For a recent review of the domain dependency of refereeing in e-journals, see Weller (Wel00).
In the above, we critically discussed the recommendations of an International Working Group. As we have seen, the most visible tension between these very reasonable recommendations and electronic publication is that an electronic publication is not a paper publication stored in a different medium. In the next section, I will dwell on the unique features of electronic publications, sharpening the argument for publishing standards, which will be summarised in the last section.
The abstract notions of the International Working Group are, of course,
fine; the problem, however, lies in the implementation. This implementation
demands a better grasp of what electronic documents are. For precisely
that reason, in the present section we try to make progress on this issue
in order to specify recommendations in the final section.
In the electronic future, still and moving pictures, sounds, simulations and soon also tactile information can be exchanged and experienced, and hence analysed and interpreted, by different people separated by time and place (Kir98b). This means that a genuine electronic document will be a composition of text, images, sounds, animations, etc. All these components of the electronic document must adhere to quality and integrity standards. Thus, within the law of proper scientific discourse, all knowledge presentations are equal. To continue this political metaphor, we certainly need a diversity policy to replace the period of positive discrimination in favour of text alone.
This is not the place to dwell at length on the differences between intuitive understanding by means of non-textual stimuli and scientific understanding through linguistic reasoning, but we must come to realise that non-textual components will play a central rôle in the electronic document of the future.
In order to create an environment in which all this can be organised in a meaningful way, the first conclusion is that, to a first approximation, we have to consider the various components as independent but interacting objects. This leads to a modular approach to information.
However, in an electronic environment, introducing already-existing information into a new work is trivial. This is exactly why the concept of modules is so crucial. In order to keep the integrity of the original work, introducing a module into a new work means introducing a complete module.
The difference between quoting and multiple use is that in multiple use the new author can rely on the completeness and integrity of the original module. Hence, if a new work needs a description of a machine, of the working of a medicine, or of a mathematical proof, reference to another work acquires a new dimension: we can now seamlessly introduce the existing text into the new work. The old work no longer has to be located in a library elsewhere; the electronic network allows us to insert this information exactly where it is needed.
This means that a module must be usable in different environments: a link no longer merely indicates that relevant information exists elsewhere, but actually transports elsewhere-located information into the present work.
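A minimal sketch of a link that transports a module into the new work (often called transclusion) is given below, assuming a hypothetical in-memory module store and an invented `{{include:<module-id>}}` link syntax. It shows the point made above: the whole module is inserted, never a fragment, and a missing module is reported rather than silently skipped.

```python
import re

# Hypothetical module store: identifier -> complete module text.
MODULE_STORE = {
    "mod:apparatus": "The spectrometer consists of a grating, a slit and a CCD detector.",
    "mod:lemma-2-proof": "Proof of Lemma 2, reused in full from an earlier work.",
}

# Invented link syntax for this sketch: {{include:<module-id>}}
INCLUDE = re.compile(r"\{\{include:([^}]+)\}\}")


def render(new_work: str) -> str:
    """Replace every include-link by the complete module it points to."""
    def pull(match: re.Match) -> str:
        module_id = match.group(1)
        if module_id not in MODULE_STORE:
            # Refuse to render rather than leave a dead link in the new work.
            raise KeyError(f"module not available: {module_id}")
        return MODULE_STORE[module_id]  # the whole module, never an excerpt
    return INCLUDE.sub(pull, new_work)


draft = "We use the set-up described earlier. {{include:mod:apparatus}} Results follow."
print(render(draft))
```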
Harmsze (Har00) proposes a new structuring of scientific articles in modular form. A module is defined as a "uniquely characterised, self-contained representation of a conceptual information unit aimed at communicating that information". This means that a module is a textual, pictorial or other representation of an amount of information that is in itself sufficiently comprehensive to convey meaning to a reader. Note that neither length nor size enters the definition of a module. Although Harmsze deals mainly with modules that comprise coherent texts, the model is perfectly able to integrate non-textual modules as well. The model distinguishes between elementary modules and complex modules. Depending on the purpose, elementary modules can be merged to form complex modules, just as atoms bind into molecules. Two types of such "bounded" complex modules can be distinguished.
We can compare such a compound module with a chemical molecule that is unique in itself, but can be analysed as a set of bound molecules and atoms.
We can compare this kind of complex module with the chemical example of a cluster, where we have many identical atoms weakly bound together.
Modularity allows for selected reading paths so that modules can be skipped or emphasised, depending on the reader’s wish, expertise or level of understanding.
Please note that we store information units only once! At bottom, they are SGML-coded objects that change their appearance according to the document style demanded by the presentation medium.
Unfortunately, Harmsze's approach is not the end of the analysis. If we discuss multiple use, we also have to incorporate other granularities of information, even down to a single number.
In any event, full modules and single data items alike must be identifiable as unique entities in a database. This means that all coherent objects must carry inseparable metadata with them.
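The storage principle above (each unit stored once, complex modules built from references to elementary ones, reading paths chosen by the reader) can be illustrated with a small Python data structure. The `Module` class, the store and the module kinds are hypothetical illustrations, not Harmsze's own implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module record loosely following Harmsze's model.

    An elementary module carries content and has no parts; a complex module
    is built from references to other modules and has no content of its own.
    """
    mid: str
    kind: str                    # e.g. "Introduction", "Methods", "Results"
    content: str = ""
    parts: tuple[str, ...] = ()  # identifiers of constituent modules


# Every information unit is stored exactly once, keyed by its identifier.
STORE: dict[str, Module] = {}


def store(module: Module) -> None:
    if module.mid in STORE:
        raise ValueError(f"{module.mid} is already stored; reuse it by reference")
    STORE[module.mid] = module


def reading_path(mid: str, skip: frozenset = frozenset()) -> list[str]:
    """Flatten a module into the elementary modules a reader will see.

    Modules whose kind appears in `skip` are left out, together with their parts.
    """
    module = STORE[mid]
    if module.kind in skip:
        return []
    if not module.parts:
        return [module.mid]
    path: list[str] = []
    for part in module.parts:
        path.extend(reading_path(part, skip))
    return path


store(Module("m1", "Introduction", "Why the spectrum was measured."))
store(Module("m2", "Methods", "How the spectrometer was calibrated."))
store(Module("m3", "Results", "The measured line positions."))
store(Module("article", "Article", parts=("m1", "m2", "m3")))

print(reading_path("article"))                               # ['m1', 'm2', 'm3']
print(reading_path("article", skip=frozenset({"Methods"})))  # ['m1', 'm3']
```

Because a complex module only holds identifiers, the same elementary module can appear in several works without being copied, which is the "store only once" rule in miniature.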
It is crucial in the following to realise that links are considered to be anchored on both sides, source and target, and can be traversed back and forth. This means that, e.g., the characterisation "section" in one direction indicates "belongs to" in the other direction. Technically this is still a difficult problem, but within the XML environment good progress is being made (XML99).
In research, part of the work is to relate previously unrelated scientific findings within a new context. In a modular environment, this process can be enhanced by naming hyperlinks in such a way that the reader knows why the author suggests a link. At present, we have no clue as to why hyperlinks are added; we can only find out by clicking on them. In a structured environment, we know the reason for a link and can decide whether or not to follow it. This brings us to the difficult discussion of hyperlink taxonomies or typologies.
Unfortunately, very little has been published on this subject. Most of the initiatives are attempts towards a more-or-less complete list of possible notions (tags). Some works suggest a distinction between structural/organisational relations and rhetorical or discourse relations. Our feeling is that in a distributed database environment, we have to start with a clear differentiation between at least two, and maybe three, categories of relations: a) organisational relations, describing the structural relationship of modules, e.g., hierarchical relations such as "part of"; b) discourse relations, describing the reasoning, such as argument for/against, example or clarification (the discussion on this category is ongoing and part of current research; see Har00, Kir00 and references therein); and c) context relations, describing the context in which a certain relation is valid. Obviously, the structure of this last category might be domain-dependent.
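Links anchored at both ends can be sketched by recording each link at its source and at its target, with the inverse label stored at the target. The link vocabulary below pairs "section" with "belongs-to" as in the example above; the other labels, the class and the identifiers are hypothetical.

```python
from collections import defaultdict

# Hypothetical link vocabulary: each link type paired with its inverse.
INVERSE = {
    "section": "belongs-to",
    "belongs-to": "section",
    "argument-for": "supported-by",
    "supported-by": "argument-for",
}


class LinkBase:
    """Records every link at both anchors so it can be traversed either way."""

    def __init__(self) -> None:
        # anchor -> list of (link type as seen from this anchor, other anchor)
        self._at = defaultdict(list)

    def add(self, source: str, link_type: str, target: str) -> None:
        self._at[source].append((link_type, target))
        self._at[target].append((INVERSE[link_type], source))

    def links_from(self, anchor: str) -> list[tuple[str, str]]:
        return list(self._at.get(anchor, []))


links = LinkBase()
links.add("article", "section", "sec:methods")
print(links.links_from("article"))      # [('section', 'sec:methods')]
print(links.links_from("sec:methods"))  # [('belongs-to', 'article')]
```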
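The three proposed categories, and the reader's ability to decide which links to follow from their declared type rather than by clicking, can be sketched as follows. The category names follow the text; the relation labels and module identifiers are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    ORGANISATIONAL = "organisational"  # structure, e.g. "part of"
    DISCOURSE = "discourse"            # reasoning, e.g. argument for/against, example
    CONTEXT = "context"                # context in which another relation holds


@dataclass(frozen=True)
class TypedLink:
    source: str
    target: str
    category: Category
    relation: str  # label within the category; this vocabulary is hypothetical


def links_to_follow(links: list[TypedLink], categories: set[Category],
                    relations: Optional[set[str]] = None) -> list[TypedLink]:
    """Select links by declared type, so the reader knows why before clicking."""
    return [
        link for link in links
        if link.category in categories
        and (relations is None or link.relation in relations)
    ]


typed_links = [
    TypedLink("mod:results", "mod:article-x", Category.ORGANISATIONAL, "part of"),
    TypedLink("mod:results", "mod:claim-y", Category.DISCOURSE, "argument against"),
    TypedLink("mod:results", "mod:example-z", Category.DISCOURSE, "example"),
]

# A reader who only wants to see counter-arguments:
for link in links_to_follow(typed_links, {Category.DISCOURSE}, {"argument against"}):
    print(link.target)  # mod:claim-y
```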
Instead of trying to hold back history with conservative measures, such as some publishers' refusal to allow authors to post their own papers on their websites, we have to be forward-looking.
The conclusion so far is that we face a transition in which the traditional journal article will cease to exist. This means that we have to reformulate our notions about scientific documentation. In my view, which I defend in this contribution, we have to adopt a distinctly different granularity of information units than the traditional paper article allows.
1. If we define modules as conceptual units, we can apply strict rules about quality. At present, a scientific article is peer-reviewed without any discrimination between the various kinds of information in it. In a world of well-defined modules, the refereeing standard for a module Method will be distinctly different from that for a module Data acquisition. Thus, quality control will improve.
2. If all modules are endowed with a set of metadata that clearly identifies the author and time of creation, the integration of a module into another work is automatically taken care of, with due credit being given. The DOI approach is promising in this respect. Of course, people can always retype, steal and add fraudulent data, but misconduct is a social problem and not a scientific one.
3. Another interesting outcome of this analysis is that relations, which express themselves as hyperlinks, become information objects in their own right. As relations in an electronic environment can be typed, they become objects with metadata, so we have to add the bibliographic information of the originator and a time stamp. In this way, the minimum scientific publication becomes the brilliant insight of a researcher who connects two separate information units by a typed link, and nothing more.
4. For documents that are built from available and new modules, we will have two levels of authentication, one on the level of each module and the other on the level of the complete new work.
5. A modular publication will have a list or map of contents with links to all components, as well as a new kind of abstract that reflects the content of all modules and serves as an orientation tool in the hypertext environment. Integrity covers not only the completeness of the information but also the overview and a description of the mutual relationships between the components.
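The two levels of authentication in point 4, and the role of the relations mentioned in point 5, can be sketched with plain digests: one per module, and one for the whole work computed over the ordered module digests and the typed relations. This is a hypothetical illustration; a real scheme would add signatures identifying the authors, which plain hashes do not provide.

```python
import hashlib
import json


def module_digest(content: str) -> str:
    """Level 1: fingerprint of a single module's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def work_digest(module_digests: list[str], relations: list[tuple[str, str, str]]) -> str:
    """Level 2: fingerprint of the whole work.

    It covers the ordered module digests and the typed relations between
    modules, so reordering, dropping or relinking modules changes it even
    when every individual module is intact.
    """
    payload = json.dumps({"modules": module_digests, "relations": relations}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


reused = "Calibration procedure, as published earlier by another author."
new = "New measurements obtained with that calibration."

digests = [module_digest(reused), module_digest(new)]
relations = [("m-new", "based on", "m-reused")]
w1 = work_digest(digests, relations)

# The reused module keeps its own level-1 digest inside the new work,
# but presenting the modules in a different order changes the level-2 digest.
w2 = work_digest(list(reversed(digests)), relations)
print(digests[0] == module_digest(reused))  # True
print(w1 != w2)                             # True
```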
Therefore, the lesson of this contribution is that electronic media
enhance the integration of textual and non-textual knowledge representations,
enabling a proper conceptual segregation between various kinds of knowledge
and therefore allowing for more specific refereeing. The flip side of these
new capabilities is that we have to develop a stable system of domain-dependent
metadata for modules and relations that steer the logistics and storage
of these modules and relations. We can think back wistfully to the stable
situation of established peer-reviewed journals we built over the last
century; however, the unknown is the object of science, and we are entering
a new and unknown phase in scientific communication. Therefore, we have
to make sure that our societal and scientific demands for quality and integrity
are not confused with the latest fashion in technology. Technology is enabling
us to expand scientific communication into a serious mix of textual and
non-textual components. For most of the non-textual components we do not
even have a good insight into what the quality standards are. Like all real advances
in science, the development of scientific communication will go through
experimental phases. From the analysis of these experiments we will be
able to develop new standards and rules. It is a matter of the highest
importance that the scientific community takes this experimentation seriously
and does not bow to conservative forces that try to restrict the developments
to the known and established practices of the paper world.
CDA00 Centre de Données astronomiques de Strasbourg. http://cdsweb.u-strasbg.fr/CDS.html
Cro00 Crossref. The central source for reference linking. www.crossref.org
Dan93 H.-D. Daniel. Guardians of science. Fairness and reliability of peer review. Translated by William E. Russey. VCH, Weinheim 1993.
Doi00a Home page Digital Object Identifier Foundation. www.doi.org
Doi00b The DOI handbook Version 0.5.1. 11 August 2000. http://www.doi.org/handbook_2000/index.html
Gar79 W.D. Garvey. Communication: The Essence of Science. Pergamon Press, Oxford 1979.
Har00 Frédérique Harmsze. A modular structure for scientific articles in an electronic environment. PhD dissertation, University of Amsterdam, 2000. The full text and appendices are available via: www.science.uva.nl/projects/commphys/papers
Hrd00 For Stevan Harnad's publications on the Internet, see http://cogsci.soton.ac.uk/~harnad/intpub.html
Hud00 Anne Hudson Jones and Faith McLellan (eds.) Ethical Issues in Biomedical Publication. Johns Hopkins UP, Baltimore 2000.
IWG99 International Working Group. Defining and Certifying Electronic Publication in Science. A proposal to the International Association of STM Publishers. http://associnst.ox.ac.uk/~icsuinfo/aaas-stm.htm
Kir96 Joost G. Kircz and Hans E. Roosendaal. Understanding and shaping scientific information transfer. In: Dennis Shaw and Howard Moore (eds.) Electronic publishing in science: proceedings of the joint ICSU Press/UNESCO Expert Conference Paris February 1996. Unesco Press 1996 pp. 106-116.
Kir98a Joost G. Kircz. Modularity: the next form of scientific information presentation? Journal of Documentation, vol. 54, no. 2, March 1998, pp. 210-235. The final draft can be found on: www.science.uva.nl/projects/commphys/papers
Kir98b Joost Kircz. Nouvelles présentations! Nouvelle science? In: L'écrit de la science, Writing science. Forum Européen de la science et de la technologie (DGXII), Nice 1998. Alliage no. 37-38, Hiver 98-Printemps 99, pp. 14-24. For an English version.
Kir00 Joost G. Kircz and Frédérique Harmsze. Modular scenarios in the electronic age. Conferentie Informatiewetenschap 2000, Rotterdam, 5 April 2000.
Mea98 A.J. Meadows. Communicating Research. Academic Press, San Diego 1998.
OAi00 Open Archives Initiative. www.openarchives.org
Pla99 Raymond L. Plante, Richard M. Crutcher, Robert E. McGrath. The NCSA astronomy digital image library: from data archiving to data publishing. Future Generation Computer Systems 16 (1999), pp. 49-61.
Rot99 Jeff Rothenberg. Avoiding Technological Quicksand: Finding a viable technical foundation for digital preservation. Council on Library and Information Resources, Washington, DC, 1999. http://www.clir.org/pubs/reports/rothenberg/contents.html
Sep98 September 1998 American Scientist Forum. september98-forum@listserver.sigmaxi.org
Som00 Herbert van de Sompel and Carl Lagoze. The Santa Fe Convention of the open archives initiative. D-Lib Magazine, February 2000, Vol. 6, No. 2. http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html
Wel00 Ann C. Weller. Editorial peer review for electronic journals: current issues and emerging models. Journal of the American Society for Information Science 51(14) 2000 pp. 1328-1333.