Contribution: ICSU Press - UNESCO Expert Conference on Electronic Publishing in Science. Unesco House, Paris, France, 19-23 February 1996
Joost G. Kircz
Sara Burgerhartstraat 25, 1055 KV Amsterdam
Communication in Physics Project, WINS Faculty, University of Amsterdam.
Valckenierstraat 65, 1018 XE Amsterdam
Hans E. Roosendaal
Sara Burgerhartstraat 25, 1055 KV Amsterdam
In this presentation, we report a twofold approach to the issues and opportunities modern electronic media pose for scientific information. The first part of this paper addresses a number of elements in the process of information: needs, transfer, and disclosure in academic environments and discusses results of in-depth interviews with a number of scientists from various fields. In the second part, we discuss the changes electronic publishing will induce in scientific information handling. We try to analyse the different cognitive components leading to a variety of ways in which information is presented, and we briefly discuss recent research towards a better understanding of the fundamental changes electronic publishing will introduce.
The main issue to be addressed in the context of electronic publishing is: "How can it support and enhance the science process?" Communication is the essence of science, and more particularly, it is the engine of the whole science process (1, 2). The scientific communication process is itself an object of investigation, and it provides data for research programmes in a variety of science studies (1, 3, 4, 5, 6, 7).
It would go well beyond the scope of this contribution to describe the science process even in some detail. We will assume here that the science process consists of a system of related, mostly competing research programmes (4, 5). On this basis, a number of different stages in the research process can be distinguished, from conceptualisation of problems, to theory, to hypotheses, to predictions and testing, and finally to interpretation of research outcomes (5). While we realize that there is no consensus on the above, these different stages lead to a number of main communication needs as experienced by researchers in different fields (see below).
This structure of the science process has a number of social consequences, which are discipline dependent. Most important are common standards, resulting in specific rules and ethics. Furthermore, each scientist has to establish his own position, and this is mainly done through recognition of his contributions to science in the research process. These contributions can be informal and formal and are to a large extent manifested in publications (6, 7).
Generally, the communication needs result from research needs in the different stages of the science process. Analysis (8) indicates the following needs:
It is inherent to science, and to the science process, that both are in constant flux or growth. In this contribution, two aspects of this constant growth are worth mentioning:
In the previous section, we have formulated a number of theses on the communication process as an important engine for the science process. Communication needs are seen to be related to, and have different impact on, the different stages in every research process. The main question then is: how can we increase the effectiveness and efficiency of the communication process for the individual researcher? What are the main elements, what are the main expectations and desires a researcher has? For our research we identify as key issues:
The above-mentioned key issues are being addressed in field research comprising a number of in-depth interviews with individual researchers.
In our heuristic model there is a tendency to an open infrastructure and an integrated system. This model is being investigated on a stratified sample of individual researchers in the following scientific disciplines:
The objective of the research is to identify the expectations and desires researchers have with respect to the above themes. A number of pertinent themes are probed in a structured way using so-called provocative statements. Opinions of researchers are then further probed, using expert interviewers from the publishing departments of Elsevier Science. In that way, we allow hypotheses and other issues to be put to the test, and to be criticized or falsified. Motives for certain opinions, expectations and desires can then be identified. A full description of the research method is given in (11). An example of a provocative statement, and some results for the mentioned disciplines, are given in Table 1.
|Discipline||Opinion||Expectations on infrastructure||Desires on infrastructure|
|Clinical medicine||2.3 (0)||1.7 (0)||2.1 (0)|
|Neuroscience||2.1 (0)||2.1 (0)||2.5 (0)|
|Organic chemistry||1.9 (--)||1.6 (-)||1.5 (-)|
|Mechanical engineering||2.9 (+)||1.9 (0)||1.8 (0)|
|High energy physics||2.7 (+)||2.5 (+)||2.2 (0)|
|Discipline||Opinion||Expectations on infrastructure||Desires on infrastructure||Expectations on information needs||Desires on information needs|
|Clinical medicine||4.7 (++)||4.6 (0)||4.7 (++)||4.6 (0)||4.7 (++)|
|Neuroscience||4.5 (++)||4.8 (++)||4.6 (0)||4.8 (++)||4.6 (0)|
|Organic chemistry||4.2 (0)||4.2 (0)||3.9 (0)||4.2 (0)||3.8 (-)|
|Mechanical engineering||3.8 (-)||4.6 (0)||4.4 (0)||4.6 (0)||4.5 (0)|
|High energy physics||3.7 (--)||4.2 (0)||4.0 (0)||4.2 (0)||4.0 (0)|
Numbers indicate agreement with the statement on a scale from 1 to 5 (1 is strong disagreement, 5 is strong agreement). Brackets denote the difference between the discipline and the average, ranging from significantly above (++) to significantly below (--).
For this contribution, we restrict ourselves to summarizing the main overall results and conclusions from our research. First we discuss the results in terms of the four main functions in scientific communication (section 4.1). Then we discuss more specific needs behind these functions in detail (section 4.2).
It is useful to distinguish four main functions in scientific communication.
Technological dynamics will clearly influence all these functions, not conceptually, but rather in the way these functions can be performed in the future. Recent technological developments allow novel ways of access to stored information, and this in turn affects the way information needs to be structured (see below). Technological dynamics can then lead to a new architecture of scientific communication, provided this architecture is accepted by the scientific community. This scientific community has in the past proven to be rather conservative in its acceptance of new technology, as is illustrated in the following quote (1): "resistance to new media stems from scientists' concern that the goals of the scientific system would not be fulfilled by these media".
The results of the survey show that researchers have rather well-defined expectations and desires with respect to acquisition needs. We can separate acquisition needs into two parts: demands with respect to the information proper and demands with respect to the process of acquiring information.
There are a number of different strategies to select, retrieve, and process information. The following main elements come to the fore:
Dissemination of information is seen to serve two main goals (9):
The research indicates that the following familiar issues are considered as remaining important or as becoming even more important:
In general, researchers have high expectations that more direct interaction using electronic facilities for informal and formal communication will increase feedback, and therefore effectiveness and efficiency of the research process.
The agents in the publishing chain may well focus on the following main aspects:
From the studies discussed in the first part of this paper, it is clear that scientific information is contextual in a double sense. Firstly, the type of information is different in different fields. A geological chart is a totally different object from a histogram of radioactive decay rates, though both can be displayed as large colour posters. Secondly, the usage of different types of information (including the cutting and clipping) is different. The emerging electronic tools already heavily influence the way scientists think and represent their thinking and research results. These two contextual levels will be expressed differently in different media.
Present-day digital information acquisition, storage, and handling techniques represent the apogee of the development which started with the possibility of using electrical devices for information handling. Given the flexibility of these techniques, we see that the reporting of scientific research and its technical expressions will become further entangled. All this is not new; in the early sixties, Marshall McLuhan's famous book "Understanding Media" (12) already heralded discussions on the deep influences new technologies have in shaping culture. Most of these discussions, however, were developed in departments of Mass Communication and Media Studies. Within the sciences, we have spent a lot of time and energy in developing these new tools, but we have hardly analysed the decisive role new technologies have in reporting our own results. In order to be able to understand, shape and use the new media properly, without losing the essential objectives of scientific communication discussed in section 4 of this paper, we have to dissect the various interacting levels and their components.
Within the context of our research programme which aims at defining and developing the employment of the new electronic media, we would like to discuss here two different but intertwined components:
In the following, we take the burgeoning development of sheer storage and transport (bandwidth) capacity as given. These exploding technologies provide the technological infrastructure for novel methods. Interesting as they are as objects of scientific research per se, they are not critical to the conceptual developments needed to address the issues in scientific information handling outlined in section 4.
Over the last years, we have already seen a most promising development towards a better structuring of information. The Standard Generalized Markup Language (SGML) and the HyperText Markup Language (HTML) are well-known and accepted working standards today (13). An approach quite different from just loading classical documents onto electronic storage media is research to reveal and structure the inherent modularity of information. Text, pictures, films, animations, and sound are all separate and independent ways of presenting information. Until now, technology has confined the bulk of information presentation to text with illustrations. At the moment we see an explosion of technical possibilities which make available, in addition to texts, all non-textual forms of information. The point is, however, that we do not need additions to texts, but integrated information systems (as already discussed in section 4).
Every kind of presentation of information has its own character and is a different expression of the reported object, phenomenon, or theory. If we really want to value the possibilities of including sound, colour, movies, etc., into regular scientific reporting, we have to analyse their specific rôles in the communication process (see section 4.3). Historically, communication has been confined to the printed journal, with the result that text is now the most important ingredient. Pictures started as illustrations of the text: as extensions. In the course of time, visual display of quantitative information became a craft in itself: the picture expresses more than a thousand words can do (14). In an electronic environment, the picture might become a similar prime source of information, whilst the text then becomes the explanation of the figure, in complete symmetry with the figure as an illustration of the text. In the same way, films, sounds, animations, etc., will become full expressions of scientific results in their own right. We will deal with this point further in the next section.
Within the Library Sciences, information retrieval (IR) research is already a well established field. In this contribution, we will not spend much time on these aspects. At the moment, it is sufficient to list the following fundamental problems IR research is facing (15):
Thus the research programme that we propose entails the development of domain-specific information representation structures which link scientific or related information concepts to the specific context in which they are used. One way to do this is to create a collection of flexible domain-specific thesauri. Even if terms in different thesauri within a collection are literally the same, they do not necessarily represent the same concept. Every term which is put into context in a specific domain is therefore a much more powerful tool. If we now allow the domains to overlap slightly, we will be able to generate a collection of thesauri which, like an atlas of road maps of different scale and layout, guides the searching researcher from one domain to another. A programme on overlapping thesauri in mathematics and physics starts soon. Here we try to develop a mathematical theory (16) to match overlapping terms (and their synonyms) extracted from a large and coherent set of articles within well-defined fields in mathematics and physics. The ultimate goal of this research programme is to develop techniques for the generation of an Atlas of contextual scientific index terms.
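The atlas idea can be made concrete with a small sketch. It is only an illustration under stated assumptions: the three toy thesauri, their entries, and the function names `shared_terms`, `same_concept` and `route` are hypothetical, and plain set intersection with a breadth-first search stands in for the mathematical matching theory referred to in (16), which is not reproduced here.

```python
from collections import deque

# Hypothetical toy thesauri: each domain maps a term to the concept it
# denotes in that domain. The entries are illustrative, not project data.
THESAURI = {
    "group theory": {
        "representation": "homomorphism from a group to invertible linear maps",
        "character": "trace of a representation",
    },
    "particle physics": {
        "representation": "multiplet of particle states under a symmetry group",
        "field": "quantity defined at every point of spacetime",
    },
    "algebra": {
        "field": "commutative ring in which every nonzero element is invertible",
        "ideal": "additive subgroup of a ring closed under ring multiplication",
    },
}


def shared_terms(domain_a, domain_b, thesauri=THESAURI):
    """Return the terms that occur literally in both domains, sorted."""
    return sorted(set(thesauri[domain_a]) & set(thesauri[domain_b]))


def same_concept(term, domain_a, domain_b, thesauri=THESAURI):
    """A literally identical term need not denote the same concept."""
    return thesauri[domain_a][term] == thesauri[domain_b][term]


def route(start, goal, thesauri=THESAURI):
    """Breadth-first search over domains linked by at least one shared term."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for other in thesauri:
            if other not in seen and shared_terms(path[-1], other, thesauri):
                seen.add(other)
                queue.append(path + [other])
    return None  # the two domains are not connected by overlapping terms
```

In this toy collection, "group theory" and "algebra" share no term, so `route("group theory", "algebra")` has to pass through "particle physics", much as a traveller moves between adjacent maps in an atlas. The shared term "field" links the last two domains although it denotes a different concept in each, which is exactly why a term needs its domain context.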
Following the requirements and expectations on storage, retrieval, etc., that resulted from our investigations reported at the beginning of this paper, and in order to appreciate the new possibilities and fit them into the framework of conscientious scientific discourse, we have to clarify and define the various characteristics of the different kinds of information.
The essay form of scientific documents is a typical result of the use of print on paper sheets. The portability, browsability and comprehensiveness of the paper product is the end of a centuries-long historical development process. In an electronic environment the characteristics might well change. All components of the paper product which are repetitive can be deleted as recurring objects, as they are always retrievable from the archive when needed for the integration of information by the reader. For example, it is customary (or even obligatory) to have an introduction which explains the authors' goals and serves to embed the reported work into a wider context. In an electronic environment, say a kind of hypertext structure, introductions might be reduced to pointers which link reported work to a review article in which the whole context is fully explained. Furthermore, repetitive reviews of one's own and other researchers' work can be reduced if the structure of the reporting has a more modular build-up instead of the present linear story-telling structure. The aim then is to structure texts in different types of modules, in such a way that each kind of module has its own information value. It is important to note that scientific articles are already well structured according to well-established rules and have familiar headings such as: Introduction, Methods, Data, Results, Discussion, Conclusion. However, this does not mean that all sentences dealing with, say, methods, can be found under that heading. Analysis shows that linear texts are generally much less structured than section headings suggest.
In our research programme we analyse a coherent collection of scientific papers in two different ways. Firstly, we analyse the different types of information contained in the documents (e.g., Goal, Embedding, Tools & Methods, Results, Data-handling, Apparatus, Discussions) as a first break-up of the linear structure. We take this set of types as basic modules and try to fit the original text therein. Of course such a simple linear set of modules is not sufficient. Within every module we make a further subdivision which relates this module to others. So, within the module "Apparatus" we can, e.g., distinguish the description of the apparatus used, the apparatus in context to other machines (the embedding of the experimental set-up), the apparatus in contrast to apparatus used by others (apparatus as part of the discussion). The main goal here is to reveal a possible modularity of information by analysing existing articles, in order to come to a heuristic model for a non-linear modular way of writing articles.
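A minimal sketch of such a modular article is given below. The module types are those listed in the text; everything else (the class names `Module` and `Article`, the `relations` table, and the example fragments) is a hypothetical illustration and not the model developed in the programme.

```python
from dataclasses import dataclass, field

# Module types as listed in the text.
MODULE_TYPES = ("Goal", "Embedding", "Tools & Methods", "Results",
                "Data-handling", "Apparatus", "Discussions")


@dataclass
class Module:
    kind: str   # one of MODULE_TYPES
    text: str   # the fragment that belongs to this module proper
    # Hypothetical subdivision: other module type -> fragment relating to it.
    relations: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.kind not in MODULE_TYPES:
            raise ValueError(f"unknown module type: {self.kind}")


@dataclass
class Article:
    title: str
    modules: list = field(default_factory=list)

    def of_kind(self, kind):
        """Return all modules of one type, e.g. every 'Apparatus' module."""
        return [m for m in self.modules if m.kind == kind]

    def related_to(self, kind):
        """Collect fragments in other modules that relate to module type `kind`."""
        return [(m.kind, m.relations[kind])
                for m in self.modules if kind in m.relations]


# Illustrative content only, following the "Apparatus" example in the text.
apparatus = Module(
    kind="Apparatus",
    text="Description of the apparatus used.",
    relations={
        "Embedding": "The apparatus in the context of other machines.",
        "Discussions": "The apparatus in contrast to apparatus used by others.",
    },
)
article = Article(title="Example article", modules=[
    Module(kind="Goal", text="What the authors set out to do."),
    apparatus,
])
```

A reader interested only in the discussion could then call `article.related_to("Discussions")` and be pointed to the relevant part of the "Apparatus" module without reading the article linearly.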
This part of the analysis is augmented by a linguistic study where the same set of articles is analysed as argumentative texts. According to well-established models of the Pragma-Dialectical approach in argumentation theory (17), we try to reveal the line of reasoning in a scientific article with the aim of using it as a tool for better structuring. The goal here is to develop a model for the relationship between the above-mentioned modules. This way, we can assign to each module not only a scientific tag, but also a rhetorical one, e.g., a module "Goal" has a completely different character than a module "Data-Handling". While in the "Goal" module the author can express all kinds of speculations freely, the value of the module "Data-Handling" demands very strict adherence to well-established standards and procedures. Integrating both approaches will result in a model for a modular presentation of scientific texts, where each module has a well-defined scientific as well as contextual character. The advantage of such a structuring is clear for the following modes of use:
Although text-based, mathematics represents a totally independent way of presenting results. The research in this field is now aimed mainly at defining an (SGML) grammar for mathematics which will enable manipulation of formulae and their use in calculation or symbolic manipulation packages. Simulations are again an independent way of communicating scientific ideas. Here readers must be able to change the model and/or the parameters in order to develop their own further research based on published research. The publication of computer programs, be it simulations or calculation packages, demands the development of its own standards and rules. Some experience is currently being gained in the management of program libraries, such as the Computer Program Library of the Queen's University of Belfast, which is integrated in the paper journal Computer Physics Communications.
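The requirement that a reader can change the parameters of a published simulation can be illustrated with a deliberately simple sketch. The model (exponential decay), the class `DecayParameters`, the function `simulate` and all numerical values are hypothetical and chosen only for illustration; they do not come from any published program library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DecayParameters:
    """Hypothetical parameter set as it might be published with an article."""
    initial_count: float = 1000.0
    half_life: float = 5.0   # arbitrary time units
    t_end: float = 20.0
    steps: int = 4


def simulate(params):
    """Return (time, remaining count) pairs for simple exponential decay."""
    dt = params.t_end / params.steps
    return [(i * dt,
             params.initial_count * 0.5 ** (i * dt / params.half_life))
            for i in range(params.steps + 1)]


# The parameter set as published; frozen, so it cannot be altered in place.
published = DecayParameters()

# The reader keeps the published model but explores a different half-life.
readers_variant = replace(published, half_life=10.0)
```

Keeping the published parameter set immutable and deriving the reader's variant from it is one possible design choice: the original result stays reproducible while the reader's own exploration is recorded as a separate object.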
The analysis of potential applications of non-textual material still has to start. Pictures will be more than just "illuminations" of the text. Pictures have their own intrinsic value. At first sight, we can already appreciate the great difference between a graph (in any dimension) and a colour picture of an aberration of an optical device. Interestingly, in the peer review process, no standards or rules are established to review pictures as independent objects. In the analyses of pictures and their rôles, the results of textual studies will be helpful. Important items are:
Apart from the items mentioned for still pictures the following extra features have to be tackled. Film or video (a sequence of still pictures) differs from animation. In the case of film and video we still have the difference between immutable and re-creational pictures. In the case of animations, however, we can also think of including a tool for the reader's adaptations and modelling.
The case of sound is special because digital sound is a very well developed field with an almost total manipulation capacity. Nevertheless, the use of sound as an independent way of presenting scientific results is hardly considered at present, except in speech research or general sound recording. The cognitive value of sound objects is so different from visible objects that a completely new field can be opened up.
In this paper, we first try to define the rôle of information in the science process and describe investigations where we try to explicate the communication needs of researchers in different fields. This information provides us with a backbone and yardstick for the development of new ways of organising the scientific communication process. It clearly points to a greater integration of various types of information as well as the capacity of the reader to manipulate this freely. This way, social, cognitive and intellectual demands can be met by the emerging technologies in a cross-fertilizing way.
This "user" research is a starting point for our collaboration in various university projects under the umbrella programme "Communication in Physics". In this programme, we investigate the opportunities modularity of scientific information offers, to make optimum use of electronic media. We also research sophisticated combinatorial techniques to develop an Atlas of overlapping controlled index term systems.
Although the programme "Communication in Physics" is focused on physics as main corpus of investigation, the results are expected to be applicable to other research domains as well. However, in line with our conclusions, specific cultural differences should then be taken into account.
Our main message in all this is that, in order to go beyond the "electronification" of the classical publishing process, we need an in-depth knowledge of the use, needs and presentation requirements and possibilities of scientific information.
The work described in this paper is a collaboration of the Faculties of Arts, and Mathematics, Informatics, Physics, and Astronomy (WINS) of the University of Amsterdam, the National Research Institute for Mathematics and Computer Science (CWI), and Elsevier Science. The work is partly financially supported by: Stichting Physica, Royal Academy of Science and Arts (KNAW), Royal Library (KB), Shell Research Amsterdam (KSLA), Elsevier Science.