Contribution: ICSU Press - UNESCO Expert Conference on Electronic Publishing in Science. Unesco House, Paris, France, 19-23 February 1996

UNDERSTANDING AND SHAPING SCIENTIFIC INFORMATION TRANSFER

Joost G. Kircz
Elsevier Science
Sara Burgerhartstraat 25, 1055 KV Amsterdam
and
Communication in Physics Project, WINS Faculty, University of Amsterdam.
Valckenierstraat 65, 1018 XE Amsterdam

Hans E. Roosendaal
Elsevier Science
Sara Burgerhartstraat 25, 1055 KV Amsterdam

1. Introduction

In this presentation, we report a twofold approach to the issues and opportunities modern electronic media pose for scientific information. The first part of this paper addresses a number of elements in the process of information (needs, transfer, and disclosure) in academic environments and discusses results of in-depth interviews with a number of scientists from various fields. In the second part, we discuss the changes electronic publishing will induce in scientific information handling. We try to analyse the different cognitive components leading to a variety of ways in which information is presented, and we briefly discuss recent research towards a better understanding of the fundamental changes electronic publishing will introduce.

2. Process and needs

2.1 The science process

The main issue to be addressed in the context of electronic publishing is: "How can it support and enhance the science process?" Communication is the essence of science, and more particularly, it is the engine of the whole science process (1, 2). The scientific communication process is an object of investigation and it provides data for research programmes in a variety of science studies (1, 3, 4, 5, 6, 7).

It would go well beyond the scope of this contribution to describe the science process even in some detail. We will assume here that the science process consists of a system of related, mostly competing research programmes (4, 5). On this basis a number of different stages in the research process can be distinguished, from conceptualisation of problems, to theory, to hypotheses, to predictions and testing, and finally to interpretation of research outcomes (5). While we realize that there is no consensus on the above, these different stages lead to a number of main communication needs as experienced by researchers in different fields (see below).

This structure of the science process has a number of social consequences, which are discipline dependent. Most important are common standards, resulting in specific rules and ethics. Furthermore, each scientist has to establish his own position, and this is mainly done through recognition of his contributions to science in the research process. These contributions can be informal and formal and are to a large extent manifested in publications (6, 7).

2.2 Communication needs

Generally, the communication needs result from research needs in the different stages of the science process. Analysis (8) indicates the following needs:

and also:

2.3 Developments

It is inherent to science, and to the science process, that both are in constant flux or growth. In this contribution, two aspects of this constant growth are worth mentioning:

3. Research

In the previous section, we have formulated a number of theses on the communication process as an important engine for the science process. Communication needs are seen to be related to, and have different impact on, the different stages in every research process. The main question then is: how can we increase the effectiveness and efficiency of the communication process for the individual researcher? What are the main elements, what are the main expectations and desires a researcher has? For our research we identify as key issues:

3.1 Research design

The above-mentioned key issues are being addressed in field research comprising a number of in-depth interviews with individual researchers.

In our heuristic model there is a tendency towards an open infrastructure and an integrated system. This model is being investigated on a stratified sample of individual researchers in the following scientific disciplines:

The objective of the research is to identify the expectations and desires researchers have with respect to the above themes. A number of pertinent themes are probed in a structured way using so-called provocative statements. Opinions of researchers are then further probed, using expert interviewers from the publishing departments of Elsevier Science. In that way, we allow hypotheses and other issues to be put to the test, and to be criticized or falsified. Motives for certain opinions, expectations and desires can then be identified. A full description of the research method is given in (11). Two examples of provocative statements, and some results for the mentioned disciplines, are given in Table 1.

Table 1

When looking for specific information, researchers are not interested in the quality of the refereeing process

Discipline               Opinion    Expectations on   Desires on
                                    infrastructure    infrastructure
All                      2.4        1.9               2.0
Clinical medicine        2.3 (0)    1.7 (0)           2.1 (0)
Neuroscience             2.1 (0)    2.1 (0)           2.5 (0)
Organic chemistry        1.9 (--)   1.6 (-)           1.5 (-)
Mechanical engineering   2.9 (+)    1.9 (0)           1.8 (0)
Higher energy phys.      2.7 (+)    2.5 (+)           2.2 (0)

Researchers will more and more use on-line information services that select sources on the basis of their own personal profile in order to fulfil their own specific information needs

Discipline               Opinion    Expectations on   Desires on        Expectations on     Desires on
                                    infrastructure    infrastructure    information needs   information needs
All                      4.2        4.4               4.3               4.4                 4.3
Clinical medicine        4.7 (++)   4.6 (0)           4.7 (++)          4.6 (0)             4.7 (++)
Neuroscience             4.5 (++)   4.8 (++)          4.6 (0)           4.8 (++)            4.6 (0)
Organic chemistry        4.2 (0)    4.2 (0)           3.9 (0)           4.2 (0)             3.8 (-)
Mechanical engineering   3.8 (-)    4.6 (0)           4.4 (0)           4.6 (0)             4.5 (0)
Higher energy phys.      3.7 (--)   4.2 (0)           4.0 (0)           4.2 (0)             4.0 (0)

Numbers indicate agreement with the statement on a scale from 1 to 5 (1 is strong disagreement, 5 is strong agreement). Symbols in brackets denote the difference between the discipline and the average, ranging from significantly above (++) to significantly below (--).

4. Results and Conclusions

For this contribution, we restrict ourselves to summarizing the main overall results and conclusions from our research. First we discuss the results in terms of the four main functions in scientific communication (section 4.1). Then we discuss more specific needs behind these functions in detail (section 4.2).

4.1 Main functions

It is useful to distinguish four main functions in scientific communication.

Technological dynamics will clearly influence all these functions, not conceptually, however, but much more in the way these functions can be performed in the future. Recent technological developments allow novel ways of access to stored information, and this in turn affects the way information needs to be structured (see below). Technological dynamics can then lead to a new architecture of scientific communication, provided this architecture is accepted by the scientific community. This scientific community has in the past proven to be rather conservative in its acceptance of new technology, as is illustrated in the following quote (1): "resistance to new media stems from scientists' concern that the goals of the scientific system would not be fulfilled by these media".

4.2 Acquisition needs

The results of the survey show that researchers have rather well-defined expectations and desires with respect to acquisition needs. We can separate acquisition needs into two parts: demands with respect to the information proper and demands with respect to the process of acquiring information.

4.2.1 Information needs

4.2.2 Process of acquisition

There are a number of different strategies to select, retrieve, and process information. The following main elements come to the fore:

4.3 Dissemination needs

Dissemination of information is seen to serve two main goals (9):

The research indicates that the following familiar issues are considered to remain important or to become even more important:

In general, researchers have high expectations that more direct interaction using electronic facilities for informal and formal communication will increase feedback, and therefore effectiveness and efficiency of the research process.

4.4 Summary of our first results

The agents in the publishing chain may well focus on the following main aspects:

5. Design for the future

5.1 Introductory remarks

From the studies discussed in the first part of this paper, it is clear that scientific information is contextual in a double sense. Firstly, the type of information is different in different fields. A geological chart is a totally different object from a histogram of radioactive decay rates, though both can be displayed as large colour posters. Secondly, the usage of different types of information (including the cutting and clipping) is different. The emerging electronic tools already heavily influence the way scientists think and represent their thinking and research results. These two contextual levels will be expressed differently in different media.

Present-day digital information acquisition, storage, and handling techniques represent the apogee of the development which started with the possibility of using electrical devices for information handling. Given the flexibility of these techniques, we see that reporting of scientific research and its technical expressions will be further entangled. All this is not new; in the early sixties, Marshall McLuhan's famous book "Understanding Media" (12) already heralded discussions on the deep influences new technologies have in shaping culture. Most of these discussions, however, were developed in departments of Mass Communication and Media Studies. Within the sciences, we spent a lot of time and energy in developing these new tools but we hardly analysed the decisive role new technologies have in reporting our own results. In order to be able to understand, shape and use the new media properly, without losing the essential objectives of scientific communication discussed in section 4 of this paper, we have to dissect the various interacting levels and their components.

5.2 Preparing a research programme

Within the context of our research programme which aims at defining and developing the employment of the new electronic media, we would like to discuss here two different but intertwined components:

In the following, we take the burgeoning development of sheer storage and transport (bandwidth) capacity as given. These exploding technologies provide the technological infrastructure for novel methods. Interesting as they are as objects of scientific research per se, they are, however, not critical to the conceptual developments needed to address issues in scientific information handling as outlined in section 4.

5.2.1 Presenting and storing information

Over the last years, we already saw a most promising development towards a better structuring of information. The Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML) are well known and accepted working standards today (13). An approach quite different from just loading classical documents on electronic storage media leads to research to reveal and structure the inherent modularity of information. Text, pictures, films, animations, and sound are all separate and independent ways of presenting information. Until now, technology has confined the bulk of information presentation to text with illustrations. At the moment we see an explosion of technical possibilities which make available, in addition to texts, all non-textual forms of information. The point is, however, that we do not need additions to texts, but that we need integrated information systems (as already discussed in section 4).

Every kind of presentation of information has its own character and is a different expression of the reported object, phenomenon, or theory. If we really want to value the possibilities of including sound, colour, movies, etc., into regular scientific reporting, we have to analyse their specific rôles in the communication process (see section 4.3). Historically, communication has been confined to the printed journal, with the result that text is now the most important ingredient. Pictures started as illustrations of the text: as extensions. In the course of time, visual display of quantitative information became a craft in itself: the picture expresses more than a thousand words can do (14). In an electronic environment, the picture might become a similar prime source of information, whilst the text then becomes the explanation of the figure, in complete symmetry with the figure as an illustration of the text. In the same way, films, sounds, animations, etc., will become full expressions of scientific results in their own right. We will deal with this point further in the next section.

5.2.2 Disclosure

Within the Library Sciences, information retrieval (IR) research is already a well established field. In this contribution, we will not spend much time on these aspects. At the moment, it is sufficient to list the following fundamental problems IR research is facing (15):

  1. In systems where we use the full text of articles, so-called free text searching systems, the search possibilities are confined to the words provided by the author. The manipulable information is restricted to the work as provided by the author. As already emphasised above, research, and hence the author's language, is very contextual, full of jargon and very much the expression of more or less closed social environments. For that reason free text searching systems are very difficult to handle for readers who are not conversant with the jargon of the particular field. These might be readers from other (adjacent) fields, but also readers within the field who read from another perspective, be it geographical (American scientists reading Russian science) or temporal (today's scientists reading old work in their own field). From another point of view, one can say that free text searching approaches the problem from the author's point of view.
  2. In systems with controlled keyword lists and thesauri (externally added keys), we are confronted with the near impossibility of mapping content onto a fixed list of concepts. Whilst in the case of free text systems we are able to maximally manipulate the texts as given, in the case of controlled keywords we reduce (or coalesce) language into fixed notions. However, to be useful, these notions need to be stable, at least for some time. Thus controlled keywords and thesauri always lag behind the research language used. It is important to note that, in contrast to free text terms, controlled terms express in a way the reader's point of view. Unfortunately, articles are now indexed only once, and retrospective indexing of collections of articles, in order to relate old work to new concepts and vice versa, never happens.
  3. In cases where we use references to disclose works that we need, we take the list of references as transmittal indicators. Not the works we have accessed, but the cited works are wanted. The problem is that the reason a reference is given by the citing author is not always clear. Is it just to show that the author knows his field, is it to flatter a possible referee, is the reference to the competition deliberately left out, etc.? What is needed is a better link between the cited work and the context in which the citing author deems this reference useful. Fortunately, due to the speed-up of the publication process by electronic means, the time-lag inherent in the use of references as disclosure tools will be reduced. The use of references as disclosure tools emphasizes the context, or embedding, of the wanted information.
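The contrast between the first two problems can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the two-article collection, the class name Article, and the functions free_text_search and keyword_search are hypothetical and do not describe any existing retrieval system.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    text: str                                   # the author's own words
    keywords: set = field(default_factory=set)  # controlled terms added by an indexer


def free_text_search(articles, query):
    """Match only on words the author actually used (author's point of view)."""
    q = query.lower()
    return [a.title for a in articles if q in a.text.lower().split()]


def keyword_search(articles, term):
    """Match on externally assigned controlled terms (reader's point of view)."""
    return [a.title for a in articles if term in a.keywords]


# Hypothetical collection: both articles concern the same concept,
# but only the first one uses the word "decay" in its text.
ARTICLES = [
    Article("A", "we measure the decay of the muon in a storage ring",
            {"particle decay"}),
    Article("B", "lifetime measurements of unstable leptons",
            {"particle decay"}),
]

free_hits = free_text_search(ARTICLES, "decay")
controlled_hits = keyword_search(ARTICLES, "particle decay")
```

In this toy collection the free text query finds only article A, because article B expresses the same concept in different jargon, whereas the controlled term retrieves both. The price, as noted in point 2, is that the controlled term has to exist in the fixed list before anybody can assign it.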

Thus the research programme that we propose entails the development of domain-specific information representation structures which link scientific or related information concepts to the specific context in which they are used. One way to do this is to create a collection of flexible domain-specific thesauri. Even if terms in different thesauri within a collection are literally the same, they do not necessarily represent the same concept. Every term which is put into context in a specific domain is therefore a much more powerful tool. If we now allow the domains to overlap slightly, we will be able to generate a collection of thesauri which, like an atlas of road maps of different scale and layout, guide the searching researcher from one domain to another. A programme on overlapping thesauri in mathematics and physics starts soon. Here we try to develop a mathematical theory (16) to match overlapping terms (and their synonyms) extracted from a large and coherent set of articles within well-defined fields in mathematics and physics. The ultimate goal of this research programme is to develop techniques for the generation of an Atlas of contextual scientific index terms.
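The road-map metaphor can be made concrete with a small Python sketch. It is not the mathematical theory of (16); it uses plain set intersection and a breadth-first search, and the three thesauri, their terms, and the function names shared_terms and route are hypothetical, chosen only for illustration.

```python
from collections import deque

# Hypothetical domain-specific thesauri: domain -> {term: contextual gloss}.
# The same literal term may carry a different concept in each domain.
THESAURI = {
    "group theory": {
        "representation": "homomorphism into a matrix group",
        "character": "trace of a representation",
        "lattice": "discrete subgroup of a vector space",
    },
    "solid state physics": {
        "lattice": "periodic arrangement of atoms",
        "band": "range of allowed electron energies",
        "phonon": "quantised lattice vibration",
    },
    "acoustics": {
        "phonon": "quantum of sound",
        "band": "range of frequencies",
        "resonance": "amplified response at a natural frequency",
    },
}


def shared_terms(a, b, thesauri=THESAURI):
    """Terms that are literally the same in two domain thesauri."""
    return set(thesauri[a]) & set(thesauri[b])


def route(start, goal, thesauri=THESAURI):
    """Breadth-first search over domains linked by at least one shared term."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for other in thesauri:
            if other not in seen and shared_terms(path[-1], other, thesauri):
                seen.add(other)
                queue.append(path + [other])
    return None  # no chain of overlapping thesauri connects the two domains


path = route("group theory", "acoustics")
```

Here "group theory" and "acoustics" share no term, but both overlap with "solid state physics", so the search is guided through that intermediate map. The glosses show why literal overlap alone is not enough: "lattice" links two domains while denoting different concepts in each, which is exactly the matching problem the programme sets out to address.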

6. First steps to a new architecture

Following the requirements and expectations on storage, retrieval, etc., that resulted from the investigations reported in the beginning of this paper, and in order to appreciate the new possibilities and fit them into the framework of conscientious scientific discourse, we have to clarify and define the various characteristics of the different kinds of information.

6.1. Texts

The essay form of scientific documents is a typical result of the use of print on paper sheets. The portability, browsability and comprehensiveness of the paper product are the end of a century-long historical development process. In an electronic environment the characteristics might well change. All components of the paper product which are repetitive can be deleted as recurring objects, as they are always retrievable from the archive when needed for the integration of information by the reader. For example, it is customary (or even obligatory) to have an introduction which explains the authors' goals and serves to embed the reported work into a wider context. In an electronic environment, say a kind of hypertext structure, introductions might be reduced to pointers which link reported work to a review article in which the whole context is fully explained. Furthermore, repetitive reviews of one's own and other researchers' work can be reduced if the structure of the reporting has a more modular build-up instead of the present linear story-telling structure. The aim then is to structure texts in different types of modules, in such a way that each kind of module has its own information value. It is important to note that scientific articles are already well structured according to well-established rules and have familiar headings such as: Introduction, Methods, Data, Results, Discussion, Conclusion. However, this does not mean that all sentences dealing with, say, methods, can be found under that heading. Analysis shows that linear texts are generally much less structured than section headings suggest.

In our research programme we analyse a coherent collection of scientific papers in two different ways. Firstly, we analyse the different types of information contained in the documents (e.g., Goal, Embedding, Tools & Methods, Results, Data-handling, Apparatus, Discussions) as a first break-up of the linear structure. We take this set of types as basic modules and try to fit the original text therein. Of course such a simple linear set of modules is not sufficient. Within every module we make a further subdivision which relates this module to others. So, within the module "Apparatus" we can, e.g., distinguish the description of the apparatus used, the apparatus in context to other machines (the embedding of the experimental set-up), the apparatus in contrast to apparatus used by others (apparatus as part of the discussion). The main goal here is to reveal a possible modularity of information by analysing existing articles, in order to come to a heuristic model for a non-linear modular way of writing articles.
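A possible data structure for such modules is sketched below in Python. It is a minimal illustration, assuming that each module carries one of the information types listed above plus optional sub-parts that relate it to other module types; the class name Module, the field relates_to, the helper modules_touching, and the example texts are hypothetical and are not the model under development in the programme.

```python
from dataclasses import dataclass, field

# Information types taken from the list in the text above.
MODULE_TYPES = {
    "Goal", "Embedding", "Tools & Methods", "Results",
    "Data-handling", "Apparatus", "Discussions",
}


@dataclass
class Module:
    kind: str                                       # one of MODULE_TYPES
    text: str                                       # the module's own content
    relates_to: dict = field(default_factory=dict)  # other kind -> sub-part text

    def __post_init__(self):
        # Reject module kinds and relations outside the agreed set of types.
        unknown = ({self.kind} | set(self.relates_to)) - MODULE_TYPES
        if unknown:
            raise ValueError(f"unknown module type(s): {sorted(unknown)}")


def modules_touching(article, kind):
    """Kinds of all modules that are of, or relate to, the given type."""
    return [m.kind for m in article if m.kind == kind or kind in m.relates_to]


# Hypothetical article, stored as a set of modules instead of a linear essay.
ARTICLE = [
    Module("Goal", "Determine the lifetime of the particle."),
    Module(
        "Apparatus",
        "Description of the detector used.",
        relates_to={
            "Embedding": "The detector in the context of other machines.",
            "Discussions": "The detector in contrast to those used by others.",
        },
    ),
    Module("Discussions", "Comparison with earlier measurements."),
]

discussion_related = modules_touching(ARTICLE, "Discussions")
```

A reader interested only in the discussion would then be pointed to the "Discussions" module and also to the part of the "Apparatus" module that contributes to it, something section headings in a linear text cannot express.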

This part of the analysis is augmented by a linguistic study where the same set of articles is analysed as argumentative texts. According to well-established models of the Pragma-Dialectical approach in argumentation theory (17), we try to reveal the line of reasoning in a scientific article with the aim of using it as a tool for better structuring. The goal here is to develop a model for the relationship between the above-mentioned modules. This way, we can assign to each module not only a scientific tag, but also a rhetorical one, e.g., a module "Goal" has a completely different character from a module "Data-Handling". While in the "Goal" module the author can express all kinds of speculations freely, the value of the module "Data-Handling" demands very strict adherence to well-established standards and procedures. Integrating both approaches will result in a model for a modular presentation of scientific texts, where each module has a well-defined scientific as well as contextual character. The advantage of such a structuring is clear for the following modes of use:

6.2. Active mathematics and simulations

Although text-based, mathematics represents a totally independent way of representing results. The research in this field is now aimed mainly at defining an (SGML) grammar for mathematics which will enable manipulation of formulae and their use in calculation or symbolic manipulation packages. Simulations are again an independent way of communicating scientific ideas. Here the reader has to have the possibility to change the model and/or the parameters in order to develop one's own further research based on published research. The publication of computer programs, be it simulations or calculation packages, demands the development of its own standards and rules. Some experience has already been gained in the management of program libraries, such as the Computer Program Library at the Queen's University of Belfast, which is integrated in the paper journal Computer Physics Communications.

6.3. Still Pictures

The analysis of potential applications of non-textual material still has to start. Pictures will be more than just "illuminations" of the text. Pictures have their own intrinsic value. At first sight, we can already appreciate the great difference between a graph (in any dimension) and a colour picture of an aberration of an optical device. Interestingly, in the peer review process, no standards or rules are established to review pictures as independent objects. In the analyses of pictures and their rôles, the results of textual studies will be helpful. Important items are:

6.4. Motion Pictures

Apart from the items mentioned for still pictures, the following extra features have to be tackled. Film or video (a sequence of still pictures) differs from animation. In the case of film and video we still have the difference between immutable and re-creational pictures. In the case of animations, however, we can also think of including a tool for the reader's adaptations and modelling.

6.5. Sound

The case of sound is special because digital sound is a very well developed field with an almost total manipulation capacity. Nevertheless, the use of sound as an independent way of presenting scientific results is hardly considered at present, except in speech research or general sound recording. The cognitive value of sound objects is so different from visible objects that a completely new field can be opened up.

7. General conclusions

In this paper, we first try to define the rôle of information in the science process and describe investigations where we try to explicate the communication needs of researchers in different fields. This information provides us with a backbone and yardstick for the development of new ways of organising the scientific communication process. It clearly points to a greater integration of various types of information as well as the capacity of the reader to manipulate this freely. This way, social, cognitive and intellectual demands can be met by the emerging technologies in a cross-fertilizing way.

This "user" research is a starting point for our collaboration in various university projects under the umbrella programme "Communication in Physics". In this programme, we investigate the opportunities modularity of scientific information offers, to make optimum use of electronic media. We also research sophisticated combinatorial techniques to develop an Atlas of overlapping controlled index term systems.

Although the programme "Communication in Physics" is focused on physics as main corpus of investigation, the results are expected to be applicable to other research domains as well. However, in line with our conclusions, specific cultural differences should then be taken into account.

Our main message in all this is that, in order to go beyond the "electronification" of the classical publishing process, we need to have an in-depth knowledge of the use, needs and presentation requirements and possibilities of scientific information.

Acknowledgements

The work described in this paper is a collaboration of the Faculties of Arts, and Mathematics, Informatics, Physics, and Astronomy (WINS) of the University of Amsterdam, the National Research Institute for Mathematics and Computer Science (CWI), and Elsevier Science. The work is partly financially supported by: Stichting Physica, Royal Academy of Science and Arts (KNAW), Royal Library (KB), Shell Research Amsterdam (KSLA), Elsevier Science.

References

  1. W.D. Garvey. Communication: The essence of science. Pergamon Press, Oxford 1979.
  2. H.E. Roosendaal and A.P. de Ruiter. The Journal at the crossroads of developments in scientific information and information technology. Paper presented at Conference in Helsinki 1990.
  3. T.S. Kuhn. The structure of scientific revolutions, 2nd enlarged edition. Chicago Univ. Press. 1970.
    I. Lakatos. Falsification and the methodology of scientific research programmes. In: I.Lakatos and A. Musgrave. Criticism and the growth of knowledge. Cambridge Univ. Press. 1970. p.135.
    B. Gholson, W.R.Shadish Jr., R.A. Niemeyer, and A.C. Houts (eds.). The psychology of science. Cambridge Univ. Press. 1989.
    S. Jasanoff, G.E. Markle, J.C. Petersen, and T. Pinch (eds.). Handbook of science and technology studies. Sage Publ. London 1995.
  4. I. Lakatos. The methodology of scientific research programmes. In: J. Worrall and G. Currie (eds) Philosophical papers, vol 1, Cambridge Univ. Press. 1978.
  5. G. Panhuijsen and R. van Hezewijk. To be published Univ. of Utrecht.
  6. A.G. Gross. The rhetoric of science. Harvard Univ. Press, Cambridge 1990.
  7. R. Merton. The sociology of science: theoretical and empirical investigations. Univ. of Chicago Press. 1973.
  8. F. van Rooy. The rôle of electronic media in scientific communication. Thesis University of Utrecht 1995.
  9. D. Schauder. Electronic publishing of professional articles: Attitudes of academics and the implications for the scholarly communication industry. JASIS vol.45(2), 73-100.
  10. See for example:
    J. Maddox. Nature. Vol.376. p.11, p.113, and p.385
    C. Bell. Nature. Vol. 376. p.375
  11. P.A.Th.M. Geurts and H.E. Roosendaal. Mixed market research for strategic management. To be published.
  12. Marshall McLuhan. Understanding Media: The Extensions of Man. Routledge & Kegan Paul Ltd. London, 1964.
  13. For a good overview of the developments towards the actual situation see: J. André, R. Furuta, V. Quint (eds). Structured Documents. Cambridge University Press, 1989.
  14. Edward R. Tufte. The Visual Display of Quantitative Information, Graphic Press, Cheshire, Conn. 1983
    Edward R. Tufte. Envisioning Information, Graphic Press, Cheshire, Conn. 1990.
  15. For a critique see:
    Joost G. Kircz. Rhetorical structure of scientific articles: The case for argumentational analysis in information retrieval. Jnl. of Documentation, 47(4), 1991, pp.354-372.
    D.C. Blair. Language and representation in information retrieval. Elsevier, 1990.
    For a recent collection of overviews see:
    "Special Topic Issue: Evaluation of Information Retrieval Systems". Edited by Jean M. Tague-Sutcliffe. JASIS, vol.47(1), January 1996.
  16. M. Hazewinkel. Tree-tree matrices and other combinatorial problems from taxonomy. CWI report AM-R9507, April 1995.
  17. Frans H. van Eemeren, Rob Grootendorst, and Tjark Kruiger. Handbook of Argumentation Theory: A critical survey of classical backgrounds and modern studies. Floris Publications, Dordrecht, 1987.
    Frans H. van Eemeren and Rob Grootendorst (eds). Studies in Pragma-Dialectics. Sic Sat, Amsterdam, 1994.