2007-2 Lexique et écrits scientifiques
(The lexicon in scientific texts)
This issue is available online in full on the Cairn portal:
  • Agnès TUTIN (Grenoble 3)
    Autour du lexique et de la phraséologie des écrits scientifiques
    (Lexicon and phraseology of scientific texts)
    pp. 5-14
  • Peter BLUMENTHAL (Cologne, Germany)
    Sciences de l'Homme vs sciences exactes : combinatoire des mots dans la vulgarisation scientifique
    (Humanities vs exact sciences: word combinatorics in popular science writing)
    pp. 15-28

    Our contribution compares the lexical characteristics and combinatorial properties of nouns in two kinds of entries from the Encyclopædia Universalis (2005). We split the Encyclopædia into two large subcorpora, one for texts dealing with the humanities, the other for texts representing the natural sciences. We compared these subcorpora in terms of lexical overlap, lexically distinctive features and the combinatorial properties of certain nouns. Using our tools for determining the combinatorial profile of words and computing degrees of similarity between them, we found profound lexical and semantic differences between the two “academic cultures”.

    Représentation et caractérisation lexicale des sciences dans Wikipédia
    (Lexical representation and categorization of science in Wikipedia)
    pp. 29-44

    In less than six years, the free online encyclopaedia project Wikipedia has become one of the most prominent examples of commons-based peer production. The way the project works and evolves is now a matter of interest for academics eager to explore self-organized structures. Although many studies have examined the connections between contributors, the linguistic properties of Wikipedia's output remain almost unexplored. In this article, we focus on the way the sciences are represented within the project and examine the general and epistemic lexical characteristics of the articles by comparing a set of corpora extracted from Wikipedia’s category system.

  • Patrick DROUIN (Montréal, Canada)
    Identification automatique du lexique scientifique transdisciplinaire
    (Automated identification of a transdisciplinary scientific lexicon)
    pp. 45-64

    In this paper, we propose a first step towards describing the lexicon of scientific language by identifying a transdisciplinary scientific lexicon (TSL). The TSL is domain independent and forms a central lexical core shared by all domains; it is central to the argumentation found in scientific discourse and to the structuring of that discourse. To gather the transdisciplinary lexicon, we use natural language processing (NLP) tools and statistical techniques; central to our method is the calcul des spécificités (specificity measure) put forward by Lafon (1980). By using NLP tools, we want to verify whether existing lexical resources can be complemented quickly and simply, without much manual intervention. We conclude our study by exploring an observation made by Phal (1971) about collocations and scientific discourse. We focus here on V-N collocations revolving around nouns taken from our TSL.
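
As an illustration of the kind of specificity measure mentioned above, here is a minimal sketch that assumes the usual hypergeometric formulation: a word occurring F times in a corpus of T tokens is observed f times in a subcorpus of t tokens, and the score is the probability of a count at least as extreme as f. The function names and the toy figures are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_pmf(k, T, F, t):
    """P(X = k): chance that a word with F occurrences in a corpus of
    T tokens falls exactly k times in a subcorpus of t tokens."""
    return comb(F, k) * comb(T - F, t - k) / comb(T, t)


def specificity(f, T, F, t):
    """Return (sign, p) for an observed subcorpus frequency f.

    sign is "+" when the word is at or above its expected frequency and
    "-" when it is below; p is the hypergeometric tail probability of a
    count at least as extreme as f. A small p means a more specific word.
    """
    expected = F * t / T
    if f >= expected:
        # Upper tail: P(X >= f)
        upper = min(F, t)
        p = sum(hypergeom_pmf(k, T, F, t) for k in range(f, upper + 1))
        return "+", p
    # Lower tail: P(X <= f)
    lower = max(0, t - (T - F))
    p = sum(hypergeom_pmf(k, T, F, t) for k in range(lower, f + 1))
    return "-", p


# Toy figures (illustrative only): a 100-token corpus, a 20-token
# subcorpus, and a word occurring 10 times overall.
over_sign, over_p = specificity(f=5, T=100, F=10, t=20)
under_sign, under_p = specificity(f=0, T=100, F=10, t=20)
print(over_sign, over_p)
print(under_sign, under_p)
```

In practice, words would be ranked by this probability within a scientific subcorpus measured against a reference corpus; any threshold used to retain candidates is a separate choice that this sketch does not make.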

  • Averil COXHEAD & David HIRSH (Massey, New Zealand / Sydney, Australia)
    A pilot science-specific word list
    pp. 65-78

    The coverage of the General Service List (GSL) (West, 1953) and Academic Word List (AWL) (Coxhead, 2000) over a science-based written academic English corpus of approximately 875,000 words is 80%, compared with three corpora of the same size from arts (86.7%), commerce (88.8%), and law (88.5%) (Coxhead, 1998). The AWL coverage of 9.1% over science is similar to its coverage over arts and law, but the coverage of the GSL over science is 65%, which is 10% lower than its coverage over law, 8% lower than over arts, and 6% lower than over commerce. One way to address this gap in coverage is to conduct a corpus-based study of the vocabulary in academic science texts to establish whether there is a science-specific vocabulary consisting of words outside the GSL and AWL. Hirsh (2004) found that academic subject areas with the highest proportion of technical vocabulary make use of the lowest proportion of general service vocabulary. This pilot study found 318 such word families, with coverage of approximately 4% over a science-specific corpus of 1.5 million running words, compared with coverage of well under 1% over the arts, commerce, and law corpora mentioned above and over a 3,500,000-word corpus of fiction.
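
The coverage figures above are the percentage of running words in a corpus that belong to a given word list. The following minimal sketch shows that calculation under a simplifying assumption: it matches individual word forms, whereas the GSL and AWL are organised by word families. The tokenizer, the function names and the tiny word lists are hypothetical stand-ins, not the actual lists.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic running words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, word_list):
    """Percentage of running words (tokens) found in word_list."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in word_list)
    return 100.0 * hits / len(tokens)


# Toy data (illustrative only; not the GSL, the AWL or the pilot list).
sample = "The cell divides and the enzyme binds the substrate"
general_list = {"the", "and"}
science_list = {"cell", "enzyme", "substrate"}

tokens = tokenize(sample)
general_cov = coverage(tokens, general_list)
science_cov = coverage(tokens, science_list)
print(f"general: {general_cov:.1f}%  science: {science_cov:.1f}%")
```

Handling word families would require mapping each inflected or derived form to its headword before the lookup, a step left out of this sketch.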

  • Mojca PECMAN (Paris-VII)
    Approche onomasiologique de la langue scientifique générale
    (Onomasiological approach to general scientific language)
    pp. 79-96

    This study aims at developing a method for investigating the invariants across different scientific discourses. The paper illustrates a model analysis for processing general scientific vocabulary by applying three complementary approaches: cross-disciplinary, onomasiological and phraseological. The combination of these approaches allows us to bypass traditional methods generally based on a lexical analysis of a single domain. We first define the borderline between the specialised scientific language features and those features that are common to various scientific discourses, then we present a corpus-based study of a combinatorial profile of lexical resources used for formulating general scientific ideas. This method thus provides a framework for the modelling of general scientific language resources.

  • Agnès SÁNDOR (Xerox Research Centre Europe, Meylan)
    Modeling metadiscourse conveying the author's rhetorical strategy in biomedical research abstracts
    pp. 97-108

    The importance of the role of metadiscourse is increasingly recognized for natural language processing applications like text-mining and information extraction. Thus the detection of metadiscourse has recently been identified as a task in several domains, including the processing of scientific literature. We have developed a natural language processing system that detects and highlights in biomedical research abstracts a particular kind of metadiscourse that conveys the author’s rhetorical strategy. In this paper we describe the model of rhetorical metadiscourse underlying the system. Our model, combining and extending previous discourse analysis methods and models, is based on both conceptual and syntactic analyses of metadiscourse. We argue that this model is effective for automatic processing.

  • Françoise BOCH, Francis GROSSMANN & Fanny RINCK (Grenoble 3 / Université Grenoble-Alpes)
    Conformément à nos attentes... : les marqueurs de convergence/divergence dans l'article de linguistique
    (As expected...: markers of convergence/divergence in linguistic articles)
    pp. 109-122

    This article presents a qualitative analysis of a corpus of published linguistics articles. Two types of lexical markers were studied: those signaling convergence (e.g. conformément à nos attentes, "as expected") and those signaling divergence (e.g. contre toute attente, "against all expectations") in relation to the expectations of the writer-researcher or of the larger scientific community. The objective of this study is to examine to what extent these markers reveal the way the writer-researcher constructs and validates knowledge in their article. The corpus analysis found several functions for these markers, thus highlighting a wide variety of possible expectations. As a result, we argue that such an analysis must not only see these markers as linked to other terms but must also take into account their relation to the entire text. Therefore, markers of convergence and divergence may not necessarily be the locus of an implied scientific reasoning. However, they potentially express various epistemological styles.

  • Dirk SIEPMANN (Osnabrück, Germany)
    Les marqueurs de discours polylexicaux en français scientifique
    (Multi-word discourse markers in scientific French)
    pp. 123-136

    Unlike two-word collocations, multi-word discourse markers have until recently suffered comparative neglect in lexicology and lexicography. The present article aims to remedy this deficiency for French. After giving an operational definition of the lexical items in question, the author proceeds to classify them by functional criteria. He concludes his article with a detailed survey of suggestors. The great frequency of this type of marker would seem to belie the assumption that academic language is free of subjectivity.

