2007-1 Corpus : état des lieux et perspectives
(Corpora: inventory and perspectives)
  • Henri BÉJOINT (Lyon 2)
    Informatique et lexicographie de corpus : les nouveaux dictionnaires
    (Computer science and corpus lexicography: the new dictionaries)
    The dictionary evolved from medieval glosses that explained fragments of discourse in their contexts. Those fragments were later collected, then classified and reduced to their simplest forms, i.e. words. The most important aspect of that evolution from the gloss to the dictionary is that the fragment to be explained was decontextualized, extracted from discourse. The main objective of the dictionary is to give an image of the system. It is now possible to improve the dictionary in its role as a tool for explaining discourse. It cannot provide explanations adapted to every single context, but it can give the user a huge quantity of discourse and provide explanations that are more closely adapted to every occurrence or type of occurrence. Lexicographers would be well advised to investigate those new possibilities.

  • Christian BOITET (Grenoble 1)
    Corpus pour la TA : types, tailles et problèmes associés, selon leur usage et le type de système
    (Corpora for machine translation: types, sizes and associated problems, according to their use and the type of system)
    It is important to realise that human translation is difficult and diverse, and that automation is needed not only by end users, but also by translators and interpreters. Automation itself also comes in many forms. After briefly describing computer tools for translators, we will concentrate on the linguistic and computational approaches to the automation of translation proper. This survey will yield an array of criteria for categorizing existing CAT systems, with brief examples of the state of the art. Finally, we present prospects for future research, development, and dissemination.

  • Anne CONDAMINES (Toulouse 2-Le Mirail / CNRS)
    L'interprétation en sémantique de corpus : le cas de la construction de terminologies
    (The role of interpretation in corpus semantics: building a terminology)
    The aim of this paper is to show the need for a double point of reference in the semantic analysis of corpus data. The first reference point lies in the situation in which texts are produced, and the second in the interpretation of the texts. In both cases, the author suggests using the notion of 'genre' (textual 'genre' and interpretative 'genre') in order to classify and categorize situations. The issue is exemplified by the problem of building terminologies according to a particular interpretative 'genre'. The paper shows how textual 'genre' influences the functioning of conceptual relation patterns (e.g. the preposition avec is used to spot a meronymic relation). It demonstrates that this kind of analysis may help to refine descriptions initially made by introspection.

  • Marie-Laure ELALOUF & Catherine BORÉ (Cergy-Pontoise / IUFM de Versailles)
    Construction et exploitation de corpus d'écrits scolaires
    (Building and exploiting corpora of school writing)
    The first part of this article explains which methodological issues need to be examined in order to establish and transcribe a large corpus of texts written by pupils, along with their school context. The second part sets out the lines of epistemological questioning that led to a second research project, i.e. questions about how to define types of school writing, a corpus and a context, and about the necessary links between those three elements. A variety of software programs were used to analyse corpora that did not conform to orthographic and stylistic standards. Such use seems feasible when combined with qualitative analysis.

  • Martine ADDA-DECKER (Paris)
    Corpus pour la transcription automatique de l'oral
    (Corpora for automatic speech transcription)
    This contribution gives an overview of automatic speech recognition research, highlighting the need for corpus development. As recognition systems rely largely on statistical approaches, large amounts of both spoken and written corpora are required. In order to bridge the gap between written and spoken language, speech transcripts need to be produced manually using appropriate tools. Methods and resources accumulated over the years now make it possible not only to tackle genuine oral genres, but also to envision large-scale corpus studies that increase our knowledge of spoken language and improve automatic processing.

  • Olivier BAUDE (Orléans)
    Aspects juridiques et éthiques de la conservation et de la diffusion des corpus oraux
    (Legal and ethical aspects of preserving and disseminating spoken corpora)
    The digitization of spoken language corpora opens up broad prospects for linguistics. However, the archiving and exploitation of these spoken corpora raise new ethical and legal problems that the scientific community must take into account. This article presents the results of an interdisciplinary working group which wrote a guide to good practice for the constitution, exploitation, archiving and dissemination of spoken language corpora.

  • Paul CAPPEAU & Françoise GADET (Poitiers / Paris Ouest)
    L'exploitation sociolinguistique des grands corpus. Maître-mot et pierre philosophale
    (The sociolinguistic exploitation of large corpora. Key word and philosopher's stone)
    The desire to make use of large collections of oral data is nowadays widely shared by linguists. At a time when such tools are becoming increasingly available for French, it is important to remain sensitive to all the factors that guarantee reliability at the different stages of obtaining data: clarification of the term ‘corpus’; reflection on approaches to the field and to orality, and on representativeness (both in terms of genres and numbers of speakers); data elicitation practices and transcription.

  • C. Pusch (Freiburg, Germany)
    Les corpus de linguistique romane en pays germanophones. Bilan et perspectives
    (Romance linguistics corpora in German-speaking countries. Assessment and perspectives)
  • C. Guillot & A. Lavrentiev & C. Marchello-Nizia (ENS-LSH Lyon / ENS-LSH Lyon / ENS)
    Les corpus de français médiéval : état des lieux et perspectives
    (Corpora of medieval French: inventory and perspectives)
  • P. Cappeau & F. Gadet (Poitiers / Paris Ouest)
    Où en sont les corpus sur les français parlés?
    (Where do corpora of spoken varieties of French stand?)
Book reviews
  • Fransk grammatik. Till santale og forstäelse, by H. Andersen & D. Fristrup
    reviewed by C. Bozier
  • Négociations commerciales et objectifs spécifiques. De la description à l'enseignement des interactions orales professionnelles, by G. Mercelot
    reviewed by J. Binon
  • Pour enseigner et apprendre l'orthographe. Nouveaux enjeux, pratiques nouvelles, by D. Cogis
    reviewed by B. Habert
  • Instruments et ressources électroniques pour le français, by B. Habert
    reviewed by A. Tutin
