• Jean-Pierre DESCLÉS (Paris 4)
    Extraction d'informations de corpus composés de textes techniques
    (Information retrieval from corpora of technical texts)
    1997, Vol. II-2, pp. 19-33

    Technical texts present interesting and so far under-researched linguistic characteristics. This article describes a research project, carried out by a multidisciplinary group of linguists and computer scientists, which aims at designing and building prototype computer programmes for extracting information from technical texts. As concrete examples illustrate, this research has led to programmes whose output takes the form either of networks of concepts or of phrases drawn from the analysed texts, accompanied where appropriate by automatically assigned semantic information.

  • Nathalie GASIGLIA (Lille 3)
    Faire coopérer deux concordanciers-analyseurs pour optimiser les extractions en corpus
    (The co-operation of Cordial Analyser and Unitex for optimising corpus extractions)
    2004, Vol. IX-1, pp. 45-62

    A well-delimited linguistic study - a semantico-syntactic analysis of the use of the verbs donner 'to give' and passer 'to pass' in the language of soccer - provides a useful framework both for careful reflection on the documentary resources that can form an instructive, concentrated electronic corpus and for introducing the notion of a 'high-efficiency thematic corpus'. To explore a corpus constructed in this way, two tools that generate concordances and provide syntactic analyses, Cordial Analyseur and Unitex, are put to the test. The description of their shared functionality, their specificities and their weak points led me to an original proposal: to make the two tools work together, so that their complementarity, exploited strategically, permits queries of some complexity, with proven analysis reliability and the capacity to mark every identified element in the generated concordances, which are tagged in XML.
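    The idea of merging the hits of two analysers into a single XML-tagged concordance can be sketched as follows. This is an illustrative toy, not either tool's real output format: the tag names (verbe, objet) and the merged-hits representation are invented for the example.

    ```python
    # Hypothetical sketch: combining hits from two concordancers on the same
    # sentence and emitting one XML-tagged concordance line. Tag names and
    # the toy analyses are invented; neither Cordial Analyseur's nor
    # Unitex's actual output format is reproduced here.
    import xml.etree.ElementTree as ET

    def tagged_concordance(tokens, hits):
        """Wrap each identified token in an XML element.

        tokens: list of words in the sentence
        hits:   {token index: tag name}, e.g. merged from two analysers
        """
        line = ET.Element("concordance")
        for i, tok in enumerate(tokens):
            if i in hits:
                ET.SubElement(line, hits[i]).text = tok
            else:
                # untagged context stays as plain text around the elements
                if len(line) == 0:
                    line.text = (line.text or "") + tok + " "
                else:
                    last = line[-1]
                    last.tail = (last.tail or " ") + tok + " "
        return ET.tostring(line, encoding="unicode")

    # tool A identifies the verb, tool B the object; merge their hits
    merged = {**{1: "verbe"}, **{3: "objet"}}
    print(tagged_concordance(["Zidane", "donne", "la", "balle"], merged))
    # -> <concordance>Zidane <verbe>donne</verbe> la <objet>balle</objet></concordance>
    ```

    The point of the design is that each tool contributes only the spans it identifies reliably, while the shared XML output makes the combined concordance queryable as one document.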

  • Gaston GROSS (Paris 13)
    Traitement automatique des domaines
    (Automatic processing of linguistic 'domains')
    1998, Vol. III-2, pp. 47-56

    The aim of this article is to present a practical implementation of the notion of a "domain" for the automatic processing of domain information and its application to information retrieval on the web. After a discussion of the problems that arise when assigning a text to a given domain, we define a domain as a set of hyperclasses (such as human, concrete, locative, action, etc.) and of object classes, which correspond to the structure of a simple sentence into predicates and arguments. This semantico-syntactic information is already encoded in general and technical language dictionaries, in which we distinguish between simple and compound words. On the basis of these dictionaries we tag web pages. A first application has enabled the search engine Alta Vista to identify 29 languages. We have also found that the identification of compound words allows queries that yield more precise and faster results than search algorithms that work exclusively with the simple words of the compound expression. Information retrieval can thus be considerably improved by taking compound words into account. An application of this research is the retrieval of medical-language texts that we are carrying out in the framework of the European Commission research project Webling.
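    The precision gain from indexing compounds as single units can be illustrated with a minimal sketch. This is not the article's actual system; the French example "carte mère" ('motherboard') and both search functions are invented for the demonstration.

    ```python
    # Illustrative sketch (not the article's system): treating a compound
    # like "carte mère" as one contiguous indexing unit avoids false hits
    # in documents where its simple components occur separately.
    def find_compound_hits(documents, query_tokens):
        """Return indices of documents containing the query as a contiguous unit."""
        n = len(query_tokens)
        hits = []
        for i, doc in enumerate(documents):
            words = doc.lower().split()
            if any(words[j:j + n] == query_tokens for j in range(len(words) - n + 1)):
                hits.append(i)
        return hits

    docs = [
        "la carte mère du serveur",         # genuine compound occurrence
        "une carte de la mère de famille",  # both simple words, no compound
    ]
    print(find_compound_hits(docs, ["carte", "mère"]))   # -> [0]

    # a bag-of-words search on the simple words matches both documents
    bag = [i for i, d in enumerate(docs)
           if all(w in d.lower().split() for w in ["carte", "mère"])]
    print(bag)                                           # -> [0, 1]
    ```

    The second query returns a spurious document precisely because it ignores the contiguity of the compound, which is the effect the abstract reports.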

  • Sarah LEROY (Paris X-Nanterre)
    Extraire sur patrons : allers et retours entre analyse linguistique et repérage automatique
    (Extraction on patterns: two-way traffic between linguistic analysis and automated identification)
    2004, Vol. IX-1, pp. 25-43

    We present an automated identification of proper-name antonomasia in tagged texts. First, we compare manual and computerised identification, describing the inner workings of the system as well as the methods and tools we used; we point out that the automated process is the more reliable. After showing how the capabilities and limits of automated detection can influence linguistic work, we compare this rather old (2000) work with new tools now available to linguists, e.g. the ability to query a subset of tagged texts in the Frantext database.
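    Pattern-based spotting on part-of-speech-tagged text can be sketched as below. The tagset (DET, NPR) and the single pattern shown (an indefinite determiner immediately followed by a proper noun, as in "un Harpagon" 'a miser') are simplified stand-ins, not the patterns of the system the article describes.

    ```python
    # Hedged sketch of extraction on patterns over POS-tagged text: a proper
    # noun preceded by an indefinite determiner is a candidate antonomasia.
    # The word/TAG format and the tagset are invented for illustration.
    import re

    def antonomasia_candidates(tagged):
        """tagged: string of word/TAG pairs, e.g. 'un/DET Harpagon/NPR'."""
        # indefinite determiner immediately followed by a proper noun
        pattern = re.compile(r"\b(un|une|des)/DET (\w+)/NPR")
        return [m.group(2) for m in pattern.finditer(tagged)]

    text = "c'/PRO est/VER un/DET Harpagon/NPR qui/PRO garde/VER tout/PRO"
    print(antonomasia_candidates(text))   # -> ['Harpagon']
    ```

    Such a pattern over-generates (any determiner + proper noun sequence matches), which is one reason the abstract stresses the interplay between automated detection and manual linguistic analysis.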

  • Patrick LEROYER (Aarhus, Danemark)
    En termes de vin : lexicographisation du guide œnotouristique en ligne
    (In terms of wine: lexicographisation of an on-line tourist guide for wine-lovers)
    2009, Vol. XIV-2, pp. 99-116

    Online tourist guides are information tools communicating destination image and specialised knowledge at the same time. They feature a large variety of lexicographic structures including word lists, articles, conceptual schemes, indexes and registers, search options on keywords, internal and external cross references etc. This is by no means surprising in so far as what is needed is effective data access in order to extract information – precisely in the same way as in lexicography. The functional thesis we defend in this article is that lexicographisation in a user perspective can improve the access process. Taking œnotouristic online guides as a case in point, we will examine different user situations leading to consultation, in particular the need for experiential information, in which users simply wish to improve the conditions of their œnotouristic experience. We will then formulate theoretical proposals aimed at ensuring better interaction of lexicographic functions, data presentation and access possibilities.