2008-1: Extraction d'information : l'apport de la linguistique
(Information extraction: the contribution of linguistics)
This issue is available online in its entirety on the Cairn portal: Cairn.info
Anne CONDAMINES & Thierry POIBEAU (Toulouse 2-Le Mirail / CNRS / Paris 13), Linguistique et accès automatisé à l’information : un bilan (Linguistics and automated access to information: drawing up the balance), pp. 5-8
Christina LIOMA & C.J. Keith VAN RIJSBERGEN (Louvain, Belgium / Glasgow, Great Britain), Part of speech n-grams and Information Retrieval, pp. 9-22
Efforts to use linguistics in information retrieval (IR) were initiated in the 1980s and intensified in the 1990s, with reported performance benefits (see the overviews by Smeaton 1986 & 1999, Karlgren 1993, and Tait 2005). Later, these efforts decreased: baseline system performance improved, and the cost associated with linguistic processing was not worth the small benefits over the already improved baselines (Tait 2005). At present, most research on linguistics for IR tends to be geared towards domain-specific IR applications that seem to benefit more from linguistics, like question-answering (Tait & Oakes 2006). Although such applications are important, they should not limit the scope of research into linguistics for IR. In this work, we present an alternative use of linguistics, part of speech information in particular, to compute a term weight of informative content. This term weight is a novel application of linguistics to IR, and can benefit retrieval performance of general IR systems.
Piek VOSSEN (Amsterdam, Netherlands), Linguistic knowledge for more precision, richer answers and flexible systems, pp. 23-39
Irion Technologies is a language technology company in Delft (The Netherlands) that incorporates linguistic knowledge to build new generations of information systems: conceptual retrieval, automatic extraction of terms and ontologies, and open-domain dialogue systems. These systems are multi-lingual and cross-lingual and combine statistical and machine learning techniques with linguistic techniques. We have carried out evaluations of some of these systems. For information retrieval, we found advantages with respect to standard statistical approaches within special experimental settings that focus on ambiguity. Term extraction clearly benefits from rich linguistic knowledge and resources. Dialogue systems depend on communicative models and systems that also require deep linguistic processing. From our perspective, language technology is definitely helping to improve applications and is necessary for developing new ones.
Pierre ZWEIGENBAUM, Brigitte GRAU, Anne-Laure LIGOZAT, Isabelle ROBBA, Sophie ROSSET, Xavier TANNIER, Anne VILNAT & Patrice BELLOT (CNRS-LIMSI), Apports de la linguistique dans les systèmes de recherche d'informations précises (Contributions of linguistics in the search for precise information), pp. 41-62
Searching for precise answers to questions, also called “question-answering”, is an evolution of information retrieval systems: can it, like its predecessors, rely mostly on numeric methods, using exceedingly little linguistic knowledge? After a presentation of the question-answering task and the issues it raises, we examine to what extent it can be performed with very little linguistic knowledge. We then review the different kinds of linguistic knowledge that researchers have been using in their systems: syntactic and semantic knowledge for sentence analysis, the role of “named entity” recognition, and consideration of the textual dimension of documents. A discussion of the respective contributions of linguistic and non-linguistic methods concludes the paper.
Horacio SAGGION (Sheffield, Great Britain), Automatic Summarization: An Overview, pp. 63-81
A summary is a condensed version of a document. It contains the most relevant information in context found in the source document. Automatic summarization is the process of producing text summaries by computer. Although research on automatic summarization started in the late 1950s, the increasing volume of electronic text and recent international evaluation efforts have fuelled research in this field. The paper gives an overview of basic concepts in automatic text summarization, together with examples of available tools, systems, evaluation, summarization experiments, and summarization in practical settings, and discusses the role of linguistic information in the summarization task.
Aurélie PICTON, Cécile FABRE & Didier BOURIGAULT (Toulouse / Toulouse), Méthodes linguistiques pour l’expansion de requêtes. Une expérience basée sur l’utilisation du voisinage distributionnel (Linguistic methods for expanding queries. An experiment based on the use of distributional closeness), pp. 83-95
This paper reports the results of a query expansion experiment making use of semantic data produced by a program that automatically performs the distributional analysis of a large corpus. This method allows us to take into account relations that go beyond classic lexical relations. We look at these results from both a global and a local perspective, showing that such broader, corpus-based semantic relations may be useful, provided that the expansion is controlled by a filtering technique based on the texts of the database (distributional feedback) and by an analysis of the query's linguistic characteristics.
Marie-Claude L'HOMME (Montréal, Canada), Ressources lexicales, terminologiques et ontologiques : une analyse comparative dans le domaine de l’informatique (Lexical, terminological and ontological resources: a comparative analysis in the field of computer science), pp. 97-118
The automatic or semi-automatic processing of texts often requires access to external resources (e.g., lexical and terminological databases or ontologies), which can differ greatly in terms of both form and content. External resources can be used to assist other forms of processing and are expected to supply linguistic information, especially semantic information, that is not explicitly expressed in running text. In this article, the potential interest of lexical and terminological sources as well as ontologies for analyzing terms in specialized texts will be examined and evaluated. The evaluation takes into account the contents and the descriptive perspective of sources. We will focus on the field of computing, assuming that this domain is representative of many others. The presence and description of 75 specific terms were analyzed in six different sources. Results show that sources do not take into account the entire set of linguistic properties of terms.
Mathieu VALETTE & Monique SLODZIAN (ATILF / INaLCO), Sémantique des textes et Recherche d’Information (Text semantics and Information research), pp. 119-133
The aim of this paper is to set out some of the proposals of text semantics for information retrieval, more specifically for content-based text classification. To start with, we will assess the contribution of linguistics to information retrieval by means of natural language processing techniques. This will give us an opportunity to look at the achievements that have been secured and to examine standard linguistic approaches to information retrieval. In particular, we will focus on the slow emergence of text considerations as the web expands. We intend to show that the ever-greater attention attracted by text linguistics comes at a critical juncture in the evolution of information retrieval on the web. We will show how text categorisation is a departure from traditional approaches. The second and third parts will go into greater detail and examine the way text linguistics can apply to information retrieval. We will first lay out the methods used within the framework of a project aiming to filter racist web texts; we will then introduce some of the research currently conducted in the field of textual data analysis, which, in the near future, is likely to improve the methodology of information retrieval.
Colons, Créoles et Coolies. L'immigration réunionnaise en Nouvelle-Calédonie (XIXe siècle) et le tayo de Saint-Louis, by K. Speedy
reviewed by R. Chaudenson, pp. 134-135
La phraséologie dans tous ses états, by C. Bolly, J. Klein, B. Lamiroy (eds.)
reviewed by M. Pecman, pp. 135-137
Bibliographie thématique et chronologique de Métalexicographie (1950-2006), by C. Boccuzzi, M. Centrella, M. Lo Nostro & V. Zotti
reviewed by T. Fontenelle, pp. 137-138
Grammaire rénovée du français, by M. Wilmet
reviewed by C. Corblin, pp. 138-140
Practical Lexicography: a reader, by T. Fontenelle
reviewed by G. Williams, pp. 140-143