Grammatical labeling of corpora of spoken language: problems and perspectives
1999, Vol. IV-2, pp. 113-133
Transcription conventions for speech attempt to encode its specific properties, such as false starts, hesitations, and repetitions, and do not rely on the usual written punctuation. This suggests that the grammatical tagging of transcribed spoken corpora might be a very difficult undertaking. Developing speech-specific taggers, although desirable, would be a long-term project. In the experiment reported in this article, a spoken corpus was tagged using a system designed for written text, supplemented by appropriate pre-editing and post-editing programs. Quite unexpectedly, the results for speech were excellent, almost as good as those previously obtained for written text. This finding suggests that large tagged spoken corpora for French could be compiled rapidly.