• Martine ADDA-DECKER (Paris)
    Corpus for automatic transcription of spoken texts
    2007, Vol. XII-1, pp. 71-84

    This contribution gives an overview of automatic speech recognition research, highlighting the need for corpus development. As recognition systems rely largely on statistical approaches, they require large amounts of both spoken and written corpora. To bridge the gap between written and spoken language, speech transcripts must be produced manually with appropriate tools. The methods and resources accumulated over the years now make it possible not only to tackle genuine oral genres but also to envision large-scale corpus studies that increase our knowledge of spoken language and improve automatic processing.

  • Martine ADDA-DECKER, Cécile FOUGERON, Cédric GENDROT, Lori LAMEL & Elisabeth DELAIS-ROUSSARIE (Paris)
    French ‘liaison’ in casually spoken French, as investigated in a large corpus of casual French speech
    2012, Vol. XVII-1, pp. 113-128

    In this paper, the realization of French liaison is investigated in a large corpus of casual speech. Since casual speech gives rise to a wide range of pronunciation variants and to greater overall temporal reduction, one may hypothesize that liaison is less productive in this speaking style. We used automatic processing, such as automatic speech alignment, to determine when liaison is realized in the NCCFr corpus. Realized liaisons were examined and measured for the most frequent liaison consonants (/z/, /n/ and /t/) as a function of liaison site, classified as mandatory, optional or forbidden. The relation between speech rate and liaison realization is also examined.
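    The tallying step described in this abstract (realization of /z/, /n/ and /t/ liaisons by site class, and its relation to speech rate) can be illustrated with a minimal Python sketch. It assumes each liaison site has already been labeled by an aligner as realized or not; the record layout (`LiaisonSite`), the helper names and the toy records are hypothetical illustrations, not the authors' tools, data or results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LiaisonSite:
    # Hypothetical record for one potential liaison site (not the NCCFr format).
    consonant: str      # liaison consonant: "z", "n" or "t"
    site_class: str     # "mandatory", "optional" or "forbidden"
    realized: bool      # True if the aligner retained the liaison variant
    speech_rate: float  # local speech rate (assumed unit: syllables per second)


def realization_rates(sites):
    """Proportion of realized liaisons for each (consonant, site class) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [realized, total]
    for s in sites:
        key = (s.consonant, s.site_class)
        counts[key][1] += 1
        if s.realized:
            counts[key][0] += 1
    return {key: r / n for key, (r, n) in counts.items()}


def mean_rate_by_outcome(sites):
    """Mean speech rate of realized vs. non-realized liaison sites."""
    groups = {True: [], False: []}
    for s in sites:
        groups[s.realized].append(s.speech_rate)
    return {k: sum(v) / len(v) for k, v in groups.items() if v}


# Toy, invented records for illustration only.
sites = [
    LiaisonSite("z", "mandatory", True, 5.0),
    LiaisonSite("z", "mandatory", True, 6.0),
    LiaisonSite("t", "optional", False, 7.0),
    LiaisonSite("t", "optional", True, 5.0),
]

print(realization_rates(sites))     # rate per (consonant, site class)
print(mean_rate_by_outcome(sites))  # mean speech rate per outcome
```

    Grouping by (consonant, site class) keeps the three liaison categories separate, so a drop in realization under casual speech would show up mainly in the optional class; comparing mean speech rates by outcome is only the simplest way to relate rate and realization.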