Normalization of Non-Standard Words with Finite State Transducers for Russian Speech Synthesis

Artem Lukanin

ANALYSIS OF IMAGES, SOCIAL NETWORKS, AND TEXTS

April 9-11, 2015, Yekaterinburg

Text Preprocessing for Speech Synthesis

Normalization of Non-Standard Words

Existing Russian Normalization Systems

Normatex

Test Parallel Corpus

Finite State Transducers
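
A finite state transducer reads its input symbol by symbol and emits output along every arc it follows; the input is accepted only if the machine ends in a final state. As a minimal sketch, assuming no FST toolkit (the `FST` class, the state names and the digit tables are hypothetical illustrations, not Normatex code), the machine below maps two-digit strings from 20 to 99 to masculine nominative Russian words:

```python
"""Toy deterministic finite state transducer (illustrative sketch, not Normatex code)."""
from typing import Dict, Iterable, List, Optional, Tuple


class FST:
    """Each arc reads one input symbol, emits an output string and moves to a new state."""

    def __init__(self, start: str, finals: Iterable[str]) -> None:
        self.start = start
        self.finals = set(finals)
        # (source state, input symbol) -> (output string, destination state)
        self.arcs: Dict[Tuple[str, str], Tuple[str, str]] = {}

    def add_arc(self, src: str, symbol: str, output: str, dst: str) -> None:
        self.arcs[(src, symbol)] = (output, dst)

    def transduce(self, text: str) -> Optional[str]:
        """Return the space-joined output, or None if the input is rejected."""
        state = self.start
        outputs: List[str] = []
        for symbol in text:
            arc = self.arcs.get((state, symbol))
            if arc is None:
                return None  # no arc for this symbol: input is outside the machine's domain
            output, state = arc
            if output:
                outputs.append(output)
        return " ".join(outputs) if state in self.finals else None


# Hypothetical digit tables (masculine nominative forms).
TENS = {"2": "двадцать", "3": "тридцать", "4": "сорок", "5": "пятьдесят",
        "6": "шестьдесят", "7": "семьдесят", "8": "восемьдесят", "9": "девяносто"}
UNITS = {"0": "", "1": "один", "2": "два", "3": "три", "4": "четыре",
         "5": "пять", "6": "шесть", "7": "семь", "8": "восемь", "9": "девять"}

# q0 --tens digit--> q1 --unit digit--> q2 (final); a "0" in the unit position emits nothing.
two_digit = FST(start="q0", finals={"q2"})
for digit, word in TENS.items():
    two_digit.add_arc("q0", digit, word, "q1")
for digit, word in UNITS.items():
    two_digit.add_arc("q1", digit, word, "q2")

print(two_digit.transduce("25"))  # двадцать пять
print(two_digit.transduce("40"))  # сорок
print(two_digit.transduce("15"))  # None (teens are not covered by this toy machine)
```

A real normalizer would build many such machines and combine them (union, concatenation, composition), usually with a dedicated toolkit rather than a hand-written class.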

Cardinal Numbers

5-9ncard

2x-9xncard

NUM-5-9-ncard
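
The slide titles above suggest that the cardinal grammar is split into smaller pieces by digit range (e.g. 5-9, 2x-9x). As a rough stand-in, not the Normatex grammar, the sketch below verbalizes 0-999 in the masculine nominative with plain lookup lists (hypothetical tables); real normalization also has to pick gender and case (e.g. один/одна/одно, два/две), which this sketch ignores:

```python
"""Sketch: masculine nominative Russian cardinals for 0-999 using plain lookup lists."""

ONES = ["ноль", "один", "два", "три", "четыре", "пять", "шесть", "семь", "восемь", "девять",
        "десять", "одиннадцать", "двенадцать", "тринадцать", "четырнадцать", "пятнадцать",
        "шестнадцать", "семнадцать", "восемнадцать", "девятнадцать"]
TENS = ["", "", "двадцать", "тридцать", "сорок", "пятьдесят",
        "шестьдесят", "семьдесят", "восемьдесят", "девяносто"]
HUNDREDS = ["", "сто", "двести", "триста", "четыреста", "пятьсот",
            "шестьсот", "семьсот", "восемьсот", "девятьсот"]


def cardinal(n: int) -> str:
    """Spell out 0 <= n <= 999 in words (masculine, nominative case)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0-999")
    if n == 0:
        return ONES[0]
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words.append(HUNDREDS[hundreds])
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest:
        words.append(ONES[rest])  # 1-19 have their own single-word forms
    return " ".join(words)


for number in (7, 15, 40, 101, 512, 999):
    print(number, cardinal(number))
```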

units
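
The "units" slide presumably deals with numbers followed by units of measurement, where the unit noun has to agree with the numeral. In the nominative and accusative the form depends on the last digits: 1 (but not 11) takes the nominative singular, 2-4 (but not 12-14) the genitive singular, and everything else the genitive plural. A minimal sketch of that rule, assuming a hypothetical two-entry lexicon (км, руб.) and ignoring the other cases, in which the numeral itself also inflects:

```python
"""Sketch: choose the form of a unit noun that agrees with a preceding cardinal."""
from typing import Tuple

# Hypothetical mini-lexicon: (nominative singular, genitive singular, genitive plural).
UNIT_FORMS = {
    "км": ("километр", "километра", "километров"),
    "руб.": ("рубль", "рубля", "рублей"),
}


def unit_form(n: int, forms: Tuple[str, str, str]) -> str:
    """Pick the noun form required after the numeral n (nominative/accusative context)."""
    last_two = n % 100
    last = n % 10
    if 11 <= last_two <= 14:
        return forms[2]  # 11-14, 111-114, ... always take the genitive plural
    if last == 1:
        return forms[0]  # 1, 21, 101, ... take the nominative singular
    if 2 <= last <= 4:
        return forms[1]  # 2-4, 22-24, ... take the genitive singular
    return forms[2]      # 0 and 5-9 take the genitive plural


for n in (1, 3, 5, 11, 21):
    print(n, unit_form(n, UNIT_FORMS["км"]))
```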

Ordinal Numbers
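
Russian ordinals inflect like adjectives and agree with the noun in gender, number and case; in a compound ordinal only the last word is ordinal (двадцать первый, "twenty-first"). A minimal sketch, assuming masculine nominative only and numbers 1-99, with hypothetical word lists:

```python
"""Sketch: masculine nominative Russian ordinals for 1-99."""

ORD_ONES = ["", "первый", "второй", "третий", "четвёртый", "пятый", "шестой", "седьмой",
            "восьмой", "девятый", "десятый", "одиннадцатый", "двенадцатый", "тринадцатый",
            "четырнадцатый", "пятнадцатый", "шестнадцатый", "семнадцатый",
            "восемнадцатый", "девятнадцатый"]
ORD_TENS = ["", "", "двадцатый", "тридцатый", "сороковой", "пятидесятый",
            "шестидесятый", "семидесятый", "восьмидесятый", "девяностый"]
CARD_TENS = ["", "", "двадцать", "тридцать", "сорок", "пятьдесят",
             "шестьдесят", "семьдесят", "восемьдесят", "девяносто"]


def ordinal(n: int) -> str:
    """Spell out the ordinal for 1 <= n <= 99 (masculine, nominative case)."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1-99")
    if n < 20:
        return ORD_ONES[n]
    tens, ones = divmod(n, 10)
    if ones == 0:
        return ORD_TENS[tens]  # round tens have their own ordinal stem: двадцатый, сороковой
    # Only the last word is ordinal; the tens stay cardinal: двадцать первый
    return f"{CARD_TENS[tens]} {ORD_ONES[ones]}"


for number in (1, 4, 20, 21, 40, 99):
    print(number, ordinal(number))
```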

Acronyms

ФГБОУ ВПО «ЮУрГУ» (НИУ)

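ФГБОУ ВПО «ЮУрГУ» (НИУ) → Федеральное государственное бюджетное образовательное учреждение высшего профессионального образования «Южно-Уральский государственный университет» (Научно-исследовательский университет)
(Federal State Budgetary Educational Institution of Higher Professional Education "South Ural State University" (National Research University))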
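
The expansion of ФГБОУ ВПО «ЮУрГУ» (НИУ) can be reproduced with a plain longest-match dictionary lookup. A minimal sketch, assuming a hypothetical three-entry dictionary built from that example; it does not inflect the expansion (in context, "в ЮУрГУ" would need the prepositional "в Южно-Уральском государственном университете"), and it does not handle acronyms that are read letter by letter:

```python
"""Sketch: dictionary-based acronym expansion, longest match first."""
import re

# Hypothetical mini-dictionary built from the slide example.
ACRONYMS = {
    "ФГБОУ ВПО": "Федеральное государственное бюджетное образовательное учреждение "
                 "высшего профессионального образования",
    "ЮУрГУ": "Южно-Уральский государственный университет",
    "НИУ": "Научно-исследовательский университет",
}

# Longer keys are tried first; the lookarounds stop matches inside longer words
# (in Python 3, \w also matches Cyrillic letters).
_PATTERN = re.compile(
    r"(?<!\w)(?:"
    + "|".join(re.escape(key) for key in sorted(ACRONYMS, key=len, reverse=True))
    + r")(?!\w)"
)


def expand_acronyms(text: str) -> str:
    """Replace every known acronym with its full form; unknown tokens are left untouched."""
    return _PATTERN.sub(lambda match: ACRONYMS[match.group(0)], text)


print(expand_acronyms("ФГБОУ ВПО «ЮУрГУ» (НИУ)"))
```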

Acronyms

Graphic Abbreviations
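
Graphic abbreviations are often ambiguous or need inflection: "г." can stand for "год" or "город", and the right form depends on context. A minimal sketch, limited to a hypothetical list of fixed expressions whose expansion never varies, that accepts both "т.е." and "т. е.":

```python
"""Sketch: expanding a few fixed Russian graphic abbreviations."""
import re

# Hypothetical mini-lexicon of abbreviations whose expansion does not depend on context.
ABBREVIATIONS = {
    "и т. д.": "и так далее",
    "и т. п.": "и тому подобное",
    "т. е.": "то есть",
    "т. к.": "так как",
}


def _to_regex(abbr: str) -> str:
    """Turn "т. е." into a pattern that also accepts "т.е." (spaces are optional)."""
    return r"\s*".join(re.escape(part) for part in abbr.split(" "))


# Longer abbreviations are applied first; (?<!\w) keeps matches from starting mid-word.
_RULES = [
    (re.compile(r"(?<!\w)" + _to_regex(abbr)), expansion)
    for abbr, expansion in sorted(ABBREVIATIONS.items(), key=lambda item: len(item[0]), reverse=True)
]


def expand_abbreviations(text: str) -> str:
    for pattern, expansion in _RULES:
        text = pattern.sub(expansion, text)
    return text


print(expand_abbreviations("Числа, т.е. цифры, и т. д."))
```

When an abbreviation such as "и т. д." ends a sentence, its last period doubles as the full stop; this sketch consumes it, so a real normalizer would have to restore the sentence boundary.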

Results

Token type             Tokens  Correct  Errors   Recall  Precision
Numbers                   977      920      53   94.17%     94.55%
Acronyms and initials     431      355      40   82.37%     89.87%
Graphic abbreviations     379      232       4   61.21%     98.31%
Total                    1787     1507      97   84.33%     93.95%

Recall = Correct / Tokens; Precision = Correct / (Correct + Errors).
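
The short script below (our own check, not part of Normatex) recomputes both scores from the raw counts in the results table, taking recall as correct over all tokens and precision as correct over correct plus errors. Tokens counted neither as correct nor as errors were presumably left unnormalized.

```python
"""Recompute recall and precision from the raw counts in the results table."""
from typing import Dict, Tuple

# token type -> (tokens, correct, errors), copied from the results table
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "Numbers": (977, 920, 53),
    "Acronyms and initials": (431, 355, 40),
    "Graphic abbreviations": (379, 232, 4),
}


def scores(tokens: int, correct: int, errors: int) -> Tuple[float, float]:
    """Return (recall, precision) in percent, rounded to two decimals."""
    recall = 100 * correct / tokens                 # share of all tokens normalized correctly
    precision = 100 * correct / (correct + errors)  # share of attempted normalizations that are correct
    return round(recall, 2), round(precision, 2)


# Column-wise sums give the "Total" row.
total = tuple(sum(column) for column in zip(*COUNTS.values()))

for name, counts in [*COUNTS.items(), ("Total", total)]:
    recall, precision = scores(*counts)
    print(f"{name}: recall {recall:.2f}%, precision {precision:.2f}%")
```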

The work is still in progress

Normatex — Russian text normalization

github.com/avlukanin/normatex

Artem Lukanin

Slides: artyom.ice-lc.com/slides/normatex
