Agile MaryTTS Architecture for the Blizzard Challenge 2018

Jan 1, 2018

Sébastien Le Maguer

Ingmar Steiner

Francesco Tombini

Pradipta Deb

Moitree Basu

Insa Kröger

Abstract

In this paper, we present the MaryTTS entry for the Blizzard Challenge 2018. Our participation is motivated by a new system architecture whose development began three years ago. For this architecture, we designed a fully modular pipeline that incorporates native modules and distributed processes, including a new grapheme-to-phoneme (G2P) conversion component. The back-end is modular as well: the fundamental frequency (F0) is predicted separately, based on a model of its dynamics, and a segmental synthesizer then uses the phonetic information and the predicted prosody to produce the final signal. Even though our results are disappointing, our participation has shown that the architecture is functional and that we can now further develop interfaces to several open-source back-ends. We hope this will strengthen the role of MaryTTS as a framework for research in speech synthesis.

Type

Journal article

Publication

Blizzard Challenge

Last updated on Jan 1, 2018

MaryTTS, Text-to-Speech, Blizzard Challenge, Speech Synthesis, Natural Language Processing, Deep Learning, Artificial Intelligence, Machine Learning
