This work is done within the FestVox voice building framework [1], which offers general tools for building unit selection synthesizers in new languages. The unit selection technique used here is cluster based: units of the same type (phones, diphones, syllables, or any other unit) are clustered according to their acoustic differences [2]. The clusters are then indexed by high-level features such as phonetic and prosodic context. Voices built with this system can be run in the Festival Speech Synthesis System [3].
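To make the clustering and indexing idea concrete, the following is a minimal sketch in Python. It is not the FestVox or Festival implementation. The data layout (`type`, `acoustic`, `context`), the function names, and the use of a single binary split per unit type are all assumptions made for illustration. Units are first grouped by type; for each type, the sketch then looks for the contextual question that best separates the units into acoustically similar groups, so that a cluster can later be looked up from context alone.

```python
from collections import defaultdict
from statistics import mean

def impurity(units):
    """Sum of squared distances of each acoustic vector from the group mean."""
    if not units:
        return 0.0
    dims = len(units[0]["acoustic"])
    centroid = [mean(u["acoustic"][d] for u in units) for d in range(dims)]
    return sum(
        (u["acoustic"][d] - centroid[d]) ** 2 for u in units for d in range(dims)
    )

def best_split(units, features):
    """Return the (feature, value) question that most reduces impurity, or None."""
    best = None
    best_score = impurity(units)
    for feat in features:
        for value in sorted({u["context"][feat] for u in units}):
            yes = [u for u in units if u["context"][feat] == value]
            no = [u for u in units if u["context"][feat] != value]
            if not yes or not no:
                continue  # the question does not divide the units
            score = impurity(yes) + impurity(no)
            if score < best_score:
                best, best_score = (feat, value), score
    return best

def build_index(units, features):
    """Group units by type, then index each type by its best context question."""
    by_type = defaultdict(list)
    for u in units:
        by_type[u["type"]].append(u)
    return {t: best_split(members, features) for t, members in by_type.items()}

# Hypothetical toy database: two-dimensional acoustic vectors and two
# context features (lexical stress and the following phone).
units = [
    {"type": "a", "acoustic": [1.0, 1.1], "context": {"stressed": True, "next": "t"}},
    {"type": "a", "acoustic": [1.1, 1.0], "context": {"stressed": True, "next": "n"}},
    {"type": "a", "acoustic": [3.0, 3.1], "context": {"stressed": False, "next": "t"}},
    {"type": "a", "acoustic": [3.1, 3.0], "context": {"stressed": False, "next": "n"}},
    {"type": "t", "acoustic": [0.5, 0.5], "context": {"stressed": False, "next": "a"}},
]
index = build_index(units, ["stressed", "next"])
print(index)
```

In this toy data the units of type `a` divide acoustically along stress rather than along the following phone, so the stress question is chosen for that type. A full system would apply such splits recursively to grow a decision tree per unit type; the single split here is only meant to show how acoustic distance drives the clustering while contextual features provide the index.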
FestVox provides a language-independent method for building synthetic voices, with mechanisms for describing the phonetic and syllabic structure of a language abstractly. It is this flexibility in the language building process that we exploit in this paper.