
Unit size

Another way to help address the coverage question is to vary the size of the units we select. The smaller the units, the easier it may be to cover the whole acoustic-phonetic space, as each unit can be shared across more contexts. Smaller units, such as the half-phones used in [8] or the even smaller units based on HMM states typified by [9], allow better coverage with a smaller total amount of speech.

Most systems use a fixed-size unit, though longer contiguous sections may be selected from the database as a consequence of the selection algorithm. Some systems, however, explicitly allow for mixed-size units. Bonn's HADIFIX system was more explicit in varying its unit length, including units the size of consonant clusters as well as single-phone units [10].
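To illustrate how longer contiguous sections can emerge from a fixed-size unit search, here is a minimal sketch rather than any particular system's algorithm. It assumes a toy database of phone-labelled utterances, a hypothetical `join_cost` that is 0 when two units are adjacent in the database and 1 otherwise, and no target cost at all; real systems use acoustic and prosodic costs instead.

```python
def join_cost(prev, cand):
    """Toy join cost (an assumption): free if the two units are adjacent
    in the same database utterance, 1 otherwise."""
    return 0 if prev[0] == cand[0] and prev[1] + 1 == cand[1] else 1


def select_units(target, database):
    """Viterbi search for the phone-sized unit sequence with the fewest joins.

    target   -- list of phone labels to synthesize
    database -- list of utterances, each a list of phone labels
    Returns a list of (utterance index, position) pairs, one per target phone.
    """
    # Every database phone with the right label is a candidate for that slot.
    candidates = [
        [(u, p) for u, utt in enumerate(database)
         for p, phone in enumerate(utt) if phone == wanted]
        for wanted in target
    ]
    if any(not column for column in candidates):
        raise ValueError("some target phone is not covered by the database")

    # best[i][cand] = (cheapest cost of reaching cand, best predecessor)
    best = [{cand: (0, None) for cand in candidates[0]}]
    for i in range(1, len(target)):
        column = {}
        for cand in candidates[i]:
            column[cand] = min(
                (best[i - 1][prev][0] + join_cost(prev, cand), prev)
                for prev in candidates[i - 1]
            )
        best.append(column)

    # Trace back from the cheapest final candidate.
    cand = min(best[-1], key=lambda c: best[-1][c][0])
    path = [cand]
    for i in range(len(target) - 1, 0, -1):
        cand = best[i][cand][1]
        path.append(cand)
    path.reverse()
    return path


database = [["h", "e", "l", "o"], ["w", "er", "l", "d"], ["l", "o", "w"]]
target = ["l", "o", "w", "er"]
path = select_units(target, database)
print(path)
```

Although every unit here is a single phone, the cheapest path reuses runs of neighbouring database phones, so only one real join remains in this example.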

Phonological Structure Matching [11] explicitly selects units of non-uniform length. The database is labelled with tree structures, and an utterance to be synthesized is labelled with a tree structure as well. The database is then searched top down for the largest sub-trees that match parts of the desired utterance, so longer units can be selected from the database. There are two advantages here. First, selecting longer units means fewer joins, which should mean less chance of bad joins. Second, because fewer units are being selected, this should be computationally more efficient.
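The top-down search described above can be sketched as follows. This is a simplified illustration, not the published method: it assumes trees are nested tuples of the form `(label, child, child, ...)`, that two sub-trees match only when they are structurally identical, and that context outside the sub-tree is ignored. The function names `index_subtrees` and `match_top_down` are hypothetical.

```python
def index_subtrees(tree, index, source):
    """Record every sub-tree of a database tree, keyed by its full structure.
    The first utterance seen for a given structure is kept as its source."""
    index.setdefault(tree, source)
    for child in tree[1:]:
        index_subtrees(child, index, source)


def match_top_down(target, index):
    """Cover the target tree with the largest sub-trees found in the index.

    Returns a list of (sub-tree, source utterance) pairs in left-to-right
    order; the source is None for a leaf the database does not contain.
    """
    if target in index:
        return [(target, index[target])]   # whole sub-tree found: one unit
    children = target[1:]
    if not children:
        return [(target, None)]            # unmatched leaf
    units = []
    for child in children:                 # otherwise descend one level
        units.extend(match_top_down(child, index))
    return units


def leaves(tree):
    """Left-to-right leaf labels of a tree."""
    if len(tree) == 1:
        return [tree[0]]
    return [label for child in tree[1:] for label in leaves(child)]


database = {
    "utt1": ("word", ("syl", ("h",), ("e",)), ("syl", ("l",), ("o",))),
    "utt2": ("word", ("syl", ("w",), ("er",)), ("syl", ("l",), ("d",))),
}
index = {}
for name, tree in database.items():
    index_subtrees(tree, index, name)

target = ("phrase",
          ("word", ("syl", ("h",), ("e",)), ("syl", ("l",), ("o",))),
          ("word", ("syl", ("w",), ("er",)), ("syl", ("l",), ("o",))))
units = match_top_down(target, index)
for subtree, source in units:
    print(source, leaves(subtree))
```

In this toy case the eight target phones are covered by three units (one whole word and two syllables), so two joins are needed instead of seven.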

In almost all current unit selection synthesis systems, very little prosodic or spectral modification is applied to the selected units. The major consequence is that the resulting synthesized utterances can often sound very good, and when they do, they sound as if the person who recorded the database had said the new utterance.


Alan W Black 2002-09-30