Speech synthesis systems require ways of storing the various types of linguistic information produced in the process of converting the input format (e.g. text) into speech. In this paper we present a new formalism for representing arbitrary linguistic data and show how this helps in building a speech synthesis system.
There are a number of design considerations which builders of synthesis architectures must take into account. We list these as follows:
Most importantly, the real purpose of the architecture is to allow speech synthesis algorithms to be written as easily as possible. It is therefore important that the architecture should be unobtrusive and provide the sorts of structures and information that synthesis algorithms need. All programs must deal with infrastructure issues such as data storage, file I/O and memory allocation. If left unchecked, this sort of code becomes intertwined with the algorithm itself, and the algorithm becomes bogged down in it. A good architecture will abstract the infrastructure to such an extent that these aspects are hidden, so that synthesis algorithms can be written and read without other issues getting in the way. The interface should be easy to use, and should make the writing of synthesis algorithms easier and quicker than purely ad-hoc methods.
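To illustrate the separation described above, the following is a minimal sketch and not the formalism presented in this paper. The `Utterance` class, its `add`, `stream`, `save` and `load` methods, the stream name `"segment"`, and the toy `assign_durations` function with its duration values are all hypothetical names and numbers chosen for illustration. The point is only that storage and file I/O live in the container, while the algorithm contains nothing but linguistic logic.

```python
import json
from dataclasses import dataclass, field

# Hypothetical vowel inventory, used only by the toy algorithm below.
VOWELS = {"a", "e", "i", "o", "u"}


@dataclass
class Utterance:
    """Hypothetical container that owns storage and serialization."""

    # Maps a stream name (e.g. "segment") to a list of feature dictionaries.
    items: dict = field(default_factory=dict)

    def add(self, stream, **features):
        """Append a new item with the given features to a named stream."""
        self.items.setdefault(stream, []).append(dict(features))

    def stream(self, name):
        """Return the items of a named stream (empty list if absent)."""
        return self.items.get(name, [])

    def save(self, path):
        """Infrastructure: write all streams to a JSON file."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)

    @classmethod
    def load(cls, path):
        """Infrastructure: read all streams back from a JSON file."""
        with open(path, encoding="utf-8") as f:
            return cls(items=json.load(f))


def assign_durations(utt, default=0.08, vowel=0.12):
    """Toy synthesis algorithm: no file handling or allocation code appears here."""
    for seg in utt.stream("segment"):
        seg["duration"] = vowel if seg["name"] in VOWELS else default


# Usage: build an utterance, then run the algorithm on it.
utt = Utterance()
for name in ["k", "a", "t"]:
    utt.add("segment", name=name)
assign_durations(utt)
```

Because `assign_durations` only calls `utt.stream`, the storage format could be changed inside `Utterance` without touching the algorithm, which is the kind of unobtrusiveness argued for above.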