System implementation of speech recognition
A speech recognition system should choose recognition units (primitives) that are precisely defined, that have enough training data, and that are general. English systems usually model context-dependent phonemes, whereas coarticulation in Chinese is less severe than in English, so syllables can be used as the modeling unit. The amount of training data the system requires depends on the complexity of the model: if the model is more complex than the available training data can support, performance drops sharply.
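To make "context-dependent phonemes" concrete, the following minimal sketch expands a phone sequence into triphone labels and shows how quickly the number of units grows. The function names, the `sil` boundary symbol, and the inventory size of 40 phones are assumptions chosen for illustration; the `left-center+right` label style follows a common toolkit convention and is not taken from the text.

```python
def to_triphones(phones, boundary="sil"):
    """Label each phone with its left and right neighbour as left-center+right."""
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i < len(phones) - 1 else boundary
        labels.append(f"{left}-{phone}+{right}")
    return labels


def max_triphone_count(num_phones):
    """Upper bound on distinct triphones: every (left, center, right) combination."""
    return num_phones ** 3


# Hypothetical phone sequence for one short English word.
example_labels = to_triphones(["k", "ae", "t"])

# Assumed inventory of 40 phones; real inventories differ by system.
example_bound = max_triphone_count(40)
```

The cubic growth is why a context-dependent model needs far more training data than a context-independent one, which is the trade-off between model complexity and data described above.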

Dictation machine: A large-vocabulary, speaker-independent, continuous speech recognition system is usually called a dictation machine. Its architecture is an HMM topology built on the acoustic model and language model described above. In training, the model parameters of each primitive are estimated with the forward-backward algorithm. In recognition, the primitives are concatenated into words, a silence model is inserted between words, and the language model supplies the transition probabilities between words; the result is a looped network that is decoded with the Viterbi algorithm. Because Chinese speech is easy to segment, a simplification that improves efficiency is to segment the utterance first and then decode each segment separately.
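The Viterbi decoding step can be sketched on a toy HMM. This is a minimal illustration, not a dictation-machine decoder: the two states (`sil`, `speech`), the two observation symbols (`low`, `high` frame energy), and all probabilities are invented for the example, and a real system would decode over the looped word network with continuous acoustic scores.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (log-probability, state path) of the most likely state sequence."""
    # best[t][s]: best log-probability of any path that ends in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Toy model: all numbers are assumptions made for this example.
STATES = ["sil", "speech"]
LOG_START = to_log({"sil": 0.8, "speech": 0.2})
LOG_TRANS = to_log({"sil": {"sil": 0.7, "speech": 0.3},
                    "speech": {"sil": 0.2, "speech": 0.8}})
LOG_EMIT = to_log({"sil": {"low": 0.9, "high": 0.1},
                   "speech": {"low": 0.2, "high": 0.8}})

log_prob, path = viterbi(["low", "high", "high", "low"],
                         STATES, LOG_START, LOG_TRANS, LOG_EMIT)
# path is ['sil', 'speech', 'speech', 'sil']
```

Working in log-probabilities avoids numerical underflow on long utterances. In a full recognizer the transition table would combine within-word HMM transitions with language-model probabilities at word boundaries.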

Dialogue system: A system that carries out spoken dialogue between human and machine is called a dialogue system. Limited by current technology, dialogue systems are usually restricted to a narrow domain with a limited vocabulary, with topics such as travel inquiry, reservations, and database retrieval. The front end is a speech recognizer that produces an N-best list of candidates or a word lattice. A parser analyzes these to obtain semantic information, the dialogue manager then determines the response, and a speech synthesizer outputs it. Because the vocabulary of current systems is often limited, semantic information can also be obtained by keyword extraction.
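Keyword extraction over an N-best list can be sketched as follows. This is a minimal illustration under stated assumptions: the travel-domain keyword table, the slot names, and the helper functions `extract_slots` and `first_usable` are all hypothetical, and a real system would use a proper parser and recognizer scores.

```python
import re

# Hypothetical keyword table for a narrow travel-inquiry domain.
SLOT_KEYWORDS = {
    "intent": {"book": "reserve", "reserve": "reserve",
               "schedule": "inquire", "timetable": "inquire"},
    "city": {"beijing": "Beijing", "shanghai": "Shanghai"},
    "day": {"today": "today", "tomorrow": "tomorrow"},
}


def extract_slots(hypothesis):
    """Map one recognized sentence to slot values by spotting keywords."""
    tokens = re.findall(r"[a-z]+", hypothesis.lower())
    slots = {}
    for token in tokens:
        for slot, table in SLOT_KEYWORDS.items():
            # Keep the first keyword found for each slot.
            if token in table and slot not in slots:
                slots[slot] = table[token]
    return slots


def first_usable(nbest, required=("intent", "city")):
    """Return slots from the first N-best hypothesis that fills all required slots."""
    for hypothesis in nbest:
        slots = extract_slots(hypothesis)
        if all(name in slots for name in required):
            return slots
    return {}


# The top hypothesis contains a misrecognized word, so the second one is used.
nbest = ["i want to look a ticket to shanghai",
         "i want to book a ticket to shanghai"]
result = first_usable(nbest)
# result is {'intent': 'reserve', 'city': 'Shanghai'}
```

Falling back through the N-best list lets the semantic stage recover from recognition errors in the top hypothesis, which is one reason the recognizer passes on several candidates rather than a single sentence.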