A new breakthrough in speech recognition: Microsoft AI beats human experts.
Transcribing human conversation into text has always been a nightmare for machines. Even when the audio is of high quality and free of background noise, the algorithm still has to cope with different voices, interruptions, hesitations, corrections, and the subtle nuances of a lengthy conversation.

A new paper from Microsoft Research claims that its speech transcription technology outperforms professional human transcribers of conversations, even when the humans' transcripts have been reviewed by a second person. The research team attributes this achievement not to a breakthrough in algorithms or data, but to careful tuning of existing AI architectures.

To test whether their algorithm could compete with humans, the researchers first had to establish a baseline. Microsoft hired a third party to transcribe audio for which it already had a verified, 100% correct transcript. The transcription was done in two stages: one person transcribed the audio, and a second person listened to the audio and corrected errors in the transcript. Compared against the correct texts, the professionals' error rates were 5.9% and 11.3% on the two test sets.

After training on 2,000 hours of human speech, Microsoft's system transcribed the same audio, with error rates of 5.9% and 11.1% respectively. The difference of 0.2 percentage points amounts to about 12 fewer errors.
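Error rates like those above are typically reported as word error rate: the number of word substitutions, deletions, and insertions needed to turn the transcript into the correct text, divided by the number of words in the correct text. The article does not describe how the scoring was done, so the following is only a minimal sketch of that standard metric; the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch only: real scoring tools also normalize case,
    punctuation, and filler words before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the transcript drops two of the seven words.
correct = "uh-huh i think we should go now"
transcript = "i think we should go"
print(f"{word_error_rate(correct, transcript):.1%}")
```

Because the metric counts individual words, a difference of a fraction of a percentage point between two transcripts of the same audio can come down to a handful of words.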

Microsoft's next challenge is to make this level of speech recognition work in noisier environments, such as cars or parties. This goal matters to Microsoft far beyond transcription itself.

This research is an important step in Microsoft's effort to make human-computer conversation smoother and easier. If a computer cannot understand what a person says, it will struggle to carry out instructions or answer questions. This is the foundation on which Microsoft hopes to build other breakthroughs. Earlier this year, Microsoft CEO Satya Nadella said that artificial intelligence is the future of the company and that the ability to converse is its cornerstone.

Despite this success, there is still a major difference between the AI system and human transcribers: the system cannot understand subtle cues in dialogue, such as "uh-huh". An "uh-huh" often signals that a speaker is thinking, or prompts the other person to keep talking. Professional human transcribers can tell whether it expresses hesitation or affirmation, but machines ignore these tiny clues. They do not understand the meaning, and they do not know why the sound was made.

Text: Xu Shu/Fried Egg Network
