Microsoft's speech recognition engine is now as good at recognising speech as a human, the company has claimed.
The tech giant has been working on its speech transcription systems, which are used in products like its digital assistant Cortana, with a goal of achieving 'human parity' – the point at which an AI can interpret speech with the same error rate as a real person.
"Today, I'm excited to announce that our research team reached that 5.1 percent error rate with our speech recognition system, a new industry milestone, substantially surpassing the accuracy we achieved last year," Microsoft technical fellow Xuedong Huang wrote in a blog post.
"Reaching human parity with an accuracy on par with humans has been a research goal for the last 25 years. Microsoft's willingness to invest in long-term research is now paying dividends for our customers in products and services such as Cortana, Presentation Translator, and Microsoft Cognitive Services. It's deeply gratifying to our research teams to see our work used by millions of people each day."
Both this year's system and last year's were benchmarked against the Switchboard corpus, a dataset of recorded telephone conversations that speech researchers have used for more than two decades to measure the accuracy of transcription systems.
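For readers unfamiliar with the metric, the error rate quoted for benchmarks such as Switchboard is conventionally the word error rate: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation only. It is not Microsoft's scoring tooling, the function name and sample sentences are invented for the example, and real evaluations also normalise the text (case, punctuation, hesitations) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one dropped word in a six-word reference.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

On this definition, a 5.1 percent error rate corresponds to roughly one word in twenty being transcribed incorrectly.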
Huang said Microsoft reached this milestone, which represented a 12 percent reduction in error rate compared with last year's system, by modifying the neural net-based language and acoustic models it uses, and by letting the system draw on the entire history of a conversation as context when predicting the next word.
The next focus for Microsoft's speech recognition research will be to improve the system's ability to recognise accented speech, dialects and conversations in noisy environments. The company will also work on improving its ability to understand the meaning and intent behind speech, saying: "Moving from recognizing to understanding speech is the next major frontier for speech technology."