Microsoft Creates Speech Recognition Tech With Human Accuracy

As News Editor of Silicon UK, Roland keeps a keen eye on the daily tech news coverage for the site, while also focusing on stories around cyber security, public sector IT, innovation, AI, and gadgets.


The system can transcribe conversational language as accurately as professional transcriptionists

Microsoft researchers have announced details of the company’s latest speech recognition technology, which they claim can transcribe conversational speech as accurately as a human.

The team of researchers and engineers at Microsoft’s Artificial Intelligence (AI) and Research division noted that the speech recognition system they developed makes the same or fewer errors than professional transcriptionists.

They reported a word error rate of 5.9 percent, about equal to that of people asked to transcribe the same conversation the system was tested against.
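For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that general definition only; it is not Microsoft's scoring code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper, not the researchers' evaluation tooling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one dropped word ("the"):
    # 2 errors over 6 reference words, about 0.33.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

On this definition, a rate of 5.9 percent works out to roughly one error in every 17 words of the reference transcript.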

AI milestone

“We’ve reached human parity,” said Xuedong Huang, the company’s chief speech scientist. “This is an historic achievement.”

It’s a bold claim, but with speech recognition in virtual assistants such as Microsoft’s Cortana and Apple’s Siri still hit and miss, such improvements could turn speech recognition tools and smart software from gimmicks and nice-to-have features into genuinely useful day-to-day tools.

It is also indicative of the rapid evolution of AI and smart systems, which lends weight to concerns that the impact of intelligent machines and software needs to be considered sooner rather than later.

“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, the executive vice president who heads the Microsoft Artificial Intelligence and Research group.

To reach transcription parity with humans, Microsoft made use of deep learning neural networks, which replicate in part how the human brain learns, to train the system to recognise patterns in sounds rather than being manually taught to make sense of each sound.

Using Microsoft’s Computational Network Toolkit, the researchers were able to run deep learning algorithms across multiple computers equipped with graphics processing chips, an important parallel processing technique for crunching the vast amount of information a neural network needs to ingest. This allowed the researchers to carry out their training and testing at quite a lick.

While the researchers have some way to go before they can make sure the speech recognition technology works well in real-world settings with background noise, the current discoveries are likely to find their way into existing speech recognition features found in Windows and Xbox platforms.

“This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Microsoft’s AI efforts are timely, given that Google is looking to make waves with the AI-powered Assistant found in its new Pixel smartphones, and that the search company has figured out how to make its speech-based technology replicate human speech.
