Commercialising Watson has clear and exciting implications for the enterprise, says IBM’s Craig Rhinehart
In February 2011, IBM’s supercomputer project Watson won a special series of the American gameshow Jeopardy, trouncing the two best-ever human players.
Even though it signed off one show by guessing that Toronto was an American city, the ambitious supercomputer project demonstrated just how far, with a great deal of time and money, natural language processing (NLP) could go.
Watson was mocked on Twitter for the Toronto gaffe but, to place it in context – which is only right since we’re talking about NLP – the fact is that Watson was way ahead of its opponents and could not lose. And it knew it.
IBM’s Craig Rhinehart, director of ECM product strategy, compliance and discovery, says that the technology in Watson, ‘Deep Q&A’ (question and answer), has immediate and obvious implications for the enterprise in a world where data is growing at an enormous rate and reaching unwieldy, daunting proportions.
That data, for example, could be generated from innumerable contact points with customers – perhaps hours and hours of call centre conversations, written communications, email, website submissions and even Facebook.
This unstructured data – in human language, not computer language – represents about 80 percent of the data hoarded by the enterprise, according to Rhinehart.
And buried within it is valuable information that NLP can unlock, revealing trends and insights to be acted upon.
“There is a tremendous amount of innovation around this notion of unstructured information and we think of Watson as a moment in time, as a grand challenge breakthrough,” he said.
Context is everything in natural language – Jeopardy is full of puns and wordplay and this fluid interaction between words is challenging for a computer that relies on precise instructions.
“Watson processes raw information, text and natural language so we can understand what’s in there. With Watson we then put that info into a knowledge base – for want of a better word, not a knowledge base per se, but something that manages and stores unstructured info so it’s retrievable in the right context.
“Once you have this built-up set of knowledge you can ask questions and get back answers with varying degrees of confidence.”
Thirst for knowledge
Watson’s knowledge base for Jeopardy contained around 200 million pages of information, including Wikipedia and the complete works of Shakespeare. Hundreds of algorithms scour its knowledge base for possible answers.
Then, hundreds more algorithms search for supporting evidence, and yet more algorithms score each answer against that evidence to produce a confidence rating.
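The three stages described here (propose possible answers, find supporting evidence, score each answer into a confidence rating) can be illustrated with a toy sketch. This is not IBM's code and bears no relation to Watson's real algorithms: the passages, the entity list, the function names and the crude word-overlap score are all assumptions made up for illustration.

```python
import re

# Hypothetical miniature "knowledge base" and entity list (illustrative only).
PASSAGES = [
    "Toronto is the largest city in Canada.",
    "Chicago is a city in the United States.",
    "O'Hare airport in Chicago is named after a World War II pilot.",
]
ENTITIES = ["Toronto", "Chicago"]
QUESTION = "Which United States city has an airport named after a World War II pilot?"


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z']+", text.lower()))


def generate_candidates(question, passages, entities):
    """Stage 1: propose every entity mentioned in a passage that shares a word with the question."""
    q = tokens(question)
    found = []
    for entity in entities:
        if any(entity.lower() in p.lower() and q & tokens(p) for p in passages):
            found.append(entity)
    return found


def gather_evidence(candidate, passages):
    """Stage 2: collect the passages that mention the candidate answer."""
    return [p for p in passages if candidate.lower() in p.lower()]


def score(question, evidence):
    """Stage 3: crude score, the count of question words found anywhere in the evidence.

    Assumption: plain word overlap (stop words included) stands in for the
    many scoring algorithms the article mentions.
    """
    seen = set()
    for passage in evidence:
        seen |= tokens(passage)
    return len(tokens(question) & seen)


def answer(question, passages, entities):
    """Run all three stages and return (candidate, confidence) pairs, best first."""
    raw = {
        c: score(question, gather_evidence(c, passages))
        for c in generate_candidates(question, passages, entities)
    }
    total = sum(raw.values())
    if total == 0:
        return []
    # Normalise raw scores so the confidences sum to 1.
    return sorted(((c, s / total) for c, s in raw.items()),
                  key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for candidate, confidence in answer(QUESTION, PASSAGES, ENTITIES):
        print(f"{candidate}: {confidence:.2f}")
```

With this toy data, Chicago ranks well ahead of Toronto, because far more of the question's words turn up in the passages that mention it. A real system would weigh many independent kinds of evidence rather than a single overlap count.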