There's Gold in Them Thar Documents!
An investor publication recently ran this story:
Shares of Microsoft Corp. (MSFT) slid 2.83% to $254.56 Wednesday, on what proved to be an all-around dismal trading session for the stock market, with the S&P 500 Index (^GSPC) falling 0.08% to 4,183.18 and Dow Jones Industrial Average (^DJI) falling 0.48% to 33,820.38. The stock's fall snapped a three-day winning streak. Microsoft Corp. closed $8.63 short of its 52-week high ($263.19), which the company achieved on April 27th.
Do you notice anything odd about this passage? Have another look. What is odd is that it was written automatically by a machine; no “person” wrote this story. The machine has access to stock market data, of course, but it is still an impressive and easily understandable way to tie those numbers together. Similarly, a tremendous number of videos on YouTube are narrated by a machine. You would not know it, but the machine is reading text aloud over the video with almost flawless inflection and emphasis on words and sentences. And the machine can do this in many languages.
Machines can also go the other way around: they can read text and then act on it. This brings us to all the information we have in text: emails, websites, documents, and so on. Imagine how many customer support emails and messages are received every day, or how many different documents are involved in court cases. What can a machine do with all this text?
Thinking more about those customer support questions, what if a machine could automatically tag each incoming question and route it to the correct agent for a response? Is it a payment issue that needs to be routed to finance? Is it written in a different language and best routed to a native-speaking support agent? Is the question urgent, so that it needs to be prioritized above others? What if we could automatically detect the frustration of a valuable customer? And what about the responses from the support agents themselves: how timely and effective are they? What if we could match each question to a valuable response automatically and seamlessly?
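To make the tagging and routing idea concrete, here is a minimal sketch in Python. It assumes simple keyword rules; the queue names, keyword lists, and the `tag_message` function are hypothetical and chosen only for illustration. A production system would more likely use a trained classifier, but the overall flow is similar.

```python
import re

# Hypothetical keyword rules: a queue is chosen when any of its keywords appears.
ROUTING_RULES = {
    "finance": {"payment", "invoice", "refund", "charge", "billing"},
    "technical": {"error", "crash", "bug", "login", "password"},
}
# Hypothetical word lists used to flag urgency and customer frustration.
URGENT_WORDS = {"urgent", "immediately", "asap", "outage"}
FRUSTRATION_WORDS = {"unacceptable", "frustrated", "ridiculous", "angry", "terrible"}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def tag_message(text):
    """Return the queue plus urgency and frustration flags for one message."""
    words = tokenize(text)
    queue = "general"  # fallback when no rule matches
    for name, keywords in ROUTING_RULES.items():
        if words & keywords:
            queue = name
            break
    return {
        "queue": queue,
        "urgent": bool(words & URGENT_WORDS),
        "frustrated": bool(words & FRUSTRATION_WORDS),
    }


message = "I need a refund immediately. This is unacceptable!"
print(tag_message(message))
# prints {'queue': 'finance', 'urgent': True, 'frustrated': True}
```

Keyword rules like these are easy to read and adjust, which makes them a reasonable starting point before investing in anything more sophisticated.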
Another example might be company and product reviews (perhaps on social media) and their impact on a company's brand image and reputation. Would it be valuable to understand what customers value or criticize? What if we could automatically group reviews into topics like design, price, features, and performance, and then act on them in the relevant divisions of the company? What if we could determine the emotional tone of feedback and respond accordingly, perhaps taking action before an issue turns into a crisis? What if we could identify influencers and brand advocates and learn how to promote them?
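As a rough illustration of grouping reviews by topic and tone, the Python sketch below uses small hand-made word lists. The topic keywords, the positive and negative word lists, and all function names are assumptions made for this example; real systems typically learn topics and sentiment from data rather than from fixed lists.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists for the topics mentioned above.
TOPIC_KEYWORDS = {
    "design": {"design", "look", "looks", "style"},
    "price": {"price", "expensive", "cheap", "cost"},
    "features": {"feature", "features", "option", "options"},
    "performance": {"performance", "slow", "fast", "speed"},
}
# Hypothetical tone lexicons; a real lexicon would be far larger.
POSITIVE = {"love", "great", "excellent", "fast", "good"}
NEGATIVE = {"hate", "poor", "slow", "expensive", "bad", "terrible"}


def words_in(text):
    """Lowercase the text and return its words in order."""
    return re.findall(r"[a-z]+", text.lower())


def topics_of(review):
    """Return every topic whose keywords appear in the review."""
    words = set(words_in(review))
    return [topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords]


def tone_of(review):
    """Count positive minus negative words and map the score to a label."""
    words = words_in(review)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def group_reviews(reviews):
    """Group (tone, review) pairs under each matching topic, or under 'other'."""
    groups = defaultdict(list)
    for review in reviews:
        for topic in topics_of(review) or ["other"]:
            groups[topic].append((tone_of(review), review))
    return dict(groups)


reviews = [
    "Love the design, it looks great.",
    "Too expensive for what you get.",
    "The app is slow and the price is bad.",
]
for topic, items in group_reviews(reviews).items():
    print(topic, items)
```

A review can land in more than one group, which matches the idea of sending the same feedback to several divisions of the company.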
The machine technology we are talking about here is called text mining, and it is a large part of what Bluewire does. If data is the new currency, then text is the new gold. By some estimates, 80% of all information is locked away in text documents; the problem is that it is difficult to extract any value from it. A popular example of text mining is IBM’s Watson program, which beat human champions at the quiz show Jeopardy! in 2011. How did Watson use text mining to do that? Watson had access to millions of text documents, including the content of Wikipedia. When a Jeopardy! question was presented, Watson searched many documents using text mining techniques and arrived at the answer that best fit all the information available. This turned out to be a winning strategy, allowing it to answer quiz show questions with remarkable accuracy.
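The idea of searching many documents for the best-fitting answer can be sketched in a few lines of Python. This is not how Watson actually works, which is far more elaborate; it is a toy illustration that scores each document by the words it shares with the question, giving rarer words more weight (an inverse document frequency weighting). The three sample documents and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

# A tiny, made-up document collection keyed by title.
DOCS = {
    "Paris": "Paris is the capital of France and home to the Eiffel Tower.",
    "Berlin": "Berlin is the capital of Germany and home to the Brandenburg Gate.",
    "Tokyo": "Tokyo is the capital of Japan and one of the largest cities in the world.",
}


def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(documents):
    """Weight each word by how rare it is across the collection."""
    n = len(documents)
    doc_freq = Counter()
    for text in documents.values():
        doc_freq.update(set(tokenize(text)))
    return {word: math.log(n / df) + 1.0 for word, df in doc_freq.items()}


def best_match(question, documents):
    """Return the title of the document that best overlaps the question."""
    idf = build_idf(documents)
    question_words = set(tokenize(question))
    scores = {}
    for title, text in documents.items():
        shared = question_words & set(tokenize(text))
        scores[title] = sum(idf[word] for word in shared)
    return max(scores, key=scores.get)


print(best_match("This European capital is home to the Eiffel Tower", DOCS))
# prints Paris
```

Common words such as "the" and "capital" appear in every document and so contribute little, while distinctive words such as "Eiffel" decide the result.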
The good news is that the technology to extract valuable information from text documents is now becoming available. There isn’t yet a spreadsheet program for analyzing text, but you get the idea. Bluewire is currently engaged in text mining with a focus on championing the reputation of motor carriers. To achieve this, we look at how a motor carrier’s reputation might be impacted and repaired, and how reputation “gaps” can be addressed.