The recent digital revolution has moved almost every form of information from paper to digital form. While this change makes communication, information creation and data access quick, easy and cheap, there are still areas that need improvement.
Although we now store almost everything digitally, we often do so without a systematic method for organizing it. It is estimated that around 80% of all stored data is in plain text format.
Plain text is one of the most difficult types of data to work with when it comes to sorting, searching and classifying. The need to sort and classify plain text documents has spawned an entire software market that is attempting to find the best way to deal with text data.
Text mining attempts to take unorganized collections of text data and let users sort and search through them, in an effort to extract meaningful information from what might otherwise be a meaningless mass of words. While this sounds easy in theory, in practice sorting and searching through text data is one of the most difficult problems not only in data mining but in computer intelligence as a whole.