Text Mining in a Nutshell

Time: Thu, Apr 17 at 3:00 pm – 4:00 pm

Location: 129 Hesburgh Library

At its core, text mining is about discovering patterns and anomalies in sets of written documents. Invariably, the process begins with the creation of a digitized corpus of materials. The content of the corpus may then be cleaned up, marked up, and organized, making it easier for computers to read and parse. From there, "tokens" - usually words - are identified, counted, and tabulated. These techniques - usually employed under the rubric of natural language processing - form the basis for more sophisticated applications. This one-hour workshop familiarizes participants with the fundamentals of text mining - what it can do and what it cannot.
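
As a rough illustration of the identify, count, and tabulate step described above, here is a minimal Python sketch. It is not the workshop's own material: the function names `tokenize` and `count_tokens`, the two-sentence sample corpus, and the simple lowercase-and-split rule are all assumptions made for the example. Real projects typically use more careful cleanup and tokenization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (hypothetical helper)."""
    # Assumption: a token is a run of letters, digits, or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_tokens(documents):
    """Tabulate token frequencies across a list of document strings."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts


# Assumption: a tiny stand-in corpus; a real one would be read from files.
corpus = [
    "The whale is white.",
    "The sea is wide, and the whale swims.",
]

frequencies = count_tokens(corpus)

# Print the most frequent tokens as a simple two-column table.
for token, count in frequencies.most_common(3):
    print(f"{token}\t{count}")
```

Even a table this simple hints at what counting can and cannot do: it shows which words dominate a corpus, but says nothing about what they mean in context.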