Currently, Professor Leticia Arco García of the Central University of Las Villas (Cuba) is visiting our research group. As a member of the Computer Science Department of the Artificial Intelligence Lab, she is an expert in text mining. Given her expertise and our curiosity, we asked her if she could give an introductory seminar. And so she did, by organizing not one but two seminars: one on text mining and one on opinion mining.
Full disclosure: before the sessions I was a complete layman when it came to text mining. I am familiar with traditional data mining and the statistical analysis of structured data, and I was looking forward to extending my expertise to less structured data. But what I discovered was not what I expected at all!
What struck me the most was the complexity of this domain. If you thought data pre-processing was a large and time-consuming step, think again. The number of steps and design decisions in a text mining analysis is tremendous. In my opinion, this makes text mining exploratory and hard to validate empirically. Labeled data is often lacking, which seems to complicate direct verification.
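To give a feel for what I mean by design decisions, here is a minimal sketch in Python. It is my own toy example, not something taken from Professor Arco's slides: the function names, the tiny stop-word list, and the options are all assumptions made for illustration. Even this simple bag-of-words step already forces three choices (lowercasing, stop-word removal, minimum token length), and each one changes the result.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# real analyses use much larger, language-specific lists.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def preprocess(text, lowercase=True, remove_stop_words=True, min_length=2):
    """Turn raw text into a list of tokens; every keyword is a design decision."""
    if lowercase:
        text = text.lower()
    # Keep runs of ASCII letters and digits; punctuation is dropped.
    tokens = re.findall(r"[a-z0-9]+", text, flags=re.IGNORECASE)
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    # Discard very short tokens.
    return [t for t in tokens if len(t) >= min_length]


def bag_of_words(text, **options):
    """Count token occurrences after preprocessing with the given options."""
    return Counter(preprocess(text, **options))


sample = "The complexity of text mining is in the number of design decisions."
print(bag_of_words(sample))
print(bag_of_words(sample, remove_stop_words=False))
```

Running it with and without stop-word removal gives two different word counts for the same sentence, and that is before touching stemming, n-grams, or weighting schemes.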
Despite the complexity, text mining sparked my interest. I look forward to experimenting a bit with some of the techniques Professor Arco introduced to us. Luckily, she provided a nice slide deck with an extensive overview of the different steps and tools. For those interested, you can find her slides here. See you down the rabbit hole.