A not so gentle introduction to text and opinion mining

Currently, Professor Leticia Arco García of the Central University of Las Villas (Cuba) is visiting our research group. As a member of the Computer Science Department of the Artificial Intelligence Lab, she is an expert in the domain of text mining. Given her expertise and our curiosity, we asked her if she could give an introductory seminar. And so she did, by organizing not one but two seminars, one on text mining and one on opinion mining.

Full disclosure: Before the sessions I was a complete layman in text mining. I am familiar with traditional data mining and the statistical analysis of structured data. Now I was looking forward to extending my expertise to less structured data. But what I discovered was not what I expected at all!

What struck me the most was the complexity of this domain. If you thought data pre-processing was a large and time-consuming step, think again. The number of steps and design decisions to take in a text mining analysis is tremendous. In my opinion, this makes text mining exploratory and hard to validate empirically. The fact that labeled (pre-classified) data is often lacking further complicates direct verification.
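To give a flavor of those design decisions, here is a minimal sketch in plain Python. It is not taken from Professor Arco's slides: the stop-word list, the function names, and the sample sentence are all made up for illustration, and a real analysis would likely rely on a dedicated library and a proper stop-word resource.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "not"}


def preprocess(text, lowercase=True, remove_stop_words=True, min_length=2):
    """Turn raw text into tokens; every keyword argument is a design decision."""
    if lowercase:
        text = text.lower()
    # Crude tokenizer: runs of letters and apostrophes (itself a design decision).
    tokens = re.findall(r"[A-Za-z']+", text)
    if remove_stop_words:
        tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
    return [t for t in tokens if len(t) >= min_length]


def bag_of_words(text, **options):
    """Count token occurrences after preprocessing."""
    return Counter(preprocess(text, **options))


review = "The seminar was great. The slides were not boring, and the speaker was GREAT!"

default = bag_of_words(review)
keep_all = bag_of_words(review, lowercase=False, remove_stop_words=False)

print(default)
print(keep_all)
```

With the default settings, "great" is counted twice and "not" disappears, so "not boring" silently becomes "boring", which is exactly the kind of side effect that matters in opinion mining. With lowercasing and stop-word removal switched off, "great" and "GREAT" are counted as two different words. Two switches already give noticeably different data, and a full pipeline has many more.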

Despite the complexity, text mining sparked my interest. I look forward to experimenting a bit with some of the techniques Professor Arco introduced to us. Luckily, she provided a nice slide deck with an extensive overview of different steps and tools. For those interested, you can find her slides here. See you down the rabbit hole.

One thought on “A not so gentle introduction to text and opinion mining”

  1. Dear Leticia, thank you for sharing your knowledge with us. The overview has been very helpful in gaining a good insight into this very interesting domain.

    Looking forward to discussing this further.


