Text Analytics: From Quotation Search to Cancer Treatment




Have you ever wanted to find something you read, but were not able to? Maybe you recall the book or magazine where you read the piece; maybe you even remember whether it was on the left or the right page, at the top or at the bottom. But still, you can’t find it again. The struggle is real, especially if you wanted to quote that specific piece in an essay or an oral presentation, or share it with your friends as a fun fact.

Let me tell you about my experience. Ten years ago, I was studying for my exit examination. My friend and I were going through the art history program. Expressionism. My turn. 

I began the discussion and confidently maintained that the term for this art movement was coined during an art exhibition, when someone asked a painter if his pieces belonged to Impressionism. The artist replied that, no, he was an Expressionist. The year was... the art exhibition was... and the artist was... My mind went blank. I picked up the book to reread the passage about this event. Where was it? Maybe it was in the other manual? Despite all my efforts, the quote was nowhere to be found. Why didn’t I look for it on Google? Well, because ten years ago search engines were not as good as today’s, the Internet did not contain all human knowledge (and more), and eBooks were not popular yet. I passed the exit exam, but this Expressionism thing left me with a nasty taste in my mouth.

The Subtle Art of Deeply Understanding a Text Without Reading It

Granted, web search has made huge progress since then, and many more texts are easy to obtain in electronic form. But text mining and natural language processing are the real game-changers.

The text analytics market is expected to grow from $3.97 billion in 2017 to $8.79 billion by 2022.

These two fields help turn “text data into high-quality information, or actionable knowledge” that is more useful than the initial raw text, as Prof. ChengXiang Zhai of the University of Illinois at Urbana-Champaign puts it in his MOOC. These tools therefore not only reduce human effort, as other forms of automation do, but also surface knowledge in the text that would otherwise have been inaccessible or difficult to retrieve. The high-quality information we can extract comes in many different kinds, depending on the initial sources and on how we want to use it. Here are a few noteworthy examples:

  1. (Bio)medical text mining, which extracts novel knowledge from the scientific literature. It is often used in cancer research and treatment, where the number of published articles has grown immensely, reaching over 183,000 publications in 2014.
  2. Identifying patterns of literary style in the humanities. You may remember how an algorithm developed by Patrick Juola revealed that The Cuckoo’s Calling was actually written by J. K. Rowling under the pseudonym Robert Galbraith.
  3. Text mining for legal research. Going through the corpora of case law and legal research material is a pain in the neck, but new tools are being developed to make the search easier and more user-friendly.

Another great application, for me personally, is that now, ten years later, I can finally tell you that in 1911, during a meeting of the Berlin Secession, someone from the jury asked: “Is this still Impressionism?” and an artist replied: “No, this is Expressionism!” What’s more, I probably read this anecdote in “L’arte svelata”, by G. Nifosi, section 17.2.1. What about you? In which ways can text mining and analytics help you with your business or in your daily life? If you’re interested in finding out, see what Balzano can do and contact us!
