The Hidden Information That Can Be Extracted from Texts

The above title might sound boring, but it’s probably the area of “Artificial Intelligence” that I’m most excited about. If you squint just right, you can probably understand what I mean when I say that Texts are generated by Topics filtered through the human mind. What if we could uncover some of those Topics by processing the Texts in some way? What if we could also uncover information about the human mind that is evaluating those Topics?

Here’s a great link that gives some insight into the questions above. It comes from Radiolab (NPR) and presents a couple of really interesting examples:

Image: Agatha Christie (via Wikimedia Commons, https://commons.wikimedia.org/wiki/Category:Agatha_Christie#/media/File:Agatha_Christie.png)

The story of Agatha Christie’s novels: a researcher ran a word-frequency analysis (probably something like the tf-idf technique) and found that Christie’s first 72 novels had very similar statistics for word choice and vocabulary. Her 73rd novel, however, showed a sharp shift, one that likely reflected something happening in Christie’s brain. This has a lot of interesting implications! Click the link above to listen to or read the story on Radiolab.
A similar story covers an amazing long-running study by the University of Minnesota of more than 600 nuns, who underwent regular memory and cognitive-status assessments over the years. At some point the researchers discovered a dataset of “entrance essays” that each sister had written in her youth, and they found that the idea density and grammar of those essays correlated with memory problems in old age. This is correlation, not causation, of course, but it is still fascinating.
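To make the word-frequency idea from the Agatha Christie example a little more concrete, here is a minimal Python sketch. It assumes tf-idf (the guess above, not a confirmed description of the researcher’s method) and plain-text novels as input. Each text becomes a tf-idf vector, and each one is then scored by how similar it is to the average of all the others; a novel whose vocabulary shifts sharply should stand out with a low score. The function names, the toy “novels”, and the smoothed idf formula are illustrative choices, not taken from the study.

```python
import math
import re
from collections import Counter

# Illustrative sketch: tf-idf is assumed here, not confirmed as the study's method.

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes as word tokens.
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: word -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents containing each word.
    df = Counter(w for c in counts for w in c)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append({
            # Relative term frequency times a smoothed inverse document frequency.
            w: (k / total) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, k in c.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similarity_to_rest(docs):
    """Score each document by its cosine similarity to the mean of all others."""
    vecs = tfidf_vectors(docs)
    scores = []
    for i, v in enumerate(vecs):
        others = [u for j, u in enumerate(vecs) if j != i]
        centroid = {}
        for u in others:
            for w, wt in u.items():
                centroid[w] = centroid.get(w, 0.0) + wt / len(others)
        scores.append(cosine(v, centroid))
    return scores

if __name__ == "__main__":
    novels = [
        "the detective examined the curious poison in the library",
        "the detective questioned the nervous butler in the library",
        "the detective found the hidden letter in the library",
        "thing thing thing something thing thing",
    ]
    scores = similarity_to_rest(novels)
    # The last text shares no words with the others, so it scores lowest.
    print(scores.index(min(scores)))  # prints 3
```

With real novels you would feed in the full text of each book. Vocabulary-richness measures, such as counting distinct words in equal-length samples of each text, are another common way to quantify this kind of shift.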

These are the kinds of analyses I like to run on all sorts of text. Some techniques uncover hidden (or “latent”) topics in large quantities of text, and sometimes what they reveal is spooky! There are many other kinds of analysis as well, like the word-frequency statistics from the Agatha Christie study and the grammar and idea-density metrics from the University of Minnesota study. This may well be one of the most useful near-term applications of AI we have today, one that can even reveal hidden truths about ourselves.
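The post doesn’t say which technique it uses to uncover latent topics; one widely used option is Latent Dirichlet Allocation (LDA). Below is a deliberately tiny, unoptimized collapsed Gibbs sampler for LDA in plain Python, assuming the documents are already tokenized. The function names and the hyperparameters (`alpha`, `beta`, the iteration count) are arbitrary illustrative assumptions. Each word token is repeatedly reassigned to a topic in proportion to how much its document already uses that topic and how often that topic already contains the word.

```python
import random
from collections import defaultdict

# Minimal collapsed Gibbs sampler for LDA. Function names and the default
# hyperparameters below are illustrative assumptions, not tuned values.

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """docs: list of token lists. Returns (topic_word, doc_topic) count tables."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic

    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z = []
        for w in doc:
            k = rng.randrange(num_topics)
            z.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Weight for topic t is proportional to
                # (doc's use of t + alpha) * (t's use of w + beta) / (t's size + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word, doc_topic

def top_words(topic_word, n=5):
    """Most frequent words in each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

if __name__ == "__main__":
    docs = [
        "cat dog pet cat dog pet vet".split(),
        "dog pet cat vet dog pet".split(),
        "stock bond market stock bond".split(),
        "market stock bond market".split(),
    ]
    topic_word, doc_topic = lda_gibbs(docs, num_topics=2)
    print(top_words(topic_word, n=3))
    print(doc_topic)
```

Because the sampler is random, the topics it finds can vary with the seed, especially on a toy corpus this small. For real collections of text, established libraries such as gensim or scikit-learn, which both provide LDA implementations, are the practical choice.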
