This week’s practicum exercises made the left side of my brain hurt. (OK, admittedly, my whole head hurt, I began to feel the room spin, and as you are reading this, I am probably still holding my head in my hands!) Since I identify more with “right-sided brain” thinking, I suspect that exercising the left side is what made it hurt more.
Nevertheless, I persevered through the exercises! I know that I am not getting the full capability out of the text mining resources, so I will be very excited to see what we do in class on Monday. I began with the introductory video to Voyant and read through some of the instruction guides to get a better sense of how the website worked. Then I uploaded a dissertation that I have been working with for our first project (“Evaluation of Noisy Transcripts for Spoken Document Retrieval”). I thought I would have some fun and see what I could gain from mining the document for key phrases or words.
Unfortunately, I do not think I accomplished much. The Wordle picture was interesting, showing mostly research-oriented words like “query”, “segmentation” and “analysis”. Voyant also helpfully let me know where those words appeared in the document, so I could get to them quickly.
I played a bit with the frequency of words appearing in the text. Again, I found it interesting to see how frequently, and where, in the text such words appeared, but I did not gain anything from this additional information. Of course, I did not have anything specific in mind when I went searching. I was just having fun. Perhaps this colored my perspective on the tool. Like anything in research (or life, for that matter), it appears that having a direction and goal in mind helps shape the final product significantly.
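For anyone curious what this kind of counting amounts to, here is a minimal Python sketch of term frequency and term position. I am not claiming this is how Voyant works internally; the sample sentence, the stopword list, and the function names are all made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_terms(tokens, stopwords, n=5):
    """Return the n most frequent tokens that are not stopwords."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

def positions(tokens, term):
    """Return the token offsets at which a term appears."""
    return [i for i, t in enumerate(tokens) if t == term]

# Hypothetical sample text, not taken from the dissertation.
sample = (
    "The query is split by segmentation. Each query is then scored, "
    "and the analysis of each query follows."
)
# A tiny, assumed stopword list; real tools ship much longer ones.
stopwords = {"the", "is", "by", "and", "of", "then", "each"}

tokens = tokenize(sample)
print(top_terms(tokens, stopwords, n=1))
print(positions(tokens, "query"))
```

The first function answers "how often?" and the second answers "where?", which are the two questions I was poking at in the tool.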
Next, I decided to play with Google Ngram for a bit. The articles made that sound more fun, and easier to use. So I played. Full disclosure – I am not particularly creative, so the most interesting thing I did was enter single words. I realize that Ted Underwood said the most interesting things would come from 4- or 5-grams. However, I am just not that good at identifying those phrases (tried a few times and failed miserably). So I reverted to single words, like “dog”, “cat”, “war” (peaking before 1920 and after 1940 – go figure) and “peace” (also peaking before 1920 and after 1940). This was far more interesting than I thought it would be!
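Since the jargon tripped me up at first: an n-gram is just a run of n consecutive words, so a 4-gram is any four-word phrase as it appears in a text. The sketch below shows the idea in Python on a made-up phrase. It is only an illustration of the concept under my own assumptions (simple whitespace splitting, a hypothetical `ngrams` helper), not a description of how Google builds its data.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into a phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample phrase, split on whitespace for simplicity.
words = "war and peace and war and peace".split()

bigram_counts = Counter(ngrams(words, 2))
four_grams = ngrams(words, 4)

print(bigram_counts.most_common(2))
print(four_grams)
```

Longer n-grams are rarer than single words, which may be part of why picking a good 4- or 5-word phrase is harder than typing “dog”.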
In case you were wondering, “happy” peaked at the beginning of the period covered (1800) and has been on the decline ever since. “Sad” has gone up and down, peaking around 1860 (curious) and then dropping again to about where it was in 1800. “Dog” has risen and fallen over the same time period, whereas “cat” has simply been on the rise. “Equanimity” peaked around 1860 and then fell to about level with its 1800 frequency. Here, I was checking our vocabulary, to see if it really HAS been on the decline. “Obfuscate” appears hardly at all until 1960, then rises sharply. So interesting to see how words become popular! Also of note, our professionalism has experienced a significant decline. “Sir” has nearly dropped from use altogether.
So herein lie my forays into the world of text mining.
Again, I am looking forward to class. I tried to get a better understanding of the technology involved by going into Google’s explanations a bit more and reading additional items available on Voyant, but all of that just made my head hurt more. I felt like I was in high school physics class again. So I can see why Professor Leon said this is the point in the semester where people begin to get overwhelmed. I’m not quite overwhelmed, because I know this is just one more tool in our ever-expanding toolbox. However, I could use a bit of professional expertise in applying the tool. See you all in class!