Moretti enthralled me with his various interpretations of the rise and fall of the novel. Never a literary historian (who even knew such a wonderful thing existed?), but an enthusiastic reader of British classical novels, I felt the familiar tug of historical curiosity as I read the book. Questions – "Did the novel really 'rise' and then 'fall'?" and "How did historic events (the American and French Revolutions, Japanese dynastic tendencies), literacy rates, or simply the ability to publish influence these trends?" – wafted through my mind as I read.
Fascinated, I read on to his concept that a bigger pattern was emerging, and that novels actually went through cycles, irrespective of whatever else was going on in the world. Readings from History 610 and the Annales school went through my mind here (a connection, possibly?). And it all seemed a clever prelude to our theme for the week – the million (or billion) book libraries, and a whole new way to conduct research.
Ted Underwood helped generate my central questions. Are we radically changing the way we approach research through text/data mining? Is this concept of “distant” versus “close” reading just another way of saying we (professional humanists) have introduced a new methodology (or historiography)? Was anything like this possible before digitizing all this material?
Which method is more important – the more traditional close reading of texts and primary source documents, or a trends analysis, which, some might argue, is by its nature more objective? Does this broaden the field of quantitative analysis, and perhaps usher in a new era of objectivity? How do we determine which method accomplishes what?
Here I will pause for a moment, and beg forgiveness if the rest of the class was on this train a long time ago, and I am just jumping on while it barrels down its standard gauge track at 350 miles per hour. I took a statistics course in my previous academic life, but the thought of using such tools to track themes through books, or to chart the rise and fall of the novel, never occurred to me as possible.
Theibault mentions that some of this is actually NOT new for historians, or for other scholars of the social sciences. We have been assembling statistical analyses for close to two centuries, and some of this has become so common that readers expect to see pie charts, graphs, and other illustrations as part of the explanation. In some cases, "visualization made interpretation possible" (Theibault).
For brevity’s sake, and since I have already mentioned Theibault, I will move to the visualization articles. Theibault discusses two primary reasons for visualizations – to identify overarching patterns and to present information. Text/data mining seems like it would fall into the first category. Rapidly searching multiple texts for the occurrence of a word or phrase, or seeing what words/phrases move together, would be a way to rapidly identify trends.
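To make that idea concrete, here is a minimal sketch of the kind of word-frequency trend search described above. The mini-corpus, its dates, and the tracked term ("progress") are all hypothetical stand-ins, not drawn from the readings; a real project would load full digitized texts instead.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: title -> (decade, text snippet).
# A real corpus would be full novels loaded from files.
corpus = {
    "Novel A": (1810, "the heart of the matter is the heart itself"),
    "Novel B": (1840, "industry and progress, progress above all"),
    "Novel C": (1870, "the machine age brought progress and doubt"),
}

def term_frequency(text, term):
    """Count case-insensitive whole-word occurrences of `term` in `text`."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text.lower()))

# Tally how often the term appears in each decade across the corpus.
trend = Counter()
for title, (decade, text) in corpus.items():
    trend[decade] += term_frequency(text, "progress")

for decade in sorted(trend):
    print(decade, trend[decade])
```

Even this toy version shows the appeal: a few lines turn a pile of texts into a decade-by-decade trend line, which is exactly the kind of overarching pattern distant reading promises to surface.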
This is interesting, and serves as a good starting point for more detailed research. One aspect of text mining that bothers me is the lack of knowledge about the corpus of materials from which a researcher is working. Underwood discusses this a bit when talking about Google’s Ngrams. Because of the way Google is set up, and unlike systems such as Voyant, you may not know exactly where your data is coming from. So this may be useful for identifying overarching trends, but where can you take such data? We spend so much time in research papers and dissertations explaining our datasets that it seems only mildly useful to work from a dataset that cannot be defined.
On the other hand, presenting information seems an extraordinarily good use of visualization. Minard’s map of Napoleon’s march into Russia and back again offers a great example. I actually got to see it in much greater detail at a conference on visualization and presenting information, and watched how dramatically the troop numbers shrink as the army marches eastward and then returns home. It is striking to see, though I also admit that it helps to have a person who understands the map explain it.
But how else would you have done it? Putting together such a comprehensive picture of a complex and long endeavor is not easy. The research that went into that one visualization could easily have filled a book of several hundred pages. Personally, I would rather look at the map.
Of course, the argument can be made that other important details are lost. The researcher has chosen to emphasize certain things (troop losses) over others (an actual map, Napoleon’s decision-making, etc.).
All in all, I think that data mining is a good place to start. I remain unconvinced that it is a methodological approach in and of itself. (I will wait to see how the practicum goes before rendering a final verdict!) Visualization remains one of my favorite tools, as a way of presenting information to an audience (academic or otherwise) more effectively than through mere verbiage.