How I learned to stop worrying and love my data

With my one-day workshop, Numbers are your Friends, fast approaching, I decided to finally take* the plunge into my long-overdue number-crunching. Over the past year, I have been slowly gathering my qualitative data and placing quantitative statistics into an unobtrusive Excel matrix, deep in the recesses of my hard drive. It was not that I feared numbers, but rather that I feared my data would be so woefully incomplete that it would defy quantification. As I plucked up the courage to look through my data, I found that, after a year, there were still far more unknown or semi-known sources (such as ‘Unknown London Newspaper’) than concrete links.

Yet I do not despair. I was well aware that this would be the case for the vast majority of my project. My methodology was designed to explore concentric circles of the news network, starting with those Scottish newspapers that have been digitised by Google News and the British Library. I would then move methodically on to those that had been cited by these original subjects, and continue in this way until all British newspapers between 1783 and 1837 were effectively catalogued. Should any newspapers not be captured through qualitative dissemination links, they would need to be researched at the conclusion of the project to ensure they were properly connected to the wider network (despite their monodirectional and, dare I say, parasitic nature).

In this way, specific paths of dissemination for specific pieces of reportage would be more comprehensively established. It would also, hopefully, reduce the amount of guesswork involved in establishing the most likely candidates for ‘unknown sources’ by looking first to those journals that were known sources of international news content. Having only recently completed the first ring of digitised periodicals, it is little surprise that my unknown column is still quite long.

So I must admit to myself that my methodology is simply not designed for a mid-project presentation. Because it is built to serve the qualitative aspect of my project, which traces changes in content along dissemination lines, my quantitative data is greatly skewed. As my data so far is primarily derived from six newspapers, these will undoubtedly appear as the central hubs of the newspaper network. While they are certainly the central hubs within my limited data set, their centrality is hugely misleading to anyone interested in the structure of the wider network. As I add layers to my dataset, this undue centrality will be corrected and the true harbingers of colonial intelligence will emerge. Until then, is there any reason to plot a network visualisation?

Perhaps there are two.

First, even my limited data shows the interconnectedness of these six papers, as well as a handful of other prime contributors to Scottish reportage (namely, the Glasgow Chronicle and the Sydney Gazette). Despite my rows and rows of data, I had not noticed the prominence of the Chronicle until someone actually drew me a picture.

Second, childishly, it is hugely satisfying to see that little Excel spreadsheet laid out in beautiful shades of blue. And, really, isn’t that reason enough to stop worrying and learn to love your data?

*Split infinitives are your friends, too. Unlike Latin, in which the infinitive cannot be split no matter how hard you try, English allows adverbs to be placed in a variety of positions for emphasis and style. To wit, ‘To Boldly Go’ will always sound superior to ‘To Go Boldly’, no matter how much grammarians wish it were otherwise. The correct use of the subjunctive, on the other hand, is law.
