Research Taxonomies; or, Things that from a long way off look like flies.


[Borges] quotes a ‘certain Chinese encyclopaedia’ in which it is written that ‘animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.’

Preface to The Order of Things by Michel Foucault

Several weeks ago, in the midst of AcWriMo, I decided that my filing system was simply not up to par. I could write quickly enough, but spent a great deal of my time searching for stray references, half-remembered quotations, and scraps of prose that I had composed in moments of train-bound serendipity. The problem? I was simply not interfacing properly with my technology.

About eighteen months ago, I began to use Evernote as a 21st-century commonplace book: a single repository for my research and musings. In some ways, this second brain has worked remarkably well. All of my research notes, archival scribblings, and historiographical musings (as well as lecture and seminar notes) are readily accessible on any internet-enabled computer, as well as on my smartphone. Indeed, the problem was not accessibility, but retrievability.

Full-text searching is both a blessing and a curse. Although it provides unprecedented access to, and precision within, an otherwise unwieldy corpus, many of the problems associated with digital scholarship can be laid squarely at its feet. In his 2008 article for The Journal of Academic Librarianship, Jeffrey Beall highlights a number of its weaknesses, and, although not all of these are relevant to my own database, several certainly are.

In sum (for those who have yet to peruse the aforementioned article), full-text searching, for all its power, is flawed because it relies upon remembering, or guessing, the correct word or phrase to bring up a particular entry.

For example, last month I needed to find a specific quotation about bush-rangers. I could not remember who it was by, what article it was in, or any of the exact wording. All I could remember was that it did not actually have the word bush-ranger in it.

I never did find that quotation.

The obvious solution is to tag, code, or otherwise create metadata for my notes. This, in conjunction with full-text searching, should allow me to quickly organise or filter my entries, and therefore find the most relevant material available. And yet, problems remain:

    1. You want to tag, now?
       The main issue is not that I do not see the value in creating metadata; it is that I have 3000 un-tagged notes–all (or at least many) of which are still relevant to my ongoing research. If I exclude them from my coding, I may miss out on important connections and qualifications when writing up my work at a later date. If I stop my current research to code my previous research, I am losing days (months) that could have been spent collecting or analysing new data.
    2. My previous attempts at tagging have failed.
       This is not the first time I have considered adding metadata to my notes. Indeed, many of my notes are already tagged to a greater or lesser degree according to a variety of different systems. However, with an evolving taxonomy and inconsistent application, they provide little overall clarity.
    3. Tagging is a time-consuming process.
       Yet, perhaps at the core of my resistance is the simple fact that coding research is an extremely time-consuming process. Not only are there considerable start-up costs in designing a taxonomy, but extra time is also expended with each and every entry. It will certainly save time in the future, but sometimes an imperfect system that works now feels more effective than a perfect system that will work in the future.
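
For the technically inclined, the idea of combining tags with full-text searching can be sketched in a few lines of Python. This is only an illustration of the principle, not how Evernote actually works: the `Note` class, the `search` function, and the two sample notes are all invented for the purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """A hypothetical stand-in for a single notebook entry."""
    title: str
    body: str
    tags: set = field(default_factory=set)


def search(notes, text=None, tag=None):
    """Return the notes that match an optional phrase AND an optional tag."""
    results = []
    for note in notes:
        # Full-text filter: the phrase must literally appear in the body.
        if text is not None and text.lower() not in note.body.lower():
            continue
        # Metadata filter: the tag must have been applied by hand.
        if tag is not None and tag not in note.tags:
            continue
        results.append(note)
    return results


# Invented sample data, for illustration only.
notes = [
    Note("Sample note A",
         "Armed men held up the mail coach on the road out of town.",
         {"bushrangers", "crime"}),
    Note("Sample note B",
         "Emigration figures for the colony.",
         {"demography"}),
]

# The phrase never appears in the text, so full-text search finds nothing.
by_text = search(notes, text="bush-ranger")

# The tag records what the note is about, whatever words it happens to use.
by_tag = search(notes, tag="bushrangers")

# Both filters together narrow a large collection quickly.
both = search(notes, text="mail coach", tag="crime")
```

The first search fails for exactly the reason my bush-ranger quotation stayed lost; the second succeeds only because somebody took the time to tag the note.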

In the end, full-text searching may not be perfect, but it is surely more effective than binders and composition books; so, perhaps I should simply be content with my modest gains and leave metadata to those with more time and greater inclination.


When I began collecting data for Demography and the Imperial Public Sphere, I knew that I would be transcribing a huge number of newspaper articles. Thousands, at least. After years of work, it would be a shame (a crime!) to simply file those transcriptions away, to be satisfied with a book and a handful of articles. But what could I do?

In the midst of my musings, a friend of mine was explaining her own data worries. As a microbiologist working with genetic data, she was required by her funders to deposit her raw data in a publicly accessible database, so that it could be referenced, verified and built upon in the future. Although she was worried that her data would be plagiarised (or rather, published as part of another person’s project before she could publish her own), her predicament offered me a ray of hope. Was there something similar for history? Was there somewhere I could effectively deposit my own research?

There are already several promising possibilities, and more appear every year. My current favourite is Omeka, a WordPress-like application for creating digital archives.

The only problem? Online databases–or, at least, effective online databases–need metadata.

So, if I want my years of painstaking research to go beyond Demography, I will need to develop a relevant taxonomy and, most importantly, to apply it consistently throughout my collection. And, if I am going to go to the trouble of organising one part of my workflow, I might as well apply it to the rest of my digital libraries.
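
What applying a taxonomy consistently might look like in practice can be sketched as follows. This is a minimal sketch, assuming a small controlled vocabulary kept in one place; the vocabulary itself, the function names, and the sample tags are all hypothetical, and none of this reflects the import formats of Evernote, Zotero, Delicious or Omeka.

```python
# Hypothetical controlled vocabulary: each canonical tag is listed with
# the variant spellings that earlier, inconsistent tagging might contain.
VOCABULARY = {
    "bushrangers": {"bushranger", "bush-ranger", "bush rangers"},
    "demography": {"population", "demographics"},
}


def build_lookup(vocabulary):
    """Map every canonical tag and every variant to its canonical tag."""
    lookup = {}
    for canonical, variants in vocabulary.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


def normalise_tags(raw_tags, lookup):
    """Split raw tags into canonical tags and tags that need a decision."""
    known, unknown = set(), set()
    for raw in raw_tags:
        key = raw.strip().lower()
        if key in lookup:
            known.add(lookup[key])
        else:
            # Outside the taxonomy: flag for review rather than guess.
            unknown.add(raw)
    return known, unknown


LOOKUP = build_lookup(VOCABULARY)

# Invented example of the tags found on one old note.
known, unknown = normalise_tags(["Bush-Ranger", "population ", "misc"], LOOKUP)
```

Here `known` holds the canonical tags and `unknown` holds anything that falls outside the taxonomy. Keeping the vocabulary in a single place means that when the taxonomy evolves, old tags can be remapped in one pass rather than corrected note by note.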

So, over the next few months I will begin to tackle my 3000+ Evernote entries, 650+ Zotero citations and 80+ Delicious links.

Wish me luck.

Image courtesy of BioDivLibrary
