Elusive Boolean; or, How to Strike Gold in Electronic Mining

Last week, I pondered (or, in truth, ranted about) the difficulties that I often face because of database time-outs. Having dedicated some thought and reflection to the matter, I am still undecided as to a new methodological framework for the collection and cataloguing of my data. I was, however, able to codify some advice regarding the searching of electronic newspaper databases.

First, some basic vocabulary. For the web savvy, the term Boolean (BOO-le-an) is a familiar one–and my titular pun was very, very funny. For those less familiar with the ins and outs of electronic database searching, it may be less so.

If you are a complete novice, the University of Auckland has produced a straightforward video, which is available on YouTube at http://www.youtube.com/watch?v=7tm-sDKCnO4. The folks at CommonCraft have also produced a video on Web Search Strategies at http://www.commoncraft.com/video/web-search-strategies, which you may find useful.

For those who cannot watch (or listen to) these at the moment, a Boolean search can be most simply described as one that uses the terms AND, OR, and NOT to broaden or limit a search. AND and NOT are also sometimes indicated by the use of a + or - symbol immediately before a word. Quotation marks are also used to search for a specific phrase. For example:

“M. H. Beals” AND newspapers NOT migration

or

“M. H. Beals” +newspapers -migration

Boolean searching has a variety of consequences for electronic research, but the most important, perhaps, is the flexibility it gives you in adjusting and saving your search parameters. Take, for example, the Scotsman’s electronic archive.



The ProQuest-based service offers users a powerful, but accessible, interface for limiting searches. However, the complexity of my search (which also included ‘article types’ limitations not shown here) meant that session time-outs (and the re-entering of these details) were extremely time consuming. Yet, the open nature of the ProQuest search function offered a solution.

Once entered, the URL in the browser address bar read the following:

http://search.proquest.com/hnpscotsman/results/139A19D38223DCAD31D/1/$5bqueryType$3dadvanced:hnpscotsman
$3b+sortType$3drelevance$3b+searchTerms$3d$5b$3cAND$7ccitationBodyTags:australia$3e,+$3cOR$7ccitation
BodyTags:$22new+south+wales$22$7cOR$7c$22van+diemen$27s+land$22$3e,+$3cOR$7ccitationBodyTags:
$22botany+bay$22$7cOR$7cnew+holland$3e,+$3cOR$7ccitationBodyTags:$22swan+river$22$7cOR$7c
$22new+zealand$22$3e,+$3cOR$7ccitationBodyTags:$22van+dieman$27s+land$22$3e,+$3cNOT$7ccitation
BodyTags:$22naval+intelligence$22$3e$5d$3b+searchParameters$3d$7bNAVIGATORS$3dpubtitlenav,decadenav
$28filter$3d110$2f0$2f*,sort$3dname$2fascending$29,yearnav$28filter$3d1100$2f0$2f*,sort$3dname$2fascending
$29,yearmonthnav$28filter$3d120$2f0$2f*,sort$3dname$2fascending$29,monthnav$28sort$3dname$2fascending
$29,daynav$28sort$3dname$2fascending$29,+RS$3dOP,+chunkSize$3d20,+instance$3dprod.academic,+date$3
dRANGE:1817-0-1,1844-11-31,+ftblock$3d55199+670835+670834+7+660829+199+55007+55000+670831+670828
+660845+670829+660843+660840,+removeDuplicates$3dtrue$7d$3b+metaData$3d$7bUsageSearchMode$3d
Advanced,+dbselections$3dhistory$7cgenealogy$7chomework_help$7chistoricalnews,+fdbok$3dN,+siteLimiters
$3dRecordType$7d$5d?accountid=15842

I have highlighted my search terms and date limitations above (the terms follow each citationBodyTags: and the dates follow RANGE:) to demonstrate the syntax used by the Scotsman’s search function. Although obviously cluttered by a great deal of additional information, the key terms are easily identifiable and, more importantly, easily editable. By discovering where your terms are within the URL, you can quickly edit your search (correcting the dates, for example) without having to return to the search screen.
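For anyone who finds the raw URL hard to read, a short script can unscramble it. The codes in the URL above look like ordinary percent-encoding with a $ standing in for the % (so $3d is =, $22 is a quotation mark, and + is a space); that is an inference from this one URL, not documented ProQuest behaviour. The function name below is hypothetical, and this is only a minimal sketch in Python under that assumption:

```python
import re
from urllib.parse import unquote_plus


def decode_dollar_escapes(fragment: str) -> str:
    """Rewrite $xx escapes as %xx, then percent-decode ('+' becomes a space).

    Assumes '$' plays the role of '%' in the URL, as the example above suggests.
    """
    percent_form = re.sub(r"\$([0-9A-Fa-f]{2})", r"%\1", fragment)
    return unquote_plus(percent_form)


# A fragment copied from the search URL above (joined across the line wrap).
sample = (
    "$3cOR$7ccitationBodyTags:$22new+south+wales$22"
    "$7cOR$7c$22van+diemen$27s+land$22$3e"
)
print(decode_dollar_escapes(sample))
# → <OR|citationBodyTags:"new south wales"|OR|"van diemen's land">
```

Read this way, the Boolean operators and quoted phrases in the URL are much easier to spot before you edit them.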

Understanding the URL also allows you to pick up where you left off. ProQuest does offer the option to save searches, but saving the URL itself (in Notepad or Evernote) gives you quick access to your search and lets you update the date parameters to the relevant period.
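If you update the dates often, the edit can be scripted. The sketch below, in Python, assumes the date limits always appear as two comma-separated values after date$3dRANGE:, as they do in the URL above; the function name is hypothetical. Note that the months in that URL appear to be counted from 0 (11-31 would otherwise be the 31st of November), so the example keeps the same convention. Treat that as an assumption too.

```python
import re


def set_date_range(url: str, start: str, end: str) -> str:
    """Replace the two dates after 'date$3dRANGE:' and leave the rest untouched."""
    pattern = r"(date\$3dRANGE:)[^,]+,[^,]+"
    # A function replacement avoids any special handling of '$' or '\' in the dates.
    return re.sub(pattern, lambda m: f"{m.group(1)}{start},{end}", url, count=1)


# A fragment of the saved URL (joined across the line wrap), not the whole thing.
saved = "+instance$3dprod.academic,+date$3dRANGE:1817-0-1,1844-11-31,+ftblock$3d55199"
print(set_date_range(saved, "1845-0-1", "1860-11-31"))
# → +instance$3dprod.academic,+date$3dRANGE:1845-0-1,1860-11-31,+ftblock$3d55199
```

The same function works on the full saved URL, since it only touches the first date range it finds.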

Other databases are less open with their search protocols.  British Library Newspapers, for example, takes the following advanced search



and returns the URL

http://find.galegroup.com.lcproxy.shu.ac.uk/bncn/advancedSearch.do

Unfortunately, with no Boolean syntax, the URL cannot be saved or refined.  Nonetheless, there are some short-cuts to be found.  Upon searching for the above, the results screen does offer a truncated (shortened) version of the Boolean syntax it utilises:

 ((ke (australia)[Fuzzy Level=Med]) And (ke (“new south wales”)[Fuzzy Level=Med]) And (ke (“van dieme…

Placing the above into the advanced search box on a subsequent visit will not return any results, because it will be searching for all of those terms (including the phrase ‘Fuzzy Level’) rather than actually using the Boolean syntax. Yet, your knowledge of Boolean can still be utilised. By crafting an AND, OR, and NOT statement such as

australia OR “new south wales” OR “van diemen’s land” OR “botany bay” OR “new holland” OR “swan river” OR “new zealand” NOT “naval intelligence”

and placing it within the search box, you will be able to cut and paste a single statement with all your keyword possibilities and limits. Unfortunately, I have not yet found a way of including date or place limits; if you have, please do post the secret in the comments section below.

Thus by creating a text file (or Evernote note) with my search parameters written in Boolean syntax, I can ensure that I am being consistent in my searches, both within a single database and as I move from one archive to another.
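For those who keep their search terms in a list rather than a finished sentence, the statement can also be generated. The following is a minimal sketch in Python; the function name and the choice to quote any term containing a space are assumptions made for illustration, and it writes straight quotation marks, since some search boxes may not treat curly ones as phrase markers.

```python
def build_query(any_of, none_of=()):
    """Join terms with OR, append NOT exclusions, and quote multi-word phrases."""

    def quote(term):
        # Wrap phrases in straight double quotes; leave single words bare.
        return f'"{term}"' if " " in term else term

    query = " OR ".join(quote(term) for term in any_of)
    for term in none_of:
        query += f" NOT {quote(term)}"
    return query


places = [
    "australia", "new south wales", "van diemen's land", "botany bay",
    "new holland", "swan river", "new zealand",
]
print(build_query(places, ["naval intelligence"]))
# → australia OR "new south wales" OR "van diemen's land" OR "botany bay" OR "new holland" OR "swan river" OR "new zealand" NOT "naval intelligence"
```

Keeping the list in one place means a spelling variant (such as ‘van dieman's land’) only needs to be added once before the statement is regenerated for each database.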

*Image courtesy of OpenPlaques

