It is with great pleasure that today I officially announce the launch of the Scissors-and-Paste-o-Meter, a free, online tool for tracking reprints and textual reappearances in 19th-century British newspaper material. The service allows visitors to browse or search through two of the most commonly used British newspaper repositories, the Times Digital Archive and the British Library’s 19th Century Newspapers, and discover the extent to which a particular text traveled across the nation between 1800 and 1900. The Scissors-and-Paste-o-Meter can be accessed at http://scissorsandpaste.net/scissors-and-paste-o-meter.
The tool aims to help researchers and students at all levels and in all fields improve and contextualise their work with 19th-century newspapers. Limited attribution within the newspapers themselves can often obscure the true origins of a piece of text. The Scissors-and-Paste-o-Meter therefore acts as a quick and unobtrusive aid to uncovering the ubiquity or uniqueness of a given snippet within these wider corpora.
With the support of the Digging into Data Challenge and Loughborough University, the tool will be expanded over the course of 2018 to include the London Gazette as well as titles held by Welsh Newspapers Online, Trove (The National Library of Australia) and Papers Past (National Library of New Zealand). If you would like to contribute to the growth of the Scissors-and-Paste-o-Meter, please feel free to contact me via email or on Twitter.
What is Scissors and Paste?
Across the nineteenth century, and indeed since the very beginning of newspaper printing in Britain, editors relied upon textual reuse and re-purposing in order to fill their pages with the latest intelligence, foreign and domestic. Colloquially known as scissors-and-paste journalism, the practice usually entailed one newspaper copying, in part or in whole, textual material from another, creating a highly decentralized, global news network of virtual correspondents. In some cases, this occurred as part of an explicit agreement between publications or news-gatherers; in others, it was part of an implicit professional norm, a (seemingly) uncontroversial practice that was deemed in the public interest. In still others, it was seen as flagrant theft of intellectual content, to be publicly shamed and (theoretically) barred by Victorian legislation.
Scissors-and-paste practices can be seen in all types of periodical material, including news, correspondence, literature, poetry, jokes and advertisements. The degree to which attribution was professionally expected is a matter of ongoing research, but even when attribution did take place, it was given unsystematically; sometimes newspapers listed the date and title of the original publication, rather than the one from which they had directly copied, while other times they offered only basic clues, such as ‘a London paper’.
This has led to a sense of frustration, and several honest mistakes, among those using newspapers as indicators of local or regional public opinion; this lack of clear attribution, alongside anonymous or pseudonymous authorship, leaves the modern reader unsure as to the true origin of a given text. Matching texts within 19th-century corpora computationally allows us to work with reprinted and reworked materials with greater confidence as to their provenance. News content, broadly defined as the time-sensitive recording of events, was likely to be reprinted quickly and to maintain a high fidelity regardless of the number of generations, making it particularly well suited for electronic discovery.
How to Use the Scissors-and-Paste-o-Meter
The Scissors-and-Paste-o-Meter can be used in one of two ways. If you have located a particular text within a supported collection, and wish to see if it is a reprint, or was reprinted elsewhere, simply enter the relevant bibliographical details (date, title and page number) into the form. You will then be presented with a list of other newspaper pages that contain at least 100 words of the same textual material. You can also browse through a listing of all pages that have either received or supplied reprinted material. A quick tutorial on using the site can be viewed below:
The Scissors-and-Paste website also acts as a repository for newspaper transcriptions that have appeared in multiple publications. If you discover an interesting series of reprints, and would like to contribute these to the collection, please contact me via email or on Twitter.
Provisos and Warnings
Unfortunately, no system of identifying textual reuse is perfect. Therefore, please carefully consider the following provisos:
- The listings provided by the Scissors-and-Paste-o-Meter are taken from an on-going research project, and may be expanded or otherwise revised at any time; links may therefore not reflect the same result on separate occasions. However, a complete, versioned archive of all listings is available via the site’s GitHub repository, which may be used to create static links.
- The results are limited to reappearances within 15 days and are best used for researching news or other time-sensitive material. Literary material may have appeared after a significantly longer delay and may not be systematically represented.
- The results are based on word-for-word reappearances and although there is a slight tolerance for variation, it will not routinely identify heavily rephrased or summarised texts.
- You can attempt to “follow” a text across several instances by clicking on the hand icon beside each result. However, it is strongly advised that you verify each match manually (see the above video). That page A has similar text to page B, and page B has similar text to page C, does not guarantee that page A has similar text to page C.
- In general, the results are based upon an automatic analysis of a limited collection of three million pages of optically-recognized transcriptions, processed by a set of historiographically-informed algorithms. That a particular text is not listed as a reprint on Scissors and Paste does not guarantee that it is unique. A full discussion of the methodology used can be viewed here.
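To make the provisos above a little more concrete, here is a minimal Python sketch of the kind of check they describe. It is not the code behind the Scissors-and-Paste-o-Meter. The `Page` class, the function names and the sample pages are all hypothetical, and the matching is deliberately stricter than the real service, since it looks only for an unbroken run of identical words and allows no tolerance for variation. Only the 100-word threshold and the 15-day window are taken from the description above.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Page:
    """A hypothetical newspaper page: title, issue date and OCR text."""
    title: str
    issued: date
    text: str


def longest_shared_run(a: str, b: str) -> int:
    """Length, in words, of the longest run of identical consecutive words."""
    words_a, words_b = a.lower().split(), b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    match = matcher.find_longest_match(0, len(words_a), 0, len(words_b))
    return match.size


def is_candidate_reprint(a: Page, b: Page,
                         min_words: int = 100, max_days: int = 15) -> bool:
    """True if the pages are close in date and share a long enough run."""
    within_window = abs((a.issued - b.issued).days) <= max_days
    return within_window and longest_shared_run(a.text, b.text) >= min_words


# Two unrelated 120-word "articles", built from made-up tokens.
article_one = " ".join(f"a{i}" for i in range(120))
article_two = " ".join(f"b{i}" for i in range(120))

page_a = Page("Paper A", date(1850, 1, 1), article_one)
page_b = Page("Paper B", date(1850, 1, 3), article_one + " " + article_two)
page_c = Page("Paper C", date(1850, 1, 5), article_two)
page_late = Page("Paper D", date(1850, 2, 1), article_one)

print(is_candidate_reprint(page_a, page_b))     # True
print(is_candidate_reprint(page_b, page_c))     # True
print(is_candidate_reprint(page_a, page_c))     # False
print(is_candidate_reprint(page_a, page_late))  # False
```

Page B carries both articles, so it matches page A and page C, yet A and C have nothing in common. This is why each step should be verified when following a text across several instances. The last line shows the effect of the date window: an identical text printed a month later is not reported.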
Although primarily a labour of love (and madness), this project was greatly facilitated by the financial, material and moral support of colleagues at Sheffield Hallam University, Loughborough University, British Library Labs, the Software Sustainability Institute and the Programming Historian. To all of you, my continuing and unreserved thanks.
All data is available CC-BY at http://osf.io/nm2rq