Newspaper Dissector

A Windows-Python programme for categorising and visualising newspaper pages by topic, word count and source. It is based upon code built at the Software Sustainability Institute’s 2017 Collaboration Workshop with the support of Geraint Palmer and Vince Knight (Cardiff University).

Software: MIT License (2018) M. H. Beals
Instructions: CC-BY 4.0 (2018) M. H. Beals


The Newspaper Dissector can be downloaded directly here.


This programme requires


To use the Newspaper Dissector, place the executable file (NewspaperDissector.exe) in a directory/folder with one or more tab-separated (.tsv) data files. The files should have no headers and include only the raw data.

Each row should include the following fields:

  • Year
  • Month
  • Day
  • Title
  • Page number (front to back)
  • Column number (left to right)
  • Snippet number (top to bottom)
  • Type (advert, commentary, miscellany, news, numerical)
  • Topic (open-ended)
  • Action Location (open-ended)
  • Source Location (open-ended)
  • Source Type (l, a, t, c, u)
    • l = location
    • a = ambiguous publication
    • t = named publication
    • c = correspondence
    • u = undetermined
  • Source Location Type (l, r, n, i, u)
    • l = local
    • r = regional
    • n = national
    • i = international / colonial
    • u = undetermined
  • Text
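
The field list above can be checked before running the programme. The following is a minimal sketch only, not part of the Newspaper Dissector itself: the names FIELDS, check_row and check_tsv are hypothetical, and it assumes that the six date and position fields are whole numbers and that the Type and code values may be compared case-insensitively.

```python
import csv
import io

# Field order follows the list above; these Python names are illustrative only.
FIELDS = [
    "year", "month", "day", "title", "page", "column", "snippet",
    "type", "topic", "action_location", "source_location",
    "source_type", "source_location_type", "text",
]
TYPES = {"advert", "commentary", "miscellany", "news", "numerical"}
SOURCE_TYPES = {"l", "a", "t", "c", "u"}
SOURCE_LOCATION_TYPES = {"l", "r", "n", "i", "u"}


def check_row(row):
    """Return a list of problems found in one row (empty if none)."""
    if len(row) != len(FIELDS):
        return [f"expected {len(FIELDS)} fields, found {len(row)}"]
    record = dict(zip(FIELDS, row))
    problems = []
    # Assumption: dates and positions are written as plain whole numbers.
    for name in ("year", "month", "day", "page", "column", "snippet"):
        if not record[name].strip().isdigit():
            problems.append(f"{name} is not a whole number: {record[name]!r}")
    # Assumption: values are compared case-insensitively.
    if record["type"].strip().lower() not in TYPES:
        problems.append(f"unknown type: {record['type']!r}")
    if record["source_type"].strip().lower() not in SOURCE_TYPES:
        problems.append(f"unknown source type: {record['source_type']!r}")
    if record["source_location_type"].strip().lower() not in SOURCE_LOCATION_TYPES:
        problems.append(
            f"unknown source location type: {record['source_location_type']!r}"
        )
    return problems


def check_tsv(handle):
    """Yield (line_number, problems) for each row that has problems."""
    reader = csv.reader(handle, delimiter="\t")
    for number, row in enumerate(reader, start=1):
        problems = check_row(row)
        if problems:
            yield number, problems
```

To check a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the handle to check_tsv; the encoding is an assumption and should match however the data files were saved.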

An example line of the file, with the fields separated by tab characters, is as follows:

1820 6 15 Caledonian Mercury 1 1 1 Advert Meeting Dumfries Dumfries l r "This is an example article text"
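
Because tab characters are hard to see on a page, the short sketch below builds the same example line in Python with the separators written out explicitly. It is illustrative only; the quotation marks around the text are reproduced as they appear in the example, and whether the programme requires them is not stated here.

```python
# The fourteen fields of the example line, in the order listed above.
fields = [
    "1820", "6", "15", "Caledonian Mercury", "1", "1", "1",
    "Advert", "Meeting", "Dumfries", "Dumfries", "l", "r",
    '"This is an example article text"',
]

# Join the fields with single tab characters to form one row of the file.
line = "\t".join(fields)

# repr() shows each tab as \t so the separators are visible.
print(repr(line))
```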

An example of the results can be seen at:

Beals, M. H. (2018): 16 Re-Visualisations of The Caledonian Mercury, 14 June 1830, by Source Type and Source Location with legends and associated data. figshare. Fileset.
