Adrian’s PhD research had used a number of Natural Language Processing (NLP) systems to search for toponymic (place name) mentions in Online Social Network (OSN) interactions sourced from Twitter and Facebook:
- GATE Desktop and GATE Cloud, two long-established open-source projects from the University of Sheffield
- CLAVIN-rest, a RESTful API built on top of Berico Technologies’ (now Novetta’s) CLAVIN geoparsing and georesolution library
- IBM’s Alchemy API, subsequently rebranded Watson Natural Language Understanding
As part of a separate research project, these technologies were applied to Professor Richard Healey’s research into the Illinois Central Rail Road (ICRR) and the Chicago and North Western Rail Road (CNWRR).
Digitised County History documents from the Internet Archive were downloaded, e.g.:
Scanned documents were processed with Optical Character Recognition (OCR) software and searched for the words:
- railroad
- railroads
- brakeman
- engineer
- conductor
- fireman
- railroad clerk
- baggage master
- depot master
- railroad agent
- railroad superintendent
- railroad machinist
- railroad shops
These terms, which are relevant to Richard’s research, could be located in the OCR’d texts and, with some further development to jump to the relevant pages, used to speed up the research process.
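The term search described above can be illustrated with a minimal Python sketch. It is not the project's actual tooling. It assumes the OCR output is already available as one text string per page, and the names `compile_term`, `index_terms` and `sample_pages` are hypothetical, introduced only for illustration. The sample pages are invented, not taken from the County History documents.

```python
import re

# The search terms listed above.
SEARCH_TERMS = [
    "railroad", "railroads", "brakeman", "engineer", "conductor", "fireman",
    "railroad clerk", "baggage master", "depot master", "railroad agent",
    "railroad superintendent", "railroad machinist", "railroad shops",
]


def compile_term(term):
    """Build a case-insensitive, whole-word pattern for a term.

    Words in a phrase may be separated by any whitespace, so a phrase
    split across an OCR line break is still matched.
    """
    words = [re.escape(word) for word in term.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def index_terms(pages, terms=SEARCH_TERMS):
    """Map each term to the 1-based page numbers on which it occurs."""
    patterns = {term: compile_term(term) for term in terms}
    hits = {}
    for page_number, text in enumerate(pages, start=1):
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits.setdefault(term, []).append(page_number)
    return hits


# Invented sample pages (assumption: one string per OCR'd page).
sample_pages = [
    "He settled in the county in 1854 and took up farming.",
    "He was a brakeman, later a conductor, on the Illinois Central Railroad.",
    "Appointed railroad\nagent at the depot; the railroads brought trade.",
]

page_index = index_terms(sample_pages)
for term, page_numbers in sorted(page_index.items()):
    print(term, page_numbers)
```

The resulting term-to-page index is the kind of lookup that a page-jumping feature could be built on. Whole-word matching keeps "railroad" and "railroads" as separate entries, mirroring the list above. Real OCR output would probably also need tolerance for misrecognised characters, which this sketch does not attempt.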