Distant Reading Early Modernity
DREaM is a sub-project of the McGill-based SSHRC Partnership project, Early Modern Conversions. We're working through EEBO TCP phases 1 and 2, normalizing the texts and finding new things to do with the metadata. I'm part of the Digital Humanities Team. For more information, see http://earlymodernconversions.com/introducing-dream/
Over the past year or so I've worked through the EEBO TCP corpus of some 44,418 texts to normalize the spelling (to a certain degree) and to enrich the existing EEBO TCP metadata with Linked Open Data and identifiers. For DREaM, this will let us create a corpus analytic interface that is searchable in new ways, such as by gender and date of birth. My longer-term objective, however, is to clean up the EEBO TCP metadata so I can use it as seed data for Nano-history.org.
I'm nearing completion of the first revision of the metadata. It's been a lengthy process. VIAF data was easy to work with until I found discrepancies between the API and the VIAF RDF XML: co-author names were appearing in the API results as canonical names. Not very good! But I think I've managed an initial workaround, and I'm reporting the errors to OCLCDevNet as I go.
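To give a rough idea of the kind of check involved, here is a minimal sketch in Python. It is not my actual workaround, and it makes no calls to VIAF: it assumes the name returned by the API and the name taken from the RDF XML have already been collected into two plain dictionaries keyed by identifier. The function names, the dictionary layout and the placeholder identifiers are all hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Reduce a name heading to lowercase letters and single spaces.

    Accents, punctuation and digits (such as life dates) are dropped so
    that trivial formatting differences are not reported as discrepancies.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalpha() or c.isspace() else " " for c in stripped)
    return " ".join(kept.lower().split())


def find_discrepancies(api_names, rdf_names):
    """Return the identifiers whose names differ between the two sources.

    Both arguments are hypothetical dicts of identifier -> name string.
    Identifiers missing from rdf_names are skipped rather than flagged.
    """
    mismatches = {}
    for identifier, api_name in api_names.items():
        rdf_name = rdf_names.get(identifier)
        if rdf_name is None:
            continue
        if normalize_name(api_name) != normalize_name(rdf_name):
            mismatches[identifier] = (api_name, rdf_name)
    return mismatches


# Placeholder identifiers and illustrative names, not real VIAF records.
api_names = {
    "id-1": "Shakespeare, William, 1564-1616",
    "id-2": "Fletcher, John",
}
rdf_names = {
    "id-1": "Shakespeare, William, 1564-1616.",
    "id-2": "Beaumont, Francis",
}
flagged = find_discrepancies(api_names, rdf_names)
```

Anything that ends up in `flagged` still needs a human look: the sketch only says the two sources disagree, not which one is right.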
Work continues on cleaning up the EEBO TCP publication data; this time I'm mining for addresses. Some of these are contained in the Stationers' records, but the online versions of the text are somewhat messy. Moreover, the process has highlighted a critical issue: the lack of a useful approach to modelling historical addresses in an age without street numbers.
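One possible starting point, sketched in Python, is to stop treating an address as a street and a number and instead record a shop sign, a landmark and the relation between them. This is only an illustration under my own assumptions, not a settled model: the `HistoricalAddress` class, its field names and the handful of phrases the regular expressions recognise are all hypothetical, and the parser expects just the address portion of an imprint (no printer or bookseller names).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoricalAddress:
    """A location described relative to signs and landmarks."""
    raw: str                        # the wording as it appears in the source
    sign: Optional[str] = None      # e.g. "Crown"
    relation: Optional[str] = None  # e.g. "in", "near", "over against"
    landmark: Optional[str] = None  # e.g. "St. Paul's Churchyard"


# "at the sign(e) of the X", stopping at a comma, a relation word or the end.
_SIGN = re.compile(
    r"\bat the signe? of the (?P<sign>.+?)"
    r"(?=,| in | near | over against | by |$)",
    re.IGNORECASE,
)
# A relation word followed by the landmark it refers to.
_PLACE = re.compile(
    r"\b(?P<relation>over against|near|in|by) (?P<landmark>[^,]+)",
    re.IGNORECASE,
)


def parse_address(raw):
    """Split an address phrase into sign, relation and landmark."""
    text = " ".join(raw.split()).rstrip(".")
    sign = _SIGN.search(text)
    # Look for the landmark only after the sign, if there is one.
    place = _PLACE.search(text, sign.end() if sign else 0)
    return HistoricalAddress(
        raw=raw,
        sign=sign.group("sign") if sign else None,
        relation=place.group("relation").lower() if place else None,
        landmark=place.group("landmark").strip() if place else None,
    )


example = parse_address("at the sign of the Crown in St. Paul's Churchyard.")
```

Keeping the `raw` wording alongside the parsed fields matters here: the parse is a guess, and the original phrasing is the evidence. A fuller model would presumably also need date ranges, since signs and landmarks changed over time.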