For the past few months or so, when I've had the chance in between teaching, research, and my own work, I've been assisting the Early Modern Conversions project here at McGill in building a corpus tool for Early English Books Online, using the data from the Text Creation Partnership (EEBO-TCP). It's been interesting: the objective is to create texts that have some measure of orthographic consistency so that large-scale text-analysis methods, topic modelling for instance, can be run on them. Because of the variants in spelling, scholars normally can't do much with these texts. We've been using a tool called VARD2, which uses statistical analysis to alter variant spellings in early modern