17 Nov 2015 00:00
EEBO TCP Metadata Mashup
Over the past year I've spent some time recreating the metadata of Phase 1 and 2 texts from the Text Creation Partnership's hand-coded SGML files of Early English Books Online for the Early Modern Conversions digital humanities project 'Distant Reading Early Modernity' (DREaM). It's been an interesting process working through the 44,418 texts of the TCP corpus (McGill has access to Phase 2 as a TCP partner). Though it first involved marrying the TEI metadata headers with the text bodies to create a master file for each text, subsequent work has focused on each part in turn. Last year I extracted elements from each file where the "lang" attribute contained "eng" as a value, in order to cr
read more