Acquiring and Preparing a Corpus of Texts (online RCR; GS717.07)

Before you can undertake automated text analysis, you need to obtain a corpus of digitized texts and, in many instances, prepare them for further processing. This digital humanities workshop focuses on the technical, logistical, and legal dimensions of corpus development. We will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; problems of data cleaning and preparation; common sources for textual research data; and legal and ethical concerns around the use of textual corpora.

Thursday, September 21, 2023
9:00am - 11:00am
Digital Humanities, Digital Scholarship, ScholarWorks
Registration has closed.

Event Organizer

Will Shaw

Digital Humanities Consultant, Duke University Libraries