
(Online; RCR) Digital Humanities Research: Acquiring and Preparing a Corpus of Texts

Before you can undertake automated text analysis, you need to obtain a corpus of digitized texts and, in many cases, prepare them for further processing. This hands-on digital humanities workshop focuses on the technical dimensions of corpus development. We will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; problems of data cleaning and preparation; common sources of textual research data; and the legal concerns that typically arise when using textual corpora.

Registered participants will receive a Zoom link the day before the workshop; this event is offered for RCR credit as GS717.07.

Tuesday, March 21, 2023
9:30am - 11:30am
Digital Humanities, Digital Scholarship, ScholarWorks
Registration has closed.

Event Organizer

Will Shaw

Digital Humanities Consultant, Duke University Libraries