Acquiring and Preparing a Corpus of Texts (Digital Humanities Workshop Series: Text/Data)

This session focuses on the technical dimensions of corpus development. Using an array of printed matter, from digital facsimiles of incunabula to modern letterpress and offset books, we will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; and problems of data cleaning and preparation. We will also look at some common sources for textual research data, such as Project Gutenberg, the Internet Archive, and Google Books, and discuss common legal concerns around the use of textual corpora.

** This workshop is offered for RCR credit as GS712.15. Participants who plan to receive RCR credit (as indicated on the registration form) will receive priority registration.

Date:
Wednesday, February 7, 2018
Time:
9:00am - 11:00am
Location:
Bostock 121 (Murthy Digital Studio)
Campus:
West Campus
Categories:
Digital Scholarship  
Registration has closed.

Event Organizer

Will Shaw

Digital Humanities Consultant, Duke University Libraries

Digital Scholarship Services