[RCR; Online] Digital Humanities Text Analysis: Building a Corpus

Before you can undertake computational text analysis, you need to obtain a corpus of digitized texts and, in many instances, prepare those texts for further processing. This digital humanities workshop focuses on the technical and ethical dimensions of corpus development. We will explore:

  • the risks, benefits, and implications of depending on optical character recognition (OCR) to transcribe text;
  • best practices for preserving the integrity and usability of a corpus via file formatting, naming, and organization choices;
  • the ethics of data cleaning and preparation;
  • common sources for textual research data; and
  • ways in which AI can (and can't) assist with these challenges, and whether it should.

Note: No previous experience with any of these topics is assumed, but this workshop includes hands-on exploration in small groups and requires active participation.

Date:
Monday, April 7, 2025
Time:
10:00am - 12:00pm
Categories:
Digital Humanities   Digital Scholarship   ScholarWorks  

Registration is required. There are 49 seats available.

Event Organizer

Will Shaw

Digital Humanities Consultant, Duke University Libraries