Acquiring and Preparing a Corpus of Texts (online RCR; GS717.07)
Before you can undertake automated text analysis, it's necessary to obtain a corpus of digitized texts and, in many instances, take steps to prepare them for further processing. This digital humanities workshop focuses on the technical, logistical, and legal dimensions of corpus development. We will explore the risks and benefits of optical character recognition (OCR); file formatting and naming issues; organization strategies for large corpora; problems of data cleaning and preparation; common sources for textual research data; and legal and ethical concerns around the use of textual corpora.
- Date: Thursday, September 21, 2023
- Time: 9:00am - 11:00am
- Categories: Digital Humanities, Digital Scholarship, ScholarWorks
Registration has closed.