Getting to Text Analysis: Cleaning and Structuring Data for the TOC Project [Munch & Mull Digital Scholarship Discussion Group]
Text mining projects and other digital humanities endeavors have long held the interest of librarians and researchers alike. The allure of using digital tools to help analyze a large, difficult corpus is powerful, but the important tasks of data cleaning, data processing, and data structuring are often overlooked. What do these processes entail? How long do they take? What is "structured" data anyway, and how can researchers get there with their own data?
Arianne Hartsell-Gundy (Librarian for Literature & Theater Studies and Head Humanities Librarian), Heidi Madden (Librarian for Western European and Medieval/Renaissance Studies), and UNC SILS student Ellen Cline will discuss their experiences with data structuring in their work on the TOC project. They used front matter from materials in the German National Library in a text mining project to better understand literary trends. We will talk about specifics from the TOC project and about data structuring in general, and we will continue our discussion of text mining by building on the following articles from our previous Digging Deeper workshop:
Munch & Mull is a Libraries-based discussion group that holds weekly, informal, brown-bag lunch conversations around issues, projects, methods, and trends in digital scholarship. All are welcome!
The current M&M schedule of talks is on the Digital Scholarship Services website, https://library.duke.edu/digital/events. For more information about upcoming discussions, join our listserv: https://lists.duke.edu/sympa/subscribe/munch-mull-digihum-reading-group.