Getting to Text Analysis: Cleaning and Structuring Data for the TOC Project [Munch & Mull Digital Scholarship Discussion Group]

Text mining projects and other digital humanities endeavors have long held the interest of librarians and researchers alike. The allure of using digital tools to help analyze a large, difficult corpus is powerful, but the important tasks of data cleaning, processing, and structuring are often overlooked. What do these processes entail? How long do they take? What is "structured" data anyway, and how can researchers get there with their own data?


Arianne Hartsell-Gundy (Librarian for Literature & Theater Studies and Head Humanities Librarian), Heidi Madden (Librarian for Western European and Medieval/Renaissance Studies), and UNC SILS student Ellen Cline will discuss their experiences with data structuring in their work on the TOC project. They used front matter from materials in the German National Library in a text mining project to better understand literary trends. We will discuss specifics from the TOC project and data structuring in general, and continue our discussion of text mining by building on the following articles from our previous Digging Deeper workshop:


Denny, M. J., & Spirling, A. (2017). Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It. https://ssrn.com/abstract=2849145


Rawson, K., & Muñoz, T. (2016). Against Cleaning. Retrieved August 16, 2017, from http://curatingmenus.org/articles/against-cleaning/


Rockwell, G. (2003). What is Text Analysis, Really? Literary and Linguistic Computing, 18(2), 209–219. https://doi.org/10.1093/llc/18.2.209


Monday, February 5, 2018
12:00pm - 1:00pm
Bostock 121 (Murthy Digital Studio)
West Campus
Digital Scholarship, Events @ the Edge

Munch & Mull is a Libraries-based discussion group that holds weekly, informal, brown-bag lunch conversations around issues, projects, methods, and trends in digital scholarship. All are welcome!

The current M&M schedule of talks is on the Digital Scholarship Services website, https://library.duke.edu/digital/events. For more information about upcoming discussions, join our listserv: https://lists.duke.edu/sympa/subscribe/munch-mull-digihum-reading-group.

Event Organizer

Arianne Hartsell-Gundy
Digital Scholarship Services