Getting to Text Analysis: Cleaning and Structuring Data for the TOC Project [Munch & Mull Digital Scholarship Discussion Group]

Text mining projects and other digital humanities endeavors have long held the interest of librarians and researchers alike. The allure of using digital tools to help analyze a large, difficult corpus is powerful, but the important tasks of data cleaning, data processing, and data structuring are often overlooked. What do these processes entail? How long do they take? What is "structured" data anyway, and how can researchers get there with their own data?

Arianne Hartsell-Gundy (Librarian for Literature & Theater Studies and Head Humanities Librarian), Heidi Madden (Librarian for Western European and Medieval/Renaissance Studies), and UNC SILS student Ellen Cline will discuss their experiences with data structuring in their work on the TOC project. They used front matter from materials in the German National Library in a text mining project to better understand literary trends. We will talk about specifics from the TOC project and data structuring in general, and we will continue our discussion of text mining by building on the following articles from our previous Digging Deeper workshop:

Denny, M. J., & Spirling, A. (2017). Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It. https://ssrn.com/abstract=2849145

Rawson, K., & Muñoz, T. (2016). Against Cleaning. Retrieved August 16, 2017, from http://curatingmenus.org/articles/against-cleaning/

Rockwell, G. (2003). What is Text Analysis, Really? Literary and Linguistic Computing, 18(2), 209–219. https://doi.org/10.1093/llc/18.2.209

Date:
Monday, February 5, 2018
Time:
12:00pm - 1:00pm
Location:
Bostock 121 (Murthy Digital Studio)
Campus:
West Campus
Categories:
Digital Scholarship, Events @ the Edge

Munch & Mull is a Libraries-based discussion group that holds weekly, informal, brown-bag lunch conversations around issues, projects, methods, and trends in digital scholarship. All are welcome!

The current M&M schedule of talks is on the Digital Scholarship Services website, https://library.duke.edu/digital/events. For more information about upcoming discussions, join our listserv: https://lists.duke.edu/sympa/subscribe/munch-mull-digihum-reading-group.

Event Organizer

Arianne Hartsell-Gundy
Digital Scholarship Services