Getting to Text Analysis: Cleaning and Structuring Data for the TOC Project [Munch & Mull Digital Scholarship Discussion Group]
Text mining projects and other digital humanities endeavors have long held the interest of librarians and researchers alike. The allure of using digital tools to help analyze a large, difficult corpus is powerful, but the important tasks of data cleaning, data processing, and data structuring are often overlooked. What do these processes entail? How long do they take? What is "structured" data anyway, and how can researchers get there with their own data?
Arianne Hartsell-Gundy (Librarian for Literature & Theater Studies and Head Humanities Librarian), Heidi Madden (Librarian for Western European and Medieval/Renaissance Studies), and UNC SILS student Ellen Cline will discuss their experiences with data structuring in their work on the TOC project. They used front matter from materials in the German National Library in a text mining project to better understand literary trends. We will talk about specifics from the TOC project and about data structuring in general, and we will continue our discussion of text mining by building on the following articles from our previous Digging Deeper workshop:
Denny, M. J., & Spirling, A. (2017). Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads, and What to Do about It. https://ssrn.com/abstract=2849145
Rawson, K., & Muñoz, T. (2016). Against Cleaning. Retrieved August 16, 2017, from http://curatingmenus.org/articles/against-cleaning/
Rockwell, G. (2003). What is Text Analysis, Really? Literary and Linguistic Computing, 18(2), 209–219. https://doi.org/10.1093/llc/18.2.209
- Date:
- Monday, February 5, 2018
- Additional dates:
- Monday, February 12, 2018
- Monday, February 19, 2018
- Monday, February 26, 2018
- Monday, March 5, 2018
- Monday, March 12, 2018
- Monday, March 19, 2018
- Monday, March 26, 2018
- Monday, April 9, 2018
- Monday, April 23, 2018
- Monday, May 14, 2018
- Monday, May 21, 2018
- Time:
- 12:00pm - 1:00pm
- Location:
- Bostock 121 (Murthy Digital Studio)
- Campus:
- West Campus
- Categories:
- Digital Scholarship Events @ the Edge
Munch & Mull is a Libraries-based discussion group that holds weekly, informal, brown-bag lunch conversations around issues, projects, methods, and trends in digital scholarship. All are welcome!
The current Munch & Mull schedule of talks is on the Digital Scholarship Services website: https://library.duke.edu/digital/events. For more information about upcoming discussions, join our listserv: https://lists.duke.edu/sympa/subscribe/munch-mull-digihum-reading-group.