
Mining Text and Data, Legally (M&M Digital Scholarship Group)

Researchers and library staff alike can find it difficult to navigate the copyright and license restrictions around text and data mining. In some cases, researchers may be unaware of these limits until they attempt to publish or share the data on which their scholarship is based. Libraries often confront these challenges earlier, when negotiating license agreements for databases; for instance, Peter McCracken and Emma Raub detail these issues from the libraries' perspective in their 2023 article, "Licensing Challenges Associated with Text and Data Mining." Working on behalf of authors, the Authors Alliance lobbied successfully for an exemption to the Digital Millennium Copyright Act (DMCA) that permits text and data mining research on electronic books and films, albeit with some limitations.

In what ways does the Duke community seek to analyze corpora (texts, datasets, images, and more)? How can we help them understand and navigate their rights? Where do we see a need for advocacy and reform to pursue and support this research?

This conversation, led by Will Shaw (Digital Humanities Consultant) and Kate Dickson (Copyright Librarian), will focus primarily on the legal, ethical, and practical issues of using software to extract data from websites at scale (i.e., web scraping). The presentation will detail the specific laws implicated by web scraping, the gaps in what current U.S. law addresses (e.g., the still-evolving understanding of copyright and AI), and best practices that can help guide researchers’ approach to web scraping. Feel free to share questions ahead of time or bring them to the discussion.

Date: Monday, April 8, 2024
Time: 12:00pm - 1:00pm
Categories: Digital Scholarship & Publishing Services
Registration has closed.

Munch & Mull is a Libraries-centered, informal, and thought-provoking conversation series about digital scholarship, digital humanities, and publishing. For updates on meetings and topics, subscribe to our mailing list at https://lists.duke.edu/sympa/subscribe/digital-librarians.

Event Organizer

Will Shaw

Digital Humanities Consultant, Duke University Libraries

Liz Milewicz

Director, The ScholarWorks Center for Open Scholarship