The Public Interest Corpus is focused on developing large-scale, high-quality AI training data from the world’s memory organizations to serve the public interest.
Memory organizations like libraries and archives are in a prime position to address AI training data challenges, given their multi-century focus on developing high-quality, locally and globally comprehensive collections of books, newspapers, scholarly journals, photographs, manuscript materials, and more.
We seek to prioritize uses of The Public Interest Corpus that promote learning, access to knowledge, and broad benefits to the public.
About the Project
The Public Interest Corpus was launched in December 2024 by Authors Alliance and Northeastern University with support from the Mellon Foundation.
The project team consists of Dave Hansen, Executive Director of Authors Alliance; Dan Cohen, Vice Provost for Information Collaboration, Dean of the Library, and Professor of History at Northeastern University; Thomas Padilla, Public Interest AI Strategist; and Giulia Taurino, Project Coordinator.
To help guide the project, we have assembled an Advisory Board:
- David Bamman, Associate Professor, UC Berkeley School of Information
- Sandra Aya Enimil, Director of Scholarly Communications and Collection Strategy, Yale University Library
- Mike Furlough, Executive Director, HathiTrust
- David Smith, Associate Professor, Khoury College of Computer Sciences, Northeastern University
- Claire Stewart, Juanita J. and Robert E. Simpson Dean of Libraries and University Librarian, University of Illinois, Urbana-Champaign
- Mehtab Khan, Assistant Professor of Law, Cleveland State University College of Law
- Rachael Samberg, Director, Scholarly Communications and Information Policy, UC Berkeley Library
- Robin Sloan, New York Times best-selling science fiction author
- Günter Waibel, Associate Vice Provost & Executive Director, California Digital Library
- Martha Whitehead, Vice President for the Harvard Library and University Librarian and Roy E. Larsen Librarian for the Faculty of Arts and Sciences, Harvard University
- John Wilkin, CEO, Lyrasis
- Suzanne Wones, University Librarian, UC Berkeley Library
- Ted Underwood, Professor of Information Science and English, University of Illinois at Urbana-Champaign