“Stolen books,” bad faith, and fair use

By Amber Lautigar Reichert | February 27, 2024

It’s Fair Use Week! UVA Library’s Director of Information Policy, Brandon Butler, penned a piece for Harvard’s Fair Use Week series titled “‘Stolen Books,’ Bad Faith, and Fair Use.” The piece examines the origins of AI training data and how they intersect with court cases such as those involving HathiTrust and Google Books. He writes: 

Artificial intelligence is sure to be the hottest topic of this year’s Fair Use Week, and that hotness is well-deserved. It’s startling when a machine can instantly create written or visual works that would ordinarily require a skilled human writer or artist.

Fair use analysis is (famously) case-by-case, and the outcome of a fair use analysis for any particular AI technology will depend on how that technology works and (especially) the nature of its outputs and the purposes it serves. But we know from the Google Books and HathiTrust cases that some unlicensed computer processing of large datasets of in-copyright works is clearly fair use. Some AI technologies are sure to pass the fair use test from those cases, all else equal. But there is one interesting difference between HathiTrust and Google Books on one hand, and some of the AI tools being sued on the other: the books used in the former cases were lawfully owned by libraries and scanned with the libraries’ consent. It’s not clear that the AI companies have obtained all of their data with as clear a pedigree.

Indeed, one of the author class action lawsuits over AI argues that the datasets used to train some artificial intelligence tools are comprised partly or entirely of material of apparently dubious origin. As The Verge reports, the plaintiffs claim that some of the AI training data “were acquired from ‘shadow library’ websites like Bibliotik, Library Genesis, Z-Library, and others, noting the books are ‘available in bulk via torrent systems.’” Does this matter for the fair use calculus? Should it?

Read the full article on Harvard’s Fair Use Week blog.

For more Fair Use Week content, like “Fair Use Week 2024: The Taper’s Greatest Fair Use Hits, and a Taper Swan Song,” visit The Taper.
