The Verge · May 5, 2026

🚨Publishers Sue Meta Over Llama AI Training Data

Big Publishers Take on Meta Over Pirated Books

TL;DR

Five major publishers and author Scott Turow have filed a lawsuit against Meta, alleging massive copyright infringement in training its Llama AI models. The suit claims Meta used unauthorized copies of copyrighted works from pirate sites like LibGen and Sci-Hub.

Five major publishers (Macmillan, McGraw-Hill, Elsevier, Hachette, and Cengage) and author Scott Turow have filed a class action lawsuit against Meta over alleged copyright infringement in training its Llama AI models. The lawsuit alleges that Meta repeatedly copied copyrighted materials without permission when it trained Llama on the Common Crawl dataset, which includes unauthorized copies of copyrighted works. As a result, the complaint says, Llama reproduces copyrighted material verbatim or near-verbatim on request. For example, a prompt based on two sentences from a best-selling textbook reportedly triggers an exact reproduction of the text that follows. The lawsuit seeks damages and an order requiring Meta to provide a list of the books and articles used for training. If you're working with AI models trained on potentially pirated data, this case could set important legal precedents.

Key Points

1. Macmillan, McGraw-Hill, Elsevier, Hachette, and Cengage, joined by author Scott Turow, filed a class action lawsuit against Meta on October 19, 2023.

2. The lawsuit claims Meta used unauthorized copies of copyrighted works from pirate sites like LibGen and Sci-Hub to train its Llama AI models.

3. The lawsuit alleges that Meta trained Llama on the Common Crawl dataset, which includes unauthorized copies of copyrighted materials.

4. According to the complaint, Llama outputs verbatim or near-verbatim reproductions of copyrighted material when prompted with specific content.

5. The lawsuit seeks damages and an order requiring Meta to provide a list of the books and articles used to train its Llama AI models.

Why It Matters

If you're working on an AI project that relies heavily on large datasets, this case could set important legal precedents. Publishers are challenging the use of pirated materials in AI training, which may impact how companies handle copyrighted content.
