Summary
A new study suggests that OpenAI trained at least some of its AI models on copyrighted content, raising fair use concerns. The authors propose a method for identifying training data 'memorized' by the models and report signs of memorization in GPT-4 and GPT-3.5.
Key Points
The study proposes a new method for identifying training data 'memorized' by AI models
GPT-4 and GPT-3.5 models showed signs of having memorized portions of popular fiction books and New York Times articles
OpenAI has long advocated for looser restrictions on developing models using copyrighted data
Why It Matters
The study's findings highlight the need for greater transparency about AI training data and add to fair use concerns about how AI models are developed.
Author
Kyle Wiggers