Reported EU legislation to disclose AI training data could trigger copyright lawsuits

A late provision reportedly added to the EU’s forthcoming AI Act would force companies like OpenAI to disclose their use of copyrighted training data. Already, a number of high-profile AI firms have been hit by copyright lawsuits.

By James Vincent, a senior reporter who has covered AI, robotics, and more for eight years at The Verge.

Apr 28, 2023, 3:53 PM UTC


The current AI boom, from Bing to Midjourney, relies on free access to training data, much of it scraped from the web and often protected by copyright. The use of this data has drawn both criticism and lawsuits, particularly from the art world, with rights holders arguing that their work is being exploited without their permission.

Some of the AI world’s biggest players, like OpenAI, have avoided scrutiny by simply refusing to detail the data used to create their software. But legislation proposed in the EU to regulate AI (the long-building and far-reaching AI Act) could force companies to disclose this information, according to reports from Reuters and Euractiv.

The amendment was reportedly a late addition to the draft AI Act

Reuters says late amendments to the AI Act, which legislators approved in draft form earlier this week, will require "companies deploying generative AI tools, such as ChatGPT ... to disclose any copyrighted material used to develop their systems." Earlier this month, Euractiv reported on the same provision, saying companies would have to "make publicly available a summary disclosing the use of training data protected under copyright law." Reuters, citing "sources familiar with the discussions," says the amendment was "a late addition drawn up within the past two weeks."

The details of this requirement are unknown, and the law may change during the coming closed-door negotiations, known as trilogues, needed to finalize the act. But if AI companies are forced to disclose the sources of their training data, it could open the door to numerous lawsuits that would affect some of the biggest names in tech.

Already, companies like Getty Images are suing the makers of image-generating AI for scraping their data without permission, and a small number of class action lawsuits target image- and code-generating AI. However, the biggest name in AI today, OpenAI (maker of ChatGPT, GPT-4, and DALL-E, and the power behind Microsoft's AI push), is extremely secretive about its data sources. The reported legislation could change this, providing evidence for lawsuits and giving leverage in negotiations to organizations such as media companies, whose data is being used and referenced by numerous chatbots.

Although the potential impact of the law will depend on its details, the rest of the EU’s AI Act is certain to have similarly broad effects on the fast-changing AI landscape.

The act will classify AI systems based on their perceived risk and require the companies responsible for building the most impactful tools to disclose important data about safety, interpretability, performance, and so on. As with previous tech regulation pushed by the EU, the AI Act will undoubtedly have a global effect on how tech companies do business. Lawmakers in the EU will continue to discuss details of the act throughout the year, though compliance for companies will likely not come into force until 2025 or later.
