As AI models increasingly rely on copyrighted content for training, the EU AI Act and upcoming legal rulings—like Getty Images v. Stability AI—could trigger a wave of litigation. This article explores the legal, financial, and operational risks now facing AI developers.
AI developers should prepare for increased legal scrutiny by auditing their training data sources, especially ahead of the EU AI Act’s August 2025 disclosure deadline. Copyright holders, meanwhile, should monitor disclosures and be ready to challenge unauthorised use.
The use of copyright works to train models for AI is a vexed subject. In the last six months in the UK alone we’ve seen national newspapers campaign against the technology, more than 1,000 musicians release a silent album in protest, and a government consultation which received 11,500 responses.
Over the course of the next two months there are likely to be two significant further developments which may finally tip matters into litigation between copyright owners and AI developers, at least in the UK: the implementation of aspects of the EU AI Act and judgment in Getty Images v. Stability AI.
This could create a new source of revenue for copyright owners, reshape the practices of AI developers and have a chilling effect on AI development in certain jurisdictions. Here we explore the factors that are aligning and the practical implications.
To date, many copyright and database right holders do not know whether their intellectual property has been used to train AI models. Put simply, there is no public record of which copyright works and databases have been used.
This is a problem for rights holders in the EU. While the EU has a data mining exemption for commercial use (Art. 4, Directive 2019/790)[1], it is subject to an opt-out, and rights holders do not know whether the opt-out is being respected.
The EU AI Act seeks to remedy this. It does so by imposing an obligation on developers of general-purpose AI models[2] to publicly share a summary of the data used to train the model (Art. 53(1)(d), Regulation (EU) 2024/1689) and to put in place a policy to comply with copyright law, including the opt-out (Art. 53(1)(c)). These obligations come into force on 2 August 2025.
On its face, this is sensible. AI developers in the EU would only have an issue if they had been mining data which is the subject of an opt-out, in breach of the data mining exemption. However, this may inadvertently expose those developers outside of the EU (in jurisdictions which do not have a data mining exemption) to litigation.
By way of background, the territorial reach of the EU AI Act is broad. In particular, it applies to: “(a) [developers] placing on the market or putting into service AI systems or placing on the market general-purpose AI models in the Union, irrespective of whether those [developers] are established or located within the Union or in a third country” (Art. 2(1)). Therefore, even if a model is trained in Australia, Canada, Japan, Singapore, South Africa, the UK or the USA, if it is placed on the market in the EU, it is likely to fall within the scope of the EU AI Act.
Where a commercial data mining exemption (or equivalent) exists (such as Japan, Singapore or the USA) this may not be an issue. Subject to any contractual rights, publication of a summary of the data used for training the model may not result in liability in those jurisdictions. Developers who have mined data and/or trained models in those jurisdictions may be able to rely on the national data mining exemption (or equivalent).
However, where a commercial data mining exemption (or equivalent) does not exist (such as in Australia, Canada, South Africa or the United Kingdom), publication in the EU of the summary of the data used to train the model may result in the developer itself publishing: a) the rights it has been infringing; and b) the scale of that infringement, in those jurisdictions. This could result in a wave of litigation.
For example, let us take a national newspaper in the UK and assume that it has published its articles (which attract copyright) on its website for free (namely, no paywall). With the average newspaper producing over 100,000 articles a year, this is a rich repository of high-quality data.
Let us further assume that an AI developer has mined this data in the UK for the last six years to train a generative AI model, which falls within the definition of a general-purpose AI model, and which has subsequently been put on the market in the UK and EU. The copying of these articles is likely to be an act of copyright infringement. To date, the developer has avoided enforcement proceedings because the national newspaper has no evidence of the developer’s use. However, because the developer’s generative AI has been placed on the market in the EU, come 2 August 2025, it has to disclose its copying.
If infringement is established, the consequences could be severe. In addition to delivery-up/destruction, costs and adverse publicity, an injunction may be granted which covers the generative AI. Further, the developer may also have to pay damages or an account of profits. Assuming a reasonable royalty rate of, say, £1 per article (it may be significantly more) and 1 million articles copied, our national newspaper may be owed £1 million.
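The damages arithmetic in this hypothetical is simple enough to set out explicitly. A minimal sketch, where the royalty rate and article count are the illustrative assumptions from the example above rather than figures from any judgment:

```python
# Illustrative damages estimate for the hypothetical newspaper example.
# Both figures are assumptions from the worked example, not real data.
ROYALTY_PER_ARTICLE_GBP = 1.00   # assumed "reasonable royalty" (may be significantly more)
ARTICLES_COPIED = 1_000_000      # assumed number of articles copied over six years

damages_gbp = ARTICLES_COPIED * ROYALTY_PER_ARTICLE_GBP
print(f"Estimated damages: £{damages_gbp:,.0f}")  # Estimated damages: £1,000,000
```

Doubling the assumed royalty rate, or the article count, scales the exposure linearly — which is why the choice of royalty rate would be hotly contested in any quantum hearing.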
The above example is, of course, simplified and assumes that the data mining, training and use all took place in the UK. In practice, this is unlikely: most data is mined, and most models are trained, outside of the UK.
Here, the judgment in Getty Images v. Stability AI (due shortly) will be instructive. This is a claim by Getty Images against Stability AI, in which it is alleged that Stability AI used Getty Images’ images to train its model for Stable Diffusion (an image generator).
Initially, Getty Images alleged that the primary infringement (namely copying) took place in the UK, asserting that the data mining and model training took place in the jurisdiction. At trial, Getty Images struggled to establish this and in closing submissions, it withdrew the allegation. It explained it had taken "the pragmatic decision to pursue only the claims for … secondary infringement of copyright" explaining that while "evidence confirms that the acts complained of within this claim did occur", there was "no Stability witness ... able to provide clear evidence as to the development and training process from start to finish – only evidence that these acts occurred outside the jurisdiction".
As to secondary infringement, Getty Images relies on the following:
Liability for secondary infringement is unclear, will vary depending on the jurisdiction and is fact specific. For example, some of the issues the court will need to grapple with in Getty Images are:
However, if Getty Images is right, it follows that any AI developer that implements its AI in the UK may be liable for the mining of and training with unauthorised works in other jurisdictions.
Whether this comes to pass, of course, is likely to turn on the detail to be provided by the AI developer when summarising the data used for training the model.
Recital 107 of the EU AI Act says that the information should be “generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law”. It elaborates that it should list “the main data collections or sets that went into training the model, such as large private or public databases or data archives, and [provide] a narrative explanation about other data sources used.”
The EU AI Office is supposed to provide a standard template to be used. However, with less than a month to go, it has not done so.
Absent a template and with the obligation ambiguous, it may be that AI developers opt for very vague declarations in the knowledge that the EU AI Office and national supervisory authorities are unlikely to enforce the issue and, if they do, may not impose (significant) fines (although, in principle, these can be up to the higher of €10 million or 2% of the AI developer’s worldwide annual turnover).
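The fine ceiling described above — the higher of €10 million or 2% of worldwide annual turnover — can be sketched as a simple calculation; the turnover figures below are illustrative assumptions, not real developers:

```python
# Penalty ceiling under the EU AI Act as described above: the higher of
# EUR 10 million or 2% of worldwide annual turnover.
FIXED_CAP_EUR = 10_000_000
TURNOVER_SHARE = 0.02

def maximum_fine(worldwide_annual_turnover_eur: float) -> float:
    """Return the cap on fines for a developer with the given turnover."""
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * worldwide_annual_turnover_eur)

# Smaller developer: 2% of EUR 100m is EUR 2m, so the EUR 10m floor applies.
print(f"{maximum_fine(100_000_000):,.0f}")    # 10,000,000
# Large developer: 2% of EUR 5bn is EUR 100m, which exceeds EUR 10m.
print(f"{maximum_fine(5_000_000_000):,.0f}")  # 100,000,000
```

The fixed floor means that even small developers face material exposure, while for large developers the turnover-based limb dominates.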
Without the information, copyright and database right owners’ attempts to evidence infringement may remain frustrated, at least in the short term.
What practical steps should AI developers be taking? They should first consider carefully whether their AI falls within the definition of general-purpose AI under the EU AI Act. If not, the obligations and risks are likely to be lower. Where a developer’s AI does fall within the definition, it should consider the following:
Copyright and database right owners, for their part, should remain vigilant, monitoring the data summaries published by developers of general-purpose AI after 2 August 2025. If they consider the summaries inadequate, they should raise this with the EU AI Office or national regulators. Equally, if the summaries disclose the unauthorised use of their works, they should consider taking action.
[1] Although a recent European Parliamentary study recommends replacing the data mining exemption with a compulsory licensing scheme or similar.
[2] Namely, an AI model that displays significant generality and is capable of competently performing a wide range of distinct tasks (Art. 3).
Source: https://www.shoosmiths.com/insights/articles/ais-copyright-tango-dancing-on-the-edge-of-litigation