As AI models increasingly rely on copyrighted content for training, the EU AI Act and upcoming legal rulings—like Getty Images v. Stability AI—could trigger a wave of litigation. This article explores the legal, financial, and operational risks now facing AI developers.
AI developers should prepare for increased legal scrutiny by auditing their training data sources, especially ahead of the EU AI Act’s August 2025 disclosure deadline. Copyright holders, meanwhile, should monitor disclosures and be ready to challenge unauthorised use.
The use of copyright works to train models for AI is a vexed subject. In the last six months in the UK alone we’ve seen national newspapers campaign against the technology, more than 1,000 musicians release a silent album in protest, and a government consultation which received 11,500 responses.
Over the course of the next two months there are likely to be two significant further developments which may finally tip matters into litigation between copyright owners and AI developers, at least in the UK: the implementation of aspects of the EU AI Act and judgment in Getty Images v. Stability AI.
This could create a new source of revenue for copyright owners, reshape the practices of AI developers and have a chilling effect on AI development in certain jurisdictions. Here we explore the factors that are aligning and the practical implications.
To date, many copyright and database right holders do not know whether their intellectual property has been used to train AI models. Put simply, there is no public record of which copyright works and databases have been used.
This is a problem for rights holders in the EU. While the EU has a data mining exemption for commercial use (Art. 4, Directive 2019/790)[1], it is subject to an opt-out, and rights holders do not know whether the opt-out is being respected.
The EU AI Act seeks to remedy this. It does so by imposing an obligation on developers of general-purpose AI models[2] to publicly share a summary of the data used to train the model (Art. 53(1)(d), Regulation (EU) 2024/1689) and to put in place a policy to comply with copyright law, including the opt-out (Art. 53(1)(c)). These obligations come into force on 2 August 2025.
On its face, this is sensible. AI developers in the EU would only have an issue if they had been mining data which is the subject of an opt-out, in breach of the data mining exemption. However, this may inadvertently expose those developers outside of the EU (in jurisdictions which do not have a data mining exemption) to litigation.
By way of background, the territorial reach of the EU AI Act is broad. In particular, it applies to: “(a) [developers] placing on the market or putting into service AI systems or placing on the market general-purpose AI models in the Union, irrespective of whether those [developers] are established or located within the Union or in a third country” (Art. 2(1)). Therefore, even if a model is trained in Australia, Canada, Japan, Singapore, South Africa, the UK or the USA, if it is placed on the market in the EU, it is likely to fall within the scope of the EU AI Act.
Where a commercial data mining exemption (or equivalent) exists (such as Japan, Singapore or the USA) this may not be an issue. Subject to any contractual rights, publication of a summary of the data used for training the model may not result in liability in those jurisdictions. Developers who have mined data and/or trained models in those jurisdictions may be able to rely on the national data mining exemption (or equivalent).
However, where a commercial data mining exemption (or equivalent) does not exist (such as in Australia, Canada, South Africa or the United Kingdom), publication in the EU of the summary of the data used to train the model may result in the developer itself publishing: a) the rights it has been infringing; and b) the scale of that infringement, in those jurisdictions. This could result in a wave of litigation.
For example, let us take a national newspaper in the UK and assume that it has published its articles (which attract copyright) on its website for free (namely, no paywall). With the average newspaper producing over 100,000 articles a year, this is a rich repository of high-quality data.
Let us further assume that an AI developer has mined this data in the UK for the last six years to train a generative AI model, which falls within the definition of a general-purpose AI model, and which has subsequently been put on the market in the UK and EU. The copying of these articles is likely to be an act of copyright infringement. To date, the developer has avoided enforcement proceedings because the national newspaper has no evidence of the developer’s use. However, because the developer’s generative AI has been placed on the market in the EU, come 2 August 2025, it has to disclose its copying.
If infringement is established, the consequences could be severe. In addition to delivery-up/destruction, costs and adverse publicity, an injunction may be granted which covers the generative AI. Further, the developer may also have to pay damages or an account of profits. Assuming a reasonable royalty rate of, say, £1 per article (it may be significantly more) and 1 million articles copied, our national newspaper may be owed £1 million.
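The damages arithmetic in this hypothetical is simple enough to set out explicitly. A minimal sketch, where the royalty rate and article count are the illustrative assumptions from the example above rather than figures from any judgment:

```python
# Illustrative damages estimate for the hypothetical newspaper example.
# Both figures are assumptions from the worked example, not real data.
ROYALTY_PER_ARTICLE_GBP = 1.00   # assumed "reasonable royalty" (may be significantly more)
ARTICLES_COPIED = 1_000_000      # assumed number of articles copied over six years

damages_gbp = ARTICLES_COPIED * ROYALTY_PER_ARTICLE_GBP
print(f"Estimated damages: £{damages_gbp:,.0f}")  # Estimated damages: £1,000,000
```

Doubling the assumed royalty rate, or the article count, scales the exposure linearly — which is why the choice of royalty rate would be hotly contested in any quantum hearing.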
The above example is, of course, simplified and assumes that the data mining, training and use all took place in the UK. In practice, this is unlikely: most data is mined, and most models are trained, outside of the UK.
Here, the judgment in Getty Images v. Stability AI (due shortly) will be instructive. This is a claim by Getty Images against Stability AI, in which it is alleged that Stability AI used Getty Images’ images to train its model for Stable Diffusion (an image generator).
Initially, Getty Images alleged that the primary infringement (namely copying) took place in the UK, asserting that the data mining and model training took place in the jurisdiction. At trial, Getty Images struggled to establish this and in closing submissions, it withdrew the allegation. It explained it had taken "the pragmatic decision to pursue only the claims for … secondary infringement of copyright" explaining that while "evidence confirms that the acts complained of within this claim did occur", there was "no Stability witness ... able to provide clear evidence as to the development and training process from start to finish – only evidence that these acts occurred outside the jurisdiction".
As to secondary infringement, Getty Images relies on the following:
Liability for secondary infringement is unclear, will vary depending on the jurisdiction and is fact specific. For example, some of the issues the court will need to grapple with in Getty Images are:
However, if Getty Images is right, it follows that any AI developer that implements its AI in the UK may be liable for the mining of and training with unauthorised works in other jurisdictions.
Whether this comes to pass, of course, is likely to turn on the detail to be provided by the AI developer when summarising the data used for training the model.
Recital 107 of the EU AI Act says that the information should be “generally comprehensive in its scope instead of technically detailed to facilitate parties with legitimate interests, including copyright holders, to exercise and enforce their rights under Union law”. It elaborates that it should list “the main data collections or sets that went into training the model, such as large private or public databases or data archives, and [provide] a narrative explanation about other data sources used.”
The EU AI Office is supposed to provide a standard template to be used. However, with less than a month to go, it has not done so.
Absent a template and with the obligation ambiguous, it may be that AI developers opt for very vague declarations in the knowledge that the EU AI Office and national supervisory authorities are unlikely to enforce the issue and, if they do, may not impose (significant) fines (although, in principle, these can be up to the higher of €10 million or 2% of the AI developer’s worldwide annual turnover).
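The fine ceiling described above — the higher of €10 million or 2% of worldwide annual turnover — can be sketched as a simple calculation; the turnover figures below are illustrative assumptions, not real developers:

```python
# Penalty ceiling under the EU AI Act as described above: the higher of
# EUR 10 million or 2% of worldwide annual turnover.
FIXED_CAP_EUR = 10_000_000
TURNOVER_SHARE = 0.02

def maximum_fine(worldwide_annual_turnover_eur: float) -> float:
    """Return the cap on fines for a developer with the given turnover."""
    return max(FIXED_CAP_EUR, TURNOVER_SHARE * worldwide_annual_turnover_eur)

# Smaller developer: 2% of EUR 100m is EUR 2m, so the EUR 10m floor applies.
print(f"{maximum_fine(100_000_000):,.0f}")    # 10,000,000
# Large developer: 2% of EUR 5bn is EUR 100m, which exceeds EUR 10m.
print(f"{maximum_fine(5_000_000_000):,.0f}")  # 100,000,000
```

The fixed floor means that even small developers face material exposure, while for large developers the turnover-based limb dominates.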
Without the information, copyright and database right owners’ attempts to evidence infringement may remain frustrated, at least in the short term.
What practical steps should AI developers be taking? They should first consider carefully whether their AI falls within the definition of general-purpose AI under the EU AI Act. If not, the obligations and risks are likely to be lower. Where a developer’s AI does fall within the definition, it should consider the following:
Copyright and database right owners, for their part, should remain vigilant, monitoring the data summaries published by developers of general-purpose AI after 2 August 2025. If they consider the summaries inadequate, they should raise this with the EU AI Office or national regulators. Equally, if the summaries disclose the unauthorised use of their works, they should consider taking action.
[1] Although a recent European Parliamentary study recommends replacing the data mining exemption with a compulsory licensing scheme or similar.
[2] Namely, an AI model that displays significant generality and is capable of competently performing a wide range of distinct tasks (Art. 3).
Source: https://www.shoosmiths.com/insights/articles/ais-copyright-tango-dancing-on-the-edge-of-litigation