
Using Media Content in AI Model Training: A Croatian Perspective on United Kingdom and German Cases

Issue 12.11

Whether an artificial intelligence model can be trained on media content without infringing copyright is rapidly becoming one of the most pressing questions at the intersection of AI and copyright. The recent UK High Court ruling in Getty Images v Stability AI has intensified this debate, drawing global attention to the legal boundaries surrounding the use of copyright-protected media content in the training of generative AI systems. Although the case initially included claims concerning the training and development of Stability AI’s image-generation model, the claimants ultimately abandoned those claims. Thus, despite the ongoing heated debate, the UK High Court did not have the opportunity to address the central question on its merits: whether the collection and internal processing of copyrighted works during training constitute copyright infringement.

Still, the proceedings yielded significant technical findings. It was undisputed that training gen-AI models like Stable Diffusion requires downloading and storing images from online sources. Such a process is referred to as “materialization.” Stability AI acknowledged that this involved storing copies of scraped images on cloud servers and creating temporary copies in the memory of graphics processing units during training. While Getty Images did not pursue this issue further and the court made no ruling on the lawfulness of this process, the record confirms that the training phase entails large-scale reproduction of protected works, albeit outside UK jurisdiction.

The recent decision of the Munich Regional Court in GEMA v OpenAI further strengthens the view that copyright infringement can occur through the ingestion and internal reproduction of protected works during training. Although the case focused on the unauthorized reproduction of nine song lyrics, the court was asked to determine whether training datasets may embed works which are then “memorized” and reproduced by AI models – a function that may itself qualify as reproduction and making available to the public under copyright law. This prototype case, albeit not yet final, affirms that AI systems do not merely “learn” from works in the training dataset, but may copy and store them in a way that triggers the rightsholders’ exclusive rights.

These technical and legal understandings, emerging from the UK and German cases, are entirely consistent with the 2024 study “Copyright & Training of Generative AI - Technological and Legal Foundations.” The authors found that training LLMs frequently involves the mass ingestion and internal storage of copyright‑protected works, and that the traditional text‑and‑data‑mining exceptions or fair‑use regimes are unlikely to provide a reliable legal defense. Their empirical and conceptual analysis reveals that gen-AI training diverges from conventional data mining by aiming to replicate expressive features of creative works, leading to “memorization” of content within the model’s architecture. In light of this, the two recent cases, by acknowledging reproduction during both input (training) and output phases, highlight how courts are responding to the same phenomenon: the model does more than analyze; it reproduces.

While the UK and German decisions have no direct legal effect in Croatia, they offer valuable technical and conceptual insights that can inform how similar issues might be approached under Croatian law. These cases illustrate with increasing clarity that training gen-AI systems often involves acts – such as the downloading, storing, and internal reproduction of protected works – that fall within the scope of the exclusive rights of copyright holders. Although Croatian courts have not yet addressed this issue, the technical realities revealed in these foreign proceedings are universally relevant. They help contextualize the legal question: should the use of copyrighted media content in AI training, particularly when it takes place at scale and without authorization, be treated as infringing under Croatian copyright law?

Croatian copyright law, rooted in the continental civil law tradition, views copyright as a natural and absolute right with the author at its core. A copyright work is seen as an expression of the author’s personality, protected as a fundamental human right under both statutory and constitutional frameworks. This strong normative foundation makes the Croatian legal system well-equipped to respond to technological disruptions such as generative AI. However, enforcement mechanisms remain a challenge. While foreign rulings may serve as persuasive guidance when comparable cases reach Croatian courts, they are not a substitute for proactive systemic measures. Given the scale and opacity of AI training practices, judicial proceedings – slow to react and inter partes by nature – are unlikely to offer timely or comprehensive solutions. A more effective approach may lie in involving collective management organizations and deploying technologies like blockchain to monitor usage and ensure fair remuneration for rightsholders. What the Croatian legal response will ultimately look like remains to be seen.

By Dino Gliha, Partner, MGG Law

This article was originally published in Issue 12.11 of the CEE Legal Matters Magazine. If you would like to receive a hard copy of the magazine, you can subscribe here.