# The New Information Economy: Fair Training and Copyright in the EU 
**Event date:** 02/09/2025
Published on 02/09/2025

### Date
07/04/2025

### Author
**[Marcus Woodcock](https://conference.sciencespo.fr/structure/marcus-woodcock_HIPOwolPXQ3Sz3fMWjzi)** 


## Standfirst
AI training in the EU operates within a legal framework that, in theory, allows rightsholders to control their content and monitor its use for AI training, notably through the Digital Single Market Directive and the AI Act. Though these instruments set guidelines for data usage and transparency, enforcement challenges persist due to regulatory loopholes and evolving technological standards. This article explores key regulatory gaps and proposes policy solutions for fair and effective AI governance.

## Main text
The European Copyright Model
----------------------------

It is important to emphasise that EU law provides no statutory right to remuneration for rightsholders in the context of AI training. Instead, the EU regulatory mechanism combines copyright exceptions for text and data mining (TDM) with transparency obligations for AI model providers.

Firstly, the framework gives rightsholders control over their content through the Digital Single Market Directive (CDSM, Directive 2019/790), via two key mechanisms set out in Articles 3 and 4. Article 3 allows specific institutions, such as research organisations and cultural heritage institutions, to mine data for research purposes without requiring explicit permission from content owners. Article 4 extends this possibility to commercial usage by allowing text and data mining, provided that the rightsholder has not opted out in a clear and machine-readable format, or via ‘appropriate means’. If no opt-out is expressed, content can be used for mining.

Secondly, the AI Act complements the CDSM Directive by introducing additional transparency requirements. Specifically, Article 53(1) of the AI Act requires providers of general-purpose AI models to document the data used to train their models, make this documentation available to AI system providers who intend to integrate these models, and publish a sufficiently detailed summary of the training content according to a template provided by the AI Office. By requiring these disclosures, the AI Act gives stakeholders, such as content creators, businesses and consumers, visibility into how AI models are developed and the data on which they are based.

The issue with the EU Model
---------------------------

Multiple issues hinder the effectiveness of these dual mechanisms, which are supposed to complement one another for the information economy to work efficiently.

### a) Lack of common standards

In the absence of clear guidelines, there is no settled ‘state of the art’ for implementing opt-out provisions, which gives rise to ambiguities. One liberal interpretation of the opt-out option involves general opt-outs expressed by rightsholder associations, but these have limited practical effect since they are not directly accessible to scraping services. Another approach uses natural language in websites’ Terms and Conditions (T&Cs), but the complexity and variety of T&Cs across websites make this method challenging for AI companies to implement effectively. The robots.txt file, or Robots Exclusion Protocol (REP), is another tool that allows website owners to control access to their content by specifying which bots can access certain parts of their site. However, the REP cannot express policies more complex than ‘allow/disallow’, such as object- or user-oriented authorisations. Other protocols that allow for such options have not been widely adopted.
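To illustrate the granularity limit, here is a minimal, hypothetical robots.txt sketch. The user-agent names (GPTBot for OpenAI’s crawler, CCBot for Common Crawl) are real crawler identifiers, but the policy itself is illustrative:

```
# Hypothetical robots.txt: the REP can only allow or disallow paths per user agent.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /articles/

User-agent: *
Allow: /
```

Nothing in this syntax can distinguish research from commercial mining, condition access on remuneration, or restrict use to particular objects or users; it can only gate which paths a named bot may crawl.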

### b) Lack of enforceability

Additionally, the transparency obligations laid out in the AI Act, which the EU AI Office is tasked with operationalising, are of limited efficacy. The template drawn up by the AI Office is set to require disclosure of a list of the top 10% of all internet domain names per data modality (modality being understood as the type of data: text, image, etc.). At best, therefore, the AI Office template summary allows rightsholders to assess the likelihood that their works were used to train a specific AI model.

Moreover, reasoning and information gathering are increasingly decoupled in AI models. Content is increasingly extracted through real-time web search by AI agents (such as Perplexity AI), which perform searches to provide the most up-to-date answers. Agentic AI systems are characterised by the ability to take autonomous actions that contribute towards achieving goals over an extended period, and they are predicted to play an increasingly significant role in information extraction, processing and usage. There is currently no framework for monitoring their actions: since this retrieval does not technically constitute training, it falls outside the remit of the transparency obligations under the AI Act.

As a result of the inefficacy of opt-outs, technical protection measures (TPMs), such as IP blocking and encryption, are used to protect content from scraping. This is complemented by the rising use of data poisoning: techniques that format data so as to corrupt an AI system during training. Both methods are indiscriminate and prevent actors who benefit from the Article 3 exception from conducting data scraping. Data poisoning in particular has an uncertain status under law and exacerbates uncertainty in the context of AI training.
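As a hypothetical sketch of why such measures are indiscriminate, consider server-side blocking by crawler user agent, one common form of TPM. The Python middleware below is written for illustration only; the blocked-agent list and the wrapped application are assumptions, and the key point sits in the comment: the server cannot tell a commercial trainer from an Article 3 research organisation.

```python
# Hypothetical WSGI middleware that blocks known AI crawler user agents.
BLOCKED_AGENTS = ("GPTBot", "CCBot")  # real crawler names, but an illustrative list

class BlockAICrawlers:
    def __init__(self, app):
        self.app = app  # the wrapped WSGI application

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if any(agent in user_agent for agent in BLOCKED_AGENTS):
            # The block applies whether the crawler feeds commercial training
            # or Article 3 research: the server cannot tell the purposes apart.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Automated scraping is not permitted."]
        return self.app(environ, start_response)
```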

### c) Complex interactions with other legal provisions

Additionally, though European law generally considers that mere facts and data cannot be protected by copyright, the sui generis database right, provided for in the Directive on the legal protection of databases, protects certain online databases. The criterion for protection is the level of investment made in the database, both financial and in terms of effort. However, it is not immediately evident to the trainer which databases are protected by sui generis rights and which are not. This threshold is far from transparent to the outside observer and could hinder training practices by exposing trainers who comply with opt-outs to additional legal uncertainty.

### d) Unequal bargaining power

There is currently no clear bargaining mechanism or framework for rightsholders to negotiate remuneration for their content. The multiplicity of rightsholders leaves them uncoordinated and limits their bargaining power. Moreover, there is no transparent pricing mechanism for training content, which allows AI actors to set the terms of agreements unilaterally.

Policy recommendations
----------------------

### 1\. Ensure more granular conditions for opting out:

Study other opt-out protocols that allow more granular policy choices than allow/disallow, including automated remuneration and object- and actor-oriented access. This is particularly important in the growing context of agentic search.

### 2\. Clarify additional regulation:

Clarify the application of the sui generis database right to AI crawling: sui generis protection for databases may prove unmanageable in the context of AI agents and automated content extraction, and opt-out clauses could prove sufficient while providing legal certainty for the AI provider. Clarify the legal status of data poisoning and technical protection measures: specify the sanctions incurred for protecting non-copyrighted content and determine whether data poisoning is legal conduct.

### 3\. Ensure traceability of AI agents:

Ensure that AI agents are coded to be legally compliant, in particular regarding opt-outs, and that their actions are intelligible, traceable and auditable for rightsholders, relevant authorities and users.
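As a minimal sketch (an illustration, not a prescribed implementation), the Python snippet below shows what such compliance-by-design could look like: a hypothetical agent checks a site’s robots.txt opt-out before any live retrieval and writes every decision to an audit log. The agent name, log file and URL are assumptions for the example.

```python
import logging
from urllib import robotparser
from urllib.parse import urlparse

# Audit trail: every fetch decision is logged for later review.
logging.basicConfig(filename="agent_audit.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

AGENT_NAME = "example-research-agent"  # hypothetical user-agent string

def may_fetch(url: str) -> bool:
    """Check the site's robots.txt opt-out before retrieval and log the decision."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
        allowed = rp.can_fetch(AGENT_NAME, url)
    except OSError:
        allowed = False  # if the opt-out policy cannot be read, do not fetch
    logging.info("agent=%s url=%s allowed=%s", AGENT_NAME, url, allowed)
    return allowed

if __name__ == "__main__":
    print(may_fetch("https://example.com/articles/some-page"))
```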

### 4\. Establish a negotiation framework based on competition authorities:

Establish a framework akin to the neighbouring-rights negotiation mechanism established in the CDSM Directive to redress the inherent power imbalance between major AI actors and rightsholders. Such a framework should ensure that AI actors negotiate in good faith, based on transparent, objective and non-discriminatory criteria, and provide pricing transparency so that rightsholders may assess their remuneration for related rights.

> _This article was initially published in [a special issue on Artificial Intelligence](https://drive.google.com/file/d/1FCBUbbMOXZatYq8BU3MH5P8YgV3SwQyB/view) from the Sciences Po Student Works and Papers Collection. It draws from multiple master’s programmes and builds on the Sciences Po Student Conference, “Can AI benefit democracy?”, held on 21 February 2025 with students from the Sciences Po School of Public Affairs, Law School, School of International Affairs, and School of Management and Impact. Despite different disciplinary backgrounds, student perspectives converged around three core themes — regulation, inequality, and citizenship & trust — echoing topics from the Intergovernmental AI Action Summit held earlier that month in Paris._

### Theme
`#Europe` `#Numérique` 

**Language:** `#Anglais` 



