AI training data sets as intellectual property?

Protectability and protection gaps of training data and data sets


In the age of big data and artificial intelligence (AI), data has become a decisive competitive factor. Training data in particular, which is used to develop AI models, is now considered one of the most valuable assets in technology-driven companies. But are data sets, like patents, designs or copyrights, even eligible for protection as intellectual property?

The answer is complex: neither German nor European law expressly recognises a right of "data ownership". Instead, companies must rely on a combination of different legal instruments, from database law and trade secret protection to contractual agreements. This article sheds light on the protectability of data and the gaps in protection that exist in practice, and shows which strategies companies should pursue when dealing with training data.

1. No explicit right of ownership in data

Under current German and EU statutory law, data is not protected by a right similar to ownership. However, copyright law does recognise a special database right: if a company has made a substantial investment in obtaining, verifying or presenting the contents of a data collection, the producer acquires an exclusive sui generis right (Sections 87a et seq. UrhG), which prohibits third parties from extracting substantial parts of the database. This right creates an exclusionary position similar to traditional IP rights – however, only for structured collections and for a limited period (15 years from publication).

Not every data collection fulfils these requirements: if there is no substantial investment, for example, no protection is granted. The new EU Data Act (Regulation (EU) 2023/2854) also restricts database protection, for example for IoT-generated data (see Art. 43 Data Act). In addition, the copyright exceptions for text and data mining (e.g. Sections 44b, 60d UrhG) allow the use of protected works for AI training purposes without the consent of the rights holder. Although rights holders such as database producers can opt out of some uses by reserving their rights, they face a legal environment that increasingly favours data analysis.

2. Trade secrets: Protection through confidentiality

Certain sensitive data that companies work with may be protected as trade secrets. The German Trade Secrets Act (GeschGehG) defines protected information as information that is not generally known or readily accessible, has economic value and has been protected by appropriate technical and organisational measures.

AI training data can fulfil these criteria, for example proprietary data sets for AI development that remain internal. It is crucial that the company takes active protective measures: access only for authorised persons, contractual non-disclosure agreements (NDAs) and technical safeguards. Without such measures, no claims arise under the Trade Secrets Act.

In addition, trade secret protection does not apply if a third party obtains the data through independent discovery or by analysing a product (reverse engineering). Companies must therefore strategically consider which data they keep secret and when alternative protection methods (such as patents or disclosure to partners via a licence) make sense (see also our blog post on the topic of “Protecting trade secrets in the age of AI: legal challenges and strategic approaches”).

3. Contracts and licences: control over the use of data

Where neither intellectual property rights nor statutory trade secrets protection apply, contractual provisions come into play. Companies often secure their data sets through usage agreements or licence conditions that specify exactly what a recipient of the data may – and may not – do with it.

For example, when exchanging data with a business partner, it is possible to contractually prohibit the information received from being passed on to third parties or used for other purposes. Open licences (e.g. Creative Commons) also make it possible to share data subject to certain conditions, while proprietary contracts leave exclusive use to the licensee or severely restrict further use.

Important to know: Contracts only bind the parties who conclude them – contracts to the detriment of third parties are invalid. Hence, if data falls into the hands of unauthorised third parties, contractual clauses alone are of little help against them; at most, they support recourse claims against the contractual partner. In this case, the company must resort to claims based on statutory law and, if necessary, swift legal action. Nevertheless, clear contractual rules are a key instrument for maintaining control over company data – especially in co-operations and data partnerships.

Conclusion: Strategically protect data sets and close gaps

No single law offers comprehensive protection for data. Companies are therefore well advised to take a multi-pronged approach: Important data sets should be identified and, depending on their type, protected by database rights, technical and organisational confidentiality measures and carefully drafted contracts.

These protection concepts must be regularly reviewed and adapted to new technologies – especially in the field of AI, where potential uses are developing rapidly.

Meanwhile, the legal framework is changing: Instead of introducing fully-fledged “data ownership”, legislators are focussing on access rights and data pools (notably through the Data Act) as well as new statutory exceptions for AI applications.

For companies, this means that their own data remains valuable – but only if it is proactively protected and contractually secured. Those who take appropriate precautions at an early stage and keep an eye on developments in data law will minimise the risk of data loss and at the same time create the basis for secure cooperation with partners.