France’s data protection authority, the Commission Nationale de l’Informatique et des Libertés (the "CNIL"), recently announced that artificial intelligence (AI) is compatible with the European Union’s General Data Protection Regulation (GDPR).
This is welcome news for AI-centric companies (as well as those that are just dabbling in AI, which is most companies these days), many of which have been treading lightly and trying to create an AI donut hole around the EU (and the U.K.) to avoid the GDPR (and potentially the EU AI Act, which may become law in the next few months).
The CNIL bases its assessment on the premise that AI models can be designed from the outset to align with the GDPR principles, such as purpose limitation, transparency, data minimization and storage limitation, and to ensure there is a lawful basis to process the data (depending on the circumstances of the collection). In short, the CNIL is promoting a Privacy by Design approach to AI development.
The CNIL also shed light on two points that benefit compliant models: data set ‘re-use’ and retention. Data set re-use refers to the use of publicly accessible data that a company extracts from third-party sources (e.g., by web scraping). According to the CNIL, such a practice can be GDPR compliant if the company: limits collection to freely accessible data; defines precise collection criteria before collection; and deletes irrelevant data right after collection. The CNIL goes on to say that data sets used for training purposes, because of their importance and the cost often associated with them, can be retained for longer periods of time (even for several months) than other data fed into the model, provided the other aspects of the storage limitation principle are observed (e.g., analyzing and justifying the need for prolonged use).
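As a rough illustration only, the three scraping conditions and the retention point could be expressed in a data pipeline along the following lines. This is a minimal sketch and not anything the CNIL prescribes: the record fields, the topic list, the 180-day retention period and all function names are hypothetical and would need to reflect a developer's own documented criteria and justification.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScrapedRecord:
    url: str
    text: str
    publicly_accessible: bool  # assumption: flagged by the crawler (e.g., no login wall)
    collected_on: date


# Hypothetical collection criteria, fixed before any scraping starts.
ALLOWED_TOPICS = ("product review", "recipe")

# Hypothetical retention period for training data; the analysis justifying it
# would be documented separately.
TRAINING_RETENTION = timedelta(days=180)


def is_relevant(record: ScrapedRecord) -> bool:
    """Check a record against the predefined collection criteria."""
    text = record.text.lower()
    return any(topic in text for topic in ALLOWED_TOPICS)


def filter_collected(records: list[ScrapedRecord]) -> list[ScrapedRecord]:
    """Keep only freely accessible records that match the criteria.

    Everything else is dropped right after collection.
    """
    return [r for r in records if r.publicly_accessible and is_relevant(r)]


def purge_expired(records: list[ScrapedRecord], today: date) -> list[ScrapedRecord]:
    """Drop training records held longer than the justified retention period."""
    return [r for r in records if today - r.collected_on <= TRAINING_RETENTION]
```

A pipeline built this way would call filter_collected immediately after each scraping run and purge_expired on a schedule, so that the stored data set only ever contains what the predefined criteria and retention analysis allow.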
So what does this all mean for the average AI developer? In France, at least, it means that: developers should build their model(s) to meet the GDPR principles; web scraping, in certain circumstances, is a valid means of building data sets; and training data can be kept longer to help develop the model(s).
Developers that follow this guidance could see additional benefits as well: an advantage over competitors that cannot say they are compliant (and perhaps enhanced usability), and less time spent worrying about, and dealing with, fines and enforcement actions. In addition to the GDPR, the EU AI Act provides for fines ranging from the higher of €10 million or 2% of annual revenue to the higher of €30 million or 6% of annual revenue. And while the CNIL isn’t the most active regulator, its fines typically start in the six-figure range, and it has also issued several ranging from €20 million to €90 million.
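To make the "higher of" arithmetic concrete, the short sketch below computes the two EU AI Act ceilings quoted above for a given annual revenue. It assumes the figures as stated in this article and treats annual revenue as a single euro amount; the function name and the sample revenues are hypothetical.

```python
def fine_ceiling(fixed_amount_eur: int, revenue_percent: int, annual_revenue_eur: int) -> int:
    """Return the higher of a fixed amount or a percentage of annual revenue."""
    return max(fixed_amount_eur, annual_revenue_eur * revenue_percent // 100)


# Tiers as quoted in this article: (fixed amount in euros, percent of annual revenue).
LOWER_TIER = (10_000_000, 2)
UPPER_TIER = (30_000_000, 6)

# Hypothetical company with EUR 2 billion in annual revenue: the percentage governs.
large_lower = fine_ceiling(*LOWER_TIER, 2_000_000_000)  # EUR 40 million
large_upper = fine_ceiling(*UPPER_TIER, 2_000_000_000)  # EUR 120 million

# Hypothetical company with EUR 100 million in annual revenue: the fixed amount governs.
small_lower = fine_ceiling(*LOWER_TIER, 100_000_000)  # EUR 10 million
small_upper = fine_ceiling(*UPPER_TIER, 100_000_000)  # EUR 30 million
```

The percentage only overtakes the fixed amount once annual revenue exceeds €500 million, so for smaller developers the fixed figures are the relevant ceilings.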
Developers should not rely blindly on this recent guidance from the CNIL, as other EU data privacy regulators may disagree, leading to a split in GDPR enforcement. Earlier this year, for instance, we saw the Italian data protection authority (the "Garante") ban the use of ChatGPT (and other generative AI chatbots) because it did not conform to the GDPR principles. However, this Italy-wide ban was recently lifted after OpenAI made several adjustments to ChatGPT, including switching to consent- and legitimate interest-based processing, enabling data subjects to correct their data, and creating a plan to implement an awareness campaign for, and an age verification system within, ChatGPT in the country.
With this in mind, and with AI being incorporated into virtually all technology these days, developers may benefit from taking the carrot (a GDPR/Privacy by Design approach) instead of the stick (country-wide bans and/or fines).