Managing IP in the AI Value Chain

Facing a new frontier in the data space, Canada established the Pan-Canadian AI Strategy in 2017, the world’s first national AI strategy. Please see: https://cifar.ca/wp-content/uploads/2023/11/aican-impact-2023-eng.pdf. Along the same lines, the UK government’s generative AI framework sets out ten common principles for generative AI use. Please see: https://www.gov.uk/government/publications/generative-ai-framework-for-hmg.

The recent emergence of generative-AI systems has renewed interest in how IP applies to developing AI systems in Canada. In particular, a range of IP issues associated with generative AI surround three pillars of the typical AI value chain, namely: (a) data (text, images, structured data, etc.) and its ownership, (b) AI model training (e.g., the large language models developed by OpenAI), and (c) AI-system output.

Looking ahead, increasing the adoption and use of AI holds great promise. However, managing AI’s capabilities in the fast-paced generative-AI field is complex, and the legal landscape supporting AI development in Canada remains uncertain, with legal precedent underdeveloped. For example, there may be ethical concerns surrounding bias from data inputs, “hallucinations” in AI-system outputs, verification of AI benefits, AI-related liabilities (e.g., harm events in healthcare), and the operational adoption of AI into workstreams, all of which bear on establishing trustworthy AI.

On the other hand, there is also the risk of being a laggard; that is, the opportunity cost of not prioritizing high-value use cases for AI. For example, a health sector organization could put its own health data to use for the public good (when properly anonymized for privacy protection under Ontario privacy laws), such as in care planning to improve patient care or in new research.

In terms of IP, data is typically an asset; however, keep in mind that there is no data-specific legal regime in Canada. Instead, a series of rights may apply to data, such as copyright, patents, or trade secrets (which function as possessory rights). In turn, using data in the AI value chain typically involves copying that data, and each individual step raises potential copyright issues to wrangle with.

It is key to draw a distinction between AI-assisted works and AI-generated works. In the Canadian context, copyright protection for the expression of an idea depends primarily upon an exercise of skill and judgment, per the Supreme Court of Canada’s ruling in CCH Canadian Ltd. v Law Society of Upper Canada (2004). For example, when it comes to AI, data labelling and tagging involves an element of skill and judgment which may attract copyright protection for dataset authors. Along the same lines, the United States Copyright Office considers that AI-generated material standing on its own, without a human author, is unlikely to satisfy the copyrightability standard. Please see: 2023 U.S. Copyright Office, Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, at https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence.

Moreover, a recent U.S. decision, Thaler v. Perlmutter (Case No. 1:22-cv-01564, 2023), confirmed that AI-generated content (a piece of artwork entitled A Recent Entrance to Paradise) produced without human involvement cannot be a protected work; that is, copyright protection is limited to the creations of human authors and does not extend to works where the author is identified as the AI system itself. Similarly, when it comes to patents, an AI system standing on its own cannot be the inventor in a patent application, as found in a 2023 UK Supreme Court decision (see https://www.bailii.org/uk/cases/UKSC/2023/49.html) where the Court held that AI itself cannot be an inventor because inventors must be natural persons.

When it comes to using data for AI model training, large language models (LLMs) like those behind ChatGPT require big data. LLM owners often seek to use copyright materials in training data but may not hold copyright in the data they seek to use or are using. For example, the New York Times recently sued OpenAI, alleging that it has “used the Times’s content without payment to create products that substitute for the Times and steal audiences away from it”. Please see: https://www.cbc.ca/news/business/new-york-times-openai-lawsuit-copyright-1.7069701.

In the present state, “fair dealing” defences under the Copyright Act, RSC do not provide a proper framework for commercial purposes; that is, there is no specific allowable purpose that enables “training” LLMs for AI systems. The federal government is currently contemplating modernizing the Copyright Act and, for policy reasons, may consider how training LLMs could diminish others’ incentive to create data, or may consider other copyright reforms such as text and data mining exceptions to copyright. The 2021 federal public consultation on AI aimed to modernize Canada’s copyright framework in response to AI developments. Please see: https://ised-isde.canada.ca/site/strategic-policy-sector/en/marketplace-framework-policy/copyright-policy/consultation-modern-copyright-framework-artificial-intelligence-and-internet-things.

So, absent explicit permissions or licensing from copyright owners, there would be a direct conflict over the use of data. Indeed, the current best practice is to establish agreements for training data using a balanced licensing scheme, or to apportion IP ownership and liabilities for the lifetime of the project. In the AI context, commercial use cases require a balance between protecting training data and enabling innovation. For example, is the entity training the AI model on the datasets creating a new expression of code? Are there ongoing downstream obligations to the data licensor (e.g., access to data for secondary purposes such as research, or commercialization of AI-system output)? Moreover, there may also be pitfalls for the unwary where an AI-generated output substantially resembles copyrighted content, such as when using a coding assistant to generate a complex algorithm. Along the same lines, training AI models may result in unintentionally using third-party IP such as proprietary code.

For more information, please contact us.
