It has recently come to light that tech powerhouse Apple struck a deal with stock media company Shutterstock over a year ago to use millions of the latter’s images and related data to train generative AI models.
The multimillion-dollar agreement is part of a growing trend: big tech firms are increasingly prioritizing the provenance of their AI training datasets. This shift is driven by industry pressure and by the need to comply with evolving legal regulations for AI tools, a crucial development for the tech industry.
According to Reuters, at the end of 2022, not long after the release of OpenAI’s ChatGPT, Apple secured a deal that gives it access to Shutterstock’s massive image and video libraries to build AI training datasets. The agreement had not been disclosed until now.
The licensing contract is said to have had an initial cost of between $25M and $50M but has possibly expanded since.
While Apple is one of the firms that have been conservative about AI, choosing not to ride the hype wave right away, it is expected to include major AI-powered features in the upcoming iOS 18, which is set to be announced at WWDC in June this year.
This new information shows that Apple is yet another high-profile firm that is both jumping on the AI bandwagon and taking the legal status of its training datasets very seriously.
The startups and companies that pioneered generative AI models and applications, such as OpenAI, Stability AI, or Midjourney, built their training datasets with free content scraped from the web. Several firms and experts have defended this practice, arguing that it is financially unviable to develop generative AI without access to free data repositories for training.
However, as we have repeatedly pointed out, the visual industry at large has pushed back against this practice, citing copyright infringement and harm to artists’ livelihoods when their work is scraped from the web and used without authorization. The still-developing legal framework for AI technology is also moving in this direction, with the most recent example being the EU’s AI Act, which requires transparency on training datasets.
As a result, many important companies are striking agreements with large content providers, such as Shutterstock, to secure ethical and legally unproblematic access to the volumes of data they need to train their AI models. Apple is only one of them; Meta, Google, and Amazon have similar deals with Shutterstock. Other content providers also claim to have agreements or offers of a similar nature with small and large firms.
We think it’s safe to say that content licensing for AI training is a new business opportunity for large image repositories and that, every day, more companies understand the importance of working with legally sourced training datasets.
THE AUTHOR
Ivanna Attie
I am an experienced author with expertise in digital communication, stock media, design, and creative tools. I have closely followed and reported on AI developments in this field since its early days. I have gained valuable industry insight through my work with leading digital media professionals since 2014.