Protecting Its Content: The New York Times Implements Measures to Guard Against AI Model Training

A report published today, Monday, by Adweek indicates that The New York Times has taken precautionary measures to prevent its content from being used to train artificial intelligence models.

According to the report, the newspaper updated its terms of service on August 3rd to prohibit the use of its content, including text, photographs, graphics, audio and video clips, “look and feel,” metadata, and compilations, in the development of “any software, including, but not limited to, machine learning training or artificial intelligence systems.”

The updated terms also now specify that automated tools, such as web crawling programs designed to access or collect this content, cannot be used without written permission from the publisher.


The New York Times states that failure to comply with these new restrictions may result in unspecified fines or penalties.

Despite introducing these new rules, the newspaper does not appear to have changed its robots.txt file, which tells search engine crawlers which URLs they may access. The move may be a response to Google’s recent privacy policy update, which revealed that the search giant can gather public web data to train various artificial intelligence services, such as Bard.
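For illustration only, a publisher that did want to block known AI crawlers at that level could add directives along the following lines to its robots.txt file; GPTBot and CCBot are the publicly documented user agents of OpenAI’s and Common Crawl’s crawlers, and this is a sketch rather than The New York Times’ actual file:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: *
    Allow: /

Such directives rely on crawlers voluntarily honoring robots.txt, which is why the terms-of-service change described above operates as a separate, legal layer of restriction.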

Many large language models that popular AI services rely on, like OpenAI’s ChatGPT, are trained on extensive datasets that may contain copyrighted or otherwise protected materials taken from the web without the original creators’ permission.

However, The New York Times also entered a $100 million deal with Google in February, allowing the search giant to display the newspaper’s content across some of its platforms for the next three years.

The newspaper said that the two companies will collaborate on content distribution tools, subscriptions, marketing, advertising, and “experimentation.” Changes to The New York Times’ terms of service could therefore also affect other companies, such as OpenAI or Microsoft.

Recently, Microsoft also introduced new restrictions in its terms and conditions, prohibiting people from using its AI products “to create, train, or improve (directly or indirectly) other AI services” and banning users from extracting data from its AI tools.

Earlier this month, several news organizations, including The Associated Press and the European Publishers Council, signed an open letter urging legislators worldwide to introduce rules that require transparency about training datasets and the consent of rights holders before their data is used for training.

