Small newspapers take on Big Tech in AI battle. A loss could devastate local news

As local newspapers shutter left and right, hundreds of small-town, family-owned publishers are going to battle against Big Tech to prevent their business models from being eaten alive by chatbots.

In a federal lawsuit filed Wednesday, some 400 local newspapers from across the country accused OpenAI and Microsoft of systematically copying articles from their websites to train leading artificial intelligence chatbots ChatGPT and Copilot. The chatbots then serve the information from those stories to their users without permission or compensation to the original publishers.

The lawsuit seeks to ensure that struggling local newspapers “have meaningful protections in the AI era,” attorney Matthew Platkin, who represents the plaintiffs, said in a statement. Platkin, a former New Jersey attorney general, said his firm, Platkin LLP, convened “the largest coalition of local newspaper publishers ever assembled” to take on the fight. They include the Arkansas Democrat-Gazette, the New York Amsterdam News and the Santa Fe New Mexican, among dozens of other smaller dailies and weeklies.

They’re far from the first news outlets to sue AI chatbot creators over the use of their proprietary content. The New York Times, for example, was the first major publisher to sue Microsoft and OpenAI on allegations the tech firms pilfer its intellectual property. But the latest lawsuit provides insight into the impact the growth of AI chatbots has had on small, local news outlets. Some of the newspapers have been in print since the 1800s.

Microsoft and OpenAI didn’t respond to requests for comment.

Chatbots are effective at providing news to people because their underlying models are built on the backs of journalists and news companies, said Diane Kennedy, the president of the New York News Publishers Association, which isn’t a plaintiff in the lawsuit. In fact, the chatbots give extra weight to credible sources when providing responses to their users, Kennedy told Straight Arrow.

Such a reality, she said, represents “an extinction-level event for newspapers.”

“We’re the credible sources,” Kennedy said. “So if they kill us, they don’t have any credible sources anymore.”

Why are local newspapers suing Big Tech?

The federal copyright lawsuit, filed in U.S. District Court for the Southern District of New York, alleges that ChatGPT and Copilot have generated “hundreds of billions of dollars (and counting) in market value” for OpenAI and Microsoft — but “not a cent of it has gone to the Publishers whose work made it possible.”

The complaint alleges the tech companies “systematically and secretly crawled” the news websites, copied content behind paywalls and stripped it of bylines and other attributions.

Chatbots have “memorized” news outlets’ content, the suit alleges, allowing the tools to reproduce the stories “verbatim or near-verbatim” in response to users’ prompts. The publishers claim the tech companies deliberately stripped articles of their “copyright management information,” including bylines, publication names and terms of use. An analysis by a third-party technologist, the suit says, found one ChatGPT model was trained on the contents of hundreds of thousands of the news outlets’ articles.

Tech companies have maintained they’re within their legal right to collect facts from across the web — and have no plans of sharing their revenue with the publishers whose work powers their tools. In 2024, OpenAI acknowledged in testimony to British lawmakers that “it would be impossible to train today’s leading AI models without using copyrighted materials,” which includes “virtually every sort of human expression — including blog posts, photographs, forum posts, scraps of software code and government documents.”

People are increasingly consuming news through chatbots, which research indicates can spread misinformation and perpetuate political biases. About 10% of people use chatbots weekly for news, according to a global survey published last week by the Reuters Institute for the Study of Journalism. The results represent a 3-percentage-point increase from last year. Just 1% of respondents said AI chatbots are their main source of news.

Google’s AI Summaries, which are powered by Gemini, serve inaccurate information in about 10% of searches, while others perform even worse. Recent research by media literacy company NewsGuard found the 10 leading chatbots “spread false claims when prompted with questions about controversial news topics 35 percent of the time.”

NewsGuard launched its own AI chatbot this week. And although the company promises direct payments to news publishers, outlets that don’t want their content cited by the chatbot must opt out.

Meanwhile, the local newspapers whose content is used to power these chatbots are increasingly becoming a thing of the past. Since 2005, nearly 3,500 U.S. newspapers have closed shop, a trend that’s especially pronounced in the suburbs of large cities, according to a recent report by the Medill Local News Initiative at Northwestern University.

Fewer newspapers equals fewer journalists covering events in their communities, according to research by public relations software company Muck Rack. Since 2002, the number of journalists employed by local news outlets in the U.S. has tanked by 81%.

Chatbots’ effects on local newspapers could be harmful to people who rely on them for community news, according to the lawsuit.

“The Publishers occupy a unique and essential role in American civic life,” according to the complaint. “Unlike national outlets, they cover school board meetings, municipal elections, community events and other local issues that directly shape people’s daily lives.”

Can lawmakers help?

One step in defending publishers from tech companies is a first-in-the-nation New York bill that takes aim at “stealth crawlers” deployed anonymously to scrape content from news websites, said Kennedy of the New York News Publishers Association.

If signed into law, the rules would require tech companies to disclose their identities while accessing the news sites through web crawlers. State lawmakers approved the legislation this month and it awaits a signature from Gov. Kathy Hochul, a Democrat.

News websites are “overwhelmed by millions of bot hits every day” that scrape their content and overload their servers, according to the News/Media Alliance.

In fact, traffic from web crawlers now exceeds that from humans, according to a recent report by the cybersecurity firm Imperva. Researchers concluded that “bad bots,” including those engaged in unscrupulous web scraping and fraud, account for more than a third of web traffic.

While leading tech companies are transparent about their use of web crawlers, “some unscrupulous companies mask their identity” and countries of origin to “steal data or conduct cyberattacks without accountability,” according to the News/Media Alliance, a trade group.

Kennedy told Straight Arrow the legislation would allow transparency into a “cottage industry of anonymous bots” that circumvent paywalls to scrape news content “despite news organizations having code on their websites prohibiting scraping.”

But more needs to be done, Kennedy said, to combat mainstream chatbots like ChatGPT and Gemini that scrape and repackage news articles as their own, often without payment or attribution.

“You don’t have the right to break into someone’s news website and make copies of their content,” she said. “Just like you couldn’t break into a bookstore, take the books, make copies of them, bring them back and then open up your own bookstore next door.”

Ella Rae Greene, Editor In Chief

Ella and the staff at Clear Media Project (CMP) curate these articles.
Unless otherwise noted CMP does not write these articles.