The race to find a better way to label AI
An internet protocol called C2PA uses cryptography to label images, video, and audio
This article is from The Technocrat, MIT Technology Review's weekly tech policy newsletter about power, politics, and Silicon Valley.
I recently wrote a short story about a project backed by some major tech and media companies trying to help identify content made or altered by AI.
With the boom of AI-generated text, images, and videos, both lawmakers and average internet users have been calling for more transparency. Simply adding a label is a very reasonable ask, but it is not an easy one, and the existing solutions, like AI-powered detection and watermarking, have some serious pitfalls.
As my colleague Melissa Heikkilä has written, most of the current technical solutions “don’t stand a chance against the latest generation of AI language models.” Nevertheless, the race to label and detect AI-generated content is on.
That’s where this protocol comes in. Started in 2021, C2PA (named for the group that created it, the Coalition for Content Provenance and Authenticity) is a set of new technical standards and freely available code that securely labels content with information clarifying where it came from.
This means that an image, for example, is marked with information by the device it originated from (like a phone camera), by any editing tools (such as Photoshop), and ultimately by the social media platform that it gets uploaded to. Over time, this information creates a sort of history, all of which is logged.
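To make that idea of a logged history concrete, here is a toy sketch in Python. It is not the C2PA format or its code. The actor names, the keys, and the `add_entry` and `verify` functions are all made up for illustration, and it uses HMAC from the standard library as a stand-in for the certificate-based signatures a real implementation would rely on. What it shows is the general shape: each tool adds a signed entry that points back to the one before it, so changing the file or any earlier entry becomes detectable.

```python
import hashlib
import hmac
import json

# Hypothetical per-tool signing keys (assumption for this sketch only).
# HMAC stands in for the public-key signatures a real system would use.
KEYS = {
    "phone-camera": b"camera-demo-key",
    "photo-editor": b"editor-demo-key",
    "social-platform": b"platform-demo-key",
}


def _digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def _canonical(obj: dict) -> bytes:
    """Serialize a dict deterministically so hashes are repeatable."""
    return json.dumps(obj, sort_keys=True).encode("utf-8")


def add_entry(history: list, actor: str, action: str, content: bytes) -> list:
    """Return a new history with one more signed entry appended."""
    entry = {
        "actor": actor,
        "action": action,
        "content_hash": _digest(content),
        # Link to the previous entry so the order cannot be quietly changed.
        "previous_entry_hash": _digest(_canonical(history[-1])) if history else None,
    }
    entry["signature"] = hmac.new(
        KEYS[actor], _canonical(entry), hashlib.sha256
    ).hexdigest()
    return history + [entry]


def verify(history: list, content: bytes) -> bool:
    """Check every signature and link, then check the file itself."""
    previous_hash = None
    for entry in history:
        key = KEYS.get(entry.get("actor"))
        if key is None:
            return False
        unsigned = {k: v for k, v in entry.items() if k != "signature"}
        expected = hmac.new(key, _canonical(unsigned), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry.get("signature", "")):
            return False
        if entry["previous_entry_hash"] != previous_hash:
            return False
        previous_hash = _digest(_canonical(entry))
    # The newest entry must describe the file we are actually looking at.
    return bool(history) and history[-1]["content_hash"] == _digest(content)


# Example: a photo is captured, cropped, and uploaded.
original = b"raw pixels"
edited = b"raw pixels, cropped"
history = add_entry([], "phone-camera", "captured", original)
history = add_entry(history, "photo-editor", "cropped", edited)
history = add_entry(history, "social-platform", "uploaded", edited)
```

In this sketch, `verify(history, edited)` succeeds, while swapping in different image bytes or rewriting an earlier entry makes it fail. That is the "tamper-evident" property in miniature. It does not stop anyone from stripping the history off a file entirely.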
The tech itself—and the ways in which C2PA is more secure than other AI-labeling alternatives—is pretty cool, though a bit complicated. I get more into it in my piece, but it’s perhaps easiest to think about it like a nutrition label (which is the preferred analogy of most people I spoke with). You can see an example of a deepfake video here with the label created by Truepic, a founding C2PA member, with Revel AI.
“The idea of provenance is marking the content in an interoperable and tamper-evident way so it can travel through the internet with that transparency, with that nutrition label,” says Mounir Ibrahim, the vice president of public affairs at Truepic.
When it first launched, C2PA was backed by a handful of prominent companies, including Adobe and Microsoft, but over the past six months, its membership has increased 56%. Just this week, the major media platform Shutterstock announced that it would use C2PA to label all of its AI-generated media.
It’s based on an opt-in approach, so groups that want to verify and disclose where content came from, like a newspaper or an advertiser, will choose to add the credentials to a piece of media.
One of the project’s leads, Andy Parsons, who works for Adobe, attributes the new interest in and urgency around C2PA to the proliferation of generative AI and the expectation of legislation, both in the US and the EU, that will mandate new levels of transparency.
The vision is grand—people involved admitted to me that real success here depends on widespread, if not universal, adoption. They said they hope all major content companies adopt the standard.
For that, Ibrahim says, usability is key: “You wanna make sure no matter where it goes on the internet, it’ll be read and ingested in the same way, much like SSL encryption. That’s how you scale a more transparent ecosystem online.”
This could be a critical development as we enter the US election season, when all eyes will be watching for AI-generated misinformation. Researchers on the project say they are racing to release new functionality and court more social media platforms before the expected onslaught.
Currently, C2PA works primarily on images and video, though members say that they are working on ways to handle text-based content. I get into some of the other shortcomings of the protocol in the piece, but what’s really important to understand is that even when the use of AI is disclosed, it might not stem the harm of machine-generated misinformation. Social media platforms will still need to decide whether to keep that information on their sites, and users will have to decide for themselves whether to trust and share the content.
It’s a bit reminiscent of initiatives by tech platforms over the past several years to label misinformation. Facebook labeled over 180 million posts as misinformation ahead of the 2020 election, and clearly there were still considerable issues. And though C2PA does not intend to assign indicators of accuracy to the posts, it’s clear that just providing more information about content can’t necessarily save us from ourselves.
Researchers are still trying to sort out just how social media platforms, and their algorithms, affect our political beliefs and civic discourse. This week, four new studies about the impact of Facebook and Instagram on users’ politics during the 2020 election showed that the effects are quite complicated. The studies, published by researchers from the University of Texas, New York University, Princeton, and other institutions, found that while the news people read on the platforms showed a high degree of segregation by political views, removing reshared content from feeds on Facebook did not change political beliefs.
The size of the studies is making them sort of a big deal in the academic world this week, but the research is getting some scrutiny for its close collaboration with Meta.