
This week, it was announced that the New York Times is suing Microsoft and OpenAI (the company responsible for ChatGPT, in which Microsoft is a major investor) to the tune of billions of dollars.
Although quite the Christmas stocking filler, this comes as not much of a shock. After all, it’s not like OpenAI trained ChatGPT exclusively on out-of-copyright Dickens novels and Creative Commons works.
I’ve personally felt the sting of AI plagiarism. It’s disheartening to see your hard graft copied, transformed, and paraded as new by AI tools and the people who publish the output as their own. I’ve made my feelings clear on this – it’s laundered plagiarism, a sophisticated way of copying and pasting that bypasses traditional checks and balances, not to mention it fundamentally disincentivises the open publication of unique written works.
Since the launch of Content Shield, I’ve been approached several times by writers who have observed their works plagiarised by ChatGPT and its users. “Let Disney’s lawyers deal with it” has seemingly become my catchphrase. I never thought I’d find myself advocating for Disney on an issue of copyright; 2023 has truly been a wild ride.
The recently announced lawsuit by the New York Times against OpenAI and Microsoft has put a spotlight on the issue. The result of this case has the potential to be one of the most important legal precedents set in recent history, with untold and far-reaching consequences.
I am willing to be wrong and remain open to changing my mind at any time if presented with a compelling counter-argument. I am certainly no expert in law, but the following are my thoughts on the case as I understand it on December 29th, 2023.
I Feel Like I’ve Heard This Song Before
I am not a lawyer, but I am a (former) musician – not a good one, but that’s beside the point.
Just as ChatGPT has been ‘trained’ on copyrighted content without consent, so too has the music industry grappled with the complexities of using existing works to create something new. But there’s a critical difference: the music industry has established rules for sampling and interpolation, requiring licenses and royalties to protect original creators. It’s high time we applied similar principles to text-based AI output.
While it’s tempting to liken Generative AI’s crawlers scooping up swathes of the internet to piracy, it’s not quite the same. And although the output functions as a replacement for the original work, it isn’t a direct copy.
Comparing AI to search engine indexing doesn’t quite fit either. There’s a fine line between helping navigate content and replacing it. (I have more to say about zero-click search results, but that’s for another time.)
The legal showdown may hinge on precedents from the music industry.
Whistle Stop Tour of How Recorded Music Copyright Works
In the UK, recorded music copyright is a symphony of intricacies. When a track is recorded, there are two key rights: the rights to the music (composition) and the rights to the recording itself (performance).
If you cover a song, you’re reinterpreting the composition and need a mechanical licence: you are responsible for the new performance, so the performance royalties go to you, while someone else wrote the composition, so the composition royalties go to them. As long as you hold a mechanical licence, you can record and publish covers of Taylor Swift songs to your heart’s content without asking Tay-Tay.
I have been paying Swift royalties since 2015 due to a published “We Are Never Getting Back Together” cover recording – I know she’d hate it.
Sampling, however, slices into both rights. It’s using a piece of the actual recording, requiring permission from the record label (performance rights) and the songwriter or publisher (composition rights).
Interpolating a song also treads on these dual paths but in a nuanced manner, demanding a balance of respecting original composition while introducing new performance elements. Each note played in this legal orchestra requires careful consideration to avoid discord.
Additionally, the public performance of a live or recorded song requires separate licensing by the venue or performer. And even more confusingly, there are different royalties for music and lyrics, but let’s not get bogged down in detail.
Of course, it goes without saying that piracy – creating unauthorised copies of a recorded work – is illegal. We all remember the greatest ad of a generation.
Are Generative AIs Sampling or Performing, and Is This a Record or a Live Performance?
If you’ve read this far, you can likely tell I believe Generative AIs should pay royalties for the material on which they are ‘trained’ when used for commercial purposes. I am, however, conflicted about which type of royalties should be paid, who should pay (the users who republish the output or the Generative AI company itself), and the mechanism by which payment should be made.
Until recently, I thought of Generative AI output as a cover version. However, two words in the case caught my eye: “verbatim excerpts” (although, thus far, there seem to be no publicly published examples of the excerpts that will be used in the case). If so, I see this as sampling unless correctly cited. Even in works published by a human being, there are limits to how much you can quote from existing works under fair use.
My Current Thoughts Are:
- Generative AI should pay performance royalties, similar to the mechanical licences required for an unauthorised cover version.
- Publishers who use AI should pay royalties similar to sampling, requiring consent from the original publisher.
- Original publishers should have the right to withdraw their content from the training of generative AI.
- Original publishers should have the right to consent to or opt out of generative crawl in a permission-over-forgiveness model.
Generative AI Paying Performance Royalties
Think mechanical licenses for a cover song, but in the AI world. It’s about fairness, really. The AI takes existing work, adds its own twist – sort of like a cover band doing their version of a classic. The original creators should get their due, shouldn’t they?
We could see a system where AI developers chip into a fund, a bit like the PRS licence. This fund compensates the brains behind the original content. It’s all about striking that delicate balance – encouraging new creations while tipping the hat to the old masters.
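As a thought experiment only, a PRS-style pool could work pro rata: licensed AI developers pay into a central fund, which is then split between rights-holders in proportion to how often their works are drawn on. The sketch below is hypothetical – every publisher name, usage count, and pool figure is invented for illustration, and no such scheme currently exists.

```python
# Hypothetical sketch of a PRS-style pro-rata royalty pool.
# All publishers, usage counts, and figures are invented for illustration.

def distribute_pool(pool: float, usage_counts: dict[str, int]) -> dict[str, float]:
    """Split a royalty pool between rights-holders pro rata by usage count."""
    total = sum(usage_counts.values())
    return {name: pool * count / total for name, count in usage_counts.items()}

# E.g. a £1,000,000 annual pool, split by how many of each publisher's
# works were drawn on during training (numbers entirely made up):
shares = distribute_pool(1_000_000, {"NYT": 600, "IndieBlog": 100, "WireService": 300})
# The NYT's 600 of 1,000 tracked uses would earn it 60% of the pool.
```

The hard part, of course, isn’t the arithmetic – it’s producing trustworthy usage counts in the first place.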
Royalties for Publishers Using AI
Now, let’s talk about publishers using AI. They should face the music like producers sampling a beat. If you’re using AI to jazz up your content, you’re sampling in a way. And that means paying up and getting permission. It’s only fair, right?
Tracing AI’s breadcrumbs back to the original creators is tricky but crucial. Imagine a deal between AI developers and publishers, ensuring a slice of the pie goes back to the original content creators. It’s a step towards ethical AI use in publishing. Innovation, yes, but not at the expense of the creators’ rights.
Withdrawal Rights for Original Publishers
Giving power back to the original publishers – now that’s a game-changer. They should have the right to pull their work from AI training sets. It’s about respecting their choices. Sometimes, they might not want their work to be part of this AI-driven future. And that’s okay.
For this to work, AI developers need to be crystal clear about where they’re getting their data. And they need to make it easy for creators to say “no thanks.” This isn’t just about protecting rights; it’s about encouraging a diverse and ethical approach to AI training.
Consent in Generative Crawling
Last but not least, let’s talk about consent in generative crawling. It’s high time we moved to a permission-over-forgiveness model. Instead of AI developers hoovering up content willy-nilly, they should ask first. Think of it as knocking on the door before entering.
In this world, publishers can choose to be part of the AI story or not. It’s their call. This isn’t just respectful; it’s collaborative. It fosters a world where technology and creative rights aren’t at odds but work together in harmony.
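Some of this machinery already exists in embryonic form: OpenAI’s GPTBot crawler, announced in 2023, honours robots.txt directives, so a publisher can already tell it to stay away. Note, though, that this is opt-out (forgiveness), not the permission-first model argued for here.

```text
# robots.txt – blocking OpenAI's GPTBot from an entire site
User-agent: GPTBot
Disallow: /
```

A permission-over-forgiveness model would flip the default: no crawling unless a publisher explicitly opts in.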
Where Does This Leave Us?
As we stand at the crossroads of technological advancement and artistic integrity, it’s clear that the path forward isn’t just black and white. The lawsuit involving the New York Times, Microsoft, and OpenAI isn’t just a legal skirmish; it’s a clarion call for redefining the rules of the game in the age of AI.
Generative AI, like ChatGPT, isn’t a pirate in the traditional sense. It’s more of a sampler, remixing and reimagining our collective digital tapestry. But, as with any art form that borrows from existing works, there’s a moral and legal obligation to acknowledge and compensate those original creators. To me, the question isn’t whether AI should pay its dues – it’s how.
As I’ve suggested, a nuanced approach, borrowing from the music industry’s playbook, might be our best bet. Generative AI could pay performance royalties, akin to cover bands paying their respects to the original artists. Publishers using AI, much like music producers who sample a beat, should also cough up royalties and secure necessary permissions.
But it’s not just about money. It’s about choice. Original publishers must have the power to opt in or out of this brave new world. They should be able to withdraw their content from AI training sets, and consent should be the golden rule in generative crawling.
The outcome of the NYT lawsuit could be a watershed moment, setting a precedent that guides us through these uncharted waters. But regardless of the verdict, one thing’s for sure – the conversation around AI, copyright, and royalties isn’t just necessary; it’s overdue.
As we move forward, let’s keep sight of the core principle that has driven artistic creation for centuries: respect for the creator’s rights. It’s time to harmonise the tunes of technology and creativity, ensuring that as AI writes the next chapter of human innovation, it does so by playing a fair, respectful melody that honours those who laid the foundation for its symphony.
Oh, and if you have questions about the practicality of a mechanism to facilitate such a royalties system – remember that they built a machine that can write like Shakespeare; they can figure it out.