4/20/2023: Credit Attribution and Revenue Sharing for AI Models
original content creators should get paid
According to a detailed Wired report, Stack Overflow will start charging for training data for LLM models, a day after Reddit’s announcement on paid API access, also targeting LLMs. In the previous AI era of search engines, websites like Stack Overflow, Reddit and Twitter are glad to be crawled regularly by Google, Bing, etc because the search engines will send traffic back to their websites. But the current AI models like ChatGPT and Bard just guzzle all the data and spit out answers directly without attribution. The constructive relationship between AI and publicly accessible internet content has now broken. Something has to be done as ChatGPT or Bard can not stand on their own without the rich training data. Charging LLM model publishers is a good start but I believe there needs to be a broader revenue share mechanism to make sure everyone who contributes content to the AI models can get a fair share. Stack Overflow CEO made the following statement in the wired article:
Community platforms that fuel LLMs absolutely should be compensated for their contributions so that companies like us can reinvest back into our communities to continue to make them thrive. We're very supportive of Reddit’s approach.
Previously, Reddit CEO said he didn’t want to give a freebie to the world’s largest companies:
Crawling Reddit, generating value and not returning any of that value to our users is something we have a problem with.
As far as I know, all the decent LLM models are produced based on a large corpus of public content. I personally think all the models should be partially publicly owned. AI tax could be a very legitimate idea in this regard. Namely, X% of the revenue generated from the LLM models should be paid to a universal income pool and the funds will be distributed in a way that benefits original content creators and the general public.
In addition to LLMs, there are also controversies on AI generated images and very soon AI generated audio and videos. Multiple lawsuits are currently going on that graphic designers and photographers complain that their original content is plagiarized by AI. AI definitely generates images based on many people’s hard work and it is very unfair if the original content creators don’t get a fair share.
Credit attributions and revenue shares are important topics for the new age of AI. We need to start thinking about these issues and work on viable solutions right now. Otherwise, people will not be incentivized to create original work and the AI model publishers will unfairly extract too much value from content that is generated by all of us.