February 9, 2023




Tags: ChatGPT, Copyright


Categories: Technology

ChatGPT: The Web Will Change!

First there was the web. Researchers came up with the ideas and tools to create and publish content, cross-linked with references. This was very natural to their way of thinking. As the amount of information grew and people could no longer find their way around the web on their own, web directories and search engines emerged to help. These search engines, Google in particular, were able to understand the link structure and help people find the information they needed with a simple query.

Search Engines & Content Creators

Google and other search engines in effect had a compact with content creators for many years: in exchange for providing valuable information on the web, search engines would send interested users to those sites. This has been a mutually beneficial relationship, as it allowed search engines to improve the user experience by providing relevant results and allowed content creators to monetize their work by receiving traffic from search engines. In this way, search engines served as a “switchboard” rather than a destination.

However, in order to improve the user experience, search engines began to do a few things that changed the dynamics of the web. First, they began to show snippets of information in search results, so that users could decide if the destination was relevant before clicking through. This allowed users to quickly scan through multiple results, saving time and effort. Second, search engines began to use web content to improve language dictionaries and search completions. They also used trusted sources to augment search results with facts, which started to answer some questions directly. Third, Google created Google Trends to summarize what users were interested in, giving a zeitgeist of the internet. Search engines became the entry point to the web, easily transporting people to their destinations!

This value that search engines brought to the ecosystem was multifaceted: they created a way to discover new websites, allowed for monetization through relevant ads, and became the entry point and navigator for the web. This also gave search engines a unique vantage point from which they could observe and understand user behavior, which in turn allowed them to improve search results over time. With just a few words in a query, users could get great results in the top 10. Search engines could also personalize results based on who was searching, where they were searching from, and when. This is what makes search engines so powerful – they were able to influence the nature of the web well beyond what its original creators had imagined.

However, most people do not naturally create structured information with links. They communicate in plain language and share links. Even formally produced structured content like news rarely links to reference materials. This led to the creation of multiple social media sites – with the relationships between people, and their interests, as the key structuring element.

The rise of social media and entertainment sites has challenged the dominance of search engines by becoming the starting point for many people. Social media platforms like Facebook, Instagram and Twitter have made it easy for users to consume content from their friends and family, and this has led to a shift in how users discover new content. These platforms have also made it easy for users to share content, which has led to viral content that can spread quickly across the internet.

This has resulted in increased competition for attention and a decline in the importance of search engines as a primary source for discovering new content. While this threatened search engines' role as the entry point to the web, the two of them could coexist, sometimes uneasily and sometimes well. However, this did not fundamentally change the worldview for content creators. They had to build some expertise to navigate social networks, but they could continue to be rewarded for creating good content.

ChatGPT’s New Paradigm 

Now, with the advent of large language models like ChatGPT, there is a new challenge to this compact. ChatGPT is effectively a destination site that answers users' questions without providing a link to the source material. This breaks the compact that content creators had with the web and can be seen as exploitative, extracting value from content creators without returning any. Language models like ChatGPT seek the same vantage point that search engines had, but they provide no value back to content creators. This is an extreme case of the tragedy of the commons: LLMs extract value from the web without contributing anything in return.

There will be legal challenges as a result of this change. Copyright law could change, and technology could change – for instance, websites could refuse to be indexed by chatbots or AI learning systems. Copyright law is not static: it has continually evolved in the face of technological change – starting with the printing press! – to find ways to benefit society while rewarding creators.
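As a sketch of what opting out might look like, sites could extend the long-standing robots.txt convention to AI crawlers. The names below are partly illustrative: CCBot is Common Crawl's real crawler (a common source of LLM training data), while `ExampleAIBot` is a hypothetical stand-in for any future AI crawler:

```
# robots.txt – a sketch of refusing AI/training crawlers
# CCBot is Common Crawl's crawler; ExampleAIBot is hypothetical
User-agent: CCBot
Disallow: /

User-agent: ExampleAIBot
Disallow: /

# Regular search crawlers remain welcome
User-agent: *
Allow: /
```

Of course, robots.txt is purely advisory – compliance is up to the crawler – which is exactly why changes in law may matter as much as changes in technology.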

There are technological challenges as well – LLMs encode knowledge from the web, but this is very hard to personalize. Search engines and social media giants excel at personalization, and will use that lead to fight off the challengers – perhaps by assimilating the AI models. How they do this will have long-range implications for the structure of the web, and content creators will be affected.

However, there are other complexities to consider as well. LLMs depend on the content out there, with an implicit assumption that it was created by people. Yet the first serious use of LLMs is to generate content, and a tsunami of tools is coming. This will make web content more uniform (in tone, etc.). It will also create a self-reinforcing problem: AI will learn from AI-generated content, reinforcing its errors as well. Sam Altman has said that one of the weaknesses of LLMs is that they provide wrong answers confidently. Humans will have to judge! But the content will appear very confident and could fool humans easily – an echo of the Dunning-Kruger effect, where incorrect or misleading content sounds more confident than good content. If there is a ton of generated content, newer LLMs will not be able to trust that content anymore.

This is not without precedent. When Google's translation tools, which were trained on web content, became good enough, many websites used them to translate pages automatically, without humans in the loop. Once this happened, the web was effectively polluted with machine translations, and Google stopped using that content to improve its translation models!

Will this happen with LLMs? It has already started!

Interestingly, Stack Overflow – the developer-contributed Q&A site – has been flooded with ChatGPT-based answers, and its moderators have asked developers to stop posting responses from ChatGPT! Google has reportedly declared a "code red" over how to deal with ChatGPT and generated content. Universities are opting for in-class assignments, handwritten papers, group work and oral exams, and asking students to write about their own lives and current events, so that chatbots cannot be used.

Is this the end of the web? Will the removal of incentives change the nature of web content creation? Content creators will have to produce better content than the AI-generated kind to differentiate themselves, while also preventing AI from learning from that content – all while dealing with a diminished revenue stream!

No matter what the final outcome is, the web will change!