The Impact of AI on Information Retrieval
The Double-Edged Sword of AI in Search: Efficiency and Transparency Challenges
As the main source of traffic for journalistic websites, Google Search is set to undergo fundamental changes soon, as the company announced at an event in May 2024.
By the end of the year, search results produced by generative artificial intelligence will be available to 1 billion users worldwide. These users will see the answers directly on the Google results page, without needing to click on the result links.
This development reflects a trend that extends beyond Google, whose search business accounts for 57% of the roughly $80 billion in revenue Alphabet reported for the first quarter of 2024.
In February, Gartner estimated that the volume of internet searches could decrease by 25% by 2026 due to the rise of chatbots and other AI-powered tools. Whether that number is accurate is hard to say, but it is already noticeable that traditional search tools are losing ground in our daily lives. Furthermore, there is a growing perception that Google Search and other search tools are less efficient than they used to be.
The logical conclusion is that journalistic websites' audiences are likely to decline even further than they have in recent years.
What is at stake across today's platforms is the pursuit of quality information.
Google's New Search
In an official document, Google explains how its search function works in simple terms:
"It's like the index at the back of a book — with an entry for every word found on every page of the web we index. When we add a webpage, we also add entries for all the words contained in it."
The analogy to a book refers to the organization of information, whose roots date back to the monasteries and universities of 13th-century Europe. In his book "Index, A History of the", British professor Dennis Duncan explains:
"A history of the index is, in fact, a history of time and knowledge, and the relationship between the two (...) It is the story of our growing urgency to access information quickly, and a parallel urgency to have the contents of books as divisible, distinct units of knowledge, individually consumable. This is information science, and the index is the fundamental element of this discipline’s architecture."
Replace "contents of books" with "web pages," and that is Google Search — a tool created to save users' time in the vastness of the internet.
The innovation of adding generative AI follows the same path: saving users' time.
However, indexing most of the web is becoming increasingly complex. A person searching Google for information about the May 2024 floods in Rio Grande do Sul, Brazil, for instance, might end up on online casino websites, as the fact-checking agency Aos Fatos revealed two weeks ago.
In their report, data scientist João Barbosa and journalist Ethel Rudnitzki identified that criminals were using journalistic content from Brazilian sources such as CNN Brasil, Exame, UOL, and even Aos Fatos to manipulate Google's indexing system and boost the reach of malicious websites.
Additionally, they were hosting these pages on compromised municipal and government websites, leveraging the higher ranking of .gov domains.
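Google's actual ranking signals are proprietary, so the following is only a hypothetical sketch of why this parasite hosting works: if a ranking function scales a page's relevance by some measure of domain trust, a page planted on a compromised .gov site inherits that trust. The weights and the function below are invented for illustration:

```python
# Hypothetical ranking sketch, NOT Google's real formula; it only
# illustrates why spammers target pages hosted on high-trust domains.
DOMAIN_TRUST = {".gov": 2.0, ".edu": 1.8, ".com": 1.0}  # assumed weights

def rank_score(relevance: float, url: str) -> float:
    """Scale content relevance by an assumed domain-trust multiplier."""
    trust = next((w for tld, w in DOMAIN_TRUST.items() if tld in url), 1.0)
    return relevance * trust

# The same copied journalistic text scores higher on a hijacked .gov page.
print(rank_score(0.6, "casino-spam.com/floods"))   # 0.6
print(rank_score(0.6, "city.gov/hacked/floods"))   # 1.2
```

Under any scheme resembling this one, the cheapest way to raise a spam page's score is not better content but a more trusted host, which is exactly the behavior Aos Fatos documented.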
Beyond copyright infringement, this case illustrates how digital marketing professionals try to keep up with and manipulate the algorithm, even for illegal purposes.
In this video presenting the new AI-powered search, the company states that "Google will do the Googling for you."
AI-generated content may feature an index of links used as references, alongside maps and integrations with other company services.
However, this content will not be immune to external influences. Just as the SEO industry attempts to manipulate the algorithm, professionals will seek new ways to influence what AI generates. This brings us to the question of information quality.
Journalism, which is likely to lose audience due to this innovation, is a human craft, limited in scale but committed above all to accuracy (even if it does not always achieve it, whether through incompetence, bias, or intent).
Is this the most appropriate content to feed large language models (LLMs) whose goal is to inform people? Publishers are already fighting in court to receive compensation for the use of this intellectual property by companies like OpenAI and Google.
Even with potential improvements, AI-generated content will always be a copy of a copy, a summary of an index.
It may save users time, but without transparency about how it works and the sources used, it will be difficult to assess the quality of the information. This will be the key challenge going forward.
Otherwise, we will be left to trust the algorithm.
Sérgio Vieira
@sergiovds at X, Bluesky, Threads, Mastodon, Medium, Instagram
Projeto Impressões Digitais