Google Algorithms and the Effect on AI Agents

Remember when you had to type a precise query into Google to get a list of blue links? That world is fading fast, and the internet is going through a seismic shift. For over two decades, Google’s search engine has been the unchallenged king of search on the internet. However, an increasing number of people are turning to AI agents like ChatGPT, because it’s much simpler to ask a question and receive a direct answer.

In fact, a recent study from OpenAI and Harvard found that 24% of all ChatGPT conversations are now about seeking information. This is a massive jump from just 14% a year ago, and the share is expected to keep growing.

This major change is leading new AI search engines to face an important question: how can they find, index, and rank the world’s information with the same quality and trust that Google has built over many decades?

To understand this dilemma, we need to go back to the beginning.

The History of Crawling: From Wanderers to Web Giants

The internet is an ocean of information, and to make sense of it, you need a map. That’s where web crawlers come in. The very first crawler, the World Wide Web Wanderer, was created in 1993 by Matthew Gray. Initially, it wasn’t intended to power a search engine, but to measure the size of the booming World Wide Web itself. These early bots, sometimes called “spiders,” would start at a known page and follow every link on it, discovering new pages along the way. That method remains the core of web crawling today.
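The link-following method described above can be sketched in a few lines. This is a minimal illustration, not how any real crawler is built: the `TOY_WEB` dictionary and its page names are hypothetical stand-ins for the live web, so the example runs without any network access.

```python
from collections import deque

# Hypothetical in-memory "web": each page maps to the links found on it.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/", "a.example/about"],
    "c.example/": [],
}


def crawl(seed, web):
    """Breadth-first crawl: start at a known page and follow every link."""
    discovered = [seed]      # pages in the order they were found
    seen = {seed}            # guards against visiting a page twice
    queue = deque([seed])    # pages whose links still need following
    while queue:
        page = queue.popleft()
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered


print(crawl("a.example/", TOY_WEB))
```

Starting from a single seed page, the crawler reaches every page that is connected to it by links. A page that nothing links to would never be found, which is one reason links matter so much to discovery.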

This simple concept gave birth to the first generation of web search engines like WebCrawler, Lycos, and AltaVista. However, they struggled with the web’s rapid growth and the problem of relevance. Their algorithms were often simplistic, ranking pages based on keyword frequency, which was easily gamed by “keyword stuffing.”
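To see why frequency-based ranking was so easy to game, consider a toy scorer. This is only a sketch of the general idea; the function name `keyword_score` and both sample pages are made up for illustration and do not reproduce any particular engine’s algorithm.

```python
def keyword_score(text, keyword):
    """Naive relevance: count how many times the keyword appears."""
    words = text.lower().split()
    return words.count(keyword.lower())


# Two hypothetical pages competing for the query "bread".
honest_page = "Our bakery sells fresh bread baked every morning"
stuffed_page = "bread bread bread cheap bread best bread buy bread"

pages = {"honest": honest_page, "stuffed": stuffed_page}

# Rank pages by keyword count, highest first.
ranking = sorted(
    pages,
    key=lambda name: keyword_score(pages[name], "bread"),
    reverse=True,
)
print(ranking)
```

The stuffed page wins purely by repetition, even though it is the less useful of the two.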

How Google Took Centre Stage in 1998

This is where Google’s story begins. Google’s true breakthrough in 1998 wasn’t just its ability to crawl the web more efficiently, but its revolutionary PageRank algorithm. PageRank introduced a new way of thinking: a page’s importance should be determined by the quality and quantity of other pages that link to it. This was a massive leap forward in creating a search engine algorithm that could understand authority and trust, not just keywords. This framework of analyzing the relationships between pages, rather than just their content in isolation, laid the foundation for the modern search experience.
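The idea that importance flows through links can be shown with a simplified PageRank calculation. This is a textbook-style sketch, not Google’s production system: the four-page graph is hypothetical, the damping factor of 0.85 is the commonly cited value from the original PageRank paper, and the fixed iteration count is an assumption chosen for simplicity.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the pages it links to. Every link target
    must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: three pages link to "home", nothing links to "orphan".
toy_links = {
    "home": ["blog", "shop"],
    "blog": ["home"],
    "shop": ["home"],
    "orphan": ["home"],
}

ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "home" ends up with the highest score because the most pages link to it, while "orphan" scores lowest because no page links to it, regardless of what either page says about itself.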

What’s in the AI Training Data? The Common Crawl

So, where do today’s AI agents get their foundational knowledge? The answer is an organization you’ve probably never heard of: Common Crawl. Founded in 2007, Common Crawl is a non-profit that systematically crawls the web and provides its massive, open repository of data to the public for free. Think of it as a public library of the internet, available for researchers, students, and developers to use.

This open data has become the bedrock for training many of the large language models (LLMs) that power AI agents. From models developed by OpenAI and Google to those from Meta and Hugging Face, Common Crawl is cited as one of the most important sources of pre-training data for generative AI. Its scale is unmatched in the open-source world, providing the raw material these models need to learn the patterns and structures of human language.

However, this data is a snapshot of the past. It’s static and doesn’t include the real-time, dynamic information that users often seek. This is a critical limitation. An AI trained only on Common Crawl might know a lot about the history of a topic, but be completely unaware of a major event that happened yesterday or the current price of a product. This creates a significant knowledge gap that AI agents must bridge to be truly useful.

The Real-Time Race: AI Agents Turn to Live Search and Google

To overcome this limitation and provide fresh, accurate answers, AI agents have had to develop their own live-search capabilities. This is where things get interesting – and where the shadow of Google looms large. While these AI companies are building their own crawlers, the reality is that Google has spent over 25 years perfecting its index of the live web. It’s the most comprehensive and up-to-date map of the internet that exists.

This has led to a fascinating development. A new report strongly suggests that ChatGPT is, in many cases, pulling its live search results directly from Google Search data. This isn’t about copying Google’s ranking algorithm; it’s a practical business decision to leverage its unparalleled, real-time index of the web. Why spend billions of dollars and years of engineering effort to build a new map of the world from scratch when you can license access to the best one available?

A more visible example of this reliance is in local search. The ChatGPT iOS app now allows users to choose Google Maps over Apple Maps for its location-based responses. This is a clear signal that, for specific, high-quality data verticals like maps, where Google has a dominant, highly refined product, AI agents are integrating directly with the market leader rather than trying to replicate the entire ecosystem. This creates a cooperative, yet competitive, relationship between the old guard and the new.

Independence or Imitation from Google? The Algorithm Question

This raises a critical question: are these new AI search engines truly independent, or are they just a new interface on top of Google’s infrastructure?

The answer is a bit of both. The core AI models that generate the answers, including their reasoning, their ability to synthesise information from multiple sources, and their conversation style, are entirely their own. This is where their unique value and differentiation lie. However, their ability to find that information in the first place is often deeply intertwined with the existing search landscape, which is overwhelmingly dominated by Google.

Google, for its part, is not standing still. For years, its search engine algorithm has been powered by AI systems like RankBrain and BERT, and its newer AI Overviews continue to evolve to understand user intent at a deeper level. Google’s algorithm updates are a continuous, independent process focused on its own vision for search.

In 2025 alone, Google has rolled out multiple core updates and spam updates, each one refining its ability to reward high-quality, authentic content and penalize low-value pages. The company is effectively engaged in a race to become the best AI agent itself, moving from a “search engine” to a “search and answer engine.”

The New SEO: From Keywords to GEO

This entire shift is rewriting the rules of online visibility. For decades, the game was Search Engine Optimization (SEO), a practice focused on understanding and appeasing Google’s algorithm through keywords, backlinks, and technical site structure.

Now, a new discipline is emerging: Generative Engine Optimization (GEO). While SEO is about ranking in a list of links, GEO is about being the source that an AI agent chooses to cite in its synthesized answer. The goal is no longer just a click, but a citation. This requires a different approach, focusing on creating clear, authoritative, and well-structured content that an AI can easily understand and trust as a primary source.

Best practices for GEO are still evolving, but early strategies point to a few key principles:

Prioritise user intent above all else.
Build strong E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) signals throughout your content, even though E-E-A-T is not a direct ranking factor.
Keep your content incredibly fresh and current, as AI agents are increasingly looking for the most up-to-date information.
Use structured data on your pages to help AI bots understand the context and entities within your content more easily.
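
As one illustration of the structured data principle, the sketch below builds a schema.org `Article` description and wraps it in a JSON-LD script tag, a common way to embed structured data in a page. The headline, author name, and dates are hypothetical placeholders, and the helper `to_json_ld_script` is a name invented for this example. Whether any given AI agent actually reads this markup is an assumption, not something this article confirms.

```python
import json

# Hypothetical article metadata using schema.org vocabulary.
article_markup = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How to Choose a Sourdough Starter",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "datePublished": "2025-01-15",
    "dateModified": "2025-06-01",
}


def to_json_ld_script(data):
    """Wrap a dictionary as a JSON-LD script tag for a page's HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_json_ld_script(article_markup))
```

Fields such as `dateModified` also support the freshness principle, since they state explicitly when the content was last updated.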

The rise of AI agents is also creating new challenges for website owners. Some face noteworthy traffic drops as users get their answers directly from AI summaries, without ever clicking through to the original source. This has even led to defensive measures, like Cloudflare (a service that powers about 20% of all web pages) announcing it will now block AI crawlers by default, forcing AI companies to negotiate for access to that data.
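
How Cloudflare enforces its policy is outside the scope of this article, but a related, long-standing way for a site to state which crawlers are welcome is a robots.txt file. The sketch below uses Python’s standard `urllib.robotparser` to check a hypothetical policy that turns away GPTBot (the user agent OpenAI documents for its crawler) while allowing everyone else. The robots.txt content, the URL, and the helper `is_allowed` are made up for illustration. Note that robots.txt is a voluntary convention: it only works when the crawler chooses to honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one AI crawler, allow all other bots.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit this agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "Googlebot"):
    verdict = is_allowed(ROBOTS_TXT, agent, "https://example.com/article")
    print(agent, "allowed" if verdict else "blocked")
```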

The Future of Search: A Hybrid Ecosystem

The idea that AI agents will completely replace Google is a myth. Instead, we are moving towards a hybrid search ecosystem. Users will fluidly move between traditional search for complex, multi-faceted research and AI agents for quick, conversational answers to simple questions.

For businesses and content creators, this means a dual strategy is essential. You must continue to master traditional SEO to maintain visibility in Google’s results while also adapting your content for GEO, so that you are cited by the AI agents that are becoming an increasingly important gateway to your audience. The foundation of both remains the same: create high-quality, helpful, and trustworthy content. The algorithms and agents may change, but the core principle of serving users will always stay the same.
