How Search Engines Use Machine Learning: 9 Things We Know For Sure via @sejournal, @_kevinrowe
Tech giants are investing heavily in machine learning.
In 2019, Microsoft invested in 11 artificial intelligence (AI) startups, with $1 billion for OpenAI alone. And they aren’t even the biggest source of corporate venture capital flooding into AI startups.
In that same year, Intel Capital made 19 investments, and Google Ventures made 16 investments.
That huge influx of capital means that AI is advancing rapidly in sectors ranging from healthcare to construction to marketing and search engine optimization.
However, before we get into the implications of machine learning for SEO professionals, let’s define what we mean by AI.
There are 3 types of AI:
- Narrow or Weak AI: This type of AI is designed to perform specialized tasks that must be “taught” to the algorithm (think Google’s search algorithms). While extremely specialized in scope, narrow AI (ANI) is able to quickly recognize patterns and perform tasks in a way that outpaces human ability.
- General or Strong AI: Capable of autonomously learning and solving problems, general AI (AGI) takes machine learning to the next level. This AI is powered by deep learning processes designed to mirror the human brain’s neural networks, allowing the algorithm to make decisions without instruction.
- Artificial Superintelligence: At the moment, artificial superintelligence (ASI) still lands fully in the category of science fiction. This type of AI would, theoretically, be capable of outperforming human capabilities to solve the “unsolvable” problems of our time.
While companies like OpenAI and Conversion.ai are moving toward developing general AI for natural language processing, there are currently no clear-cut examples of AGI.
To progress from ANI to AGI, deep learning will be the key to creating stronger AI capable of using deductive reasoning to analyze complex, unstructured data and make independent decisions.
Back in 2016, Google declared its intention to become a “machine learning first” company. Since then, they’ve made steady strides toward that goal, launching Google AI in 2017 and rolling out BERT in 2019.
What’s their goal in going all-in on machine learning?
Well, according to Google, they want to not only make our lives easier but also use AI to find “new ways of looking at existing problems, from rethinking healthcare to advancing scientific discovery.”
Besides those lofty goals for the future, humanity is already seeing these machine learning advancements on a smaller scale in something we interact with every day – search engine algorithms.
Google has been making steady progress in the way it connects users to the content they’re searching for, including these nine ways we know search engines are using machine learning right now.
1. Pattern Detection
Search engines are using machine learning for pattern detection that helps identify spam or duplicate content.
Low-quality content typically has distinct similarities, such as:
- The presence of several outbound links to unrelated pages.
- Heavy use of stop words or synonyms.
- The occurrence rate of identified “spammy” keywords.
Machine learning recognizes these patterns and flags them. It also utilizes data from user interactions to detect when new spam structures and techniques are being used, recognize the new patterns, and successfully flag those, as well.
Even though Google still uses human quality raters, utilizing machine learning to detect these patterns drastically cuts down on the amount of manpower necessary to review the content.
This way, Google is able to automatically sift through pages to weed out low-quality content before an actual human has to get involved.
Because these models learn from data, the more pages they analyze, the more accurate they become (at least in theory).
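To make the idea concrete, here is a minimal sketch of how the three signals listed above could be turned into numbers and checked against thresholds. It is not how Google does it: the word lists, the function names (`spam_features`, `looks_spammy`), and the thresholds are all assumptions made up for illustration, and a real system would learn them from labeled data rather than hard-code them.

```python
import re

# Hypothetical word lists; a real system would learn these from labeled data.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for"}
SPAMMY_KEYWORDS = {"free", "cheap", "winner", "casino"}


def spam_features(text, outbound_links, page_topic_domains):
    """Return the three illustrative signals as a dict of floats."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty pages
    # Links to domains outside the page's topic count as "unrelated".
    unrelated = [d for d in outbound_links if d not in page_topic_domains]
    return {
        "unrelated_outbound_links": float(len(unrelated)),
        "stop_word_ratio": sum(w in STOP_WORDS for w in words) / total,
        "spam_keyword_rate": sum(w in SPAMMY_KEYWORDS for w in words) / total,
    }


def looks_spammy(features, max_links=3, max_spam_rate=0.1):
    """Flag a page using simple, hand-picked thresholds (assumptions)."""
    return (
        features["unrelated_outbound_links"] > max_links
        or features["spam_keyword_rate"] > max_spam_rate
    )
```

In practice, features like these would feed a trained classifier instead of fixed cutoffs, which is what lets the system pick up new spam patterns over time.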
2. Identification of New Signals
RankBrain is the machine learning algorithm developed by Google that not only helps identify patterns in queries, but also helps the search engine identify possible new ranking signals.
Before RankBrain, Google’s algorithm was coded entirely by hand. It depended on a team of engineers to analyze search query results, run tests to improve the quality of those results, and implement the changes.
Now, while there are still human engineers working on the algorithm, RankBrain quietly works in the background running tests and gauging how the changes affect user interactions.
RankBrain solves some of the tricky problems that Google used to face with traditional algorithms – including how to handle search terms that have never before been entered into Google.
According to Google’s Gary Illyes in a 2019 Reddit AMA:
“RankBrain is a PR-sexy machine learning ranking component that uses historical search data to predict what would a user [sic] most likely click on for a previously unseen query.”
As search engines teach their systems to run predictions and analyze data on their own, less manual labor is needed, and employees can move toward things machines can’t do, like innovation or human-centered projects.
3. It’s Weighted as a Small Portion
However, even though machine learning is slowly transforming the way search engines find and rank websites, it doesn’t (currently) have a major impact on our SERPs.
In a 2019 Webmaster Central Office Hours discussion, Google’s John Mueller references how machine learning helps Google’s engineers better understand various issues, but he’s careful to note that:
“…machine learning isn’t just this one black box that does everything for you where you feed the internet in on one side and the other side comes out search results.”
More recently, in a May 2021 Office Hours discussion, he explained that machine learning may adjust the weight of various ranking signals. But again, there are still real people manually checking and adjusting those values.
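One way to picture "machine learning adjusts the weights, people check them" is a weighted sum whose weights are clamped to ranges a reviewer allows. The sketch below is only that, a picture: the signal names, the bounds, and both functions (`rank_score`, `apply_review`) are hypothetical and are not drawn from anything Google has published.

```python
def rank_score(signals, weights):
    """Weighted sum of signal values; both dicts share the same keys."""
    return sum(weights[name] * value for name, value in signals.items())


def apply_review(proposed, bounds):
    """Clamp model-proposed weights to ranges a human reviewer allows."""
    return {
        name: min(max(w, bounds[name][0]), bounds[name][1])
        for name, w in proposed.items()
    }


# Hypothetical signals for one page, plus weights suggested by a model.
signals = {"relevance": 0.8, "freshness": 0.5}
proposed = {"relevance": 0.9, "freshness": 2.0}
bounds = {"relevance": (0.0, 1.0), "freshness": (0.0, 0.5)}

reviewed = apply_review(proposed, bounds)  # freshness is capped at 0.5
score = rank_score(signals, reviewed)
```

The point of the clamp is that the model can nudge a weight, but it cannot push it past what a person has signed off on.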
Google’s end goal is to use technology to provide users with a better experience. They don’t want to automate the entire process if that means the user won’t have the experience they are looking for.
So don’t assume machine learning will soon take over all search ranking; it is simply a small piece of the puzzle search engines have implemented to hopefully make our lives easier.
4. Custom Signals Based on Specific Query
Google’s privacy policies describe how the search engine creates personalized search results based on a user’s behavior.
Google’s personalized search patent, US20050102282A1, states that:
“…personalized search generates different search results to different users of the search engine based on their interests and past behavior.”
We can clearly see this in action. A demonstration often used in conference presentations is as simple as typing a string of queries into Google in one sitting and seeing how the results change depending on what you last searched.
For instance, if I search [New York Football stadium] in an incognito browser, I get the answer [MetLife Stadium].
Next, if I search in the same browser for just [jets], Google assumes that because my last query was about a football stadium, then this query is also about football.
As I continue my search, Google learns when my interest starts to change.
Searching for [Jaguars] in the same browser will bring up information about the NFL team the Jacksonville Jaguars (which is related to my last two searches).
But the moment I start to search [zoo near San Diego] and type [zoo] in the query box, Google suggests [zoos with jaguars] even though I haven’t searched jaguars a second time.
Search history is just one component of the search experience that machine learning uses to provide better results.
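The session behavior described above can be sketched as a lookup that resolves an ambiguous query using the topic of the most recent related search. Everything here is a toy assumption: the two tables are hand-written for this example, and `interpret` is a hypothetical helper, whereas a real search engine would learn these associations from large volumes of behavior data.

```python
# Hypothetical lookup tables; a real system would learn these associations.
QUERY_TOPICS = {
    "new york football stadium": "football",
    "zoo near san diego": "animals",
}
AMBIGUOUS = {
    "jets": {"football": "New York Jets", "aviation": "jet aircraft"},
    "jaguars": {"football": "Jacksonville Jaguars", "animals": "jaguar (big cat)"},
}


def interpret(query, history):
    """Pick a meaning for `query` using the most recent known topic."""
    senses = AMBIGUOUS.get(query.lower())
    if senses is None:
        return query  # not ambiguous in this toy table
    # Walk the session backwards so the latest searches win.
    for past in reversed(history):
        topic = QUERY_TOPICS.get(past.lower())
        if topic in senses:
            return senses[topic]
    return next(iter(senses.values()))  # fall back to the first listed sense
```

With `["New York Football stadium"]` as the history, `interpret("jets", ...)` resolves to the football team. With a zoo search as the latest history entry, `interpret("jaguars", ...)` resolves to the animal instead.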
5. Natural Language Processing
It’s important for a search engine to be able to recognize how similar one piece of text is to another. This applies not just to the words being used but also their deeper meaning.
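As a baseline for what "similar" can mean, here is a minimal sketch of cosine similarity over word counts. It only compares the words being used, so it illustrates the shallow half of the problem; capturing deeper meaning is what models like BERT are for. The function name and the whitespace tokenization are assumptions chosen for brevity.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0
```

Two texts with no words in common score 0.0 here even if they mean the same thing, which is exactly the gap that context-aware language models try to close.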
Bidirectional Encoder Representations from Transformers – BERT, for short – is a natural language processing framework that Google uses to better understand the context of a user’s search query.
People don’t always speak like a machine would expect them to. We play with language to come up with new turns of phrase.
We use the same word to describe different things. Sometimes, we’re even purposefully ambiguous.
However, as more people are using and searching new phrases online, machine learning is able to display more accurate information for those queries.
Google Trends is a great front-facing example of this. A new phrase or word that gains traction (e.g., “glow up” or “spill the tea”) may have nonsensical search results at first.
BERT is designed to replicate human recognition as closely as possible to decode those contextual nuances by learning how users interact with the content and matching search queries with more relevant results.
As language develops and transforms, machines are better able to predict our meanings behind the words we say and provide us with better information.
6. Image Search to Understand Photos
Every second, approximately 1087 photos are uploaded to Instagram, and 4000 are uploaded to Facebook. That’s hundreds of millions of photos being uploaded to those two social networks alone every day.
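For anyone who wants to check the math, the per-second figures quoted above convert to a daily total like this (a quick sketch that assumes the upload rates hold steady around the clock):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

instagram_per_second = 1087  # figure quoted above
facebook_per_second = 4000   # figure quoted above

instagram_per_day = instagram_per_second * SECONDS_PER_DAY
facebook_per_day = facebook_per_second * SECONDS_PER_DAY
total_per_day = instagram_per_day + facebook_per_day

print(f"{total_per_day:,}")  # prints 439,516,800
```

That works out to roughly 94 million photos a day on Instagram and about 346 million on Facebook.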
Analyzing and cataloging that many submissions would be an arduous (if not impossible) task for a human, but it’s perfect for machine learning.
Machine learning analyzes color and shape patterns and pairs them with any existing schema data about the photograph to help the search engine understand what an image actually is.
This is how Google is able to not only catalog images for Google Image search results but also power its reverse image search, which allows users to search using an image instead of a text query.
Users can then find other instances of the photo online, as well as similar photographs that have the same subjects or color palette and information about the subjects in the photo.
In turn, the way the user interacts with these results can shape their SERPs in the future.
7. Ad Quality & Targeting Improvements
Just like its organic search results, Google wants to provide the most relevant ads for its individual users. According to Google U.S. patents US20070156887 and US9773256 on ad quality, machine learning can be used to improve an “otherwise weak statistical model.”
This means that Ad Rank can be influenced by a machine learning system.
“Bid amount, your auction-time ad quality (including expected clickthrough rate, ad relevance, and landing page experience), the Ad Rank thresholds, the context of the person’s search” are fed into the system on a keyword-by-keyword basis to determine which thresholds Google applies to each keyword.
8. Synonyms Identification
When you see search results that don’t include the keyword in the snippet, it’s likely due to Google using RankBrain to identify synonyms.
When searching for [forest preservation], you’ll see various results with the word “protection” as it can be used interchangeably with “preservation” in this case.
Google even highlights the synonyms in some cases, further indicating that it’s recognizing the synonyms.
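A simple way to picture synonym matching is query expansion: each query term is widened to a set of acceptable words before it is compared to a snippet. The sketch below assumes a hand-written synonym table and two hypothetical helpers (`expand_query`, `matches`). RankBrain's actual approach is not public, and a learned system would not rely on a fixed dictionary like this one.

```python
# Hypothetical synonym table; a real system would learn these from usage data.
SYNONYMS = {"preservation": {"protection"}}


def expand_query(query):
    """Return one set of acceptable words per query term."""
    return [{w} | SYNONYMS.get(w, set()) for w in query.lower().split()]


def matches(query, snippet):
    """True if every query term, or one of its synonyms, is in the snippet."""
    words = set(snippet.lower().split())
    return all(options & words for options in expand_query(query))
```

Under this toy table, a snippet that says "forest protection" matches the query [forest preservation] even though the word "preservation" never appears in it.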
9. Query Clarification
One of my favorite subjects is search query user intent.
There are many reasons to fire up a search engine. Users may be searching to buy (transactional), research (informational), or find a specific site or resource (navigational) for any given search.
Furthermore, a single keyword could serve one or several of these intents.
By analyzing click patterns and the content type that users engage with (e.g., CTRs by content type) a search engine can leverage machine learning to determine the intent behind the user’s search.
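Here is a rough sketch of what "CTRs by content type" could look like as an intent signal. The mapping from content type to intent, the sample numbers, and the `intent_shares` helper are all assumptions for illustration. A production system would use a trained model over far richer behavior data.

```python
# Hypothetical mapping from content type to the intent it usually serves.
CONTENT_INTENT = {
    "product_page": "transactional",
    "guide": "informational",
    "homepage": "navigational",
}


def intent_shares(clicks, impressions):
    """Turn per-content-type CTRs into a share of total CTR per intent."""
    ctr = {
        ctype: clicks.get(ctype, 0) / shown
        for ctype, shown in impressions.items()
        if shown > 0  # skip content types that were never shown
    }
    total = sum(ctr.values())
    shares = {}
    for ctype, rate in ctr.items():
        intent = CONTENT_INTENT.get(ctype, "unknown")
        shares[intent] = shares.get(intent, 0.0) + (rate / total if total else 0.0)
    return shares


# Made-up numbers for a single query.
clicks = {"guide": 30, "product_page": 10}
impressions = {"guide": 100, "product_page": 100, "homepage": 100}
shares = intent_shares(clicks, impressions)
likely_intent = max(shares, key=shares.get)
```

With these made-up numbers, guides draw most of the clicks, so the query leans informational, while the smaller transactional share hints that a mixed SERP may still be the right call.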
An example can be seen with the query “best colleges” in a Google search.
The results are reviews and a list of colleges all in one SERP, with the universities listed at the top. This demonstrates Google’s understanding of the possible intents behind the search.
This is changing how SEOs look at link structure and placement as Google’s algorithm uses tools like BERT to get better and better at evaluating the context of where those links are placed.
Summary
While machine learning isn’t (and probably never will be) perfect, the more humans interact with it, the more accurate and “smarter” it will get.
This could be alarming to some, creating visions of Skynet from the “Terminator” movies.
However, the actual result may be a better experience with technology that solves complex problems and allows humans to focus on driving creativity and innovation.
In 2018, Pew Research conducted a poll in which 63% of respondents said that they are hopeful for the future of humanity as it relates to AI – agreeing that by 2030, humans will be better off with the help of artificial intelligence.
One way we’re already seeing that enhancement to quality of life is with search. As Google and other search engines advance their use of machine learning, we’re able to more easily find the information and services we need, when we need them.
Image Credits
All screenshots taken by author, June 2021