How to Scrape SERPs to Optimize for Search Intent


Google will never explicitly tell us the specifics of the “more than 200 signals” its algorithm uses to rank a page.

Other than implementing what is often referred to as “SEO best practices,” we’re left relying on a few things:

SEO & Context

When we see a ranking position change (e.g., a competitor moving above us in the SERPs, our site outranking a competitor, or a page becoming visible for a new set of keywords), we need to try to tie that back to a specific change or set of changes.

We need to contextualize it.

This could be due to:

Or it could be due to a competitor launching a set of new pages.

Whatever the reason may be, the closer we can get to pinpointing ranking movements to a specific set of changes, the more focused we can be with our SEO strategy.

SEO & Clues

If we’re talking about clues that help us understand ranking, what better place is there to start than the search results pages?

They are, after all, the clearest window we have into the types of pages that Google likes to rank for the queries we want to target.

Let’s explore how we can scale up the process of investigating these clues, specifically how Google interprets intent for a set of keywords.

Analyzing SERP intent at scale can help you diagnose why you’re having trouble gaining visibility for an important set of keywords, and give you insight into what types of pages and content you need to create in order to rank.

While there are many ways to analyze SERP intent, notably with the toolsets available from SEO software suites, I want to focus on custom extractions as a starting point.

What Are Custom Extractions?

There are plenty of great resources already out there on custom extractions, ranging from the more simple to the highly detailed, so I don’t want to waste too much time covering old ground.

To summarize, custom extractions in this context are instructions we give to a crawling tool to identify and extract information from a specific element on a webpage.

In this case, the webpage we want to crawl just happens to be a SERP.

The idea for this process came from a tweet I shared about using Screaming Frog to extract the related searches that Google displays for keywords.

This idea was then developed in a great article from Builtvisible, which walked through how you can use the same process to scrape results from the ‘People Also Ask’ suggestions that Google displays for certain keywords.

Scrape SERP tweet screenshot

While both of these are great techniques for content ideation and on-page optimization, they’re slightly lacking when it comes to identifying intent.

Even if you have access to a tool that can tell you which SERP features (local pack, featured snippets, etc.) are present for a keyword, I’ve found this isn’t always reliable in identifying what types of pages Google likes for the “true” organic results.

For example, we might assume the presence of a Local Pack would suggest a “Visit” intent, but the rest of the search results can often favor informational results that would be more applicable to a “Know” intent classification.

So, what gives us the best insight into how Google is interpreting keyword intent?

In my opinion, it’s contained within the page titles and meta descriptions that Google displays.

Scraping Page Titles & Meta Descriptions from Google

Let’s run through the process of scraping some data from search engine results pages.

The first thing you need to do is pull together a list of SERP URLs that you want to crawl. These are the URLs that Google would display for the query you enter.

Compiling these is simple. All you need is a simple Excel formula that follows this format (A3 being the cell containing your keyword):

="https://www.google.co.uk/search?q="&SUBSTITUTE(A3," ","+")

Or alternatively, you can make a copy of this Google Sheet with the formula already set up for you:

https://docs.google.com/spreadsheets/d/1_E_Xb8eR7ke1jFbedA4iKyNfKuzGdDn10qAZQxd55ZU/edit?usp=sharing

You can customize these SERP URLs as much or as little as you like by appending simple search parameters to your URL.

For this exercise, you generally want to tamper with the original results as little as possible. But here are some of the more important modifications you can make.

If you want to scrape more than 10 results, append this to your SERP URL:

&num=20

Change the “20” to however many results you want to crawl.

This doesn’t need to be a number divisible by 10.

You could change it to 3 if you only wanted to look at the top three results for a query, for example.

Or, let’s say you’re working on an international site with a presence in multiple markets. In this case, you may want to change the country of origin for your search.

This is done via this parameter:

&cr=countryXX

Change the “XX” to the country code that you want to search for.

You can find a full list of country codes here.

If you want to increase the specificity of your localized search, you can even specify a language for your search.

To do that, use this parameter:

&lr=lang_XX

Again, change the “XX” to the language code that’s relevant to your research.

You can find a list of Google supported language codes here.

So you can be as specific (within reason) or as broad as you want to be.

Let’s say one of your keywords was “office space to rent” and you want to get the top three search results based in France with a preferred language of French. Your crawlable SERP URL would look like this:

https://www.google.co.uk/search?q=office+space+to+rent&num=3&cr=countryFR&lr=lang_fr

Or if you only wanted to search for the keyword itself, this would be the URL:

https://www.google.co.uk/search?q=office+space+to+rent
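If you would rather build these URLs in a script than in a spreadsheet, the same logic can be sketched in Python. This is a minimal illustration, not part of the original workflow: the function name `build_serp_url` and its argument names are assumptions, and only the `q`, `num`, `cr` and `lr` parameters described above are used.

```python
from urllib.parse import urlencode

# Same Google domain as the Excel formula above.
BASE_URL = "https://www.google.co.uk/search"


def build_serp_url(keyword, num=None, country=None, language=None):
    """Build a crawlable SERP URL for one keyword (hypothetical helper)."""
    params = {"q": keyword}
    if num is not None:
        params["num"] = num                           # number of results
    if country:
        params["cr"] = "country" + country.upper()    # e.g. countryFR
    if language:
        params["lr"] = "lang_" + language.lower()     # e.g. lang_fr
    # urlencode turns spaces into "+", like SUBSTITUTE(A3," ","+"),
    # and also escapes other special characters in the keyword.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_serp_url("office space to rent", num=3, country="FR", language="fr"))
    print(build_serp_url("office space to rent"))
```

Run over a list of keywords, the output can be pasted straight into a crawler's list mode.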

With this established, we can move on to the fun part: scraping the search results.

Let’s run through how to set up our custom extraction in Screaming Frog.

It’s actually very simple. Just follow these paths and change the relevant settings:

  • Open Screaming Frog
  • Change mode from Spider to List
  • Configuration > Spider > Basic > Uncheck all boxes
  • Configuration > Spider > Rendering > JavaScript (from the dropdown; this is often required to scrape elements of a page that Google uses JS to inject into the SERPs)
  • Configuration > Speed > Max Threads = 1 (because you don’t want Google to block your IP)
  • Configuration > Speed > Limit URI/s = 1.2

Custom extraction for page titles:

  • Configuration > Custom > Extraction > XPath = //*[@id="ires"]/ol/div[*]/h3 (change the final dropdown to “Extract Text” and label the extraction as “Page Title”)

Custom extraction for meta descriptions:

  • Configuration > Custom > Extraction > XPath = //*[@id="ires"]/ol/div[*]/div/span (change the final dropdown to “Extract Text” and label the extraction as “Meta Description”)
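To see what these two XPath expressions select, here is a rough sketch using Python's standard library on a made-up fragment of markup. The sample HTML is hypothetical and far simpler than a real SERP, and the `[*]` predicate is dropped because ElementTree supports only a limited XPath subset. Treat it as an illustration of the idea rather than a working Google scraper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for a results page.
SAMPLE = """
<html><body>
<div id="ires"><ol>
  <div><h3>Office Space to Rent in London</h3>
       <div><span>Find offices to let.</span></div></div>
  <div><h3>The 10 best London coworking spaces</h3>
       <div><span>Our guide to shared desks.</span></div></div>
</ol></div>
</body></html>
"""


def extract_results(markup):
    """Return (page title, meta description) pairs from the sample markup."""
    root = ET.fromstring(markup.strip())
    # Simplified versions of the two custom extraction paths above.
    titles = root.findall(".//*[@id='ires']/ol/div/h3")
    descriptions = root.findall(".//*[@id='ires']/ol/div/div/span")
    return [(t.text, d.text) for t, d in zip(titles, descriptions)]


if __name__ == "__main__":
    for title, description in extract_results(SAMPLE):
        print(title, "|", description)
```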

Select custom extraction SF

SERP scrape setup in ScreamingFrog

These extraction elements don’t seem to change, so you should be able to continue using them without having to update the XPath you use.

When you’ve run your report, crawling as many SERP URLs as you like, go to the Custom tab in Screaming Frog, then set the Filter dropdown to Extraction.

With a bit of luck, you should see a set of page titles and meta descriptions returned for your target keywords, matching the criteria you specified with your URL parameters.

You can now export this into Excel.

When I ran the search for four queries relating to office space and coworking space (“office space London”, “office space Manchester”, “coworking space London”, “coworking space Manchester”), this is the report I generated:

SERP scrape results

If you run the report on the same queries, you’ll notice something very interesting. Despite “office space” and “coworking space” being semantically related, Google is interpreting the intent behind variations of these queries very differently.

When I search for “office space [CITY LOCATION]”, I see page titles like this:

Office Space & Business Space To Let & Rent In London

Office Space & Desks to Rent in London

Find a London Office: Office Space London, Office Space to Rent

When I search for “coworking space [CITY LOCATION]”, I see page titles like this:

The 10 best London coworking spaces

Overview of the 10 best co-working spaces in Manchester

Best coworking spaces: London’s eight coolest workspaces

I can see clearly that Google is treating “office space” queries as more directly transactional, whereas it treats “coworking space” terms as more informational.

What Does This Mean for Your SEO Strategy?

It means you’re unlikely to get the same page ranking for both “office space” and “coworking space” terms, regardless of how closely related the two may be.

Without analyzing the SERPs in this way, I might not have realized that Google is interpreting these keywords differently.

As AJ Kohn says: “Target the keyword, optimize the intent.”

This has been easy for me to spot as I’m only looking at four SERPs.

But what if you’re doing it for 100 keywords? Or 500? Or, dare I say it, 1,000 keywords?

There is a process you can use to try to scale this up, with a formula that you can apply in Google Sheets or Excel (whichever is your weapon of choice).

The formula looks like this:

=IF(OR(NOT(ISERR(SEARCH("how",C2))),NOT(ISERR(SEARCH("what",C2))),NOT(ISERR(SEARCH("who",C2))),NOT(ISERR(SEARCH("when",C2))),NOT(ISERR(SEARCH("why",C2))),NOT(ISERR(SEARCH("whose",C2))),NOT(ISERR(SEARCH("whether",C2))),NOT(ISERR(SEARCH("best",C2))),NOT(ISERR(SEARCH("tips",C2))))=TRUE,"Informational",IF(OR(NOT(ISERR(SEARCH("buy",C2))),NOT(ISERR(SEARCH("sale",C2))),NOT(ISERR(SEARCH("to let",C2))),NOT(ISERR(SEARCH("rent",C2))),NOT(ISERR(SEARCH("space in",C2))),NOT(ISERR(SEARCH("get",C2))))=TRUE,"Transactional","Intent not found"))

All this does is look for the presence of certain words or phrases in a page title (in the above example, contained in cell C2) and assign an intent classification.

For the purpose of this article, I’ve only created two intent types (informational and transactional), but you can use the structure of the formula to create as many as you like.

Then you add the words that act as a signifier for a certain intent classification. In this example, I’ve added the following words that suggest an informational intent:

  • How
  • What
  • Who
  • When
  • Why
  • Whose
  • Whether
  • Best
  • Tips

And the following words that suggest a more transactional intent:

  • Buy
  • Sale
  • To let
  • Rent
  • Space in
  • Get

You can tweak these lists as needed to make them more relevant to the niche you’re researching. For example, I’ve included things like “rent”, “to let”, and “space in”, which are more common on transactional pages.
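For larger exports, the same rule can be expressed outside the spreadsheet. The sketch below mirrors the formula's logic in Python, using the same word lists and the same case-insensitive substring check as SEARCH. The names `classify_intent`, `INFORMATIONAL` and `TRANSACTIONAL` are my own labels, not part of any tool.

```python
# Word lists copied from the spreadsheet formula above.
INFORMATIONAL = ("how", "what", "who", "when", "why",
                 "whose", "whether", "best", "tips")
TRANSACTIONAL = ("buy", "sale", "to let", "rent", "space in", "get")


def classify_intent(title):
    """Classify a page title, checking informational terms first,
    in the same order as the nested IF in the formula."""
    text = title.lower()  # SEARCH is case-insensitive, so lowercase first
    if any(term in text for term in INFORMATIONAL):
        return "Informational"
    if any(term in text for term in TRANSACTIONAL):
        return "Transactional"
    return "Intent not found"


if __name__ == "__main__":
    for title in ("The 10 best London coworking spaces",
                  "Office Space & Desks to Rent in London"):
        print(title, "->", classify_intent(title))
```

Like the formula, this matches substrings rather than whole words, so a title containing “show” would match “how”. If that causes misclassifications in your niche, matching on whole words is one possible refinement.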

I also find that if you can tweak your formula to classify intent accurately for 10 SERPs, you can roll it out to a bigger list and it will generally remain accurate.

In the end, what you should end up with is a sheet that looks something like this:

https://docs.google.com/spreadsheets/d/1Scpg2v7GEP8-rrXRIfR7kQgaf7bX6a9xk2y10YEqm0s/edit?usp=sharing

SERP Intent Report Example

The first tab shows how the intent classification works.

The second tab shows the most common intent type Google favors for the keywords you’re targeting.
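The summary in the second tab amounts to counting intent labels per keyword and keeping the most frequent one. A minimal sketch of that step follows. The function name `dominant_intent` and the sample rows are hypothetical, and the input is assumed to be one (keyword, intent) pair per scraped result.

```python
from collections import Counter, defaultdict


def dominant_intent(rows):
    """Map each keyword to the intent label seen most often in its results.

    rows: iterable of (keyword, intent) pairs, one per scraped result.
    """
    counts = defaultdict(Counter)
    for keyword, intent in rows:
        counts[keyword][intent] += 1
    # most_common(1) returns the single highest-count label; on a tie
    # it keeps whichever label was encountered first.
    return {kw: c.most_common(1)[0][0] for kw, c in counts.items()}


if __name__ == "__main__":
    sample_rows = [  # made-up data for illustration only
        ("office space London", "Transactional"),
        ("office space London", "Transactional"),
        ("office space London", "Informational"),
        ("coworking space London", "Informational"),
        ("coworking space London", "Informational"),
    ]
    print(dominant_intent(sample_rows))
```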

You can apply the same method to the meta descriptions Google shows to reinforce your findings, though you may need to tweak the formula above slightly.

For example, the presence of a date in a meta description often suggests the page listed is an article, which hints at informational intent.
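One way to check for a date is a regular expression. The pattern below is only a sketch: it assumes dates appear in forms like “12 Aug 2018” or “Aug 12, 2018”, which may not match what you see in your own export, so adjust it to your data. The name `has_date` is my own.

```python
import re

# Month names or abbreviations, e.g. "Aug", "Sept", "August".
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*"

# Assumed formats: "12 Aug 2018" or "Aug 12, 2018".
DATE_PATTERN = re.compile(
    rf"\b(?:\d{{1,2}}\s+{MONTH}\s+\d{{4}}|{MONTH}\s+\d{{1,2}},\s+\d{{4}})\b",
    re.IGNORECASE,
)


def has_date(description):
    """Return True if the meta description contains a date-like string."""
    return DATE_PATTERN.search(description) is not None


if __name__ == "__main__":
    print(has_date("12 Aug 2018 - Our pick of the best coworking spaces"))
    print(has_date("Office space to rent in London"))
```

A description flagged by `has_date` could then be counted as an extra signal toward the informational classification.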

The big takeaway: You can use this research method to better match the types of content you’re creating to the intent that Google seems to favor most often for your target queries.

Intent Matters Now More Than Ever

It seems that these days, it doesn’t matter how many links or how much authority a page or domain has; unless you’re creating content that’s optimized for intent, you’re going to face an uphill battle to achieve the rankings you want.

Beyond that, if your page isn’t matched to intent, it’s unlikely that traffic will perform well once it arrives on the site from an engagement or conversion perspective (after all, we have to assume there’s a reason Google prefers certain intent types for specific keywords).

There’s a lot more you can do with custom extractions, especially custom SERP extractions, that can help with your SEO strategy (think data mining, outreach list building, competitor intelligence), so I hope this article is useful and can act as a framework for your own SERP extraction experiments.

Remember, it’s all about digging out those important clues that can give context to your organic strategy. Good luck and happy hunting, my fellow SEO detectives!

Image Credits

All screenshots taken by author, August 2018
