SEO Orphan Pages Tutorial

What are SEO Orphan Pages, and what should you do about them? [2021]

Tech SEO includes various tasks that look after the website’s overall performance, link juice flow and structure. One task that is often omitted is the cleanup of unused pages. By cleanup we mean either setting a 410 status code on the unused pages or redirecting any pages that might still have value in the form of traffic or backlinks.
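If you want to verify that a cleanup took effect, a quick script can report what each retired URL now serves. Below is a minimal sketch using Python's requests library; the URLs in RETIRED_URLS are placeholders for your own list:

```python
import requests

# Placeholder list of retired URLs -- replace with your own.
RETIRED_URLS = [
    "https://example.com/old-page/",
    "https://example.com/discontinued-offer/",
]

for url in RETIRED_URLS:
    # Don't follow redirects, so we see the status code the server actually serves.
    resp = requests.get(url, allow_redirects=False, timeout=10)
    if resp.status_code == 410:
        print(f"{url} -> 410 Gone (cleaned up)")
    elif 300 <= resp.status_code < 400:
        print(f"{url} -> {resp.status_code} redirect to {resp.headers.get('Location')}")
    else:
        print(f"{url} -> {resp.status_code} (still needs attention)")
```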

During the lifetime of any website the structure tends to change: new pages are created and internal linking is altered. As a result, you may end up with pages that have zero internal links pointing to them. These pages are known as “orphan pages” since no related pages link to them. This step-by-step guide covers how to spot orphan pages, what to do with them and how to avoid creating them in the first place.

How to spot Orphan pages?

There are two ways to identify orphan pages:

  • Google Search Console against Backend URLs
  • Screaming Frog Crawl against Backend URLs

Both methods are effective and should surface all the pages, whether the site has 20 pages or 100,000. We will first extract the URLs from the backend, then from the crawl/index (Google Search Console or Screaming Frog), and finally compare the two lists using MS Excel’s Conditional Formatting.

Exporting URLs from Backend (WordPress)

For both methods we first need a list of pages from the backend. In this tutorial I’ll show you how to do it with a WordPress plugin called Export All URLs.

1) Once installed and activated, head to Tools > Export All URLs. Make sure you select “All Types“, tick “URLs“, set “Post Status” to “Published” and select “CSV File” as the “Export Type“.

Export All URLs Step 1

2) Click on “Click here” to download the extract. Delete the generated file from the server afterwards for security reasons. I also recommend deactivating and deleting the plugin if you’re not going to go through this process frequently.

Export All URLs Step 2
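If you prefer not to install a plugin, you can usually pull the same list from the WordPress REST API, which is enabled by default on modern WordPress installs. Here is a minimal sketch under that assumption; the SITE value is a placeholder, and custom post types would need their own calls:

```python
import requests

SITE = "https://example.com"  # placeholder: your WordPress site

def fetch_urls(post_type):
    """Collect the public URLs of all published items of a given post type."""
    urls, page = [], 1
    while True:
        resp = requests.get(
            f"{SITE}/wp-json/wp/v2/{post_type}",
            params={"per_page": 100, "page": page},
            timeout=10,
        )
        if resp.status_code != 200:
            break  # past the last page, or the API is blocked on this site
        batch = resp.json()
        if not batch:
            break
        urls += [item["link"] for item in batch]
        page += 1
    return urls

backend_urls = fetch_urls("posts") + fetch_urls("pages")
print("\n".join(backend_urls))
```

Note that unauthenticated requests only return published items, which matches the “Post Status: Published” setting used in the plugin.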

Method 1: Crawled pages extraction with Google Search Console

1) Go to the Google Search Console dashboard and click on “Coverage“. Click on “Valid“, then click on “Submitted and indexed” under “Details“.

Google Search Console URL extract Step 1

2) This will give you the list of pages that Google has crawled and indexed.

Google Search Console URL extract Step 2

3) Hit the “Export” button and select “Download Excel”.

4) Click on the “Table” tab at the bottom of the Excel workbook, select the URLs and copy them into a new Excel sheet. See “Comparing extracts in MS Excel” below for the next steps.

Google Search Console URL extract Step 4
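If you’d rather skip the manual copy-paste, the export can also be read programmatically. A short sketch using pandas (with openpyxl installed), assuming the export was saved as coverage.xlsx; the URL column is usually headed “URL”, but check your export:

```python
import pandas as pd

# Read the "Table" sheet from the Search Console export (requires openpyxl).
gsc = pd.read_excel("coverage.xlsx", sheet_name="Table")

# The column header is typically "URL"; adjust if your export differs.
crawled_urls = gsc["URL"].dropna().str.strip().tolist()
print(f"{len(crawled_urls)} indexed URLs")
```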

Method 2: Crawled pages extraction with Screaming Frog

1) Open Screaming Frog and run a crawl of your site. Click on the “Internal” tab and select “HTML” from the filter dropdown.

Orphan pages extraction Step 1

2) Click on the “Export” button.

Orphan pages extraction Step 2

3) Select the format you prefer, either “Excel Workbook” or “CSV”, and hit “Save”.

Orphan pages extraction Step 3

4) Open the Screaming Frog export and keep only the “Indexable” pages (this excludes pages that are set to “noindex”).

Orphan pages extraction Step 4

5) Copy the URLs into a new Excel sheet. See “Comparing extracts in MS Excel” below for the next steps.

Orphan pages extraction Step 5
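Steps 4 and 5 can also be done programmatically. A minimal sketch using pandas, assuming the crawl was exported as internal_html.csv; Screaming Frog’s internal export normally includes “Address” and “Indexability” columns, but verify the headers in your version:

```python
import pandas as pd

# Screaming Frog "Internal: HTML" export.
sf = pd.read_csv("internal_html.csv")

# Keep only indexable pages, mirroring step 4.
indexable = sf[sf["Indexability"] == "Indexable"]
crawled_urls = indexable["Address"].dropna().tolist()
print(f"{len(crawled_urls)} indexable URLs")
```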

Comparing extracts in MS Excel

I will be using Excel to compare both extracts; feel free to use Google Sheets as an alternative.

1) In a new Excel sheet, paste the URLs from the backend (extracted with the “Export All URLs” plugin) and then paste in the URLs copied from Screaming Frog or Google Search Console. Add a column called “Type” so you can tell the backend and crawled pages apart.

2) Highlight the second column (the URLs), click on “Conditional Formatting” and click on “Duplicate Values”.

Orphan pages extraction Step 7

3) Excel will now highlight the URLs that appear more than once in “Column B”. Any backend URL left unhighlighted appears only once, meaning it was not found in the crawl/index: these are your orphan pages.

In this example I have one orphan page, “/author/seolad“. I will set the page to “noindex” since I will not be making use of the author page anytime soon. Alternatively, I could update the page and link to it from one of my posts.

Orphan pages extraction Step 8
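For larger sites the same comparison is often quicker in a few lines of Python than with conditional formatting. Here is a sketch assuming both lists were saved as plain text files, one URL per line (the file names are placeholders); trailing slashes are normalised to avoid false positives:

```python
def load_urls(path):
    """Load URLs from a text file, one per line, normalising trailing slashes."""
    with open(path) as f:
        return {line.strip().rstrip("/") for line in f if line.strip()}

backend = load_urls("backend_urls.txt")  # from the Export All URLs plugin
crawled = load_urls("crawled_urls.txt")  # from GSC or Screaming Frog

# Orphan candidates: pages the backend knows about but the crawl never found.
for url in sorted(backend - crawled):
    print(url)
```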

What should you do with Orphan pages?

Once orphan pages are identified you need to take action, and there are several options:

  • Link to the pages – If the page is valuable and has up-to-date content, simply find another related page and link to it; alternatively, make use of menu links or a user sitemap.
  • Take them offline – If the page has no value, that is, no traffic and no backlinks, delete it and serve a 410 status code, which signals to crawlers that the page no longer exists.
  • Update the pages and link to them – Some pages get lost through site structural changes or are simply forgotten. In this case, update the content and link to the page from other related pages.
  • Redirect them to a related page – Another option is to set up a redirect, for example when there is a new version of the page or the page still has backlink value. In this case, redirect it to the most relevant page (a sketch for generating 410 and redirect rules in bulk follows this list).
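If many pages need retiring or redirecting, you can generate the server rules from a simple decision sheet rather than writing them by hand. Here is a sketch that prints Apache mod_alias directives from a hypothetical decisions.csv; the file name and column names are assumptions:

```python
import csv

# decisions.csv columns (assumed): url, action ("gone" or "redirect"), target
with open("decisions.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["action"] == "gone":
            # Serve 410 Gone for deleted pages.
            print(f'Redirect gone {row["url"]}')
        elif row["action"] == "redirect":
            # Permanent redirect to the related page.
            print(f'Redirect 301 {row["url"]} {row["target"]}')
```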

How to prevent Orphan pages from re-appearing?

Here are a few tips that help prevent new orphan pages in the future:

  • User sitemap – the easiest way is to maintain an up-to-date user sitemap that lists every page on the site.
  • SEO audit on structure changes – if you know a structure change is coming up, make sure the new structure accounts for every existing URL.
  • SEO audits pre and post web updates/deployments – best practice is to crawl the site before and after every deployment so you can spot issues, including missing pages. Ideally this is done on a staging environment; a sketch of such a before/after diff follows this list.
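Here is a minimal sketch of that before/after comparison, assuming two Screaming Frog “Internal: HTML” exports saved under the (placeholder) file names below:

```python
import pandas as pd

def crawl_urls(path):
    """Read a Screaming Frog "Internal: HTML" export and return its URL set."""
    return set(pd.read_csv(path)["Address"].dropna())

before = crawl_urls("crawl_before_deploy.csv")
after = crawl_urls("crawl_after_deploy.csv")

lost = sorted(before - after)    # pages that dropped out of the link graph
gained = sorted(after - before)  # newly discovered pages

print(f"{len(lost)} URLs lost after deployment:")
print("\n".join(lost))
print(f"{len(gained)} new URLs found after deployment:")
print("\n".join(gained))
```

Any URL in the “lost” list deserves a look before the change goes to production.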

Summary

Checking for orphan pages should be done periodically, especially if a third party handles the web development. Even if you apply all the prevention methods mentioned above, updates from content, development or even yourself can cause pages to lose crucial internal links or, worse, end up de-indexed by the search engine. This quick check keeps your site structure up to date without loose ends and preserves the value of all pages on the site.