Technical SEO covers a range of tasks that look after a website’s overall performance, link equity flow and structure in general. One task that is often omitted is the cleanup of unused pages. By cleanup we mean either returning a 410 status code for pages that are no longer needed, or redirecting any pages that still carry value in the form of traffic or backlinks.
During the lifetime of any website, the structure tends to change: new pages are created and internal linking methods are altered. As a result, you may end up with pages that have zero internal links pointing to them. These are known as “orphan pages”, since no other page on the site links to them. This step-by-step guide covers how to spot orphan pages, what to do with them and how to avoid creating them in the first place.
There are two ways to extract the crawled-page list used to find orphan pages: Google Search Console or Screaming Frog.
Both methods are effective and should give you all the pages, whether it’s a 20-page site or a 100,000-page site. We will first extract the URLs from the backend, then from the crawl/index (Google Search Console or Screaming Frog). Once both lists are ready, we compare them using Excel’s Conditional Formatting.
For both methods we first need to get a list from the backend. In this tutorial I’ll be showing you how to do it with a WordPress plugin called Export All URLs.
1) Once the plugin is installed and activated, head to Tools > Export All URLs. Make sure you select “All Types”, tick “URLs”, set “Post Status” to “Published” and select “CSV File” as the “Export Type”.
2) Click on “Click here” to download the extract. Once downloaded, delete the generated file from the server for security reasons. I also recommend deactivating and deleting the plugin if you’re not going to run this process frequently.
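One thing worth doing before the comparison stage: the backend export and the crawl export often write the same page slightly differently (trailing slashes, http vs https, mixed-case hostnames), and any mismatch will make the duplicate check miss a pair. Here is a minimal normalization sketch; the rules (force https, strip the trailing slash) are assumptions you should adapt to how your own site writes URLs:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Normalize a URL so backend and crawl extracts can be matched reliably.

    Lower-cases the hostname, forces https and strips the trailing slash.
    These rules are an assumption -- adjust them to your site's URL style.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"   # keep the bare root as "/"
    return urlunsplit(("https", host, path, parts.query, ""))

print(normalize("HTTP://Example.com/About/"))  # https://example.com/About
```

Run every URL from both extracts through the same function before pasting them into the comparison sheet.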
1) Go to the Google Search Console dashboard and click on “Coverage”, then click on “Valid” and select “Submitted and indexed” under “Details”.
2) This will give you the list of pages crawled and indexed by Googlebot.
3) Hit the “Export” button and select “Download Excel”.
4) Click on the “Table” tab at the bottom of the Excel sheet, select the URLs and copy them into a new Excel sheet; see “Comparing Extracts in MS Excel” for the next steps.
1) Open Screaming Frog and run a crawl of your site. Click on the “Internal” tab and select “HTML” from the filter dropdown.
2) Click on the “Export” button.
3) Select the format you want, either “Excel Workbook” or “CSV”, and hit “Save”.
4) Open the Screaming Frog extract and filter to “Indexable” pages (this excludes pages set to “noindex”).
5) Copy the URLs into a new Excel sheet; see “Comparing Extracts in MS Excel” for the next steps.
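If you prefer to skip the manual filtering in step 4, the exported CSV can be filtered in a few lines of Python. The column names below (“Address”, “Indexability”) match typical Screaming Frog internal-HTML exports, but verify them against your own file; the inline sample stands in for the real export:

```python
import csv
import io

# Stand-in for the exported CSV -- in practice, open your real export file instead.
sample = """Address,Status Code,Indexability
https://example.com/,200,Indexable
https://example.com/about/,200,Indexable
https://example.com/tmp/,200,Non-Indexable
"""

with io.StringIO(sample) as f:
    rows = list(csv.DictReader(f))

# Keep only pages Screaming Frog marks as indexable.
indexable = [r["Address"] for r in rows if r["Indexability"] == "Indexable"]
print(indexable)  # ['https://example.com/', 'https://example.com/about/']
```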
I will be using Excel to compare both extracts; feel free to use Google Sheets as an alternative.
1) In a new Excel sheet, paste the URLs from the backend extract (the Export All URLs plugin), then paste in the URLs copied from Screaming Frog or Google Search Console. Make sure to add a column called “Type” so you can tell backend and crawled pages apart.
2) Highlight the URL column, click on “Conditional Formatting” and click on “Duplicate Values”.
3) Excel will now highlight the URLs that have more than one entry in “Column B”. Any backend URL left unhighlighted appears only in the backend extract and not in the crawl, which makes it an orphan-page candidate.
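The same comparison can be done without a spreadsheet at all: load both URL lists and take the set difference. In practice you would load the two sets from the CSV exports; they are inlined here for illustration:

```python
# URLs from the backend export (inlined for illustration; load from your CSV in practice).
backend_urls = {
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/author/seolad/",
}

# URLs from the crawl/index export (Screaming Frog or Google Search Console).
crawled_urls = {
    "https://example.com/",
    "https://example.com/about/",
}

# URLs the backend knows about but the crawl never reached are orphan candidates.
orphans = backend_urls - crawled_urls
print(sorted(orphans))  # ['https://example.com/author/seolad/']
```

Set difference scales to the 100,000-page case just as easily as the 20-page one, and avoids Excel’s row limits.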
In this example I have one orphan page, “/author/seolad”. In this case, I will set the page to “noindex”, since I will not be making use of the author page anytime soon. Alternatively, I could update the page and link to it from within one of my posts.
Once identified, we need to take action. Based on the approaches covered above, the options are: return a 410 for pages that no longer serve a purpose, 301-redirect pages that still carry traffic or backlink value, set the page to “noindex”, or update the page and link to it internally.
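Those options boil down to a simple triage rule. The helper below is only a sketch of the logic described in this article (keep or redirect pages that still have value, retire the rest); the input signals and function name are my own, not part of any tool:

```python
def orphan_action(has_traffic: bool, has_backlinks: bool, still_useful: bool) -> str:
    """Map an orphan page's situation to an action (sketch of the article's logic)."""
    if still_useful:
        # The page deserves to live: fix the real problem, the missing links.
        return "update the page and add internal links to it"
    if has_traffic or has_backlinks:
        # The URL still carries value -- pass it on rather than losing it.
        return "301 redirect to the closest relevant page"
    # No value left: tell search engines the page is intentionally gone.
    return "serve 410 Gone"

print(orphan_action(False, True, False))  # 301 redirect to the closest relevant page
```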
Here’s a list of tips that help prevent having more orphan pages in the future:
Checking for orphan pages should be done periodically, especially if a third party handles the web development. Even if you apply all the prevention methods mentioned above, updates from the content or development side (or even your own changes) can leave pages without crucial links, or worse, de-indexed by the search engine. This quick check keeps your site structure up to date with no loose ends, so every page on the site retains its value.