In this guide I’ll cover the workflow I use for technical reports and site structuring, with the help of two of the tools I rely on most for technical analysis: Screaming Frog and Microsoft Excel. This tutorial will help you save time when structuring a client’s website and preparing edits such as page titles and meta descriptions.
Open up Screaming Frog or DeepCrawl (you can also use your CMS if it gives you in-depth details of each page).
Set the following configs:
The one thing I set up is memory and storage; I leave everything else at its defaults. I would rather have all the details and trim them down later.
Configuration –> System –> Memory and Storage
Memory – this will depend on your system. Adjust it according to the amount of RAM you have: you need around 4GB for your operating system, and the rest can be allocated to Screaming Frog. If you’re not tech savvy, leave it at the default.
Storage – this depends on the size of the website. If it’s larger than 50,000 pages, it’s safer to switch to “Database Storage” mode, as this allows you to continue the crawl even after a crash and lets Screaming Frog handle large numbers of URLs without hogging all the memory.
Next step is to enter the URL and click on “Start” to get the crawl started.
Once the crawl is finished, go to the “Internal” tab and select “HTML” from the dropdown.
Click on export (I suggest exporting as .csv if it’s a large site, then switching the file format later). At this stage you can filter out columns within Screaming Frog, but I prefer to crawl the domain, get all the info, and then trim down the columns I won’t be using.
Open the export in MS Excel and save the file as an Excel workbook (.xlsx), which gives you the ability to save filtering, formulas and edits, something the .csv format doesn’t support.
Clone the first tab and leave the original as is (raw). This is a big time saver: I can’t tell you how many times I’ve had to re-export everything because of something I overlooked or an Excel crash. This way you always have a raw extract that you can duplicate.
You do this by right clicking the tab and clicking on “Move or Copy…”
OK, now that we have a sheet we can work with, we can start filtering. First remove the columns you will not be using. Since we’re addressing structure and METAs, in this case I’m keeping the following: Address, Status Code, Title 1, Meta Description 1 and H1-1. The report you’re doing will dictate which columns to keep. I’d rather keep things separate, as a cluttered sheet can be a pain to display in meetings with clients or management.
Now separate the pages according to “Status Code” (2xx, 3xx, 4xx, 5xx) and place each group in its own tab/sheet.
Do this by selecting the top row, going to the “Data” menu and selecting “Filter”. Alternatively, use the shortcut “Ctrl + Shift + L” (Windows) or “Cmd + Shift + L” (Mac). Click on the dropdown and select the status codes by number, then hit “Ctrl + A” or “Cmd + A” to select all, and finally copy and paste into a new tab. Once copied, remove those rows from the original sheet; keep doing this until you only have 200s left.
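If you prefer to do this split before the data ever reaches Excel, the same grouping can be sketched in a few lines of Python. This is a minimal sketch, assuming rows shaped like Screaming Frog's "Internal" export (the column names "Address" and "Status Code" match that export; the sample data is hypothetical):

```python
from collections import defaultdict

def split_by_status(rows):
    """Group crawl rows into 2xx/3xx/4xx/5xx buckets by status code."""
    groups = defaultdict(list)
    for row in rows:
        family = str(row["Status Code"])[0] + "xx"  # e.g. "301" -> "3xx"
        groups[family].append(row)
    return groups

# Hypothetical sample rows standing in for a real crawl export.
rows = [
    {"Address": "https://example.com/", "Status Code": "200"},
    {"Address": "https://example.com/old", "Status Code": "301"},
    {"Address": "https://example.com/gone", "Status Code": "404"},
]
groups = split_by_status(rows)
```

Each bucket can then be written out to its own CSV, one per future Excel tab.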
Now we can start the structuring process. Add a column to the left, call it “Category”, and make sure filtering is active on it.
Start by filtering out pages generated by queries or page variations; these are usually URLs containing a “?”. (These pages should point their canonicals to the original page to avoid duplicate-content issues. That said, I haven’t yet seen a Google penalty caused by query pages, but I like to play it safe.)
Here’s an example of the mentioned pages:
Filter these by clicking on the “Address” arrow and entering “~?” in the textbox (the tilde tells Excel to treat “?” as a literal character rather than a wildcard).
This will bring up the mentioned pages. For the purposes of this report we’re going to remove them; if I were doing a full audit, I’d push them to a separate sheet and deal with them later.
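The same separation of query-string URLs from clean ones can be sketched in Python. A minimal sketch with hypothetical URLs:

```python
def split_query_pages(urls):
    """Separate URLs containing a query string ("?") from clean URLs."""
    query = [u for u in urls if "?" in u]
    clean = [u for u in urls if "?" not in u]
    return clean, query

# Hypothetical URLs illustrating the pattern.
urls = [
    "https://example.com/cats",
    "https://example.com/cats?colour=black",
    "https://example.com/search?q=food",
]
clean, query = split_query_pages(urls)
```

In an audit you would keep the `query` list in its own sheet rather than discarding it.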
Now we need to start identifying similar URLs. The screenshot below clearly shows that the page structure does not use any subfolders such as “/cats/”. Although this does not cause any issues, ideally we’d have pages organised in separate subfolders so that they’re easier to maintain and track in SEO/content plans and analytics (though this may not always be possible).
I start categorising by finding similar words, in this case “cats”; you can filter on either “Address” or “Title 1”.
Keep doing this until you only have pages that don’t fit under any category; these can be labelled “Generic” or “Other”.
Example “Cats” category
Example “Generic” pages
Do this exercise for all the pages listed; it shouldn’t take more than a few hours. This will give you a clear idea of the site structure and how you can simplify it. The key to speeding this up is finding similar patterns or words in either the URLs or the METAs, as that lets you categorise multiple pages in one go.
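The keyword-matching step above can also be sketched as a small function: for each page, return the first category whose keywords appear in the URL or title, falling back to "Generic". A minimal sketch; the category names and keyword lists are hypothetical stand-ins for whatever your site actually uses:

```python
def categorise(url, title, keywords):
    """Return the first category whose keyword appears in the URL or title."""
    haystack = (url + " " + title).lower()
    for category, words in keywords.items():
        if any(w in haystack for w in words):
            return category
    return "Generic"

# Hypothetical category -> keyword mapping for the pet store example.
keywords = {"Cats": ["cat", "kitten"], "Dogs": ["dog", "puppy"]}
cat = categorise("https://example.com/kitten-food", "Kitten Food", keywords)
```

Order matters here: the first matching category wins, just as you would resolve overlaps manually when filtering in Excel.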
Select a category, create the META for the first row and use it as a template for the others. In this case the example might not be ideal, as I’m using a pet store eCommerce site, where each page should really have a unique META title and description.
Let’s say we want to remove “Pethouse. ” at the start and add “ | PetHouse.com.mt” at the end of the meta.
First, use “Find and Replace” (shortcut “Ctrl + H” or “Cmd + H”), enter “Pethouse. ” (including the space), leave the replace field empty and click “Replace All”.
This removes the prefix. Now I’m going to use a simple formula to append to the meta.
Add a new column to the right of the “Title 1” column.
Type “=” in the adjacent cell in the new column, select the first cell under Title 1, and then enter the following: &” | PetHouse.com.mt”
You should have something like this
Hit enter and you get this
Now simply drag the corner of the cell and Excel automatically applies the formula to the rest of the cells.
And you get this applied automatically
Remember to copy the new cells and paste them back as values, so that you keep the text itself and can continue editing.
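The two steps above (strip the old prefix, append the new suffix) amount to a single string transformation. A minimal sketch of the same rewrite in Python, using the prefix and suffix from the example:

```python
def rewrite_title(title, strip_prefix="Pethouse. ", suffix=" | PetHouse.com.mt"):
    """Mimic the Find-and-Replace plus '&' concatenation steps in one pass."""
    if title.startswith(strip_prefix):
        title = title[len(strip_prefix):]  # remove the leading brand prefix
    return title + suffix                  # append the new brand suffix

# Hypothetical title from the pet store example.
new_title = rewrite_title("Pethouse. Cat Food")
```

Doing it as one function avoids the intermediate paste-as-values step entirely.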
There are several other formulas you can use, some of my favourites:
Let’s say I forgot to add an important column and I’ve already been editing for an hour or so. You can easily import data from separate tabs or spreadsheets by using VLOOKUP.
In our example we will pull the “Word Count” column from the raw Screaming Frog extract.
Add a new column “Word Count” next to the URL
Enter =VLOOKUP in the empty cell, hit Tab, then select the first URL.
This sets what to look for. Now enter a comma “,”, then go to the sheet containing the raw extract.
Drag your cursor from column A all the way to where the “Word Count” column is; in this case it’s column “AC”.
Enter a comma “,” again, then enter the column index number: column A is number 1, B is number 2, and so on. Column AC (where the word count is) is 29, so we enter that.
Enter a comma “,” again and a drop-down menu comes up; select “FALSE” to get an exact match and hit Enter.
Now simply drag the corner of the cell as we did in the last exercise, and the whole “Word Count” column gets imported.
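Under the hood, an exact-match VLOOKUP is just a keyed lookup: find the row whose first column equals the key, then return the value from another column. A minimal sketch of the same idea in Python, with hypothetical data standing in for the raw extract:

```python
def vlookup(key, table, key_col, value_col):
    """Exact-match lookup, like =VLOOKUP(key, range, index, FALSE)."""
    lookup = {row[key_col]: row[value_col] for row in table}
    return lookup.get(key)  # None if the key is missing (Excel would show #N/A)

# Hypothetical rows standing in for the raw Screaming Frog extract.
raw = [
    {"Address": "https://example.com/cats", "Word Count": 431},
    {"Address": "https://example.com/dogs", "Word Count": 812},
]
wc = vlookup("https://example.com/dogs", raw, "Address", "Word Count")
```

Because the match is by key rather than by position, row order in the source sheet is irrelevant, which is exactly why the exact-match form is safe to use here.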
The good thing about this formula is that even if the rows are in a different order in each sheet, Excel will still find the exact match.
Just by using “Find and Replace” and some of the mentioned formulas you can easily do things in bulk. While it’s true that this does not automate the whole process, it speeds things up while still letting you fully analyse the site structure. I have used this on sports and horse racing domains, where you have multiple results every day across different racecourses going back eight years, for example “runners”, “jockeys” and “trainers”. Although the number of pages is around 500,000, I was still able to do the structuring and METAs in one working day using these simple tools.
I’m by no means a Microsoft Excel guru, and there are probably easier ways to do several of these tasks using macros or other formulas. I hope this workflow helps you with some of the tedious structuring tasks.
I cannot guarantee that you’ll gain rankings by adding folders and organising your site-structure but our team increased traffic (and conversions) by 40% in just 4 weeks by moving to a new CMS with the right structure and redirects implemented.
I’m not by any means suggesting that you auto-fill all your METAs, but in some cases it makes sense: you’re not going to write 300,000 unique variations of page title and description for sports results pages. Instead, you can have a template where you only change a few words to make the METAs more relevant, for example the date and venue of a race or football match.
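That templating approach can be sketched in a couple of lines: define one META pattern per page type and substitute the per-page fields. A minimal sketch; the template string, field names and values are all hypothetical:

```python
def build_meta(template, **fields):
    """Fill a META-title template with per-page fields (date, venue, etc.)."""
    return template.format(**fields)

# Hypothetical template for a race results page.
template = "{venue} Race Results {date} | Example.com"
meta = build_meta(template, venue="Marsa", date="4 May 2019")
```

One template per category keeps the METAs relevant while scaling to hundreds of thousands of pages.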