I crawled over 5,000 UK retail sites for on-page SEO errors. Here are the results.

For well over a year I’ve had a plan to use a version of the crawler I built for SEO reports on a bigger project. I wanted to crawl 5,000+ UK retail sites to see what errors they had. More specifically, I wanted to compare the three main platforms that I work with: Shopify, Magento and WooCommerce on WordPress. How would the UK online retail market fare as a whole, and which of the platforms had the ‘best’ sites from a technical SEO point of view? I wanted to find out…

…so I did.

What are we measuring?

I started with a list of around 2,500 domains for each of the three platforms (obtained from BuiltWith), knowing that in my tests I’d seen success rates of around 75%, which would allow me to hit my target of 5,000 domains. For each of those domains, the crawler does the following:

  • Scrape the homepage, and grab up to 50 internal links from that page [EDIT: 50 pages was the happy medium found during testing; a larger sample did not yield significantly different results]
  • Scrape each of those pages for the following info:
    • Status of page (i.e. whether it exists or throws an error)
    • URL: Static or dynamic?
    • URL Length
    • URL Uses Standard Characters
    • Accessible to Google
    • One title tag
    • Optimal title tag length
    • One canonical tag
    • Canonical tag implemented correctly
    • Avoid meta keywords tag
    • Use meta description
    • One meta description
    • Optimal meta description length
    • Rich snippets (Open Graph, Twitter Cards, Schema Tags)
    • Content length
    • Number of internal links
    • All images contain alt tag
    • Overall page grade, based on those metrics
  • Store the data in a CSV for me to download and use

As you can see, that’s a pretty useful list of basic on-page SEO metrics which can then be used to judge those sites. One clear omission from those metrics is keyword data, though, and this is due to the technical limitations of scraping this many pages – it’s very difficult for the crawler to ascertain the target keyword or phrase for each one. This study, then, continues under the assumption that all untested factors are equal.
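
For anyone who wants to tinker, here’s a minimal sketch of that kind of crawl loop in Python, assuming requests and BeautifulSoup are installed. The URL is a hypothetical stand-in, and check_page only records the status code – it stands in for the full battery of checks listed above, which the real crawler does far more of.

```python
# A minimal sketch of the crawl loop described above, not the real
# crawler: fetch a homepage, collect up to 50 internal links, check
# each page and write the results to a CSV.
import csv
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

def internal_links(homepage, limit=50):
    """Return up to `limit` unique internal links found on the homepage."""
    soup = BeautifulSoup(requests.get(homepage, timeout=10).text, "html.parser")
    domain = urlparse(homepage).netloc
    links = []
    for a in soup.find_all("a", href=True):
        url = urljoin(homepage, a["href"])
        if urlparse(url).netloc == domain and url not in links:
            links.append(url)
        if len(links) >= limit:
            break
    return links

def check_page(url):
    """Hypothetical stand-in for the per-page checks listed above."""
    return {"url": url, "status": requests.get(url, timeout=10).status_code}

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["url", "status"])
    writer.writeheader()
    for page in internal_links("https://example-shop.co.uk/"):  # hypothetical URL
        writer.writerow(check_page(page))
```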

Headline Stats

Each platform had a list of 2,500 domains to start from. With a proportion of those sites unable to be crawled for one reason or another, I ended up with data from the following number of sites:

  • Shopify: 1,577
  • Magento: 1,919
  • WooCommerce: 1,879
  • Total: 5,375

Here’s the data as I ended up with it:

| Metric | Overall | Shopify | Magento | WooCommerce |
| --- | --- | --- | --- | --- |
| Number of Sites | 5,375 | 1,577 | 1,919 | 1,879 |
| Avg Pages Crawled | 23 | 14 | 39 | 16 |
| Avg Page Grade | 60 | 64 | 55 | 62 |
| Non-200 Response* | 24% | 24% | 24% | 23% |
| Avg Body Text Length | 8,975 | 8,672 | 11,677 | 6,577 |
| Contains Meta KW Tag* | 37% | 9% | 83% | 14% |
| Alt Tags Missing from Images* | 86% | 83% | 92% | 83% |
| Too Many Internal Links* | 56% | 37% | 88% | 39% |
| Avg Internal Links | 155 | 72 | 222 | 170 |
| Missing Rich Snippets* | 56% | 39% | 85% | 41% |
| Missing Meta Description* | 63% | 63% | 36% | 91% |
| Non-optimal Meta Description Length* | 92% | 85% | 95% | 94% |
| Multiple Meta Descriptions* | 0%** | 0%** | 0%** | 1%** |
| Multiple Title Tags* | 6% | 10% | 3% | 7% |
| Non-Optimal Title Length* | 57% | 43% | 80% | 45% |

*these metrics count as true for a site if at least one page from the domain in question had this issue. For example, if I crawled 30 pages for a domain and one of those pages contained a meta keywords tag, that domain is included in the total for ‘Contains Meta KW Tag’

** I didn’t include decimal places in this table, so rounding makes for a strange-looking overall result

What to make of the data

Page Grade

The average page grade for the domains as a whole was not great. The score my page grader gives is itself a percentage, so the dataset as a whole gets a score of 60%. When I run an individual crawl report for a domain, a score of 60% would cause me some concern.

Magento sites performed the worst here, but the spread between Magento and Shopify, which had the highest average score, was only nine percentage points. There is plenty of room for improvement across all the platforms.

Non-200 Status Codes

A web page gives a 200 status code when it is returned without error to the visitor. The main code we worry about is the 404 error, which means the requested page was not found. Because my crawler was only trying to scrape pages that were linked from the homepage, this means that 1,278 (or 24%) of these domains have broken links on their homepage. Broken links on the homepage are a MASSIVE red light to a search engine like Google, so check that all your internal links work – especially from the homepage.

All of the platforms had similar results here, meaning these numbers most likely come down to similar human error across platforms. If you run a retail site, it’s worth checking that all of your internal links work.
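
You don’t need a full crawler to do this check yourself. A rough sketch in Python, assuming requests and BeautifulSoup, with a hypothetical URL:

```python
# Fetch every link on a homepage and flag any that don't return 200.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

homepage = "https://example-shop.co.uk/"  # hypothetical URL
soup = BeautifulSoup(requests.get(homepage, timeout=10).text, "html.parser")

for a in soup.find_all("a", href=True):
    url = urljoin(homepage, a["href"])
    try:
        # Some servers dislike HEAD requests; swap in requests.get if needed.
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException:
        status = "error"  # connection failures count as broken too
    if status != 200:
        print(status, url)
```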

Meta Keywords Tag

I’ve been harping on about the meta keywords tag for years now. In short, you should never include that tag on your websites: search engines ignore it at best, and at worst it hands your keyword research straight to competitors. It is a big disappointment, then, to note that 37% of the domains I crawled contained at least one page with the meta keywords tag. Over a third of sites are still making this mistake. It’s not a mistake with massively detrimental effects, but it is one which is simple to fix, and fixing it shows that you are keeping your site up to date with current best practices.

The platform stats here are very useful – they show which of the platforms are most likely to include this tag by default in their themes. Magento is the biggest loser here, with a whopping 83% of those sites containing the tag. Shopify is the winner with less than 10% of domains including the tag.
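
Checking a page for the tag takes only a few lines – something like this sketch, with a hypothetical URL. If it prints anything, the tag should come out of your templates:

```python
# Flag any meta keywords tags found on a page.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example-shop.co.uk/", timeout=10).text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

for tag in soup.find_all("meta", attrs={"name": "keywords"}):
    print("meta keywords tag found:", tag.get("content", ""))
```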

Missing Alt Tags on Images

Another simple fix, this one. The alt attribute on image tags is used to describe your images to a search engine. More importantly, it’s also a way to describe your image any time it cannot be displayed – for instance, to someone using a screen reader. It’s a useful attribute, and it’s simple to make sure that every image on your site has it (a quick check is sketched after the list below). Even so, 86% of sites crawled had at least one image missing it on at least one page during this experiment.

All of the platforms were relatively equal again here. I believe that most of these errors come down to two factors:

  • Human Error when adding images
  • Pasted code containing images without alt tags
    • The Facebook tracking code, for instance, is a major offender: it contains a pixel image without an alt attribute. Take care when pasting code.
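
Here’s a rough sketch of the check involved (hypothetical URL again) – note that injected tracking pixels will show up here just like theme images:

```python
# List images with a missing or empty alt attribute.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example-shop.co.uk/", timeout=10).text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

for img in soup.find_all("img"):
    if not img.get("alt"):  # attribute missing or empty
        print("missing alt:", img.get("src", "<no src>"))
```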

Too Many Internal Links

All websites with more than one page need internal links, that’s for sure. Without those internal links users would be unable to find anything, search engines would be less able to discover your pages, and this crawl wouldn’t have been possible! That being the case, though, there is a limit. Consider that each of your pages has a certain amount of ‘link juice’ to give. Each page it links to is passed some of that juice, and that juice is partly what search engines use to decide which pages are important on your site – the ones with the most juice are the most important. Have too many internal links on your site, and you’re spreading that link juice very thinly.

For this crawl, any pages with more than 100 links on them were deemed to have too many. A lot of retail sites go over this number very quickly, by trying to link to every subcategory they have from the navigation. In this case, 56% of sites had at least one page with too many internal links on it. The overall average number of internal links on a page was 155.

Individually, only one platform had an average number of internal links lower than the threshold: Shopify.
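
Counting a page’s links against that 100-link threshold is easy enough to sketch (hypothetical URL; note this version counts unique URLs):

```python
# Count unique internal links on a page against the 100-link threshold.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse

page = "https://example-shop.co.uk/"  # hypothetical URL
soup = BeautifulSoup(requests.get(page, timeout=10).text, "html.parser")
domain = urlparse(page).netloc

internal = {
    urljoin(page, a["href"])
    for a in soup.find_all("a", href=True)
    if urlparse(urljoin(page, a["href"])).netloc == domain
}
print(len(internal), "internal links",
      "- too many" if len(internal) > 100 else "- within the threshold")
```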

Missing Rich Snippets / Structured Data

I tested each page crawled to see whether it contained any of the following:

  • Schema Data
  • Open Graph Data
  • Twitter Cards Data

Each of these types of rich snippets, or structured data, is a way of providing search engines and social networks with more information about your page. I find that correct implementation of Open Graph data, for instance, hugely affects the impact when a page is shared socially.

56% of sites crawled were missing this data on at least one page. Magento was once again the biggest culprit, with Shopify and WooCommerce doing relatively well in comparison. The numbers are still high enough across the board to need addressing, though.
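
For reference, here’s roughly how a page can be tested for all three (hypothetical URL; the schema check is a simple approximation covering JSON-LD and microdata):

```python
# Test a page for schema.org data, Open Graph tags and Twitter Cards.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example-shop.co.uk/", timeout=10).text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

has_schema = bool(
    soup.find("script", type="application/ld+json")  # JSON-LD
    or soup.find(attrs={"itemtype": True})           # microdata
)
has_open_graph = bool(soup.find("meta", attrs={"property": "og:title"}))
has_twitter = bool(soup.find("meta", attrs={"name": lambda v: v and v.startswith("twitter:")}))

print("Schema:", has_schema, "| Open Graph:", has_open_graph, "| Twitter Cards:", has_twitter)
```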

Meta Descriptions

Meta Description tags, unlike the Keywords tag I mentioned earlier, are absolutely fine to include on your site. In fact, they are a necessity in my opinion. Correctly implemented Meta Description tags allow you to suggest to a search engine what ‘blurb’ to include with your result in searches. That blurb, when used well, can become your best chance of convincing a potential visitor that your page offers the product, service, or answer that they require.

For that reason, the fact that 63% of sites crawled contained at least one page without this tag is concerning! Each and every page on your site should contain this tag, so I encourage all retail site owners to check their implementation. Especially those running WooCommerce, as a gigantic 91% of those sites contained errors here.

It’s worth noting that hardly any sites crawled made the opposite mistake, with less than 1% overall having any pages with more than one meta description tag.
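
A rough self-check along these lines is sketched below. I’ve assumed 70–155 characters as the ‘optimal’ length band purely for illustration, so adjust that to whatever range you trust:

```python
# Check that a page has exactly one meta description of a sensible length.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example-shop.co.uk/", timeout=10).text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

descriptions = soup.find_all("meta", attrs={"name": "description"})
if not descriptions:
    print("missing meta description")
elif len(descriptions) > 1:
    print(len(descriptions), "meta description tags - there should be exactly one")
else:
    length = len(descriptions[0].get("content", ""))
    # 70-155 characters is an assumed band for illustration only.
    if not 70 <= length <= 155:
        print("meta description length", length, "is outside the assumed 70-155 band")
```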

Title Tags

The title tag on your web pages is the first way that you can tell a search engine or visitor exactly what your page is about. It should be very simple to get right. 6% of sites crawled were so eager to show search engines and visitors their titles that they managed to implement the tag more than once! That’s not ideal, so it’s worth checking on your own sites just to be sure.

More important is the title length. Having a title tag that is too long or too short means that search engines are likely to replace it with their own best guess when showing your result to a potential visitor, which takes away some of your control. Currently the range to aim for is 10 to 70 characters. It’s a bit more complicated than that, because it actually comes down to the pixel width of your title, but keeping within that range will keep you in the ballpark.

57% of sites crawled got this wrong on at least one page, with Magento the biggest culprit again with 80% of sites crawled giving an error on at least one page. This is a metric you have great control over on your sites, so it’s worth having a look at.
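
Both checks are easy to script – a sketch using the 10 to 70 character range above (hypothetical URL):

```python
# Check title tag count and length against the 10-70 character range.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example-shop.co.uk/", timeout=10).text  # hypothetical URL
soup = BeautifulSoup(html, "html.parser")

titles = soup.find_all("title")
if len(titles) != 1:
    print(len(titles), "title tags found - there should be exactly one")
else:
    text = titles[0].get_text(strip=True)
    if not 10 <= len(text) <= 70:
        print("title length", len(text), "is outside the 10-70 character range:", repr(text))
```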

Conclusion

First off, I’m glad to finally be able to write this post. Creating the crawler and making sure that it gave reliable data has been the result of many hours of work and testing, scattered across many months in fits and starts.

The data as a whole is concerning to me as an online marketer. This study only looks at some basic technical on-page SEO aspects of the pages crawled, and yet those pages were found lacking in many of them. There is much room for improvement, and if the basics are being missed, how much opportunity is there in the more difficult areas?

In terms of the platform results, I’m not massively surprised. Shopify is doing a good job of giving its users a good base to build upon, with many of the basics implemented well by the most popular themes and skins for the shops running on their platform. WooCommerce is ultimately a shop platform bolted onto a website platform, which in turn is built on a blogging platform. It’s amazing that it works at all, and to end up as ‘worst offender’ in only two of the metrics discussed is an achievement. Magento is a different beast, a monolith of a platform for those shops which need the sort of features that just aren’t possible with the other options. It’s complicated, though, which means that the basics can and do get messed up.

This data is not meant to decide which platform is ‘best’. It’s impossible to make that call based only on some basic on-page SEO metrics, when there is so much else that goes into choosing a platform. This data is there to show just how many sites have work to do on the basics, and hopefully to encourage you, the reader, to check and improve these metrics on your own site.