Earlier this week I was exchanging emails with the Editor of a fairly well-known entrepreneur blog with regard to a potential content partnership.
I’ve withheld real names to protect anonymity, but I can say it’s a site with a Domain Authority of 40+ that has been active for over five years and has a wealth of what I would consider to be high-quality content.
So all seemed good. That was until I ran a couple of site:<domain> searches to check on some old articles, and was presented with this:
Nothing. Nothing at all.
A few more searches confirmed that none of the site’s pages were indexed – uh-oh. Curiosity got the better of me; was this an innocent blunder or something more sinister?
Before I reveal the answer to that question, I’ll outline some of the common issues that can result in an entire website not being indexed, as well as some solutions for resolving them. Please bear in mind I’m not talking about a loss in rankings here, but going missing from Google’s index entirely.
I should also mention that I have no affiliation with said site, and therefore no access to Google Search Console, which should be the first port of call in scenarios such as this – the Google Index reports in particular can shed a lot of light on indexing issues, and help you pinpoint exactly when the issues occurred. But I don’t have that privilege, so I’m relying instead on a process of elimination, some SEO tools, and a bit of guesswork.
Three common reasons for a website not being indexed
1. Robots.txt file
A Robots.txt file can be used by website owners to instruct robots (i.e. search engine spiders) as to which areas of a website should and shouldn’t be crawled and indexed. If a Robots.txt file contains the following rules, nothing will be indexed at all:

User-agent: *
Disallow: /

This is a common mistake, but thankfully an easily resolved one.
To allow robots full access, change the Robots.txt rules to:

User-agent: *
Disallow:
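If you’d rather check this programmatically, here’s a minimal sketch using Python’s built-in urllib.robotparser module – example.com is just a placeholder for whichever domain you’re investigating.

```python
# Minimal robots.txt check using only the Python standard library.
from urllib.robotparser import RobotFileParser

domain = "https://example.com"  # placeholder - swap in the domain you're checking

parser = RobotFileParser()
parser.set_url(f"{domain}/robots.txt")
parser.read()  # fetches and parses the live robots.txt file

# can_fetch() returns False when the rules disallow the given user agent
for agent in ("*", "Googlebot"):
    status = "allowed" if parser.can_fetch(agent, f"{domain}/") else "BLOCKED"
    print(f"{agent}: {status} from crawling the homepage")
```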
For more Robots.txt wisdom check out this page
2. Noindex meta tags blocking spiders
A Noindex tag can be used to instruct spiders to ignore certain pages, but it is also entirely possible for such a tag to be inadvertently implemented across an entire site. The simplest way to check whether one exists is to view the page’s source code and search for content="noindex" within the robots meta tag.
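To automate that check, here’s a rough sketch assuming the third-party requests library is installed; it also looks at the X-Robots-Tag HTTP header, which can carry a noindex directive without anything appearing in the HTML at all.

```python
# Rough noindex check for a single page (assumes: pip install requests).
import re
import requests

url = "https://example.com/"  # placeholder - swap in the page you're checking

response = requests.get(url, timeout=10)

# noindex can also be served as an HTTP header, so check that first
x_robots = response.headers.get("X-Robots-Tag", "")
if "noindex" in x_robots.lower():
    print(f"X-Robots-Tag header contains noindex: {x_robots}")

# Then look for a robots/googlebot meta tag carrying noindex in the source.
# This is a deliberately simple pattern - a proper SEO crawler is more robust.
meta_pattern = re.compile(
    r'<meta[^>]+name=["\'](?:robots|googlebot)["\'][^>]*content=["\'][^"\']*noindex',
    re.IGNORECASE,
)
if meta_pattern.search(response.text):
    print("Found a noindex meta tag in the page source")
else:
    print("No noindex meta tag found in the page source")
```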
3. Crawlers blocked by .htaccess file
An .htaccess file is a configuration file for the (Apache) web server running your website. It’s useful for managing many different website behaviours, such as redirects, rewrites and, most crucially in this instance, website access. If the two scenarios above come back clear, it’s worth checking your .htaccess file, as it may be blocking spiders from crawling your site at server level.
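There’s no foolproof way to spot this from the outside, but one quick indicator is to compare how the server responds to a normal browser user agent versus a crawler one. Here’s a minimal sketch (again assuming the requests library, with example.com as a placeholder): if the crawler request comes back with a 403 or similar while the browser request is fine, something at server level – quite possibly .htaccess rules – is turning spiders away.

```python
# Compare server responses for a browser UA vs a crawler UA (assumes requests).
# Note: this only spoofs the user agent string - rules that verify crawlers by
# IP address or reverse DNS won't be caught this way.
import requests

url = "https://example.com/"  # placeholder - swap in the site you're checking

user_agents = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for label, ua in user_agents.items():
    response = requests.get(url, headers={"User-Agent": ua}, timeout=10)
    print(f"{label}: HTTP {response.status_code}")
```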
That’s just three of the more common scenarios, but other possible explanations include:
- Broken or outdated sitemaps
- Incorrectly configured URL parameters
- Connectivity issues, meaning crawlers cannot reach the server
- Incorrect use of rel=canonical tags (a quick check for this is sketched after this list)
- Very slow load speed
- Selecting the incorrect privacy settings in your CMS (easily done in WordPress)
- The site may be indexed under a different domain
- If the site is brand new, Google just might not have got around to it yet
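Most of these can be ruled out with a few quick checks. As one example, here’s a rough sketch (assuming the requests library, with a placeholder URL) that pulls a page’s rel=canonical tag and flags it when it points somewhere other than the page itself; a sitewide canonical pointing at the wrong URL is one of the quieter ways pages can drop out of the index.

```python
# Rough rel=canonical check for a single page (assumes: pip install requests).
import re
import requests

url = "https://example.com/some-article/"  # placeholder - swap in a real page

response = requests.get(url, timeout=10)

# Simple pattern that expects rel before href - good enough for a spot check
match = re.search(
    r'<link[^>]+rel=["\']canonical["\'][^>]*href=["\']([^"\']+)["\']',
    response.text,
    re.IGNORECASE,
)

if match is None:
    print("No rel=canonical tag found")
elif match.group(1).rstrip("/") != url.rstrip("/"):
    print(f"Canonical points elsewhere: {match.group(1)}")
else:
    print("Canonical tag matches the page URL")
```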
These are all plausible explanations, but none of them appears to be the root cause in this instance. This leads me to believe there’s something dodgier going on.
Something more sinister?
With no (visible) evidence of any of the above causing a problem, and considering the entire site has disappeared from the face of the earth, all signs point to something more troublesome. A manual penalty, perhaps?
A quick Ahrefs report reveals some worrying results:
This first graph shows the number of referring pages (links) to the website over the past 12 months:
The next graph shows the volume of organic traffic (albeit an approximation based on an algorithm) to the site over the past 12 months:
A huge spike in links followed by a sudden drop in organic traffic; this screams penalty.
So what of the links themselves?
A (very) top-level link audit further confirms my concerns. The site has close to 100,000 links pointing to it, many of which are from:
- non-indexed domains
- weak domains with no external links pointing to them
- domains which themselves may have had a penalty
- domains which appear to be part of a link network
- generally dodgy looking domains that bear no relevance to the site in question
If it’s not a manual penalty, I’ll eat my hat.
So what now?
As mentioned earlier in this post, the first port of call when things go south should be Google Search Console. If the site in question has had a manual penalty then Google Search Console is the place to identify and rectify the problem.
The bad news is that rectifying the problem is easier said than done. Getting a manual penalty lifted is no walk in the park and can take months, and even then there is no guarantee of recovering fully, if at all.
While links are the most likely cause of the problem in this instance, if I had more time I’d also be inclined to check for duplicate content issues, and do some more research around who the site is sharing server space with, as both of these things could cause problems.
More often than not, indexing issues are caused by innocent mistakes, which are easily rectified. But if, as in the case outlined above, there are deeper issues caused by external factors, it’s time to stop what you’re doing and seek help, because manual penalties do not reverse themselves.
Finally, if you don’t currently have a Google Search Console account set up, I’d suggest you do this as a priority – https://support.google.com/webmasters/answer/6001104?hl=en