Link Analysis 101 – Part 1.

In the new world of the Google Penguin updates, any good SEO needs to be able to audit backlinks.

Google’s algorithm updates are forcing companies to reassess their SEO strategies and focus on earning quality links through creating good content. Many websites are suffering because of dubious quality links created by other means in the past. Before a business can be rewarded for creating great content, toxic links that contravene Google’s guidelines need to be discovered and removed.

However, this is not a straightforward process if you don’t have any records of the links that have been built for you, especially as Google Webmaster Tools (WMT) cannot yet give you any examples of the poor links discovered by Google (although that functionality is promised soon). This is the first article in a series of three that will guide you through link analysis and link removal.

This first article, Part 1, focuses on link discovery and manual link checks, and should help you get started. Part 2 will cover how much of this process can be automated, and Part 3 will cover penalties and link removal.

I am assuming that you’re new to the whole link analysis process. I’m going to start with some basic principles and caveats and then move on to the actual checks.

What can link analysis do?

Link analysis can help you spot patterns and trends that may explain why a particular site is performing well, or not so well. However, one thing we need to be clear on is that any link analysis technique will struggle to replicate the power of Google’s algorithm. This can therefore be a time-consuming process, involving several different tools used on multiple occasions to ensure you find all the bad links.

It’s not a numbers game

This leads nicely into my next point. Link building is no longer a numbers game and not all links are equal. Some will naturally pass more value than others. A natural link profile is going to have a diverse range of links from a selection of sources.

Outcomes can inform the future

It’s really important that you use the findings from link analysis to inform any future SEO strategy. It’s easy to fall into the trap of using it to create lots of interesting charts. The real skill is using the insight gained to drive future site strategy.

Using any insight you gain to chase the algorithm and try to cheat Google is not advised. In the long run you are highly likely to find yourself susceptible to future algorithm updates. Our advice is to start investing in a future-proof content development and broadcast strategy now rather than wait for the inevitable Penguin 3.0 update.

Correlation does not imply causation

Another key point is that correlation does not equal causation. Sometimes you will spot situations where two variables are closely related, but the presence of one does not mean it causes the other.

For example, a site that has lots of toxic and poor links is ranking in position one, for now.

That does not mean the bad links are the reason it ranks. In fact the opposite is now the case: any site with a huge number of bad links and an unnatural link profile is likely to be penalized soon.

Understand what SEO activity you have done in the past and when it was done

This is the bit where you have to be really honest, or do some detective work. Try to find out as much information as possible. If you are new at your organization this can be difficult but perseverance can pay off and give you a good idea of what you might be dealing with.

For example, have you or your agencies previously done any of the following?

  • Directory submissions
  • Article submissions
  • Free links

Clues about previous SEO activity will give you an idea of the type of footprint you are looking for and help filter your results.

Intent and extent

If you previously employed a particular link building technique to artificially inflate link numbers, then the chances are Google are going to look at those links as inorganic. The extent to which a particular technique was used is also highly important. All sites have a degree of low quality links. Unfortunately it’s a feature of the internet and unavoidable to some extent. Try to understand how aggressive you’ve previously been with a particular technique. Google is going to view 10 links from “quality” directories very differently from 500 submissions to free directories with no editorial control.


It’s important to consider that the site owner or webmaster of any site you have a link from may just be really bad at SEO, and a link you consider inorganic may actually be organic and earned. This distinction really matters, so manually check any links you are uncertain about (more about this in a minute).

Extracting backlink data

The first thing you are going to do is extract your backlink data. Despite Google having the deepest and biggest index of the web, the data they give webmasters can at times be frustratingly sparse. In our opinion this makes using other data sources a necessity.

There are a number of informative data sources, the first being the data offered by the two main search engines: Google and Bing.

To extract data from WMT, simply log into your account and select the “Links to Your Site” report in the traffic section.

[Screenshot: Webmaster Tools “Links to Your Site” report]

After you’ve navigated to the “Links to Your Site” report, select “More >>” under the report called “Who links to you the most”. You will now have the option to download a list of all the domains that link to you. This report only gives you the domains that link to you and not the actual links. I therefore recommend you click “Download Sample Links” instead, because this file gives you the actual link locations, which are much easier to interpret.

[Screenshot: Webmaster Tools “Download latest links” option]

Finally, there is an option to download your site’s latest links. This report is very similar to the sample links report, except that it has a second column with the date the link was first discovered.

You will then need to supplement this data with link information from third party link discovery tools. There are a number of excellent tools that provide crawls of the web. Open Site Explorer (OSE), run by Moz, is in my opinion the easiest for a novice to use, and British company Majestic SEO have an excellent crawl that offers a lot more data.

There are other data suppliers, like ahrefs, Searchmetrics and Sistrix, that you may also want to consider. Some of these have free plans which allow you to download some of their data.

Normally, for small sites, a combination of Google, Bing, OSE and Majestic is going to be more than enough. However, there will be situations where you need to pull every single link you can find, and this will mean using all the tools, as some are better at discovering certain types of link than others.
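
Because each tool reports the same link slightly differently (http vs https, www vs non-www, trailing slashes), it helps to merge the exports into one de-duplicated list before you start checking. What follows is only a minimal sketch in Python, not a feature of any of the tools above: the function names and the normalization rules are my own assumptions, and it expects you to have already read the linking URLs out of each export into plain lists.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a linking URL to host + path + query so that the same page
    reported by two tools compares equal. (Assumed rules: ignore the scheme,
    a leading "www." and any trailing slash.)"""
    url = url.strip()
    if not url:
        return ""
    if "://" not in url:
        url = "http://" + url  # some exports omit the scheme
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lower-cased
    if host.startswith("www."):
        host = host[4:]
    if not host:
        return ""
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def merge_link_exports(exports: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized linking URL to the set of tools that reported it."""
    merged: dict[str, set[str]] = {}
    for tool, urls in exports.items():
        for url in urls:
            key = normalize_url(url)
            if key:
                merged.setdefault(key, set()).add(tool)
    return merged


if __name__ == "__main__":
    # Hypothetical sample data standing in for real exports.
    exports = {
        "wmt": ["http://Example.com/Blog/"],
        "ose": ["https://www.example.com/Blog", "http://other.org/a?x=1"],
    }
    for link, tools in sorted(merge_link_exports(exports).items()):
        print(link, sorted(tools))
```

Keeping track of which tools reported each link is a deliberate choice: a link that only one tool has found is often worth a closer look.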

Classification of the data

Once you have all the downloaded data from all the tools you can classify the links. Some will obviously be bad and need to be removed, but the bulk will be suspicious and need to be manually checked.

We will come back to how to classify your profile and what to do about it in Part 2, but for now here is some guidance on how to interrogate your links and get a feel for how bad the situation might be.

How to spot SPAM – manual techniques you need to become skilled in

Before we dive into the data side of things it’s important to be able to manually identify a toxic or inorganic link. Quite often this is easier said than done, but as you get more experienced you’ll soon start to know what to look for. Before you start, pay a visit to the link schemes page over at Webmaster Tools. Whilst this doesn’t document every type of inorganic link, it’s an excellent starting point. Over time you will encounter, and ultimately become familiar with, a lot of different link building tactics, and learn what’s good and what’s bad.

Another key point is to try and understand how a particular webmaster tries to earn money from their site. I think this adds a lot of context to the work we do. When assessing the quality of a link, it’s important to be able to differentiate a webmaster who is genuinely trying to create a community or legitimate business from someone who is out to make a quick buck.

Basic manual checks

Once you’ve downloaded your data you’ll need to perform some basic manual checks on suspicious links. This section assumes you’ve never done this kind of work before. Hopefully there’s something for everyone.

Domain Checks

  1. Is the site live? – The first thing to do is check if the site is currently live
  2. Is the site indexed? – You can check this using a site command. If the site is not indexed it may be due to bad technical SEO. However if the site is live and not indexed the chances are Google does not rate it
  3. Does the domain name look spammy? - Exact match domains (EMDs) can be a signal of a low quality domain. Once you’ve been working in SEO for a while you’ll get a sixth sense for this!
  4. Domain TLD – The domain TLD can be another good clue. Some are a lot more credible than others, and spammers have traditionally used the more readily available TLDs to register a lot of sites quickly. Shady TLDs to look out for are .info and .biz
  5. WHOIS information – I like to have a look at the registrar too (if it’s available). If I spot a site that has a webmaster in an exotic overseas location, I tend to be suspicious. DomainTools is my main WHOIS lookup provider but there are lots of very good tools/browser extensions out there
  6. Domain age – Domain age used to be an indication of a well-established site, the premise being that spammers would register lots of low quality domains quickly. Thus a link from a newer site would be less trustworthy than a link from a more mature, well-established site. This can be checked by looking at the ‘created on’ line on the WHOIS record
  7. Does the site have a dedicated server? – When doing the WHOIS check have a quick look at the reverse IP information. Shared hosting isn’t necessarily a bad thing and there are quite legitimate reasons for it (mainly cost), but if the site shares an IP with over 100 other low quality sites you may have a link network on your hands. Once again you should be able to use your WHOIS tool for reverse IP checking
  8. Server location – Is the server based overseas, outside your target market? You should expect to see links from a good spread of countries with the UK, US and EU countries being most prominent
  9. Type of site – Links from directories, article sites, social bookmarks and forums can signal unnatural links
  10. Is the site relevant to your niche? - This is a really key point. Relevancy is such a big part of Google’s algorithm, and always will be. Relevancy is something you should be able to deduce quite quickly
  11. Signs of legitimacy – Go to the contact page. Is there a physical address? Don’t get duped by mailboxes either! If you are unsure search for the address and look at it in street view
  12. Signs of human activity – Check the sites about us page. Is there information about why the site was set up and who runs it?
  13. Signs of automation - Splogs (spam blogs) quite often automatically publish spun or thin content on a diverse range of “topics”. Look for patterns in posts, e.g. the first link in the first paragraph links to a Wikipedia page, the second link in the second paragraph links to Wikipedia, and the third link in the final paragraph links to an article on your site
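
A couple of the domain checks above (the TLD check and the exact match domain check) can be roughed out in a few lines of Python to help you prioritize which domains to look at first. Treat this as an illustrative sketch only: the function name, the flag labels and the keyword list are hypothetical, the TLD set contains just the two TLDs mentioned above, and it does not understand multi-part suffixes such as .co.uk. It is a prompt for a manual check, not a verdict.

```python
from __future__ import annotations

# The two TLDs named in check 4; extend this set from your own experience.
SHADY_TLDS = {"info", "biz"}


def domain_flags(domain: str, keywords: list[str]) -> list[str]:
    """Return a list of warning flags for a linking domain.

    `keywords` are the phrases you have targeted in the past,
    e.g. ["cheap car insurance"] (hypothetical example).
    """
    flags = []
    name, _, tld = domain.lower().rstrip(".").rpartition(".")
    if tld in SHADY_TLDS:
        flags.append("shady_tld")

    # Exact match domain: the label next to the TLD, with hyphens removed,
    # equals a target phrase with its spaces removed.
    label = name.split(".")[-1].replace("-", "")
    targets = {kw.lower().replace(" ", "") for kw in keywords}
    if label and label in targets:
        flags.append("exact_match_domain")
    return flags


if __name__ == "__main__":
    for d in ["cheap-car-insurance.info", "www.bbc.co.uk"]:
        print(d, domain_flags(d, ["cheap car insurance"]))
```

Anything this flags still needs the human checks in the list: plenty of legitimate sites live on .info or .biz, and plenty of spam lives on .com.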

Page Checks

  1. Is the page title optimized and does it contain junk? - a quick visual check can give you a good idea of what’s going on
  2. Does a Google search of the page title show the page you are looking for? - This is a good basic check. In most situations the page linking to your site should show
  3. Do the page and the site in general have a lot of external links? - Again there can be legitimate reasons for this. However, check the sidebar and footer for external links. Also have a look at in-page link lists
  4. Are there lots of adverts on the page and are they above the page fold? - Look for AdSense blocks, text links, image ads, etc.
  5. Does it link out to bad neighborhoods? - If the site does link out, does it link to relevant sites or does it link out to bad neighborhoods? Look for evidence of PPC (pharmaceuticals, porn and casinos) and other low quality sites. Check multiple pages
  6. Junk in the footer and sidebars – Check the footer and sidebars for links to other sites. Normally this is an indication of a low quality site. Traditionally webmasters have hidden a lot of junk in their footers
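
To get a quick feel for how many external links a page carries (checks 3, 5 and 6 above), you can list them from the page’s HTML. The sketch below uses only Python’s standard library and assumes you have already saved or fetched the HTML yourself; the class and function names are my own, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def _host(url):
    """Lower-cased host of a URL without a leading "www.", or "" if none."""
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:  # malformed URL
        return ""
    return host[4:] if host.startswith("www.") else host


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on the page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, page_host):
    """Return the hrefs on the page that point at a different host."""
    collector = LinkCollector()
    collector.feed(html)
    own = _host("http://" + page_host)
    # Relative links and mailto: links have no host, so they are skipped.
    return [h for h in collector.hrefs if _host(h) and _host(h) != own]


if __name__ == "__main__":
    sample = (
        '<a href="/about">About</a>'
        '<a href="http://www.blog.example/post">Post</a>'
        '<div id="footer"><a href="http://casino.example/">Casino</a>'
        '<a href="http://pills.example/buy">Pills</a></div>'
    )
    print(external_links(sample, "blog.example"))
```

A long list is not proof of anything on its own, but it tells you where to look: read through the hosts and see whether they are relevant sites or bad neighborhoods.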

Link Checks

  1. Does the site send you traffic and does it convert?- Check the referring sites report in GA. If the site is sending you traffic it’s a strong indication that humans are looking at it. Even better if the traffic is converting.
  2. Is the link nofollowed? - A key point is to check whether the link has the rel=“nofollow” attribute. The rel=“nofollow” attribute can be used to signal paid links and untrusted content.
  3. Check for redirects – I also like to see if the link is going through any redirect routine or through a third party tracking pixel. This can be quite useful in understanding if the link passes PageRank. There are a number of tools that do this. Redirect Path by Ayima is my favorite
  4. Anchor text – Traditionally SEOs have looked to make the anchor text of their links closely match the keywords they want to rank for. Google are acutely aware of this, so when checking links I’m going to look for heavily optimized text. In their guidelines Google uses the phrase “Links that are inserted into articles with little coherence”. Again, this is a good visual cue that something isn’t right
  5. Reciprocal links – I don’t see this so much anymore but back in the day reciprocal links or link exchanges were a very common way of inflating link numbers and inflating PageRank. The key thing to look for here is relevancy. A useful list that has been curated is going to be a lot better than a long link list
  6. Type of link – Is the link a text link or an image? Here I try to look for signs of paid image links, passed off as ads, without the ‘nofollow’ attribute applied. Quite often you will notice something like “Partner Site” or “Advertisement” in the surrounding text
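
Checks 2, 4 and 6 above (nofollow, anchor text and type of link) all come from the same place, the markup of the link itself, so they are easy to read off together. Here is a small sketch using Python’s standard library. It assumes you already have the linking page’s HTML as a string; the class name, the function name and the output fields are my own invention rather than part of any tool mentioned in this article.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class BacklinkAuditor(HTMLParser):
    """Find links pointing at `target_host` and record their attributes."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host.lower()
        self.found = []       # one dict per link to the target site
        self._current = None  # the link we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            try:
                host = (urlsplit(href).hostname or "").lower()
            except ValueError:  # malformed URL
                host = ""
            if host == self.target_host or host.endswith("." + self.target_host):
                rel = (attrs.get("rel") or "").lower().split()
                self._current = {
                    "href": href,
                    "nofollow": "nofollow" in rel,
                    "anchor": "",
                    "image": False,
                }
        elif tag == "img" and self._current is not None:
            # Image link: use the alt text (if any) as the anchor text.
            self._current["image"] = True
            self._current["anchor"] += attrs.get("alt") or ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["anchor"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse runs of whitespace in the anchor text.
            self._current["anchor"] = " ".join(self._current["anchor"].split())
            self.found.append(self._current)
            self._current = None


def audit_backlinks(html, target_host):
    """Return details of every link in `html` that points at `target_host`."""
    auditor = BacklinkAuditor(target_host)
    auditor.feed(html)
    return auditor.found


if __name__ == "__main__":
    page = (
        '<p><a href="http://example.com/" rel="nofollow">Example</a> and '
        '<a href="https://www.example.com/loans">cheap  payday loans</a></p>'
    )
    for link in audit_backlinks(page, "example.com"):
        print(link)
```

Run over a batch of pages, the anchor texts it collects make heavily optimized, repeated phrases stand out quickly, and the followed image links are the ones to check against the “Partner Site” or “Advertisement” wording described above.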

Link and Social Metrics

  1. Number of links pointing to page/domain – Checking both the page and domain for links can give insight into the quality of the links. Don’t forget that links can be manipulated and SPAM sites will have other dross funneled into them from other sites
  2. Social citations – Social citations can indicate that a page has gathered a certain degree of social popularity. Once again be aware of fake votes. This does happen. So if you see an SME site with 10,000 Twitter followers and 15,000 Facebook likes, the chances are that they have been ‘gamed’.
  3. PageRank – Using PageRank isn’t an exact science but I like to use it as a barometer to see if a link or site warrants further investigation. Toolbar PageRank isn’t particularly fresh either and updates about once a quarter, so don’t expect new sites to display any values. To check PageRank you’ll need to install the Google toolbar and enable PageRank on Internet Explorer
  4. Domain Authority and Page Authority – I use Domain Authority and Page Authority to supplement my initial PageRank reading, especially if I notice a lot of PageRank 0 and N/A links. The easiest way to get Domain Authority and Page Authority metrics straight into your browser is to install the Mozbar. The Mozbar has lots of free features and I recommend installing it

When checking link and social metrics it’s important to run these checks for both the linking page and the whole domain. This way you can judge the credibility of the whole site.

Final Check

    1. Mum test - If you show the link to your Mum would she be able to understand the page linking to you?

This list isn’t exhaustive but I’ve tried to give you as many checks as possible to help you judge the value of a link. As I mentioned earlier, after a while you’ll start to get a sixth sense for SPAM and be able to critique a site very quickly, without having to perform all the checks.

Although this process of link analysis is laborious, it is worth it. If you find you have a bad link profile, you can then explain falling natural search traffic and visibility. Once you have worked to remove all of your bad links (which we will cover in Part 3 of this series), it is a great feeling when you receive a notification telling you a manual penalty has been lifted thanks to your hard work, or when your natural search visibility starts to recover.

[Screenshot: Google “manual spam action revoked” message]

In the next part of our link analysis guide, I’m going to show you how to automate the process to save time and check links in bulk, without sacrificing any of the quality.
