One of the most common forms of duplicate content I come across is the development, test, or staging site being indexed by the search engines.
This is a problem not only because two versions of your site are live, but also because traffic can be driven to your development site via the SERPs, where the general public can see your rough draft of the site as well as the live version.
This could also result in links being attracted back to the test site rather than the live site.
Having your test version of the site available to search engines and users also poses the risk of publishing content that has not yet passed compliance checks or been approved by your legal team.
So how can you prevent this from happening? Block the search engine spiders from the start. There are three methods you can use to achieve this:
- Use password protection on the staging site, which requires user authentication. Spiders cannot access pages that require a login username and password.
- Use a robots.txt file that disallows all spiders from crawling the test content.
- Use a noindex,nofollow robots meta tag in the test platform's page code.
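The last two methods are just small text snippets. A robots.txt file that blocks all crawlers, placed at the root of the staging domain, looks like this:

```
User-agent: *
Disallow: /
```

The equivalent meta tag, added to the head section of every page on the test platform:

```html
<meta name="robots" content="noindex,nofollow">
```

For the first method, a minimal sketch of password protection on an Apache server via .htaccess (the path to the .htpasswd file is a placeholder; adjust it for your own server setup):

```apache
AuthType Basic
AuthName "Staging site"
# Placeholder path - point this at your actual password file
AuthUserFile /path/to/.htpasswd
Require valid-user
```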
If you use method 2 or 3, remember not to carry it over to the live site, or you will prevent your new site from being indexed.
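One way to catch that mistake before a release is to check the robots.txt rules automatically. Here is a minimal sketch using Python's standard urllib.robotparser; the blocks_all helper and the sample rules are illustrative, not part of any particular deployment process:

```python
from urllib.robotparser import RobotFileParser

def blocks_all(robots_txt: str) -> bool:
    """Return True if this robots.txt body blocks a generic crawler
    from the site root (i.e. it still has the staging rules in it)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("Googlebot", "/")

# Staging rules: disallow everything.
staging_rules = "User-agent: *\nDisallow: /"
# Live rules: an empty Disallow permits crawling.
live_rules = "User-agent: *\nDisallow:"

print(blocks_all(staging_rules))  # True - would block the live site if carried over
print(blocks_all(live_rules))     # False - safe to deploy
```

A check like this can run as part of a deployment script, failing the build if the blocking rules are about to go live.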
What if my test site is already indexed?
Make the above changes to the test site first to prevent any further indexing.
A webmaster tools account (if you have one set up with each of the major engines) will allow you to remove the content from the search results using the URL removal tool.
If the test site is no longer live, Google will drop the cached test site pages and replace them with the live site in time, as it indexes the live content.