Search Engine Crawlers
You may not want certain pages of your website crawled and indexed, either because they would not be useful to users who find them in a search engine's results or because, from an SEO standpoint, they are not relevant to the topic(s) that make up the rest of your site (a contact or privacy policy page, for example, is unlikely to be relevant to your chosen topic/keywords). Search engine crawlers can be instructed to ignore pages and not to follow certain links. This can be achieved in two ways:
a) Robots Meta Tag
You can add a robots meta tag to any page on your website. It may contain any combination of the following values in a comma-separated list: NOINDEX, NOFOLLOW and NOARCHIVE.
NOINDEX will prevent the page from being indexed (listed) by search engines.
NOFOLLOW will prevent the links on the page from being followed by the crawler.
NOARCHIVE prevents the search engine from caching a copy of the current page.
Example:
<meta name="robots" content="NOINDEX,NOFOLLOW"/>
In the above example the page containing this tag would not be indexed, and the links contained within it would not be followed.
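To check which directives a page actually carries, the tag can be read back programmatically. The following is a minimal sketch using Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives and the sample page string are made up for illustration and are not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of values.
        for item in attr_map.get("content", "").split(","):
            if item.strip():
                self.directives.add(item.strip().upper())


def robots_directives(html):
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only to exercise the sketch.
page = '<html><head><meta name="robots" content="NOINDEX,NOFOLLOW"/></head></html>'
directives = robots_directives(page)
print("NOINDEX" in directives, "NOFOLLOW" in directives, "NOARCHIVE" in directives)
```

Values are upper-cased before comparison because the directive names are generally treated as case-insensitive.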
b) Robots.txt File
A robots.txt file is a plain text file that tells search engine crawlers which parts of your site they may access. This file must be named "robots.txt" and be placed in the root directory of your website.
This file uses the Robots Exclusion Protocol to specify the parts of the site that may be accessed by the crawler. In the example below all resources except for those stored in the images folder are accessible.
Example:
User-agent: *
Disallow: /images/
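Before publishing a robots.txt file, it can be worth confirming that the rules block only what you intend. Below is a minimal sketch using Python's standard urllib.robotparser module against the two rules from the example above; the helper is_allowed and the paths /images/logo.png and /contact.html are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the example, one directive per line.
rules = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path, user_agent="*"):
    """Return True if the rules permit the given crawler to fetch the path."""
    return parser.can_fetch(user_agent, path)


# Hypothetical paths: one inside the images folder, one outside it.
print(is_allowed("/images/logo.png"))
print(is_allowed("/contact.html"))
```

Under these rules the first path should be reported as blocked and the second as allowed. Real crawlers may interpret edge cases differently from this module, so treat the result as a sanity check rather than a guarantee.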
Warning: Care should be taken when employing any of the above methods to restrict access to your web pages by search engine crawlers. Spending hours optimizing your titles and tweaking your text will not help if you have inadvertently instructed search engines to ignore an optimized page!
Published on January 14, 2014 18:51


