What are Robots.txt and sitemap


Robots.txt

 

For web pages to be indexed, search engines rely on programs called robots, which scour the net in search of unknown or changed pages to add to the engine. When a robot crawls your site, the first thing it looks for is a text file named robots.txt.

This file lets you make polite requests to search engine bots. You can tell them whether they may access your site and whether they may index all of it or only specific parts. For this, there are two commands: User-agent and Disallow.

 

 

 

 

User-agent specifies which robots the rules that follow apply to. It can take several forms:

User-agent: * – The rules apply to all robots.
User-agent: robot – The rules apply only to the specified robot.

Disallow declares the pages that you do not want the engine to index. It can be used like this:
Disallow: /dir/ – The directory /dir/ will not be indexed.
Disallow: /page.html – Only page.html will not be indexed.

 

You can use several Disallow commands one after another. Each must be placed on its own line. The robots.txt file must then be placed at the root of your site.
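As a rough illustration of how a robot might interpret these commands, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content reuses the example paths from this article, and the robot name "ExampleBot" is made up for the demonstration; real crawlers may apply the rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, built from the example paths in this article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dir/
Disallow: /page.html
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an invented robot name; it falls under the "*" group.
for path in ("/index.html", "/dir/photo.jpg", "/page.html"):
    allowed = parser.can_fetch("ExampleBot", path)
    print(path, "allowed" if allowed else "disallowed")
```

Parsing the text from a string keeps the sketch self-contained; to test a live site you would point the parser at the real robots.txt address instead.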

 

Google provides a tool for automatically generating a robots.txt file in its Webmaster Tools.

 


Define a Sitemap

 

You can also use the Sitemap command in robots.txt to indicate the absolute address of an XML sitemap file on your site. For example:

Sitemap: http://www.yoursite.com/sitemap.xml

The sitemap is a simple text file that follows the XML standard. It lists all the links of your site so that Google can reach them more easily.
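To make the format more concrete, here is a minimal sketch that builds such a file with Python's standard xml.etree.ElementTree module. The function name build_sitemap and the page addresses are placeholders invented for this example; the urlset, url and loc elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder addresses; replace them with the real pages of your site.
pages = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/page.html",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting text can be saved as sitemap.xml at the address announced in robots.txt.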

 

In Webmaster Tools, you can also use the feature that checks that your robots.txt is valid and that the sitemap is detected. It also shows the number of URLs added to the engine.
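Before relying on Webmaster Tools, you can run a rough local check of the same kind. The sketch below is not the Webmaster Tools feature itself; it only assumes Python 3.8 or later, where urllib.robotparser offers site_maps(), and uses a hypothetical robots.txt and a made-up helper name, declared_sitemaps.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that announces a sitemap.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /dir/
Sitemap: http://www.yoursite.com/sitemap.xml
"""


def declared_sitemaps(robots_text):
    """Return the list of Sitemap addresses found in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


found = declared_sitemaps(ROBOTS_WITH_SITEMAP)
if found:
    print("Sitemap detected:", ", ".join(found))
else:
    print("No Sitemap line found in robots.txt")
```

This only confirms that the Sitemap line is readable; it does not tell you how many URLs the engine has actually added.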

 

It is better to add a human-readable sitemap of the site too, preferably linked from the footer. The reason is simple: a human cannot easily read the XML sitemap of a website, but can easily read and search a human-readable version.

 

To summarize

 

  • The robots.txt file is the first thing a robot reads when it tries to browse your Web pages.
  • User-agent specifies the robot, and Disallow lists the directories and pages that should not be indexed.
  • The Sitemap command specifies the location of the file that contains all the links of your site.

About Abhishek

Abhishek Ghosh is an Orthopedic Surgeon, Inventor with 216 Patents, Current editor of The Customize Windows Media Group. You can follow and know more about Dr. +Abhishek Ghosh on Google Plus and follow on Twitter as @AbhishekCTRL.
