The robots.txt file is simply a plain text file containing a few instructions for search robots (as we said earlier). It is located in the root (web root) directory of your server, so it can be accessed by adding /robots.txt after any WordPress-based website's URL. For example, for WordPress's own official website, it is:
For a few reasons, we need to modify this file.
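Before changing anything, you may want to look at what a site currently serves. The small Python sketch below builds the robots.txt address from any page URL; it relies on the fact that robots.txt always lives at the root of the host, even when a blog sits in a subfolder. The helper names `robots_url` and `fetch_robots` are our own, and example.com is a placeholder domain.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Return the robots.txt URL for a site: it always sits at the host root."""
    # An absolute path ("/robots.txt") replaces whatever path the page URL had.
    return urljoin(site_url, "/robots.txt")


def fetch_robots(site_url: str, timeout: float = 10.0) -> str:
    """Download a site's robots.txt and decode it as text (needs network access)."""
    with urlopen(robots_url(site_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Works from any page of the site, not only the home page.
    print(robots_url("https://example.com/blog/some-post/"))
```

Calling `fetch_robots("https://example.com/")` would print the live file, or raise an HTTP error if the site has none.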
Note that, by default, there may be no physical robots.txt file in the root of your server. Instead, WordPress itself generates a virtual one based on its privacy setting (the option that allows search engines to crawl your site: do you remember it?). That generated file generally allows Google and other search engine bots to crawl the whole site, and it may also list the location of the sitemap file.
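To see what an "allow everything" file means in practice, here is a sketch using Python's standard `urllib.robotparser`. The file text below is an assumed minimal allow-all example (an empty `Disallow:` means nothing is blocked) plus a placeholder sitemap line; the exact text WordPress generates varies by version and is not reproduced here.

```python
from urllib.robotparser import RobotFileParser

# Assumed allow-all example; not the literal output of any WordPress version.
ALLOW_ALL = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# An empty Disallow value blocks nothing, so every path may be crawled.
print(parser.can_fetch("Googlebot", "https://example.com/any/page/"))  # True
# Sitemap lines are collected separately from the crawl rules (Python 3.8+).
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```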
You can create your own file with the Crawler access feature in Google Webmaster Tools. However, we will not use it here. Just use any plain text editor (do not use WordPad or MS Word; they insert unwanted extra formatting characters).
So, our workflow will be:
- Creating a custom robots.txt file
- Uploading it to the root of your server using this tutorial or this tutorial, any FTP client, or an online FTP service.
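The two workflow steps can also be scripted. The Python sketch below is one possible approach, not the method from the tutorials mentioned above: `write_robots` saves the rules as plain ASCII with Unix line endings (so no hidden word-processor characters sneak in, and any non-ASCII character raises an error), and `upload_robots` stores the file through the standard `ftplib` module. The function names, the abbreviated rule set, and the web-root folder are hypothetical; use your own host's details.

```python
import ftplib
from pathlib import Path

# Abbreviated example rules; paste your full rule set here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def write_robots(path: Path, content: str = ROBOTS_TXT) -> Path:
    """Save content as plain ASCII with Unix newlines (no BOM, no formatting)."""
    with path.open("w", encoding="ascii", newline="\n") as fh:
        fh.write(content)
    return path


def upload_robots(ftp, local_path: Path, remote_dir: str) -> None:
    """Store local_path as robots.txt inside remote_dir (your web root)."""
    ftp.cwd(remote_dir)
    with local_path.open("rb") as fh:
        ftp.storbinary("STOR robots.txt", fh)


def publish(host: str, user: str, password: str, web_root: str) -> None:
    """Write robots.txt locally, then upload it; all arguments are your own hosting details."""
    local = write_robots(Path("robots.txt"))
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        upload_robots(ftp, local, web_root)
```

For example, `publish("ftp.example.com", "username", "password", "/public_html")` would upload the file, where the host, credentials, and web-root folder name are placeholders for whatever your hosting provider uses.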
Why do we need to modify the robots.txt file?
- Tell me, do you want all the files in your wp-content or wp-admin folder to be indexed? Why should you allow search bots to crawl those folders?
- Technically, crawling becomes smoother and better directed. That is, Google and other search engine bots read which folders are marked as NOT TO CRAWL and spend their time on your useful posts (which is what you want, right?).
- In effect, you are doing something like adding rel="nofollow" to every folder you disallow: this is explicit and more robust than the fragile per-link rel="nofollow" attribute. Logically, this can keep comment pages from being crawled (and usually from being indexed), so a rule like Disallow: /comments/ can be used to remove the duplicate content problem that shows up in Google Webmaster Tools (HTML suggestions), a typical WordPress problem. Though this type of duplicate content has very little effect on SERPs or SEO, we still like things clean.
- As a corollary of the above logic, your spammer friends' hard work to get their links indexed through your site is largely ruled out.
- A second probable corollary is that you are serving relatively static text, which Google loves: you still get comments, and Google's bots are happy too.
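The folder-level blocking described in the points above can be checked with Python's standard `urllib.robotparser`. This sketch uses only plain folder rules, because that parser does simple prefix matching and does not understand the `*` and `$` wildcards; example.com and the sample post path are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Folder rules only: urllib.robotparser treats each Disallow value as a plain prefix.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for path in ["/wp-admin/options.php", "/comments/feed/", "/2010/05/my-post/"]:
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    # Anything under a disallowed folder is skipped; normal posts stay crawlable.
    print(f"{path}: {'crawl' if allowed else 'skip'}")
```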
So, you need to understand the basics so that you can modify the file to suit your website's needs. We are using this set of robots.txt instructions:
Sitemap: http://thecustomizewindows.com/sitemap.xml.gz # This is for Yahoo!
Sitemap: http://thecustomizewindows.com/sitemap.xml # This is for other search engines, including Google.

User-agent: * # These rules apply to all bots; the exceptions are listed below
Disallow: /wp-admin/ # Bots will not crawl this folder
Disallow: /wp-includes/ # Bots will not crawl this folder
Disallow: /feed/ # Bots will not crawl the feeds
Disallow: /trackback/ # Bots will not crawl trackbacks
Disallow: /cgi-bin/ # Bots will not crawl this folder
Disallow: /*.php$ # Bots will not crawl URLs ending in .php
Disallow: /*.cgi$ # Bots will not crawl URLs ending in .cgi
Disallow: /*.xhtml$ # Bots will not crawl URLs ending in .xhtml
Disallow: /*.php* # Also blocks .php URLs with parameters, such as "index.php?p=123"
Disallow: /*/trackback # Also blocks per-post trackback URLs
Disallow: /*?* # Blocks any URL that contains a query string
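Several of the rules above use the `*` (any run of characters) and trailing `$` (end of URL) wildcards, which Google and other major engines support but Python's `urllib.robotparser` does not. The sketch below, with hypothetical helper names `pattern_to_regex` and `is_blocked`, translates such patterns into regular expressions so you can check which URLs a subset of these rules would block. It is a simplification: it only treats Disallow rules and ignores Allow rules and the longest-match precedence that Google applies.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into an anchored regex.

    '*' matches any run of characters; a trailing '$' pins the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """True if any Disallow pattern matches the path (query string included)."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# A subset of the Disallow values from the file above.
DISALLOW = ["/wp-admin/", "/wp-includes/", "/feed/", "/cgi-bin/",
            "/*.php$", "/*.cgi$", "/*.xhtml$", "/*.php*", "/*?*"]

for path in ["/wp-login.php", "/index.php?p=123", "/?s=robots", "/2010/05/my-post/"]:
    print(path, "->", "blocked" if is_blocked(path, DISALLOW) else "allowed")
```

Note how "/*.php$" alone would let "/index.php?p=123" through, because the URL does not end in ".php"; the extra "/*.php*" and "/*?*" rules are what catch it.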
Save the file as robots.txt, upload it to the root of your server, and that's it. Newbies can use our robots file (linked in the previous sentence); just change the domain name to yours. Otherwise, you will be pointing search engines at our sitemap instead of your own.
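Changing the domain is a simple template substitution. The Python sketch below uses a hypothetical `robots_for` helper and an abbreviated copy of the rules; it fills your own domain into the Sitemap lines so they never point at someone else's sitemap. The "http" default scheme is an assumption; pass "https" if your site uses it.

```python
# Abbreviated template; add the rest of your Disallow rules as needed.
TEMPLATE = """\
Sitemap: {base}/sitemap.xml.gz
Sitemap: {base}/sitemap.xml

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def robots_for(domain: str, scheme: str = "http") -> str:
    """Return robots.txt text whose Sitemap lines point at *your* domain."""
    base = f"{scheme}://{domain.strip().rstrip('/')}"
    return TEMPLATE.format(base=base)


if __name__ == "__main__":
    print(robots_for("example.com"))
```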
You can add further rules as needed, such as Disallow: /comments/ or Disallow: /tags/ (note that WordPress's default tag base is /tag/). You can also read robotstxt.org to learn how to disallow harmful search bots.
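Disallowing a specific bot works by giving it its own User-agent group. The sketch below checks such a file with Python's standard `urllib.robotparser`; "BadBot" is a hypothetical crawler name used only for illustration.

```python
from urllib.robotparser import RobotFileParser

# "BadBot" is a made-up user agent; every other bot falls through to the "*" group.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("BadBot", "https://example.com/2010/05/my-post/"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/2010/05/my-post/"))  # True
```

Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but a truly malicious bot can simply ignore it.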