The Customize Windows

Technology Journal


By Abhishek Ghosh April 11, 2011 2:03 pm Updated on October 17, 2014

Configure the robots.txt in WordPress properly for easy crawling

We previously discussed robots.txt in a separate article. This time, we will discuss good settings for the robots.txt file on your WordPress website.

robots.txt is nothing but a simple text file, located in the root of your website, that contains a few instructions for search robots (as we said before). It can be accessed by adding /robots.txt after the URL of any WordPress-based website. For example, for WordPress's own official website, it is http://wordpress.org/robots.txt.
There are a few reasons to modify this file.
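To illustrate where the file lives, here is a minimal Python sketch that derives the robots.txt address from any page URL on a site. It uses only the standard library; the helper name robots_url is our own, hypothetical choice.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host: robots.txt sits at the top of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://wordpress.org/about/"))  # prints http://wordpress.org/robots.txt
```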

robots.txt in WordPress

Note that, by default, there may be no physical robots.txt file in the root of your site. In that case WordPress itself generates a virtual one, based on its search engine visibility setting (the option that allows search engines to crawl the site: do you remember?). That generated file generally allows Google and other search engine bots to crawl the whole site, and it can include the location of the sitemap file.

You can create your own file with Google Webmaster Tools (Crawler access). However, we will not use it. Just use any simple text editor (do not use WordPad or MS Word, as they will insert unwanted extra characters).

 

So, our workflow will be:

  • Creating a custom robots.txt file
  • Uploading it to the root of your site, following one of our upload tutorials or using any FTP software or online FTP.
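The first step can also be done with a few lines of Python instead of a text editor. This is only a sketch under our own assumptions: the function name build_robots_txt and the example.com domain are placeholders, not part of WordPress or of any tool mentioned here.

```python
from pathlib import Path


def build_robots_txt(sitemap_url: str, disallowed: list[str]) -> str:
    """Build the text of a robots.txt whose rules apply to all bots."""
    lines = [f"Sitemap: {sitemap_url}", "", "User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed]
    return "\n".join(lines) + "\n"


def save_robots_txt(text: str, path: str = "robots.txt") -> None:
    """Write plain UTF-8 text, so no word-processor formatting sneaks in."""
    Path(path).write_text(text, encoding="utf-8")


# example.com is a placeholder; use your own domain and folders.
text = build_robots_txt("https://example.com/sitemap.xml", ["/wp-admin/", "/wp-includes/"])
print(text)
```

Calling save_robots_txt(text) writes the file to the current folder, ready to be uploaded in the second step.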

 

Why do we need to modify the robots.txt file?

 

  • Tell me, do you want all the files in your wp-content or wp-admin folder to be indexed? Why should you allow search bots to crawl those folders?
  • Technically, crawling becomes smooth, intended and well directed. That is, Google and other search engine bots read the instructions about which folders are marked as NOT TO CRAWL and spend their time on the useful posts (which is what you want, right?).
  • Disallowing a folder works somewhat like adding rel="nofollow" to everything inside it, but it is more explicit and more robust than the fragile rel="nofollow". Logically, this can keep comments from being indexed, so Disallow: /comments/ can be used to remove the duplicate content problem that shows up in Google Webmaster Tools (HTML suggestions), a typical problem with WordPress. This type of duplicate content has very little effect on SERP or SEO, but still, we like clean things.
  • As a corollary of the above logic, your spammer friends' hard attempts to get indexed through your site are ruled out.
  • A second probable corollary is that you are providing relatively static text, which Google loves very much: you still get comments, and Google's bots are also happy.
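To see how a well-behaved bot reads such instructions, here is a small sketch using Python's standard urllib.robotparser module. The rules, the example.com domain and the post URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A tiny, hypothetical rule set: two folders are marked as not to be crawled.
rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A bot asks before fetching each URL.
admin_ok = parser.can_fetch("Googlebot", "https://example.com/wp-admin/options.php")
post_ok = parser.can_fetch("Googlebot", "https://example.com/2011/04/some-post/")
print(admin_ok, post_ok)  # prints: False True
```

The admin page is refused while the post is allowed, so the bot's time goes to the posts.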

 

So, you need to understand the basics in order to modify the rules and apply them according to your website's needs. We are using this robots.txt instruction set:

 

Sitemap: https://thecustomizewindows.com/sitemap.xml.gz # This is for Yahoo!
Sitemap: https://thecustomizewindows.com/sitemap.xml # This is for other search engines, including Google
User-agent: * # This means allow all; the exceptions are added below
Disallow: /wp-admin/ # Bots will not crawl this folder
Disallow: /wp-includes/ # Bots will not crawl this folder
Disallow: /feed/ # Bots will not crawl the feeds
Disallow: /trackback/ # Bots will not crawl trackbacks
Disallow: /cgi-bin/ # Bots will not crawl this folder
Disallow: /*.php$ # Bots will not crawl any PHP files
Disallow: /*.js$ # Bots will not crawl JavaScript
Disallow: /*.cgi$ # Bots will not crawl CGI files
Disallow: /*.xhtml$ # Bots will not crawl any XHTML document
Disallow: /*.php* # We have ensured that the "php?p=123" format is not crawled either
Disallow: */trackback* # As above
Disallow: /*?* # As above
Disallow: /z/
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
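The * and $ characters in the rules above are wildcards: as the major search engines interpret them, * stands for any run of characters and a trailing $ means the URL must end there. The following Python sketch translates such a pattern into a regular expression by hand to show which paths would match. The function names are our own, and this is an illustration of the matching idea, not the exact code any search engine runs.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path (with any query string) matches the pattern."""
    # match() anchors at the start, which mirrors robots.txt prefix matching.
    return pattern_to_regex(pattern).match(path) is not None


print(is_blocked("/*.php$", "/index.php"))        # True: ends in .php
print(is_blocked("/*.php$", "/index.php?p=123"))  # False: does not end in .php
print(is_blocked("/*.php*", "/index.php?p=123"))  # True: covers the "php?p=123" format
print(is_blocked("/*?*", "/?s=robots"))           # True: any URL with a query string
```

This shows why the rule set needs both /*.php$ and /*.php*: the first alone would miss URLs that carry a query string.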

 

Save the file as robots.txt and upload it to the root of your server; that's it. Newbies can use our robots.txt file shown above as a starting point: just change the domain name to your own, otherwise Google will think you are cheating by providing the URL of our sitemap instead of your own.

You can add additional rules such as Disallow: /comments/ or Disallow: /tags/ as needed. You can also read robotstxt.org to learn how to disallow harmful search bots.
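As a sketch of how such additional rules behave, the example below adds a separate group for one unwanted bot and checks the result with Python's standard urllib.robotparser. The name BadBot and the example.com URL are hypothetical placeholders.

```python
from urllib.robotparser import RobotFileParser

# BadBot is a made-up name: it is shut out of the whole site,
# while every other bot only skips comments and tags.
rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /comments/
Disallow: /tags/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

post = "https://example.com/2011/04/some-post/"
results = {agent: parser.can_fetch(agent, post) for agent in ("BadBot", "Googlebot")}
print(results)  # prints: {'BadBot': False, 'Googlebot': True}
```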


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Orthopaedic Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.

Comments

  1. Blogging Park says

    July 4, 2011 at 5:59 am

    Thanks Abhishek da, for this great info. Actually I am facing a lot of duplicate content issues and some unwanted 302 redirection. I have done all the necessary SEO but it is still coming. I think now it will help me a lot. Thanks for sharing.
    Have a great day.
    Manas Kabiraj.

  2. Abhishek says

    July 4, 2011 at 10:55 am

    Thanks and welcome Manas. I have no duplicate issue now. It seems that "nofollow" has little value. I tried using Platinum SEO to nofollow: it simply does not work.

    You have to monitor it every day. You'll feel like you are playing a game: every next day Google will throw up a new duplicate content issue. Disallow it (I use online FTP for easy editing) and it'll disappear the next day. After a month or two, there will be no duplicate content.

    (Edited to make the comment shorter)

  3. Blogging Park says

    July 4, 2011 at 5:14 pm

    But what about 302 redirection? I have edited my robots.txt file and have disallowed comments and comments/feed; I will see what Google does with this. But please suggest something about removing the 302 redirection. I have checked the .htaccess file too, and it is clean so far. What is causing this redirection?

  4. Abhishek says

    July 4, 2011 at 5:32 pm

    302 redirection in WordPress generally comes from the host. It is a problem that arises from the Apache server configuration. Check whether your cPanel settings are OK. If you are using GoDaddy, it is probable that, other than changing the host, there is no way to solve it. Generally the .htaccess file is not faulty. Ask in your hosting company's forum whether there is any way to solve it.

    You can ask Mr. Sajal Kayan on his blog (sajalkayan.com); he might point it out more precisely.

    Sorry that I missed your 302 issue in the earlier comment. Please report back to me what happened next.

  5. Khawer Khan says

    December 3, 2012 at 3:29 am

    What should I add in robots.txt to allow only the homepage, tags and posts to appear in search results and disallow everything else, including categories? Each tag with a meta description is more important for me than a category because of the keywords. Also, will it be good to create this robots.txt file or not?


About This Article

Cite this article as: Abhishek Ghosh, "Configure the robots.txt in WordPress properly for easy crawling," in The Customize Windows, April 11, 2011, January 19, 2021, https://thecustomizewindows.com/2011/04/configure-the-robots-txt-in-wordpress-properly-for-easy-crawling/.
