The Customize Windows

Technology Journal


By Abhishek Ghosh April 11, 2011 2:03 pm Updated on October 17, 2014

Configure the robots.txt in WordPress properly for easy crawling

We previously discussed robots.txt in a separate article. This time, we will discuss the optimum settings of the robots.txt file for your WordPress website.

The robots.txt file is nothing but a simple text file containing a few instructions for search robots (as we said before). It is located in the root directory of your website, so it can be accessed by adding /robots.txt after the URL of any WordPress-based website. For example, for WordPress's own official website, it is: http://wordpress.org/robots.txt.
For a few reasons, we need to modify this file.
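As a small illustration of the URL rule above, the following Python sketch builds the robots.txt address of a site from the address of any of its pages. The helper name robots_url is an assumption made for this example only.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    # The leading slash makes urljoin drop the page's own path,
    # so the result always points at the root of the host.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://wordpress.org/about/"))
# prints: http://wordpress.org/robots.txt
```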

robots.txt in WordPress

Note that, by default, there may be no actual robots.txt file in the root of your server. In that case WordPress itself answers the request, using its own settings (the option to allow search engines to crawl the site: do you remember?), and generates a file that generally allows Google and other search engine bots to crawl the whole site and may list the location of the sitemap file.


You can create your own file from Google Webmaster Tools (Crawler access). However, we will not use it. Just use any plain text editor (do not use WordPad or MS Word; they can insert unwanted formatting characters).

So, our workflow will be:

  • Creating a custom robots.txt file
  • Uploading it to the root of your server, following one of our tutorials or using any FTP software or online FTP.
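The two steps above can also be sketched in Python, as a hedged alternative to a text editor and an FTP program. The function names, the host name and the credentials are assumptions for illustration, and the sketch assumes that the FTP login directory is the web root of the site.

```python
from ftplib import FTP


def write_robots_txt(path, rules):
    """Write the rules as plain ASCII text, one per line, with no extra formatting."""
    with open(path, "w", encoding="ascii", newline="\n") as fh:
        fh.write("\n".join(rules) + "\n")


def upload_robots_txt(path, host, user, password):
    """Upload the file as robots.txt to the FTP login directory (assumed web root)."""
    with FTP(host) as ftp, open(path, "rb") as fh:
        ftp.login(user, password)
        ftp.storbinary("STOR robots.txt", fh)


if __name__ == "__main__":
    write_robots_txt("robots.txt", ["User-agent: *", "Disallow: /wp-admin/"])
    # Hypothetical host and credentials; replace them with your own before uncommenting.
    # upload_robots_txt("robots.txt", "ftp.example.com", "user", "password")
```

Writing the file from code guarantees plain text, which is the same reason for avoiding a word processor.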

 

Why do we need to modify the robots.txt file?

  • Tell me, do you want all the files in your wp-content or wp-admin folder to be indexed? Why should you allow the search bot to crawl those folders?
  • Technically, the crawling will be smooth and well directed. That is, Google and other search engine bots read which folders are marked as NOT TO CRAWL and spend their time on your useful posts (which is what you want, right?)
  • Disallowing a folder works like adding rel="nofollow" to everything inside it, but it is more explicit and more robust than the fragile rel="nofollow". For example, Disallow: /comments/ can be used to keep comment pages from being crawled, which removes the duplicate content problem that shows up in Google Webmaster Tools (HTML suggestions): a typical WordPress problem. This type of duplicate content has very little effect on SERP or SEO, but still, we like clean things.
  • As a corollary of the above logic, your spammer friends' hard attempts to get indexed through your site are ruled out.
  • A second probable corollary: you are presenting relatively static text to the crawler, which Google loves very much. You still get comments, and Google's bots are happy too.
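To make the idea of folders marked as NOT TO CRAWL concrete, here is a minimal sketch of how a well-behaved bot consults the rules, using Python's standard urllib.robotparser module. The domain example.com and the sample paths are placeholders, not part of our setup.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path):
    """Return True if a generic bot may crawl the given path."""
    return parser.can_fetch("*", "https://example.com" + path)


print(allowed("/wp-admin/options.php"))  # False
print(allowed("/comments/feed/"))        # False
print(allowed("/2011/04/some-post/"))    # True
```

Note that this module only does plain prefix matching, so it is suitable for folder rules like these, not for wildcard rules.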

 

So, you need to understand the basics in order to modify the file and apply it according to your website's needs. We are using this robots.txt instruction set:

 

Sitemap: https://thecustomizewindows.com/sitemap.xml.gz  # This is for Yahoo!
Sitemap: https://thecustomizewindows.com/sitemap.xml  # This is for other search engines, including Google
User-agent: *  # The rules apply to all bots; everything is allowed except what is listed below
Disallow: /wp-admin/  # Bots will not crawl this folder
Disallow: /wp-includes/  # Bots will not crawl this folder
Disallow: /feed/  # Bots will not crawl the feeds
Disallow: /trackback/  # Bots will not crawl trackbacks
Disallow: /cgi-bin/  # Bots will not crawl this folder
Disallow: /*.php$  # Bots will not crawl any PHP file
Disallow: /*.js$  # Bots will not crawl JavaScript files
Disallow: /*.cgi$  # Bots will not crawl CGI scripts
Disallow: /*.xhtml$  # Bots will not crawl any XHTML document
Disallow: /*.php*  # Ensures the "php?p=123" format is not crawled either
Disallow: */trackback*  # As above
Disallow: /*?*  # As above
Disallow: /z/
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
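The wildcard rules above use * (any run of characters) and a trailing $ (end of the URL), as supported by Google and several other crawlers. The Python sketch below shows one way such patterns can be interpreted; the function names rule_to_regex and is_blocked are hypothetical, and the sketch deliberately ignores Allow rules and rule precedence.

```python
import re


def rule_to_regex(pattern):
    """Translate a Disallow pattern with * and a trailing $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece and join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, disallow_rules):
    """Return True if any Disallow pattern matches the path from its start."""
    return any(rule_to_regex(rule).search(path) for rule in disallow_rules)


rules = ["/wp-admin/", "/*.php$", "/*?*", "*/trackback*"]

print(is_blocked("/wp-admin/options.php", rules))      # True
print(is_blocked("/index.php?p=123", rules))           # True
print(is_blocked("/2011/04/post/trackback/", rules))   # True
print(is_blocked("/2011/04/some-post/", rules))        # False
```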

 

Save the file as robots.txt and upload it to your server, that's it. Newbies can use our robots file as it is; just change the domain name to yours, otherwise you will be giving Google the URL of our sitemap instead of your own.

Additional rules like Disallow: /comments/, Disallow: /tags/ etc. can be added in the same way. You can read robotstxt.org to learn about disallowing harmful search bots too.
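As a sketch of how per-bot rules combine with the additional rules mentioned above, the example below blocks one bot completely while other bots are only kept out of /comments/ and /tags/. The bot name BadBot and the domain example.com are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /comments/
Disallow: /tags/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

post = "https://example.com/2011/04/some-post/"
print(parser.can_fetch("BadBot", post))                               # False
print(parser.can_fetch("Googlebot", post))                            # True
print(parser.can_fetch("Googlebot", "https://example.com/tags/seo/")) # False
```

Keep in mind that robots.txt only restrains bots that choose to honor it.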


About Abhishek Ghosh

Abhishek Ghosh is a Businessman, Surgeon, Author and Blogger. You can keep in touch with him on Twitter - @AbhishekCTRL.


Comments

  1. Blogging Park says

    July 4, 2011 at 5:59 am

    Thanks Abhishek da, for this great info. Actually I am facing a lot of duplicate content issues and some unwanted 302 redirections. I have done all the necessary SEO, but they are still coming. I think this will help me a lot now. Thanks for sharing.
    Have a great day.
    Manas Kabiraj.

  2. Abhishek says

    July 4, 2011 at 10:55 am

    Thanks and welcome Manas. I have no duplicate issue now. It seems that "nofollow" has little value. I tried using Platinum SEO to nofollow: it simply does not work.

    You have to monitor it every day. You'll feel like you are playing a game: every next day Google will throw up a new duplicate content issue. Disallow it (I use online FTP for easy editing) and it'll disappear the next day. After a month or two, there will be no duplicate content.

    (Edited to make the comment shorter)

  3. Blogging Park says

    July 4, 2011 at 5:14 pm

    But what about the 302 redirection? I have edited my robots.txt file and have disallowed comments and comments/feed; I will see what Google does with this. But please suggest something about removing the 302 redirection. I have checked the .htaccess file too, and it is clean so far. What is causing this redirection?

  4. Abhishek says

    July 4, 2011 at 5:32 pm

    302 redirection in WordPress generally happens because of the host. It is a problem that arises from the Apache server configuration. Check whether your cPanel settings are OK. If you are using GoDaddy, it is probable that there is no way to solve it other than changing the host. Generally the .htaccess file is not at fault. Ask in your hosting company's forum whether there is any way to solve it.

    You can ask Mr. Sajal Kayan on his blog (sajalkayan.com); he might point it out more precisely.

    Sorry that I missed your 302 question in the earlier comment. Please report back to me what happened next.

  5. Khawer Khan says

    December 3, 2012 at 3:29 am

    What should I add in robots.txt to allow only the homepage, tags and posts to appear in search results and disallow everything else, including categories? Each tag with its meta description is more important to me than a category, because of keywords. Also, will creating such a robots.txt file be good or not?

About This Article

Cite this article as: Abhishek Ghosh, "Configure the robots.txt in WordPress properly for easy crawling," in The Customize Windows, April 11, 2011, February 6, 2023, https://thecustomizewindows.com/2011/04/configure-the-robots-txt-in-wordpress-properly-for-easy-crawling/.
