It is common to see a huge number of 404 errors, such as those for post images, in Google Webmasters Tools. Here is a permanent fix for these WordPress 404 errors. Usually we face three or four types of 404 errors. They can get exacerbated when Google bots re-crawl the whole site, usually an old site, for some reason: a change of server (crawling increases on A-grade web hosts due to many factors), a change in domain ownership data, implementing site-wide HTTPS, changing the domain name, et cetera. WordPress is not very good web software; it is PHP-driven and resource hungry. WordPress is famous for its frontend HTML output, but basically it cannot be labelled as new generation web software. MySQL is one of the best database systems, yet WordPress can make it look like a living problem. PHP is not a very good programming language itself, plus some core files create logical conflicts in the database. Add to that the penalty for badly coded plugins.
WordPress can make a very good website with a lot of content appear spammy to search engine bots, and it can make the webmaster look like a lazy person.
Understanding Types of 404 Errors Before Fixing WordPress 404 Errors For Non Existing URLs
Many years ago we wrote about some 404 WordPress plugins. Redirection is the best plugin for monitoring 404 errors internally and for setting up PHP-based 301 redirections. While a 301 redirection is usually good for moved content, it is kind of black hat to do a 301 against URLs which never existed. Also, if you get 1000 404 errors against 5000 URLs on your website, it is neither practical to set up the 301 redirections manually nor will it do any good. Suppose you are 301 redirecting :
Google or any other search engine will never take that as a bad attempt. You can even change the permalink structure :
# it does not exist, please do not try the URL
Google bots understand that /2011/09/ has nothing to do with the title or text of the webpage. On the SERP, such parts usually get stripped from the green colored URLs. But if you have something like this as a 404 :
and you again do a 301 redirection to :
this is not good. Basically Google was trying to fetch the image file like this :
Fixing WordPress 404 Errors For Non Existing URLs – Conflict of MIME Types
Google understood the MIME type. If you really served an image at a URL with a trailing slash and no file extension, it would do no harm. But a soft 404 on the first attempt and a 301 later can simply harm your SERP position and the crawling by Google bots. Unless the domain is manually whitelisted (there are only a few such domains), it is a big mistake to 301 redirect a static file within a page to what appears to be an internal link. SEO experts are born to cheat; they will offer you some WordPress plugin to automate the 301 process and make all the errors appear to go away. Here is a simple cURL command to show you how a computer will logically think :
curl -I https://thecustomizewindows.com/Windows-7-Speech-Recognition-Related-Tutorials-Compiled-Index/Windows-7-Speech-Recognition-Related-Tutorials-Compiled-Index/
Here is the return if we do a 301 redirection :
HTTP/1.1 301 Moved Permanently
Cache-Control: no-cache, must-revalidate, max-age=0
Content-Type: text/html; charset=UTF-8
Date: Fri, 22 Aug 2014 17:59:22 GMT
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Set-Cookie: X-Mapping-fjhppofk=F778D0884D86358FEB6DFC2C6852207C; path=/
Set-Cookie: PHPSESSID=3inlme1mrsoob8plb8vslre4o3; path=/
Look at the Content-Type: it has been changed to text/html. To a Google bot's logic this can mean that you are auto-generating content. That is the first issue; the second is that your real image file is not getting indexed. You are penalized twice. And that is before counting the extra load on the servers from doing the 301 in PHP via a plugin. But if I run cURL with the image URL :
curl -I https://thecustomizewindows.com/wp-content/uploads/2012/04/Internet-Media-Type-or-Content-Types.png
I will get this return :
HTTP/1.1 200 OK
Cache-Control: max-age=86400, max-age=86400, public, must-revalidate, proxy-revalidate
Date: Fri, 22 Aug 2014 18:08:19 GMT
Expires: Sat, 23 Aug 2014 18:08:19 GMT
Set-Cookie: X-Mapping-fjhppofk=F778D0884D86358FEB6DFC2C6852207C; path=/
Last-Modified: Mon, 26 Sep 2011 06:26:54 GMT
Look at the Content-Type: it is rightly image/png. To a Google bot's logic this means that the file is not for indexing in Google Web Search; it will be referred to the Google Image Search bot. Whether you have an XML sitemap for images or not, Google Image Search will invariably list it. We use an XML sitemap for images for faster indexing.
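The comparison above can also be scripted. The following is only a minimal sketch, not the logic any search engine actually runs: it reads the output of curl -I on standard input and prints just the media type from the Content-Type header. The function name content_type_from_headers is hypothetical and example.com is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the media type from the first Content-Type header, without parameters
# such as "; charset=UTF-8".
content_type_from_headers() {
  # HTTP header lines end in CRLF, so drop the carriage returns first.
  # Header names are case-insensitive, hence tolower() on the name.
  tr -d '\r' | awk -F': *' '
    !found && tolower($1) == "content-type" {
      split($2, parts, ";")
      print tolower(parts[1])
      found = 1
    }'
}

# Usage (needs network access):
#   curl -sI "https://example.com/some-url/" | content_type_from_headers
```

Fed the 301 response shown earlier, it prints text/html. A URL that ends in .png but does not come back as an image/* type is the mismatch described above.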
Yes, it is possible to get the headers of all the contents of a web page with cURL. Showing how would require a full tutorial, which is off topic, but practically all search engine bots will run an equivalent script for the initial evaluation of a webpage.
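As a rough illustration only, and assuming the page uses absolute URLs in double-quoted src attributes, the first step of such a script could be sketched as below. The function name extract_src_urls is hypothetical, example.com is a placeholder, and a real crawler would use a proper HTML parser rather than grep.

```shell
#!/bin/sh
# Hypothetical helper: read HTML on stdin and print the value of every
# double-quoted src="..." attribute, one per line. This also matches
# attributes whose name merely ends in "src", such as data-src.
extract_src_urls() {
  grep -o 'src="[^"]*"' | sed -e 's/^src="//' -e 's/"$//'
}

# Usage (needs network access):
#   curl -s "https://example.com/some-post/" | extract_src_urls |
#   while read -r url; do
#     printf '%s\n' "$url"
#     curl -sI "$url" | grep -i '^content-type:'
#   done
```

The loop in the usage comment prints each embedded URL followed by its Content-Type header, which is the same check as the manual cURL commands above, repeated for every file the page loads.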
Understand Why It Arises Before Fixing WordPress 404 Errors For Non Existing URLs
We Must Not Have Any /post-name/post-image-name/ Type of 404 Errors.
I am telling you how to detect it and resolve it. It has nothing to do with Apache2’s .htaccess, WordPress permalinks or anything related to the server. If you have fixed it by manipulating the virtual host file, you are far away from detecting the issue. This is like prescribing a painkiller to treat a patient suffering from cancer: the pain will be reduced, but the disease will not get proper treatment.
First disable any kind of caching, whether a plugin or a server-side cache. Clear the contents of all cache directories by removing them recursively over SSH. Disable all the plugins except 100% good ones like Hello Dolly. Disable all cache settings and delete every kind of cache: database cache, object cache, etc. Disable the CDN. Activate the default WordPress theme. Now you have no SEO plugin. For a very bad database you might need to thoroughly clean your database, but this is among the last options. Go to any post and hover over or right click on the attached image. It should be like this page of mine (the current one has done nothing wrong, if you want to see another example) :
If I hover on the image, it should show me the full URL :
If I click the image, that image should open, or at most it can point to another webpage such as a category. If I overdo on-page SEO and link towards the parent page, then with pretty permalinks ON it will create an endless loop, which is quite normal. Most plugin developers (and WordPress itself too) get this wrong. This is not a physically existing directory or page :
The original is quite pathetic looking :
So, we committed a kind of 301 crime at least once! If you used those ugly URLs, linking to the originating post would not create an infinite loop, because that attachment’s (image) URL would also look different and ugly. For that post, WordPress has assigned a reference number, wp-image-36399. To manage that part, WordPress has the Media Library.
You wanted to redirect the attachment page to the original post; otherwise, with pretty permalinks ON, you’ll get a duplicate content issue. Something (a plugin or theme) has created this URL for the attachment page :
It is likely (why likely, this is definitely the reason) that you are using one or two plugins or methods for attachment page redirection (with or without knowing it) and also a PHP-based template that auto-generates the attachment page link. The double logic confuses PHP, and it prints a bizarre URL.
Show me a single fresh WordPress installation with 3-4 posts and default themes or plugins where this problem is present; it will never happen. From a technical point of view, we should use this format :
But Google has a faulty algorithm (it is possibly not correctable): Google gives priority to the KEYWORDS in URLs, which is why people pay huge sums to acquire EMDs and BMDs. An exact match domain with an exact match post URL and good content will always rank higher. It is because of Google’s fault that we actually use pretty permalinks.
?p=36395 is what WordPress technically uses for all processing on the backend.
Fixing WordPress 404 Errors For Non Existing URLs – For The Above Type of 404
So we have explained the mechanism behind the problem. You have to make it behave like the default WordPress settings: an image should be an image. You can redirect the attachment page to the original post only by setting that logic once, in one place. If you introduced the error by manual work, sadly, you need to edit all the pages. There is no safe SQL query to solve the issue; there is a query, but it can destroy or un-associate the GUID. After you are 100% sure that no such odd URLs are being auto-generated, go to Google Webmasters Tools, select all the errors and mark them as Fixed. In the Redirection plugin, reset the log.
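Before marking everything as Fixed, it can help to scan the remaining 404 list for the /post-name/post-image-name/ pattern. The sketch below is a heuristic under stated assumptions: the permalink structure is /%postname%/, the input is one URL path per line, and the prefixes category, tag, author and page are the legitimate two-segment URLs on the site. The function name attachment_style_404s and the file name 404-paths.txt are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: read URL paths (not full URLs) on stdin, one per
# line, and print those with exactly two path segments and no file
# extension, e.g. /post-name/post-image-name/.
attachment_style_404s() {
  awk -F'/' '
    {
      n = 0; first = ""; last = ""
      for (i = 1; i <= NF; i++) {
        if ($i != "") {
          n++
          if (n == 1) first = $i
          last = $i
        }
      }
      # Skip archive-style prefixes (an assumption; adjust for your site)
      # and anything that ends in a file extension such as .png or .php.
      if (n == 2 && first !~ /^(category|tag|author|page)$/ && last !~ /\.[A-Za-z0-9]+$/) {
        print $0
      }
    }'
}

# Usage (404-paths.txt is a hypothetical export of logged 404 paths):
#   attachment_style_404s < 404-paths.txt
```

The path from the cURL example earlier in this article, with its post slug repeated twice, would be flagged by this check. Any line it prints deserves a look at the originating post before the errors are marked as Fixed.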
Obviously you will get some GENUINE 404 errors; they can only be detected from the Redirection plugin, not from Google Webmasters Tools. You can 301 redirect such pages to the proper post or category. If it happens that we have wrongly linked to an example like https://thecustomizewindows.com/anyname.php, it is better to create a real file with minimum information.
If you ever did any manual on-page optimization for the images by manually inserting URLs to the images, we suggest checking all the posts. It sounds horrible, but you will actually find that you have not committed the crime in more than 25% of all the posts. Edit 10-20 posts every day. It will also help to correct any typographical errors. No plugin or SQL query exists which can fully reset this to default; as you have manually overridden it, you have to manually correct it. Google Webmasters Tools will also start reporting such a small number of posts as 404. As they will hopefully not exceed 100, you can do it manually. Obviously it is better to avoid getting caught by Google bots a second time. One hint: the Redirection plugin catches the error before Google Webmasters Tools reports it! So you have to keep watch almost like an army!