Common issues with crawling – and how to fix them

Published 03/04/2023

The robots.txt file lets you precisely control what Googlebot crawls (or scans) on your website, so it’s no wonder that technical SEO optimization usually starts with improving this file. You can also block access in .htaccess – that doesn’t matter, as long as the robots.txt file itself doesn’t return a 429 code or a 5xx error.
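
If you want a quick way to see what status code your robots.txt returns, a minimal sketch using only Python’s standard library might look like this (the example.com domain is a placeholder – swap in your own site):

```python
# Minimal sketch: check the HTTP status of a robots.txt file.
# "example.com" is a placeholder domain, not a real target.
import urllib.error
import urllib.request

ROBOTS_URL = "https://example.com/robots.txt"  # placeholder

try:
    with urllib.request.urlopen(ROBOTS_URL, timeout=10) as response:
        # Any 2xx status means Google can read your rules.
        print(f"robots.txt returned {response.status}")
except urllib.error.HTTPError as err:
    # A 404 is generally harmless (treated as "no restrictions"),
    # but a 429 or any 5xx is the problematic case the article warns about:
    # Google may hold off crawling until robots.txt is reachable again.
    print(f"robots.txt returned {err.code} - fix this first")
except urllib.error.URLError as err:
    print(f"robots.txt could not be fetched at all: {err.reason}")
```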

How does crawling work?

Every time Google crawls a website, it first visits the robots.txt file – and only then retrieves URLs from its queue and starts scanning them.
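
To make that order concrete, here is a rough sketch of a crawler loop built with Python’s standard-library robots.txt parser – the domain, the URL queue and the “Googlebot” user agent are illustrative placeholders, not Google’s actual implementation:

```python
# Rough sketch of the crawl order: fetch robots.txt first, then work the queue.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")  # placeholder domain
robots.read()  # step 1: download and parse robots.txt

url_queue = [
    "https://example.com/category/shoes",
    "https://example.com/cart",
]

# step 2: URLs from the queue are scanned only if the rules allow it
for url in url_queue:
    if robots.can_fetch("Googlebot", url):
        print("crawl:", url)
    else:
        print("skip (blocked by robots.txt):", url)
```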

Simple? Yes, however –

Google may not take your changes into account.

Why?

  • Because it has already used the previously downloaded file.
  • Because it considers your change “too small” and has decided not to update the file.

What to do about it?

1/ Go to the hidden tab in Google Search Console: the robots.txt Tester.

2/ Check whether the text displayed there is exactly the same as the robots.txt file you uploaded to your server.

2a/ If it is the same – great. Check again just to be sure!
2b/ If it is not – use the Submit button in the lower right corner of the screen. This asks Google to re-check and update your robots.txt file sooner, so it will apply the new directives from that file.

3/ Enjoy the fact that Googlebot now takes the correct robots.txt file into account.

The robots.txt Tester is a useful tool: you can also check the date of the most recently registered version of your robots.txt file and test whether your rules block the crawling (scanning) of important URLs.
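
If you prefer to sanity-check a draft robots.txt before uploading it, a small script like the sketch below can run your rules against a list of important URLs. The rules and URLs here are made-up examples, and Python’s standard-library parser does not support every extension Google understands (wildcards, for instance), so treat it as a rough check rather than a final verdict:

```python
# Sketch: test draft robots.txt rules against important URLs before uploading.
from urllib.robotparser import RobotFileParser

DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /search
"""

IMPORTANT_URLS = [
    "https://example.com/",
    "https://example.com/category/shoes",
    "https://example.com/search?q=boots",
]

parser = RobotFileParser()
parser.parse(DRAFT_ROBOTS_TXT.splitlines())

for url in IMPORTANT_URLS:
    status = "OK     " if parser.can_fetch("Googlebot", url) else "BLOCKED"
    print(status, url)
```

If any important URL comes back as BLOCKED, review the matching Disallow rule before publishing the file – and then verify the live version in the tester anyway.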

