
Article | Insane SEO: Blocking Robots.txt from Being Indexed


Bookmark History

Saved by 1 person (0 private), first by an anonymous user on 2008-10-11


Public Sticky notes

Highlighted by suzannah

on 2008-10-11 by suzannah

Smarty, Ann. 2008. Insane SEO: Blocking Robots.txt from Being Indexed. Blog. Search Engine Journal. May 28. http://www.searchenginejournal.com/insane-seo-blocking-robotstxt-from-being-indexed/6961/.

Google currently indexes 62,100 robots.txt files. Many of them have decent PageRank (PR), while others have no backlinks at all (at least according to Yahoo Site Explorer):

[Screenshot: no links to robots.txt file]

The irony is that:

  • you can’t use robots.txt to block robots.txt (that would be truly insane: a search engine would have to crawl the robots.txt file just to find out that it is not allowed to crawl it);
  • you can’t use meta tags in a robots.txt file;
  • you can’t remove the file using Google Webmaster Tools, because removal requires either blocking the file in robots.txt, adding meta tags (neither of which you can do), or returning a 404 status code, which is also impossible because the file actually exists.

As one forum member put it:

At any rate, this does bring up the crazy question, how can you remove a robots.txt file from Google’s index? If you use robots.txt to block it, that would mean that googlebot should not even request robots.txt - an insane loop. And of course, you don’t use meta tags in a robots.txt file.

Interesting, isn’t it?

Another board member suggested using an X-Robots-Tag in the HTTP header:

# Requires mod_headers; sets the header on responses for robots.txt only
<FilesMatch "^robots\.txt$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
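The Apache rule only works on Apache with mod_headers. As a minimal sketch of the same idea on a non-Apache stack, here is a plain Python WSGI app that attaches the same X-Robots-Tag header when serving robots.txt. This is an assumption for illustration, not something from the original thread; the app and the robots.txt body are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]

# Hypothetical robots.txt body that allows crawling of the whole site.
ROBOTS_TXT = b"User-agent: *\nDisallow:\n"


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Serve robots.txt with a noindex header; return 404 for anything else."""
    path = environ.get("PATH_INFO", "/")
    if path == "/robots.txt":
        headers: Headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            # Same effect as: Header set X-Robots-Tag "noindex, nofollow"
            ("X-Robots-Tag", "noindex, nofollow"),
        ]
        start_response("200 OK", headers)
        return [ROBOTS_TXT]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the app on port 8000. Note that the file is still crawlable; the header only asks search engines that honor X-Robots-Tag not to index it.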

The solution looks pretty good, and it’s also nice that SEOs have at last started to see value in the X-Robots-Tag, which is still rarely used.

Another question is why on Earth you would need to block your robots.txt file from being indexed and ranked (a much easier solution would be removing the file completely). But that is beside the point here. The truth remains the same: webmasters should know, and be able to use, the ways to hide any of their pages from search crawlers or to keep them from appearing in SERPs.

  • Barry Schwartz on May 28, 2008 at 1:05 pm

    There is a good comment from Googler JohnMu on this thread, which we covered on May 15th:

    Check out his comment at http://www.seroundtable.com/archives/017139.html#comment-929618

  • Ann Smarty on May 29, 2008 at 8:31 am

    @Barry: I can’t believe I missed it! I do check your blog daily… Thank you for pointing that out to me!

    OK, so there are two possible solutions:

    1/ disallow robots.txt in robots.txt (which won’t prevent the bot from checking it);
    2/ use an X-Robots-Tag in the HTTP header
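To illustrate the first option, here is a minimal sketch using Python’s standard `urllib.robotparser`, assuming a hypothetical site at `https://example.com`. The parser has to read the whole robots.txt before it can report that robots.txt itself is disallowed, which is exactly why this rule cannot stop a bot from checking the file. Real crawlers such as Googlebot use their own parsers, so treat this only as a model of the rule’s logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows itself for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /robots.txt
"""


def robots_txt_is_disallowed(robots_txt: str, site: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text disallows /robots.txt itself.

    The rules can only be evaluated after the full robots.txt text has been
    read, so the file is always fetched and "checked" first.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would download them first.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, site.rstrip("/") + "/robots.txt")


print(robots_txt_is_disallowed(ROBOTS_TXT, "https://example.com"))  # → True
```

The rule only tells a compliant crawler not to fetch the URL again; by the time it is known, the file has already been read.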