Using robots.txt To Prevent Google From Indexing Your Site
There may be some reasons why you wouldn’t want Google to index your site. For one, there’s privacy. Sometimes you might have folders on your web host that you wouldn’t want the public to gain access to. Sure, you can always password protect them. But your non-password protected folders can also be safe from prying eyes as long as no one stumbles upon them (and what’s easier than through Google, right?).
Also, you might want to limit the number of pages Google indexes on your site for SEO purposes. Quantity may not always be the best way to approach optimization. Some pages might just cause your Google rankings to drop, especially if they are unrelated to your main topic or are in other languages (blogging plugins that automatically generate translations are particularly problematic in this regard).
What to do? You can use the trusty robots.txt file to prevent Google from indexing your site, or specific directories or files on your site. Here’s how to use it.
First, make sure your web server has a robots.txt file. If it doesn’t exist, create it with a plain-text editor (not a rich-text word processor). Save it in the root directory of your site, so that it is served at /robots.txt. Crawlers only look for the file there; a robots.txt placed in a subdirectory is ignored.
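If you are unsure where a crawler will look for the file, here is a minimal Python sketch that derives the robots.txt address from any page address. The function name robots_url and the example.com address are made up for illustration; the only assumption is that crawlers request /robots.txt at the root of the same scheme and host.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that applies to page_url.

    Hypothetical helper: keeps the scheme and host, and replaces the
    path, query, and fragment with the fixed path /robots.txt.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/testfolder/testfile.html"))
# https://www.example.com/robots.txt
```

Note that a subdomain counts as a different host, so it would need its own robots.txt.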
Here’s the syntax. You will need two directives: “User-Agent” and “Disallow”.
User-Agent: Googlebot
Disallow: /
User-Agent defines which crawler the rules apply to. Usually this is Googlebot (the name of Google’s crawler) or the crawler of any other search engine you want to keep out; an asterisk (*) applies the rules to all crawlers. Each search engine publishes the User-Agent names its crawlers respond to.
“Disallow:” meanwhile defines the folders or files that the Googlebot should not crawl. A single slash matches every path on the site, so the rule above blocks the entire site.
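You can try a rule before uploading it. The sketch below uses Python’s standard urllib.robotparser module, which only approximates how a real crawler reads the file. The helper is_allowed, the example.com addresses, and the use of Googlebot as Google’s crawler name are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rule that blocks the whole site for one crawler.
BLOCK_ALL = """\
User-agent: Googlebot
Disallow: /
"""

print(is_allowed(BLOCK_ALL, "Googlebot", "https://www.example.com/"))               # False
print(is_allowed(BLOCK_ALL, "Googlebot", "https://www.example.com/any/page.html"))  # False
# A crawler that is not named in the file is unaffected.
print(is_allowed(BLOCK_ALL, "Bingbot", "https://www.example.com/"))                 # True
```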
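Or you can block an entire subdirectory (and the tree under it) by using
User-Agent: Googlebot
Disallow: /testfolder
In this case, the directory testfolder and everything underneath it will be hidden from the Googlebot. Rules match by prefix, so this also covers paths such as /testfolder.html; write /testfolder/ with a trailing slash to limit the rule to the directory.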
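Disallow rules are matched against the start of the path, which makes the trailing slash matter. Here is a rough Python sketch of the difference, again using the standard urllib.robotparser module as a stand-in for a real crawler. The helper is_allowed, the folder testfolder-old, and the example.com addresses are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


WITHOUT_SLASH = "User-agent: Googlebot\nDisallow: /testfolder\n"
WITH_SLASH = "User-agent: Googlebot\nDisallow: /testfolder/\n"

inside = "https://www.example.com/testfolder/page.html"
lookalike = "https://www.example.com/testfolder-old/page.html"

# Without the slash, any path that merely starts with /testfolder is blocked.
print(is_allowed(WITHOUT_SLASH, "Googlebot", inside))     # False
print(is_allowed(WITHOUT_SLASH, "Googlebot", lookalike))  # False

# With the slash, only paths inside the directory are blocked.
print(is_allowed(WITH_SLASH, "Googlebot", inside))        # False
print(is_allowed(WITH_SLASH, "Googlebot", lookalike))     # True
```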
You can also block specific files in the root directory or in a subdirectory.
User-Agent: Googlebot
Disallow: /testfolder/testfile.html
This tells the Googlebot not to read testfile.html; the rest of testfolder remains readable.
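If you have more than a handful of files to hide, you could generate the file instead of typing it. The following is only a sketch: build_robots_txt is a made-up helper, other.html is a made-up file name, and the check at the end relies on Python’s standard urllib.robotparser module, not on Google’s own parser.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(user_agent: str, disallowed_paths: list[str]) -> str:
    """Hypothetical helper: one User-agent line, one Disallow line per path."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt("Googlebot", ["/testfolder/testfile.html"])
print(robots_txt)

# Check the generated rules: only the listed file is blocked.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("Googlebot", "https://www.example.com/testfolder/testfile.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/testfolder/other.html"))     # True
```

Save the printed text as robots.txt to use it.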
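Of course, this is not 100% reliable: robots.txt is only a request, and some crawlers blatantly disregard it. The file is also publicly readable, so anything truly private should still be password-protected. But at least we try.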