Hiding Your Website From Google

Written by: Peter Jalbert on Wednesday, September 13th, 2006
Posted to: Censorship, Google, Privacy, Search, Webmaster

Google seems to know everything. After all, they can monitor your searches, your preferences, your favorite sites, and, yes, theoretically even your email (if you use Gmail). It may seem difficult to hide from Google’s prying eyes, but in reality there are ways to lessen Google’s ability to go Big Brother on us. For instance, we gave a few tips earlier on how to keep your searching activity private. These include using different Web proxies and avoiding browser plugins that keep track of your preferences.

Today we will give some tips on hiding your website from Google.

Most people are excited to get referral hits from Google, or to rank at the top (or even on the front page) for a keyword of particular interest. Some people, however, would prefer that their sites not be discoverable or searchable, or that their websites or blogs be available only to a select few.

The options

  1. Password protection. The first thing you can do is password protect your website or blog. You can use regular HTTP password protection, which you can activate from your host’s control panel. Or, if you use a hosted blogging solution, you can use the service’s own password protection mechanisms. LiveJournal, Multiply and other social networking/blogging sites have features that lock your posts for viewing by your friends only. This keeps Google from indexing your posts, since the Google bots cannot crawl password-protected pages in the first place.
  2. Robots.txt. If you don’t want to password protect your site, you can simply tell the Google crawlers to stay away from it using the robots.txt file. This option is applicable if you have access to your hosting account. Simply edit the file robots.txt in your root www folder (or create one if it does not exist) and add the following lines of text:

    User-agent: *
    Disallow: /

    This tells all search bots not to crawl your site. You can also define which subfolders in your site to allow or disallow by stating “Allow: /(foldername)” or “Disallow: /(foldername)”. You can even define which particular crawlers to disallow. For instance, if it’s only the Google bot that you don’t want to crawl your site, use “User-agent: Googlebot” in place of “User-agent: *”.

  3. The Robots Meta Tag. If you don’t have access to your site’s hosting account, but can edit the layout (such as the blog theme), you can use the robots meta tag instead. Look for the part of your theme that’s within the <head> and </head> tags (usually where the other “meta” tags are located) and add this line:

    <meta name="robots" content="noindex,nofollow">

    This means the search bots will neither index your page nor follow the links from it to other sites elsewhere (the significance of which we will explain later). The tag only works for the particular page where it is included, not for every other page on your site. However, if your site or blog uses a theme or layout that includes this line in the head section of all your pages, then it will apply to those pages, too.
  4. Don’t link to your site. Lastly, hiding from Google can be as simple as putting up your site (or a readable directory) in an obscure subfolder under your root domain and never, ever linking to it from another site. For instance, instead of allowing directory access to http://yoursite.com, you can keep your files under http://yoursite.com/someweirdname. Remember never to link to this location from another site (or from within the same domain), so that Google will not discover it. If you do have access to your host account, it’s still a good idea to create a robots.txt file as well. It’s also good to turn directory listings off for your own security, unless it’s your intention to share files this way, or to access your files remotely from your own host or server.
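
Before uploading a robots.txt file, you may want to check that its rules do what you expect. Here is a minimal sketch using the urllib.robotparser module from Python's standard library. The helper name can_crawl and the crawler name “SomeOtherBot” are made up for illustration, and the URL is the placeholder from option 4.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Asks every crawler to stay out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Asks only Googlebot to stay out; other crawlers are not addressed.
BLOCK_GOOGLEBOT = """\
User-agent: Googlebot
Disallow: /
"""

# Placeholder URL taken from the article's example.
URL = "http://yoursite.com/someweirdname/"

print(can_crawl(BLOCK_ALL, "Googlebot", URL))           # False
print(can_crawl(BLOCK_GOOGLEBOT, "Googlebot", URL))     # False
print(can_crawl(BLOCK_GOOGLEBOT, "SomeOtherBot", URL))  # True
```

This only shows how a rule-following parser reads the file. It does not prove that any particular search engine interprets every rule the same way.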

Remember that Google respects robots.txt and the robots meta tag, but the same might not be true of other crawlers, especially those that are not legitimate (like programs that scrape content from other sites). So if your intention is to hide just from Google, MSN, Yahoo! and the other major engines, then go ahead and use these. But if you really want privacy, then the best option is to go for full password protection.
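
Obeying these directives is voluntary: the crawler has to look for them and then choose to comply. As a rough illustration of that check, the sketch below reads the robots meta tag out of a page using Python's standard html.parser module. The class and function names and the sample page are hypothetical, and this is not how any real search engine is implemented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex,nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page carrying the tag that hides it from compliant crawlers.
PAGE = """<html><head>
<title>Private blog</title>
<meta name="robots" content="noindex,nofollow">
</head><body>Hello</body></html>"""

directives = robots_directives(PAGE)
print("noindex" in directives, "nofollow" in directives)  # True True
```

A scraper that never runs a check like this will simply ignore the tag, which is why password protection is the safer choice for truly private content.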

