How to prevent Google from crawling UserDir URLs (but not the real domain)?

We have clients who build their site on a UserDir URL before their real domain goes live. The UserDir URL is always in the format:

http://1.2.3.4/~johndoe

Sometimes Google crawls these UserDir URLs, and the temporary site keeps showing up in search results even after the site is live on http://johndoe.com.

So, once a client is live on http://johndoe.com, how can I prevent Google from crawling the UserDir address?

(Of course, I still need Google to crawl the real domain, because SEO is important to our clients.)

2012-04-04 19:03
by Callmeed
Have you tried using a robots.txt file on 1.2.3.4? - Adam Mihalcin 2012-04-04 19:06
Well, both the temp URL and the real domain point to the same httpdocs... - Callmeed 2012-04-04 19:23
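One detail relevant to the robots.txt suggestion: crawlers fetch robots.txt per host, so http://1.2.3.4/robots.txt and http://johndoe.com/robots.txt are requested as two separate files. The minimal Python sketch below models that with the standard library's `urllib.robotparser`. It assumes, hypothetically, that the bare IP serves its robots.txt from the server's main document root rather than from the client's httpdocs; the rule contents and the `rules_by_host` / `allowed` names are illustrative, not part of the original setup.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt at http://1.2.3.4/robots.txt (assumed to come from
# the server's main document root): blocks every UserDir path (/~...).
IP_ROBOTS = """\
User-agent: *
Disallow: /~
"""

# Hypothetical robots.txt at http://johndoe.com/robots.txt (client's httpdocs):
# an empty Disallow line allows everything.
DOMAIN_ROBOTS = """\
User-agent: *
Disallow:
"""


def parser_for(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


# robots.txt is looked up per host, so each host gets its own rule set.
rules_by_host = {
    "1.2.3.4": parser_for(IP_ROBOTS),
    "johndoe.com": parser_for(DOMAIN_ROBOTS),
}


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the rules for the URL's host let `agent` crawl it."""
    host = urlsplit(url).hostname
    return rules_by_host[host].can_fetch(agent, url)


print(allowed("http://1.2.3.4/~johndoe/"))  # blocked on the IP host
print(allowed("http://johndoe.com/"))       # still crawlable on the real domain
```

Under that assumption the UserDir copy is blocked while the real domain stays crawlable, even though the site files themselves are shared.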

I use the canonical tag for this purpose. Put the canonical tag in the <head> of the index.html file, like so:

<link rel="canonical" href="http://johndoe.com/" />

When Googlebot then finds the page at http://1.2.3.4/~johndoe, it will know that it is a duplicate of http://johndoe.com/, and Google will index the correct one. Googlebot will also see the same tag when it crawls the real site, and the self-referential canonical there causes no problem.
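To illustrate the idea, here is a minimal Python sketch of how a crawler could consolidate the two copies: it reads the canonical link from each fetched page and indexes the page under that URL instead of the URL it was fetched from. This is an idealized model, not Google's actual pipeline (Google treats rel="canonical" as a hint rather than a strict directive); `CanonicalFinder`, `find_canonical`, and the sample page are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also arrives here via handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


# The same index.html, served at both addresses.
PAGE = ('<html><head><link rel="canonical" href="http://johndoe.com/" />'
        '</head><body>Hello</body></html>')
fetched = {
    "http://1.2.3.4/~johndoe/": PAGE,
    "http://johndoe.com/": PAGE,
}

# Index each page under its canonical URL, falling back to the fetched URL.
indexed = {find_canonical(html) or url for url, html in fetched.items()}
print(indexed)
```

Both fetches collapse to a single entry for http://johndoe.com/. Note that the href in the tag is absolute: a relative href such as `/` would resolve against whichever host served the page, so the IP copy would point at itself rather than at johndoe.com.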

2012-04-06 15:30
by Stephen Ostermiller