How to Exclude *.publishwithagility.com or *.azurewebsites.net Domains from being Crawled

All Agility-hosted websites have an internal reference alias for the domain. This is often used for CNAME records, or for testing before adding your live domain name. Once you go live, search engines may still crawl these alias domain names. This can be troublesome for SEO, because the same content may then be indexed under more than one domain.

There are a couple of ways to resolve this.

  1. Add a custom robots.txt handler that dynamically controls the output, either allowing search engines to crawl or disallowing access, depending on the domain the site is currently loaded on.

  2. Register the AgilityRobotsHandler in your web.config and Agility.Web will handle this for you.

The easiest way to resolve this is to use the AgilityRobotsHandler, so we'll cover that in this example. Here's how you can enable it:

  1. In your web.config, add the following line:

    ...
    <system.webServer>
      <handlers>
        <add name="AgilityRobotsHandler" path="robots.txt" verb="GET" type="Agility.Web.HttpHandlers.RobotsFileHandler" preCondition="integratedMode,runtimeVersionv4.0" />
      </handlers>
      ...
    </system.webServer>
    ...

  2. Now, when the website is loaded over a *.publishwithagility.com or *.azurewebsites.net domain, the resulting robots.txt will be:

    User-agent: *
    Disallow: /
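
If you prefer option 1, a custom handler might look roughly like the following. This is a minimal sketch, not Agility's implementation: the class name CustomRobotsHandler and the BuildRobotsTxt helper are hypothetical, and it assumes a classic ASP.NET (System.Web) site running in IIS integrated mode.

```csharp
using System;
using System.Web;

// Hypothetical custom handler; the name is illustrative and not part of Agility.Web.
public class CustomRobotsHandler : IHttpHandler
{
    // Alias domain suffixes mentioned in this article.
    private static readonly string[] BlockedSuffixes =
    {
        ".publishwithagility.com",
        ".azurewebsites.net"
    };

    // The handler keeps no per-request state, so one instance can be reused.
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // Decide the output from the host name of the current request.
        string host = context.Request.Url.Host;
        context.Response.ContentType = "text/plain";
        context.Response.Write(BuildRobotsTxt(host));
    }

    // Returns a "disallow everything" file for alias domains,
    // and an "allow everything" file for any other domain.
    public static string BuildRobotsTxt(string host)
    {
        foreach (string suffix in BlockedSuffixes)
        {
            if (host.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
            {
                return "User-agent: *\nDisallow: /\n";
            }
        }

        // An empty Disallow value permits crawling of the whole site.
        return "User-agent: *\nDisallow:\n";
    }
}
```

A handler like this would be registered in web.config in the same way as the AgilityRobotsHandler entry, with the type attribute pointing at your own class instead of Agility.Web.HttpHandlers.RobotsFileHandler. Register only one handler for robots.txt at a time.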