How to correctly canonicalize a URL in an ASP.NET MVC application?
Asked Answered
M

3

12

I'm trying to find a good general purpose way to canonicalize urls in an ASP.NET MVC 2 application. Here's what I've come up with so far:

// Using an authorization filter because it is executed earlier than other filters
public class CanonicalizeAttribute : AuthorizeAttribute
{
    public bool ForceLowerCase { get;set; }

    public CanonicalizeAttribute()
        : base()
    {
        ForceLowerCase = true;
    }

    public override void OnAuthorization(AuthorizationContext filterContext)
    {
        RouteValueDictionary values = ExtractRouteValues(filterContext);
        string canonicalUrl = new UrlHelper(filterContext.RequestContext).RouteUrl(values);
        if (ForceLowerCase)
            canonicalUrl = canonicalUrl.ToLower();

        if (filterContext.HttpContext.Request.Url.PathAndQuery != canonicalUrl)
            filterContext.Result = new PermanentRedirectResult(canonicalUrl);
    }

    private static RouteValueDictionary ExtractRouteValues(AuthorizationContext filterContext)
    {
        var values = filterContext.RouteData.Values.Union(filterContext.RouteData.DataTokens).ToDictionary(x => x.Key, x => x.Value);
        var queryString = filterContext.HttpContext.Request.QueryString;
        foreach (string key in queryString.Keys)
        {
            if (!values.ContainsKey(key))
                values.Add(key, queryString[key]);
        }
        return new RouteValueDictionary(values);
    }
}

// Redirect result that uses permanent (301) redirect
public class PermanentRedirectResult : RedirectResult
{
    public PermanentRedirectResult(string url) : base(url) { }

    public override void ExecuteResult(ControllerContext context)
    {
        context.HttpContext.Response.RedirectPermanent(this.Url);
    }
}

Now I can mark up my controllers like this:

[Canonicalize]
public class HomeController : Controller { /* ... */ }

This all appears to work fairly well, but I have the following concerns:

  1. I still have to add the CanonicalizeAttribute to every controller (or action method) I want canonicalized, when it's hard to think of a situation where I won't want this behaviour. It seems like there should be a way to get this behaviour site-wide, rather than one controller at a time.

  2. The fact that I'm implementing the 'force to lower-case' rule in the filter seems wrong. Surely it would be better to somehow role this up into the route url logic, but I can't think of a way to do this in my routing configuration. I thought of adding @"[a-z]*" constraints to the controller and action parameters (as well as any other string route parameters), but I think this will cause the routes to not be matched. Also, because the lower-case rule isn't being applied at the route level, it's possible to generate links in my pages that have upper-case letters in them, which seems pretty bad.

Is there something obvious I'm overlooking here?

Maggot answered 26/9, 2010 at 9:19 Comment(2)
There is a way to get it site wide in MVC3 via global filters but in less than MVC3, you'll need to create a base controller, apply the attribute to that and derive all controllers from it. I've got to ask what the use case for this is though?Intelligibility
SEO. Ensuring spiders coming in from incorrectly formatted links are redirected (permanently) to the correct one - the lower-casing is just an aside really. en.wikipedia.org/wiki/Canonicalization#Search_Engines_and_SEOMaggot
V
20

I have felt the same "itch" regarding the relaxed nature of the default ASP.NET MVC routing, ignoring letter casing, trailing slashes, etc. Like you, I wanted a general solution to the problem, preferably as part of the routing logic in my applications.

After searching the web high and low, finding no useful libraries, I decided to roll one myself. The result is Canonicalize, an open-source class library that complements the ASP.NET routing engine.

You can install the library via NuGet: Install-Package Canonicalize

And in your route registration: routes.Canonicalize().Lowercase();

Besides lowercase, several other URL canonicalization strategies are included in the package. Force www domain prefix on or off, force a specific host name, a trailing slash, etc. It is also very easy to add custom URL canonicalization strategies, and I am very open to accept patches adding more strategies to the "official" Canonicalize distribution.

I hope you or anyone else will find this helpful, even if the question is a year old :)

Vitus answered 5/10, 2011 at 15:58 Comment(3)
Will definitely check it out.Maggot
works well for redirecting to a hostname that maches the ssl cert - thanksDnepropetrovsk
Excellent library, just what I was looking for. Thanks.Goldeye
M
10

MVC 5 and 6 has the option of generating lower case URL's for your routes. My route config is shown below:

public static class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Imprive SEO by stopping duplicate URL's due to case or trailing slashes.
        routes.AppendTrailingSlash = true;
        routes.LowercaseUrls = true;

        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional });
    }
}

With this code, you should no longer need the canonicalize the URL's as this is done for you. One problem that can occur if you are using HTTP and HTTPS URL's and want a canonical URL for this. In this case, it's pretty easy to use the above approaches and replace HTTP with HTTPS or vice versa.

Another problem is external websites that link to your site may omit the trailing slash or add upper-case characters and for this you should perform a 301 permanent redirect to the correct URL with the trailing slash. For full usage and source code, refer to my blog post and the RedirectToCanonicalUrlAttribute filter:

/// <summary>
/// To improve Search Engine Optimization SEO, there should only be a single URL for each resource. Case 
/// differences and/or URL's with/without trailing slashes are treated as different URL's by search engines. This 
/// filter redirects all non-canonical URL's based on the settings specified to their canonical equivalent. 
/// Note: Non-canonical URL's are not generated by this site template, it is usually external sites which are 
/// linking to your site but have changed the URL case or added/removed trailing slashes.
/// (See Google's comments at http://googlewebmastercentral.blogspot.co.uk/2010/04/to-slash-or-not-to-slash.html
/// and Bing's at http://blogs.bing.com/webmaster/2012/01/26/moving-content-think-301-not-relcanonical).
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class, Inherited = true, AllowMultiple = false)]
public class RedirectToCanonicalUrlAttribute : FilterAttribute, IAuthorizationFilter
{
    private readonly bool appendTrailingSlash;
    private readonly bool lowercaseUrls;

    #region Constructors

    /// <summary>
    /// Initializes a new instance of the <see cref="RedirectToCanonicalUrlAttribute" /> class.
    /// </summary>
    /// <param name="appendTrailingSlash">If set to <c>true</c> append trailing slashes, otherwise strip trailing 
    /// slashes.</param>
    /// <param name="lowercaseUrls">If set to <c>true</c> lower-case all URL's.</param>
    public RedirectToCanonicalUrlAttribute(
        bool appendTrailingSlash, 
        bool lowercaseUrls)
    {
        this.appendTrailingSlash = appendTrailingSlash;
        this.lowercaseUrls = lowercaseUrls;
    } 

    #endregion

    #region Public Methods

    /// <summary>
    /// Determines whether the HTTP request contains a non-canonical URL using <see cref="TryGetCanonicalUrl"/>, 
    /// if it doesn't calls the <see cref="HandleNonCanonicalRequest"/> method.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute"/> attribute.</param>
    /// <exception cref="ArgumentNullException">The <paramref name="filterContext"/> parameter is <c>null</c>.</exception>
    public virtual void OnAuthorization(AuthorizationContext filterContext)
    {
        if (filterContext == null)
        {
            throw new ArgumentNullException("filterContext");
        }

        if (string.Equals(filterContext.HttpContext.Request.HttpMethod, "GET", StringComparison.Ordinal))
        {
            string canonicalUrl;
            if (!this.TryGetCanonicalUrl(filterContext, out canonicalUrl))
            {
                this.HandleNonCanonicalRequest(filterContext, canonicalUrl);
            }
        }
    }

    #endregion

    #region Protected Methods

    /// <summary>
    /// Determines whether the specified URl is canonical and if it is not, outputs the canonical URL.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute" /> attribute.</param>
    /// <param name="canonicalUrl">The canonical URL.</param>
    /// <returns><c>true</c> if the URL is canonical, otherwise <c>false</c>.</returns>
    protected virtual bool TryGetCanonicalUrl(AuthorizationContext filterContext, out string canonicalUrl)
    {
        bool isCanonical = true;

        canonicalUrl = filterContext.HttpContext.Request.Url.ToString();
        int queryIndex = canonicalUrl.IndexOf(QueryCharacter);

        if (queryIndex == -1)
        {
            bool hasTrailingSlash = canonicalUrl[canonicalUrl.Length - 1] == SlashCharacter;

            if (this.appendTrailingSlash)
            {
                // Append a trailing slash to the end of the URL.
                if (!hasTrailingSlash)
                {
                    canonicalUrl += SlashCharacter;
                    isCanonical = false;
                }
            }
            else
            {
                // Trim a trailing slash from the end of the URL.
                if (hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.TrimEnd(SlashCharacter);
                    isCanonical = false;
                }
            }
        }
        else
        {
            bool hasTrailingSlash = canonicalUrl[queryIndex - 1] == SlashCharacter;

            if (this.appendTrailingSlash)
            {
                // Append a trailing slash to the end of the URL but before the query string.
                if (!hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.Insert(queryIndex, SlashCharacter.ToString());
                    isCanonical = false;
                }
            }
            else
            {
                // Trim a trailing slash to the end of the URL but before the query string.
                if (hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.Remove(queryIndex - 1, 1);
                    isCanonical = false;
                }
            }
        }

        if (this.lowercaseUrls)
        {
            foreach (char character in canonicalUrl)
            {
                if (char.IsUpper(character))
                {
                    canonicalUrl = canonicalUrl.ToLower();
                    isCanonical = false;
                    break;
                }
            }
        }

        return isCanonical;
    }

    /// <summary>
    /// Handles HTTP requests for URL's that are not canonical. Performs a 301 Permanent Redirect to the canonical URL.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute" /> attribute.</param>
    /// <param name="canonicalUrl">The canonical URL.</param>
    protected virtual void HandleNonCanonicalRequest(AuthorizationContext filterContext, string canonicalUrl)
    {
        filterContext.Result = new RedirectResult(canonicalUrl, true);
    }

    #endregion
}

Usage example to ensure all requests are 301 redirected to the correct canonical URL:

filters.Add(new RedirectToCanonicalUrlAttribute(
    RouteTable.Routes.AppendTrailingSlash, 
    RouteTable.Routes.LowercaseUrls));
Moussaka answered 30/9, 2014 at 14:21 Comment(5)
Why use an authorization filter when you should be using an action filter? Bacon isn't gammon for a reason.Moton
IAuthorizationFilter is the correct choice. See this answer. The filter gets executed earlier in the request pipeline. Ideally if we are going to redirect a user, we should do it as early as possible.Moussaka
That doesn't say anything about when you should use authorization filters instead of action filters. I think on this occasion we should agree to disagree.Moton
Fair enough. The RequireHttpsAttribute is also using the IAuthorizationFilter to perform a redirect. It's performance vs naming, it's up to the devs what they prefer.Moussaka
That is because it relates to security. Your canonical URL redirection does not.Moton
Y
6

Below is how I do my canonical URLs in MVC2. I use IIS7 rewrite module v2 to make all my URLs lowercase and also strip trailing slashes so don't need to do it from my code. (Full blog post)

Add this to the master page in the head section as follows:

<%=ViewData["CanonicalURL"] %>
<!--Your other head info here-->

Create a Filter Attribute (CanonicalURL.cs):

public class CanonicalURL : ActionFilterAttribute
{
    public string Url { get; private set; }

    public CanonicalURL(string url)
    {
       Url = url;
    }

    public override void OnResultExecuting(ResultExecutingContext filterContext)
    {
        string fullyQualifiedUrl = "http://www.example.com" + this.Url;
        filterContext.Controller.ViewData["CanonicalUrl"] = @"<link rel='canonical' href='" + fullyQualifiedUrl + "' />";
        base.OnResultExecuting(filterContext);
    }
}

Call this from your actions:

[CanonicalURL("Contact-Us")]
public ActionResult Index()
 {
      ContactFormViewModel contact = new ContactFormViewModel(); 
      return View(contact);
}

For some other interesting articles on Search Engine Related posts check out Matt Cutts blog

Yellowlegs answered 15/11, 2010 at 21:0 Comment(4)
I guess there is an error here. Look at SO, if you edit the title in the URL it still redirects to here. With your code I guess it would render an wrong URL. SO renders the same regardless of the pathKilloran
The URL is injected from the action. It's job is not to redirect just to indicate this is the primary url for this resource.Yellowlegs
i like this idea, much tidyer than trying to determine it automatically from the route or some hack like that. +1 AAA would comment againAgathy
Thanks for your comment, hope it helped in some way! :)Yellowlegs

© 2022 - 2024 — McMap. All rights reserved.