If a URL contains a quote, how do you specify the rel=canonical value?

Say the path of your URL is:

/thisisa"quote/helloworld/

Then how do you create the rel=canonical URL?

Is this kosher?

<link rel="canonical" href="/thisisa&quot;/helloworld/" />

UPDATE

To clarify: I'm receiving a form submission and need to convert part of the query string into the URL. The steps are:

  1. .htaccess does the redirect
  2. PHP processes a directory as a query string.
  3. The query string will be dynamically inserted into the:
    • Title,
    • Description,
    • Keywords
    • Canonical URL.
    • Spit back into the form's input box

So I need to know which processing has to be done at each step of the way. As a first cut, this is my take:

  • Title: htmlspecialchars($rawQuery)
  • Description: htmlspecialchars($rawQuery)
  • Keywords: htmlspecialchars($rawQuery)
  • Canonical URL: This is the tricky part. It must match the URL that .htaccess redirects to, but even so, I think the raw query is unsafe because quotes can allow JavaScript injection. I'm worried about urlencode($rawQuery): since the value comes from the URL, wouldn't it already be URL-encoded?
  • Spit back into form: htmlspecialchars($rawQuery)
Henhouse answered 9/10, 2009 at 23:0 Comment(0)
C
9

You have to split your question into two:

Do I need to encode the double quotation mark character in the URL path?

Yes, the quotation mark character (U+0022) is not allowed unencoded in a URL and must be percent-encoded as %22.
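
As a minimal sketch (the sample string is made up for illustration), both of PHP's URL-encoding functions turn the double quote into %22. They differ in how they treat a space, which matters when the value ends up in a path rather than a query string:

```php
<?php
// Hypothetical path segment containing a double quote and a space.
$segment = 'thisisa"quote here';

// rawurlencode() follows RFC 3986: the quote becomes %22, the space %20.
$forPath = rawurlencode($segment);

// urlencode() targets form-style query strings: the space becomes "+".
$forQuery = urlencode($segment);

echo $forPath, "\n";   // thisisa%22quote%20here
echo $forQuery, "\n";  // thisisa%22quote+here
```

For a path segment, rawurlencode() is usually the safer choice, since a literal "+" in a path is not read as a space.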

Do I need to encode the double quotation mark character in an HTML attribute value?

It depends on how you declare the attribute value:

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.

  • If you’re using the double quotation mark character to delimit the attribute value (attr="value"), then you must encode any double quotation mark character inside the attribute value with a character reference (&quot;, &#34; or &#x22;).
  • If you’re using the single quotation mark character (U+0027) to delimit the attribute value (attr='value'), then you don’t need to encode the double quotation mark character, but it’s recommended to do so.
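
A small sketch of both quoting styles in PHP, assuming the raw value sits in a hypothetical $value variable. With the ENT_QUOTES flag, htmlspecialchars() encodes both quote characters, so the result should be safe whichever delimiter the attribute uses:

```php
<?php
// Hypothetical raw attribute value containing both kinds of quote.
$value = 'say "hi" and \'bye\'';

// ENT_QUOTES encodes " as &quot; and ' as &#039;
$escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// The same escaped string works with either attribute delimiter.
echo '<input value="' . $escaped . '">', "\n";
echo "<input value='" . $escaped . "'>", "\n";
```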

And since you have a slash and a double quotation mark in your attribute value, the third case (using no quotes at all) is not applicable:

In certain cases, authors may specify the value of an attribute without any quotation marks. The attribute value may only contain letters (a-z and A-Z), digits (0-9), hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks even when it is possible to eliminate them.

Now bringing both answers together

Since a double quotation mark must be encoded in a URL (whereas the single quotation mark is allowed unencoded), you can use the following to encode the segments of your URL path:

$path = '/thisisa"quote/helloworld/';
$path = implode('/', array_map('rawurlencode', explode('/', $path)));

And if you want to put that URL path in an HTML attribute, use the htmlspecialchars function to encode any remaining special HTML characters:

echo '<link rel="canonical" href="' . htmlspecialchars($path) . '" />';
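
On the worry that the incoming value might already be URL-encoded: one possible approach, sketched here with a hypothetical helper named canonical_link_tag(), is to decode each segment before re-encoding it, so an existing %22 is not double-encoded into %2522. This assumes the raw input never contains a literal percent sign followed by two hex digits that is meant to be kept as-is.

```php
<?php
// Hypothetical helper: builds the canonical <link> tag from a URL path
// that may or may not already be percent-encoded.
function canonical_link_tag(string $path): string
{
    $segments = explode('/', $path);
    foreach ($segments as $i => $segment) {
        // Decode first so an already-encoded %22 does not become %2522.
        $segments[$i] = rawurlencode(rawurldecode($segment));
    }
    $encodedPath = implode('/', $segments);

    // HTML-escape last, for the attribute context.
    return '<link rel="canonical" href="'
        . htmlspecialchars($encodedPath, ENT_QUOTES, 'UTF-8') . '" />';
}

// Both calls print the same tag, with href="/thisisa%22quote/helloworld/".
echo canonical_link_tag('/thisisa"quote/helloworld/'), "\n";
echo canonical_link_tag('/thisisa%22quote/helloworld/'), "\n";
```

The order matters: percent-encoding is applied for the URL context first, and HTML escaping is applied last, at the point where the value is written into the attribute.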
Carboniferous answered 12/10, 2009 at 14:58 Comment(0)
L
7

Use URL escaping, in this case %22

http://everything2.com/title/URL+escape+sequences

Lavettelavigne answered 9/10, 2009 at 23:2 Comment(0)
P
1

I would say you want to use the percent-encoded hex value for a double quote, which is %22.

Read this to learn more about URL Encoding.

Prepossess answered 9/10, 2009 at 23:4 Comment(0)
E
1

A quote is not even a valid URL character, so I think long-term you should address this. It is specifically excluded from the URI syntax by RFC 2396.

To solve the immediate problem though, you'll need to escape the character, using %22.

Ejective answered 9/10, 2009 at 23:5 Comment(1)
In reality, I'm not choosing the URL. I'm having to partially convert a query string into a URL, and I need to make sure that query string is safe wherever it's used. I'm going to update the question with some follow-ups. Henhouse
G
1

If the URL contains a double quote then contain it with single quotes.

<link rel="canonical" href='foo.com/thisisa"/helloworld/' />

Do not use HTML encoding in URI strings. That is invalid syntax, because the ampersand is a reserved character in URIs and must itself be encoded. Always use percent-encoding for URIs instead.

Gastrotrich answered 12/10, 2009 at 14:31 Comment(0)