What does canonical representation mean, and how can it make websites vulnerable?
C

4

9

I searched Google for the meaning of canonical representation, but the documents I found were far too cryptic. Can anyone give a quick explanation of canonical representation, and describe some typical ways websites are vulnerable to canonical representation attacks?

Clydesdale answered 22/7, 2009 at 18:49 Comment(0)
N
11

Canonicalisation is the process by which you take an input, such as a file name, or a string, and turn it into a standard representation.

For example, if your web application only allows access to files under C:\websites\mydomain, then any input referring to a file name is typically canonicalised to a full, absolute path rather than left as a relative one. If you wanted to open C:\websites\mydomain\example\example.txt, one input to that function might be example\example.txt. From the relative form alone it's hard to tell whether the path stays inside the boundaries of your web site, so the canonicalisation function resolves it against the application directory and produces the absolute path, C:\websites\mydomain\example\example.txt. That is much easier to check, since you simply do a string comparison on the start of the file path.
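As a rough illustration of that check, here is a minimal Python sketch. The helper name `resolve_inside_base` is made up for this example, and it assumes Windows path rules (hence `ntpath`, which behaves the same on any platform). It only normalises the path lexically, so it does not resolve symbolic links or 8.3 short names, which a real check would also have to deal with.

```python
import ntpath  # Windows path rules, importable on any platform

BASE_DIR = r"C:\websites\mydomain"  # application root from the example above


def resolve_inside_base(user_path: str, base_dir: str = BASE_DIR) -> str:
    """Return the canonical absolute path, or raise ValueError if it escapes base_dir.

    Hypothetical helper: lexical normalisation only, no symlink resolution.
    """
    # Join, then normalise: this collapses "..", "." and mixed separators.
    candidate = ntpath.normpath(ntpath.join(base_dir, user_path))

    # Compare case-insensitively and include the trailing separator, so that
    # C:\websites\mydomain-evil does not pass as a prefix match.
    base = ntpath.normcase(ntpath.normpath(base_dir))
    cand = ntpath.normcase(candidate)
    if cand != base and not cand.startswith(base + "\\"):
        raise ValueError(f"path escapes application directory: {user_path!r}")
    return candidate
```

With this sketch, `example\example.txt` resolves to the full path under the application directory, while something like `..\..\Windows\win.ini` is rejected because its canonical form no longer starts with the base directory.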

For URL-encoded inputs you take sequences like %20 and canonicalise them by decoding, so %20 would turn into a space. This is a good idea because there are many different ways of encoding the same character; canonicalising means you check only the decoded string, rather than trying to cover every encoding variation.
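A small Python sketch of the decode-then-check idea, using the standard library's `urllib.parse.unquote`. The function name `canonicalise_once` and the policy of rejecting input that still decodes further (a common sign of double encoding) are assumptions made for this example, not the only reasonable choice.

```python
from urllib.parse import unquote


def canonicalise_once(value: str) -> str:
    """Percent-decode value once; reject it if it would decode again.

    Hypothetical policy: double-encoded input is treated as suspicious.
    """
    decoded = unquote(value)
    # If a second pass still changes the string, the input was encoded
    # more than once (e.g. %252e -> %2e -> ".").
    if unquote(decoded) != decoded:
        raise ValueError("input appears to be encoded more than once")
    return decoded
```

Any validation, such as looking for `../`, would then run on the returned string only, never on the raw input.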

Basically you are taking inputs which are logically equivalent and converting them to a standard form which you can then act upon.

Natatorium answered 22/7, 2009 at 19:4 Comment(3)
So potentially, in an input field, I could try an SQL injection attack or possibly XSS to bypass normal string sanitization? - Clydesdale
Sanitisation is different. Generally a SQL injection attack isn't going to use encoding, so it's not a canonicalisation issue. XSS may be; it depends on what you do. If you're encoding all input before outputting it then no, it's not. However, if you're attempting to whitelist, or worse, blacklist certain parts of a string, then you would canonicalise the string first because, for example, <script> can also be represented by &lt;script&gt; or &amp;lt;script&amp;gt; and so on. - Natatorium
Ahhh, I see. Thanks a lot, that clears up everything I was looking for. - Clydesdale
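To make the entity example from the comments concrete, here is a hedged Python sketch using the standard library's `html.unescape`. The helper name `canonicalise_entities` and the `max_rounds` limit are assumptions for illustration. It only matters if you filter input against an allow or deny list; if you simply encode everything on output, you don't need it.

```python
import html


def canonicalise_entities(text: str, max_rounds: int = 5) -> str:
    """Decode HTML entities repeatedly until the text stops changing.

    Hypothetical helper: max_rounds bounds the work on hostile input.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            return text  # nothing left to decode: canonical form reached
        text = decoded
    raise ValueError("too many layers of entity encoding")
```

Both `&lt;script&gt;` and `&amp;lt;script&amp;gt;` reduce to `<script>` here, so a filter that runs after this step only has to recognise one spelling.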
A
4

The following explanation is from the "Application Security and Development STIG" found here:

3.11 Canonical Representation

Canonical representation issues arise when the name of a resource is used to control resource access. There are multiple methods of representing resource names on a computer system. An application relying solely on a resource name to control access may incorrectly make an access control decision if the name is specified in an unrecognized format.

For example, in Windows, notepad.exe may be represented by the following file and path name combinations:

C:\Windows\System32\notepad.exe

%SystemRoot%\System32\notepad.exe

\\?\C:\Windows\System32\notepad.exe

\\host\c$\Windows\system32\notepad.exe

An application attempting to restrict access to the file based solely on the file path and name may improperly grant or deny access. The same issue may apply to other named resources on a system, such as hard and soft links, URLs, pipes, shares, directories, and device names, or within data files if alternate encoding mechanisms are used with the data.
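The point that several different names can refer to one resource can be shown with a short Python sketch. It uses a temporary directory and a made-up file name rather than the real notepad.exe, and the function name `names_differ_but_same_file` is hypothetical. `os.path.samefile` compares file identity as reported by the operating system instead of comparing strings.

```python
import os
import tempfile


def names_differ_but_same_file() -> tuple:
    """Build two different path strings for one file and compare them."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        target = os.path.join(root, "notepad.txt")
        with open(target, "w") as handle:
            handle.write("demo")

        # A second spelling of the same file, via a detour through "sub".
        alias = os.path.join(root, "sub", "..", "notepad.txt")

        strings_differ = target != alias
        same_identity = os.path.samefile(target, alias)
        return strings_differ, same_identity
```

A naive string comparison treats the two names as different resources, while the identity check reports that they are the same file.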

The following items may indicate potential canonical representation issues in an application:

• Access control decisions based upon a resource name.

• Failure to reduce a resource name to its canonical form before use.

In order to minimize canonical representation issues in the application, implement the following procedures:

• Do not rely solely on resource names to control access.

• If using resource names to control access, validate the names to ensure they are in the proper format; reject all names not fitting the known-good criteria.

• Use operating system-based access control mechanisms such as permissions and ACLs.
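The "reject all names not fitting the known-good criteria" recommendation could be sketched in Python roughly as follows. The pattern and the name `is_allowed_name` are assumptions chosen for the example (lowercase `.txt` files with simple names); a real application would define its own known-good format, and would still rely on operating system permissions as the list above says.

```python
import re

# Hypothetical known-good format: 1-64 letters, digits, "_" or "-",
# followed by a lowercase ".txt" extension. Everything else is rejected.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}\.txt")


def is_allowed_name(name: str) -> bool:
    """Return True only if the whole name matches the known-good pattern."""
    return ALLOWED_NAME.fullmatch(name) is not None
```

Because the match must cover the whole string, path separators, drive letters, alternate data stream suffixes and short-name forms such as `EXAMPL~1.TXT` are all refused without having to enumerate them.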

Adrienadriena answered 4/2, 2011 at 18:30 Comment(0)
H
0

Canonicalisation means reducing the data received to its simplest form; it's used for input validation.

Hatfield answered 10/4, 2014 at 18:35 Comment(0)
R
-4

Canonical (I think) means that console input is "typical behavior". Non-canonical means that input is non-standard and requires special knowledge, such as the input behavior of "vi" on linux.

Readymade answered 22/7, 2009 at 18:54 Comment(3)
But how does that apply to websites and being vulnerable? - Clydesdale
See blowdart's answer for a definition of canonical. - Overbear
My answer doesn't conflict at all with blowdart's answer, except that I could explain it in one sentence. For programmers, efficiency is key. - Readymade
