To expand on @Martin's answer:
How you set a "character set" in SQL Server depends on the datatype that you are using. If you are using:
NVARCHAR
, NCHAR
, and NTEXT
(NTEXT
is deprecated and shouldn't be used as of SQL Server 2005) all use the Unicode character set and this cannot be changed. These datatypes are all encoded as UTF-16 LE (Little Endian) – a 16-bit encoding with each "character" being either 2 or 4 bytes – and this too cannot be changed. For these datatypes, the Collation being used only affects the locale (as determined by the LCID of the Collation) which determines the set of rules used for sorting and comparison.
XML
, like the N
-prefixed types, uses the Unicode character set and is encoded as UTF-16 LE (Little Endian), and neither of those can be changed. But unlike the other string datatypes, there is no Collation associated with XML
data as it cannot be sorted or compared (at least not without first converting it to NVARCHAR(MAX)
[preferred] or VARCHAR(MAX)
).
VARCHAR
, CHAR
, and TEXT
(TEXT
is deprecated and shouldn't be used as of SQL Server 2005) are all 8-bit encodings with each "character" being either 1 or 2 bytes. The character set is determined by the Code Page associated with each Collation. The sorting and comparison rules depend on the type of Collation being used:
- SQL Server Collations: These all have names starting with
SQL_
and have been deprecated since SQL Server 2000, though are (unfortunately) still in wide use today. These use simple rules indicated as the "SQL Server Sort Order" number as found in the description
field returned by sys.fn_helpcollations()
.
- Windows Collations: These all have names that do not start with
SQL_
. These Collations allow the non-Unicode string data to use the Unicode sorting and comparison rules indicated by the LCID of the Collation.
That being said, to find out which character set (for CHAR
, VARCHAR
, and TEXT
– i.e. non-Unicode – data) is being used, run the following query and pay close attention to the CodePage
field. The LCID
field indicates the locale used for sorting and comparison rules for the N
-prefixed – i.e. Unicode – types as well as the non-Unicode types if using a Windows Collation:
SELECT *,
COLLATIONPROPERTY(col.[name], 'CodePage') AS [CodePage],
COLLATIONPROPERTY(col.[name], 'LCID') AS [LCID]
FROM sys.fn_helpcollations() col
ORDER BY col.[name];
The Code Page IDs can be translated into something more meaningful via the MSDN page for Code Page Identifiers.
Regarding the O.P.'s comment on @Martin's answer:
It is unfortunate that they chose the misleading/incomplete term "collation" which clearly refers to sort order: collate definition.
While it is true that Microsoft could have done better when choosing a name, there is unfortunately a general, industry-wide confusion over terms such as "encoding", "character set", "collation", etc. Microsoft's use (or misuse) of "Collation" has merely contributed to the mass confusion. But, that confusion is also evident in MySQL as shown in this question, given that "utf8" is specifically not a character set ;-).
UTF-8 is one of several encodings for the Unicode character set. UTF-16 and UTF-32 are the other two encodings. All three of those encodings represent the exact same Unicode character set, just in different ways. Looking at the list of MySQL character sets – 11.1.10 Supported Character Sets and Collations – the "ucs2", "utf8", "utf8mb4", "utf16", "utf16le", "utf32" charsets are not actually character sets, per se, but various representations of the Unicode character set. But, given the overlap between the concepts of "character set" and "encoding", it would be difficult to not have this confusion. The 11.1.10.1 Unicode Character Sets page indicates that the "utf8mb4", "utf16", "utf16le", and "utf32" charsets are the full Unicode character sets while "ucs2" and "utf8" are subsets of the Unicode character set, specifically the first 65,536 code points (a.k.a. Basic Multilingual Plane (BMP)).
For more info regarding Collation across various RDBMS's, please see my answer to the following question on DBA.StackExchange:
Does any DBMS have a collation that is both case-sensitive and accent-insensitive?
UPDATE 2018-10-02
While this is not a viable option yet, SQL Server 2019 introduces native support for UTF-8 in VARCHAR
/ CHAR
datatypes. There are currently too many bugs with it for it to be used, but if they are fixed, then this is an option for some scenarios. Please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?", for a detailed analysis of this new feature.
varchar
column you also set the code page. – Dx