MySQL DISTINCT and accents
Asked Answered
P

2

6

A MySQL database running on Debian (version 5.5.41-0+wheezy1-log).

A table hotels with a column name VARCHAR(128) and engine is InnoDB.

CREATE TABLE `hotels` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT
  `name` varchar(128) NOT NULL DEFAULT '' COMMENT 'Hotel Name',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;

There are two records in this table:

1    BEST WESTERN PREMIER LE CARRE FOLIES OPERA
2    BEST WESTERN PREMIER LE CARRÉ FOLIES OPÉRA

When executing select DISTINCT name FROM hotels, the query is returning only 1 record, while 2 records were expected to be returned.

The DBMS doesn't seem to distinct between E and É.

How to change the table settings in order to get the expected result?

Premonish answered 10/4, 2015 at 21:16 Comment(0)
P
8

The table collation was set to utf8_general_ci. This was the default setting of the MySQL server and the schema.

There are 3 collation names available in MySQL 5.5:

  • A name ending in _ci indicates a case-insensitive collation.
  • A name ending in _cs indicates a case-sensitive collation.
  • A name ending in _bin indicates a binary collation. Character comparisons are based on character binary code values.

The collation had to be changed to utf8_bin. This can be done for the table and all columns with this query:

ALTER TABLE hotels CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;
Premonish answered 10/4, 2015 at 21:16 Comment(0)
S
0

Without altering the table collation, you could query DISTINCT with CAST:

SELECT DISTINCT(CAST(`name` AS CHAR CHARACTER SET utf8) COLLATE utf8_bin) `name`
FROM `hotels`
Sunglass answered 19/8, 2024 at 10:54 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.