7

We have issues with utf8-string comparisons in MySQL 5, regarding case and accents :

from what I gathered, what MySQL implements collations by considering that "groups of characters should be considered equal".

For example, in the utf8_unicode_ci collation, all the letters "EÉÈÊeéèê" are in the same box (together with other variants of "e").

So if you have a table containing ["video", "vidéo", "vidÉo", "vidÊo", "vidêo", "vidÈo", "vidèo", "vidEo"] (in a varchar column declared with ut8_general_ci collation) :

  • when asking MySQL to sort the rows according to this column, the sorting is random (MySQL does not enforce a sorting rule between "é" and "É" for example),
  • when asking MySQL to add a Unique Key on this column, it raises an error because it considers all the values are equal.

What setting can we fiddle with to fix these two points ?

PS : on a related note, I do not see any case sensitive collation for the utf8 charset. Did I miss something ?


[edit] I think my initial question still holds some interest, and I will leave it as is (and maybe one day get a positive answer).

It turned out, however, that our problems with string comparisons regarding accents was not linked to the collation of our text columns. It was linked to a configuration problem with the character_set_client parameter when talking with MySQL - which defaulted to latin1.

Here is the article that explained it all to us, and allowed us to fix the problem :

Getting out of MySQL character set hell

It is lengthy, but trust me, you need this length to explain both the problem and the fix.

3

1 Answer 1

1

Use collation that considers these characters to be distinct. Maybe utf8_bin (it's case sensitive, since it does binary comparison of characters)

http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets.html

7
  • My point is : isn't there a collation which would allow both ? consider these characters distinct AND sort them in a consistent order ?
    – LeGEC
    Commented Aug 11, 2011 at 9:32
  • I'm not sure, but if there is not, you can create your own collation (since MySQL 5.5) dev.mysql.com/doc/refman/5.5/en/adding-collation.html
    – Mchl
    Commented Aug 11, 2011 at 9:34
  • @LeGEC, Yes the sort order is consistent for _bin because the values are distinctly ordered. What do you mean "allow both" when they already do allow both?
    – Pacerier
    Commented Oct 19, 2014 at 0:46
  • @Pacerier: you are right : sorting in utf8_bin will always give the same order ; I was thinking "consistent for humans", though : have à greater than a and smaller than b, é greater than e and smaller than d, etc ... so that sorting by name (for example) gives what the end user would expect.
    – LeGEC
    Commented Oct 23, 2014 at 10:07
  • @LeGEC, So which collation did you end up using?
    – Pacerier
    Commented Oct 23, 2014 at 14:17

Not the answer you're looking for? Browse other questions tagged or ask your own question.