Sunday, May 27, 2012

Turkish I Problem

Turkish has finally become famous around the world. Every self-respecting software developer needs to know about it these days. The problem is that it is all for the wrong reasons. Turkish is wreaking havoc in internationalized software because of a problem in its alphabet design, infamously named the "Turkish I Problem".

Briefly, the problem is that Turkish splits the single letter pair i/I of the Latin character set into two different letters: dotted i/İ and dotless ı/I.

Letter              Lowercase   Uppercase
English I           i           I
Turkish Dotted I    i           İ
Turkish Dotless I   ı           I

These conflicting rules complicate conversions between uppercase and lowercase text. Computer applications use case conversions frequently, and they may break when run under the Turkish locale.
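A minimal Python sketch of how such a break can happen. Python's built-in `str.upper()` and `str.lower()` ignore the locale, so `turkish_upper` and `turkish_lower` below are hypothetical helpers written for this illustration only, assuming the two mappings from the table above. They stand in for what a locale-aware conversion would do under Turkish rules.

```python
def turkish_upper(text: str) -> str:
    """Uppercase using Turkish rules (hypothetical helper, not a library call)."""
    # Handle the two Turkish-specific pairs first, then let str.upper() do the rest.
    return text.replace("i", "İ").replace("ı", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase using Turkish rules (hypothetical helper, not a library call)."""
    # Replace before calling lower(), so the default Unicode mappings never see I or İ.
    return text.replace("İ", "i").replace("I", "ı").lower()


# Default (English-style) conversion round-trips as most code expects.
print("title".upper())           # TITLE
print("TITLE".lower())           # title

# Under Turkish rules the same strings come out differently.
print(turkish_upper("title"))    # TİTLE
print(turkish_lower("TITLE"))    # tıtle

# A typical case-insensitive comparison written with English in mind now fails.
print(turkish_lower("TITLE") == "title")   # False
```

The last line is the kind of check that tends to break: code that lowercases a keyword, file extension, or header name and compares it against an ASCII literal.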

As an example, consider the popular children's book "Alice Harikalar Diyarında" / "Alice in Wonderland". When converted to uppercase it becomes "ALİCE HARİKALAR DİYARINDA" using Turkish capitalization rules or "ALICE HARIKALAR DIYARINDA" using English capitalization rules. The problem is that both of them are wrong. Under Turkish rules the İ in "ALİCE" is incorrect; under English rules the I in "HARIKALAR" and the first I in "DIYARINDA" are incorrect. The correct capitalization should be "ALICE HARİKALAR DİYARINDA", unless we are in the business of translating proper nouns to Turkish too.

As a result, any text that mixes Turkish and another language with a Latin-based alphabet cannot be case converted easily. Non-Turkish names and text in Turkish databases are almost always case converted incorrectly. The reverse is also true: Turkish names and text in non-Turkish databases suffer the same fate.
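The book title can be replayed in a few lines of Python. This is only a sketch: `turkish_upper` is a hypothetical helper that applies the Turkish mappings by hand, and the per-word language tags are an assumption made for illustration. Real text rarely comes with such tags, which is exactly why the conversion is hard.

```python
def turkish_upper(text: str) -> str:
    """Uppercase using Turkish rules (hypothetical helper, not a library call)."""
    return text.replace("i", "İ").replace("ı", "I").upper()


title = "Alice Harikalar Diyarında"

# One rule set for the whole string: each choice gets some words wrong.
print(turkish_upper(title))   # ALİCE HARİKALAR DİYARINDA
print(title.upper())          # ALICE HARIKALAR DIYARINDA

# Assumed input: every word tagged with its language ("en" or "tr").
tagged_title = [("Alice", "en"), ("Harikalar", "tr"), ("Diyarında", "tr")]


def upper_tagged(words):
    """Uppercase each word with the rules of its own language tag."""
    return " ".join(
        turkish_upper(word) if lang == "tr" else word.upper()
        for word, lang in words
    )


print(upper_tagged(tagged_title))   # ALICE HARİKALAR DİYARINDA
```

Only the tagged version produces the correct result, and it does so by relying on information that a plain string does not carry.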

See also
http://www.i18nguy.com/unicode/turkish-i18n.html
http://www.codinghorror.com/blog/2008/03/whats-wrong-with-turkey.html

Read Part 2, Part 3
