====== MySQL and UTF-8 ====== ===== Notes ===== Good support from 4.1 **utf-8** is **utf8** in MySQL. A collation defines the sort order for the data, it may be case sensitive or not To find out your current setup: SHOW VARIABLES LIKE 'character_set_database'; SHOW VARIABLES LIKE 'character_set_client'; To see available character sets and collations on your database: SHOW CHARACTER SET; SHOW COLLATION LIKE 'utf8%'; Character set and collation can be set per server, database, table, connection; Server (**/etc/my.cnf**): [mysqld] ... default-character-set=utf8 default-collation=utf8_general_ci Database: (CREATE | ALTER) DATABASE ... DEFAULT CHARACTER SET utf8 Table: (CREATE | ALTER) TABLE ... DEFAULT CHARACTER SET utf8 Connection: SET NAMES 'utf8'; A PHP mysql connection (//not totally confirmed//, but see tests below) defaults to a latin1 connection, so, your first query after connection should be: mysql_query("SET NAMES 'utf8'"); In php versions 5.2 and later, use mysql_set_charset('utf8',$conn); The CONVERT() function can convert between charsets, eg: INSERT INTO utf8table (utf8column) SELECT CONVERT(latin1field USING utf8) FROM latin1table; As mentioned in [[php:118n:charsets]], field widths may need to be increased to deal with multi-byte characters. Code to generate a mass change of collations: '; while ($row_tables = mysql_fetch_row($rs_tables)) { $table = mysql_real_escape_string($row_tables[0]); // Alter table collation // ALTER TABLE `account` DEFAULT CHARACTER SET utf8 if ($show_alter_table) { echo("ALTER TABLE `$table` DEFAULT CHARACTER SET $character_set;\r\n"); } $rs = mysql_query(" SHOW FULL FIELDS FROM `$table` ") or die(mysql_error()); while ($row=mysql_fetch_assoc($rs)) { if ($row['Collation']!=$convert_from) continue; // Is the field allowed to be null? if ($row['Null']=='YES') { $nullable = ' NULL '; } else { $nullable = ' NOT NULL'; } // Does the field default to null, a string, or nothing? if ($row['Default']==NULL) { $default = " DEFAULT NULL"; } else if ($row['Default']!='') { $default = " DEFAULT '".mysql_real_escape_string($row['Default'])."'"; } else { $default = ''; } // Alter field collation: // ALTER TABLE `account` CHANGE `email` `email` VARCHAR( 50 ) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL if ($show_alter_field) { $field = mysql_real_escape_string($row['Field']); echo "ALTER TABLE `$table` CHANGE `$field` `$field` $row[Type] CHARACTER SET $character_set COLLATE $convert_to $nullable $default; \r\n"; } } } ?> ===== MySQL tables, columns and connections with UTF-8 ===== What happens when you ''INSERT'' a UTF-8 string into a MySQL database using PHP's [[http://php.net/mysql|MySQL extension]]? Well that depends on what string you use. If you use something like "Iñtërnâtiônàlizætiøn" then doing just about anything in MySQL will be data-safe (but, of course, the collation will be incorrect). If you have characters that don't encode the same in UTF-8 and latin1 (e.g. text in Chinese, Russian, ...) then the behaviour depends on **both** the table definition and the encoding of the connection to the database. | ^ Table or column charset ^^ | ^ latin1 (MySQL default) ^ utf8 ^ ^ default connection ''mysql_pconnect()'' |data is binary-safe but not encoded properly* |data is binary-safe but not encoded properly* | ^ using ''SET NAMES 'utf8';'' |data destroyed; string is converted to "????" |works fine! | * It seems that MySQL will look at the data as being a series of bytes all within the latin1 codepage. If you use the same code to read and write the data then it will round-trip fine (but MySQL's collation will obviously be wrong). If you use tools with a knowledge of the data types used by MySQL, like [[http://phpmyadmin.sf.net|phpMyAdmin]] or even mysql_dump, then the wrong encoding is obvious. (this test was performed with PHP versions 4.4.2 and 5.1.2 and MySQL 5.0.20; [[http://pastebin.com/751239|PHP source for test]]) ===== Further Reading ===== * [[http://blog.stuartherbert.com/gentoo.php/2006/01/11/fun_with_collations|Fun with Collations]] * [[http://dev.mysql.com/doc/mysql/en/charset-general.html|MySQL documentation]] * [[http://www.bluetwanger.de/blog/2006/11/20/mysql-and-utf-8-no-more-question-marks/|MySQL and UTF-8 - no more question marks]]