Setup:UTF-8

From Cerb Wiki
Jump to: navigation, search

Recommended Method: Converting an *existing* MySQL database from latin1 to utf8

Use this method if at all possible as it will attempt to recover non-English latin1 characters (accents, umlauts) in your existing data. If you can't follow these instructions (e.g. no shell access, Windows, can only make one DB, no willpower) then use the Quick & Dirty alternate approach below.

  • Confirm your database is currently encoded in latin1.
  • Make a fresh backup (ideally using mysqlhotcopy)
  • Temporarily disable your cronjob so you don't have incoming mail trying to access the database.
  • The steps below require the 'iconv' utility to ensure proper UTF8 encoding. Type 'iconv --version' at the shell to check if it's installed. If not, install it from your platform's package manager (e.g. apt-get, yum, rpm).

Dump the database schema (use your own user + password + database): (shell)

mysqldump -Q -d -u root -p \
  --default-character-set=latin1 --skip-set-charset \
  c4_database | sed 's/latin1/utf8/gi' > cerb4.schema.sql

Dump the database data (use your own user + password + database): (shell)

mysqldump -Q --insert-ignore -t -u root -p \
  --default-character-set=latin1 --skip-set-charset \
  c4_database | iconv -c -f utf8 -t utf8 > cerb4.data.sql

Create a new UTF-8 database: (MySQL)

CREATE DATABASE cerb4_utf8 CHARACTER SET utf8 COLLATE utf8_general_ci;


Import into the new database: (shell)

mysql -u root -p --default-character-set=utf8 cerb4_utf8 < cerb4.schema.sql;
mysql -u root -p --default-character-set=utf8 cerb4_utf8 < cerb4.data.sql;

Update your /cerb4/framework.config.php:

define('APP_DB_DATABASE','cerb4_utf8'); // use your new database name

define('LANG_CHARSET_CODE','utf-8'); // iso-8859-1, utf-8
define('DB_CHARSET_CODE','utf8'); // latin1, utf8

Notes:

  • Be sure to set your privileges on the new database. For convenience you can reuse the settings from your framework.config.php file.
  • If you're feeling really confident about your backup policy, you can DROP your old database and CREATE the new UTF-8 one with the same name. We created a second database so you can verify everything works without destroying your old data.
  • You should run the helpdesk's Maintenance task from Helpdesk Setup->Scheduler. This will purge any orphaned data from the import's cleanup (e.g. spammer e-mail addresses of entirely non-printable characters).
    Technical explanation: When you remove non-printable characters you can end up with duplicate e-mail addresses like "@localhost". The dump/import process uses INSERT-IGNORE to ignore these duplicates, and maintenance purges any orphaned (spam) content from removing an e-mail address record. If this wasn't removed they'd become "ghost tickets" because the ticket<->address JOIN would no longer exist.
  • Don't forget to turn your cronjob back on to process new mail.

Quick & Dirty (MySQL5+): Converting an *existing* MySQL database from latin1 to utf8

If you couldn't use the recommended approach above then you can use this approach. It will only work in MySQL5 and it won't attempt to recover non-English latin1 characters (accents, umlauts) from existing data. This is generally only a problem if those characters occur frequently in the Address Book; if so, you'll need to fix the input manually. All new data will be properly encoded.

  • Make a backup!
  • Turn off your cronjob.
  • Place the following file in your /cerb4/ directory and open it in the browser:

upgrade_utf8.php:

<?php 
@set_time_limit(0);
require(getcwd() . '/framework.config.php'); 
require(DEVBLOCKS_PATH . 'Devblocks.class.php'); 

$db = DevblocksPlatform::getDatabaseService(); 
$tables = $db->MetaTables(); 

if(!$db->Execute(sprintf("ALTER DATABASE `%s` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci", 
APP_DB_DATABASE 
))) { 
die($db->ErrorMsg()); 
} 

foreach($tables as $t) { 
if(!$db->Execute(sprintf("ALTER TABLE `%s` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci", 
$t 
))) { 
die($db->ErrorMsg()); 
} 
} 

echo "Success!  Remove this script."; 
?>

Update your /cerb4/framework.config.php:

define('LANG_CHARSET_CODE','utf-8'); // iso-8859-1, utf-8
define('DB_CHARSET_CODE','utf8'); // latin1, utf8
  • Delete the upgrade_utf8.php script.
  • Restart your cronjob.