Installation Experiences/Jun 2010: UTF8 on a LATIN1 database

From Davical
Jump to navigationJump to search

Background

In an ideal world, your PostgreSQL database will be set up to allow the use of UTF8, which DAViCal requires. However, I needed to get up running quickly on a server that was set up for LATIN1, and the suggested option of running a separate cluster wouldn't fly for various reasons.

Obviously another approach is to convert the existing stuff to using UTF8, but time constraints meant that wasn't an option for me.

Approaches currently under consideration

At first glance, both of these options seem to work ok:

  • Forget about the client encoding setting and just let it be stored as LATIN1. What happens in the future when we do migrate to UTF8?
  • Set up the database as SQL_ASCII, which simply stores whatever the client sends it.

The good idea that turned out not to be...

The theory went like this:

Postgres can do automatic character set conversion

When you set up your config file, you could add an option to the database connection string: options='--client_encoding=UTF8'

So the line in your config file would look something like this:

$c->pg_connect[] = "dbname=davical user=general port=5433 host=somehost password=mypass options='--client_encoding=UTF8'";

Before you create the database, you would edit the database creation script as follows:

--- /usr/share/davical/dba/create-database.sh.orig   2010-03-12 00:12:07.000000000 +1300
+++ /usr/share/davical/dba/create-database.sh      2010-06-14 02:44:30.000000000 +1200
@@ -68,7 +68,7 @@
 create_db_user "${AWL_APPUSER}"
 
 # FIXME: Need to check that the database was actually created.
-if ! createdb --encoding UTF8 --template template0 --owner "${AWL_DBAUSER}" "${DBNAME}" ; then
+if ! createdb --encoding LATIN1 --template template0 --owner "${AWL_DBAUSER}" "${DBNAME}" ; then
   echo "Unable to create database"
   exit 1
 fi

(If you were to simply remove the encoding option, it would default to SQL_ASCII.)

...and what's wrong with it

It turns out that even something as common as so-called "smart quotes" cannot be converted to LATIN1. The database update fails, and the client is sent an HTTP 500 error. The message in the Apache error long says something to the effect of

SQL error "22P05" - ERROR: character 0xe2809c of encoding "UTF8" has no equivalent in "LATIN1

or

SQL error "22P05" - ERROR: character 0xe28099 of encoding "UTF8" has no equivalent in "LATIN1"

Since we want to to use this with Windows clients which turn apostrophes into "smart quotes", this is not going to be usable