Localization proposal using gettext

This document describes a proposal to implement gettext for all translation strings in WikkaWiki, starting with Wikka 1.3. Note that this is only a proposal, so this document can and probably will be modified several times. Should the dev team decide to adopt gettext as the localization standard for Wikka, this page will be renamed to indicate this.

SVN checkout: svn co https://wush.net/svn/wikka/branches/1.3_gettext


The following references were used to develop the initial gettext implementation using the Wikka 1.3 development branch:

"Translating WordPress"
PHP-gettext dev blog (Danilo Segan)
PHP-gettext repository
GNU gettext manual
Some gettext notes from Pablo Hoch's blog

Implementation Notes

The PHP-gettext standalone library is used to implement gettext in Wikka. This eliminates the need for a Wikka administrator to ensure their version of PHP has gettext support compiled in. PHP-gettext requires no external libraries and only a minimal amount of configuration. It is licensed under GPLv2.

In Wikka 1.3, the PHP-gettext version 1.09 libraries are located in 3rdparty/core/php-gettext. No modifications are necessary when installing from the PHP-gettext version 1.09 release package.

Testing was conducted on a Windows 7 laptop running the excellent WampServer 2.0 package (Apache 2.2.11, PHP 5.2.11, and MySQL 5.1.36) using GNU gettext 0.17 tools under Cygwin.

Defines that were used as translation strings in lang/en/en.inc.php and related language files were replaced in the source files with their English equivalents using a Perl script (jump to the end of this article for the script). The gettext macro used for all Wikka translation strings is T_, rather than the conventional _, because _ is already used by the installer. For instance, the following define:

if(!defined('FOOTER_PAGE_EDIT_LINK_DESC')) define('FOOTER_PAGE_EDIT_LINK_DESC', 'Edit page');

was replaced in the header.php source code file with the following:

T_('Edit page')

A file called localization.php in the Wikka top-level directory configures PHP-gettext. This file should not normally require modification by the end user. It is loaded from wikka.php via include_once.

To use another locale, add a default_locale entry to wikka.config.php:

    'default_locale' => 'fr_CH',

Locale directory structure

The locale directory is structured as follows:

locale/po <--contains the generic template file; must be copied to lang-specific directories for translation
locale/po/messages.pot <--generic template file
locale/en_US <--locale-specific
locale/en_US/LC_MESSAGES <--holds lang-specific translations
locale/en_US/LC_MESSAGES/en_US.po <--lang-specific template file, usually created by msginit
locale/en_US/LC_MESSAGES/en_US.mo <--compiled translation file, usually created by msgfmt
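
As a rough illustration of this layout, the following shell sketch walks a locale directory and reports, for each locale, whether its .po file has been compiled. The function name locale_status is hypothetical; it is not part of Wikka or of GNU gettext, and it assumes each locale's files are named after the locale directory, as shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Wikka): report the compile status of every
# locale under a root laid out as <root>/<locale>/LC_MESSAGES/<locale>.po|.mo
locale_status() {
    root=${1:-locale}
    for dir in "$root"/*/LC_MESSAGES; do
        # Skips directories without LC_MESSAGES (such as po/) and an unmatched glob
        [ -d "$dir" ] || continue
        name=$(basename "$(dirname "$dir")")
        po="$dir/$name.po"
        mo="$dir/$name.mo"
        if [ ! -f "$po" ]; then
            echo "$name: no .po file"
        elif [ ! -f "$mo" ]; then
            echo "$name: not compiled"
        elif [ -n "$(find "$po" -newer "$mo")" ]; then
            echo "$name: .mo is older than .po"
        else
            echo "$name: up to date"
        fi
    done
}
```

Called as locale_status from the Wikka top-level directory, it defaults to the locale/ directory; a different root can be passed as the first argument.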

Generating the gettext template (.pot) file

Any time new translation macros (of the form T_(...)) are added to the source code, a new gettext template file must be generated. Several gettext utilities can generate this file. This document uses the GNU gettext command-line utilities for its examples, so the template is built by running xgettext from the Wikka top-level directory:

find ./ -name '*.php' | xargs xgettext -L PHP --force-po -kT_ -o locale/po/messages.pot

Creating language-specific template (.po) files

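If one does not already exist, create a new directory structure under locale/ named after the target locale. The examples here write the BCP-47 language tag in its POSIX locale form, with an underscore instead of a hyphen (fr_CH rather than fr-CH). For instance: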

mkdir -p locale/fr_CH/LC_MESSAGES

The GNU gettext command msginit can then be invoked to copy the messages.pot template file for use with the language to be translated:

msginit --locale fr_CH --input locale/po/messages.pot --output-file locale/fr_CH/LC_MESSAGES/fr_CH.po

Creating translations

Several utilities exist that can be used to modify .po files. Some of the available utilities are listed here. The file can also be modified manually in a text editor.

[More info needed? I really don't want this to become a translation how-to!]

Compiling translations (.po->.mo files)

Once the translations in the .po file are complete, they must be compiled into a binary format for use by the PHP-gettext libraries. The GNU gettext msgfmt command can be used for this:

msgfmt -o locale/fr_CH/LC_MESSAGES/fr_CH.mo locale/fr_CH/LC_MESSAGES/fr_CH.po
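
When several locales exist, the same msgfmt call can be repeated for each of them. The sketch below assumes the directory layout described earlier and that msgfmt is on the PATH; compile_all_locales is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compile every .po file found under the locale root.
# Assumes GNU gettext's msgfmt is installed and on the PATH.
compile_all_locales() {
    root=${1:-locale}
    for po in "$root"/*/LC_MESSAGES/*.po; do
        [ -f "$po" ] || continue            # the glob matched nothing
        # Write the .mo next to its .po, e.g. fr_CH.po -> fr_CH.mo
        msgfmt -o "${po%.po}.mo" "$po" || echo "failed: $po" >&2
    done
}
```

Running compile_all_locales from the Wikka top-level directory processes locale/ by default.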

If msgfmt reports multibyte errors, you will most likely have to edit the .po file by hand so that its Content-Type header declares the charset the file is actually saved in, for example:

"Content-Type: text/plain; charset=UTF-8\n"
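
The header edit can also be scripted. The following is only a sketch: fix_po_charset is a hypothetical helper, and it assumes the .po file is already saved as UTF-8, since it rewrites the declared charset without converting any text.

```shell
#!/bin/sh
# Hypothetical helper: set the charset declared in a .po header to UTF-8.
# Assumption: the file's bytes are already UTF-8; nothing is re-encoded here.
fix_po_charset() {
    po_file=$1
    # Rewrites a header line such as
    #   "Content-Type: text/plain; charset=CHARSET\n"
    # and leaves every other line untouched.
    sed 's/^\("Content-Type: text\/plain; charset=\)[^\\]*\(\\n"\)$/\1UTF-8\2/' \
        "$po_file" > "$po_file.tmp" && mv "$po_file.tmp" "$po_file"
}
```

For example, fix_po_charset locale/fr_CH/LC_MESSAGES/fr_CH.po would update the French (Switzerland) file used in this document.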

Merging translations
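
This section has not been written yet. As a placeholder, one plausible approach is GNU gettext's msgmerge, which folds a regenerated template into an existing .po file. The sketch below is an assumption about how that could fit the layout used in this proposal, not a description of an agreed Wikka workflow; merge_locale is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: merge new template strings into an existing translation.
# Assumes GNU gettext's msgmerge is installed and on the PATH.
merge_locale() {
    loc=$1
    root=${2:-locale}
    po="$root/$loc/LC_MESSAGES/$loc.po"
    # --update rewrites the .po in place: existing translations are kept,
    # new msgids are added untranslated, and obsolete ones are commented out.
    msgmerge --update "$po" "$root/po/messages.pot"
}
```

After merge_locale fr_CH, the new entries can be translated and the file recompiled with msgfmt.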



Perl script: expandDefines.pl

#! /usr/bin/perl -w
# expandDefines.pl: Expand defines, mark with T_() tag for gettext
# processing
# Usage: expandDefines.pl <lang>
# Author: Brian Koontz <brian@wikkawiki.org> Copyright 2010

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
use strict;
use File::Find;

# Create dictionary of defines appearing in lang file
if(@ARGV != 1)
{
	die "Usage: $0 <lang>\n";
}
my $lang = $ARGV[0];
my $langfile = "/cygdrive/c/wamp/www/wikka-gettext/lang/$lang/$lang.inc.php";
my %dict = ();
my $lineno = 0;
open(IN, "<$langfile") or die "Can't open $langfile for reading!";
while(<IN>)
{
	# Capture the argument list of define(...): 'CONSTANT', 'value'
	# ("defined(" does not match because of the extra "d")
	next unless /define\s*\((.*)\)\s*;/;
	my @fields = split(/\s*,\s*/, $1);
	my $const = $fields[0];
	# Restore other commas
	my $val = join(', ', @fields[1..$#fields]);
	$const =~ s/['"](.*?)['"]/$1/;
	$val =~ s/['](.*?)[']/$1/;
	$dict{$const} = $val;
	$lineno++;
}
print "Number of language constants: $lineno\n";
close IN;

# Parse each .php file, replacing constants with expansion: T_("...").
my @filelist = ();
# Exclude these files from search
my @exclude = ($langfile, '3rdparty', 'wikka.config.php');
# Exclude lines containing these strings from search
my @excludestrings = ('DIRECTORY_SEPARATOR', 'defined', 'define');
# Exclude values containing these strings from gettext-wrapping
my @excludefromtranslation = ('class=', 'id=', 'name=');

sub getFile
{
	if($File::Find::name =~ /\.php$/ &&
	   !grep($File::Find::name =~ /\Q$_\E/, @exclude))
	{
		push(@filelist, $File::Find::name);
	}
}

# Get list of files
find(\&getFile, ".");
foreach my $file (@filelist)
{
	open(IN, "<$file") or die "Can't open $file for reading!";
	open(OUT, ">$file.new") or die "Can't open $file.new for writing!";
	my $search = "([A-Z0-9]+(_[A-Z0-9]+)+)";
	while(<IN>)
	{
		my $line = $_;
		if(!grep($line =~ /$_/, @excludestrings))
		{
			# Search for two or more consecutive upper-case letter groupings
			# separated by _
			while($line =~ /$search/g)
			{
				# Remember the matched name: the substitutions below must
				# replace this constant, not the first match on the line
				my $const = $1;
				if(exists $dict{$const})
				{
					if($dict{$const} =~ /^[0-9]+$/)
					{
						1; # Don't do anything with defines
						   # for numeric constants
					}
					elsif(!grep($dict{$const} =~ /$_/, @excludefromtranslation))
					{
						# Modifying $line restarts the /g scan from the beginning
						$line =~ s/\Q$const\E/T_("$dict{$const}")/;
					}
					else
					{
						$line =~ s/\Q$const\E/'$dict{$const}'/;
					}
				}
			}
		}
		print OUT $line;
	}
	close IN;
	close OUT;
	# Keep a backup of the original, then install the rewritten file
	system('cp', $file, "$file.orig");
	system('cp', "$file.new", $file);
}