Enabling Sinhala in GNU/Linux HOWTO


Table of Contents
1. About This Guide
1.1. Revisions
2. Introduction
2.1. Learning about Sinhala
2.2. Learning about Unicode
2.3. Standards
2.4. Mailing Lists
2.4.1. Sinhala CVS Commits List (in English for developers)
2.4.2. Sinhala Technical List (in English for developers)
2.4.3. Sinhala Linux List (in English for users)
2.4.4. Sinhala Unicode List (in Sinhala for developers and users)
3. Enabling Sinhala
3.1. Debian and Ubuntu Sinhala repositories
3.1.1. What is available?
3.1.2. How to install
3.1.3. How to test
3.2. Fonts
3.3. Firefox/Mozilla
3.3.1. Debian 4.0 (Etch)
3.3.2. Ubuntu 5.10
3.3.3. Ubuntu 6.06
3.3.4. Ubuntu 6.10
3.3.5. Fedora Core 3
3.3.6. Fedora Core 4 & Above
3.4. Open Office
3.4.1. Debian 4.0 (Etch)
3.4.2. Ubuntu 6.10
3.4.3. Fedora Core 5
3.4.4. Fedora Core 6
3.4.5. Open Office 2.1
3.5. Input Methods
3.5.1. Keyboard Layouts: X Keyboard Extension
3.5.2. Character Maps
3.6. Locales
4. Developer Notes
4.1. Open Type Fonts
4.1.1. List of feature tags
4.1.2. Glyph Naming
4.1.3. Indic
4.1.4. How Freefont obtained Sinhala glyphs
4.2. Renderer
4.2.1. Pango
4.2.2. ICU
4.3. Firefox/Mozilla
4.3.1. Debian
4.3.2. Ubuntu
4.3.3. Fedora Core
4.3.4. Epiphany Browser
4.4. Open Office
4.4.1. Open Office 2.0.4
4.5. Input Methods
4.5.1. Keyboard Layouts
4.5.2. XKB - adding a new keyboard layout
4.5.3. SCIM
4.5.4. m17n
4.5.5. xmodmap
4.5.6. gvim
4.6. Databases
4.6.1. MySQL 5.0.27
4.7. Locales
4.8. Translations
4.8.1. GNOME
4.8.2. KDE
4.8.3. Ubuntu
4.9. DONE
4.10. TODO
5. Resources/Links
5.1. Input Methods
5.2. Internationalisation
5.3. Localisation
5.4. Sinhala
5.5. Typography
5.6. Unicode
6. Conclusion

1. About This Guide

Sinhala is the main language of Sri Lanka. This guide describes the level of Sinhala support available in GNU/Linux. It also describes how to enable Sinhala support and the tasks that still require attention.

This guide is GNOME and Debian/Ubuntu centric. Most of the explanations and suggestions should also be applicable to other distributions.


1.1. Revisions

  • v0.1 - 2004/06/05

  • v0.2 - 2004/10/07

  • v0.3 - 2005/03/06

  • v0.4 - 2006/04/03

  • v0.5 - 2006/11/20 - in progress


2. Introduction

The level of Sinhala support in GNU/Linux distributions has improved quite significantly as patches have been committed upstream.

Debian 4.0 (Etch/testing) and Ubuntu 6.10 (Edgy) ship with SLS1134 support in Pango, the GNOME renderer. They also ship with a phonetic Sinhala keyboard layout in Xorg and a basic, incomplete, Unicode Sinhala font. Open Office's renderer, ICU, has SLS1134 support in Debian 4.0 (Etch/testing) but Ubuntu 6.10 ships with ICU 3.4 and does not have SLS1134 support.

As Sinhala support in GNU/Linux distributions improves, this HOWTO will eventually be of historical value only. For the time being, use this document to check the current status of Sinhala support in GNU/Linux and to learn how to enable better support.

Many individuals have contributed to the project. Some of the notable contributors are:


2.1. Learning about Sinhala


3. Enabling Sinhala

3.1. Debian and Ubuntu Sinhala repositories

3.1.1. What is available?

  • LKLUG Unicode Sinhala font

  • libicu (renderer used by Open Office) with Sinhala support

  • SCIM transliterated input method

    • Automatically set the required environment variables in /etc/environment


3.1.2. How to install

  1. On Debian Etch (testing) add to /etc/apt/sources.list

    deb http://sinhala.sourceforge.net/debian/i386/etch/ ./

    On Ubuntu Edgy (6.10) add to /etc/apt/sources.list

    deb http://sinhala.sourceforge.net/ubuntu/i386/edgy/ ./
  2. Update repository metadata:

    apt-get update
  3. Install Sinhala packages:

    apt-get install sinhala-gnu-linux
  4. Upgrade libicu package:

    apt-get upgrade
  5. Log out and log in again so that the environment variables are set or updated. There is no need to reboot.
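
The two repository lines in step 1 differ only in the distribution and release name. As a minimal sketch, the helper below prints the line for a given pair so that it can be reviewed before it is added to /etc/apt/sources.list. The function name sinhala_repo_line is hypothetical and is not shipped by any package; it only knows the two combinations listed above.

```shell
#!/bin/sh
# sinhala_repo_line DISTRO RELEASE
# Hypothetical helper: prints the apt source line from step 1.
# Only the two combinations documented in this HOWTO are accepted.
sinhala_repo_line() {
    case "$1/$2" in
        debian/etch|ubuntu/edgy)
            printf 'deb http://sinhala.sourceforge.net/%s/i386/%s/ ./\n' "$1" "$2"
            ;;
        *)
            echo "unsupported distribution/release: $1/$2" >&2
            return 1
            ;;
    esac
}

# Example: print the line for Debian Etch.
sinhala_repo_line debian etch
```

Once the printed line looks right, it can be appended to /etc/apt/sources.list as root, followed by steps 2 to 4.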


3.1.3. How to test

  1. Visit http://si.wikipedia.org/ and see if the Sinhala letters render correctly.

  2. Copy and paste some of the content from the Sinhala Wikipedia into Open Office Writer. Then highlight the Sinhala text and choose the LKLUG font to display it.

  3. To test SCIM, press Control-space whilst you are running a GNOME application. Then select one of the Sinhala input methods.


3.2. Fonts

Download a Unicode Sinhala font:

If you are using a modern GNU/Linux version and it has fontconfig installed, all you have to do is make a .fonts directory in your home directory:

mkdir ~/.fonts

and copy the TrueType/OpenType font into that directory.

If you want to make the font available to all users of the system, become root and copy the font to:

/usr/share/fonts

In both the above cases, run:

fc-cache -fv

To check which font file provides the Sinhala support, run:

fc-list :lang=si file

Immediately you'll be able to read Unicode Sinhala in these programs:

  • Anything gtk2 based

    • evolution

    • gedit

    • gucharmap

    • Firefox/Mozilla (built with gtk2, FreeType2 and Pango support)

If you have Pango 1.8.2 or greater, you will have full SLS1134 Sinhala support.
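
The per-user font installation described in this section can be sketched as a small shell function. This is only an illustration: the function name install_font is hypothetical, and the cache refresh is skipped when fc-cache is not installed.

```shell
#!/bin/sh
# install_font FONT_FILE [DEST_DIR]
# Copies a TrueType/OpenType font into DEST_DIR (default: ~/.fonts)
# and refreshes the fontconfig cache when fc-cache is available.
install_font() {
    font=$1
    dest=${2:-"$HOME/.fonts"}
    [ -f "$font" ] || { echo "no such font file: $font" >&2; return 1; }
    mkdir -p "$dest" || return 1
    cp "$font" "$dest/" || return 1
    # fc-cache is part of fontconfig; skip quietly if it is not installed.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f "$dest" >/dev/null 2>&1 || true
    fi
    echo "installed $(basename "$font") in $dest"
}
```

For example, install_font lklug.ttf would copy the font to ~/.fonts (the file name lklug.ttf is an assumption; use the name of the file you downloaded). Afterwards, fc-list :lang=si file should list it.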


3.3. Firefox/Mozilla

3.3.1. Debian 4.0 (Etch)

Pango is enabled by default in Debian 4.0.


3.3.2. Ubuntu 5.10

Pango is enabled by default in Ubuntu 5.10.


3.3.3. Ubuntu 6.06

Ubuntu 6.06 users can enable Pango in Firefox by setting an environment variable:

MOZ_DISABLE_PANGO=0

3.3.4. Ubuntu 6.10

Ubuntu 6.10 users can enable Pango in Firefox by setting an environment variable:

MOZ_DISABLE_PANGO=0

Or by simply installing the Ubuntu package:

language-pack-si-base

3.3.5. Fedora Core 3

Firefox and Mozilla can be enabled with pango rendering support, which enables many text layout features, including the rendering of CTL (Complex Text Layout) such as Indic languages. To enable this, set the following environment variable when running Firefox or Mozilla:

MOZ_ENABLE_PANGO=1 [1]


3.3.6. Fedora Core 4 & Above

Pango is enabled by default so you don't have to do anything extra:

7.2. Pango Text Renderer for Firefox

Fedora is building Firefox with the Pango system as the text renderer. This provides better support for certain language scripts, such as Indic and some CJK scripts. Pango is included with permission of the Mozilla Corporation. This change is known to break rendering of MathML, and may negatively impact performance on some pages. To disable the use of Pango, set your environment before launching Firefox:

MOZ_DISABLE_PANGO=1 /usr/bin/firefox

...

23.4. Pango Support in Firefox

Firefox in Fedora Core is built with Pango, which provides better support for certain scripts, such as Indic and some CJK scripts. Fedora has the permission of the Mozilla Corporation to use the Pango system for text rendering.

To disable the use of Pango, set MOZ_DISABLE_PANGO=1 in your environment before launching Firefox. [2]


3.4. Open Office

3.4.1. Debian 4.0 (Etch)

Debian 4.0 ships with ICU 3.6, which contains Sinhala support. However, Open Office has not been patched to stop filtering out ZWJ. There is also a bug in ICU that can cause Open Office to crash when typing in Sinhala.


3.4.2. Ubuntu 6.10

Ubuntu 6.10 ships with ICU 3.4 which does not contain Sinhala support.


3.4.3. Fedora Core 5

Fedora Core 5 ships with ICU 3.4 which does not contain Sinhala support.


3.4.4. Fedora Core 6

Fedora Core 6 ships with ICU 3.6 which contains Sinhala support. Furthermore, Open Office has been patched to not filter out ZWJ. However, there is a bug in ICU that can trigger Open Office to crash when typing in Sinhala.


3.4.5. Open Office 2.1

Open Office 2.1 should support Sinhala on all distros.


3.5. Input Methods

To test multi-lingual input methods in gtk2 based programs, run:

gedit

To check which input methods are available for gtk2 based programs, run:

/usr/bin/gtk-query-immodules-2.0

which comes with gtk2.


3.5.1. Keyboard Layouts: X Keyboard Extension

You should have at least XFree86 4.3 or Xorg 6.7. To familiarise yourself with this keyboard layout, read:

The X Keyboard Extension only allows one-to-one mappings between keys and codepoints, therefore rakaaranshaya, yansaya and repaya, which consist of multiple codepoints, have to be manually constructed. See the comments in the Sinhala X Keyboard Extension layout file.


3.5.1.1. Xorg 6.9+

The aforementioned layout is already included in Xorg 6.9 and above and distributions that ship with xkeyboard-config 0.6 and above.

Debian Etch:

/usr/share/X11/xkb/symbols/lk

Ubuntu 5.10:

/etc/X11/xkb/symbols/lk

Fedora Core 5:

/usr/share/X11/xkb/symbols/lk
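
Because the location of the lk file differs between distributions, a quick check can save some guessing. The sketch below looks in the two directories listed above and prints the first match; the function name find_lk_layout and the optional ROOT prefix are assumptions made for this example.

```shell
#!/bin/sh
# find_lk_layout [ROOT]
# Prints the first existing "lk" symbols file among the locations
# listed above. ROOT is an optional prefix (default: empty, i.e. "/"),
# which is handy for checking a chroot or an unpacked package.
find_lk_layout() {
    root=${1:-}
    for dir in /usr/share/X11/xkb/symbols /etc/X11/xkb/symbols; do
        if [ -f "$root$dir/lk" ]; then
            echo "$root$dir/lk"
            return 0
        fi
    done
    echo "lk layout not found" >&2
    return 1
}
```

Running find_lk_layout without arguments checks the running system. If nothing is found, the installed Xorg or xkeyboard-config is probably older than the versions mentioned above.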

The latest version of the keyboard layout can be downloaded from CVS:

Read the comments in the lk file to see how to create rakaaranshaya, yansaya and repaya.

The window manager should come with a program which allows the user to choose multiple keyboard layouts.

In the example below I have chosen the SHIFT keys to switch between the Sinhala phonetic layout and the US QWERTY layout. Hold one of the SHIFT keys down and then press the other SHIFT key; this should toggle between the layouts.

Using the GUI in GNOME:

  1. Run:

    gnome-keyboard-properties
  2. Choose the “Layout” tab and click on the “Add” button. This will open a new window which contains a list of layouts ordered by country.

  3. Scroll down the list till you find “Sri Lanka” and then highlight it by clicking on it. The Sinhala layout is the default in the Sri Lanka layouts file, so you do NOT need to click the expand triangle icon. Then press “OK”.

  4. Choose the “Layout Options” tab and click on the text “Group Shift/Lock behavior”. A list will expand below this text.

  5. Scroll down the list till you find the text “Both Shift keys together change group”. Click on the corresponding checkbox.

  6. If you wish to use an LED to indicate the toggling of keyboard layouts, click on the text “Use keyboard LED to show alternative group”. A list will expand below this text.

  7. Scroll down the list till you find the text “ScrollLock LED shows alternative group”. Click on the corresponding checkbox.

Using the command line in X:

  1. In an xterm do:

    setxkbmap -layout "us,lk" -option "grp:shifts_toggle,grp_led:scroll"

Alternatively, you can directly modify /etc/X11/xorg.conf:

  1. To add the new lk keyboard layout, look for this line:

    Section "InputDevice"

    There will probably be two such lines, one for the keyboard and another for the mouse. Go to the keyboard related line.

  2. Then add 'lk' to a line that looks like:

    Option "XkbLayout" "us,lk"
  3. Also add a mechanism to switch between 'us' and 'lk' and indicate which LED should be used:

    Option "XkbOptions" "grp:shifts_toggle,grp_led:scroll"
  4. If asked by the window manager, reset keyboard defaults to the X defaults.
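
Put together, the xorg.conf steps above should leave the keyboard section looking roughly like the output of the sketch below. The Identifier and Driver values are assumptions made for illustration; keep whatever your existing xorg.conf already uses and only add or change the two Option lines.

```shell
#!/bin/sh
# xorg_keyboard_section LAYOUTS OPTIONS
# Prints an InputDevice section containing the two Option lines from
# steps 2 and 3. The Identifier and Driver values are assumptions;
# keep the ones already present in your own xorg.conf.
xorg_keyboard_section() {
    cat <<EOF
Section "InputDevice"
    Identifier "Generic Keyboard"
    Driver     "kbd"
    Option     "XkbLayout"  "$1"
    Option     "XkbOptions" "$2"
EndSection
EOF
}

xorg_keyboard_section "us,lk" "grp:shifts_toggle,grp_led:scroll"
```

The same two strings are the arguments given to setxkbmap -layout and -option in the command line example, so the function can also be used to keep both methods consistent.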


3.5.1.2. XFree86 4.3+ or Xorg 6.7+

In the example below I have chosen the ALT keys to switch between the Sinhala phonetic layout and the US QWERTY layout. Hold one of the ALT keys down and then press the other ALT key; this should toggle between the layouts.

  1. Download the keyboard layout from:

  2. Copy the keyboard layout to:

    /etc/X11/xkb/symbols/pc/
  3. There are two options:

    1. In an xterm do:

      setxkbmap -layout "sin,us" -option "grp:alts_toggle,grp_led:scroll"
    2. Alternatively, edit the /etc/X11/XF86Config or /etc/X11/xorg.conf file.

      1. To add the new 'sin' keyboard layout, look for this line:

        Section "InputDevice"

        There will probably be two such lines, one for the keyboard and another for the mouse. Go to the keyboard related line.

      2. Then add 'sin' to a line that looks like:

        Option "XkbLayout" "sin,us"
      3. Also add a mechanism to switch between 'us' and 'sin' and indicate which LED should be used:

        Option "XkbOptions" "grp:alts_toggle,grp_led:scroll"
      4. If asked by the window manager, reset keyboard defaults to the X defaults.


3.5.2. Character Maps

You can use a Unicode Character Map program to copy and paste Sinhala characters into your program/document. Available programs are:

  • gucharmap


4. Developer Notes


4.2. Renderer

Pango (since 1.8.2) and ICU (since 3.6) support SLS1134.


4.2.1. Pango

Pango's Indic renderer is based on ICU's Indic renderer.

The original patch to add Sinhala support was created by Harsha Senanayake for ICU [3] and later ported to Pango. The Pango patch was ported to the latest version of Pango by Chamath Keppitiyagama. It was submitted to bugzilla by Anuradha Ratnaweera [4]. Harshula Jayasuriya modified the Pango state table and ZWJ handling [5] & [6].

The Pango code for Sinhala and Indic rendering is common and can be found in the Pango source at:

modules/indic/

One of the most important files to understand is:

modules/indic/indic-ot-class-tables.c

Particularly how the function:

indic_ot_find_syllable()

works.

Next have a look at the file:

modules/indic/indic-ot.c

and the function:

indic_ot_reorder()

4.2.2. ICU

Owen Taylor (Pango) submitted the Pango Sinhala patch to the ICU project [7]. Eric Mader (ICU) ported the Pango patch to ICU and checked-in the changes to ICU 3.6. Then Eric added the state table & ZWJ modifications from Pango to ICU 3.6 [8] & [9].


4.2.2.2. Split dependent vowel modifier (diga o) issue

There's an issue with U+0DDD (dependent vowel diga o) that can cause Open Office to crash. Opening this text file will crash Open Office and ICU 3.6:

The worstCaseExpansion for Sinhala was set to 3 when it should have been set to 4. The dependent vowel 'oo' (U+0DDD) consists of (kombuva)(dotted-circle)(aela-pilla)(al-lakuna) which are 4 glyphs. As a result of the worstCaseExpansion being 3, memory was probably being allocated for 3 glyphs when memory was required for 4 glyphs. The actual crash occurred when unallocated memory was being freed.

Caolan McNamara also found and fixed this bug. [10]
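
The effect of the wrong worstCaseExpansion value can be shown with simple arithmetic. The sketch below is not ICU's allocation code; it only assumes that a layout engine reserves (character count × worst case expansion) glyph slots, and the function name glyph_buffer_size is hypothetical.

```shell
#!/bin/sh
# glyph_buffer_size CHAR_COUNT WORST_CASE_EXPANSION
# Number of glyph slots that would be reserved if every input
# character could expand to WORST_CASE_EXPANSION glyphs.
glyph_buffer_size() {
    echo $(( $1 * $2 ))
}

# U+0DDD can expand to 4 glyphs:
# kombuva, dotted circle, aela-pilla, al-lakuna.
needed=$(glyph_buffer_size 1 4)
reserved=$(glyph_buffer_size 1 3)   # the incorrect setting
echo "needed=$needed reserved=$reserved shortfall=$(( needed - reserved ))"
```

With the incorrect value the buffer is one slot short for every U+0DDD in the text, which is consistent with the memory corruption described above.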


4.2.2.3. Call Tree

  1. source/layoutex/ParagraphLayout.cpp

    ParagraphLayout::ParagraphLayout(const LEUnicode chars[], le_int32 count, const FontRuns *fontRuns, const ValueRuns *levelRuns, const ValueRuns *scriptRuns, const LocaleRuns *localeRuns, UBiDiLevel paragraphLevel, le_bool vertical, LEErrorCode &status)

    1. source/layout/LayoutEngine.cpp:

      LayoutEngine *LayoutEngine::layoutEngineFactory(const LEFontInstance *fontInstance, le_int32 scriptCode, le_int32 languageCode, LEErrorCode &success)

    2. LayoutEngine *LayoutEngine::layoutEngineFactory(const LEFontInstance *fontInstance, le_int32 scriptCode, le_int32 languageCode, le_int32 typoFlags, LEErrorCode &success)

      • IndicOpenTypeLayoutEngine::IndicOpenTypeLayoutEngine(const LEFontInstance *fontInstance, le_int32 scriptCode, le_int32 languageCode, le_int32 typoFlags, const GlyphSubstitutionTableHeader *gsubTable)

      • IndicOpenTypeLayoutEngine::IndicOpenTypeLayoutEngine(const LEFontInstance *fontInstance, le_int32 scriptCode, le_int32 languageCode, le_int32 typoFlags)

  2. source/layout/LayoutEngine.cpp

    le_int32 LayoutEngine::layoutChars(const LEUnicode chars[], le_int32 offset, le_int32 count, le_int32 max, le_bool rightToLeft, float x, float y, LEErrorCode &success)

    1. le_int32 LayoutEngine::computeGlyphs(const LEUnicode chars[], le_int32 offset, le_int32 count, le_int32 max, le_bool rightToLeft, LEGlyphStorage &glyphStorage, LEErrorCode &success)

      1. source/layout/IndicLayoutEngine.cpp

        le_int32 IndicOpenTypeLayoutEngine::characterProcessing(const LEUnicode chars[], le_int32 offset, le_int32 count, le_int32 max, le_bool rightToLeft, LEUnicode *&outChars, LEGlyphStorage &glyphStorage, LEErrorCode &success)

        1. source/layout/IndicReordering.cpp

          le_int32 IndicReordering::reorder(const LEUnicode *chars, le_int32 charCount, le_int32 scriptCode, LEUnicode *outChars, LEGlyphStorage &glyphStorage, MPreFixups **outMPreFixups)

  3. engine->getGlyphs(fStyleRunInfo[run].glyphs, layoutStatus);

  4. engine->getGlyphPositions(fStyleRunInfo[run].positions, layoutStatus);

  5. engine->getCharIndices(&fGlyphToCharMap[glyphBase], runStart, layoutStatus);


4.3. Firefox/Mozilla

Interestingly, Debian, Fedora Core and Ubuntu decided to address enabling Pango in Firefox in completely different ways.


4.3.1. Debian

Since Debian 4.0 (Etch), Pango is enabled by default.


4.3.2. Ubuntu

4.3.2.1. Ubuntu 5.10

Ubuntu 5.10 enabled Pango by default. Have a look at:

/usr/bin/mozilla-firefox 

which contains the code:

##
## Set MOZ_ENABLE_PANGO
##
MOZ_ENABLE_PANGO=1
export MOZ_ENABLE_PANGO

4.3.2.2. Ubuntu 6.06

Ubuntu 6.06, on the other hand, disabled Pango in Firefox by default except for a pre-determined list of locales. The extensive discussion can be found here:

Have a look at:

/usr/bin/mozilla-firefox 

which contains the code:

if [ "x${MOZ_DISABLE_PANGO}" = x ]; then
    if egrep '^(bn|gu|hi|kn|ml|mr|ne|pa|ta|te)_' \
        /var/lib/locales/supported.d/*[^~] >/dev/null 2>&1; then
        MOZ_DISABLE_PANGO=0
    else
        MOZ_DISABLE_PANGO=1
    fi
    export MOZ_DISABLE_PANGO
fi
if [ "x${MOZ_DISABLE_PANGO}" = x0 ]; then
    unset MOZ_DISABLE_PANGO
fi

This means that Ubuntu 6.06 users who need Pango enabled in Firefox have to set an environment variable:

MOZ_DISABLE_PANGO=0

You can see the difference by running Firefox at the command line like so:

$ MOZ_DISABLE_PANGO=0 mozilla-firefox
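
The locale test in the Ubuntu 6.06 wrapper can be isolated into a function to see how Sinhala would be handled once si is on the list. This is a sketch, not the Ubuntu script: the function name pango_disable_default is hypothetical, and si has been added to the pattern here purely for illustration.

```shell
#!/bin/sh
# pango_disable_default LOCALE_FILE...
# Prints 0 (Pango enabled) if any listed file contains a locale whose
# language code is in the list, otherwise prints 1 (Pango disabled).
# The list is the one from the Ubuntu 6.06 wrapper plus "si", which is
# an addition made for this example.
pango_disable_default() {
    # Without files there is nothing to match, so keep Pango disabled.
    [ $# -gt 0 ] || { echo 1; return 0; }
    if grep -E '^(bn|gu|hi|kn|ml|mr|ne|pa|si|ta|te)_' "$@" >/dev/null 2>&1; then
        echo 0
    else
        echo 1
    fi
}
```

On a system where a file in /var/lib/locales/supported.d/ contains a line starting with si_LK, the function prints 0. Its output could then be assigned to MOZ_DISABLE_PANGO before starting Firefox.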

4.3.3. Fedora Core

Since Fedora Core 4, Pango is enabled in Firefox by default. In order to disable Pango in Firefox an environment variable has to be set:

MOZ_DISABLE_PANGO=1

You can see the difference by running Firefox at the command line like so:

$ MOZ_DISABLE_PANGO=1 firefox

Have a look at:

/usr/bin/firefox 

for an explanation.


4.3.3.1. Fedora Core 4 [11]

##
## Set MOZ_ENABLE_PANGO is no longer used because Pango is enabled by default
## you may use MOZ_DISABLE_PANGO=1 to force disabling of pango
##
#MOZ_DISABLE_PANGO=1
#export MOZ_DISABLE_PANGO

4.3.3.2. Fedora Core 5 [12]

##
## In order to better support certain scripts (such as Indic and some CJK 
## scripts), Fedora builds its Firefox, with permission from the Mozilla 
## Corporation, with the Pango system as its text renderer.  This change 
## is known to break rendering of MathML, and may negatively impact 
## performance on some pages.  To disable the use of Pango, set 
## MOZ_DISABLE_PANGO=1 in your environment before launching Firefox.
##
#
# MOZ_DISABLE_PANGO=1
# export MOZ_DISABLE_PANGO
#

4.3.4. Epiphany Browser

Changelog:

2006-01-27  Christian Persch  <chpe at cvs dot gnome dot org>
        * src/ephy-main.c: (main):
        Disable pango rendering by default, unless MOZ_ENABLE_PANGO env
        var is set. Bug #328844.

src/ephy-main.c:

        /* Work around bug #328844, and avoid the gecko+pango performance problem */
        env = g_getenv ("MOZ_ENABLE_PANGO");
        enable_pango = env != NULL &&
                       env[0] != '\0' &&
                       g_ascii_strtoull (env, NULL, 10) != 0;
        if (eel_gconf_get_boolean (CONF_GECKO_ENABLE_PANGO))
        {
                g_print ("NOTE: Enabling gecko pango renderer; this may cause performance degradation.\n"
                         "You can set " CONF_GECKO_ENABLE_PANGO " to \"false\" to disable it.\n");
        }
        else if (!enable_pango)
        {
                g_setenv ("MOZ_DISABLE_PANGO", "1", TRUE);
        }

Epiphany also has a file, data/epiphany-pango.schemas containing a list of locales which require Pango to be enabled by default.


4.4. Open Office


4.4.1. Open Office 2.0.4

Whilst working on the patches for adding Sinhala support to ICU, the renderer of Open Office, I observed that the ZWJ characters do not appear to reach ICU. [13]

Then, Caolan McNamara found the Open Office file that filters ZWJ and ZWNJ. [14]

The source file:

vcl/source/gdi/sallayout.cxx

contains a function:

inline bool IsControlChar( sal_Unicode cChar )

This function tells a caller that characters U+200B to U+200F are control characters.
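
To see why this matters for Sinhala, the range can be modelled in a few lines. The sketch below covers only the range stated above (U+200B to U+200F); the real IsControlChar() may test other ranges as well, and the shell function name is_control_char is an assumption made for this example.

```shell
#!/bin/sh
# is_control_char CODEPOINT
# Models only the range described above (U+200B to U+200F).
# CODEPOINT is a number such as 0x200D. The exit status is 0 when the
# codepoint falls inside the range.
is_control_char() {
    code=$(( $1 ))
    [ "$code" -ge $(( 0x200B )) ] && [ "$code" -le $(( 0x200F )) ]
}

# U+200C is ZWNJ, U+200D is ZWJ, U+0DCA is the Sinhala al-lakuna.
for ch in 0x200C 0x200D 0x0DCA; do
    if is_control_char "$ch"; then
        echo "$ch filtered"
    else
        echo "$ch kept"
    fi
done
```

Both ZWNJ and ZWJ fall inside the range, so any caller that drops control characters also drops the joiners that Sinhala rendering depends on.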

In the source file:

linguistic/source/misc.cxx

two functions,

static INT16 GetOrigWordPos( const OUString &rOrigWord, INT16 nPos )

and

INT32 GetPosInWordToCheck( const OUString &rTxt, INT32 nPos )

call

inline bool IsControlChar( sal_Unicode cChar )

when doing linguistic analysis for what appears to be spelling purposes. I even found some comments written in, I assume, German.

In the source file:

vcl/source/gdi/sallayout.cxx

there is a function:

void ImplLayoutArgs::AddRun( int nCharPos0, int nCharPos1, bool bRTL )

which calls the function:

inline bool IsControlChar( sal_Unicode cChar )

Its purpose is to:

// add a run after splitting it up to get rid of control chars

It should be noted that this function handles RTL text differently from LTR text. My initial reaction is that this should not be the case. However, I have not looked into it any further.

Compiling Open Office 2.0.4 on Debian Etch on a Pentium M 2.13 GHz with 1 GiB RAM took approximately 10 hours and required 10 GB of additional hard drive space for the source and the compiled files.


4.5. Input Methods

The available infrastructures for keyboard layouts are XKB, XIM, IIIMF [15], m17n [16] and SCIM [17]. The recommended ones are XKB, for simple one-to-one keyboard layouts, and SCIM/m17n, for complex keyboard layouts. XKB is a component of Xorg.


4.5.1. Keyboard Layouts

  1. Wijesekera Compatible

  2. ASCII Compatible Wijesekera

  3. Phonetic (Static)

  4. Phonetic (Dynamic)

A Unicode Sinhala Font has to be installed in order to read the Keyboard Layouts.

You can use showkey in Linux to display the keycode of a key:

  • 57 - space

  • 56 - left alt

  • 100 - right alt

  • 29 - left ctrl

  • 97 - right ctrl

  • 42 - left shift

  • 54 - right shift

Look in the Linux kernel source:

drivers/char/keyboard.c

Look for the function:

getkeycode()

4.5.2. XKB - adding a new keyboard layout

All you need to do is copy the keyboard layout file into the correct directory:

/etc/X11/xkb/symbols/

or

/etc/X11/xkb/symbols/pc/

or

/usr/share/X11/xkb/symbols

However, for completeness some files in these directories:

/etc/X11/

or

/usr/X11R6/lib/X11/

or

/usr/share/X11/

need to be modified, namely these files:

xkb/rules/{xorg,xfree86}
xkb/rules/{xorg,xfree86}.lst
xkb/rules/{xorg,xfree86}.xml
xkb/symbols.dir

To test a loaded keyboard layout:

setxkbmap -print | xkbcomp -w 10 -xkb - <outfile>

4.5.3. SCIM

SCIM can be used as both the frontend, which is exposed to the user, and the backend, which maps keycodes to codepoints. Alternatively, SCIM can be used as a frontend for other backends; for example, m17n can be a backend.


4.5.4. m17n

The m17n backend keyboard layout definition file is a text file. The documentation can be found:


4.5.5. xmodmap

The xmodmap keyboard layout is not fully functional, hence it is recommended you use the X Keyboard Extension keyboard layout. To familiarise yourself with this keyboard layout, read:

  1. Download the keyboard layout from:

  2. Then run xmodmap:

    xmodmap sin.xmodmap

4.5.6. gvim

To familiarise yourself with this keyboard layout, read:

  1. Download the keyboard layout and redirector from:

  2. Copy the keyboard layout and redirector to ~/.vim/keymap/

  3. Start gvim

  4. Disable the menu so that you can use the 'alt' key:

    set guioptions-=m
  5. Select the new keyboard layout, using the redirector, by typing:

    set keymap=sinhala

    or select the new keyboard layout directly by typing:

    set keymap=sinhala-phonetic_utf-8

    To toggle between the Sinhala keyboard layout and the standard ASCII keyboard layout, press <Ctrl> <6> whilst in insert mode.


4.6. Databases

4.6.1. MySQL 5.0.27


4.6.1.1. Terminology

  • ci = case insensitive

  • cs = case sensitive

  • bin = binary


4.6.1.2. Useful Commands

  • SHOW CHARACTER SET;

  • SHOW COLLATION;

  • SHOW COLLATION like 'ucs%';

  • SHOW COLLATION like 'utf8%';

  • SET NAMES 'utf8'; // after connecting to server if the server has NOT set 'skip-character-set-client-handshake'

  • SHOW CREATE TABLE <table-name>

  • SHOW VARIABLES;

  • \s


4.6.1.3. Setup MySQL Server

Edit the file /etc/mysql/my.cnf and add to the [mysqld] section:

  • default-character-set=utf8

  • skip-character-set-client-handshake

This is done to ensure that UTF-8 is the default encoding for the server and client.
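
As a sketch, the two settings can be printed as a ready-made fragment and an existing configuration file can be checked for the first of them. Both function names are hypothetical, and the check is deliberately naive: it does not verify that the option sits in the [mysqld] section.

```shell
#!/bin/sh
# mysqld_utf8_fragment
# Prints the [mysqld] settings described above so they can be compared
# with, or merged by hand into, /etc/mysql/my.cnf.
mysqld_utf8_fragment() {
    cat <<'EOF'
[mysqld]
default-character-set=utf8
skip-character-set-client-handshake
EOF
}

# has_utf8_default FILE
# Succeeds if FILE sets the default character set to utf8. MySQL accepts
# "-" and "_" interchangeably in option names, so both are matched.
has_utf8_default() {
    grep -Eq '^[[:space:]]*default[-_]character[-_]set[[:space:]]*=[[:space:]]*utf8' "$1"
}
```

For example, has_utf8_default /etc/mysql/my.cnf || mysqld_utf8_fragment prints the fragment only when the setting is missing. After editing the file, restart the server and confirm the result with SHOW VARIABLES.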


4.6.1.4. Files requiring modification

  • mysql/config/ac-macros/character_sets.m4

  • mysql/mysys/charset-def.c

  • mysql/strings/ctype-uca.c

  • mysql/configure (generated)


4.6.1.5. Code

  • mysql/strings/ctype-uca.c

/*
  Collation language is implemented according to
  subset of ICU Collation Customization (tailorings):
  http://icu.sourceforge.net/userguide/Collate_Customization.html
  
  Collation language elements:
  Delimiters:
    space   - skipped
  
  <char> :=  A-Z | a-z | \uXXXX
  
  Shift command:
    <shift>  := &       - reset at this letter. 
  
  Diff command:
    <d1> :=  <     - Identifies a primary difference.
    <d2> :=  <<    - Identifies a secondary difference.
    <d3> := <<<    - Idenfifies a tertiary difference.
  
  
  Collation rules:
    <ruleset> :=  <rule>  { <ruleset> }
    
    <rule> :=   <d1>    <string>
              | <d2>    <string>
              | <d3>    <string>
              | <shift> <char>
    
    <string> := <char> [ <string> ]
  An example, Polish collation:
  
    &A < \u0105 <<< \u0104
    &C < \u0107 <<< \u0106
    &E < \u0119 <<< \u0118
    &L < \u0142 <<< \u0141
    &N < \u0144 <<< \u0143
    &O < \u00F3 <<< \u00D3
    &S < \u015B <<< \u015A
    &Z < \u017A <<< \u017B    
*/

  • mysql/include/m_ctype.h

typedef struct charset_info_st
{
  uint      number;
  uint      primary_number;
  uint      binary_number;
  uint      state;
  const char *csname;
  const char *name;
  const char *comment;
  const char *tailoring;
  uchar    *ctype;
  uchar    *to_lower;
  uchar    *to_upper;
  uchar    *sort_order;
  uint16   *contractions;
  uint16   **sort_order_big;
  uint16      *tab_to_uni;
  MY_UNI_IDX  *tab_from_uni;
  MY_UNICASE_INFO **caseinfo;
  uchar     *state_map;
  uchar     *ident_map;
  uint      strxfrm_multiply;
  uchar     caseup_multiply;
  uchar     casedn_multiply;
  uint      mbminlen;
  uint      mbmaxlen;
  uint16    min_sort_char;
  uint16    max_sort_char; /* For LIKE optimization */
  uchar     pad_char;
  my_bool   escape_with_backslash_is_dangerous;
  
  MY_CHARSET_HANDLER *cset;
  MY_COLLATION_HANDLER *coll;
  
} CHARSET_INFO;

  • mysys/charset.c: CHARSET_INFO *all_charsets[256]

  • mysys/charset.c

    init_available_charsets()

    • mysys/charset-def.c

      init_compiled_charsets()

      • mysys/charset.c

        add_compiled_collation()

    • init_state_maps()

  • mysys/charset.c

    get_charset_by_name()

    • get_collation_number()

      • get_collation_number_internal()

    • get_internal_charset()

  • strings/ctype-uca.c

    create_tailoring()
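
The tailoring syntax quoted from ctype-uca.c above is compact enough to generate mechanically. The sketch below builds one rule of the same shape as the Polish example: a reset at a base letter, a primary difference, then a tertiary difference. The function name tailoring_rule is hypothetical, and this is not how the Sinhala patch itself was produced.

```shell
#!/bin/sh
# tailoring_rule BASE FIRST_HEX SECOND_HEX
# Builds one rule in the syntax quoted from ctype-uca.c:
#   &BASE < \uFIRST <<< \uSECOND
# i.e. reset at BASE, a primary difference for FIRST, then a tertiary
# difference for SECOND. The hex arguments are 4-digit codepoints.
tailoring_rule() {
    printf '&%s < \\u%s <<< \\u%s\n' "$1" "$2" "$3"
}

# Reproduces the first line of the Polish example.
tailoring_rule A 0105 0104
```

A complete tailoring string is simply a sequence of such rules; in MySQL it ends up in the tailoring member of CHARSET_INFO, which create_tailoring() parses.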


4.6.1.6. Testing

  1. Test Swedish (MySQL is a Swedish company) collation algorithm:

    http://en.wikipedia.org/wiki/Swedish_alphabet

    CREATE TABLE ts (
            id SERIAL PRIMARY KEY,
            letter VARCHAR(10) NOT NULL
    ) CHARACTER SET utf8;

    Running this collation handler:

    SELECT * FROM ts ORDER BY letter COLLATE utf8_swedish_ci;

    results in the non-English letters appearing at the end of the sorted alphabet, as expected.

  2. Apply the Sinhala collation patch to MySQL:

  3. Test new Sinhala collation algorithm:

    http://en.wikipedia.org/wiki/Sinhala_alphabet

    CREATE TABLE tl (
            id SERIAL PRIMARY KEY,
            letter VARCHAR(10) NOT NULL
    ) CHARACTER SET utf8;

    Load the data from this file:

    Running this collation handler:

    SELECT * FROM tl ORDER BY letter COLLATE utf8_sinhala_ci;

    results match the Sinhala Character Code for Information Interchange Part 1 : Collation Sequence.

    Download the output file from:

  4. Need to add a Sinhala collation test to the MySQL test suite:

    • mysql/mysql-test/t/ctype_utf8.test

    • mysql/mysql-test/r/ctype_utf8.result
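
The test tables in steps 1 and 3 need some rows before the ORDER BY queries show anything. A minimal sketch for generating them is given below; the function name emit_inserts is hypothetical, the letters are assumed not to contain quote characters, and the script must be saved as UTF-8.

```shell
#!/bin/sh
# emit_inserts TABLE LETTER...
# Prints one INSERT statement per letter for the test tables above.
# Letters are assumed not to contain quotes; no escaping is done.
emit_inserts() {
    table=$1
    shift
    for letter in "$@"; do
        printf "INSERT INTO %s (letter) VALUES ('%s');\n" "$table" "$letter"
    done
}

# A few letters for the Swedish test, deliberately out of order.
emit_inserts ts z a ö å ä
```

The output can be piped into the mysql client, for example with --default-character-set=utf8 and the name of a scratch database, before running the SELECT from step 1. The same function works for the tl table with Sinhala letters.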


4.9. DONE

  • Renderer

    • Create and submit Pango patch - don't implicitly create conjuncts[18]

    • Inform the bengalinux-core team (India) of the implications of the fix to bugzilla.gnome.org's bug 145233 [19]

    • Create and submit Pango patch - Enable touching letters in Sinhala rendering[20]

    • Convince Pango and ICU maintainers to emit ZWJ to the font lookup stage.

    • Provide test files and images for ICU Sinhala support [21]

    • ICU: ZWJ Processing in Sinhala[22]

    • ICU: Implement PR 37 ZWJ/ZWNJ Behavior [23]

    • ICU: Indic Reordering State Table Allows ZWJ Virama ZWJ[24]

    • Port Pango patches, which add Sinhala support, to ICU for immediate use in ICU 3.4[25]

    • Convince Open Office developers to stop filtering ZWJ and ZWNJ - ZWJ: The zero width joiner shouldn't be filtered out [26]

    • Epiphany - Add si (Sinhala) to the list of locales requiring Pango [27]

    • Ubuntu - Add si (Sinhala) to the list of locales requiring Pango [28]

    • Inform of Open Office filtering ZWJ and ZWNJ - Incorrect Bengali rendering of ra+japhala [29]

  • Input Methods

    • Submit XKB keyboard layout to X Keyboard Configuration Database

    • Submit vim keyboard layout to Bram

    • Submit XKB keyboard layout to xorg[30] and xfree86[31]

    • Submit phonetic static keyboard layout to m17n

    • Test and provide feedback on the m17n Wijesekera input method.

  • Fonts

    • The printing problem was due to the font being an OTF font. Once the lklug font was changed to a TTF, the printing problem disappeared.

    • Added a glyph for “Kunddaliya” to LKLUG font.

    • Reorganised glyphs containing “Repaya” and added corresponding lookups to LKLUG font.

  • Standards

    • Amend ISO639 to include 'Sinhala' in the languages list alongside 'Sinhalese'.

    • Report an error in the Sinhala Character Code for Information Interchange - Part 1: Collation Sequence


4.10. TODO

  • Renderer

    • ICU bug that crashes Open Office.

    • Support touching letters in QT Renderer.

    • Pango tries to display a 25CC, which is a dotted circle, when a dependent vowel is displayed without a consonant. However the glyph for 25CC does not appear.

    • See chapter 9 of The Unicode Standard (TUS), which apparently discusses ZWJ/ZWNJ.

  • Input Methods

    • Develop Phonetic & Transliteration standards.

    • Examine SCIM/m17n & implement static & dynamic phonetic keyboard layouts and transliteration schemes.

    • See if XKB can be extended to allow multiple codepoints per keycode.

  • Fonts

    • Learn about OT features/order.

    • Discover why the appearance of the Unicode Sinhala fonts deteriorates at smaller sizes. Is it a smoothing or hinting issue in the font, or an issue with the renderers?

    • Improve the range and correctness of the Unicode Sinhala section in the freefont.

    • Develop a standard lookup table for font developers.

  • Sorting

    • Implement established standards

    • String matching - a consonant followed by dependent vowel 'o' should not match the same consonant followed by dependent vowel 'oo'.

  • Other GNU/Linux Infrastructure

    • Add Sinhala to /usr/include/X11/keysymdef.h

    • OTF printing problem.

    • Surrounding text / Cursor positioning in gedit with the text <gayanna><al-lakuna><hayanna><al-lakuna>

  • Misc

    • Submit corrections to Unicode. e.g. aae Vs aee.

    • UTF-8 should be declared the standard file encoding.

    • Develop Sinhala IPA transliteration for documents

    • Develop Sinhala literary transliteration for documents

    • English Locale for Sri Lanka


5. Resources/Links


6. Conclusion

We have made significant progress in providing Sinhala support in GNU/Linux. There are still areas that require attention before GNU/Linux can be deployed more widely in Sri Lanka.

Notes

[1]

http://download.fedora.redhat.com/pub/fedora/linux/core/3/i386/os/RELEASE-NOTES-en.html

[2]

http://fedora.redhat.com/docs/release-notes/fc5/

[3]

http://marc.theaimsgroup.com/?t=106354110900001&r=1&w=2

[4]

http://bugzilla.gnome.org/show_bug.cgi?id=153517

[5]

http://bugzilla.gnome.org/show_bug.cgi?id=161981

[6]

http://bugzilla.gnome.org/show_bug.cgi?id=302577

[7]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=4298

[8]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=4711

[9]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=5057

[10]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=5501

[11]

http://cvs.fedora.redhat.com/viewcvs/*checkout*/rpms/firefox/devel/firefox.sh.in?rev=1.8

[12]

http://cvs.fedora.redhat.com/viewcvs/*checkout*/rpms/firefox/devel/firefox.sh.in?rev=1.11

[13]

http://mail.lug.lk/lurker/message/20060410.130454.19cefb01.en.html

[14]

http://www.openoffice.org/issues/show_bug.cgi?id=68047

[15]

http://www.openi18n.org/modules.php?op=modload&name=Sections&file=index&req=viewarticle&artid=103&page=1

[16]

http://www.m17n.org/

[17]

http://www.scim-im.org/

[18]

http://bugzilla.gnome.org/show_bug.cgi?id=161981

[19]

http://sourceforge.net/mailarchive/forum.php?thread_id=6637263&forum_id=12023

[20]

http://bugzilla.gnome.org/show_bug.cgi?id=302577

[21]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=4298

[22]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=4710

[23]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=4711

[24]

http://dev.icu-project.org/cgi-bin/icu-bugs?findid=5057

[25]

http://www.redhat.com/archives/fedora-cvs-commits/2006-May/msg00126.html

[26]

http://qa.openoffice.org/issues/show_bug.cgi?id=68047

[27]

http://bugzilla.gnome.org/show_bug.cgi?id=361538

[28]

https://launchpad.net/distros/ubuntu/+source/firefox/+bug/66270/

[29]

https://launchpad.net/distros/debian/+source/icu/+bug/35085

[30]

https://bugs.freedesktop.org/show_bug.cgi?id=1850

[31]

http://bugs.xfree86.org/show_bug.cgi?id=1509