Getting Indic language input working in Linux

नमस्ते. வணக்கம்.

Getting multi-language support working on Linux always seemed out of reach. Websites usually work out of the box, but what about input? What about the terminal? Your editor?

I have finally figured this out, and this post covers everything that needs to be done. Note: I assume that you are running X11. I am not sure how these steps carry over to Wayland.

1. Getting fonts set up

Luckily this part is quite straightforward. A lot of fonts today support Indic languages and most likely your Linux install already comes with a few.
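If you want to check which installed fonts claim to cover a given language, fontconfig can tell you. Here is a minimal Python sketch, assuming fontconfig's fc-list tool is installed; the helper names fc_list_command and fonts_for_language are made up for this post.

```python
import shutil
import subprocess


def fc_list_command(lang_code):
    """Build the fc-list call that lists font families covering a language."""
    # ":lang=hi" is a fontconfig pattern; "family" limits output to family names.
    return ["fc-list", f":lang={lang_code}", "family"]


def fonts_for_language(lang_code):
    """Return sorted font family names for lang_code, or [] if fc-list is missing."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        fc_list_command(lang_code), capture_output=True, text=True, check=False
    )
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})


if __name__ == "__main__":
    for code in ("hi", "ta"):  # language codes for Hindi and Tamil
        print(code, fonts_for_language(code))
```

An empty list for a language means either that fc-list is not available or that no installed font advertises that language.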

2. Viewing Indic fonts

Browsers work out of the box, so that's something.

If you are using an all-in-one Desktop Environment like Gnome or KDE, most apps should support Indic languages out of the box. This includes things like GEdit. Emacs also seems to support Indic fonts out of the box.

The bad news is terminals. Most terminal emulators do not support Indic languages. Konsole apparently does, but I have not checked this. mlterm is another option, but I have not been able to configure it to work correctly.

If you go further afield to things like a standalone WM, support gets worse.

The main issue seems to be that vowel + consonant combinations form entirely new shapes in most Indic languages, and these shapes don't have a dedicated Unicode codepoint. The library "HarfBuzz" seems to be the major solution at the text shaping level.
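To see what "no dedicated codepoint" means in practice, here is a small Python sketch (my own illustration, using only the standard unicodedata module) that lists the codepoints behind a consonant + vowel-sign pair in Devanagari and in Tamil. The describe helper is a name I made up.

```python
import unicodedata


def describe(text):
    """Return (codepoint, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Devanagari "ki": the vowel sign is drawn to the LEFT of the consonant,
# but it is stored AFTER it, as a separate codepoint.
devanagari_ki = "\u0915\u093F"

# Tamil "si": the vowel sign attaches to the consonant, again as its own codepoint.
tamil_si = "\u0B9A\u0BBF"

for text in (devanagari_ki, tamil_si):
    print(text, "is", len(text), "codepoints")
    for codepoint, name in describe(text):
        print("   ", codepoint, name)

# Unicode normalization does not merge either pair into a single codepoint,
# so the combined shape has to come from the text shaper and the font.
assert unicodedata.normalize("NFC", devanagari_ki) == devanagari_ki
assert unicodedata.normalize("NFC", tamil_si) == tamil_si
```

An application that draws codepoints one by one, without a shaping step, will get these wrong. That is my understanding of why so many terminals struggle.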

Interestingly, programming ligatures, which introduce fancy symbols for multi-character sequences such as == and >=, follow the same principle and are often implemented using HarfBuzz. This means that if you can find patches that add ligatures to your software, Indic languages should also be supported. I have not tested this out personally yet.

3. Writing in Indic Languages

Instead of using Gnome (or other DE-based approaches), you can set this up directly at the X11 layer. A lot of this is taken from here: https://simpleit.rocks/linux/switch-keyboard-layouts/

To query your current keyboard setup, run:

setxkbmap -query

Note down these default values; you will reuse them for the first entry in the command below.
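If you would rather capture those values in a script than copy them by hand, the output is easy to parse. A minimal Python sketch follows; the helper names are my own, and the SAMPLE text is typed in by hand for illustration, so your output will differ.

```python
import subprocess


def parse_xkb_query(output):
    """Turn 'key:   value' lines from `setxkbmap -query` into a dict."""
    settings = {}
    for line in output.splitlines():
        # Split on the first colon only, since option values contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def current_xkb_settings():
    """Run `setxkbmap -query` (needs a running X session) and parse the result."""
    result = subprocess.run(
        ["setxkbmap", "-query"], capture_output=True, text=True, check=True
    )
    return parse_xkb_query(result.stdout)


# Hand-written sample, not captured from a real machine.
SAMPLE = """\
rules:      evdev
model:      pc105
layout:     us
"""

print(parse_xkb_query(SAMPLE))
```

On a real X11 session, current_xkb_settings()["model"] would give you the value to pass to -model.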

The command you need is this:

setxkbmap -model pc105 -layout in,in,in -variant eng,tam,hin-kagapa -option grp:caps_toggle

Use the -model value from your current setup. You can have up to 4 layouts configured. Each layout is a combination of one entry in the -layout field and one entry in the -variant field, matched one to one by position. You can leave a variant entry empty (for example, -variant ,tam,hin-kagapa) to use the default variant of the corresponding layout.
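The one-to-one pairing is easier to get right if you write the layouts down as pairs and let a script join them. Here is a rough Python sketch. The build_setxkbmap_args helper is hypothetical, and the "us" first entry is just an assumption about what your current layout might be.

```python
import shlex


def build_setxkbmap_args(model, layouts, toggle_option=None):
    """Build a setxkbmap argument list from (layout, variant) pairs.

    An empty variant string means "use the layout's default variant".
    """
    if not 1 <= len(layouts) <= 4:
        raise ValueError("expected between 1 and 4 layouts")
    names = ",".join(layout for layout, _ in layouts)
    variants = ",".join(variant for _, variant in layouts)
    args = ["setxkbmap", "-model", model, "-layout", names, "-variant", variants]
    if toggle_option:
        args += ["-option", toggle_option]
    return args


# Assumed setup: keep "us" with its default variant, then add Tamil and Hindi.
pairs = [("us", ""), ("in", "tam"), ("in", "hin-kagapa")]
print(shlex.join(build_setxkbmap_args("pc105", pairs, "grp:caps_toggle")))
```

The script only prints the command. You can paste it into a terminal, or pass the list to subprocess.run once you are happy with it.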

Details of all fields are found in the file /usr/share/X11/xkb/rules/base.lst. All Indic languages can be found under the layout "in" as variants.
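Rather than scrolling through that file, you can pull out just the variants of one layout. The Python sketch below assumes the file is split into sections that start with "! variant" and similar headers, and that each variant line looks like "name  layout: description". That is how the file looks on my understanding, but treat the format as an assumption and check your own copy. The function name is made up.

```python
from pathlib import Path

BASE_LST = Path("/usr/share/X11/xkb/rules/base.lst")


def variants_for_layout(text, layout):
    """Return {variant: description} for one layout from base.lst-style text."""
    variants = {}
    section = None
    for line in text.splitlines():
        if line.startswith("!"):
            section = line[1:].strip()  # e.g. "layout", "variant", "option"
            continue
        if section != "variant" or not line.strip():
            continue
        # Assumed line shape: "  tam             in: Tamil (InScript)"
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        name, rest = parts
        owner, sep, description = rest.partition(":")
        if sep and owner.strip() == layout:
            variants[name] = description.strip()
    return variants


if __name__ == "__main__" and BASE_LST.exists():
    found = variants_for_layout(BASE_LST.read_text(encoding="utf-8"), "in")
    for name, description in sorted(found.items()):
        print(f"{name:28} {description}")
```

Passing a different layout name should list the variants for that layout in the same way.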

Some languages have multiple layouts. You can choose the "phonetic" option, which is much easier to use from an English-layout keyboard since the sounds are mapped to similar-sounding English letters.

The last field, -option, allows you to set up a key shortcut to switch between layouts. Available options are also listed in the above file. In my example, I have set it to CapsLock, which is otherwise pretty useless. In case you want to use Caps as it normally works, just hit <Shift> + <CapsLock>.

4. Using the key layouts

So, you have only English labels on your keyboard. How do you know what key maps to what?

You have 2 options: gkbd-keyboard-display and the Internet.

The former obviously needs Gnome. You can run it with -g #num to show the layout in slot #num of your configuration. There is one problem with this: it does not display some languages correctly due to weird spacing issues. Also, it takes too much space for what it does. This is a known problem, but it is not going to be fixed.

The other option is the Internet. A quick search gets you images of all layouts, which you can use as a reference. This, for example, is the one for Devanagari, used for Hindi/Sanskrit:

Figure 1: in:hin-kagapa layout

This is the example for Tamil:

Figure 2: in:tam layout

Now, how do you read these images? For each key:

Bottom-Left is what you normally get.
Top-Left is <key> + <Shift> - like normal capital keys.
Bottom-Right is <key> + <AltGr> (usually, the right alt).
Top-Right is <key> + <Shift> + <AltGr>.

Now, on phonetic keyboards, this layout makes a lot of sense. For Hindi, for example, s gets you स, while <Shift>+s gets you श.

Now, wherever you see an empty circle on a key, it is a placeholder for the preceding consonant. This is usually only used for vowels that follow consonants. So, for example, in Hindi, the key a adds the आ sound to the preceding consonant, so typing r, a, m gets you राम, making it very easy to type. If you wanted just the vowel अ, you would have to type a + <AltGr>.

We also have a special key for half-consonants. So to input श्याम, you would have to use:

S for श,
f to make the previous consonant into a half-consonant: श्,
y to add the य, श्य
a to lengthen the vowel on the previous consonant, श्या
m to complete the word, श्याम

For an example in Tamil, to input சிவன்:

Type ; to start with the ச consonant
Add the "i" vowel sign using f, சி
Add the "v" and "n" consonants using b and V, சிவன
Make the previous consonant into a half one using d, சிவன்
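The two walkthroughs above can be replayed in a few lines of Python, which also shows that each keystroke adds exactly one codepoint. This is only a sketch: the two tables contain just the keys mentioned in this post, the real layouts define many more, and type_keys is a made-up helper.

```python
# Only the keys mentioned in this post, for the in:hin-kagapa layout.
HIN_KAGAPA = {
    "S": "\u0936",  # DEVANAGARI LETTER SHA
    "f": "\u094D",  # DEVANAGARI SIGN VIRAMA (makes a half-consonant)
    "y": "\u092F",  # DEVANAGARI LETTER YA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "r": "\u0930",  # DEVANAGARI LETTER RA
}

# Only the keys mentioned in this post, for the in:tam layout.
TAM = {
    ";": "\u0B9A",  # TAMIL LETTER CA
    "f": "\u0BBF",  # TAMIL VOWEL SIGN I
    "b": "\u0BB5",  # TAMIL LETTER VA
    "V": "\u0BA9",  # TAMIL LETTER NNNA
    "d": "\u0BCD",  # TAMIL SIGN VIRAMA (makes a half-consonant)
}


def type_keys(keys, keymap):
    """Map each keystroke to its character and join them in typing order."""
    return "".join(keymap[key] for key in keys)


for keys, keymap in (("Sfyam", HIN_KAGAPA), ("ram", HIN_KAGAPA), (";fbVd", TAM)):
    word = type_keys(keys, keymap)
    print(keys, "->", word, [f"U+{ord(ch):04X}" for ch in word])
```

Note that the stored order is always the typing order. Any reordering or merging you see on screen is done when the text is rendered.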

Non-phonetic layouts are indeed very difficult to use without key labels, but having the layout image helps to some extent.

Finally, as a bonus section, some Devanagari symbols need to be typed in using multiple letters that are automatically condensed into one. Specifically:

The "श्र" symbol (like in श्रीराम), needs you to type in "श्" followed by "र"
The "त्र" symbol (like in त्रिमूर्ति) is formed by typing "त्" followed by "र"
The "ज्ञ" symbol (like in ज्ञानं) is formed by a "ज्" followed by "ञ"
The "क्ष" symbol (like in महालक्ष्मी) is formed by a "क्" followed by "ष"
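The four symbols above all follow the same recipe: first consonant, then the half-consonant mark (the virama), then the second consonant. A short Python sketch makes the pattern explicit. The romanized labels and the conjunct helper are my own naming, not anything standard.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, the half-consonant mark


def conjunct(first, second):
    """Join two consonants with a virama between them."""
    return first + VIRAMA + second


CONJUNCTS = {
    "shra": conjunct("\u0936", "\u0930"),  # SHA + virama + RA
    "tra": conjunct("\u0924", "\u0930"),   # TA + virama + RA
    "gya": conjunct("\u091C", "\u091E"),   # JA + virama + NYA
    "ksha": conjunct("\u0915", "\u0937"),  # KA + virama + SSA
}

for label, text in CONJUNCTS.items():
    print(label, text, " ".join(f"U+{ord(ch):04X}" for ch in text))
```

Each entry is three codepoints long. The single combined symbol you see on screen comes from the font and the shaping library.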

That's it. Have fun setting up your own Linux machines to be Indic language compatible!