Latin as a complex script
October 31st, 2007 by Andrew Cunningham
Current font technologies and font rendering systems distinguish between scripts that require complex rendering and those that do not. In most cases Latin is treated as a non-complex script. For a range of African and South East Asian languages written in the Latin script, however, Latin needs to be treated as a complex script.
In simple font architectures, e.g. TrueType, there is a one-to-one relationship between each glyph in a font and an encoded character. A number of font formats building on TrueType provide support for more complex glyph and character relationships.
| Vendor | Font format | Rendering technology |
| --- | --- | --- |
| Adobe | OpenType | Rendering support built into applications |
| Microsoft | OpenType | OTLS and Uniscribe |
| Apple | Apple Advanced Typography (AAT) | Apple Type Services for Unicode Imaging (ATSUI) |
| SIL International | Graphite | Graphite |
It is possible to develop a single font that contains OpenType, Graphite and AAT tables. John Hudson’s article Windows Glyph Processing provides a good introduction to complex rendering on the Windows platform.
Complex rendering includes support for:
- diacritic positioning
- contextual shaping
- reordering and splitting
- bidirectionality
The distinction between a complex script and a non-complex script is that a complex script requires complex rendering in order to make text legible and readable. Many Middle Eastern, South Asian and South East Asian scripts are complex scripts.
Richard Ishida provides a useful introduction to writing systems, and SIL has a page providing examples of complex rendering.
Latin and Cyrillic scripts, which are usually considered as non-complex scripts, can use complex rendering to provide sophisticated typographic rendering. Such rendering is not essential to the intelligibility and readability of the text, and few applications provide support for these advanced features.
When you turn to Africa you can find languages that use the Latin script in ways that require the Latin script to be treated as a complex script. These languages have alphabets with letters that can only be represented by a base character and one or more combining diacritics. These letters are not available as a single character in Unicode, e.g.
| Letter | Code points (NFC) | Code points (NFD) |
| --- | --- | --- |
| ɛ̱̈ | U+025B U+0331 U+0308 | U+025B U+0331 U+0308 |
| Ɛ̱̈ | U+0190 U+0331 U+0308 | U+0190 U+0331 U+0308 |
| ọ́ | U+1ECD U+0301 | U+006F U+0323 U+0301 |
| Ọ́ | U+1ECC U+0301 | U+004F U+0323 U+0301 |
| ɔ̃́ | U+0254 U+0303 U+0301 | U+0254 U+0303 U+0301 |
| Ɔ̃́ | U+0186 U+0303 U+0301 | U+0186 U+0303 U+0301 |
| ŋ̀ | U+014B U+0300 | U+014B U+0300 |
| Ŋ̀ | U+014A U+0300 | U+014A U+0300 |
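The code points in the table can be checked with a few lines of Python. This is only a sketch using the standard `unicodedata` module; the helper name `code_points` and the `SAMPLES` list are illustrative choices, not part of any tool mentioned in this post. The letters are written as escapes so that the sequence being tested is unambiguous.

```python
import unicodedata


def code_points(text: str, form: str) -> str:
    """Normalize `text` to `form` ('NFC' or 'NFD') and list its code points."""
    normalized = unicodedata.normalize(form, text)
    return " ".join(f"U+{ord(ch):04X}" for ch in normalized)


# Lowercase sample letters from the table (hypothetical list name).
SAMPLES = [
    "\u025b\u0331\u0308",  # open e + macron below + diaeresis
    "\u1ecd\u0301",        # o with dot below + acute
    "\u0254\u0303\u0301",  # open o + tilde + acute
    "\u014b\u0300",        # eng + grave
]

for letter in SAMPLES:
    # One row per letter: the letter, its NFC code points, its NFD code points.
    print(letter, "|", code_points(letter, "NFC"), "|", code_points(letter, "NFD"))
```

Only the letter built on o with dot below differs between the two forms, because Unicode has a precomposed character for the base plus its first diacritic but not for the full stack.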
These characters should render as:

Sample of Latin characters with combining diacritics - Charis SIL (Windows XP SP2 and Vista) - Firefox 2.0.0.7
The class used for the presentational styling of the sample characters, on this page, uses the CSS rule:
.sampleLatin {font-family: Charis SIL,Doulos SIL,African Serif,serif;}
An alternative CSS rule could be:
.sampleLatinSans {font-family: African Sans,sans-serif;}
For the sample text to display correctly you require a “smart font” that supports glyph positioning of diacritics relative to a base character and the positioning of one diacritic relative to a second diacritic. An appropriate font rendering system and applications that can use the font rendering system and the “smart font” are also required.
Microsoft introduced support for combining diacritics and diacritic stacking in two stages. The first was limited diacritic support in Windows 2000, added in order to support Vietnamese. Microsoft’s Vietnamese keyboard layouts output a mix of precomposed characters and combining diacritics: the five Vietnamese tone marks are entered as combining diacritics, while the additional vowel characters that use diacritics are represented by precomposed characters. For example, with the Microsoft keyboard layout the character ế is represented as two characters, ê (U+00EA, e-circumflex) followed by a combining acute (U+0301), rather than as the single precomposed character ế (U+1EBF).
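One practical consequence is that text typed with such a keyboard layout and text stored in precomposed form differ as raw strings even though they are canonically equivalent. A minimal sketch in Python, using the standard `unicodedata` module (the function name `canonically_equal` is an illustrative assumption):

```python
import unicodedata

keyboard_output = "\u00ea\u0301"  # e-circumflex followed by combining acute
precomposed = "\u1ebf"            # the same letter as a single code point


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(keyboard_output == precomposed)                   # raw comparison
print(canonically_equal(keyboard_output, precomposed))  # comparison after NFC
```

Searching, sorting, or comparing such text therefore works more reliably when both sides are normalized first.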
Languages that use a subset of the Vietnamese repertoire are supported from Windows 2000 onwards using the Microsoft Windows core fonts.
Full combining diacritic support was released in the version of Uniscribe that was included in Service Pack 2 for Windows XP. Complex script support is not active by default. It is necessary to enable supplemental language support for complex script and right-to-left languages within Windows XP.
No appropriate fonts were shipped with Windows XP. The core fonts in Windows Vista have been updated and include support for a range of combining diacritics and base characters. Times New Roman v. 5.0.1 ships with Windows Vista. This version of Times New Roman will display the sample text correctly.

Sample of Latin characters with combining diacritics - Times New Roman v. 5.0.1 (Windows Vista) - Firefox 2.0.0.7
If we look at the same text using Times New Roman v. 3.0.6 in Internet Explorer 6, we will see:

Sample of Latin characters with combining diacritics - Times New Roman v. 3.0.6 (Windows XP SP2) - Internet Explorer 6.0 SP2
Several issues can be observed:
- The older version of Times New Roman does not support all the necessary Extended Latin characters.
- The tilde and acute accents do not stack relative to each other, but rather overstrike.
- Acute and grave diacritics are placed in a single location rather than being positioned relative to each base character.
The same text using the same version of Times New Roman looks quite different in Firefox.

Sample of Latin characters with combining diacritics - Times New Roman v. 3.0.6 (Windows XP SP2) - Firefox 2.0.0.7
Internet Explorer will use Times New Roman for all the characters and will display a missing glyph symbol for characters that are not supported by the font. Firefox, on the other hand, will switch fonts in order to display the unsupported characters, so you end up with a ransom note effect.
OpenType
There are two specific OpenType features that are critical for the sample text above:
- Mark to base positioning
- Mark to mark positioning
In this context mark refers to the glyph used to display a combining diacritic.
The mark to base positioning (mark) feature positions the diacritic glyph relative to the base glyph. The diacritic can be attached either to a single character or to a ligature. The mark to mark positioning (mkmk) feature positions the diacritic in relation to another diacritic.
Further information is available in the OpenType specification.
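The idea behind both features can be illustrated with a little arithmetic. The sketch below is a deliberately simplified model, not how any rendering engine is actually implemented: it assumes every glyph carries attachment points (anchors) measured from its own origin, and that a mark is drawn so that its anchor coincides with the anchor of the glyph it attaches to. All names and coordinates are hypothetical and not taken from any real font.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: int
    y: int


def attach(parent_origin: Point, parent_anchor: Point, mark_anchor: Point) -> Point:
    """Return the origin for a mark so that its anchor lands on the parent's anchor.

    Both anchors are expressed relative to the origin of their own glyph.
    """
    return Point(
        parent_origin.x + parent_anchor.x - mark_anchor.x,
        parent_origin.y + parent_anchor.y - mark_anchor.y,
    )


# Hypothetical anchor data in font units.
base_origin = Point(0, 0)
base_top = Point(250, 480)    # where a mark above attaches on the base glyph
tilde_attach = Point(0, 500)  # the tilde's own attachment point
tilde_top = Point(0, 700)     # where a further mark attaches on top of the tilde
acute_attach = Point(0, 500)  # the acute's own attachment point

# Mark to base: the tilde is placed relative to the base glyph.
tilde_origin = attach(base_origin, base_top, tilde_attach)

# Mark to mark: the acute is placed relative to the tilde, not the base.
acute_origin = attach(tilde_origin, tilde_top, acute_attach)

print(tilde_origin, acute_origin)
```

Without the second step the acute would be placed against the base glyph's anchor as well, which is the overstriking seen in the older Times New Roman sample.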
Web development issues
The core issue with developing Latin script web sites where the languages require combining diacritic support is the lack of suitable fonts on most operating systems. Although the core fonts on Windows Vista support combining diacritics, the versions of these fonts on other operating systems do not contain the same OpenType features. Unless you are only targeting Windows Vista users, it is best to avoid referencing the core Windows fonts.
It is necessary to specify appropriate fonts in your website’s stylesheet. It is also worthwhile having a help page or FAQ which provides download links for the fonts used. Also consider using language specific styling.
For optimal display of text using combining diacritics, it is recommended to normalize the text using Unicode Normalization Form C.
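As a rough sketch of what that normalization step might look like in a publishing script, the Python below uses the standard `unicodedata` module. The helper names `to_nfc` and `needs_normalizing` are assumptions made for illustration. Besides composing characters where a precomposed form exists, normalization also puts combining marks into canonical order, which matters when diacritics were typed in a different sequence.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return `text` in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def needs_normalizing(text: str) -> bool:
    """True if `text` would change when normalized to NFC."""
    return to_nfc(text) != text


decomposed = "o\u0323\u0301"  # o + dot below + acute
misordered = "o\u0301\u0323"  # same marks typed in the opposite order

for sample in (decomposed, misordered):
    fixed = to_nfc(sample)
    # Show whether the sample changed and the resulting code points.
    print(needs_normalizing(sample), [f"U+{ord(ch):04X}" for ch in fixed])
```

Both inputs end up as the same two code points, so text that looks identical on screen is also stored identically.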
Downloads
The following fonts are OpenType or Graphite fonts that contain combining diacritic support. Check the fonts for suitability for your language.
- African Sans (OpenType)
- African Serif (OpenType)
- Charis SIL (Graphite and OpenType)
- Code2000 (OpenType)
- DejaVu Sans (OpenType)
- Doulos SIL (Graphite and OpenType)
Misdirections
It is common to find inappropriate fonts being used for Latin complex script text. Fonts that are often used include Arial Unicode MS and Lucida Sans Unicode. Lucida Sans Unicode is a TrueType font. Arial Unicode MS has OpenType tables but doesn’t include OpenType features for the Latin script.
Possibilities
You may come across websites that use inappropriate fonts for some African languages. Using Firefox and the Stylish extension, it is possible to write a set of CSS rules that change the fonts used to render the web page.
In a previous post I gave an example where email messages in a Gmail account would be displayed using the Charis SIL font. It is possible to use this technique to override the fonts used to display any page, allowing you to correctly render combining diacritics when the web developer used inappropriate fonts.