Introduction Control Panel The Tests Notes My Fonts

Introduction

Purpose

This page provides a set of test cases for a Tai Tham renderer. It has been compiled with a view to putting the 'Universal Shaping Engine' through its paces for the Tai Tham script. This set of tests is incomplete in that it does not directly give the correct renderings, although someone in possession of the source documents could visually check them.

I was originally requested to provide words of one syllable for such a test. By syllable, I understand an Indic syllable of the form C+(M*V*M*C*M*)* with a single base consonant. (M = miscellaneous mark). I include cases where the rôle of the base consonant is played by something other than a letter. The post-vocalic consonants occur not only in other SEA Indic scripts such as the Khmer script and Lao script (as in the use of ຽ U+0EBD LAO SEMIVOWEL SIGN NYO in the Lao writing system), but also in Tibetan.

However, many of the interesting cases occur in the second syllable of a word, and certain initial syllables are obligatorily followed by more characters of the word. I have therefore also supplied longer words when a conceivable problem would not appear in a word of one syllable.

The dependent vowel AA (U+1A63 and U+1A64) may form the base of its own little stack of dependent marks. Manual line breaking may also separate it from its base consonant. I have nevertheless counted it as part of the same syllable as a base consonant; the two stacks frequently interact in Northern Thai, with MAI KANG migrating to or towards the base consonant and interacting with its dependents.

The page was originally set up to use either my own stick font, 'Da Lekh', which is based on DejaVu Sans, or the cut-down version, 'Da Lekh Seri'. The Da Lekh font is intended to be suitable for use in preparing (but perhaps not publishing) Tai Tham text. It therefore includes work-arounds for known rendering engine problems. The Da Lekh Seri font deliberately does not include such work-arounds. You may be interested in using or examining my fonts for your own purposes.

I have added two other families of fonts. These fonts are available under the SIL Open Font License. The font 'A Tai Tham KH' relies only on the ccmp feature being enabled; it handles all Indic rearrangement itself.

The Hariphunchai font is an OpenType Layout font that looked promising when used with the South-East Asian shaper of HarfBuzz. Development seems to have drastically slowed when HarfBuzz switched Tai Tham to its implementation of the Universal Shaping Engine (USE). The code for this font is available on SourceForge and there is further documentation elsewhere. I have added work-arounds and a few further touches to enable it to work under the USE; I have dubbed the resulting font 'Lamphun'. I have included two versions in the menus, the 2014 version used for Lamphun, dubbed 'early Hariphunchai', and the latest (2019) version, dubbed 'Hariphunchai4'.

Feel free to adapt this web page to add your own fonts and test cases.

Content and Layout

The test text is given in the table columns headed 'Text', and is the content of the first table cell in table rows with class tst1. Two further columns, headed 'Encoding' and 'Hacked via ASCII', are automatically derived from this text as the page is loaded. The hacked column is intended to show users how the text should look, though it too may suffer from rendering engine limitations. The font used for this column is the member of the Da Lekh font family last selected to display the first column. Ideally, I would include images of the text from credible sources, but that may cause copyright problems, for the Unicode Consortium wishes to be able to use this document for commercial purposes.
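The script that fills the 'Encoding' column is part of the page source and is not reproduced here. As a rough illustration only, a comparable listing of code points can be derived from a test string as follows. The function name `encoding_column` and the 'U+XXXX NAME' output format are assumptions made for this sketch, not the page's actual format.

```python
import unicodedata

def encoding_column(text):
    """Return one 'U+XXXX NAME' entry per code point (hypothetical format)."""
    entries = []
    for ch in text:
        # Use a placeholder if this Python's Unicode database has no name for ch.
        name = unicodedata.name(ch, "<unnamed>")
        entries.append(f"U+{ord(ch):04X} {name}")
    return entries

# HIGH KA followed by SIGN AA, written with escapes so the order is explicit.
for entry in encoding_column("\u1A20\u1A63"):
    print(entry)
```

Listing code points one per entry keeps the logical order of combining marks visible, which is what matters when comparing alternative encodings of the same word.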

The 'Hacked via ASCII' column contains an unambiguous transliteration to ASCII of the Tai Tham text in the column headed 'Text'. Members of the Da Lekh font family contain an OpenType font feature, Stylistic Set 2, which, when enabled, may cause the font to render the transliteration as the original Tai Tham text, as it is intended to be rendered. For more details, see the style sheet in the source of this page.

The 'Meaning and Pronunciation' column is given to identify the word given as an example. There may be better glosses, and pronunciation can vary extensively within a nominal language. The pronunciation of the letter RA is particularly variable, ranging between /l/, /h/ and even /r/, and there are regional variations as to whether vowel length distinctions exist and, if so, whether they are phonemic. For the Tai languages the pronunciation is given using IPA, while Pali is simply transliterated (as Pali). I have omitted tone, as phonetic tone is also quite variable. Where no indication to the contrary is given, the Tai pronunciation given approximates that of Chiangmai.

The test words may, in principle, be extracted quite simply from this web page. Each test 'word' is the content of the first cell in each row whose class is tst1. For convenience, I have extracted the first two cells in such rows, along with titles, to a CSV file. Rows where there is a plausible case for treating the encoding used as erroneous are marked in pink. (Their CSS class is tst2.) For completeness, I have included the alternative encodings which the Universal Shaping Engine (USE) calls for, when they are defensible encodings; these rows have an orange background and CSS class tst3. The USE encoding is not well supported by fonts and is not robust to alternative classifications of combining marks.
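The extraction described above can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions: the helper name `extract_test_words` is hypothetical, the rows are assumed to use explicit closing tags for their cells, and nested tables are not handled. HTML comments are skipped by the parser, so they are not mistaken for test words.

```python
from html.parser import HTMLParser

class _FirstCellCollector(HTMLParser):
    """Collect the text of the first cell of each <tr> carrying a given class."""

    def __init__(self, row_class):
        super().__init__()
        self.row_class = row_class
        self.words = []
        self._in_row = False
        self._cells_seen = 0
        self._buffer = None  # list of text chunks while inside a first cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._in_row = self.row_class in classes
            self._cells_seen = 0
        elif tag in ("td", "th") and self._in_row:
            self._cells_seen += 1
            if self._cells_seen == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._buffer is not None:
            # Strip ASCII whitespace only, so a deliberate no-break space survives.
            self.words.append("".join(self._buffer).strip(" \t\r\n"))
            self._buffer = None
        elif tag == "tr":
            self._in_row = False
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

def extract_test_words(html, row_class="tst1"):
    """Return the first-cell text of every row whose class list contains row_class."""
    parser = _FirstCellCollector(row_class)
    parser.feed(html)
    parser.close()
    return parser.words
```

Passing "tst2" or "tst3" as `row_class` would collect the rows marked as questionable or as USE alternatives instead.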

The HTML comments within this web page should not be construed as holding test words.

Font Testing

This page is intended as a rendering engine test, rather than as a font test. However, you may modify this page to try out your own font. The necessary changes will be confined to the style sheet in the source code of this page, unless you use a different ASCIIfication scheme, in which case look at the usage of the JavaScript variable ss02_hack.

My Rendering Performance

When this page was initially composed, in June 2015, the Da Lekh font mostly worked for the Tai Tham script in the Firefox and Chrome browsers. It worked in them because they use HarfBuzz to render the Tai Tham script. Since then, the HarfBuzz rendering engine used for Tai Tham has been brought into line with the Universal Shaping Engine, with a consequent dramatic fall in the rendering performance for the Tai Tham script.

The solution to this problem was to add numerous work-arounds to the font. These work-arounds have mostly restored performance, the main exceptions being subtle positioning errors where mark to base positioning is ignored and the default mark position is used instead.

The quality of the 'Hacked via ASCII' column varies from browser to browser and from operating system to operating system, and also varies over time. For Internet Explorer 11, Microsoft Edge and for the HarfBuzz-based browsers Firefox and Chrome, it is actually the best rendered column. (Script-specific rendering engines have a tendency to make the achievement of advanced script features difficult rather than easy; Tai Tham has many 'advanced' features.)

Font Peculiarities

Traditionally, consonants that are used in neither Pali nor Sanskrit had no subscript forms. One significant textbook, the 'big blue book', does provide a subscript form for LOW FA for use in loans from English, but this form is cramped and ugly, which goes against the tradition of Lanna script writing. The MFL treats the stroke distinguishing HIGH KXA, LOW KXA, LOW SA, LOW FA and LETTER UU from HIGH KHA, LOW KA, LOW CA, LOW PA and LETTER U as a diacritic. The Da Lekh font follows this interpretation, and leaves this diacritic above the baseline when the letter is subscripted.

If you have difficulty reading the Da Lekh fonts, you may find it useful to consult their glyph gallery.


Control Panel

Font Selection

Da Lekh Seri Da Lekh
Da Lekh Si Seri Da Lekh Si
Da Lekh Seri (transliterated form) Da Lekh (transliterated form)
Da Lekh Si Seri (transliterated form) Da Lekh Si (transliterated form)
If using Lamphun or one of the four fonts above, the font is at Version . (The version number is stored in the character U+EAE7 in addition to the font file's name and head tables.)
System default Guest font
A Tai Tham KH (ccmp defaults) A Tai Tham KH (ccmp enabled)
Early Hariphunchai Lamphun
Hariphunchai 4 (2019)

Specimen text: ᩋᩁᩉᨶ᩠ᨲᩮᩣ arahanto 'arahats'

Play Area

You may type your own text in the area below. It will, if possible, be displayed in the font selected above.

Table Completion

It is possible that the columns recording code points and showing how the text should look may not have been generated. If that happens, try clicking this control button:

Debugging messages

Tests

Vowel Combinations
Other Explicit Coding Sequences
Other Examples from L2/07-007R
Mai Kang Lai
Ligature NAA
English Loanwords
Tai Lü Blends
More Tai Lü
And Not
Potential Surprises
Test and Tell
Rendering Challenges from MFL

Vowel Combinations

These vowel combinations are taken from Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable). Changes have been rung on the initial consonants to check for silly omissions.

A hyphen in the pronunciation indicates a syllable-final consonant that would be specified by a subscript consonant or following orthographic syllable.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩫ N/A
/ko-/
Section 5 No. 1. This sequence does not form a whole word. An example may be seen in a word for 'danger'.
ᨣᩴ then, and
/kɔː/
Section 5 No. 2
ᨧᩢ (irrealis marker)
/tɕaʔ/
Section 5 No. 4
ᨲ᩠ᩅᩫᩡ to prevaricate
/tuaʔ/
Section 5 No. 5
ᨷ᩠ᩅᩫ lotus
/bua/
Section 5 No. 6
ᨠ᩠ᩅ N/A
/kua-/
Section 5 No. 7. This sequence does not form a whole word. An example may be seen in one of the words for 'big'.
ᨡᩬᩴ to request
/kʰɔː/
Section 5 No. 8
ᨠᩬ N/A
/kɔː-/
Section 5 No. 9. This sequence does not form a whole word. An example may be seen in the fuller spelling of the word for 'belongings'.
ᨦᩡ to split up
/ŋaʔ/
Section 5 No. 10
ᨠᩣ crow
/kaː/
Section 5 No. 11
ᨴᩤ to paint
/taː/
Section 5 No. 12
ᩌᩣᩴ to sprinkle
/ham/
Section 5 No. 13
ᨣᩤᩴ word
/kam/
Section 5 No. 14
ᨳᩥ to pretend
/tʰiʔ/
Section 5 No. 15
ᨺᩦ boil (n.)
/fiː/
Section 5 No. 16
ᨩᩧ moist
/tɕɯʔ/
Section 5 No. 17
ᨾᩨ hand
/mɯː/
Section 5 No. 18
ᨵᩩ monk
/tʰuʔ/
Section 5 No. 19
ᨦᩪ snake
/ŋuː/
Section 5 No. 20
ᨲᩮᩡ to kick
/teʔ/
Section 5 No. 21
ᨽᩮ danger
/pʰeː/
Section 5 No. 22
ᨤᩯᩡ to limp along
/kʰɛʔ/
Section 5 No. 23
ᨧᩯ corner
/tɕɛː/
Section 5 No. 24
ᨸᩮᩬᩥᩡ mud
/pɤʔ/
Section 5 No. 25
ᨸᩮᩥᩬᩡ Different from the proposals.
ᨶᩮᩬᩥ (final particle for commands and entreaties)
/nɤː/
Section 5 No. 26.
ᨶᩮᩥᩬ Different from the proposals.
ᨠᩮᩬᩨᩡ N/A
/kɯaʔ/
Section 5 No. 27
ᨠᩮᩨᩬᩡ Different from the proposals.
ᨠᩮᩬᩨ
/kɯa/
Section 5 No. 28
ᨠᩮᩨᩬ Different from the proposals.
ᩁᩮᩢᩣ we
/hau/
Section 5 No. 29
ᨾᩳ drunk
/mau/
Section 5 No. 30. This example is not taken from the MFL, which does not use this vowel symbol.
ᨠᩮᩣ N/A
/koː/
Section 5 No. 31. This is very rare in monosyllables, but is quite common at the end of monks' names, e.g. Adittadhammo.
ᨹ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
Section 5 No. 32
ᨻ᩠ᨿᩮ flower
/pia/
Section 5 No. 33
ᨠ᩠ᨿ N/A
/kia-/
Section 5 No. 34. This sequence does not form a whole word. An example may be seen in a spelling of the word for 'city'.
ᨾᩮᩬᩥᩋᩡ mucus
/mɯaʔ/
Section 5 No. 35. (2 syllables)
ᨾᩮᩥᩬᩋᩡ Different from the proposals.
ᨠᩖᩮᩬᩥᩋ salt
/kɯa/
Section 5 No. 36. (2 syllables)
ᨠᩖᩮᩥᩬᩋ Different from the proposals.
ᩈᩰᩡ to practice
/soʔ/
Section 5 No. 37
ᨾᩰ big
/moː/
Section 5 No. 38
ᨪᩰᩬᩡ to gouge out
/sɔʔ/
Section 5 No. 39
ᨩᩢ᩠ᨿ victory
/tɕai/
Section 5 No. 40
ᨶᩲ in
/nai/
Section 5 No. 41
ᨢᩱ to expose
/kʰai/
Section 5 No. 42
ᨴᩱ᩠ᨿ Thailand
/tai/
Section 5 No. 43
ᨠᩮᩬᩨᩡ
Khün /kɤʔ/
Section 5.3 No. 22
ᨠᩮᩨᩬᩡ Different from proposals.
ᨠᩮᩬᩨ
Khün /kɤː/
Section 5.3 No. 23
ᨠᩮᩨᩬ Different from proposals.
ᨠᩰᩢ
Khün /ko-/
Section 5.3 No. 26
ᩈᩘ First syllable of compounds of saṅgha.
/saŋ/
Section 5.3 No. 29. Apparently not a possible final syllable, but can be left stranded as a result of line-breaking.
ᨴᩢ᩠ᨦ whole
/taŋ/
Section 5.3 No. 30
ᩌᩥᩴ edge
/him/
Section 5.3 No. 31 (Example from Apiradee p53, but different language, different pronunciation, i.e. not /-iŋ/.)
ᨠᩥ᩠ᨦ
/kiŋ/
Section 5.3 No. 32
ᨠᩢ᩠ᨾ
/kam/
Section 5.3 No. 34
ᨠᩢᨾ
/kam/
Section 5.3 No. 35
ᨯᩭ mountain
/dɔːi/
Section 5.3 No. 36

Other Explicit Coding Sequences

Other explicit coding sequences are given in Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable), and these are recorded here. Amended and exploratory material is highlighted in yellow; it is not vouched for by the proposal. The remarks are my own.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
᪓᩠ᨴ thrice
/saːm tiː/
Section 2
ᨲ᩵ᩣ᩠ᨦ᩻ different in my view
/taːŋ taːŋ/
Section 7
ᨲᩣ᩠᩵ᨦ᩻ Different from the proposals.
ᨲᩣ᩠᩵ᨦ᩻ Normalisation of the above.
ᨳ᩠ᨶ᩻ᩫᩁ path
/tʰănon/
Sections 7 and 14.6 (2 syllables - the second is a single character).
ᨳ᩠ᨶᩫ᩻ᩁ Different from proposals, which specifically specified the various semantically sensitive positions of mai sam. For this word, the visual position of the marks above is free.
ᨡᩢ᩶᩻ᩬᨦ belongings
/kʰau kʰɔːŋ/
Section 7 (2 syllables - the second is a single character)
ᨡᩢᩬ᩶᩻ᨦ Different from the proposals.
ᨡᩮᩢ᩶ᩣᨡᩬᨦ belongings
/kʰau kʰɔːŋ/
Section 7 (3 syllables - the third is a single character)
ᨡᩮᩢᩣ᩶ᨡᩬᨦ Different from the proposals.
᪭ᩣ elephant
/tɕaːŋ/
Section 11
ᩉ᩠ᨶᩦ to flee
/niː/
Section 14.1
ᨤ᩠ᩅᩯ᩶ᩁ to blockade
/kʰwɛːn/
Section 14.2 (2 syllables - the second is a single character)
ᩉ᩠ᩅᩫ head
/hua/
Section 14.3
ᨯᩢ᩵ᨦ᩠ᨶᩦ᩶ like this
/daŋ niː/
Section 14.4 (2 syllables)
ᩉᩥ᩠ᨶ stone
/hin/
Section 14.5
ᨷ᩠᩵ᨾᩦ to not have
/bɔː miː/
Section 14.6. The proposal lists MAI KANG as a code point, but it is visually dropped in this compound. I presume the renderer is not intended to suppress the appearance of the character. The upper row drops the MAI KANG from the encoding, so is not the encoding intended, while the lower row uses the stated encoding. Da Lekh fails to arrange the marks above properly; arrangement is a proper challenge for a Tai Tham font. The phonetic syllable boundary is part of the context!
ᨷᩴ᩠᩵ᨾᩦ to not have
/bɔː miː/
ᨲᩣ᩠ᨾ to follow
/taːm/
Section 14.7
ᨻ᩠ᨿᩣ᩠ᨵᩥ sickness
/păɲaːt/
Section 14.8
ᨸ᩠ᩃ᩠ᨿ᩵ᩁ to change
/pian/
Section 14.9 (2 syllables - the second is a single character)
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
Section 14.9. A sophisticated font might transpose the tone marks. The phonetic syllable boundary should be part of the context.
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
Same as above, but normalised, so not the code point sequence in the proposal. Proposal explicitly stated SAKOT was to have ccc=0, not 9, but ccc=9 was quietly inserted in draft properties and not noticed until too late.
ᩈ᩠ᩅᩯ᩵ to butt in
/swɛː/
Section 14.10
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
Section 14.10 (but the proposal has vowel and tone the wrong way round)
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
As above, but normalised, so very much not the codepoint sequence in the proposal.
ᩈ᩵ᩯ᩠ᩅ to embroider
/sɛːw/
As above, but uncorrected. Arguably, the rendering is unconstrained.
ᨿᩪ broom, whisk
/ɲuː/
Section 15 No. 1
ᨾᩦ to have
/miː/
Section 15 No. 2
ᩉ᩠ᨾᩪ pig
/muː/
Section 15 No. 3
ᩉ᩠ᨾᩦ bear (n.)
/miː/
Section 15 No. 4
ᨹ᩠ᩅᩫ husband
/pʰua/
Section 15 No. 5
ᩉ᩠ᩃᩬᩴ᩵ to cast (in metal)
/lɔː/
Section 15 No. 6
ᨾᩣ to come
/maː/
Section 15 No. 7
ᩉᩱ᩵ to hit
/hai/
Section 15 No. 8
ᨾ᩠ᨿ Section 15 No. 9
ᩅ᩠ᨿᨦ city
/wiaŋ/
Section 15 No. 10 (2 syllables - the second is a single character)
ᩉᩣ᩠ᨾ to carry by the handles
/haːm/
Section 15 No. 11
ᨯᩣᩴ black
/dam/
Section 15 No. 12
ᨡᩮ᩠ᩅ Section 15 No. 13
ᩉ᩠ᨾᩣ dog
/maː/
Section 15 No. 14
ᨠᩕᩣ᩠ᨸ to prostrate oneself
/kʰaːp/
Section 15 No. 15. The later addition of SIGN BA to the repertoire makes the correct final consonant here unclear.
ᨻᩕ᩵ᩣᩴ indefatigable
/pʰam/
Section 15 No. 16
ᨻᩕᩣᩴ᩵ Different from the proposals. The USE diktat at December 2021 does not determine the relative order of the tone mark and mai kang. In some styles the tone mark is associated with and follows mai kang, either above or to the right of it, but in other styles the tone mark sits on the consonant and the mai kang on the spacing vowel. Both encodings are shown here.
ᨻᩕᩣ᩵ᩴ
ᨠᩕᩬᨦ garland; Mekong
/kʰɔːŋ/
Section 15 No. 17 (2 syllables - the second is a single character)
ᩈᩕᩫᨾ᩠ᨱ᩺ ascetic
/sălom/

Section 15 No. 18. If the word is interpreted as having two phonetic syllables, then the medial consonant comes between an implicit vowel and an explicit vowel.

(2 syllables)

ᩈᩕ᩠ᩅᩫᨾ to embrace
/săluam/
Section 15 No. 19 (2 syllables - the second is a single character). Ignore final ; it makes the spelling ungrammatical. However, a few such spellings do occur in the MFL.
ᩈᩕ᩠ᩅᨾ to embrace
/săluam/
Spelling of above in the MFL, so this form's encoding is not given in the proposal.
ᨯᩮᩬᩨᩁ month
/dɯan/
Section 15 No. 20 (2 syllables - the second is a single character)
ᨯᩮᩨᩬᩁ Different from the proposals.
ᩁᩮᩬᩨᩋ boat
/hɯa/
Section 15 No. 21 (2 syllables - the second is a single character)
ᩁᩮᩨᩬᩋ Different from the proposals.
ᩉ᩠ᩃᩮᩬᩨᩋ to exceed
/lɯa/
Section 15 No. 22 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩮᩨᩬᩋ Different from the proposals.
ᩉ᩠ᨾ᩵ᩣᩴ to eat
/mam/
Section 15 No. 23
ᩉ᩠ᨾᩣᩴ᩵ The USE diktat does not specify whether mai kang or tone mark comes first. Both encodings are shown.
ᩉ᩠ᨾᩣ᩵ᩴ
ᩈ᩠ᨾᩬᩥ᩻ very level(?)
/sămɤː sămɤː/
Section 15 No. 24. Encoding as given, omitting SIGN E, which is depicted in the proposal. Moreover, the word appears to be a misreading of the next but one.
ᩈ᩠ᨾᩨᩬ᩻ Encoding is different from the proposals.
ᩈ᩠ᨾᩮᩬᩥ᩻ Section 15 No. 24. SIGN E restored to encoding.
ᩈ᩠ᨾᩮᩥᩬ᩻ SIGN E restored to a different encoding from the proposals.
ᩈ᩠ᨾ᩻ᩮᩬᩥ level (adj.)
/sămɤː/
Probable reading of above. Consequently, the encoding is not vouched for by the proposal. Phonetically, this is one or two syllables, depending on how one counts.
ᩈ᩠ᨾᩮᩥᩬ᩻ Probable reading of the above. The USE-compliant encodings of the two readings are the same, but each has compatible renderings inconsistent with the other interpretation.
ᩉ᩠ᨾᩮᩬᩨᨦ mine (n.)
/mɯaŋ/
Section 15 No. 25 (2 syllables - the second is a single character)
ᩉ᩠ᨾᩮᩨᩬᨦ Different from the proposals.
ᩉ᩠ᨿᩮᩬᩨᨦ to despise
/ɲɯaŋ/
Section 15 No. 26 (2 syllables - the second is a single character)
ᩉ᩠ᨿᩮᩨᩬᨦ Different from the proposals.
ᩉ᩠ᨾᩫ᩵ᩁ winter melon (Benincasa hispida)
/mon/
Section 15 No. 27 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩣ᩠ᨿ many
/laːi/
Section 15 No. 28
ᩉ᩠ᩃᩮᩬᩨᨦ yellow
/lɯaŋ/
Section 15 No. 29 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩮᩨᩬᨦ Different from the proposals.
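Several remarks in the table above turn on canonical ordering: SAKOT has ccc=9 and the tone marks have ccc=230, so normalisation moves SAKOT in front of an immediately preceding tone mark, whereas MAI KANG has ccc=0 and blocks any such reordering. The following Python sketch shows how this can be checked with the standard `unicodedata` module. The helper `canonically_equivalent` is a name chosen for illustration, and the sequences are modelled on the 'to embroider' and tone mark plus mai kang rows rather than copied from them.

```python
import unicodedata

SAKOT = "\u1A60"     # TAI THAM SIGN SAKOT, ccc=9
TONE_1 = "\u1A75"    # TAI THAM SIGN TONE-1, ccc=230
MAI_KANG = "\u1A74"  # TAI THAM SIGN MAI KANG, ccc=0

def canonically_equivalent(a, b):
    """True if the two strings have the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# HIGH SA, SIGN AE, then tone mark and SAKOT in either order, then WA.
tone_then_sakot = "\u1A48\u1A6F" + TONE_1 + SAKOT + "\u1A45"
sakot_then_tone = "\u1A48\u1A6F" + SAKOT + TONE_1 + "\u1A45"

# Normalisation reorders <tone, SAKOT> to <SAKOT, tone>.
print(unicodedata.normalize("NFC", tone_then_sakot) == sakot_then_tone)

# NA, SIGN AA, then tone mark and MAI KANG in either order.
tone_then_kang = "\u1A36\u1A63" + TONE_1 + MAI_KANG
kang_then_tone = "\u1A36\u1A63" + MAI_KANG + TONE_1

# MAI KANG is a starter, so these two spellings stay distinct.
print(canonically_equivalent(tone_then_kang, kang_then_tone))
```

A check of this kind distinguishes rows that merely restate an encoding in normalised form from rows that offer a genuinely different encoding.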

Other Examples from L2/07-007R

The actual coding sequences to be used here are open to challenge.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩬᩢᩃ᩠ᨼ᩺ golf
/kɔp/
Section 2. The position of RA HAAM is debatable - cf. Thai กอล์ฟ. The first example places it on the second consonant, the second on the first. The third then normalises the spelling of the second. Note that this word consists of two orthographic syllables.
ᨠᩬᩢᩃ᩠᩺ᨼ
ᨠᩬᩢᩃ᩠᩺ᨼ
ᨠᩢᩬᩃ᩠ᨼ᩺ In the December 2021 USE order.
ᨠᩢᩬᩃ᩠᩺ᨼ
ᨠᩢᩬᩃ᩠᩺ᨼ
ᨠᩕᩣ᩠ᨼ graph
/kaːp/ (?)
Section 2.
ᨴᩬᨼ᩠ᨼᩦ᩵ toffee
Section 2 (2 syllables)
ᨠᨽᩚ pregnant
/kap pʰaʔ/
Section 4 (2 syllables - the first is a single character)
ᩈᨱᩛᩣ᩠ᨶ shape
/san tʰaːn/
Section 4 (2 syllables - the first is a single character)
ᩁᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
Section 4 (3 syllables)
ᩁᩢᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
Section 4 (3 syllables)
ᩈᨻᩛ omniscience
/sap paʔ/
Section 4 (2 syllables - the first is a single character)
ᩋᨾᩛ mango
/ʔam paʔ/
Section 4 (2 syllables - the first is a single character)
ᩁᩣᨩᨽᩢ᩠ᨮ Rajabhat
/laːt tɕa pʰat/
Section 4 (3 syllables)
ᨷᩢᨱ᩠ᨻᨷᩩᩁᩩᩈ disciple
"banop burus"
Section 4 (5 syllables)

Mai Kang Lai

The mai kang lai character can be a challenge for a font, as it has a wide range of behaviours. These range from that of a spacing final character (as in modern Tai Khün fonts) to that of a repha-like character, the old-fashioned behaviour seen in Tai Khün, Thailand and Laos. The MFL dictionary shows an intermediate behaviour, where marks above the following base consonant cause it to be positioned within the previous syllable. This is the style employed by the Da Lekh font.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩘ᩠ᩃᩣ᩠ᨿ all
/taŋ laːi/
The ascending tail of SAKOT LA prevents the MAI KANG LAI moving on to a subsequent syllable/word. This prevents fonts exploiting the rphf feature of the Universal Shaping Engine.
ᨴ᩠ᩃᩘᩣ᩠ᨿ With total disregard for logical order.
ᩈᩘᨥᩮᩣ Nominative of Pali saṅgha
<saṅgho>
(2 syllables)
ᩁᩘᩈᩦ ray
/raŋ siː/
(2 syllables)

Ligature NAA

This is mostly a test for readers!

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨶᩣᩴ to lead
/nam/
ᨾᨶᩮᩣ heart, mind
/maʔ noː/
(2 syllables)
ᨶᩮᩢᩣ to sew a long stitch
/nau/
Some fonts may fail here because they handle the ligature in pstf; this worked with HarfBuzz until pstf was moved to before Indic rearrangement.
ᨶᩣ᩠ᨿ leader
/naːi/
ᨶ᩵ᩣ᩠ᨶ Nan
/naːn/
ᨶᩣ᩠᩵ᨶ Using formalism where neither current nor historical speech defines phonetic order. The first of these two keeps user-perceivable characters contiguous, and the second is its normalisation (NFC/NFD).
ᨶᩣ᩠᩵ᨶ
ᩍᨶ᩠ᨴᩣ Indra
/ʔin taː/
The more usual form lacks U+1A63. (2 syllables - first has one character.)
ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ danger
/ʔon tʰaʔ laːi/
(2 syllables)
ᨶ᩶ᩣᩴ water
/nam/
This can be surprisingly hard to achieve in a font. Logic designed to stop Arabic vowel marks wrongly interacting has to be circumvented so that the two marks will interact!
ᨶᩣ᩶ᩴ The USE rules do not dictate whether the tone mark comes before or after the mai kang. Both the canonically inequivalent forms are given here.
ᨶᩣᩴ᩶
ᨶ᩠ᩅᩣ᩠ᨷ to falsely accuse
/nwaːp/
MFL p352
ᨴᩤᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ to foretell
/tam nwaːi/
NTDPLM p285. Sometimes the writer wants to avoid the ligature! (2 syllables)
ᨲ᩵ᩣᩴᨶ᩠ᩅᩣ᩠ᨿ to foretell
/tam nwaːi/
MFL p320, but only in transliteration. Shape of second syllable (ligature plus subscript consonant) is attested elsewhere. (2 syllables)
ᨲᩣ᩵ᩴᨶ᩠ᩅᩣ᩠ᨿ The USE does not dictate whether mai kang or the tone mark comes first. Both options are given here.
ᨲᩣᩴ᩵ᨶ᩠ᩅᩣ᩠ᨿ
ᨶ‌ᩣ rice field
/naː/
An isolated test of the ZWNJ feature above. This form is to be expected in texts teaching the writing system.
ᩉ᩠ᨶ᩶ᩣ face
/naː/
Note that the SAKOT prevents ligature formation.
ᩉ᩠ᨶᩣ᩶ Tone mark above consonant still follows the vowel.
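The isolated ZWNJ row above can be generated mechanically from an ordinary spelling. The following Python sketch covers only the simplest case, in which NA is immediately followed by SIGN AA; it does not handle the cases in the table where a tone mark or a subscript consonant intervenes. The function name `suppress_naa_ligature` is hypothetical, and whether the ZWNJ actually prevents the ligature remains up to the font and the rendering engine.

```python
NA = "\u1A36"       # TAI THAM LETTER NA
SIGN_AA = "\u1A63"  # TAI THAM VOWEL SIGN AA
ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER

def suppress_naa_ligature(text):
    """Insert ZWNJ between NA and an immediately following SIGN AA."""
    return text.replace(NA + SIGN_AA, NA + ZWNJ + SIGN_AA)

# 'rice field' spelt without the ligature, as in texts teaching the script.
unligated = suppress_naa_ligature(NA + SIGN_AA)
print([f"U+{ord(ch):04X}" for ch in unligated])
```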

English Loanwords

These examples are taken from the 'big blue book' pp151-6. Some of these renderings are unusual compared with the native tradition, and are included for that reason. The position of RA HAAM is particularly noteworthy.

The pronunciations given are guesswork where Siamese practice and Lanna script orthography conflict.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩯᩢ᩠ᩈ gas
/kɛs/
ᨴᩕᩯ᩠ᨠᨴᩮᩬᩥᩁ᩺ tractor
/tʰɛːk tʰɤː/
Slightly complicated set of consonants in first syllable. (2 syllables)
ᨴᩕᩯ᩠ᨠᨴᩮᩥᩬᩁ᩺ Vowel not as in the proposals.
ᨶᩰᩫ᩠᩶ᨲ note
/noːt/
Vowel combination not listed above
ᨷᩕᩰᨴᩦ᩠ᨶ protein
/pʰoː tiːn/
Tests reordering - the vowel symbol should appear first. (2 syllables)
ᨼᩥᩅ᩠ᩈ᩺ fuse
/fiu/
ᩈᨲᩯᨾ᩠ᨷ᩺ postage stamp
/sa tɛːm/
(3 syllables)
ᩈᩮᩥᩁ᩠᩺ᨷ to serve
/sɤːp/
Compare the placement of RA HAAM with the previous word. The same contrast may be seen on p155 of the 'big blue book'. (2 syllables)

Tai Lü Blends

These examples are all taken from Graphic Blends at SEAsite. The pronunciations given are Tai Lü.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩢ᩵ᩗᩣ all
/taŋ laːi/

This word, in some of its various forms, seems to be the only word containing U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI.

I withdraw my previous, surprised, reading of the word shown as containing NGA as the base consonant.

ᨡᨶ᩠ᨵᩣ spell (magic)
/kʰan tʰaː/
(2 syllables, first a single character)
ᨣ᩠᩶ᨯᩦ  okay
/kɔː diː/
A non-breaking space has been appended to avoid truncation. A sophisticated font would slide the vowel under the tone mark.
ᨷ᩠᩶ᨾᩣ to not come
/bau maː/
ᨷ᩠᩶ᨾᩣ Same again, but normalised.
ᨷ᩠᩶ᨯᩣ᩠ᨿ to not have
/bau daːi/
ᨧᩢ᩠ᩅᩤ How big an area?
/tsak vaː/
ᩈᩮ᩠ᩓ᩠ᩅ deceased
/seː lɛu/
ᨴᩯ᩠ᨶᩳ Really, is that true?
/tɛː nɔː/
ᩓ᩠ᨾᩣ to look this way
/lɛ maː/
ᨠᩮ᩠ᩈᩣ hair
/keː saː/
ᨻᩱ᩠ᨾᩣ to come and go
/pai maː/
ᩈᩮ᩠ᩅ᩶ᩤ if
/seː vaː/
ᩈᩮ᩠ᩅᩤ᩶ Tone mark position not as in the proposals.
ᩅᩮ᩠ᩃᩣ time
/veː laː/
Also in Apiradee p49
ᨵᩤ᩠ᨲᩩ physical body
/tʰaː tuʔ/
The vowel on the final consonant is inescapable - there is no way of rewriting the orthographic syllable to escape the combination.
ᨩ᩠ᩓ in conclusion
/tsălɛː/
ᨻᩭ᩠ᩅ᩻ᩣ because
/pɔi vaː/
The MAI SAM tags the WA as starting a chained syllable. The spelling presumes that a font can decide that the subscript WA goes to the left of the MAI KOY.
ᨻᩭ᩠᩻ᩅᩣ A purely visual placement of MAI SAM.
ᨻᩭ᩠᩻ᩅᩣ Normalised form of the above.
ᩈᩫ᩠ᨦᩣ᩠ᨶ world
/suŋ saːn/

More Tai Lü

These words are taken from the MA thesis 'Development of Tai Lue Scripts and Orthography' by Apiradee Techasiriwan (อภิรดี เตชะศิริวรรณ). The pronunciations given are Tai Lü. Comparative material from elsewhere is highlighted in yellow.

Text | Meaning | Encoding | Hacked via ASCII | Remarks
ᨻᩬᩳ᩵ father
/pɔː/
p3. Vowel combination not listed above. Spelling is archaic.
ᨻᩳᩬ᩵ USE vowel ordering.
ᩈᨷ᩷ᩣ᩠ᨿ content, well
/săbaːi/
p3. Rare example of a word with this tone mark. (2 syllables, first is a single character.)
ᩈᨷᩣ᩠᩷ᨿ USE tone positioning.
ᩅ᩠ᨿᩙ city
/weŋ/
p4.
ᨣᩪ᩺ person
/kun/
p4. Unetymological, phonetic spelling. The mark above is serving as a final consonant, not a cancellation mark.
ᨣ᩺ᩪ USE ordering as vowels.
᪁᪂ ᨻᩢ᩠ᨶ᩻ᩣ Sipsongpanna
/sip sɔːŋ pan naː/
p10. (Number precedes syllable). Example of mai sam marking a double-acting consonant.
᪁᪂ ᨻᩢ᩠᩻ᨶᩣ Best-looking hack for USE compliance.
ᨻᩱ᩻ᩣ᩠ᨿ to go to the location
/pai paːi/
p47.
ᨻᩱᩣ᩠᩻ᨿ Best-looking hack for USE compliance
ᨻᩱᩣ᩠᩻ᨿ Normalisation of the above.
ᨩ᩠ᨿᩙᨲᩩᩴ Kengtung
/tseŋ tuŋ/

p53. (2 syllables)

Possibly the Chengtung on the Vietnamese border.

ᩅᨲᩛᩩ matter
/wat tʰu/
p49. U+1A5B represents subscript HIGH THA rather than high RATHA. This is an issue for a font's repertoire of conjuncts.
ᩅᨲ᩠ᨳᩩ matter
/wat tʰu/
The Northern Thai writing of the above. Perhaps this should be rendered as the above when the language is Tai Lü or Lao.
ᨯ᩠ᨿᩴ one
/deu/
p53. Assuming the word has TAI THAM SIGN MAI KANG rather than unencoded *TAI THAM CONSONANT SIGN FINAL WA.
ᩉ᩠ᨶᩦᩢ᩶ debt
/niː/
p57.
ᩁᩮᩂ᩠ᨠ auspicious occasion
/hɤːk/
p79.
ᩁ᩠ᨿ᩺ to learn
/heːn/
p118.

And Not

The word typically meaning 'and...not' or 'and...then' may be written with a chained syllable, and this may present challenges to renderers. The form of the letter representing /b/ in a chained syllable presented an encoding challenge. N3207R proposed using the sequence <SAKOT, BA> for it, and using <SAKOT, HIGH PA> for the subscript form corresponding to both BA (common) and HIGH PA (extremely rare) in its rôle as a final (Thai sakot) consonant. During the ISO process, a new character was introduced instead for the special form, SIGN BA, and it is widely assumed that <SAKOT, BA> represents the usual subscript form corresponding to BA, both as a sakot consonant and in the Pali /mp/ and /pp/ intervocalic clusters.

When syllables are chained, shared vowel symbols are not repeated. This leads to ambiguity as to which symbol is dropped.

All the spellings in the table below represent the same careful pronunciation in Northern Thai, namely /kɔː bɔː/. The Tai Lü forms are written with different marks and pronounced with different vowels, but use the same two consonant forms in the stack.

Text | Meaning | Encoding | Hacked via ASCII | Remarks
ᨣᩴᨷᩴ᩵ and...not, then...not Full form - 2 syllables, and arguably 2 words.
ᨣᩴᨷᩴ do. Univerbated form in MFL (2 syllables)
ᨣᩝᩴ᩵ do. First mai kang dropped.
ᨣᩴᩝ᩵ do. Second mai kang dropped.
ᨣᩝᩴ do. First mai kang dropped.
ᨣᩴᩝ do. Second mai kang dropped.

Potential Surprises

These words behave slightly oddly.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᩓᩯ very much
/lɛː/
Redundant vowel mark
ᩐᩣ to take
/ʔau/
Vowel on independent vowel
ᩐ᩵ᩣ very hot
/ʔau/
Vowel and tone mark on independent vowel
ᩐᩣ᩵ USE-compliant order
ᨯᩪᩕᩣ listen to me
/duː haː/
Medial consonant between explicit vowels
ᨯᩮᩬᩥᩁᨹᩫᩖᨣᩩᨱ᩺ March
/dɯan pʰon laʔ kun/
NTDPLM p259. Double-acting medial consonant with implicit vowel after it. (3 syllables)
ᨯᩮᩥᩬᩁᨹᩫᩖᨣᩩᨱ᩺ USE-compliant vowel ordering
ᨻᩣᨷᩰᩖ Pabol (sic)
/paː boːn/
A mistake for Spanish Pablo seen on Wikipedia, but in light of the above a renderer should render it as intended.
ᨶ᩶ᩭ little
/nɔːi/
Tai Khün spelling.
ᨶᩭ᩶ USE-compliant tone mark sequencing.
ᩉᩖ᩠ᩅᨦ big
/luaŋ/
Medial consonant in middle of stack. The proposal classified the final consonant of the stack as a 'medial vowel'. (2 syllables, second a single character)
ᩉᩖ᩠ᩅᩣ iron
/lwaː/
Medial consonant in middle of stack. In this case, the WA is very much a consonant.
ᨻᩕ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
Preposed medial consonant in middle of stack along with a preposed vowel.
ᨠᩩ᩶ᩣ᩠ᨶ᩠ᨦ to prosper
/kaːn kuŋ/
The first word in the MFL! Note that there are two final consonants. The SIGN AA prevents a phonetic spelling.
ᨠᩩᩣ᩠᩶ᨶ᩠ᨦ USE-compliant tone mark placement.
ᩋᩢ᩠ᨭᩛ a satang coin
/ʔat/
Two consonants in final consonant position (3 consonants in total)
ᩆᩢᨠ᩠ᨯᩥ᩺ rank
/sak/
Consonant-killer also killing explicit vowel above (2 syllables)
ᩆᩢᨠ᩠ᨯᩥ᩼ rank
/sak/
Same again, but with KARAN instead of RA HAAM. Some people are using KARAN in Northern Thai instead of RA HAAM! (2 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ giant fennel
/ma haː hiŋ/
Consonant-killer also killing explicit vowel below (4 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣ᩺ᩩ USE then requires that the killer precede the killed vowel.
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩼ Same again, but with KARAN. (4 syllables)
ᩆᩣᩈ᩠ᨲᩕ᩺ science
/saːt/
Consonant-killer also killing medial consonant. NT spelling. (2 syllables)
ᩈᩣᩈ᩠ᨲᩕ᩼ science
/saːt/
Consonant-killer also killing medial consonant. Tai Khün spelling. (2 syllables)
ᩁᩪ᩠ᨷ image
/huːp/
This spelling is archaic in Northern Thailand (but current in Tai Khün)
ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ relatives
/piː nɔːŋ/
(2 syllables - second is a single character)
ᩃᩢᩪ child (progeny)
/luːk/
USE demands that mai kak (see next) precede most of the vowels that it phonetically follows.
ᩃᩪᩢ MAI SAT can serve as a final consonant, /k/. This leads to yet more formal vowel combinations.
ᨸᩢᩣ mouth
/paːk/
ᨯᩬᩢ flower
/dɔːk/
ᨯᩢᩬ USE-compliant ordering.
ᨯᩢᩬᩡ USE-compliant ordering.
ᨯᩬᩢᩡ MAI SAT can even be reinforced by SIGN A.
ᨻ᩠ᩅᩢᩡ group
/puak/
ᨲᩯ᩠ᨶᩬᩴ᩵ wasp, hornet
/tɛːn tɔː/
A single orthographic syllable.
ᨲᩬᩴ᩵͏ᩯ᩠ᨶ wasp, hornet
/tɔː tɛːn/
Should normally be visually identical with the above - the font may be too crude. However, when font colouring is supported, the vowel below should be coloured differently in the Da Lekh Si font; that font is intended to reveal the order of characters.
ᨲᩬᩴ᩵ᩯ᩠ᨶ Would it be legitimate for this to render differently to the above?
ᩈ᩠ᨶᩫ᩻ street
/sănon/
The mai sam represents the final consonant in addition to the epenthetic vowel.
ᨠᨾᩛᩦ scripture
/kam piː/

The surprise is that U+1A5B had InSC=Consonant_Final until Unicode 10.0.

(2 syllables - the first is a single consonant in the first example.)

ᨶᩥᨻᩛᩣ᩠ᨶ nirvana
/nip paːn/
ᨵᨾᩜᩥᨠ saintly
/tʰam miʔ kaʔ/
Chiengtung p166. It has 3 syllables - the second is of interest. It may show a problem with U+1A5C having InSC=Consonant_Final until Unicode 10.0.
ᩈᨵᩩ᩠ᨷ stupa(?)
/sătʰup/
Chiengtung p166. (2 syllables - the first is a single letter.) This shows the issue with placement of the vowel and 'sakot' consonant also applies to this explicit vowel.
ᩋᩣᨴᩥᨲ᩠ᨲᨵᨾᩜᩮᩣ Adittadhammo
Pali <Ādittadhammo>
Chiengtung p264. (5 syllables)
ᨬᩣᨱᨵᨾᩜᩮᩣ Nyanadhammo
Pali <Ñāṇadhammo>
Chiengtung p238. The individual referred to is not the one hyperlinked. (4 syllables)
ᩅᩥᩈᩮ᩠ᩈ special
/wiʔ seːt/
Note the lack of a ligature. (2 syllables)
ᨢ᩶ᩣ slave
/kʰaː/
Same character order as in Thai and Lao!
ᨢᩣ᩶ But not if the USE prevails!
ᩈᩣᩈᨶᩣ religion
/saː saʔ naː/

Full (5 chars) and contracted (7 chars) forms.

(3 and 2 syllables respectively)

ᩈᩣᩈ᩠ᨶ᩻ᩣ
ᩈ᩠ᨶ᩻ᩮᩢ᩶ᩣ javelin
/sănau/
ᨲᩦ͏ᩣ᩠ᨿ to beat to death
/tiː taːi/
Uses CGJ as an invisible MAI SAM to stand for the duplicated consonant.
ᩋᩮᩰᩣᨽᩣᩈ to illuminate
/ʔoː pʰaː saʔ/
MFL p919. While the spelling rules call for either just U+1A70 SIGN OO or just the combination of <U+1A6E SIGN E, U+1A63 SIGN AA>, this might conceivably be a lexicographer's private notation, indicating that both spellings occur, which happened to escape into the published work. The graphical order, left-to-right, in the MFL is SIGN OO, SIGN E, LETTER A, SIGN AA. The 'hacked via ASCII' rendering is wrong. (3 syllables - first is of interest.)
ᩉ᩠ᨾ᩵ᩣᩴ᩻ Grub's up!
/mam mam/
ᩉ᩠ᨾᩣᩴ᩵᩻ It is not clear whether a USE-compliant form should have MAI KANG or the tone mark first.
ᩉ᩠ᨾᩣ᩵ᩴ᩻
ᩃᩮᩞ trickery
/leːs/
Tai Khün spelling, cited in N3384
ᩋᨶᩣᨳᨷᩥᨱ᩠ᨯᩥᨠᩈᩞ Anathapindika's

Pali <Anāthapiṇḍikassa>
A rare spelling of the Pali masculine genitive singular ending. Note that SIGN SA starts the final phonetic syllable. (7 syllables - the last one is of interest.)
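Several remarks in this table turn on the order of the code points rather than on the appearance of the text, so it can help to list the code points of a test string. The following is a minimal sketch using only Python's standard unicodedata module. The helper name dump is hypothetical, and the two strings are my reading of the two spellings of 'child' above (MAI SAT before or after SIGN UU), written as escapes so that no editor can reorder them.

```python
import unicodedata

def dump(text):
    """Return one 'U+XXXX NAME' string per code point, in storage order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# Assumed encodings of the two 'child' rows: LA with MAI SAT and SIGN UU.
mai_sat_first = "\u1A43\u1A62\u1A6A"   # LA, MAI SAT, UU
vowel_first = "\u1A43\u1A6A\u1A62"     # LA, UU, MAI SAT

for label, text in (("MAI SAT first", mai_sat_first), ("vowel first", vowel_first)):
    print(label)
    for line in dump(text):
        print("  " + line)

# Both marks have canonical combining class 0, so normalisation cannot
# convert one order into the other; the two strings stay distinct.
same_after_nfc = (unicodedata.normalize("NFC", mai_sat_first)
                  == unicodedata.normalize("NFC", vowel_first))
print("canonically equivalent:", same_after_nfc)
```

The same helper can be applied to any string copied from the table, which is one way to check that a browser or editor has not silently reordered the marks.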

Test and Tell

These examples are intended to reveal the behaviour of the rendering system, rather than to be clear pass or fail tests.

Text | Meaning and Pronunciation | Encoding Hacked via ASCII | Remarks
Interpretation
ᨠ᩠ᨷ (no meaning) Interpretation of <SAKOT, BA> and <SAKOT, HIGH PA> respectively. This looks at font behaviour rather than at layout engine behaviour.
ᨠ᩠ᨸ (no meaning)
Line-Breaking within the Orthographic Syllable
Manual line breaking may break lines between the dependent vowel AA (U+1A63 and possibly U+1A64) and its base consonant. Figure 9b of N3207R provides an example. What appears to be a misspelt form of sammodamānehi (as ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ), the instrumental or ablative plural of the present participle of sammodati, is split between the second and third lines of the second leaf by splitting just before U+1A63. How will this be handled? The answer may be application dependent.
ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>

Split using a soft hyphen. (Many syllables.)

The text occurs with and without dingbats (U+1AA5) so that one can see whether an inactive soft hyphen affects it.

᪥᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
Split using zero width space - this uses the presentation-oriented view that ZWSP is simply a soft hyphen without visible rendering. This test is uninformative if the renderer refuses to make the break. See above for dingbats. (Many syllables)
᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
Baseless Marks and Non-alphabetic Bases
(no meaning) Bare vowel symbol
 ᩣ(no meaning) Vowel symbol 'on' NBSP
 ‍ᩣ (no meaning) Vowel symbol 'on' NBSP with ZWJ.
ᨷ ◌ᩮ N/A And now discourage the use of multiple script runs by the renderer.
Dependent Consonant Above and Tone Mark - What Chooses the Order?
ᨾ᩠ᩅ᩺᩵ to be fun
Khün /mon/
Typed as seen. Da Lekh fonts place the glyphs side by side, but the order is as in the Tai Khün manuscript. To be precise, it is an extract from a 1949 edition of the Khemarat Weekly, reproduced in L2/17-120 Figure 4.
ᨾ᩠ᩅ᩵᩺ to be fun
Khün /mon/
Typed with tone mark first. Da Lekh accepts the order, just as Thai does not rearrange THANTHAKHAT (or vowels above) with tone marks. The Da Lekh rendering does not match the Tai Khün manuscript.
ᨾ᩠ᩅ᩺᩵᩻ to be lots of fun
Khün /mon mon/
Not actually attested, but grammatical derivatives of the above.
ᨾ᩠ᩅ᩵᩺᩻ to be lots of fun
Khün /mon mon/
ᨣᩪ᩺᩻ everyone
Tai Lü /kun kun/
Theoretical derivative of the unetymological, phonetic spelling of the word for person. The first mark above is serving as a final consonant, not a cancellation mark.
Coda Consonants v. Onset Consonants

U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA and U+1A54 TAI THAM LETTER GREAT SA may have been created so that conjuncts would be different from accidental combinations of initial and final consonants. Are these differences maintained? This primarily probes the font properties, though rendering engines may have an effect.

There are very few words that are affected. A word for 'special' is one of the few.

ᨭᩮ᩠ᨮ ᨭᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
Should be different. (2 syllables)
ᨱᩮ᩠ᨮ ᨱᩛᩮ (no meaning)
/ne:t/ /-n tʰe:/
Should be different. (2 syllables)
ᨲᩮ᩠ᨮ ᨲᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
Should probably be different. (2 syllables)
ᨻᩮ᩠ᨻ ᨻᩛᩮ (no meaning)
/pe:p/ /-p pe:/
Should be different. (2 syllables)
ᨾᩮ᩠ᨻ ᨾᩛᩮ (no meaning)
/me:p/ /-m pe:/
Should be different. (2 syllables)
ᨠᩮ᩠ᩁ ᨻᩕᩮ (no meaning)
/keːn/ /kʰe/
Should be different. (2 syllables)
ᨠᩮ᩠ᩃ ᨠᩖᩮ (no meaning)
/keːn/ /keː/
Should be different. (2 syllables)
ᨠᩖᩮ ᨠ᩠ᩃᩮ (no meaning)
/keː/ /keː/

However, those who don't use MEDIAL LA won't make a visual distinction to show the position of the vowel!

(2 syllables)

ᩈ᩠ᩈ ᩈᩞ ᩔ (no meaning) Should be different. (3 syllables)
ᨾ᩠ᨾ ᨾᩜ (no meaning) Should be different. (2 syllables)
Behaviour of <SAKOT, NYA>
ᨬᩮ᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
Would ideally be different, but this may not be readily and robustly achievable. (2 syllables)
ᨬᩮ‌᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
These should instead be different. (2 syllables)
ᨱ᩠ᨬ ᨬ᩠ᨬ (no meaning) Should these be different? (2 syllables)
ᨱᩮ᩠ᨬ ᨱ᩠ᨬᩮ (no meaning)
/neːn/ /-n ɲeː/
Should these be different? (2 syllables)
Marks from outside the Tai Tham Block
ᩋᩦ๊ (meaningless syllable in refrain of a song)
/ʔiː/
Thai mai tri and mai chattawa are found on tua mueang 'words' on p236 of the big blue book! Of course, these might just be the unencoded THAI-LAO TONES THREE and FOUR. In this particular case, a rendering issue might be alleviated by making the default positions of the tone marks higher than that of the vowels above.
ᩋᩦ๋ (meaningless syllable in refrain of a song)
/ʔiː/
Language-Sensitive Forms (Browser Test?)
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/

The top two rows are declared to be in Lao, and the second also has a corresponding style-setting lest the language setting be ignored.

The initial consonant takes the form in the Da Lekh family of the consonant form used in that role in Laos and Northeast Thailand, namely , which is only subtly different from U+1A41 TAI THAM LETTER RA. So doing may be improper behaviour, but is seen in fonts.

The mai kang should appear on the vowel U+1A63 TAI THAM VOWEL SIGN AA, its usual position outside Thailand. Of course, this won't happen if the font cannot be appropriate for such writing systems. At least one browser has failed to render the final stack properly when it has been the final glyph in the glyph stream; this is why the word is written twice.

The bottom row is not marked for language, and shows the same word (and encoding). The Da Lekh font follows the more technically challenging Chiangmai style by default, with the MAI KANG on the consonant.

(2 words, so 2 syllables!)

ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
Tone before Vowel!
ᨣ᩠ᩅ᩵ᩢᩣ᩠ᨶ when
kan waː
p118 - MFL clearly has the tone as the first mark! It may be that these are just typing errors. There are two other examples of tone and then vowel in the dictionary, the same tone and vowel as here.
ᨣ᩠ᩅ᩵ᩢᩣ and say
kɔʔ waː
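The line-breaking strings in this table differ only in the invisible character placed before the second U+1A63. The following is a minimal Python sketch of how such variants might be generated. The function name break_before_aa is hypothetical, and the code points given for the word are my reading of the spelling quoted from N3207R; they are written as escapes so that the invisible characters cannot be lost.

```python
SOFT_HYPHEN = "\u00AD"   # SHY: break opportunity, shown as a hyphen when used
ZWSP = "\u200B"          # ZERO WIDTH SPACE: break opportunity, nothing shown
SIGN_AA = "\u1A63"       # TAI THAM VOWEL SIGN AA
DOKMAI = "\u1AA5"        # TAI THAM SIGN DOKMAI, the 'dingbat' prefix

# Assumed encoding of the word quoted from Figure 9b of N3207R.
WORD = ("\u1A48\u1A3E\u1A6E\u1A63\u1A34\u1A60\u1A34"
        "\u1A3E\u1A63\u1A36\u1A6E\u1A49\u1A65")

def break_before_aa(word, breaker, occurrence=1):
    """Insert `breaker` before the n-th (1-based) SIGN AA in `word`."""
    seen = 0
    for i, ch in enumerate(word):
        if ch == SIGN_AA:
            seen += 1
            if seen == occurrence:
                return word[:i] + breaker + word[i:]
    raise ValueError("word has fewer SIGN AA characters than requested")

# The manuscript splits the word before the second SIGN AA.
variants = {
    "soft hyphen": break_before_aa(WORD, SOFT_HYPHEN, 2),
    "soft hyphen + dingbats": DOKMAI * 3 + break_before_aa(WORD, SOFT_HYPHEN, 2),
    "ZWSP": break_before_aa(WORD, ZWSP, 2),
    "ZWSP + dingbats": DOKMAI * 2 + break_before_aa(WORD, ZWSP, 2),
}

for label, text in variants.items():
    print(label, [f"{ord(c):04X}" for c in text])
```

Whether the renderer then breaks the line at the inserted character, and what it draws on either side of the break, is exactly what the tests are meant to reveal.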

Rendering Challenges from MFL

These words presented problems when I was developing the Da Lekh font to cope with the Universal Shaping Engine of mid 2016; the problems have now been overcome (Version 0.05). (The solution is not entirely compliant with the Unicode standard - dotted circles in the input are sometimes deleted.) These are offered as an aid to font developers fighting unhelpful layout engines; they are not expected to help developers of the core layout engines.

Text | Meaning and Pronunciation | Encoding Hacked via ASCII | Remarks
ᨠ᩠ᩃ᩻ᩬ᩵ᨾ Cambodian
/kălɔːm/
p4 (2 syllables, second a single consonant)
ᨠ᩠ᩃᩬ᩵᩻ᨾ Hard to interpret encoding.
ᨠ᩠ᩃᩬ᩻᩵ᨾ USE-compatible, with interpretable rendering specification.
ᨠᩕᩥ᩠᩵ᨦ suspicious
/kʰiŋ/
p15
ᨡᩮᩢ᩶ᩬᩣ᩠ᨦ belongings
/kʰau kʰɔːŋ/

p101 - the one syllable form.

The first form minimises the disruption to the pattern of first element followed by second element. The second spelling tries sticking in CGJ to advise that the ordering of the marks is not an error. The third spelling follows the principle that if the components cannot be concatenated (with deletion and addition of SAKOT or equivalent as appropriate), then the ordering should be based on the visual layout of the marks.

ᨡᩮᩢ᩶͏ᩬᩣ᩠ᨦ
ᨡᩮᩬᩢ᩶ᩣ᩠ᨦ
ᨡᩮᩢᩬᩣ᩠᩶ᨦ USE (December 2021)-compatible rearrangement of the above - but the final consonant is still incompatible at 2021.
ᨦ᩠ᩅ᩶ᩣ᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ spastic
/ŋwaː ŋwaː soːŋ soːŋ/
p168 (2 syllables)
ᨦ᩠ᩅᩣ᩶᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ Vowel and tone order adjusted to the USE as at December 2021.
ᨴᩯ᩠᩶ᩃ truth to tell
/tɛː lɛː/
p318. The first entry has the written vowel with the first consonant, the second with the second, and the third entry is the same as the second but normalised.
ᨴ᩠᩶ᩃᩯ
ᨴ᩠᩶ᩃᩯ
ᨳᩮᩬᩥᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ bruised
/tʰɤʔ tʰɤʔ tʰɤːk tʰɤːk/
p314. (2 syllables)
ᨳᩮᩥᩬᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ With vowel order of the USE as at December 2021
ᨾᩉᩫᩖᨿᩰᨴᩤ great army
/maʔ hon yoː tʰaː/
NTDPLM p511.
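The 'truth to tell' rows above mention a normalised form of a spelling. The following is a minimal Python sketch of what normalisation does to such a string, using the standard unicodedata module. It assumes that the unnormalised spelling has the tone mark typed before SAKOT; that order is my assumption, not something stated in the table. For contrast, it also shows that RA HAAM and a tone mark, which share a combining class, keep whatever order they were typed in.

```python
import unicodedata

LOW_TA, SAKOT, TONE_2 = "\u1A34", "\u1A60", "\u1A76"
LA, SIGN_AE, RA_HAAM = "\u1A43", "\u1A6F", "\u1A7A"

# Canonical combining classes drive the reordering.
for ch in (SAKOT, TONE_2, RA_HAAM):
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")

# Assumed unnormalised spelling: tone mark typed before SAKOT.
typed = LOW_TA + TONE_2 + SAKOT + LA + SIGN_AE
normalised = unicodedata.normalize("NFC", typed)

print("typed:     ", [f"{ord(c):04X}" for c in typed])
print("normalised:", [f"{ord(c):04X}" for c in normalised])

# Marks with the same combining class are never swapped by normalisation,
# so both orders of RA HAAM and the tone mark survive unchanged.
for pair in (RA_HAAM + TONE_2, TONE_2 + RA_HAAM):
    unchanged = unicodedata.normalize("NFC", pair) == pair
    print([f"{ord(c):04X}" for c in pair], "unchanged:", unchanged)
```

A font or layout engine therefore has to cope with SAKOT arriving before the tone mark whenever the text has passed through a normalising process.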

Notes

References

Short Name | Full Reference
N3207R Everson M., Hosken M. & Constable P. Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R
MFL Rungrueangsi, Udom (2004) [1991]. Lanna-Thai Dictionary, Princess Mother Version พจนานุกรมล้านนา ~ ไทย ฉบับแม่ฟ้าหลวง ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣ ~ ᨴᩱ᩠ᨿ ᨨᨷᩢ᩠ᨷᨾᩯ᩵ᨼ᩶ᩣᩉᩖ᩠ᩅᨦ [Photchananukrom Lanna ~ Thai, Chabap Maefa Luang] (in Thai) (Revision 1 ed.). Chiang Mai: Rongphim Ming Mueang (โรงพิมพ์มิ่งเมือง). ISBN 974-8359-03-4.
big blue book Wacharasat, Bunkhit (2003). Language of Mueang Lanna ᨽᩣᩈᩣᨾᩮᩬᩨᨦᩃ᩶ᩣ᩠ᨶᨶᩣ ภาษาเมืองล้านนา [Phasa Mueang Lanna] (in Thai). ISBN 974-85472-0-5
Apiradee Techasiriwan, Apiradee อภิรดี เตชะศิริวรรณ. พัฒนาการของอักษรและอัขรวิธีในเอกสานไทลื้ [Patthanakan khong Akson lae Akhara Witi nai Ekasan Thai Lue] Development of Tai Lue Scripts and Orthography. MA Thesis, Chiangmai University (in Thai)
NTDPLM Arunrat Wichiankhiao et al. อรุณรัตน์ วิเชียรเขียว (1996). ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣᨨᨻᩕᩰᩬᩡᨣᩤᩴᨴᩦ᩵ᨷᩕᩤᨠᩫ᩠ᨭᨶᩱᨷᩱᩃᩣ᩠ᨶ พจนานุกรมศัพท์ล้านนาเฉพาะคำที่ปรากฏในใบลาน The Northern Thai Dictionary of Palm-Leaf Manuscripts. ISBN 974-7067-77-2
Chiengtung Chieng Tung: Its Way of Life ᨡᩮᨾᩁᨭᩛᨶᨣᩬᩁᨩ᩠ᨿᨦᨲᩩᨦ [Khemarattha Nakon Cheng Tung] เขมรัฐนครดชียงตุง [Khemarat Nakhon Chiang Tung] (in Thai, Tai Khün, French and English) Chiang Mai: Wat Tha Kradas (วัดท่ากระดาษ)
L2/17-120 Wordingham J.R. Corrections to the Indic Syllabic Category for the Tai Tham Script, L2/17-120
N3384 Hosken M. Tai Tham Subjoined Variants, ISO/IEC JTC1/SC2/WG2/N3384, L2/08-073

Document History

This is Version 2.12 of the web page, which has been written by Richard Wordingham.

History

Version Date Changes
1.0 14 June 2015 Initial 'stable' (i.e. abandoned) version. Work had started on 27 February 2015, and there may be earlier versions around.
1.1 25 September 2016 Converted from XML to HTML (by stripping off XML header) for new website.
2.0 25 October 2016

Added option to dynamically switch fonts - free font Da Lekh Seri for exposure to rendering engine foibles, and encumbered font Da Lekh for resistance. Both fonts are open source, but I created all the inked glyphs for the Da Lekh Seri font. ('Seri' means beholden to no one.)

Completed references, and improved, pruned and extended the examples.

2.1 26 October 2016

Corrected typos. Started testing of display bases.

2.2 7 November 2016

Fixed transliterator bug. Added examples from testing of Da Lekh font work-arounds. Corrected more typos. Tested language sensitivity.

2.3 14 November 2016

Added styles to force Lao forms. Reorganised 'test and tell'. Added one new test word, for mai kam followed by mai sam.

2.4 14 April 2017

Improved 'bran' bug alert.

Added 'A Tai Tham KH' font with and without ccmp enabled. The radio buttons are hidden, and anyone enabling them would also have to supply the font.

Added test for double acting MEDIAL LA.

2.5 8 July 2017

Added test for tone plus SIGN OY.

Added colour fonts to show phonetic position of subscripts relative to vowel.

Added "onclick" for radio buttons.

2.6 22 February 2018

Added test cases for karan on vowels and medial la following preposed vowel.

2.7 12 May 2018

Added three new fonts - 'A Tai Tham KH', 'Hariphunchai' and my extension of the latter, 'Lamphun'.

Added a few more examples of the ᨶᩣ ligature.

Added query as to when ᨲᩬᩴ᩵͏ᩯ᩠ᨶ should render properly.

Colour for spell-checking is now a reality.

2.8 17 February 2019

Added test cases for ᩃᩮᩞ and ᨻᩕ᩠ᨿᩮᩡ.

2.9 9 December 2021

Corrected feature ss99 to ss19.

Clarified rôle of Da Lekh Si fonts. Updated repertoire of Da Lekh Seri fonts.

Added play area for readers to try the fonts out.

Massively duplicated navigation bars to avoid need to scroll to top or bottom.

Made the difference between strings that shall be rendered and other possible sequences clearer. Promoted four test and tell cases to test cases - three sequences for displaying marks and one for the combination of RA HAAM and MAI SAM.

Linked to my font compiler.

2.10 30 December 2021

Noted that colour now works even in IE 11 and also in LibreOffice.

Added most recent (2019) version of Hariphunchai, dubbed Hariphunchai 4.

Added USE-compatible encodings to avoid maligning any fonts that assume a USE-compatible encoding.

2.12 23 January 2022

(Including changes to 2.11).

Fixed miscellaneous typos, including alternative encodings of ᨶ᩶ᩣᩴ. Changed shortcomings of 'my fonts' to shortcomings of 'Da Lekh'.

Changed site from HTTP to HTTPS.

Changed background for USE encoding from red to orange to avoid clash with coloured fonts.

Testing

This web page has been developed with frequent testing on Firefox Version 54 and occasional viewing using Safari on iPhone (iOS 10.3.2), IE 11 (on Windows 7) and Microsoft Edge (on Windows 10).

Switching fonts has been tested in all these browsers.


Font Availability

Da Lekh Font Family

You may freely use, without modification, the four fonts of mine mentioned here, and you may freely examine them. See the respective licences for the conditions and for modification. I do not own all the intellectual property rights for the Da Lekh and Da Lekh Si fonts. The fonts are available as follows:

Name | Font file | Source file | Licence file
Da Lekh
(ᨯᩣᩃᩮ᩠ᨡ)
dalekh.ttf File dalekh.txt in dalekh.zip. This is also the ultimate source code for the Da Lekh Seri font. See Makefile therein for preprocessing directives. DejaVu licence
Da Lekh Si
(ᨯᩣᩃᩮ᩠ᨡᩈᩦ)
dalekh_si.ttf
Da Lekh Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩮᩁᩥ)
dalekh_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -x c dalekh.txt | grep -v ^# >| dalekh_seri.txt
seri_license.htm
Da Lekh Si Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩦᩈᩮᩁᩥ)
dalekh_si_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_si_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -DCOLOUR -x c dalekh.txt | grep -v ^# >| dalekh_si_seri.txt

If you wish to have WOFF files, you should either generate them yourself from the font files listed above, or simply copy them from this website.

The fonts are generated from the source code by means of a DIY font compiler that still has many rough edges. However, the source code of the font, although spartanly commented, may make it clearer what the font is attempting to do. I have endeavoured to make reverse engineering unnecessary.

The font Da Lekh is partly intended for my practical use in analysing material in the Tai Tham script. It therefore contains a large set of Latin characters to support transcription and transliteration. It also contains work-arounds so that it may render properly despite problems with rendering engines.

The other purpose of the fonts is to explore issues in making an OpenType font for the Tai Tham script.

The font Da Lekh Seri is an unencumbered font intended for testing rendering engines. It therefore has, besides the glyphs for Tai Tham writing systems, just a bespoke set of (poor) ASCII glyphs; both the extra characters required by Microsoft Office and the characters recommended for the Universal Shaping Engine; and the characters needed for transliteration style (feature ss04) and their closure under NFC. Known existing work-arounds have been removed. This removal is implemented by compiler directives.

The font Da Lekh Si (ᨯᩣᩃᩮ᩠ᨡᩈᩦ) differs from Da Lekh in that it aims to reveal the spelling of words. This is useful when using a spell-checker, for example on Firefox. The ideal is that subscript consonants in the coda of an orthographic syllable would be distinguished from those in the onset by colour, whence the word 'Si' in the name of the font. The colour technology used works in the dominant browsers (Chrome, Safari, Firefox, MS Edge and even Internet Explorer 11) and in the word processor of LibreOffice. The colouring is also applied to chained syllables.

It is possible that Da Lekh Si may be reduced to an optional OpenType feature applied to the Da Lekh font.

The font Da Lekh Si Seri is an unencumbered font that colours glyphs in the same way. Like Da Lekh Seri, it deliberately lacks work-arounds for problems with renderers. It is intended as an aid for the development of the Da Lekh Si font.

Lamphun Font

The Lamphun font is available under the SIL open font licence; the applicable customisation declares that "Hariphunchai" and "Lamphun" are reserved font names. The font file is lamphun.otf and what I have used as 'source' code to build the font is an untidy mess assembled in lamphun.zip:

Rôle | Name | Remarks
Glyphs | Hariphunchai.otf

A version of the font dated 5 May 2014, taken from SourceForge. The 'unique identifier' in the name table is FontForge : Hariphunchai : 5-5-2014. There were later .sfd and .fea files at the same location, but at best they offered improved glyphs compared to Lamphun.

This is the file that defines the 'early' Hariphunchai font as used on this web page.

OTL tables | lamphun.txt

This defines a font with the same glyph numbering, but with blank glyphs. I then replace 7 tables in the early Hariphunchai font with tables from this new font:

Name | Reason
name
Rename font to comply with the licence, and record appropriate licensing and history information.
GSUB
Include lookups to undo rendering damage by the USE. Position medial ra using feature pref. Move other lookups from ccmp to blws, so that they are applied when missyllabification by USE is no longer an issue. Choose appropriate glyphs when there are level 2 subscripts with vocalic function. Handle mai kam, subscript consonants on NYA, and N.WAA and N.HAA.
GPOS
Add some mark-to-mark positioning. Use dist to restore advance widths of spacing subscripts.
GDEF
Correct a few glyph categorisations.
cmap
Add mappings for control characters so that rendering engine damage can be repaired.
OS/2
Allow more complex OTL operations. Declare greater line depth to enable more rapid rendering. Blanked vendor ID.
head
Change font revision and modification time.
Change log | fontlog.txt | Only for Lamphun.
Make file | lamphun_makefile | The compiler invoked by '~/oft/parse' is my DIY font compiler.

It is likely that I will create a variant coloured to indicate spelling.

Hariphunchai Font

There are two versions of the font used on this page. The fonts themselves are distinguished by the unique font identifiers in their name tables.

The early version of the Hariphunchai font, whose 'unique identifier' is FontForge : Hariphunchai : 5-5-2014, is available as Hariphunchai.otf both on SourceForge and within the Lamphun source zip file. The reversibly generated WOFF file is available here.

The 2019 version, whose 'unique identifier' in the name table is TragerStudio : Hariphunchai : 19-5-2019, is available as Hariphunchai4.otf on SourceForge. The reversibly generated WOFF file is available here.

The licence is available on SourceForge. The WOFF files, being derivative works, are licensed under the same licence. As the original OTF files can be recovered from them, they preserve the font names.
