Introduction Control Panel The Tests Notes My Fonts

Introduction

Purpose

This page provides a set of test cases for a Tai Tham renderer. It has been compiled with a view to putting the 'Universal Shaping Engine' through its paces for the Tai Tham script. This set of tests is incomplete in that it does not directly give the correct renderings, although someone in possession of the source documents could check them visually.

I was originally requested to provide words of one syllable for such a test. By syllable, I understand an Indic syllable of the form C+(M*V*M*C*M*)* with a single base consonant. (M = miscellaneous mark). I include cases where the rôle of the base consonant is played by something other than a letter. The post-vocalic consonants occur not only in other SEA Indic scripts such as the Khmer script and Lao script (as in the use of ຽ U+0EBD LAO SEMIVOWEL SIGN NYO in the Lao writing system), but also in Tibetan.
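As a rough illustration of the syllable pattern above, the following Python sketch reduces a string to a sequence of C, V and M classes and matches it against the pattern. The classification is an assumption of this sketch, not something this page defines: vowel signs (recognised by their Unicode names) count as V, any other combining mark (SAKOT and the consonant signs included) counts as M, and everything else, digits and signs included, counts as C. The function names are illustrative only.

```python
import re
import unicodedata

# The syllable pattern quoted above, applied to a string of class letters.
SYLLABLE = re.compile(r"C+(M*V*M*C*M*)*")


def classify(ch: str) -> str:
    """Map one character to C, V or M (assumed classification, see lead-in)."""
    if "VOWEL SIGN" in unicodedata.name(ch, ""):
        return "V"
    if unicodedata.category(ch).startswith("M"):
        return "M"  # SAKOT, tone marks, consonant signs, MAI KANG, ...
    return "C"      # letters, digits and anything else that can act as a base


def class_string(text: str) -> str:
    """Return the class letters for every character of text."""
    return "".join(classify(ch) for ch in text)


def is_indic_syllable(text: str) -> bool:
    """True if the class string of text matches the pattern as a whole."""
    return SYLLABLE.fullmatch(class_string(text)) is not None
```

As written, the pattern accepts any sequence of these classes that starts with a base, so a check like this separates strings that begin with a base from those that do not, rather than validating the order of the marks.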

However, many of the interesting cases occur in the second syllable of a word, and certain initial syllables are obligatorily followed by more characters of the word. I have therefore also supplied longer words when a conceivable problem would not appear in a word of one syllable.

The dependent vowel AA (U+1A63 and U+1A64) may form the base of its own little stack of dependent marks. Manual line breaking may also separate it from its base consonant. I have nevertheless counted it as part of the same syllable as a base consonant; the two stacks frequently interact in Northern Thai, with MAI KANG migrating to or towards the base consonant and interacting with its dependents.

The page was originally set up to use either my own stick font, 'Da Lekh', which is based on DejaVu Sans, or the cut-down version, 'Da Lekh Seri'. The Da Lekh font is intended to be suitable for use in preparing (but perhaps not publishing) Tai Tham text. It therefore includes work-arounds for known rendering engine problems. The Da Lekh Seri font deliberately does not include such work-arounds. You may be interested in using or examining my fonts for your own purposes.

I have added two other families of fonts. These fonts are available under the SIL Open Font license. The font 'A Tai Tham KH' relies only on the ccmp feature being enabled; it handles all Indic rearrangement itself.

The Hariphunchai font is an OpenType Layout font that looked promising when used with the South-East Asian shaper of HarfBuzz. Development seems to have drastically slowed when HarfBuzz switched Tai Tham to its implementation of the Universal Shaping Engine (USE). The code for this font is available on SourceForge and there is further documentation elsewhere. I have added work-arounds and a few further touches to enable it to work under the USE; I have dubbed the resulting font 'Lamphun'. I have included two versions in the menus, the 2014 version used for Lamphun, dubbed 'early Hariphunchai', and the latest (2019) version, dubbed 'Hariphunchai4'.

Feel free to adapt this web page to add your own fonts and test cases.

Content and Layout

The test text is given in the table columns headed 'Text', and is the content of the first table cell in table rows with class tst1. Two further columns, headed 'Encoding' and 'Hacked via ASCII', are automatically derived from this text as the page is loaded. The hacked column is intended to show users how the text should look, though it too may suffer from rendering engine limitations. The font used for this column is the member of the Da Lekh font family last selected to display the first column. Ideally, I would include images of the text from credible sources, but that may cause copyright problems, for the Unicode Consortium wishes to be able to use this document for commercial purposes.
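The 'Encoding' column can be reproduced from the text cell in a few lines. The page derives it as it loads; the Python sketch below is only an illustration of the same idea, and the function name is hypothetical.

```python
def encoding_column(text: str) -> str:
    """Return the code points of text as space-separated upper-case hex."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# U+1A20 followed by U+1A6B, the first row of the vowel combination tests.
print(encoding_column("\u1A20\u1A6B"))  # prints: 1A20 1A6B
```

Four hex digits suffice for the Tai Tham block and for controls such as U+200C; the format widens automatically for code points beyond U+FFFF.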

The 'Hacked via ASCII' column contains an unambiguous transliteration to ASCII of the Tai Tham text in the column headed 'Text'. Members of the Da Lekh font family contain an OpenType font feature, Stylistic Set 2, which, when enabled, may render the transliteration as the original Tai Tham text is intended to be rendered. For more details, see the style sheet in the source of this page.

The 'Meaning and Pronunciation' column is given to identify the word used as an example. There may be better glosses, and pronunciation can vary extensively within a nominal language. The pronunciation of the letter RA is particularly variable, ranging over /l/, /h/ and even /r/, and there are regional variations as to whether vowel length distinctions exist and, if so, whether they are phonemic. For the Tai languages the pronunciation is given using IPA, while Pali is simply transliterated (as Pali). I have omitted tone, as phonetic tone is also quite variable. Where no indication to the contrary is given, the Tai pronunciation given approximates that of Chiangmai.

The test words may, in principle, be extracted quite simply from this web page. Each test 'word' is the content of the first cell in each row whose class is tst1. For convenience, I have extracted the first two cells in such rows, along with titles, to a CSV file. Rows where there is a plausible case for treating the encoding used as erroneous are marked in pink. (Their CSS class is tst2.) For completeness, where they are defensible encodings, I have included the alternative encodings that the Universal Shaping Engine (USE) calls for; these have an orange background and CSS class tst3. The USE encoding is not well supported by fonts and is not robust to alternative classifications of combining marks.
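The extraction described above can be sketched with Python's standard HTML parser. The sketch assumes that each test row is a tr element carrying tst1 in its class attribute and that its cells are td elements; the class and function names, and the sample markup, are hypothetical. HTML comments are skipped by the parser, so commented-out rows are not picked up.

```python
from html.parser import HTMLParser


class FirstCellExtractor(HTMLParser):
    """Collect the text of the first cell of every row with class tst1."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_row = False         # inside a <tr> whose classes include tst1
        self._cell_index = -1        # index of the current cell in that row
        self._in_first_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._in_row = "tst1" in classes
            self._cell_index = -1
        elif tag == "td" and self._in_row:
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_cell = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_cell:
            self.words.append("".join(self._buffer).strip())
            self._in_first_cell = False
        elif tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        if self._in_first_cell:
            self._buffer.append(data)


def extract_test_words(html: str) -> list:
    """Return the first-cell text of each tst1 row, in document order."""
    parser = FirstCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.words


# Hypothetical markup in the shape described above.
SAMPLE = """<table>
<tr class="tst1"><td>\u1A20\u1A6B</td><td>N/A</td></tr>
<!-- <tr class="tst1"><td>not a test word</td></tr> -->
<tr class="tst2"><td>\u1A20\u1A63</td><td>crow</td></tr>
<tr class="tst1 extra"><td>&#x1A23;&#x1A74;</td><td>then, and</td></tr>
</table>"""

words = extract_test_words(SAMPLE)
```

Matching on the split class list rather than the raw attribute string keeps rows with more than one class, while rows of class tst2 or tst3 are left out.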

The HTML comments within this web page should not be construed as holding test words.

Font Testing

This page is intended as a rendering engine test, rather than as a font test. However, you may modify this page to try out your own font. The necessary changes will be confined to the style sheet in the source code of this page, unless you use a different ASCIIfication scheme, in which case look at the usage of the JavaScript variable ss02_hack.

My Rendering Performance

When this page was initially composed, in June 2015, the Da Lekh font mostly worked for the Tai Tham script in the Firefox and Chrome browsers. It worked in them because they use HarfBuzz to render the Tai Tham script. Since then, the HarfBuzz rendering engine used for Tai Tham has been brought into line with the Universal Shaping Engine, with a consequent dramatic fall in the rendering performance for the Tai Tham script.

The solution to this problem was to add numerous work-arounds to the font. These work-arounds have mostly restored performance, the main exceptions being subtle positioning errors where mark to base positioning is ignored and the default mark position is used instead.

The quality of the 'Hacked via ASCII' column varies from browser to browser and from operating system to operating system, and also varies over time. For Internet Explorer 11, Microsoft Edge and for the HarfBuzz-based browsers Firefox and Chrome, it is actually the best rendered column. (Script-specific rendering engines have a tendency to make the achievement of advanced script features difficult rather than easy; Tai Tham has many 'advanced' features.)

Font Peculiarities

Traditionally, consonants that are used in neither Pali nor Sanskrit did not have subscript forms. However, one significant textbook, the 'big blue book', provides a subscript form for LOW FA for use in loans from English. This form is cramped and ugly, which goes against the tradition of Lanna script writing. The MFL treats the stroke distinguishing HIGH KXA, LOW KXA, LOW SA, LOW FA and LETTER UU from HIGH KHA, LOW KA, LOW CA, LOW PA and LETTER U as a diacritic. The Da Lekh font follows this interpretation, and leaves this diacritic above the baseline when the letter is subscripted.

If you have difficulty reading the Da Lekh fonts, you may find it useful to consult their glyph gallery.


Control Panel

Font Selection

Da Lekh Seri Da Lekh
Da Lekh Si Seri Da Lekh Si
Da Lekh Seri (transliterated form) Da Lekh (transliterated form)
Da Lekh Si Seri (transliterated form) Da Lekh Si (transliterated form)
If using Lamphun or one of the four fonts above, the font is at Version . (The version number is stored in the character U+EAE7 in addition to the font file's name and head tables.)
System default Guest font
A Tai Tham KH (ccmp defaults) A Tai Tham KH (ccmp enabled)
Early Hariphunchai Lamphun
Hariphunchai 4 (2019)

Specimen text: ᩋᩁᩉᨶ᩠ᨲᩮᩣ arahanto 'arahats'

Play Area

You may type your own text in the area below. It will, if possible, be displayed in the font selected above.

Table Completion

It is possible that the columns recording code points and showing how the text should look may not have been generated. If that happens, try clicking this control button:

Debugging messages

Tests

Vowel Combinations
Other Explicit Coding Sequences
Other Examples from L2/07-007R
Mai Kang Lai
Ligature NAA
English Loanwords
Tai Lü Blends
More Tai Lü
And Not
Potential Surprises
Test and Tell
Rendering Challenges from MFL

Vowel Combinations

These vowel combinations are taken from Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable). Changes have been rung on the initial consonants to check for silly omissions.

A hyphen in the pronunciation indicates a syllable-final consonant that would be specified by a subscript consonant or following orthographic syllable.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩫN/A
/ko-/
1A20 1A6Bko Section 5 No. 1. This sequence does not form a whole word. An example may be seen in a word for 'danger'.
ᨣᩴthen, and
/kɔː/
1A23 1A74gMSection 5 No. 2
ᨧᩢ(irrealis marker)
/tɕaʔ/
1A27 1A62caSection 5 No. 4
ᨲ᩠ᩅᩫᩡ to prevaricate
/tuaʔ/
1A32 1A60 1A45 1A6B 1A61t/woHSection 5 No. 5
ᨷ᩠ᩅᩫlotus
/bua/
1A37 1A60 1A45 1A6BB/woSection 5 No. 6
ᨠ᩠ᩅN/A
/kua-/
1A20 1A60 1A45k/w Section 5 No. 7. This sequence does not form a whole word. An example may be seen in one of the words for 'big'.
ᨡᩬᩴ to request
/kʰɔː/
1A21 1A6C 1A74khVOMSection 5 No. 8
ᨠᩬN/A
/kɔː-/
1A20 1A6CkVO Section 5 No. 9. This sequence does not form a whole word. An example may be seen in the fuller spelling of the word for 'belongings'.
ᨦᩡ to split up
/ŋaʔ/
1A26 1A61GHSection 5 No. 10
ᨠᩣcrow
/kaː/
1A20 1A63kASection 5 No. 11
ᨴᩤto paint
/taː/
1A34 1A64d^ASection 5 No. 12
ᩌᩣᩴ to sprinkle
/ham/
1A4C 1A63 1A74rhAMSection 5 No. 13
ᨣᩤᩴword
/kam/
1A23 1A64 1A74g^AMSection 5 No. 14
ᨳᩥto pretend
/tʰiʔ/
1A33 1A65th_iSection 5 No. 15
ᨺᩦboil (n.)
/fiː/
1A3A 1A66FISection 5 No. 16
ᨩᩧmoist
/tɕɯʔ/
1A29 1A67jueSection 5 No. 17
ᨾᩨhand
/mɯː/
1A3E 1A68mUESection 5 No. 18
ᨵᩩmonk
/tʰuʔ/
1A35 1A69dhuSection 5 No. 19
ᨦᩪsnake
/ŋuː/
1A26 1A6AGUSection 5 No. 20
ᨲᩮᩡto kick
/teʔ/
1A32 1A6E 1A61t_eHSection 5 No. 21
ᨽᩮdanger
/pʰeː/
1A3D 1A6Ebh_eSection 5 No. 22
ᨤᩯᩡ to limp along
/kʰɛʔ/
1A24 1A6F 1A61gx_EHSection 5 No. 23
ᨧᩯcorner
/tɕɛː/
1A27 1A6Fc_ESection 5 No. 24
ᨸᩮᩬᩥᩡ mud
/pɤʔ/
1A38 1A6E 1A6C 1A65 1A61p_eVO_iHSection 5 No. 25
ᨸᩮᩥᩬᩡ 1A38 1A6E 1A65 1A6C 1A61p_e_iVOH Different from the proposals.
ᨶᩮᩬᩥ (final particle for commands and entreaties)
/nɤː/
1A36 1A6E 1A6C 1A65n_eVO_i Section 5 No. 26.
ᨶᩮᩥᩬ 1A36 1A6E 1A65 1A6Cn_e_iVO Different from the proposals.
ᨠᩮᩬᩨᩡ N/A
/kɯaʔ/
1A20 1A6E 1A6C 1A68 1A61k_eVOUEHSection 5 No. 27
ᨠᩮᩨᩬᩡ 1A20 1A6E 1A68 1A6C 1A61k_eUEVOH Different from the proposals.
ᨠᩮᩬᩨ
/kɯa/
1A20 1A6E 1A6C 1A68k_eVOUESection 5 No. 28
ᨠᩮᩨᩬ 1A20 1A6E 1A68 1A6Ck_eUEVO Different to the proposals.
ᩁᩮᩢᩣwe
/hau/
1A41 1A6E 1A62 1A63r_eaASection 5 No. 29
ᨾᩳdrunk
/mau/
1A3E 1A73m^O Section 5 No. 30. This example is not taken from the MFL, which does not use this vowel symbol.
ᨠᩮᩣN/A
/ko:/
1A20 1A6E 1A63k_eA Section 5 No. 31. This is very rare in monosyllables, but is quite common at the end of monks' names, e.g. Adittadhammo.
ᨹ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
1A39 1A60 1A3F 1A6E 1A61ph/_y_eHSection 5 No. 32
ᨻ᩠ᨿᩮflower
/pia/
1A3B 1A60 1A3F 1A6Eb/_y_eSection 5 No. 33
ᨠ᩠ᨿN/A
/kia-/
1A20 1A60 1A3Fk/_y Section 5 No. 34. This sequence does not form a whole word. An example may be seen in a spelling of the word for 'city'.
ᨾᩮᩬᩥᩋᩡ mucus
/mɯaʔ/
1A3E 1A6E 1A6C 1A65 1A4B 1A61m_eVO_i_qH Section 5 No. 35. (2 syllables)
ᨾᩮᩥᩬᩋᩡ 1A3E 1A6E 1A65 1A6C 1A4B 1A61m_e_iVO_qH Different from the proposals.
ᨠᩖᩮᩬᩥᩋ salt
/kɯa/
1A20 1A56 1A6E 1A6C 1A65 1A4BkVl_eVO_i_q Section 5 No. 36. (2 syllables)
ᨠᩖᩮᩥᩬᩋ 1A20 1A56 1A6E 1A65 1A6C 1A4BkVl_e_iVO_q Different from the proposals.
ᩈᩰᩡ to practice
/soʔ/
1A48 1A70 1A61sOHSection 5 No. 37
ᨾᩰbig
/moː/
1A3E 1A70mOSection 5 No. 38
ᨪᩰᩬᩡ to gouge out
/sɔʔ/
1A2A 1A70 1A6C 1A61jxOVOHSection 5 No. 39
ᨩᩢ᩠ᨿvictory
/tɕai/
1A29 1A62 1A60 1A3Fja/_ySection 5 No. 40
ᨶᩲin
/nai/
1A36 1A72naueSection 5 No. 41
ᨢᩱ to expose
/kʰai/
1A22 1A71kxaiSection 5 No. 42
ᨴᩱ᩠ᨿThailand
/tai/
1A34 1A71 1A60 1A3Fdai/_ySection 5 No. 43
ᨠᩮᩬᩨᩡ
Khün /kɤʔ/
1A20 1A6E 1A6C 1A68 1A61k_eVOUEHSection 5.3 No. 22
ᨠᩮᩨᩬᩡ 1A20 1A6E 1A68 1A6C 1A61k_eUEVOH Different from proposals.
ᨠᩮᩬᩨ
Khün /kɤː/
1A20 1A6E 1A6C 1A68k_eVOUESection 5.3 No. 23
ᨠᩮᩨᩬ 1A20 1A6E 1A68 1A6Ck_eUEVODifferent from proposals.
ᨠᩰᩢ
Khün /ko-/
1A20 1A70 1A62kOaSection 5.3 No. 26
ᩈᩘ First syllable of compounds of saṅgha.
/saŋ/
1A48 1A58s>G Section 5.3 No. 29. Apparently not a possible final syllable, but can be left stranded as a result of line-breaking.
ᨴᩢ᩠ᨦwhole
/taŋ/
1A34 1A62 1A60 1A26da/GSection 5.3 No. 30
ᩌᩥᩴedge
/him/
1A4C 1A65 1A74rh_iM Section 5.3 No. 31 (Example from Apiradee p53, but different language, different pronunciation, i.e. not /-iŋ/.)
ᨠᩥ᩠ᨦ
/kiŋ/
1A20 1A65 1A60 1A26k_i/GSection 5.3 No. 32
ᨠᩢ᩠ᨾ
/kam/
1A20 1A62 1A60 1A3Eka/mSection 5.3 No. 34
ᨠᩢᨾ
/kam/
1A20 1A62 1A3EkamSection 5.3 No. 35
ᨯᩭmountain
/dɔːi/
1A2F 1A6DDoiSection 5.3 No. 36
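Many rows above come in pairs: the order given in the proposal, with OA BELOW (U+1A6C) before a vowel sign written above, and an alternative with the two swapped. A minimal sketch of deriving the second member of such a pair from the first is given below. It covers only the pairs with U+1A65 and U+1A68 seen in this table, it is not an implementation of the USE ordering rules, and the function name is hypothetical.

```python
import re

OA_BELOW = "\u1A6C"            # TAI THAM VOWEL SIGN OA BELOW
ABOVE_VOWELS = "\u1A65\u1A68"  # VOWEL SIGN I and VOWEL SIGN UUE

# OA BELOW immediately followed by one of the above-base vowel signs.
_SWAP = re.compile(f"{OA_BELOW}([{ABOVE_VOWELS}])")


def swap_oa_below(text: str) -> str:
    """Move OA BELOW after an immediately following above-base vowel sign."""
    return _SWAP.sub(lambda m: m.group(1) + OA_BELOW, text)


# Section 5 No. 25: 1A38 1A6E 1A6C 1A65 1A61 in the proposal's order.
proposal_order = "\u1A38\u1A6E\u1A6C\u1A65\u1A61"
alternative = swap_oa_below(proposal_order)
print(" ".join(f"{ord(c):04X}" for c in alternative))
```

Sequences in which OA BELOW is not followed by one of the listed vowel signs, such as 1A21 1A6C 1A74, pass through unchanged.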

Other Explicit Coding Sequences

Other explicit coding sequences are given in Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable), and these are recorded here. Amended and exploratory material is highlighted in yellow; it is not vouched for by the proposal. The remarks are my own.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
᪓᩠ᨴthrice
/saːm tiː/
1A93 1A60 1A343T/dSection 2
ᨲ᩵ᩣ᩠ᨦ᩻ different in my view
/taːŋ taːŋ/
1A32 1A75 1A63 1A60 1A26 1A7Bt1A/G"Section 7
ᨲᩣ᩠᩵ᨦ᩻ 1A32 1A63 1A75 1A60 1A26 1A7BtA1/G"Different from the proposals.
ᨲᩣ᩠᩵ᨦ᩻ 1A32 1A63 1A60 1A75 1A26 1A7BtA/1G"Normalisation of the above.
ᨳ᩠ᨶ᩻ᩫᩁ path
/tʰănon/
1A33 1A60 1A36 1A7B 1A6B 1A41th/n"o_r Sections 7 and 14.6 (2 syllables - the second is a single character).
ᨳ᩠ᨶᩫ᩻ᩁ 1A33 1A60 1A36 1A6B 1A7B 1A41th/no"_r Different from proposals, which specifically specified the various semantically sensitive positions of mai sam. For this word, the visual position of the marks above is free.
ᨡᩢ᩶᩻ᩬᨦ belongings
/kʰau kʰɔːŋ/
1A21 1A62 1A76 1A7B 1A6C 1A26kha2"VOG Section 7 (2 syllables - the second is a single character)
ᨡᩢᩬ᩶᩻ᨦ 1A21 1A62 1A6C 1A76 1A7B 1A26khaVO2"G Different from the proposals.
ᨡᩮᩢ᩶ᩣᨡᩬᨦ belongings
/kʰau kʰɔːŋ/
1A21 1A6E 1A62 1A76 1A63 1A21 1A6C 1A26kh_ea2AkhVOG Section 7 (3 syllables - the third is a single character)
ᨡᩮᩢᩣ᩶ᨡᩬᨦ 1A21 1A6E 1A62 1A63 1A76 1A21 1A6C 1A26kh_eaA2khVOG Different from the proposals.
᪭ᩣ elephant
/tɕaːŋ/
1AAD 1A63᪭ASection 11
ᩉ᩠ᨶᩦto flee
/niː/
1A49 1A60 1A36 1A66h/nISection 14.1
ᨤ᩠ᩅᩯ᩶ᩁ to blockade
/kʰwɛːn/
1A24 1A60 1A45 1A6F 1A76 1A41gx/w_E2_r Section 14.2 (2 syllables - the second is a single character)
ᩉ᩠ᩅᩫ head
/hua/
1A49 1A60 1A45 1A6Bh/woSection 14.3
ᨯᩢ᩵ᨦ᩠ᨶᩦ᩶ like this
/daŋ niː/
1A2F 1A62 1A75 1A26 1A60 1A36 1A66 1A76Da1G/nI2 Section 14.4 (2 syllables)
ᩉᩥ᩠ᨶstone
/hin/
1A49 1A65 1A60 1A36h_i/nSection 14.5
ᨷ᩠᩵ᨾᩦ to not have
/bɔː miː/
1A37 1A75 1A60 1A3E 1A66B1/mI Section 14.6. The proposal lists MAI KANG as a code point, but it is visually dropped in this compound. I presume the renderer is not intended to suppress the appearance of the character. The upper row drops the MAI KANG from the encoding, so is not the encoding intended, while the lower row uses the stated encoding. Da Lekh fails to arrange the marks above properly; arrangement is a proper challenge for a Tai Tham font. The phonetic syllable boundary is part of the context!
ᨷᩴ᩠᩵ᨾᩦ to not have
/bɔː miː/
1A37 1A74 1A75 1A60 1A3E 1A66BM1/mI
ᨲᩣ᩠ᨾ to follow
/taːm/
1A32 1A63 1A60 1A3EtA/mSection 14.7
ᨻ᩠ᨿᩣ᩠ᨵᩥ sickness
/păɲaːt/
1A3B 1A60 1A3F 1A63 1A60 1A35 1A65b/_yA/dh_iSection 14.8
ᨸ᩠ᩃ᩠ᨿ᩵ᩁ to change
/pian/
1A38 1A60 1A43 1A60 1A3F 1A75 1A41p/_l/_y1_r Section 14.9 (2 syllables - the second is a single character)
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
1A3E 1A6F 1A76 1A60 1A36 1A60 1A45 1A75 1A63m_E2/n/w1A Section 14.9. A sophisticated font might transpose the tone marks. The phonetic syllable boundary should be part of the context.
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
1A3E 1A6F 1A60 1A76 1A36 1A60 1A45 1A75 1A63m_E/2n/w1A Same as above, but normalised, so not the code point sequence in the proposal. Proposal explicitly stated SAKOT was to have ccc=0, not 9, but ccc=9 was quietly inserted in draft properties and not noticed until too late.
ᩈ᩠ᩅᩯ᩵ to butt in
/swɛː/
1A48 1A60 1A45 1A6F 1A75s/w_E1Section 14.10
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
1A48 1A6F 1A75 1A60 1A45s_E1/w Section 14.10 (but the proposal has vowel and tone the wrong way round)
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
1A48 1A6F 1A60 1A75 1A45s_E/1w As above, but normalised, so very much not the codepoint sequence in the proposal.
ᩈ᩵ᩯ᩠ᩅ to embroider
/sɛːw/
1A48 1A75 1A6F 1A60 1A45s1_E/w As above, but uncorrected. Arguably, the rendering is unconstrained.
ᨿᩪ broom, whisk
/ɲuː/
1A3F 1A6AyUSection 15 No. 1
ᨾᩦ to have
/miː/
1A3E 1A66mISection 15 No. 2
ᩉ᩠ᨾᩪ pig
/muː/
1A49 1A60 1A3E 1A6Ah/mUSection 15 No. 3
ᩉ᩠ᨾᩦ bear (n.)
/miː/
1A49 1A60 1A3E 1A66h/mISection 15 No. 4
ᨹ᩠ᩅᩫhusband
/pʰua/
1A39 1A60 1A45 1A6Bph/woSection 15 No. 5
ᩉ᩠ᩃᩬᩴ᩵ to cast (in metal)
/lɔː/
1A49 1A60 1A43 1A6C 1A74 1A75h/_lVOM1Section 15 No. 6
ᨾᩣto come
/maː/
1A3E 1A63mASection 15 No. 7
ᩉᩱ᩵to hit
/hai/
1A49 1A71 1A75hai1Section 15 No. 8
ᨾ᩠ᨿ 1A3E 1A60 1A3Fm/_ySection 15 No. 9
ᩅ᩠ᨿᨦ city
/wiaŋ/
1A45 1A60 1A3F 1A26w/_yG Section 15 No. 10 (2 syllables - the second is a single character)
ᩉᩣ᩠ᨾ to carry by the handles
/haːm/
1A49 1A63 1A60 1A3EhA/mSection 15 No. 11
ᨯᩣᩴblack
/dam/
1A2F 1A63 1A74DAMSection 15 No. 12
ᨡᩮ᩠ᩅ 1A21 1A6E 1A60 1A45kh_e/wSection 15 No. 13
ᩉ᩠ᨾᩣdog
/maː/
1A49 1A60 1A3E 1A63h/mASection 15 No. 14
ᨠᩕᩣ᩠ᨸ to prostrate oneself
/kʰaːp/
1A20 1A55 1A63 1A60 1A38kVrA/p Section 15 No. 15. The later addition of SIGN BA to the repertoire makes the correct final consonant here unclear.
ᨻᩕ᩵ᩣᩴ indefatigable
/pʰam/
1A3B 1A55 1A75 1A63 1A74bVr1AMSection 15 No. 16
ᨻᩕᩣᩴ᩵ 1A3B 1A55 1A63 1A74 1A75bVrAM1 Different from the proposals. The USE diktat at December 2021 does not determine the relative order of the tone mark and mai kang. In some styles the tone mark is associated with and follows mai kang, either above or to the right of it, but in other styles the tone mark sits on the consonant and the mai kang on the spacing vowel. Both encodings are shown here.
ᨻᩕᩣ᩵ᩴ 1A3B 1A55 1A63 1A75 1A74bVrA1M
ᨠᩕᩬᨦ garland; Mekong
/kʰɔːŋ/
1A20 1A55 1A6C 1A26kVrVOG Section 15 No. 17 (2 syllables - the second is a single character)
ᩈᩕᩫᨾ᩠ᨱ᩺ ascetic
/sălom/
1A48 1A55 1A6B 1A3E 1A60 1A31 1A7AsVrom/N^r

Section 15 No. 18. If the word is interpreted as having two phonetic syllables, then the medial consonant comes between an implicit vowel and an explicit vowel.

(2 syllables)

ᩈᩕ᩠ᩅᩫᨾ to embrace
/săluam/
1A48 1A55 1A60 1A45 1A6B 1A3EsVr/wom Section 15 No. 19 (2 syllables - the second is a single character). Ignore final ; it makes the spelling ungrammatical. However, a few such spellings do occur in the MFL.
ᩈᩕ᩠ᩅᨾ to embrace
/săluam/
1A48 1A55 1A60 1A45 1A3EsVr/wm Spelling of above in the MFL, so this form's encoding is not given in the proposal.
ᨯᩮᩬᩨᩁmonth
/dɯan/
1A2F 1A6E 1A6C 1A68 1A41D_eVOUE_r Section 15 No. 20 (2 syllables - the second is a single character)
ᨯᩮᩨᩬᩁ 1A2F 1A6E 1A68 1A6C 1A41D_eUEVO_r Different from the proposals.
ᩁᩮᩬᩨᩋboat
/hɯa/
1A41 1A6E 1A6C 1A68 1A4Br_eVOUE_q Section 15 No. 21 (2 syllables - the second is a single character)
ᩁᩮᩨᩬᩋ 1A41 1A6E 1A68 1A6C 1A4Br_eUEVO_q Different from the proposals.
ᩉ᩠ᩃᩮᩬᩨᩋ to exceed
/lɯa/
1A49 1A60 1A43 1A6E 1A6C 1A68 1A4Bh/_l_eVOUE_q Section 15 No. 22 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩮᩨᩬᩋ 1A49 1A60 1A43 1A6E 1A68 1A6C 1A4Bh/_l_eUEVO_q Different from the proposals.
ᩉ᩠ᨾ᩵ᩣᩴ to eat
/mam/
1A49 1A60 1A3E 1A75 1A63 1A74h/m1AMSection 15 No. 23
ᩉ᩠ᨾᩣᩴ᩵ 1A49 1A60 1A3E 1A63 1A74 1A75h/mAM1 The USE diktat does not specify whether mai kang or the tone mark comes first. Both encodings are shown.
ᩉ᩠ᨾᩣ᩵ᩴ 1A49 1A60 1A3E 1A63 1A75 1A74h/mA1M
ᩈ᩠ᨾᩬᩥ᩻ very level(?)
/sămɤː sămɤː/
1A48 1A60 1A3E 1A6C 1A65 1A7B s/mVO_i" Section 15 No. 24. Encoding as given, omitting SIGN E, which is depicted in the proposal. Moreover, the word appears to be a misreading of the next but one.
ᩈ᩠ᨾᩨᩬ᩻ 1A48 1A60 1A3E 1A68 1A6C 1A7B s/mUEVO" Encoding is different from the proposals.
ᩈ᩠ᨾᩮᩬᩥ᩻ 1A48 1A60 1A3E 1A6E 1A6C 1A65 1A7B s/m_eVO_i" Section 15 No. 24. SIGN E restored to encoding.
ᩈ᩠ᨾᩮᩥᩬ᩻ 1A48 1A60 1A3E 1A6E 1A65 1A6C 1A7Bs/m_e_iVO" SIGN E restored to a different encoding from the proposals.
ᩈ᩠ᨾ᩻ᩮᩬᩥ level (adj.)
/sămɤː/
1A48 1A60 1A3E 1A7B 1A6E 1A6C 1A65s/m"_eVO_i Probable reading of above. Consequently, the encoding is not vouched for by the proposal. Phonetically, this is one or two syllables, depending on how one counts.
ᩈ᩠ᨾᩮᩥᩬ᩻ 1A48 1A60 1A3E 1A6E 1A65 1A6C 1A7Bs/m_e_iVO" Probable reading of the above. The USE-compliant encodings of the two readings are the same, but each has compatible renderings inconsistent with the other interpretation.
ᩉ᩠ᨾᩮᩬᩨᨦ mine (n.)
/mɯaŋ/
1A49 1A60 1A3E 1A6E 1A6C 1A68 1A26h/m_eVOUEG Section 15 No. 25 (2 syllables - the second is a single character)
ᩉ᩠ᨾᩮᩨᩬᨦ 1A49 1A60 1A3E 1A6E 1A68 1A6C 1A26h/m_eUEVOG Different from the proposals.
ᩉ᩠ᨿᩮᩬᩨᨦ to despise
/ɲɯaŋ/
1A49 1A60 1A3F 1A6E 1A6C 1A68 1A26h/_y_eVOUEG Section 15 No. 26 (2 syllables - the second is a single character)
ᩉ᩠ᨿᩮᩨᩬᨦ 1A49 1A60 1A3F 1A6E 1A68 1A6C 1A26h/_y_eUEVOG Different from the proposals.
ᩉ᩠ᨾᩫ᩵ᩁ winter melon (Benincasa hispida)
/mon/
1A49 1A60 1A3E 1A6B 1A75 1A41h/mo1_r Section 15 No. 27 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩣ᩠ᨿmany
/laːi/
1A49 1A60 1A43 1A63 1A60 1A3Fh/_lA/_ySection 15 No. 28
ᩉ᩠ᩃᩮᩬᩨᨦ yellow
/lɯaŋ/
1A49 1A60 1A43 1A6E 1A6C 1A68 1A26h/_l_eVOUEG Section 15 No. 29 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩮᩨᩬᨦ 1A49 1A60 1A43 1A6E 1A68 1A6C 1A26h/_l_eUEVOG Different from the proposals.
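Several rows in this section are described as normalisations of the row before. Because SAKOT has canonical combining class 9 while the tone marks and MAI SAM have a higher class (230 in the Unicode Character Database), normalisation moves SAKOT in front of them. The Python sketch below checks this with the standard unicodedata module. It assumes a Python build whose Unicode database includes the Tai Tham block, and the helper name is hypothetical.

```python
import unicodedata

SAKOT, TONE_1, MAI_KANG = "\u1A60", "\u1A75", "\u1A74"


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Combining classes that drive the reordering.
print(unicodedata.combining(SAKOT), unicodedata.combining(TONE_1))

# 'different' with the tone mark typed before SAKOT, then normalised.
as_typed = "\u1A32\u1A63\u1A75\u1A60\u1A26\u1A7B"
normalised = unicodedata.normalize("NFC", as_typed)
print(" ".join(f"{ord(c):04X}" for c in normalised))

# MAI KANG has class 0, so its order relative to a tone mark is significant:
# the two orders shown for No. 16 and No. 23 are distinct encodings.
mai_kang_first = "\u1A3B\u1A55\u1A63\u1A74\u1A75"
tone_first = "\u1A3B\u1A55\u1A63\u1A75\u1A74"
print(canonically_equivalent(mai_kang_first, tone_first))
```

A renderer or font cannot rely on receiving the tone mark before SAKOT, since any process that normalises the text will deliver the reordered sequence.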

Other Examples from L2/07-007R

The actual coding sequences to be used here are open to challenge.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩬᩢᩃ᩠ᨼ᩺ golf
/kɔp/
1A20 1A6C 1A62 1A43 1A60 1A3C 1A7AkVOa_l/f^r Section 2. The position of RA HAAM is debatable - cf. Thai กอล์ฟ. The first example places it on the second consonant, the second on the first. The third then normalises the spelling of the second. Note that this word consists of two orthographic syllables.
ᨠᩬᩢᩃ᩠᩺ᨼ 1A20 1A6C 1A62 1A43 1A7A 1A60 1A3CkVOa_l^r/f
ᨠᩬᩢᩃ᩠᩺ᨼ 1A20 1A6C 1A62 1A43 1A60 1A7A 1A3CkVOa_l/^rf
ᨠᩢᩬᩃ᩠ᨼ᩺ 1A20 1A62 1A6C 1A43 1A60 1A3C 1A7AkaVO_l/f^r In the December 2021 USE order.
ᨠᩢᩬᩃ᩠᩺ᨼ 1A20 1A62 1A6C 1A43 1A7A 1A60 1A3CkaVO_l^r/f
ᨠᩢᩬᩃ᩠᩺ᨼ 1A20 1A62 1A6C 1A43 1A60 1A7A 1A3CkaVO_l/^rf
ᨠᩕᩣ᩠ᨼ graph
/kaːp/ (?)
1A20 1A55 1A63 1A60 1A3CkVrA/f Section 2.
ᨴᩬᨼ᩠ᨼᩦ᩵toffee 1A34 1A6C 1A3C 1A60 1A3C 1A66 1A75dVOf/fI1 Section 2 (2 syllables)
ᨠᨽᩚ pregnant
/kap pʰa?/
1A20 1A3D 1A5Akbh^b Section 4 (2 syllables - the first is a single character)
ᩈᨱᩛᩣ᩠ᨶ shape
/san tʰaːn/
1A48 1A31 1A5B 1A63 1A60 1A36sNVbA/n Section 4 (2 syllables - the first is a single character)
ᩁᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
1A41 1A2D 1A5B 1A37 1A63 1A60 1A43r_TVbBA/_l Section 4 (3 syllables)
ᩁᩢᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
1A41 1A62 1A2D 1A5B 1A37 1A63 1A60 1A43ra_TVbBA/_l Section 4 (3 syllables)
ᩈᨻᩛ omniscience
/sap paʔ/
1A48 1A3B 1A5BsbVb Section 4 (2 syllables - the first is a single character)
ᩋᨾᩛ mango
/ʔam paʔ/
1A4B 1A3E 1A5BqmVb Section 4 (2 syllables - the first is a single character)
ᩁᩣᨩᨽᩢ᩠ᨮ Rajabhat
/la:t tɕa pʰat/
1A41 1A63 1A29 1A3D 1A62 1A60 1A2E rAjbha/ThSection 4 (3 syllables)
ᨷᩢᨱ᩠ᨻᨷᩩᩁᩩᩈ disciple
"banop burus"
1A37 1A62 1A31 1A60 1A3B 1A37 1A69 1A41 1A69 1A48BaN/bBu_ru_s Section 4 (5 syllables)

Mai Kang Lai

The mai kang lai character can be a challenge for a font, as it has a wide range of behaviours. These run from a spacing final character (as in modern Tai Khün fonts) to a repha-like character, the old-fashioned behaviour seen in Tai Khün, Thailand and Laos. The MFL dictionary shows an intermediate behaviour, where marks above the following base consonant cause it to be positioned within the previous syllable. This is the style employed by the Da Lekh font.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩘ᩠ᩃᩣ᩠ᨿ all
/taŋ laːi/
1A34 1A58 1A60 1A43 1A63 1A60 1A3Fd>G/_lA/_y The ascending tail of SAKOT LA prevents the MAI KANG LAI moving on to a subsequent syllable/word. This prevents fonts exploiting the rphf feature of the Universal Shaping Engine.
ᨴ᩠ᩃᩘᩣ᩠ᨿ 1A34 1A60 1A43 1A58 1A63 1A60 1A3Fd/_l>GA/_y With total disregard for logical order.
ᩈᩘᨥᩮᩣ Nominative of Pali saṅgha
<saṅgho>
1A48 1A58 1A25 1A6E 1A63s>Ggh_eA(2 syllables)
ᩁᩘᩈᩦray
/raŋ siː/
1A41 1A58 1A48 1A66r>G_sI(2 syllables)

Ligature NAA

This is mostly a test for readers!

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨶᩣᩴto lead
/nam/
1A36 1A63 1A74nAM
ᨾᨶᩮᩣ heart, mind
/maʔ no:/
1A3E 1A36 1A6E 1A63mn_eA (2 syllables)
ᨶᩮᩢᩣ to sew a long stitch
/nau/
1A36 1A6E 1A62 1A63n_eaA Some fonts may fail here because they handle the ligature in pstf; this worked with HarfBuzz until pstf was moved to before Indic rearrangement.
ᨶᩣ᩠ᨿ leader
/na:i/
1A36 1A63 1A60 1A3FnA/_y
ᨶ᩵ᩣ᩠ᨶ Nan
/na:n/
1A36 1A75 1A63 1A60 1A36n1A/n
ᨶᩣ᩠᩵ᨶ 1A36 1A63 1A75 1A60 1A36nA1/n Using formalism where neither current nor historical speech defines phonetic order. The first of these two keeps user-perceivable characters contiguous, and the second is its normalisation (NFC/NFD).
ᨶᩣ᩠᩵ᨶ 1A36 1A63 1A60 1A75 1A36nA/1n
ᩍᨶ᩠ᨴᩣ Indra
/ʔin ta:/
1A4D 1A36 1A60 1A34 1A63qqin/dA The more usual form lacks U+1A63. (2 syllables - first has one character.)
ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ danger
/ʔon tʰaʔ la:i/
1A4B 1A6B 1A36 1A60 1A32 1A55 1A63 1A60 1A3Fqon/tVrA/_y(2 syllables)
ᨶ᩶ᩣᩴ water
/nam/
1A36 1A76 1A63 1A74n2AM This can be surprisingly hard to achieve in a font. Logic designed to stop Arabic vowel marks wrongly interacting has to be circumvented so that the two marks will interact!
ᨶᩣ᩶ᩴ 1A36 1A63 1A76 1A74nA2M The USE rules do not dictate whether the tone mark comes before or after the mai kang. Both the canonically inequivalent forms are given here.
ᨶᩣᩴ᩶ 1A36 1A63 1A74 1A76nAM2
ᨶ᩠ᩅᩣ᩠ᨷ to falsely accuse
/nwaːp/
1A36 1A60 1A45 1A63 1A60 1A37n/wA/B MFL p352
ᨴᩤᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ to foretell
/tam nwaːi/
1A34 1A64 1A74 1A36 1A60 1A45 200C 1A63 1A60 1A3Fd^AMn/w‌A/_y NTDPLM p285. Sometimes the writer wants to avoid the ligature! (2 syllables)
ᨲ᩵ᩣᩴᨶ᩠ᩅᩣ᩠ᨿ to foretell
/tam nwaːi/
1A32 1A75 1A63 1A74 1A36 1A60 1A45 1A63 1A60 1A3Ft1AMn/wA/_y MFL p320, but only in transliteration. Shape of second syllable (ligature plus subscript consonant) is attested elsewhere. (2 syllables)
ᨲᩣ᩵ᩴᨶ᩠ᩅᩣ᩠ᨿ 1A32 1A63 1A75 1A74 1A36 1A60 1A45 1A63 1A60 1A3FtA1Mn/wA/_y The USE does not dictate whether mai kang or the tone mark comes first. Both options are given here.
ᨲᩣᩴ᩵ᨶ᩠ᩅᩣ᩠ᨿ 1A32 1A63 1A74 1A75 1A36 1A60 1A45 1A63 1A60 1A3FtAM1n/wA/_y
ᨶ‌ᩣ rice field
/naː/
1A36 200C 1A63n‌A An isolated test of the ZWNJ feature above. This form is to be expected in texts teaching the writing system.
ᩉ᩠ᨶ᩶ᩣ face
/naː/
1A49 1A60 1A36 1A76 1A63h/n2A Note that the SAKOT prevents ligature formation.
ᩉ᩠ᨶᩣ᩶ 1A49 1A60 1A36 1A63 1A76h/nA2 Tone mark above consonant still follows the vowel.
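Two rows above use U+200C ZERO WIDTH NON-JOINER to stop NA and AA forming the ligature. A minimal sketch of inserting it is shown below. It handles only a NA that is not itself subscripted, optionally followed by SAKOT-plus-letter subscripts, immediately before U+1A63, as in those two rows; sequences with an intervening tone mark are left alone. The function name and the pattern are assumptions of this sketch, not something the page or any font prescribes.

```python
import re

ZWNJ = "\u200C"

# NA not preceded by SAKOT, then any number of SAKOT + letter pairs, then AA.
_NAA = re.compile("(?<!\u1A60)(\u1A36(?:\u1A60[\u1A20-\u1A4C])*)(\u1A63)")


def block_naa_ligature(text: str) -> str:
    """Insert ZWNJ between NA (with its subscripts) and a following AA."""
    return _NAA.sub(lambda m: m.group(1) + ZWNJ + m.group(2), text)


# 'rice field' as it might appear in a text teaching the writing system.
print(" ".join(f"{ord(c):04X}" for c in block_naa_ligature("\u1A36\u1A63")))
```

Applied to 1A34 1A64 1A74 1A36 1A60 1A45 1A63 1A60 1A3F, this yields the encoding shown for 'to foretell' with the ligature avoided. A subscripted NA, as in the word for 'face', is left untouched because SAKOT already prevents the ligature.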

English Loanwords

These examples are taken from the 'big blue book' pp151-6. Some of these renderings are unusual compared with the native tradition, and are included for that reason. The position of RA HAAM is particularly noteworthy.

The pronunciations given are guesswork where Siamese practice and Lanna script orthography conflict.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩯᩢ᩠ᩈgas
/kɛs/
1A20 1A6F 1A62 1A60 1A48k_Ea/_s
ᨴᩕᩯ᩠ᨠᨴᩮᩬᩥᩁ᩺ tractor
/tʰɛːk tʰɤː/
1A34 1A55 1A6F 1A60 1A20 1A34 1A6E 1A6C 1A65 1A41 1A7AdVr_E/kd_eVO_i_r^r Slightly complicated set of consonants in first syllable. (2 syllables)
ᨴᩕᩯ᩠ᨠᨴᩮᩥᩬᩁ᩺ 1A34 1A55 1A6F 1A60 1A20 1A34 1A6E 1A65 1A6C 1A41 1A7AdVr_E/kd_e_iVO_r^r Vowel not as in the proposals.
ᨶᩰᩫ᩠᩶ᨲnote
/noːt/
1A36 1A70 1A6B 1A76 1A60 1A32nOo2/t Vowel combination not listed above
ᨷᩕᩰᨴᩦ᩠ᨶ protein
/pʰoː tiːn/
1A37 1A55 1A70 1A34 1A66 1A60 1A36BVrOdI/n Tests reordering - the vowel symbol should appear first. (2 syllables)
ᨼᩥᩅ᩠ᩈ᩺fuse
/fiu/
1A3C 1A65 1A45 1A60 1A48 1A7Af_iw/_s^r
ᩈᨲᩯᨾ᩠ᨷ᩺ postage stamp
/sa tɛːm/
1A48 1A32 1A6F 1A3E 1A60 1A37 1A7Ast_Em/B^r (3 syllables)
ᩈᩮᩥᩁ᩠᩺ᨷ to serve
/sɤːp/
1A48 1A6E 1A65 1A41 1A7A 1A60 1A37s_e_i_r^r/B Compare the placement of RA HAAM with the previous word. The same contrast may be seen on p155 of the 'big blue book'. (2 syllables)

Tai Lü Blends

These examples are all taken from Graphic Blends at SEAsite. The pronunciations given are Tai Lü.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩢ᩵ᩗᩣall
/taŋ laːi/
1A34 1A62 1A75 1A57 1A63da1Vl+A

This word, in some of its various forms, seems to be the only word containing U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI.

I withdraw my previous, surprised, reading of the word shown as containing NGA as the base consonant.

ᨡᨶ᩠ᨵᩣ spell (magic)
/kʰan tʰaː/
1A21 1A36 1A60 1A35 1A63khn/dhA (2 syllables, first a single character)
ᨣ᩠᩶ᨯᩦ  okay
/kɔː diː/
1A23 1A76 1A60 1A2F 1A66 00A0g2/DI  A non-breaking space has been appended to avoid truncation. A sophisticated font would slide the vowel under the tone mark.
ᨷ᩠᩶ᨾᩣ to not come
/bau maː/
1A37 1A76 1A60 1A3E 1A63B2/mA
ᨷ᩠᩶ᨾᩣ 1A37 1A60 1A76 1A3E 1A63B/2mA Same again, but normalised.
ᨷ᩠᩶ᨯᩣ᩠ᨿ to not have
/bau da:i/
1A37 1A76 1A60 1A2F 1A63 1A60 1A3FB2/DA/_y
ᨧᩢ᩠ᩅᩤ How big an area?
/tsak va:/
1A27 1A62 1A60 1A45 1A64ca/w^A
ᩈᩮ᩠ᩓ᩠ᩅ deceased
/se: lɛu/
1A48 1A6E 1A60 1A53 1A60 1A45s_e/lE/w
ᨴᩯ᩠ᨶᩳ Really, is that true?
/tɛː nɔː/
1A34 1A6F 1A60 1A36 1A73d_E/n^O
ᩓ᩠ᨾᩣ to look this way
/lɛ maː/
1A53 1A60 1A3E 1A63lE/mA
ᨠᩮ᩠ᩈᩣ hair
/keː saː/
1A20 1A6E 1A60 1A48 1A63k_e/_sA
ᨻᩱ᩠ᨾᩣ to come and go
/pai maː/
1A3B 1A71 1A60 1A3E 1A63bai/mA
ᩈᩮ᩠ᩅ᩶ᩤ if
/seː vaː/
1A48 1A6E 1A60 1A45 1A76 1A64s_e/w2^A
ᩈᩮ᩠ᩅᩤ᩶ 1A48 1A6E 1A60 1A45 1A64 1A76s_e/w^A2 Tone mark position not as in the proposals.
ᩅᩮ᩠ᩃᩣ time
/veː laː/
1A45 1A6E 1A60 1A43 1A63w_e/_lA Also in Apiradee p49
ᨵᩤ᩠ᨲᩩ physical body
/tʰaː tuʔ/
1A35 1A64 1A60 1A32 1A69dh^A/tu The vowel on the final consonant is inescapable - there is no way of rewriting the orthographic syllable to escape the combination.
ᨩ᩠ᩓ in conclusion
/tsălɛː/
1A29 1A60 1A53j/lE
ᨻᩭ᩠ᩅ᩻ᩣ because
/pɔi vaː/
1A3B 1A6D 1A60 1A45 1A7B 1A63boi/w"A The MAI SAM tags the WA as starting a chained syllable. The spelling presumes that a font can decide that the subscript WA goes to the left of the MAI KOY.
ᨻᩭ᩠᩻ᩅᩣ 1A3B 1A6D 1A7B 1A60 1A45 1A63boi"/wA A purely visual placement of MAI SAM.
ᨻᩭ᩠᩻ᩅᩣ 1A3B 1A6D 1A60 1A7B 1A45 1A63boi/"wA Normalised form of the above.
ᩈᩫ᩠ᨦᩣ᩠ᨶ world
/suŋ saːn/
1A48 1A6B 1A60 1A26 1A63 1A60 1A36so/GA/n

More Tai Lü

These words are taken from the MA thesis 'Development of Tai Lue Scripts and Orthography' by Apiradee Techasiriwan (อภิรดี เตชะศิริวรรณ). The pronunciations given are Tai Lü. Comparative material from elsewhere is highlighted in yellow.

Text Meaning Encoding Hacked via ASCII Remarks
ᨻᩬᩳ᩵ father
/pɔː/
1A3B 1A6C 1A73 1A75bVO^O1 p3. Vowel combination not listed above. Spelling is archaic.
ᨻᩳᩬ᩵ 1A3B 1A73 1A6C 1A75b^OVO1 USE vowel ordering.
ᩈᨷ᩷ᩣ᩠ᨿ content, well
/săbaːi/
1A48 1A37 1A77 1A63 1A60 1A3FsB3A/_y p3. Rare example of a word with this tone mark. (2 syllables, first is a single character.)
ᩈᨷᩣ᩠᩷ᨿ 1A48 1A37 1A63 1A77 1A60 1A3FsBA3/_y USE tone positioning.
ᩅ᩠ᨿᩙcity
/weŋ/
1A45 1A60 1A3F 1A59w/_y^G p4.
ᨣᩪ᩺ person
/kun/
1A23 1A6A 1A7AgU^r p4. Unetymological, phonetic spelling. The mark above is serving as a final consonant, not a cancellation mark.
ᨣ᩺ᩪ 1A23 1A7A 1A6Ag^rU USE ordering as vowels.
᪁᪂ ᨻᩢ᩠ᨶ᩻ᩣ Sipsongpanna
/sip sɔːŋ pan naː/
1A81 1A82 00A0 1A3B 1A62 1A60 1A36 1A7B 1A631P2P ba/n"A p10. (Number precedes syllable). Example of mai sam marking a double-acting consonant.
᪁᪂ ᨻᩢ᩠᩻ᨶᩣ 1A81 1A82 00A0 1A3B 1A62 1A7B 1A60 1A36 1A631P2P ba"/nA Best-looking hack for USE compliance.
ᨻᩱ᩻ᩣ᩠ᨿ to go to the location
/pai paːi/
1A3B 1A71 1A7B 1A63 1A60 1A3Fbai"A/_y p47.
ᨻᩱᩣ᩠᩻ᨿ 1A3B 1A71 1A63 1A7B 1A60 1A3FbaiA"/_yBest-looking hack for USE compliance
ᨻᩱᩣ᩠᩻ᨿ 1A3B 1A71 1A63 1A7B 1A60 1A3FbaiA"/_yNormalisation of the above.
ᨩ᩠ᨿᩙᨲᩩᩴ Kengtung
/tseŋ tuŋ/
1A29 1A60 1A3F 1A59 1A32 1A69 1A74j/_y^GtuM

p53. (2 syllables)

Possibly the Chengtung on the Vietnamese border.

ᩅᨲᩛᩩ matter
/wat tʰu/
1A45 1A32 1A5B 1A69wtVbu p49. U+1A5B represents subscript HIGH THA rather than high RATHA. This is an issue for a font's repertoire of conjuncts.
ᩅᨲ᩠ᨳᩩ matter
/wat tʰu/
1A45 1A32 1A60 1A33 1A69wt/thu The Northern Thai writing of the above. Perhaps this should be rendered as the above when the language is Tai Lü or Lao.
ᨯ᩠ᨿᩴone
/deu/
1A2F 1A60 1A3F 1A74D/_yM p53. Assuming the word has TAI THAM SIGN MAI KANG rather than unencoded *TAI THAM CONSONANT SIGN FINAL WA.
ᩉ᩠ᨶᩦᩢ᩶debt
/niː/
1A49 1A60 1A36 1A66 1A62 1A76h/nIa2 p57.
ᩁᩮᩂ᩠ᨠ auspicious occasion
/hɤːk/
1A41 1A6E 1A42 1A60 1A20r_e_R/k p79.
ᩁ᩠ᨿ᩺to learn
/heːn/
1A41 1A60 1A3F 1A7Ar/_y^r p118.

And Not

The word typically meaning 'and...not' or 'and...then' may be written with a chained syllable, and this may present challenges to renderers. The form of the letter representing /b/ in a chained syllable presented an encoding challenge. N3207R proposed using the sequence <SAKOT, BA> for it, and using <SAKOT, HIGH PA> for the subscript form corresponding to both BA (common) and HIGH PA (extremely rare) in its rôle as a final (Thai sakot) consonant. During the ISO process, a new character was introduced instead for the special form, SIGN BA, and it is widely assumed that <SAKOT, BA> represents the usual subscript form corresponding to BA, both as a sakot consonant and in the Pali /mp/ and /pp/ intervocalic clusters.

When syllables are chained, shared vowel symbols are not repeated. This leads to ambiguity as to which symbol is dropped.

All the spellings in the table below represent the same careful pronunciation in Northern Thai, namely /kɔː bɔː/. The Tai Lü forms are written with different marks and pronounced with different vowels, but use the same two consonant forms in the stack.

Text Meaning Encoding Hacked via ASCII Remarks
ᨣᩴᨷᩴ᩵and...not, then...not 1A23 1A74 1A37 1A74 1A75gMBM1 Full form - 2 syllables, and arguably 2 words.
ᨣᩴᨷᩴdo. 1A23 1A74 1A37 1A74gMBM Univerbated form in MFL (2 syllables)
ᨣᩝᩴ᩵do. 1A23 1A5D 1A74 1A75gVBM1 First mai kang dropped.
ᨣᩴᩝ᩵do. 1A23 1A74 1A5D 1A75gMVB1 Second mai kang dropped.
ᨣᩝᩴdo. 1A23 1A5D 1A74gVBM First mai kang dropped.
ᨣᩴᩝdo. 1A23 1A74 1A5DgMVB Second mai kang dropped.

Potential Surprises

These words behave slightly oddly.

Text Meaning and Pronunciation Encoding Hacked via ASCII Remarks
ᩓᩯ very much
/lɛː/
1A53 1A6FlE_E Redundant vowel mark
ᩐᩣto take
/ʔau/
1A50 1A63qqUA Vowel on independent vowel
ᩐ᩵ᩣ very hot
/ʔau/
1A50 1A75 1A63qqU1A Vowel and tone mark on independent vowel
ᩐᩣ᩵ 1A50 1A63 1A75qqUA1 USE-compliant order
ᨯᩪᩕᩣ listen to me
/duː haː/
1A2F 1A6A 1A55 1A63DUVrA Medial consonant between explicit vowels
ᨯᩮᩬᩥᩁᨹᩫᩖᨣᩩᨱ᩺ March
/dɯan pʰon laʔ kun/
1A2F 1A6E 1A6C 1A65 1A41 1A39 1A6B 1A56 1A23 1A69 1A31 1A7AD_eVO_i_rphoVlguN^r NTDPLM p259. Double-acting medial consonant with implicit vowel after it. (3 syllables)
ᨯᩮᩥᩬᩁᨹᩫᩖᨣᩩᨱ᩺ 1A2F 1A6E 1A65 1A6C 1A41 1A39 1A6B 1A56 1A23 1A69 1A31 1A7AD_e_iVO_rphoVlguN^r USE-compliant vowel ordering
ᨻᩣᨷᩰᩖ Pabol (sic)
/paː boːn/
1A3B 1A63 1A37 1A70 1A56bABOVl A mistake for Spanish Pablo seen on Wikipedia, but in light of the above a renderer should render it as intended.
ᨶ᩶ᩭ little
/nɔːi/
1A36 1A76 1A6Dn2oi Tai Khün spelling.
ᨶᩭ᩶ 1A36 1A6D 1A76noi2 USE-compliant tone mark sequencing.
ᩉᩖ᩠ᩅᨦ big
/luaŋ/
1A49 1A56 1A60 1A45 1A26hVl/wG Medial consonant in middle of stack. The proposal classified the final consonant of the stack as a 'medial vowel'. (2 syllables, second a single character)
ᩉᩖ᩠ᩅᩣiron
/lwaː/
1A49 1A56 1A60 1A45 1A63hVl/wA Medial consonant in middle of stack. In this case, the WA is very much a consonant.
ᨻᩕ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
1A3B 1A55 1A60 1A3F 1A6E 1A61bVr/_y_eH Preposed medial consonant in middle of stack along with a preposed vowel.
ᨠᩩ᩶ᩣ᩠ᨶ᩠ᨦ to prosper
/kaːn kuŋ/
1A20 1A69 1A76 1A63 1A60 1A36 1A60 1A26ku2A/n/G The first word in the MFL! Note that there are two final consonants. The SIGN AA prevents a phonetic spelling.
ᨠᩩᩣ᩠᩶ᨶ᩠ᨦ 1A20 1A69 1A63 1A76 1A60 1A36 1A60 1A26kuA2/n/G USE-compliant tone mark placement.
ᩋᩢ᩠ᨭᩛ a satang coin
/ʔat/
1A4B 1A62 1A60 1A2D 1A5Bqa/_TVb Two consonants in final consonant position (3 consonants in total)
ᩆᩢᨠ᩠ᨯᩥ᩺ rank
/sak/
1A46 1A62 1A20 1A60 1A2F 1A65 1A7Ashak/D_i^r Consonant-killer also killing explicit vowel above (2 syllables)
ᩆᩢᨠ᩠ᨯᩥ᩼ rank
/sak/
1A46 1A62 1A20 1A60 1A2F 1A65 1A7Cshak/D_iX Same again, but with KARAN instead of RA HAAM. Some people are using KARAN in Northern Thai instead of RA HAAM! (2 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ giant fennel
/ma haː hiŋ/
1A3E 1A49 1A63 1A49 1A65 1A26 1A60 1A23 1A69 1A7Am_hA_h_iG/gu^r Consonant-killer also killing explicit vowel below (4 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣ᩺ᩩ 1A3E 1A49 1A63 1A49 1A65 1A26 1A60 1A23 1A7A 1A69m_hA_h_iG/g^ru USE then requires that the killer precede the killed vowel.
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩼ 1A3E 1A49 1A63 1A49 1A65 1A26 1A60 1A23 1A69 1A7Cm_hA_h_iG/guX Same again, but with KARAN. (4 syllables)
ᩆᩣᩈ᩠ᨲᩕ᩺ science
/saːt/
1A46 1A63 1A48 1A60 1A32 1A55 1A7AshA_s/tVr^r Consonant-killer also killing medial consonant. NT spelling. (2 syllables)
ᩈᩣᩈ᩠ᨲᩕ᩼ science
/saːt/
1A48 1A63 1A48 1A60 1A32 1A55 1A7CsA_s/tVrX Consonant-killer also killing medial consonant. Tai Khün spelling. (2 syllables)
ᩁᩪ᩠ᨷimage
/huːp/
1A41 1A6A 1A60 1A37rU/B This spelling is archaic in Northern Thailand (but current in Tai Khün)
ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ relatives
/piː nɔːŋ/
1A3B 1A66 1A75 1A60 1A36 1A6C 1A76 1A26bI1/nVO2G (2 syllables - second is a single character)
ᩃᩢᩪ child (progeny)
/luːk/
1A43 1A62 1A6AlaU USE demands that mai kak (see next) precede most of the vowels that it phonetically follows.
ᩃᩪᩢ 1A43 1A6A 1A62lUa MAI SAT can serve as a final consonant, /k/. This leads to yet more formal vowel combinations.
ᨸᩢᩣmouth
/paːk/
1A38 1A62 1A63paA
ᨯᩬᩢ flower
/dɔːk/
1A2F 1A6C 1A62DVOa
ᨯᩢᩬ 1A2F 1A62 1A6CDaVO USE-compliant ordering.
ᨯᩢᩬᩡ 1A2F 1A62 1A6C 1A61DaVOH USE-compliant ordering.
ᨯᩬᩢᩡ 1A2F 1A6C 1A62 1A61DVOaH MAI SAT can even be reinforced by SIGN A.
ᨻ᩠ᩅᩢᩡ group
/puak/
1A3B 1A60 1A45 1A62 1A61b/waH
ᨲᩯ᩠ᨶᩬᩴ᩵ wasp, hornet
/tɛːn tɔː/
1A32 1A6F 1A60 1A36 1A6C 1A74 1A75t_E/nVOM1 A single orthographic syllable.
ᨲᩬᩴ᩵͏ᩯ᩠ᨶ wasp, hornet
/tɔː tɛːn/
1A32 1A6C 1A74 1A75 034F 1A6F 1A60 1A36tVOM1͏_E/n Should normally be visually identical with the above - the font may be too crude. However, when font colouring is supported, the vowel below should be coloured differently in the Da Lekh Si font; that font is intended to reveal the order of characters.
ᨲᩬᩴ᩵ᩯ᩠ᨶ 1A32 1A6C 1A74 1A75 1A6F 1A60 1A36tVOM1_E/n Would it be legitimate for this to render differently to the above?
ᩈ᩠ᨶᩫ᩻street
/sănon/
1A48 1A60 1A36 1A6B 1A7Bs/no" The mai sam represents the final consonant in addition to the epenthetic vowel.
ᨠᨾᩛᩦ scripture
/kam piː/
1A20 1A3E 1A5B 1A66kmVbI

The surprise is that U+1A5B had InSC=Consonant_Final until Unicode 10.0.

(2 syllables - the first is a single consonant in the first example.)

ᨶᩥᨻᩛᩣ᩠ᨶ nirvana
/nip paːn/
1A36 1A65 1A3B 1A5B 1A63 1A60 1A36n_ibVbA/n
ᨵᨾᩜᩥᨠ saintly
/tʰam miʔ kaʔ/
1A35 1A3E 1A5C 1A65 1A20dhmVm_ik Chiengtung p166. It has 3 syllables - the second is of interest. It may show a problem with U+1A5C having InSC=Consonant_Final until Unicode 10.0.
ᩈᨵᩩ᩠ᨷ stupa(?)
/sătʰup/
1A48 1A35 1A69 1A60 1A37sdhu/B Chiengtung p166. (2 syllables - the first is a single letter.) This shows the issue with placement of the vowel and 'sakot' consonant also applies to this explicit vowel.
ᩋᩣᨴᩥᨲ᩠ᨲᨵᨾᩜᩮᩣ Adittadhammo
Pali <Ādittadhammo>
1A4B 1A63 1A34 1A65 1A32 1A60 1A32 1A35 1A3E 1A5C 1A6E 1A63qAd_it/tdhmVm_eA Chiengtung p264. (5 syllables)
ᨬᩣᨱᨵᨾᩜᩮᩣ Nyanadhammo
Pali <Ñāṇadhammo>
1A2C 1A63 1A31 1A35 1A3E 1A5C 1A6E 1A63nyANdhmVm_eA Chiengtung p238. The individual referred to is not the one hyperlinked to. (4 syllables)
ᩅᩥᩈᩮ᩠ᩈ special
/wiʔ seːt/
1A45 1A65 1A48 1A6E 1A60 1A48w_i_s_e/_s Note the lack of a ligature. (2 syllables)
ᨢ᩶ᩣ slave
/kʰaː/
1A22 1A76 1A63kx2A Same character order as in Thai and Lao!
ᨢᩣ᩶ 1A22 1A63 1A76kxA2 But not if the USE prevails!
ᩈᩣᩈᨶᩣ religion
/saː saʔ naː/
1A48 1A63 1A48 1A36 1A63sA_snA

Full (5 chars) and contracted (7 chars) forms.

(3 and 2 syllables respectively)

ᩈᩣᩈ᩠ᨶ᩻ᩣ 1A48 1A63 1A48 1A60 1A36 1A7B 1A63sA_s/n"A
ᩈ᩠ᨶ᩻ᩮᩢ᩶ᩣ javelin
/sănau/
1A48 1A60 1A36 1A7B 1A6E 1A62 1A76 1A63s/n"_ea2A
ᨲᩦ͏ᩣ᩠ᨿ to beat to death
/tiː taːi/
1A32 1A66 034F 1A63 1A60 1A3FtI͏A/_y Uses CGJ as an invisible MAI SAM to stand for the duplicated consonant.
ᩋᩮᩰᩣᨽᩣᩈ to illuminate
/ʔoː pʰaː saʔ/
1A4B 1A6E 1A70 1A63 1A3D 1A63 1A48q_eOAbhA_s MFL p919. While the spelling rules call for either just U+1A70 SIGN OO or just the combination of <U+1A6E SIGN E, U+1A63 SIGN AA>, this might conceivably be a lexicographer's private notation, indicating that both spellings occur, that happened to escape into the published work. The graphical order, left-to-right, in the MFL is SIGN OO, SIGN E, LETTER A, SIGN AA. The 'hacked via ASCII' rendering is wrong. (3 syllables - first is of interest.)
ᩉ᩠ᨾ᩵ᩣᩴ᩻ Grub's up!
/mam mam/
1A49 1A60 1A3E 1A75 1A63 1A74 1A7Bh/m1AM"
ᩉ᩠ᨾᩣᩴ᩵᩻ 1A49 1A60 1A3E 1A63 1A74 1A75 1A7Bh/mAM1" It is not clear whether a USE-compliant form should have MAI KANG or the tone mark first.
ᩉ᩠ᨾᩣ᩵ᩴ᩻ 1A49 1A60 1A3E 1A63 1A75 1A74 1A7Bh/mA1M"
ᩃᩮᩞ trickery
/leːs/
1A43 1A6E 1A5El_eVs Tai Khün spelling, cited in N3384
ᩋᨶᩣᨳᨷᩥᨱ᩠ᨯᩥᨠᩈᩞ Anathapindika's

Pali <Anāthapiṇḍikassa>
1A4B 1A36 1A63 1A33 1A37 1A65 1A31 1A60 1A2F 1A65 1A20 1A48 1A5EqnAthB_iN/D_ik_sVs A rare spelling of the Pali masculine genitive singular ending. Note that SIGN SA starts the final phonetic syllable. (7 syllables - the last one is of interest.)

Test and Tell

These examples are intended to reveal the behaviour of the rendering system, rather than to be clear pass or fail tests.

Text Meaning and Pronunciation Encoding Hacked via ASCII Remarks
Interpretation
ᨠ᩠ᨷ (no meaning) 1A20 1A60 1A37k/B Interpretation of <SAKOT, BA> and <SAKOT, HIGH PA> respectively. This looks at font behaviour rather than at layout engine behaviour.
ᨠ᩠ᨸ (no meaning) 1A20 1A60 1A38k/p
Line-Breaking within the Orthographic Syllable
Manual line breaking may break lines between the dependent vowel AA (U+1A63 and possibly U+1A64) and its base consonant. Figure 9b of N3207R provides an example. What appears to be a misspelt form of sammodamānehi (as ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ), the instrumental or ablative plural of the present participle of sammodati, is split between the second and third lines of the second leaf by splitting just before U+1A63. How will this be handled? The answer may be application dependent.
ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
1A48 1A3E 1A6E 1A63 1A34 1A60 1A34 1A3E 00AD 1A63 1A36 1A6E 1A49 1A65sm_eAd/dm­An_e_h_i

Split using a soft hyphen. (Many syllables.)

The text occurs with and without dingbats (U+1AA5) so that one can see whether an inactive soft hyphen affects it.

᪥᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
1AA5 1AA5 1AA5 1A48 1A3E 1A6E 1A63 1A34 1A60 1A34 1A3E 00AD 1A63 1A36 1A6E 1A49 1A65᪥᪥᪥_sm_eAd/dm­An_e_h_i
ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
1A48 1A3E 1A6E 1A63 1A34 1A60 1A34 1A3E 200B 1A63 1A36 1A6E 1A49 1A65sm_eAd/dm​An_e_h_i Split using zero width space - this uses the presentation-oriented view that ZWSP is simply a soft hyphen without visible rendering. This test is uninformative if the renderer refuses to make the break. See above for dingbats. (Many syllables)
᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
1AA5 1AA5 1A48 1A3E 1A6E 1A63 1A34 1A60 1A34 1A3E 200B 1A63 1A36 1A6E 1A49 1A65᪥᪥_sm_eAd/dm​An_e_h_i
Baseless Marks and Non-alphabetic Bases
(no meaning) 1A63A Bare vowel symbol
 ᩣ(no meaning) 00A0 1A63 A Vowel symbol 'on' NBSP
 ‍ᩣ (no meaning) 00A0 200D 1A63 ‍A Vowel symbol 'on' NBSP with ZWJ.
ᨷ ◌ᩮ N/A 1A37 0020 25CC 1A6EB ◌_e And now discourage the use of multiple script runs by the renderer.
Dependent Consonant Above and Tone Mark - What Chooses the Order?
ᨾ᩠ᩅ᩺᩵to be fun
Khün /mon/
1A3E 1A60 1A45 1A7A 1A75m/w^r1 Typed as seen. Da Lekh fonts place the glyphs side by side, but the order is as in the Tai Khün manuscript. To be precise, it is an extract from a 1949 edition of the Khemarat Weekly, reproduced in L2/17-120 Figure 4.
ᨾ᩠ᩅ᩵᩺to be fun
Khün /mon/
1A3E 1A60 1A45 1A75 1A7Am/w1^r Typed with tone mark first. Da Lekh accepts the order, just as Thai does not rearrange THANTHAKHAT (or vowels above) with tone marks. The Da Lekh rendering does not match the Tai Khün manuscript
ᨾ᩠ᩅ᩺᩵᩻ to be lots of fun
Khün /mon mon/
1A3E 1A60 1A45 1A7A 1A75 1A7Bm/w^r1" Not actually attested, but grammatical derivatives of the above.
ᨾ᩠ᩅ᩵᩺᩻ to be lots of fun
Khün /mon mon/
1A3E 1A60 1A45 1A75 1A7A 1A7Bm/w1^r"
ᨣᩪ᩺᩻ everyone
Tai Lü /kun kun/
1A23 1A6A 1A7A 1A7BgU^r" Theoretical derivative of the unetymological, phonetic spelling of the word for person. The first mark above is serving as a final consonant, not a cancellation mark.
Coda Consonants v. Onset Consonants

U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA and U+1A54 TAI THAM LETTER GREAT SA may have been created so that conjuncts would be different from accidental combinations of initial and final consonants. Are these differences maintained? This primarily probes the font properties, though rendering engines may have an effect.

There are very few words that are affected. A word for 'special' is one of the few.

ᨭᩮ᩠ᨮ ᨭᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
1A2D 1A6E 1A60 1A2E 0020 1A2D 1A5B 1A6ET_e/Th _TVb_e Should be different. (2 syllables)
ᨱᩮ᩠ᨮ ᨱᩛᩮ (no meaning)
/ne:t/ /-n tʰe:/
1A31 1A6E 1A60 1A2E 0020 1A31 1A5B 1A6EN_e/Th NVb_e Should be different. (2 syllables)
ᨲᩮ᩠ᨮ ᨲᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
1A32 1A6E 1A60 1A2E 0020 1A32 1A5B 1A6Et_e/Th tVb_e Should probably be different. (2 syllables)
ᨻᩮ᩠ᨻ ᨻᩛᩮ (no meaning)
/pe:p/ /-p pe:/
1A3B 1A6E 1A60 1A3B 0020 1A3B 1A5B 1A6Eb_e/b bVb_e Should be different. (2 syllables)
ᨾᩮ᩠ᨻ ᨾᩛᩮ (no meaning)
/me:p/ /-m pe:/
1A3E 1A6E 1A60 1A3B 0020 1A3E 1A5B 1A6Em_e/b mVb_e Should be different. (2 syllables)
ᨠᩮ᩠ᩁ ᨻᩕᩮ (no meaning)
/keːn/ /kʰe/
1A20 1A6E 1A60 1A41 0020 1A3B 1A55 1A6Ek_e/_r bVr_e Should be different. (2 syllables)
ᨠᩮ᩠ᩃ ᨠᩖᩮ (no meaning)
/keːn/ /keː/
1A20 1A6E 1A60 1A43 0020 1A20 1A56 1A6Ek_e/_l kVl_e Should be different. (2 syllables)
ᨠᩖᩮ ᨠ᩠ᩃᩮ (no meaning)
/keː/ /keː/
1A20 1A56 1A6E 0020 1A20 1A60 1A43 1A6EkVl_e k/_l_e

However, those who don't use MEDIAL LA won't make a visual distinction to show the position of the vowel!

(2 syllables)

ᩈ᩠ᩈ ᩈᩞ ᩔ(no meaning) 1A48 1A60 1A48 0020 1A48 1A5E 0020 1A54s/_s _sVs ss Should be different. (3 syllables)
ᨾ᩠ᨾ ᨾᩜ(no meaning) 1A3E 1A60 1A3E 0020 1A3E 1A5Cm/m mVm Should be different. (2 syllables)
Behaviour of <SAKOT, NYA>
ᨬᩮ᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
1A2C 1A6E 1A60 1A2C 0020 1A2C 1A60 1A2C 1A6Eny_e/ny ny/ny_e Would ideally be different, but this may not be readily and robustly achievable. (2 syllables)
ᨬᩮ‌᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
1A2C 1A6E 200C 1A60 1A2C 0020 1A2C 1A60 1A2C 1A6Eny_e‌/ny ny/ny_e Instead should be different. (2 syllables)
ᨱ᩠ᨬ ᨬ᩠ᨬ(no meaning) 1A31 1A60 1A2C 0020 1A2C 1A60 1A2CN/ny ny/ny Should these be different? (2 syllables)
ᨱᩮ᩠ᨬ ᨱ᩠ᨬᩮ (no meaning)
/neːn/ /-n ɲeː/
1A31 1A6E 1A60 1A2C 0020 1A31 1A60 1A2C 1A6EN_e/ny N/ny_e Should these be different? (2 syllables)
Marks from outside the Tai Tham Block
ᩋᩦ๊ (meaningless syllable in refrain of a song)
/ʔiː/
1A4B 1A66 0E4AqI3K Thai mai tri and mai chattawa are found on tua mueang 'words' on p236 of the big blue book! Of course, these might just be the unencoded THAI-LAO TONES THREE and FOUR. In this particular case, a rendering issue might be alleviated by making the default positions of the tone marks higher than that of the vowels above.
ᩋᩦ๋ (meaningless syllable in refrain of a song)
/ʔiː/
1A4B 1A66 0E4BqI4K
Language-Sensitive Forms (Browser Test?)
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
1A4C 1A63 1A74 0020 1A4C 1A63 1A74rhAM rhAM

The top two rows are declared to be in Lao, and the second also has a corresponding style-setting lest the language setting be ignored.

The initial consonant takes the form in the Da Lekh family of the consonant form used in that role in Laos and Northeast Thailand, namely , which is only subtly different from U+1A41 TAI THAM LETTER RA. So doing may be improper behaviour, but is seen in fonts.

The mai kang should appear on the vowel U+1A63 TAI THAM VOWEL SIGN AA, its usual position outside Thailand. Of course, this won't happen if the font cannot be appropriate for such writing systems. At least one browser has failed to render the final stack properly when it has been the final glyph in the glyph stream; this is why the word is written twice.

The bottom row is not marked for language, and shows the same word (and encoding). The Da Lekh font follows the more technically challenging Chiangmai style by default, with the MAI KANG on the consonant.

(2 words, so 2 syllables!)

ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
1A4C 1A63 1A74 0020 1A4C 1A63 1A74rhAM rhAM
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
1A4C 1A63 1A74 0020 1A4C 1A63 1A74rhAM rhAM
Tone before Vowel!
ᨣ᩠ᩅ᩵ᩢᩣ᩠ᨶ when
kan waː
1A23 1A60 1A45 1A75 1A62 1A63 1A60 1A36g/w1aA/n p118 - MFL clearly has the tone as the first mark! It may be that these are just typing errors. There are two other examples of tone and then vowel in the dictionary, the same tone and vowel as here.
ᨣ᩠ᩅ᩵ᩢᩣ and say
kɔʔ waː
1A23 1A60 1A45 1A75 1A62 1A63g/w1aA
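The soft hyphen and ZWSP rows of the line-breaking tests in the table above differ from the plain word only by one inserted character. A minimal Python sketch of how such test strings might be generated from the Encoding column follows. The helper names `from_hex` and `break_before_aa` are invented for illustration, and nothing here predicts what a given renderer or application will do with the result.

```python
SHY = "\u00AD"      # SOFT HYPHEN
ZWSP = "\u200B"     # ZERO WIDTH SPACE
SIGN_AA = "\u1A63"  # TAI THAM VOWEL SIGN AA


def from_hex(encoding):
    """Turn an Encoding-column entry such as '1A48 1A3E' into a string."""
    return "".join(chr(int(cp, 16)) for cp in encoding.split())


def break_before_aa(word, breaker):
    """Return one variant per SIGN AA, with `breaker` inserted just before it."""
    return [word[:i] + breaker + word[i:]
            for i, ch in enumerate(word) if ch == SIGN_AA]


# The word from Figure 9b of N3207R, without any break opportunity.
word = from_hex("1A48 1A3E 1A6E 1A63 1A34 1A60 1A34 1A3E 1A63 1A36 1A6E 1A49 1A65")

for variant in break_before_aa(word, SHY) + break_before_aa(word, ZWSP):
    print(" ".join(f"{ord(ch):04X}" for ch in variant))
```

The word contains SIGN AA twice, so each call yields two variants; the second of each pair corresponds to the sequences used in the table, where the break falls before the later SIGN AA.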

Rendering Challenges from MFL

These words presented problems, now overcome (Version 0.05), while I was developing the Da Lekh font to cope with the Universal Shaping Engine of mid-2016. (The solution is not entirely compliant with the Unicode standard: dotted circles in the input are sometimes deleted.) They are offered as an aid to font developers fighting unhelpful layout engines; they are not expected to help developers of the core layout engines.

Text Meaning and Pronunciation Encoding Hacked via ASCII Remarks
ᨠ᩠ᩃ᩻ᩬ᩵ᨾ Cambodian
/kălɔːm/
1A20 1A60 1A43 1A7B 1A6C 1A75 1A3Ek/_l"VO1m p4 (2 syllables, second a single consonant)
ᨠ᩠ᩃᩬ᩵᩻ᨾ 1A20 1A60 1A43 1A6C 1A75 1A7B 1A3Ek/_lVO1"m Hard to interpret encoding.
ᨠ᩠ᩃᩬ᩻᩵ᨾ 1A20 1A60 1A43 1A6C 1A7B 1A75 1A3Ek/_lVO"1m USE-compatible, with interpretable rendering specification.
ᨠᩕᩥ᩠᩵ᨦ suspicious
/kʰiŋ/
1A20 1A55 1A65 1A75 1A60 1A26kVr_i1/G p15
ᨡᩮᩢ᩶ᩬᩣ᩠ᨦ belongings
/kʰau kʰɔːŋ/
1A21 1A6E 1A62 1A76 1A6C 1A63 1A60 1A26kh_ea2VOA/G

p101 - the one syllable form.

The first form minimises the disruption to the pattern of first element followed by second element. The second spelling tries sticking in CGJ to advise that the ordering of the marks is not an error. The third spelling follows the principle that if the components cannot be concatenated (with deletion and addition of SAKOT or equivalent as appropriate), then the ordering should be based on the visual layout of the marks.

ᨡᩮᩢ᩶͏ᩬᩣ᩠ᨦ 1A21 1A6E 1A62 1A76 034F 1A6C 1A63 1A60 1A26kh_ea2͏VOA/G
ᨡᩮᩬᩢ᩶ᩣ᩠ᨦ 1A21 1A6E 1A6C 1A62 1A76 1A63 1A60 1A26kh_eVOa2A/G
ᨡᩮᩢᩬᩣ᩠᩶ᨦ 1A21 1A6E 1A62 1A6C 1A63 1A76 1A60 1A26kh_eaVOA2/G USE (December 2021)-compatible rearrangement of the above - but the final consonant is still incompatible at 2021.
ᨦ᩠ᩅ᩶ᩣ᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ spastic
/ŋwaː ŋwaː soːŋ soːŋ/
1A26 1A60 1A45 1A76 1A63 1A7B 0020 1A2A 1A70 1A6B 1A76 1A60 1A26 1A7BG/w2A" jxOo2/G" p168 (2 syllables)
ᨦ᩠ᩅᩣ᩶᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ 1A26 1A60 1A45 1A63 1A76 1A7B 0020 1A2A 1A70 1A6B 1A76 1A60 1A26 1A7BG/wA2" jxOo2/G" Vowel and tone order adjusted to the USE as at December 2021.
ᨴᩯ᩠᩶ᩃ truth to tell
/tɛː lɛː/
1A34 1A6F 1A76 1A60 1A43d_E2/_l p318. The first entry has the written vowel with the first consonant, the second with the second, and the third entry is the same as the second but normalised.
ᨴ᩠᩶ᩃᩯ 1A34 1A76 1A60 1A43 1A6Fd2/_l_E
ᨴ᩠᩶ᩃᩯ 1A34 1A60 1A76 1A43 1A6Fd/2_l_E
ᨳᩮᩬᩥᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ bruised
/tʰɤʔ tʰɤʔ tʰɤːk tʰɤːk/
1A33 1A6E 1A6C 1A65 1A61 1A7B 0020 1A33 1A6E 1A65 1A60 1A20 1A7Bth_eVO_iH" th_e_i/k" p314. (2 syllables)
ᨳᩮᩥᩬᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ 1A33 1A6E 1A65 1A6C 1A61 1A7B 0020 1A33 1A6E 1A65 1A60 1A20 1A7Bth_e_iVOH" th_e_i/k" With vowel order of the USE as at December 2021
ᨾᩉᩫᩖᨿᩰᨴᩤ great army
/maʔ hon yoː tʰaː/
1A3E 1A49 1A6B 1A56 1A3F 1A70 1A34 1A64m_hoVl_yOd^A NTDPLM p511.

Notes

References

Short Name Full Reference
N3207R Everson M., Hosken M. & Constable P. Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R
MFL Rungrueangsi, Udom (2004) [1991]. Lanna-Thai Dictionary, Princess Mother Version พจนานุกรมล้านนา ~ ไทย ฉบับแม่ฟ้าหลวง ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣ ~ ᨴᩱ᩠ᨿ ᨨᨷᩢ᩠ᨷᨾᩯ᩵ᨼ᩶ᩣᩉᩖ᩠ᩅᨦ [Photchananukrom Lanna ~ Thai, Chabap Maefa Luang] (in Thai) (Revision 1 ed.). Chiang Mai: Rongphim Ming Mueang (โรงพิมพ์มิ่งเมือง). ISBN 974-8359-03-4.
big blue book Wacharasat, Bunkhit (2003). Language of Mueang Lanna ᨽᩣᩈᩣᨾᩮᩬᩨᨦᩃ᩶ᩣ᩠ᨶᨶᩣ ภาษาเมืองล้านนา [Phasa Mueang Lanna] (in Thai). ISBN 974-85472-0-5
Apiradee Techasiriwan, Apiradee อภิรดี เตชะศิริวรรณ. พัฒนาการของอักษรและอัขรวิธีในเอกสานไทลื้ [Patthanakan khong Akson lae Akhara Witi nai Ekasan Thai Lue] Development of Tai Lue Scripts and Orthography. MA Thesis, Chiangmai University (in Thai)
NTDPLM Arunrat Wichiankhiao et al. อรุณรัตน์ วิเชียรเขียว (1996). ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣᨨᨻᩕᩰᩬᩡᨣᩤᩴᨴᩦ᩵ᨷᩕᩤᨠᩫ᩠ᨭᨶᩱᨷᩱᩃᩣ᩠ᨶ พจนานุกรมศัพท์ล้านนาเฉพาะคำที่ปรากฏในใบลาน The Northern Thai Dictionary of Palm-Leaf Manuscripts. ISBN 974-7067-77-2
Chiengtung Chieng Tung: Its Way of Life ᨡᩮᨾᩁᨭᩛᨶᨣᩬᩁᨩ᩠ᨿᨦᨲᩩᨦ [Khemarattha Nakon Cheng Tung] เขมรัฐนครดชียงตุง [Khemarat Nakhon Chiang Tung] (in Thai, Tai Khün, French and English) Chiang Mai: Wat Tha Kradas (วัดท่ากระดาษ)
L2/17-120 Wordingham J.R. Corrections to the Indic Syllabic Category for the Tai Tham Script, L2/17-120
N3384 Hosken M. Tai Tham Subjoined Variants, ISO/IEC JTC1/SC2/WG2/N3384, L2/08-073

Document History

This is Version 2.12 of the web page, which has been written by Richard Wordingham.

History

Version Date Changes
1.0 14 June 2015 Initial 'stable' (i.e. abandoned) version. Work had started on 27 February 2015, and there may be earlier versions around.
1.1 25 September 2016 Converted from XML to HTML (by stripping off XML header) for new website.
2.0 25 October 2016

Added option to dynamically switch fonts - free font Da Lekh Seri for exposure to rendering engine foibles, and encumbered font Da Lekh for resistance. Both fonts are open source, but I created all the inked glyphs for the Da Lekh Seri font. ('Seri' means beholden to no one.)

Completed references, and improved, pruned and extended the examples.

2.1 26 October 2016

Corrected typos. Started testing of display bases.

2.2 7 November 2016

Fixed transliterator bug. Added examples from testing of Da Lekh font work-arounds. Corrected more typos. Tested language sensitivity.

2.3 14 November 2016

Added styles to force Lao forms. Reorganised 'test and tell'. Added one new test word, for mai kam followed by mai sam.

2.4 14 April 2017

Improved 'bran' bug alert.

Added 'A Tai Tham KH' font with and without ccmp enabled. The radio buttons are hidden, and anyone enabling them would also have to supply the font.

Added test for double acting MEDIAL LA.

2.5 8 July 2017

Added test for tone plus SIGN OY.

Added colour fonts to show phonetic position of subscripts relative to vowel.

Added "onclick" for radio buttons.

2.6 22 February 2018

Added test cases for karan on vowels and medial la following preposed vowel.

2.7 12 May 2018

Added three new fonts - 'A Tai Tham KH', 'Hariphunchai' and my extension of the latter, 'Lamphun'.

Added a few more examples of the ᨶᩣ ligature.

Added query as to when ᨲᩬᩴ᩵͏ᩯ᩠ᨶ should render properly.

Colour for spell-checking is now a reality.

2.8 17 February 2019

Added test cases for ᩃᩮᩞ and ᨻᩕ᩠ᨿᩮᩡ.

2.9 9 December 2021

Corrected feature ss99 to ss19.

Clarified rôle of Da Lekh Si fonts. Updated repertoire of Da Lekh Seri fonts.

Added play area for readers to try the fonts out.

Massively duplicated navigation bars to avoid need to scroll to top or bottom.

Made the difference between strings that shall be rendered and other possible sequences clearer. Promoted four test and tell cases to test cases - three sequences for displaying marks and one for the combination of RA HAAM and MAI SAM.

Linked to my font compiler.

2.10 30 December 2021

Noted that colour now works even in IE 11 and also in LibreOffice.

Added most recent (2019) version of Hariphunchai, dubbed Hariphunchai 4.

Added USE-compatible encodings to avoid maligning any fonts that assume a USE-compatible encoding.

2.12 23 January 2022

(Including changes to 2.11).

Fixed miscellaneous typos, including alternative encodings of ᨶ᩶ᩣᩴ. Changed shortcomings of 'my fonts' to shortcomings of 'Da Lekh'.

Changed site from HTTP to HTTPS.

Changed background for USE encoding from red to orange to avoid clash with coloured fonts.

Testing

This web page has been developed with frequent testing on Firefox Version 54 and occasional viewing using Safari on iPhone (iOS 10.3.2), IE 11 (on Windows 7) and Microsoft Edge (on Windows 10).

Switching fonts has been tested in all these browsers.


Font Availability

Da Lekh Font Family

You may freely use, without modification, the four fonts of mine mentioned here, and you may freely examine them. See the respective licences for the conditions on use and modification. I do not own all the intellectual property rights for the Da Lekh and Da Lekh Si fonts. The fonts are available as follows:

Name Font file Source file Licence file
Da Lekh
(ᨯᩣᩃᩮ᩠ᨡ)
dalekh.ttf File dalekh.txt in dalekh.zip. This is also the ultimate source code for the Da Lekh Seri font. See Makefile therein for preprocessing directives. DejaVu licence
Da Lekh Si
(ᨯᩣᩃᩮ᩠ᨡᩈᩦ)
dalekh_si.ttf
Da Lekh Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩮᩁᩥ)
dalekh_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -x c dalekh.txt | grep -v ^# >| dalekh_seri.txt
seri_license.htm
Da Lekh Si Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩦᩈᩮᩁᩥ)
dalekh_si_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_si_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -DCOLOUR -x c dalekh.txt | grep -v ^# >| dalekh_si_seri.txt

If you wish to have WOFF files, you should either generate them yourself from the font files listed above, or simply copy them from this website.

The fonts are generated from the source code by means of a DIY font compiler that still has many rough edges. However, the source code of the font, although spartanly commented, may make it clearer what the font is attempting to do. I have endeavoured to make reverse engineering unnecessary.

The font Da Lekh is partly intended for my practical use in analysing material in the Tai Tham script. It therefore contains a large set of Latin characters to support transcription and transliteration. It also contains work-arounds so that it may render properly despite problems with rendering engines.

The other purpose of the fonts is to explore issues in making an OpenType font for the Tai Tham script.

The font Da Lekh Seri is an unencumbered font intended for testing rendering engines. Besides the glyphs for Tai Tham writing systems, it therefore has just a bespoke set of (poor) ASCII glyphs, the extra characters required by Microsoft Office, the characters recommended for the Universal Shaping Engine, and the characters needed for transliteration style (feature ss04) together with their closure under NFC. Known existing work-arounds have been removed; this removal is implemented by compiler directives.

The font Da Lekh Si (ᨯᩣᩃᩮ᩠ᨡᩈᩦ) differs from Da Lekh in that it aims to reveal the spelling of words. This is useful when using a spell-checker, for example on Firefox. The ideal is that subscript consonants in the coda of an orthographic syllable would be distinguished from those in the onset by colour, whence the word 'Si' in the name of the font. The colour technology used works in the dominant browsers (Chrome, Safari, Firefox, MS Edge and even Internet Explorer 11) and in the word processor of LibreOffice. The colouring is also applied to chained syllables.

It is possible that Da Lekh Si may be reduced to an optional OpenType feature applied to the Da Lekh font.

The font Da Lekh Si Seri is an unencumbered font that colours glyphs in the same way. Like Da Lekh Seri, it deliberately lacks work-arounds for problems with renderers. It is intended as an aid for the development of the Da Lekh Si font.

Lamphun Font

The Lamphun font is available under the SIL open font licence; the applicable customisation declares that "Hariphunchai" and "Lamphun" are reserved font names. The font file is lamphun.otf and what I have used as 'source' code to build the font is an untidy mess assembled in lamphun.zip:

Rôle Name Remarks
Glyphs Hariphunchai.otf

A version of the font dated 5 May 2014, taken from SourceForge. The 'unique identifier' in the name table is FontForge : Hariphunchai : 5-5-2014. There were later .sfd and .fea files at the same location, but at best they offered improved glyphs compared to Lamphun.

This is the file that defines the 'early' Hariphunchai font as used on this web page.

OTL tables lamphun.txt

This defines a font with the same glyph numbering, but with blank glyphs. I then replace 7 tables in the early Hariphunchai font with tables from this new font:

Name Reason
name
Rename font to comply with the licence, and record appropriate licensing and history information.
GSUB
Include lookups to undo rendering damage by the USE. Position medial ra using feature pref. Move other lookups from ccmp to blws, so that they are applied when missyllabification by USE is no longer an issue. Choose appropriate glyphs when there are level 2 subscripts with vocalic function. Handle mai kam, subscript consonants on NYA, and N.WAA and N.HAA.
GPOS
Add some mark-to-mark positioning. Use dist to restore advance widths of spacing subscripts.
GDEF
Correct a few glyph categorisations.
cmap
Add mappings for control characters so that rendering engine damage can be repaired.
OS/2
Allow more complex OTL operations. Declare greater line depth to enable more rapid rendering. Blanked vendor ID.
head
Change font revision and modification time.
Change log fontlog.txt Only for Lamphun.
Make file lamphun_makefile The compiler invoked by '~/oft/parse' is my DIY font compiler.

It is likely that I will create a variant coloured to indicate spelling.

Hariphunchai Font

There are two versions of the font used on this page. The fonts themselves are distinguished by the unique font identifiers in their name tables.

The early version of the Hariphunchai font, whose 'unique identifier' is FontForge : Hariphunchai : 5-5-2014, is available as Hariphunchai.otf both on SourceForge and within the Lamphun source zip file. The reversibly generated WOFF file is available here.

The 2019 version, whose 'unique identifier' in the name table is TragerStudio : Hariphunchai : 19-5-2019, is available as Hariphunchai4.otf on SourceForge. The reversibly generated WOFF file is available here.

The licence is available on SourceForge. The WOFF files, being derivative works, are licensed under the same licence. As the original OTF files can be recovered from them, they preserve the font names.
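Since the two Hariphunchai versions are told apart by the 'unique identifier' in the name table (name ID 3), a rough Python sketch of reading that string from an uncompressed OTF or TTF file may be useful. It uses only the standard library and follows the general sfnt and name table layout; the function name `unique_font_id` is invented for illustration, Macintosh-platform strings are assumed to be MacRoman, and WOFF files would first have to be converted back to OTF, since their tables may be compressed.

```python
import struct


def unique_font_id(data):
    """Return the first name ID 3 string of a single-font sfnt, or None."""
    num_tables = struct.unpack_from(">H", data, 4)[0]
    # Table directory: 16-byte records starting at offset 12.
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            break
    else:
        return None  # no name table present

    _format, count, string_offset = struct.unpack_from(">HHH", data, offset)
    for i in range(count):
        platform, _enc, _lang, name_id, length, str_off = struct.unpack_from(
            ">6H", data, offset + 6 + 12 * i)
        if name_id != 3:
            continue
        start = offset + string_offset + str_off
        raw = data[start:start + length]
        # Unicode (0) and Windows (3) platform strings are UTF-16BE;
        # treating anything else as MacRoman is an assumption of this sketch.
        return raw.decode("utf-16-be" if platform in (0, 3) else "mac_roman")
    return None
```

Applied to the bytes of the early font file, this would be expected to report the identifier quoted above; for example, one might read Hariphunchai.otf in binary mode and pass its contents to `unique_font_id`.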
