
Introduction

Purpose

This page provides a set of test cases for a Tai Tham renderer. It has been compiled with a view to putting the 'Universal Shaping Engine' through its paces for the Tai Tham script. This set of tests is incomplete in that it does not directly give the correct renderings, although someone in possession of the source documents could visually check them.

I was originally requested to provide words of one syllable for such a test. By syllable, I understand an Indic syllable of the form C+(M*V*M*C*M*)* with a single base consonant. (M = miscellaneous mark). I include cases where the rôle of the base consonant is played by something other than a letter. The post-vocalic consonants occur not only in other SEA Indic scripts such as the Khmer script and Lao script (as in the use of ຽ U+0EBD LAO SEMIVOWEL SIGN NYO in the Lao writing system), but also in Tibetan.

However, many of the interesting cases occur in the second syllable of a word, and certain initial syllables are obligatorily followed by more characters of the word. I have therefore also supplied longer words when a conceivable problem would not appear in a word of one syllable.

The dependent vowel AA (U+1A63 and U+1A64) may form the base of its own little stack of dependent marks. Manual line breaking may also separate it from its base consonant. I have nevertheless counted it as part of the same syllable as a base consonant; the two stacks frequently interact in Northern Thai, with MAI KANG migrating to or towards the base consonant and interacting with its dependents.

The page was originally set up to use either my own stick font, 'Da Lekh', which is based on DejaVu Sans, or the cut-down version, 'Da Lekh Seri'. The Da Lekh font is intended to be suitable for use in preparing (but perhaps not publishing) Tai Tham text. It therefore includes work-arounds for known rendering engine problems. The Da Lekh Seri font deliberately does not include such work-arounds. You may be interested in using or examining my fonts for your own purposes.

I have added two other families of fonts. These fonts are available under the SIL Open Font License. The font 'A Tai Tham KH' relies only on the ccmp feature being enabled; it handles all Indic rearrangement itself.

The Hariphunchai font is an OpenType Layout font that looked promising when used with the South-East Asian shaper of HarfBuzz. Development seems to have stopped when HarfBuzz switched Tai Tham to its implementation of the Universal Shaping Engine (USE). The code for this font is available on SourceForge and there is further documentation elsewhere. I have added work-arounds and a few further touches to enable it to work under the USE; I have dubbed the resulting font 'Lamphun'.

Feel free to adapt this web page to add your own fonts and test cases.

Content and Layout

The test text is given in the table columns headed 'Text', and is the content of the first table cell in table rows with class tst1. Two further columns, headed 'Encoding' and 'Hacked via ASCII', are automatically derived from this text as the page is loaded. The hacked column is intended to show users how the text should look, though it too may suffer from rendering engine limitations. The font used for this column is the member of the Da Lekh font family last selected to display the first column. Ideally, I would include images of the text from credible sources, but that may cause copyright problems, for the Unicode Consortium wishes to be able to use this document for commercial purposes.
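The derivation of the 'Encoding' column can be illustrated by a minimal JavaScript sketch. It is not the script this page actually uses: the function name toCodePoints is hypothetical, and the output format (U+ followed by at least four hexadecimal digits) is an assumption.

```javascript
// Hypothetical helper: list the code points of a test string as "U+XXXX".
// Iterating with for...of walks the string by code point rather than by
// UTF-16 code unit, so characters outside the BMP would also be handled.
function toCodePoints(text) {
  const out = [];
  for (const ch of text) {
    const hex = ch.codePointAt(0).toString(16).toUpperCase();
    out.push("U+" + hex.padStart(4, "0"));
  }
  return out;
}

// Example input written with escapes: <U+1A20, U+1A63>.
console.log(toCodePoints("\u1A20\u1A63").join(" "));
```

Writing the input with escapes keeps the example independent of whatever font or renderer is displaying this page.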

The 'Hacked via ASCII' column contains an unambiguous transliteration to ASCII of the Tai Tham text in the column headed 'Text'. Members of the Da Lekh font family contain an OpenType font feature, Stylistic Set 2, which, when enabled, may cause the font to render the transliteration as the original Tai Tham text is intended to be rendered. For more details, see the style sheet in the source of this page.
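A scripted way of switching Stylistic Set 2 on or off might look like the sketch below. It assumes only the standard CSS property font-feature-settings with the OpenType tag ss02; the helper name setStylisticSet2 is hypothetical, and the arrangement actually used by this page is the one in its style sheet.

```javascript
// Hypothetical helper: turn Stylistic Set 2 on or off for one element.
// 'el' is any object with a 'style' property, for example a table cell.
function setStylisticSet2(el, enabled) {
  el.style.fontFeatureSettings = enabled ? '"ss02" 1' : '"ss02" 0';
  return el;
}

// In a browser one might call, for some cell of interest:
//   setStylisticSet2(document.querySelector("td"), true);
```

The feature only has a visible effect if the element's font actually implements ss02, as the members of the Da Lekh family are stated to do.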

The 'Meaning and Pronunciation' column is given to identify the word given as an example. There may be better glosses, and pronunciation can vary extensively within a nominal language. The letter RA is particularly variable between /l/, /h/ and even /r/, and there are regional variations as to whether vowel length distinctions exist and, if so, whether they are phonemic. For the Tai languages the pronunciation is given using IPA, while Pali is simply transliterated (as Pali). I have omitted tone, as phonetic tone is also quite variable. Where no indication to the contrary is given, the Tai pronunciation given approximates that of Chiangmai.

The test words may, in principle, be extracted quite simply from this web page. Each test 'word' is the content of the first cell in each row whose class is tst1. For convenience, I have extracted the first two cells in such rows, along with titles, to a CSV file.
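The extraction just described can be sketched as follows. This is an illustration under stated assumptions rather than the procedure I actually used: the function names extractTestRows and toCsv are hypothetical, the sketch ignores the titles, and it assumes the rows can be selected with the CSS selector tr.tst1.

```javascript
// Hypothetical sketch: collect the first two cells of every row of class
// tst1. 'root' is anything offering querySelectorAll, e.g. 'document'.
function extractTestRows(root) {
  const rows = [];
  for (const tr of root.querySelectorAll("tr.tst1")) {
    const cells = tr.cells;
    if (cells.length === 0) continue; // skip rows without cells
    rows.push([
      cells[0].textContent,
      cells.length > 1 ? cells[1].textContent : "",
    ]);
  }
  return rows;
}

// Quote every field and double any embedded quotes, so that commas and
// quotation marks inside glosses survive the round trip.
function toCsv(rows) {
  const quote = (field) => '"' + field.replace(/"/g, '""') + '"';
  return rows.map((row) => row.map(quote).join(",")).join("\r\n");
}

// In a browser: console.log(toCsv(extractTestRows(document)));
```

Quoting every field unconditionally is slightly wasteful but avoids having to decide which glosses contain commas.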

Font Testing

This page is intended as a rendering engine test, rather than as a font test. However, you may modify this page to try out your own font. The necessary changes will be confined to the style sheet in the source code of this page, unless you use a different ASCIIfication scheme, in which case look at the usage of the JavaScript variable ss02_hack.

My Rendering Performance

When this page was initially composed, in June 2015, the Da Lekh font mostly worked for the Tai Tham script in the Firefox and Chrome browsers. It worked in them because they use HarfBuzz to render the Tai Tham script. Since then, the HarfBuzz rendering engine used for Tai Tham has been brought into line with the Universal Shaping Engine, with a consequent dramatic fall in the rendering performance for the Tai Tham script.

The solution to this problem was to add numerous work-arounds to the font. These work-arounds have mostly restored performance, the main exceptions being subtle positioning errors where mark to base positioning is ignored and the default mark position is used instead.

The quality of the 'Hacked via ASCII' column varies from browser to browser and from operating system to operating system, and also varies over time. For Internet Explorer 11, Microsoft Edge and for the HarfBuzz-based browsers Firefox and Chrome, it is actually the best rendered column. (Script-specific rendering engines have a tendency to make the achievement of advanced script features difficult rather than easy; Tai Tham has many 'advanced' features.)

Font Peculiarities

Traditionally, consonants used in neither Pali nor Sanskrit had no subscript forms. However, one significant text book, the 'big blue book', provides a subscript form for LOW FA for use in loans from English. This form is cramped and ugly, which goes against the tradition of Lanna script writing. The MFL treats the stroke distinguishing HIGH KXA, LOW KXA, LOW SA, LOW FA and LETTER UU from HIGH KHA, LOW KA, LOW CA, LOW PA and LETTER U as a diacritic. The Da Lekh font follows this interpretation, and leaves this diacritic above the baseline when the letter is subscripted.

If you have difficulty reading the Da Lekh fonts, you may find it useful to consult their glyph gallery.

Control Panel

Font Selection

Da Lekh Seri Da Lekh
Da Lekh Si Seri Da Lekh Si
If using Lamphun or one of the four fonts above, the font is at Version . (The version number is stored in the character U+EAE7 in addition to the font file's name and head tables.)
System default Guest font
A Tai Tham KH (ccmp defaults) A Tai Tham KH (ccmp enabled)
Hariphunchai Lamphun

Specimen text: ᩋᩁᩉᨶ᩠ᨲᩮᩣ arahanto 'arahats'

Table Completion

It is possible that the columns recording code points and showing how the text should look may not have been generated. If that happens, try clicking this control button:

Debugging messages

Tests

Vowel Combinations
Other Explicit Coding Sequences
Other Examples from L2/07-007R
Mai Kang Lai
Ligature NAA
English Loanwords
Tai Lü Blends
More Tai Lü
And Not
Potential Surprises
Test and Tell
Rendering Challenges from MFL

Vowel Combinations

These vowel combinations are taken from Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable). Changes have been rung on the initial consonants to check for silly omissions.

A hyphen in the pronunciation indicates a syllable-final consonant that would be specified by a subscript consonant or following orthographic syllable.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩫN/A
/ko-/
Section 5 No. 1. This sequence does not form a whole word. An example may be seen in a word for 'danger'.
ᨣᩴthen, and
/kɔː/
Section 5 No. 2
ᨧᩢ(irrealis marker)
/tɕaʔ/
Section 5 No. 4
ᨲ᩠ᩅᩫᩡ to prevaricate
/tuaʔ/
Section 5 No. 5
ᨷ᩠ᩅᩫlotus
/bua/
Section 5 No. 6
ᨠ᩠ᩅN/A
/kua-/
Section 5 No. 7. This sequence does not form a whole word. An example may be seen in one of the words for 'big'.
ᨡᩬᩴ to request
/kʰɔː/
Section 5 No. 8
ᨠᩬN/A
/kɔː-/
Section 5 No. 9. This sequence does not form a whole word. An example may be seen in the fuller spelling of the word for 'belongings'.
ᨦᩡ to split up
/ŋaʔ/
Section 5 No. 10
ᨠᩣcrow
/kaː/
Section 5 No. 11
ᨴᩤto paint
/taː/
Section 5 No. 12
ᩌᩣᩴ to sprinkle
/ham/
Section 5 No. 13
ᨣᩤᩴword
/kam/
Section 5 No. 14
ᨳᩥto pretend
/tʰiʔ/
Section 5 No. 15
ᨺᩦboil (n.)
/fiː/
Section 5 No. 16
ᨩᩧmoist
/tɕɯʔ/
Section 5 No. 17
ᨾᩨhand
/mɯː/
Section 5 No. 18
ᨵᩩmonk
/tʰuʔ/
Section 5 No. 19
ᨦᩪsnake
/ŋuː/
Section 5 No. 20
ᨲᩮᩡto kick
/teʔ/
Section 5 No. 21
ᨽᩮdanger
/pʰeː/
Section 5 No. 22
ᨤᩯᩡ to limp along
/kʰɛʔ/
Section 5 No. 23
ᨧᩯcorner
/tɕɛː/
Section 5 No. 24
ᨸᩮᩬᩥᩡmud
/pɤʔ/
Section 5 No. 25
ᨶᩮᩬᩥ (final particle for commands and entreaties)
/nɤː/
Section 5 No. 26.
ᨠᩮᩬᩨᩡN/A
/kɯaʔ/
Section 5 No. 27
ᨠᩮᩬᩨ
/kɯa/
Section 5 No. 28
ᩁᩮᩢᩣwe
/hau/
Section 5 No. 29
ᨾᩳdrunk
/mau/
Section 5 No. 30. This example is not taken from the MFL, which does not use this vowel symbol.
ᨠᩮᩣN/A
/ko:/
Section 5 No. 31. This is very rare in monosyllables, but is quite common at the end of monks' names, e.g. Adittadhammo.
ᨹ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
Section 5 No. 32
ᨻ᩠ᨿᩮflower
/pia/
Section 5 No. 33
ᨠ᩠ᨿN/A
/kia-/
Section 5 No. 34. This sequence does not form a whole word. An example may be seen in a spelling of the word for 'city'.
ᨾᩮᩬᩥᩋᩡmucus
/mɯaʔ/
Section 5 No. 35. (2 syllables)
ᨠᩖᩮᩬᩥᩋsalt
/kɯa/
Section 5 No. 36. (2 syllables)
ᩈᩰᩡ to practice
/soʔ/
Section 5 No. 37
ᨾᩰbig
/moː/
Section 5 No. 38
ᨪᩰᩬᩡ to gouge out
/sɔʔ/
Section 5 No. 39
ᨩᩢ᩠ᨿvictory
/tɕai/
Section 5 No. 40
ᨶᩲin
/nai/
Section 5 No. 41
ᨢᩱ to expose
/kʰai/
Section 5 No. 42
ᨴᩱ᩠ᨿThailand
/tai/
Section 5 No. 43
ᨠᩮᩬᩨᩡ
Khün /kɤʔ/
Section 5.3 No. 22
ᨠᩮᩬᩨ
Khün /kɤː/
Section 5.3 No. 23
ᨠᩰᩢ
Khün /ko-/
Section 5.3 No. 26
ᩈᩘ First syllable of compounds of saṅgha.
/saŋ/
Section 5.3 No. 29. Apparently not a possible final syllable, but can be left stranded as a result of line-breaking.
ᨴᩢ᩠ᨦwhole
/taŋ/
Section 5.3 No. 30
ᩌᩥᩴedge
/him/
Section 5.3 No. 31 (Example from Apiradee p53, but different language, different pronunciation, i.e. not /-iŋ/.)
ᨠᩥ᩠ᨦ
/kiŋ/
Section 5.3 No. 32
ᨠᩢ᩠ᨾ
/kam/
Section 5.3 No. 34
ᨠᩢᨾ
/kam/
Section 5.3 No. 35
ᨯᩭmountain
/dɔːi/
Section 5.3 No. 36

Other Explicit Coding Sequences

Other explicit coding sequences are given in Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable), and these are recorded here. Amended and exploratory material is highlighted in yellow; it is not vouched for by the proposal. The remarks are my own.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
᪓᩠ᨴthrice
/saːm tiː/
Section 2
ᨲ᩵ᩣ᩠ᨦ᩻ different in my view
/taːŋ taːŋ/
Section 7
ᨳ᩠ᨶ᩻ᩫᩁpath
/tʰănon/
Sections 7 and 14.6 (2 syllables - the second is a single character).
ᨡᩢ᩶᩻ᩬᨦ belongings
/kʰau kʰɔːŋ/
Section 7 (2 syllables - the second is a single character)
ᨡᩮᩢ᩶ᩣᨡᩬᨦ belongings
/kʰau kʰɔːŋ/
Section 7 (3 syllables - the third is a single character)
᪭ᩣ elephant
/tɕaːŋ/
Section 11
ᩉ᩠ᨶᩦto flee
/niː/
Section 14.1
ᨤ᩠ᩅᩯ᩶ᩁ to blockade
/kʰwɛːn/
Section 14.2 (2 syllables - the second is a single character)
ᩉ᩠ᩅᩫ head
/hua/
Section 14.3
ᨯᩢ᩵ᨦ᩠ᨶᩦ᩶ like this
/daŋ niː/
Section 14.4 (2 syllables)
ᩉᩥ᩠ᨶstone
/hin/
Section 14.5
ᨷ᩠᩵ᨾᩦ to not have
/bɔː miː/
Section 14.6. The proposal lists MAI KANG as a code point, but it is visually dropped in this compound. I presume the renderer is not intended to suppress the appearance of the character. The upper row drops the MAI KANG from the encoding, so is not the encoding intended, while the lower row uses the stated encoding. My fonts fail to arrange the marks above properly; arrangement is a proper challenge for a Tai Tham font. The phonetic syllable boundary is part of the context!
ᨷᩴ᩠᩵ᨾᩦ to not have
/bɔː miː/
ᨲᩣ᩠ᨾ to follow
/taːm/
Section 14.7
ᨻ᩠ᨿᩣ᩠ᨵᩥ sickness
/păɲaːt/
Section 14.8
ᨸ᩠ᩃ᩠ᨿ᩵ᩁ to change
/pian/
Section 14.9 (2 syllables - the second is a single character)
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
Section 14.9. A sophisticated font might transpose the tone marks. The phonetic syllable boundary should be part of the context.
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ even though
/mɛːn waː/
Same as above, but normalised, so not the code point sequence in the proposal. Proposal explicitly stated SAKOT was to have ccc=0, not 9, but ccc=9 was quietly inserted in draft properties and not noticed until too late.
ᩈ᩠ᩅᩯ᩵ to butt in
/swɛː/
Section 14.10
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
Section 14.10 (but the proposal has vowel and tone the wrong way round)
ᩈᩯ᩠᩵ᩅ to embroider
/sɛːw/
As above, but normalised, so very much not the codepoint sequence in the proposal.
ᩈ᩵ᩯ᩠ᩅ to embroider
/sɛːw/
As above, but uncorrected. Arguably, the rendering is unconstrained.
ᨿᩪ broom, whisk
/ɲuː/
Section 15 No. 1
ᨾᩦ to have
/miː/
Section 15 No. 2
ᩉ᩠ᨾᩪ pig
/muː/
Section 15 No. 3
ᩉ᩠ᨾᩦ bear (n.)
/miː/
Section 15 No. 4
ᨹ᩠ᩅᩫhusband
/pʰua/
Section 15 No. 5
ᩉ᩠ᩃᩬᩴ᩵ to cast (in metal)
/lɔː/
Section 15 No. 6
ᨾᩣto come
/maː/
Section 15 No. 7
ᩉᩱ᩵to hit
/hai/
Section 15 No. 8
ᨾ᩠ᨿ Section 15 No. 9
ᩅ᩠ᨿᨦ city
/wiaŋ/
Section 15 No. 10 (2 syllables - the second is a single character)
ᩉᩣ᩠ᨾ to carry by the handles
/haːm/
Section 15 No. 11
ᨯᩣᩴblack
/dam/
Section 15 No. 12
ᨡᩮ᩠ᩅ Section 15 No. 13
ᩉ᩠ᨾᩣdog
/maː/
Section 15 No. 14
ᨠᩕᩣ᩠ᨸ to prostrate oneself
/kʰaːp/
Section 15 No. 15. The later addition of SIGN BA to the repertoire makes the correct final consonant here unclear.
ᨻᩕ᩵ᩣᩴ indefatigable
/pʰam/
Section 15 No. 16
ᨠᩕᩬᨦ garland; Mekong
/kʰɔːŋ/
Section 15 No. 17 (2 syllables - the second is a single character)
ᩈᩕᩫᨾ᩠ᨱ᩺ ascetic
/sălom/

Section 15 No. 18. If the word is interpreted as having two phonetic syllables, then the medial consonant comes between an implicit vowel and an explicit vowel.

(2 syllables)

ᩈᩕ᩠ᩅᩫᨾ to embrace
/săluam/
Section 15 No. 19 (2 syllables - the second is a single character). Ignore final ; it makes the spelling ungrammatical. However, a few such spellings do occur in the MFL.
ᩈᩕ᩠ᩅᨾ to embrace
/săluam/
Spelling of above in the MFL, so this form's encoding is not given in the proposal.
ᨯᩮᩬᩨᩁmonth
/dɯan/
Section 15 No. 20 (2 syllables - the second is a single character)
ᩁᩮᩬᩨᩋboat
/hɯa/
Section 15 No. 21 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩮᩬᩨᩋto exceed
/lɯa/
Section 15 No. 22 (2 syllables - the second is a single character)
ᩉ᩠ᨾ᩵ᩣᩴto eat
/mam/
Section 15 No. 23
ᩈ᩠ᨾᩬᩥ᩻ very level(?)
/sămɤː sămɤː/
Section 15 No. 24. Encoding as given, omitting SIGN E, which is depicted in the proposal. Moreover, the word appears to be a misreading of the next but one.
ᩈ᩠ᨾᩮᩬᩥ᩻ very level(?)
/sămɤː sămɤː/
Section 15 No. 24. SIGN E restored to encoding.
ᩈ᩠ᨾ᩻ᩮᩬᩥ level (adj.)
/sămɤː/
Probable reading of above. Consequently, the encoding is not vouched for by the proposal. Phonetically, this is one or two syllables, depending on how one counts.
ᩉ᩠ᨾᩮᩬᩨᨦ mine (n.)
/mɯaŋ/
Section 15 No. 25 (2 syllables - the second is a single character)
ᩉ᩠ᨿᩮᩬᩨᨦ to despise
/ɲɯaŋ/
Section 15 No. 26 (2 syllables - the second is a single character)
ᩉ᩠ᨾᩫ᩵ᩁ winter melon (Benincasa hispida)
/mon/
Section 15 No. 27 (2 syllables - the second is a single character)
ᩉ᩠ᩃᩣ᩠ᨿmany
/laːi/
Section 15 No. 28
ᩉ᩠ᩃᩮᩬᩨᨦyellow
/lɯaŋ/
Section 15 No. 29 (2 syllables - the second is a single character)
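The effect of normalisation on the rows marked 'normalised' above can be reproduced with the following sketch, which relies only on the standard String.prototype.normalize method. The function names are hypothetical, and the sample sequence is merely illustrative: it has a tone mark typed before SAKOT and is not claimed to be the exact sequence printed in the proposal.

```javascript
// SAKOT (U+1A60) has canonical combining class 9; the tone marks have
// class 230. Canonical ordering therefore puts SAKOT before a tone mark
// whenever the two are adjacent.
function normaliseTaiTham(text) {
  return text.normalize("NFC");
}

// Report whether a sequence would be left unchanged by normalisation.
function survivesNormalisation(text) {
  return text === normaliseTaiTham(text);
}

// Illustrative sequence <MA, SIGN AE, TONE-2, SAKOT, NA>,
// with the tone mark typed before SAKOT.
const typed = "\u1A3E\u1A6F\u1A76\u1A60\u1A36";
console.log(survivesNormalisation(typed));
console.log(survivesNormalisation(normaliseTaiTham(typed)));
```

A test page that wants to exercise the unnormalised order must therefore make sure that nothing between the author and the renderer normalises the text.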

Other Examples from L2/07-007R

The actual coding sequences to be used here are open to challenge.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩬᩢᩃ᩠ᨼ᩺golf
/kɔp/
Section 2. The position of RA HAAM is debatable - cf. Thai กอล์ฟ. The first example places it on the second consonant, the second on the first. The third then normalises the spelling of the second. Note that this word consists of two orthographic syllables.
ᨠᩬᩢᩃ᩠᩺ᨼgolf
/kɔp/
ᨠᩬᩢᩃ᩠᩺ᨼgolf
/kɔp/
ᨠᩕᩣ᩠ᨼ graph
/kaːp/ (?)
Section 2.
ᨴᩬᨼ᩠ᨼᩦ᩵toffee Section 2 (2 syllables)
ᨠᨽᩚ pregnant
/kap pʰaʔ/
Section 4 (2 syllables - the first is a single character)
ᩈᨱᩛᩣ᩠ᨶ shape
/san tʰaːn/
Section 4 (2 syllables - the first is a single character)
ᩁᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
Section 4 (3 syllables)
ᩁᩢᨭᩛᨷᩣ᩠ᩃ government
/rat tʰa baːn/
Section 4 (3 syllables)
ᩈᨻᩛ omniscience
/sap paʔ/
Section 4 (2 syllables - the first is a single character)
ᩋᨾᩛ mango
/ʔam paʔ/
Section 4 (2 syllables - the first is a single character)
ᩁᩣᨩᨽᩢ᩠ᨮ Rajabhat
/la:t tɕa pʰat/
Section 4 (3 syllables)
ᨷᩢᨱ᩠ᨻᨷᩩᩁᩩᩈ disciple
"banop burus"
Section 4 (5 syllables)

Mai Kang Lai

The mai kang lai character can be a challenge to a font. The character has a wide range of behaviours, from that of a spacing final character (as in modern Tai Khün fonts) to that of a repha-like character, the old-fashioned behaviour seen in Tai Khün, Thailand and Laos. The MFL dictionary shows an intermediate behaviour, where marks above the following base consonant cause it to be positioned within the previous syllable. This is the style employed by the Da Lekh font.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩘ᩠ᩃᩣ᩠ᨿall
/taŋ laːi/
The ascending tail of SAKOT LA prevents the MAI KANG LAI moving on to a subsequent syllable/word. This prevents fonts exploiting the rphf feature of the Universal Shaping Engine.
ᩈᩘᨥᩮᩣ Nominative of Pali saṅgha
<saṅgho>
(2 syllables)
ᩁᩘᩈᩦray
/raŋ siː/
(2 syllables)

Ligature NAA

This is mostly a test for readers!

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨶᩣᩴto lead
/nam/
ᨾᨶᩮᩣ heart, mind
/maʔ no:/
(2 syllables)
ᨶᩮᩢᩣ to sew a long stitch
/nau/
Some fonts may fail here because they handle the ligature in pstf; this worked with HarfBuzz until pstf was moved to before Indic rearrangement.
ᨶᩣ᩠ᨿ leader
/na:i/
ᨶ᩵ᩣ᩠ᨶ Nan
/na:n/
ᩍᨶ᩠ᨴᩣ Indra
/ʔin ta:/
The more usual form lacks U+1A63. (2 syllables - first has one character.)
ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ danger
/ʔon tʰaʔ la:i/
(2 syllables)
ᨶ᩶ᩣᩴwater
/nam/
This can be surprisingly hard to achieve in a font. Logic designed to stop Arabic vowel marks wrongly interacting has to be circumvented so that the two marks will interact!
ᨶ᩠ᩅᩣ᩠ᨷ to falsely accuse
/nwaːp/
MFL p352
ᨴᩤᩴᨶ᩠ᩅ‌ᩣ᩠ᨿ to foretell
/tam nwaːi/
NTDPLM p285. Sometimes the writer wants to avoid the ligature! (2 syllables)
ᨲ᩵ᩣᩴᨶ᩠ᩅᩣ᩠ᨿ to foretell
/tam nwaːi/
MFL p320, but only in transliteration. Shape of second syllable (ligature plus subscript consonant) is attested elsewhere. (2 syllables)
ᨶ‌ᩣ rice field
/naː/
An isolated test of the ZWNJ feature above. This form is to be expected in texts teaching the writing system.
ᩉ᩠ᨶ᩶ᩣface
/naː/
Note that the SAKOT prevents ligature formation.
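The ZWNJ device used in the rows above can be applied mechanically, as in the sketch below. The helper name suppressNaaLigature is hypothetical, and the sketch handles only the simplest case, where SIGN AA (U+1A63) immediately follows NA (U+1A36); sequences with intervening marks are left untouched.

```javascript
const NA = "\u1A36";   // TAI THAM LETTER NA
const AA = "\u1A63";   // TAI THAM VOWEL SIGN AA
const ZWNJ = "\u200C"; // ZERO WIDTH NON-JOINER

// Hypothetical helper: ask for the unligated form by placing ZWNJ
// between NA and a directly following SIGN AA.
function suppressNaaLigature(text) {
  return text.split(NA + AA).join(NA + ZWNJ + AA);
}

console.log(suppressNaaLigature(NA + AA).length);
```

Whether the ligature is then actually avoided depends on the font and the rendering engine, which is what the ZWNJ rows are there to test.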

English Loanwords

These examples are taken from the 'big blue book' pp151-6. Some of these renderings are unusual compared with the native tradition, and are included for that reason. The position of RA HAAM is particularly noteworthy.

The pronunciations given are guesswork where Siamese practice and Lanna script orthography conflict.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨠᩯᩢ᩠ᩈgas
/kɛs/
ᨴᩕᩯ᩠ᨠᨴᩮᩬᩥᩁ᩺ tractor
/tʰɛːk tʰɤː/
Slightly complicated set of consonants in first syllable. (2 syllables)
ᨶᩰᩫ᩠᩶ᨲnote
/noːt/
Vowel combination not listed above
ᨷᩕᩰᨴᩦ᩠ᨶ protein
/pʰoː tiːn/
Tests reordering - the vowel symbol should appear first. (2 syllables)
ᨼᩥᩅ᩠ᩈ᩺fuse
/fiu/
ᩈᨲᩯᨾ᩠ᨷ᩺ postage stamp
/sa tɛːm/
(3 syllables)
ᩈᩮᩥᩁ᩠᩺ᨷ to serve
/sɤːp/
Compare the placement of RA HAAM with the previous word. The same contrast may be seen on p155 of the 'big blue book'. (2 syllables)

Tai Lü Blends

These examples are all taken from Graphic Blends at SEAsite. The pronunciations given are Tai Lü.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᨴᩢ᩵ᩗᩣall
/taŋ laːi/

This word, in some of its various forms, seems to be the only word containing U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI.

I withdraw my previous, surprised, reading of the word shown as containing NGA as the base consonant.

ᨡᨶ᩠ᨵᩣ spell (magic)
/kʰan tʰaː/
(2 syllables, first a single character)
ᨣ᩠᩶ᨯᩦ  okay
/kɔː diː/
A non-breaking space has been appended to avoid truncation. A sophisticated font would slide the vowel under the tone mark.
ᨷ᩠᩶ᨾᩣ to not come
/bau maː/
ᨷ᩠᩶ᨾᩣ to not come
/bau ma:/
Same again, but normalised.
ᨷ᩠᩶ᨯᩣ᩠ᨿ to not have
/bau da:i/
ᨧᩢ᩠ᩅᩤ How big an area?
/tsak va:/
ᩈᩮ᩠ᩓ᩠ᩅ deceased
/se: lɛu/
ᨴᩯ᩠ᨶᩳ Really, is that true?
/tɛː nɔː/
ᩓ᩠ᨾᩣ to look this way
/lɛ maː/
ᨠᩮ᩠ᩈᩣ hair
/keː saː/
ᨻᩱ᩠ᨾᩣ to come and go
/pai maː/
ᩈᩮ᩠ᩅ᩶ᩤif
/seː vaː/
ᩅᩮ᩠ᩃᩣ time
/veː laː/
Also in Apiradee p49
ᨵᩤ᩠ᨲᩩ physical body
/tʰaː tuʔ/
The vowel on the final consonant is inescapable - there is no way of rewriting the orthographic syllable to escape the combination.
ᨩ᩠ᩓ in conclusion
/tsălɛː/
ᨻᩭ᩠ᩅ᩻ᩣ because
/pɔi vaː/
The MAI SAM tags the WA as starting a chained syllable. The spelling presumes that a font can decide that the subscript WA goes to the left of the MAI KOY.
ᩈᩫ᩠ᨦᩣ᩠ᨶ world
/suŋ saːn/

More Tai Lü

These words are taken from the MA thesis 'Development of Tai Lue Scripts and Orthography' by Apiradee Techasiriwan (อภิรดี เตชะศิริวรรณ). The pronunciations given are Tai Lü. Comparative material from elsewhere is highlighted in yellow.

Text | Meaning | Encoding | Hacked via ASCII | Remarks
ᨻᩬᩳ᩵father
/pɔː/
p3. Vowel combination not listed above. Spelling is archaic.
ᩈᨷ᩷ᩣ᩠ᨿ content, well
/săbaːi/
p3. Rare example of a word with this tone mark. (2 syllables, first is a single character.)
ᩅ᩠ᨿᩙcity
/weŋ/
p4.
ᨣᩪ᩺ person
/kun/
p4. Unetymological, phonetic spelling. The mark above is serving as a final consonant, not a cancellation mark.
᪁᪂ ᨻᩢ᩠ᨶ᩻ᩣ Sipsongpanna
/sip sɔːŋ pan naː/
p10. (Number precedes syllable). Example of mai sam marking a double-acting consonant.
ᨻᩱ᩻ᩣ᩠ᨿ to go to the location
/pai paːi/
p47.
ᨩ᩠ᨿᩙᨲᩩᩴ Kengtung
/tseŋ tuŋ/

p53. (2 syllables)

Possibly the Chengtung on the Vietnamese border.

ᩅᨲᩛᩩ matter
/wat tʰu/
p49. U+1A5B represents subscript HIGH THA rather than high RATHA. This is an issue for a font's repertoire of conjuncts.
ᩅᨲ᩠ᨳᩩ matter
/wat tʰu/
The Northern Thai writing of the above. Perhaps this should be rendered as the above when the language is Tai Lü or Lao.
ᨯ᩠ᨿᩴone
/deu/
p53. Assuming the word has TAI THAM SIGN MAI KANG rather than unencoded *TAI THAM CONSONANT SIGN FINAL WA.
ᩉ᩠ᨶᩦᩢ᩶debt
/niː/
p57.
ᩁᩮᩂ᩠ᨠ auspicious occasion
/hɤːk/
p79.
ᩁ᩠ᨿ᩺to learn
/heːn/
p118.

And Not

The word typically meaning 'and...not' or 'and...then' may be written with a chained syllable, and this may present challenges to renderers. The form of the letter representing /b/ in a chained syllable presented an encoding challenge. N3207R proposed using the sequence <SAKOT, BA> for it, and using <SAKOT, HIGH PA> for the subscript form corresponding to both BA (common) and HIGH PA (extremely rare) in its rôle as a final (Thai sakot) consonant. During the ISO process, a new character was introduced instead for the special form, SIGN BA, and it is widely assumed that <SAKOT, BA> represents the usual subscript form corresponding to BA, both as a sakot consonant and in the Pali /mp/ and /pp/ intervocalic clusters.

When syllables are chained, shared vowel symbols are not repeated. This leads to ambiguity as to which symbol is dropped.

All the spellings in the table below represent the same careful pronunciation in Northern Thai, namely /kɔː bɔː/. The Tai Lü forms are written with different marks and pronounced with different vowels, but use the same two consonant forms in the stack.

Text | Meaning | Encoding | Hacked via ASCII | Remarks
ᨣᩴᨷᩴ᩵and...not, then...not Full form - 2 syllables, and arguably 2 words.
ᨣᩴᨷᩴdo. Univerbated form in MFL (2 syllables)
ᨣᩝᩴ᩵do. First mai kang dropped.
ᨣᩴᩝ᩵do. Second mai kang dropped.
ᨣᩝᩴdo. First mai kang dropped.
ᨣᩴᩝdo. Second mai kang dropped.

Potential Surprises

These words behave slightly oddly.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
ᩓᩯ very much
/lɛː/
Redundant vowel mark
ᩐᩣto take
/ʔau/
Vowel on independent vowel
ᩐ᩵ᩣvery hot
/ʔau/
Vowel and tone mark on independent vowel
ᨯᩪᩕᩣ listen to me
/duː haː/
Medial consonant between explicit vowels
ᨯᩮᩬᩥᩁᨹᩫᩖᨣᩩᨱ᩺ March
/dɯan pʰon laʔ kun/
NTDPLM p259. Double-acting medial consonant with implicit vowel after it. (3 syllables)
ᨻᩣᨷᩰᩖ Pabol (sic)
/paː boːn/
A mistake for Spanish Pablo seen on Wikipedia, but in light of the above a renderer should render it as intended.
ᨶ᩶ᩭ little
/nɔːi/
Tai Khün spelling.
ᩉᩖ᩠ᩅᨦ big
/luaŋ/
Medial consonant in middle of stack. The proposal classified the final consonant of the stack as a 'medial vowel'. (2 syllables, second a single character)
ᩉᩖ᩠ᩅᩣiron
/lwaː/
Medial consonant in middle of stack. In this case, the WA is very much a consonant.
ᨻᩕ᩠ᨿᩮᩡ a type of sound
/pʰiaʔ/
Preposed medial consonant in middle of stack along with a preposed vowel.
ᨠᩩ᩶ᩣ᩠ᨶ᩠ᨦ to prosper
/kaːn kuŋ/
The first word in the MFL! Note that there are two final consonants. The SIGN AA prevents a phonetic spelling.
ᩋᩢ᩠ᨭᩛ a satang coin
/ʔat/
Two consonants in final consonant position (3 consonants in total)
ᩆᩢᨠ᩠ᨯᩥ᩺ rank
/sak/
Consonant-killer also killing explicit vowel above (2 syllables)
ᩆᩢᨠ᩠ᨯᩥ᩼ rank
/sak/
Same again, but with KARAN instead of RA HAAM. Some people are using KARAN in Northern Thai instead of RA HAAM! (2 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ giant fennel
/ma haː hiŋ/
Consonant-killer also killing explicit vowel below (4 syllables)
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩼ giant fennel
/ma haː hiŋ/
Same again, but with KARAN. (4 syllables)
ᩆᩣᩈ᩠ᨲᩕ᩺ science
/saːt/
Consonant-killer also killing medial consonant. NT spelling. (2 syllables)
ᩈᩣᩈ᩠ᨲᩕ᩼ science
/saːt/
Consonant-killer also killing medial consonant. Tai Khün spelling. (2 syllables)
ᩁᩪ᩠ᨷimage
/huːp/
This spelling is archaic in Northern Thailand (but current in Tai Khün)
ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ relatives
/piː nɔːŋ/
(2 syllables - second is a single character)
ᨸᩢᩣmouth
/paːk/
MAI SAT can serve as a final consonant, /k/. This leads to yet more formal vowel combinations.
ᩃᩪᩢ child (progeny)
/luːk/
ᨯᩬᩢflower
/dɔːk/
ᨯᩬᩢᩡflower
/dɔːk/
MAI SAT can even be reinforced by SIGN A.
ᨻ᩠ᩅᩢᩡgroup
/puak/
ᨲᩯ᩠ᨶᩬᩴ᩵ wasp, hornet
/tɛːn tɔː/
A single orthographic syllable.
ᨲᩬᩴ᩵͏ᩯ᩠ᨶ wasp, hornet
/tɔː tɛːn/
Should normally be visually identical with the above - the font may be too crude. However, when font colouring is supported, the vowel below should be coloured differently in the Da Lekh Si font; that font is intended to reveal the order of characters.
ᨲᩬᩴ᩵ᩯ᩠ᨶ wasp, hornet
/tɔː tɛːn/
Would it be legitimate for this to render differently to the above?
ᩈ᩠ᨶᩫ᩻street
/sănon/
The mai sam represents the final consonant in addition to the epenthetic vowel.
ᨠᨾᩛᩦ scripture
/kam piː/

The surprise is that U+1A5B had InSC=Consonant_Final until Unicode 10.0.

(2 syllables - the first is a single consonant in the first example.)

ᨶᩥᨻᩛᩣ᩠ᨶ nirvana
/nip paːn/
ᨵᨾᩜᩥᨠ saintly
/tʰam miʔ kaʔ/
Chiengtung p166. It has 3 syllables - the second is of interest. It may show a problem with U+1A5C having InSC=Consonant_Final until Unicode 10.0.
ᩈᨵᩩ᩠ᨷ stupa(?)
/sătʰup/
Chiengtung p166. (2 syllables - the first is a single letter.) This shows the issue with placement of the vowel and 'sakot' consonant also applies to this explicit vowel.
ᩋᩣᨴᩥᨲ᩠ᨲᨵᨾᩜᩮᩣ Adittadhammo
Pali <Ādittadhammo>
Chiengtung p264. (5 syllables)
ᨬᩣᨱᨵᨾᩜᩮᩣ Nyanadhammo
Pali <Ñāṇadhammo>
Chiengtung p238. The individual referred to is not the one hyperlinked to. (4 syllables)
ᩅᩥᩈᩮ᩠ᩈ special
/wiʔ seːt/
Note the lack of a ligature. (2 syllables)
ᨢ᩶ᩣ slave
/kʰaː/
Same character order as in Thai and Lao!
ᩈᩣᩈᨶᩣ religion
/saː saʔ naː/

Full (5 chars) and contracted (7 chars) forms.

(3 and 2 syllables respectively)

ᩈᩣᩈ᩠ᨶ᩻ᩣ religion
/saː saʔ naː/
ᩈ᩠ᨶ᩻ᩮᩢ᩶ᩣ javelin
/sănau/
ᨲᩦ͏ᩣ᩠ᨿ to beat to death
/tiː taːi/
Uses CGJ as an invisible MAI SAM to stand for the duplicated consonant.
ᩋᩮᩰᩣᨽᩣᩈ to illuminate
/ʔoː pʰaː saʔ/
MFL p919. While the spelling rules call for either just U+1A70 SIGN OO or just the combination of <U+1A6E SIGN E, U+1A63 SIGN AA>, this might conceivably be a private lexicographer's notation indicating that both occur that happened to escape into the published work. The graphical order, left-to-right, in the MFL is SIGN OO, SIGN E, LETTER A, SIGN AA. The 'hacked via ASCII' rendering is wrong. (3 syllables - first is of interest.)
ᩉ᩠ᨾ᩵ᩣᩴ᩻ Grub's up!
/mam mam/
ᩃᩮᩞ trickery
/leːs/
Tai Khün spelling, cited in N3384
ᩋᨶᩣᨳᨷᩥᨱ᩠ᨯᩥᨠᩈᩞ Anathapindika's

Pali <Anāthapiṇḍikassa>
A rare spelling of the Pali masculine genitive singular ending. Note that SIGN SA starts the final phonetic syllable. (7 syllables - the last one is of interest.)

Test and Tell

These examples are intended to reveal the behaviour of the rendering system, rather than be clear pass or fail tests.

Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks
Interpretation
ᨠ᩠ᨷ (no meaning) Interpretation of <SAKOT, BA> and <SAKOT, HIGH PA> respectively. This looks at font behaviour rather than at layout engine behaviour.
ᨠ᩠ᨸ (no meaning)
Line-Breaking within the Orthographic Syllable
Manual line breaking may break lines between the dependent vowel AA (U+1A63 and possibly U+1A64) and its base consonant. Figure 9b of N3207R provides an example. What appears to be a misspelt form of sammodamānehi (as ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ), the instrumental or ablative plural of the present participle of sammodati, is split between the second and third lines of the second leaf by splitting just before U+1A63. How will this be handled? The answer may be application dependent.
ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>

Split using a soft hyphen. (Many syllables.)

The text occurs with and without dingbats (U+1AA5) so that one can see whether an inactive soft hyphen affects it.

᪥᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ­ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
Split using zero width space - this uses the presentation-oriented view that ZWSP is simply a soft hyphen without visible rendering. This test is uninformative if the renderer refuses to make the break. See above for dingbats. (Many syllables)
᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾ​ᩣᨶᩮᩉᩥ with (things) on friendly terms
Pali <samoddamānehi>
Baseless Marks and Non-alphabetic Bases
(no meaning) Bare vowel symbol
 ᩣ(no meaning) Vowel symbol 'on' NBSP
 ‍ᩣ (no meaning) Vowel symbol 'on' NBSP with ZWJ.
ᨷ ◌ᩮ N/A And now discourage the use of multiple script runs by the renderer.
Dependent Consonant Above and Tone Mark - What Chooses the Order?
ᨾ᩠ᩅ᩺᩵to be fun
Khün /mon/
Typed as seen. Da Lekh fonts place the glyphs side by side, but the order is as in the Tai Khün manuscript. To be precise, it is an extract from a 1949 edition of the Khemarat Weekly, reproduced in L2/17-120 Figure 4.
ᨾ᩠ᩅ᩵᩺to be fun
Khün /mon/
Typed with tone mark first. Da Lekh accepts the order, just as Thai does not rearrange THANTHAKHAT (or vowels above) with tone marks. The Da Lekh rendering does not match the Tai Khün manuscript
ᨾ᩠ᩅ᩺᩵᩻ to be lots of fun
Khün /mon mon/
Not actually attested, but grammatical derivatives of the above.
ᨾ᩠ᩅ᩵᩺᩻ to be lots of fun
Khün /mon mon/
ᨣᩪ᩺᩻ everyone
Tai Lü /kun kun/
Theoretical derivative of the unetymological, phonetic spelling of the word for person. The first mark above is serving as a final consonant, not a cancellation mark.
Coda Consonants v. Onset Consonants

U+1A5B TAI THAM CONSONANT SIGN HIGH RATHA OR LOW PA and U+1A54 TAI THAM LETTER GREAT SA may have been created so that conjuncts would be different from accidental combinations of initial and final consonants. Are these differences maintained? This primarily probes the font properties, though rendering engines may have an effect.

There are very few words that are affected. A word for 'special' is one of the few.

ᨭᩮ᩠ᨮ ᨭᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
Should be different. (2 syllables)
ᨱᩮ᩠ᨮ ᨱᩛᩮ (no meaning)
/ne:t/ /-n tʰe:/
Should be different. (2 syllables)
ᨲᩮ᩠ᨮ ᨲᩛᩮ (no meaning)
/te:t/ /-t tʰe:/
Should probably be different. (2 syllables)
ᨻᩮ᩠ᨻ ᨻᩛᩮ (no meaning)
/pe:p/ /-p pe:/
Should be different. (2 syllables)
ᨾᩮ᩠ᨻ ᨾᩛᩮ (no meaning)
/me:p/ /-m pe:/
Should be different. (2 syllables)
ᨠᩮ᩠ᩁ ᨻᩕᩮ (no meaning)
/keːn/ /kʰe/
Should be different. (2 syllables)
ᨠᩮ᩠ᩃ ᨠᩖᩮ (no meaning)
/keːn/ /keː/
Should be different. (2 syllables)
ᨠᩖᩮ ᨠ᩠ᩃᩮ (no meaning)
/keː/ /keː/

However, those who don't use MEDIAL LA won't make a visual distinction to show the position of the vowel!

(2 syllables)

ᩈ᩠ᩈ ᩈᩞ ᩔ(no meaning) Should be different. (3 syllables)
ᨾ᩠ᨾ ᨾᩜ(no meaning) Should be different. (2 syllables)
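One property that can be checked without any font is whether the dedicated characters are canonically equivalent to anything else; if they were, normalisation could erase the distinction before the text reached the font. The sketch below uses Python's standard `unicodedata` module. The helper names are assumptions made for this example.

```python
import unicodedata


def code_points(text):
    """Describe each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def has_canonical_decomposition(ch):
    """True if the character has a canonical (not compatibility) decomposition."""
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")


# The two characters named at the start of this section.
for ch in "\u1A5B\u1A54":
    print(code_points(ch)[0],
          "| canonical decomposition:", has_canonical_decomposition(ch))
```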
Behaviour of <SAKOT, NYA>
ᨬᩮ᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
Would ideally be different, but this may not be readily and robustly achievable. (2 syllables)
ᨬᩮ‌᩠ᨬ ᨬ᩠ᨬᩮ (no meaning)
/ɲeːn/ /-n ɲeː/
In this case they should be different. (2 syllables)
ᨱ᩠ᨬ ᨬ᩠ᨬ(no meaning) Should these be different? (2 syllables)
ᨱᩮ᩠ᨬ ᨱ᩠ᨬᩮ (no meaning)
/neːn/ /-n ɲeː/
Should these be different? (2 syllables)
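The pairs in this section contrast two encodings built from the same characters. A minimal Python sketch of that construction is given below. The function name is an assumption, and the code points U+1A60, U+1A2C and U+1A6E are taken here to be SAKOT, NYA and VOWEL SIGN E.

```python
SAKOT = "\u1A60"  # assumed here to be TAI THAM SIGN SAKOT


def coda_and_onset(base, vowel, second):
    """Return the two encodings contrasted in these tests.

    coda:  base, vowel, SAKOT, second   (the subscript closes the syllable)
    onset: base, SAKOT, second, vowel   (the subscript is the onset of the
                                         syllable that carries the vowel)
    """
    coda = base + vowel + SAKOT + second
    onset = base + SAKOT + second + vowel
    return coda, onset


# U+1A2C and U+1A6E are taken to be NYA and VOWEL SIGN E.
coda, onset = coda_and_onset("\u1A2C", "\u1A6E", "\u1A2C")
print([f"U+{ord(ch):04X}" for ch in coda])
print([f"U+{ord(ch):04X}" for ch in onset])
print("same characters, different order:",
      sorted(coda) == sorted(onset) and coda != onset)
```

Because the two strings contain the same characters, any visual difference has to come from the font reacting to the position of the vowel relative to the SAKOT sequence.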
Marks from outside the Tai Tham Block
ᩋᩦ๊ (meaningless syllable in refrain of a song)
/ʔiː/
Thai mai tri and mai chattawa are found on tua mueang 'words' on p236 of the big blue book! Of course, these might just be the unencoded THAI-LAO TONES THREE and FOUR. In this particular case, a rendering issue might be alleviated by making the default positions of the tone marks higher than that of the vowels above.
ᩋᩦ๋ (meaningless syllable in refrain of a song)
/ʔiː/
Language-Sensitive Forms (Browser Test?)
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/

The top two rows are declared to be in Lao, and the second also has a corresponding style-setting lest the language setting be ignored.

The initial consonant takes the form in the Da Lekh family of the consonant form used in that role in Laos and Northeast Thailand, namely , which is only subtly different from U+1A41 TAI THAM LETTER RA. So doing may be improper behaviour, but is seen in fonts.

The mai kang should appear on the vowel U+1A63 TAI THAM VOWEL SIGN AA, its usual position outside Thailand. Of course, this won't happen if the font cannot be appropriate for such writing systems. At least one browser has failed to render the final stack properly when it has been the final glyph in the glyph stream; this is why the word is written twice.

The bottom row is not marked for language, and shows the same word (and encoding). The Da Lekh font follows the more technically challenging Chiangmai style by default, with the MAI KANG on the consonant.

(2 words, so 2 syllables!)

ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
ᩌᩣᩴ ᩌᩣᩴ bran (written twice)
/ham/
Tone before Vowel!
ᨣ᩠ᩅ᩵ᩢᩣ᩠ᨶ when
kan waː
p118 - MFL clearly has the tone as the first mark! It may be that these are just typing errors. There are two other examples of tone and then vowel in the dictionary, the same tone and vowel as here.
ᨣ᩠ᩅ᩵ᩢᩣ and say
kɔʔ waː

Rendering Challenges from MFL

These words presented problems, now overcome (Version 0.05), while the Da Lekh font was being developed to cope with the Universal Shaping Engine of mid 2016. (The solution is not entirely compliant with the Unicode standard - dotted circles in the input are sometimes deleted.) They are offered as an aid to font developers fighting unhelpful layout engines; they are not expected to help developers of the core layout engines.

Text | Meaning and Pronunciation | Encoding Hacked via ASCII | Remarks
ᨠ᩠ᩃ᩻ᩬ᩵ᨾ Cambodian
/kălɔːm/
p4 (2 syllables, second a single consonant)
ᨠᩕᩥ᩠᩵ᨦ suspicious
/kʰiŋ/
p15
ᨡᩮᩢ᩶ᩬᩣ᩠ᨦ belongings
/kʰau kʰɔːŋ/

p101 - the one syllable form.

The first form minimises the disruption to the pattern of first element followed by second element. The second spelling tries sticking in CGJ to advise that the ordering of the marks is not an error. The third spelling follows the principle that if the components cannot be concatenated (with deletion and addition of SAKOT or equivalent as appropriate), then the ordering should be based on the visual layout of the marks.

ᨡᩮᩢ᩶͏ᩬᩣ᩠ᨦ belongings
/kʰau kʰɔːŋ/
ᨡᩮᩬᩢ᩶ᩣ᩠ᨦ belongings
/kʰau kʰɔːŋ/
ᨦ᩠ᩅ᩶ᩣ᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ spastic
/ŋwaː ŋwaː soːŋ soːŋ/
p168 (2 syllables)
ᨴᩯ᩠᩶ᩃ truth to tell
/tɛː lɛː/
p318. The first entry has the written vowel with the first consonant, the second with the second, and the third entry is the same as the second but normalised.
ᨴ᩠᩶ᩃᩯ truth to tell
/tɛː lɛː/
ᨴ᩠᩶ᩃᩯ truth to tell
/tɛː lɛː/
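The relation between an entry and its normalised counterpart can be reproduced with any Unicode normaliser. The Python sketch below assumes that the two spellings differ in the relative order of SAKOT and the tone mark, and that the characters involved are U+1A34, U+1A76, U+1A60, U+1A43 and U+1A6F; both points are assumptions, as is the function name.

```python
import unicodedata


def normalise_report(text, form="NFC"):
    """Return (normalised text, whether normalisation changed anything)."""
    normalised = unicodedata.normalize(form, text)
    return normalised, normalised != text


# Assumed un-normalised typing order: consonant, tone mark, SAKOT, consonant, vowel.
typed = "\u1A34\u1A76\u1A60\u1A43\u1A6F"
normalised, changed = normalise_report(typed)
print([f"U+{ord(ch):04X}" for ch in normalised], changed)

# Normalising a second time should change nothing.
print(normalise_report(normalised)[1])
```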
ᨳᩮᩬᩥᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ bruised
/tʰɤʔ tʰɤʔ tʰɤːk tʰɤːk/
p314. (2 syllables)
ᨾᩉᩫᩖᨿᩰᨴᩤ great army
/maʔ hon yoː tʰaː/
NTDPLM p511.

Notes

References

Short Name | Full Reference
N3207R Everson M., Hosken M. & Constable P. Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R
MFL Rungrueangsi, Udom (2004) [1991]. Lanna-Thai Dictionary, Princess Mother Version พจนานุกรมล้านนา ~ ไทย ฉบับแม่ฟ้าหลวง ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣ ~ ᨴᩱ᩠ᨿ ᨨᨷᩢ᩠ᨷᨾᩯ᩵ᨼ᩶ᩣᩉᩖ᩠ᩅᨦ [Photchananukrom Lanna ~ Thai, Chabap Maefa Luang] (in Thai) (Revision 1 ed.). Chiang Mai: Rongphim Ming Mueang (โรงพิมพ์มิ่งเมือง). ISBN 974-8359-03-4.
big blue book Wacharasat, Bunkhit (2003). Language of Mueang Lanna ᨽᩣᩈᩣᨾᩮᩬᩨᨦᩃ᩶ᩣ᩠ᨶᨶᩣ ภาษาเมืองล้านนา [Phasa Mueang Lanna] (in Thai). ISBN 974-85472-0-5
Apiradee Techasiriwan, Apiradee อภิรดี เตชะศิริวรรณ. พัฒนาการของอักษรและอัขรวิธีในเอกสานไทลื้ [Patthanakan khong Akson lae Akhara Witi nai Ekasan Thai Lue] Development of Tai Lue Scripts and Orthography. MA Thesis, Chiangmai University (in Thai)
NTDPLM Arunrat Wichiankhiao et al. อรุณรัตน์ วิเชียรเขียว (1996). ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣᨨᨻᩕᩰᩬᩡᨣᩤᩴᨴᩦ᩵ᨷᩕᩤᨠᩫ᩠ᨭᨶᩱᨷᩱᩃᩣ᩠ᨶ พจนานุกรมศัพท์ล้านนาเฉพาะคำที่ปรากฏในใบลาน The Northern Thai Dictionary of Palm-Leaf Manuscripts. ISBN 974-7067-77-2
Chiengtung Chieng Tung: Its Way of Life ᨡᩮᨾᩁᨭᩛᨶᨣᩬᩁᨩ᩠ᨿᨦᨲᩩᨦ [Khemarattha Nakon Cheng Tung] เขมรัฐนครดชียงตุง [Khemarat Nakhon Chiang Tung] (in Thai, Tai Khün, French and English) Chiang Mai: Wat Tha Kradas (วัดท่ากระดาษ)
L2/17-120 Wordingham J.R. Corrections to the Indic Syllabic Category for the Tai Tham Script, L2/17-120
N3384 Hosken M. Tai Tham Subjoined Variants, ISO/IEC JTC1/SC2/WG2/N3384, L2/08-073

Document History

This is Version 2.8 of the web page, which has been written by Richard Wordingham.

History

Version Date Changes
1.0 14 June 2015 Initial 'stable' (i.e. abandoned) version. Work had started on 27 February 2015, and there may be earlier versions around.
1.1 25 September 2016 Converted from XML to HTML (by stripping off XML header) for new website.
2.0 25 October 2016

Added option to dynamically switch fonts - free font Da Lekh Seri for exposure to rendering engine foibles, and encumbered font Da Lekh for resistance. Both fonts are open source, but I created all the inked glyphs for the Da Lekh Seri font. ('Seri' means beholden to no-one.)

Completed references, and improved, pruned and extended the examples.

2.1 26 October 2016

Corrected typos. Started testing of display bases.

2.2 7 November 2016

Fixed transliterator bug. Added examples from testing of Da Lekh font work-arounds. Corrected more typos. Tested language sensitivity.

2.3 14 November 2016

Added styles to force Lao forms. Reorganised 'test and tell'. Added one new test word, for mai kam followed by mai sam.

2.4 14 April 2017

Improved 'bran' bug alert.

Added 'A Tai Tham KH' font with and without ccmp enabled. The radio buttons are hidden, and anyone enabling them would also have to supply the font.

Added test for double acting MEDIAL LA.

2.5 8 July 2017

Added test for tone plus SIGN OY.

Added colour fonts to show phonetic position of subscripts relative to vowel.

Added "onclick" for radio buttons.

2.6 22 February 2018

Added test cases for karan on vowels and medial la following preposed vowel.

2.7 12 May 2018

Added three new fonts - 'A Tai Tham KH', 'Hariphunchai' and my extension of the latter, 'Lamphun'.

Added a few more examples of the ᨶᩣ ligature.

Added query as to when ᨲᩬᩴ᩵͏ᩯ᩠ᨶ should render properly.

Colour for spell-checking is now a reality.

2.8 17 February 2019

Added test cases for ᩃᩮᩞ and ᨻᩕ᩠ᨿᩮᩡ.

Testing

This web page has been developed with frequent testing on Firefox Version 54 and occasional viewing using Safari on iPhone (iOS 10.3.2), IE 11 (on Windows 7) and Microsoft Edge (on Windows 10).

Switching fonts has been tested in all these browsers.

Font Availability

Da Lekh Font Family

You may freely use my four fonts mentioned here without modification and may freely examine them. See the respective licences for the conditions on use and modification. I do not own all the intellectual property rights for the Da Lekh and Da Lekh Si fonts. The fonts are available as follows:

Name | Font file | Source file | Licence file
Da Lekh
(ᨯᩣᩃᩮ᩠ᨡ)
dalekh.ttf File dalekh.txt in dalekh.zip. This is also the ultimate source code for the Da Lekh Seri font. See Makefile therein for preprocessing directives. DejaVu licence
Da Lekh Si
(ᨯᩣᩃᩮ᩠ᨡᩈᩦ)
dalekh_si.ttf
Da Lekh Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩮᩁᩥ)
dalekh_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -x c dalekh.txt | grep -v ^# >| dalekh_seri.txt
seri_license.htm
Da Lekh Si Seri
(ᨯᩣᩃᩮ᩠ᨡᩈᩦᩈᩮᩁᩥ)
dalekh_si_seri.ttf Either start from the source code, which is subject to the DejaVu licence, for the Da Lekh font, or use the preprocessed file dalekh_si_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code:
cc -E -fdirectives-only -DSERI -DCOLOUR -x c dalekh.txt | grep -v ^# >| dalekh_si_seri.txt

If you wish to have WOFF files, you should either generate them yourself from the font file listed above, or simply copy them from this website.

The fonts are generated from the source code by means of an unpublished, DIY font compiler that still has many rough edges. However, the source code of the font, although spartanly commented, may make it clearer what the font is attempting to do. I have endeavoured to make reverse engineering unnecessary.

The font Da Lekh is partly intended for my practical use in analysing material in the Tai Tham script. It therefore contains a large set of Latin characters to support transcription and transliteration. It also contains work-arounds so that it may render properly despite problems with rendering engines.

The other purpose of the fonts is to explore issues in making an OpenType font for the Tai Tham script.

The font Da Lekh Seri is an unencumbered font intended for testing rendering engines. It therefore has, besides the glyphs for Tai Tham writing systems, just a bespoke set of (poor) ASCII glyphs and both the extra characters required by Microsoft Office and the characters recommended for the Universal Shaping Engine. Known existing work-arounds have been removed. This removal is implemented by compiler directives.

The font Da Lekh Si (ᨯᩣᩃᩮ᩠ᨡᩈᩦ) differs from Da Lekh in that it aims to reveal the spelling of words. This is useful when using a spell-checker, for example on Firefox. The ideal is that subscript consonants in the coda of an orthographic syllable would be distinguished from those in the onset by colour, whence the word 'Si' in the name of the font. Note that of the major browsers, the colour technology used currently works only in Firefox and Microsoft Edge. The colouring is also meant to apply to chained syllables, but this has not yet been implemented.

It is possible that Da Lekh Si may be reduced to an optional OpenType feature applied to the Da Lekh font.

The font Da Lekh Si Seri is an unencumbered font without work-arounds for problems with renderers. It is intended as an aid for the development of the Da Lekh Si font.

Lamphun Font

The Lamphun font is available under the SIL open font licence; the applicable customisation declares that "Hariphunchai" and "Lamphun" are reserved font names. The font file is lamphun.otf and what I have used as 'source' code to build the font is an untidy mess assembled in lamphun.zip:

Rôle | Name | Remarks
Glyphs | Hariphunchai.otf

A version of the font dated 5 May 2014, taken from SourceForge. It is stamped as Version 001.000, but that probably means nothing. There are later .sfd and .fea files at the same location, but at best they may offer improved glyphs compared to Lamphun.

This is the file that defines the Hariphunchai font as used on this web page.

OTL tables | lamphun.txt

This defines a font with the same glyph numbering, but with blank glyphs. I then replace 7 tables in the Hariphunchai font with tables from this new font:

Name | Reason
name
Rename font to comply with the licence, and record appropriate licensing and history information.
GSUB
Include lookups to undo rendering damage by the USE. Position medial ra using feature pref. Move other lookups from ccmp to blws, so that they are applied when missyllabification by USE is no longer an issue. Choose appropriate glyphs when there are level 2 subscripts with vocalic function. Handle mai kam, subscript consonants on NYA, and N.WAA and N.HAA.
GPOS
Add some mark-to-mark positioning. Use dist to restore advance widths of spacing subscripts.
GDEF
Correct a few glyph categorisations.
cmap
Add mappings for control characters so that rendering engine damage can be repaired.
OS/2
Allow more complex OTL operations. Declare greater line depth to enable more rapid rendering. Blanked vendor ID.
head
Change font revision and modification time.
Change log | fontlog.txt | Only for Lamphun.
Make file | lamphun_makefile | Provided for completeness - it is of little use without my font compiler, but is a complete record of the path from my sources to the font.
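For readers without the font compiler, the table-replacement step described above can be approximated with the third-party fontTools library. This is a sketch under assumptions, not the procedure actually used to build Lamphun: the function name and file names are hypothetical, and the copy is done byte for byte, which only makes sense because the two fonts share their glyph numbering.

```python
# The seven tables listed in the table above.
TABLES_TO_REPLACE = ("name", "GSUB", "GPOS", "GDEF", "cmap", "OS/2", "head")


def replace_tables(glyph_font_path, table_font_path, output_path,
                   tags=TABLES_TO_REPLACE):
    """Copy the listed tables, byte for byte, from one font into another.

    Assumes fontTools is installed and that both fonts number their
    glyphs identically.
    """
    # Imported lazily so that the rest of the module loads without fontTools.
    from fontTools.ttLib import TTFont
    from fontTools.ttLib.tables.DefaultTable import DefaultTable

    target = TTFont(glyph_font_path)
    donor = TTFont(table_font_path)
    for tag in tags:
        raw = DefaultTable(tag)              # holder for uninterpreted bytes
        raw.data = donor.getTableData(tag)   # raw bytes of the donor's table
        target[tag] = raw
    target.save(output_path)


# Hypothetical file names; the second font is the one compiled from lamphun.txt.
# replace_tables("Hariphunchai.otf", "lamphun_tables.otf", "lamphun.otf")
```

Copying raw bytes avoids any dependence on glyph names, which may differ between the two fonts even when the glyph numbering agrees.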

It is likely that I will create a variant coloured to indicate spelling.
