Introduction | Control Panel | The Tests | Notes | My Fonts |
This page provides a set of test cases for a Tai Tham renderer. It has been compiled with a view to putting the 'Universal Shaping Engine' through its paces for the Tai Tham script. This set of tests is incomplete in that it does not directly give the correct renderings, although someone in possession of the source documents could visually check them.
I was originally requested to provide words of one syllable for such a test. By syllable, I understand an Indic syllable of the form C+(M*V*M*C*M*)* with a single base consonant. (M = miscellaneous mark). I include cases where the rôle of the base consonant is played by something other than a letter. The post-vocalic consonants occur not only in other SEA Indic scripts such as the Khmer script and Lao script (as in the use of ຽ U+0EBD LAO SEMIVOWEL SIGN NYO in the Lao writing system), but also in Tibetan.
However, many of the interesting cases occur in the second syllable of a word, and certain initial syllables are obligatorily followed by more characters of the word. I have therefore also supplied longer words when a conceivable problem would not appear in a word of one syllable.
The dependent vowel AA (U+1A63 and U+1A64) may form the base of its own little stack of dependent marks. Manual line breaking may also separate it from its base consonant. I have nevertheless counted it as part of the same syllable as a base consonant; the two stacks frequently interact in Northern Thai, with MAI KANG migrating to or towards the base consonant and interacting with its dependents.
The page was originally set up to use either my own stick font, 'Da Lekh', which is based on DejaVu Sans, or the cut-down version, 'Da Lekh Seri'. The Da Lekh font is intended to be suitable for use in preparing (but perhaps not publishing) Tai Tham text. It therefore includes work-arounds for known rendering engine problems. The Da Lekh Seri font deliberately does not include such work-arounds. You may be interested in using or examining my fonts for your own purposes.
I have added two other families of fonts. These fonts are available under the SIL Open Font License. The font 'A Tai Tham KH' relies only on the ccmp feature being enabled; it handles all Indic rearrangement itself.
The Hariphunchai font is an OpenType Layout font that looked promising when used with the South-East Asian shaper of HarfBuzz. Development seems to have drastically slowed when HarfBuzz switched Tai Tham to its implementation of the Universal Shaping Engine (USE). The code for this font is available on SourceForge and there is further documentation elsewhere. I have added work-arounds and a few further touches to enable it to work under the USE; I have dubbed the resulting font 'Lamphun'. I have included two versions in the menus, the 2014 version used for Lamphun, dubbed 'early Hariphunchai', and the latest (2019) version, dubbed 'Hariphunchai4'.
Feel free to adapt this web page to add your own fonts and test cases.
The test text is given in the table columns headed 'Text', and is the content of the first table cell in table rows with class tst1. Two further columns, headed 'Encoding' and 'Hacked via ASCII', are automatically derived from this text as the page is loaded. The hacked column is intended to show users how the text should look, though it too may suffer from rendering engine limitations. The font used for this column is the member of the Da Lekh font family last selected to display the first column. Ideally, I would include images of the text from credible sources, but that may cause copyright problems, for the Unicode Consortium wishes to be able to use this document for commercial purposes.
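
The derivation of an 'Encoding' column from the test text can be approximated outside the browser. The following is a minimal sketch in Python, not the page's own script; the function names `encoding_column` and `describe` and the `U+XXXX` output format are assumptions made for illustration.

```python
import unicodedata

def encoding_column(text):
    """List the code points of `text`, in stored order, as space-separated U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def describe(text):
    """Pair each code point with its Unicode character name, where one is known."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

# A two-character sample: a Tai Tham consonant followed by a dependent vowel sign.
sample = "\u1a20\u1a6b"
print(encoding_column(sample))
for line in describe(sample):
    print(line)
```

Listing the code points in stored order, with no normalisation applied first, matters here: many rows of the tables differ only in the order of their combining marks.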
The 'Hacked via ASCII' column contains an unambiguous transliteration to ASCII of the Tai Tham text in the column headed 'Text'. Members of the Da Lekh font family contain an OpenType font feature, Stylistic Set 2, which, when enabled, may cause the font to render the transliteration as the original Tai Tham text as it is intended to be rendered. For more details, see the style sheet in the source of this page.
The 'Meaning and Pronunciation' column is given to identify the word given as an example. There may be better glosses, and pronunciation can vary extensively within a nominal language. The letter RA is particularly variable, ranging between /l/, /h/ and even /r/, and there are regional variations as to whether vowel length distinctions exist and, if so, whether they are phonemic. For the Tai languages the pronunciation is given using IPA, while Pali is simply transliterated (as Pali). I have omitted tone, as phonetic tone is also quite variable. Where no indication to the contrary is given, the Tai pronunciation given approximates that of Chiangmai.
The test words may, in principle, be extracted quite simply from this web page. Each test 'word' is the content of the first cell in each row whose class is tst1. For convenience, I have extracted the first two cells in such rows, along with titles, to a CSV file. Rows where there is a plausible case for treating the encoding used as erroneous are marked in pink. (Their CSS class is tst2.) For completeness, I have included alternative encodings which the Universal Shaping Engine (USE) calls for, with an orange background and CSS class tst3, when they are defensible encodings. The USE encoding is not well-supported by fonts and is not robust to alternative classifications of combining marks.
The HTML comments within this web page should not be construed as holding test words.
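
The extraction described above can be sketched with Python's standard `html.parser` module. This is an illustration under assumptions, not the script used to produce the CSV file: the class name `TestWordExtractor`, the function `extract_test_words` and the sample markup are hypothetical, and nested tables are not handled. Comments are delivered to a separate handler that is left at its default, so their contents are never collected.

```python
from html.parser import HTMLParser

class TestWordExtractor(HTMLParser):
    """Collect the text of the first cell of each <tr> carrying `row_class`."""

    def __init__(self, row_class="tst1"):
        super().__init__(convert_charrefs=True)
        self.row_class = row_class
        self.words = []
        self._in_row = False          # inside a row with the wanted class
        self._cell_index = -1         # index of the current cell in that row
        self._in_first_cell = False
        self._buffer = []

    def _flush(self):
        # Store the first cell's text once the cell ends (even if </td> is omitted).
        if self._in_first_cell:
            self.words.append("".join(self._buffer))
            self._in_first_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("tr", "td", "th"):
            self._flush()
        if tag == "tr":
            classes = (dict(attrs).get("class") or "").split()
            self._in_row = self.row_class in classes
            self._cell_index = -1
        elif tag in ("td", "th") and self._in_row:
            self._cell_index += 1
            if self._cell_index == 0:
                self._in_first_cell = True
                self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("tr", "td", "th"):
            self._flush()
        if tag == "tr":
            self._in_row = False

    def handle_data(self, data):
        # Text is kept verbatim: trailing spaces may be part of a test word.
        if self._in_first_cell:
            self._buffer.append(data)

def extract_test_words(html, row_class="tst1"):
    parser = TestWordExtractor(row_class)
    parser.feed(html)
    parser.close()
    return parser.words
```

Calling `extract_test_words(page_source)` would return the test words in document order; passing `"tst2"` or `"tst3"` as the second argument would select the rows marked as questionable or as USE alternatives instead.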
This page is intended as a rendering engine test, rather than as a font test. However, you may modify this page to try out your own font. The necessary changes will be confined to the style sheet in the source code of this page, unless you use a different ASCIIfication scheme, in which case look at the usage of the JavaScript variable ss02_hack.
When this page was initially composed, in June 2015, the Da Lekh font mostly worked for the Tai Tham script in the Firefox and Chrome browsers. It worked in them because they use HarfBuzz to render the Tai Tham script. Since then, the HarfBuzz rendering engine used for Tai Tham has been brought into line with the Universal Shaping Engine, with a consequent dramatic fall in the rendering performance for the Tai Tham script.
The solution to this problem was to add numerous work-arounds to the font. These work-arounds have mostly restored performance, the main exceptions being subtle positioning errors where mark to base positioning is ignored and the default mark position is used instead.
The quality of the 'Hacked via ASCII' column varies from browser to browser and from operating system to operating system, and also varies over time. For Internet Explorer 11, Microsoft Edge and for the HarfBuzz-based browsers Firefox and Chrome, it is actually the best rendered column. (Script-specific rendering engines have a tendency to make the achievement of advanced script features difficult rather than easy; Tai Tham has many 'advanced' features.)
Traditionally, the consonants used in neither Pali nor Sanskrit had no subscript forms. However, one significant text book, the 'big blue book', provides a subscript form for LOW FA for use in loans from English. This form is cramped and ugly, which goes against the tradition of Lanna script writing. The MFL treats the stroke distinguishing HIGH KXA, LOW KXA, LOW SA, LOW FA and LETTER UU from HIGH KHA, LOW KA, LOW CA, LOW PA and LETTER U as a diacritic. The Da Lekh font follows this interpretation, and leaves this diacritic above the baseline when the letter is subscripted.
If you have difficulty reading the Da Lekh fonts, you may find it useful to consult their glyph gallery.
These vowel combinations are taken from Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable). Changes have been rung on the initial consonants to check for silly omissions.
A hyphen in the pronunciation indicates a syllable-final consonant that would be specified by a subscript consonant or following orthographic syllable.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨠᩫ | N/A /ko-/ |
Section 5 No. 1. This sequence does not form a whole word. An example may be seen in a word for 'danger'. | ||
ᨣᩴ | then, and /kɔː/ |
Section 5 No. 2 | ||
ᨧᩢ | (irrealis marker) /tɕaʔ/ |
Section 5 No. 4 | ||
ᨲ᩠ᩅᩫᩡ | to prevaricate /tuaʔ/ |
Section 5 No. 5 | ||
ᨷ᩠ᩅᩫ | lotus /bua/ |
Section 5 No. 6 | ||
ᨠ᩠ᩅ | N/A /kua-/ |
Section 5 No. 7. This sequence does not form a whole word. An example may be seen in one of the words for 'big'. | ||
ᨡᩬᩴ | to request /kʰɔː/ |
Section 5 No. 8 | ||
ᨠᩬ | N/A /kɔː-/ |
Section 5 No. 9. This sequence does not form a whole word. An example may be seen in the fuller spelling of the word for 'belongings'. | ||
ᨦᩡ | to split up /ŋaʔ/ |
Section 5 No. 10 | ||
ᨠᩣ | crow /kaː/ |
Section 5 No. 11 | ||
ᨴᩤ | to paint /taː/ |
Section 5 No. 12 | ||
ᩌᩣᩴ | to sprinkle /ham/ |
Section 5 No. 13 | ||
ᨣᩤᩴ | word /kam/ |
Section 5 No. 14 | ||
ᨳᩥ | to pretend /tʰiʔ/ |
Section 5 No. 15 | ||
ᨺᩦ | boil (n.) /fiː/ |
Section 5 No. 16 | ||
ᨩᩧ | moist /tɕɯʔ/ |
Section 5 No. 17 | ||
ᨾᩨ | hand /mɯː/ |
Section 5 No. 18 | ||
ᨵᩩ | monk /tʰuʔ/ |
Section 5 No. 19 | ||
ᨦᩪ | snake /ŋuː/ |
Section 5 No. 20 | ||
ᨲᩮᩡ | to kick /teʔ/ |
Section 5 No. 21 | ||
ᨽᩮ | danger /pʰeː/ |
Section 5 No. 22 | ||
ᨤᩯᩡ | to limp along /kʰɛʔ/ |
Section 5 No. 23 | ||
ᨧᩯ | corner /tɕɛː/ |
Section 5 No. 24 | ||
ᨸᩮᩬᩥᩡ | mud /pɤʔ/ |
Section 5 No. 25 | ||
ᨸᩮᩥᩬᩡ | Different from the proposals. | |||
ᨶᩮᩬᩥ | (final particle for commands and entreaties) /nɤː/ |
Section 5 No. 26. | ||
ᨶᩮᩥᩬ | Different from the proposals. | |||
ᨠᩮᩬᩨᩡ | N/A /kɯaʔ/ |
Section 5 No. 27 | ||
ᨠᩮᩨᩬᩡ | Different from the proposals. | |||
ᨠᩮᩬᩨ | /kɯa/ |
Section 5 No. 28 | ||
ᨠᩮᩨᩬ | Different to the proposals. | |||
ᩁᩮᩢᩣ | we /hau/ |
Section 5 No. 29 | ||
ᨾᩳ | drunk /mau/ |
Section 5 No. 30. This example is not taken from the MFL, which does not use this vowel symbol. | ||
ᨠᩮᩣ | N/A /ko:/ |
Section 5 No. 31. This is very rare in monosyllables, but is quite common at the end of monks' names, e.g. Adittadhammo. | ||
ᨹ᩠ᨿᩮᩡ | a type of sound /pʰiaʔ/ |
Section 5 No. 32 | ||
ᨻ᩠ᨿᩮ | flower /pia/ |
Section 5 No. 33 | ||
ᨠ᩠ᨿ | N/A /kia-/ |
Section 5 No. 34. This sequence does not form a whole word. An example may be seen in a spelling of the word for 'city'. | ||
ᨾᩮᩬᩥᩋᩡ | mucus /mɯaʔ/ |
Section 5 No. 35. (2 syllables) | ||
ᨾᩮᩥᩬᩋᩡ | Different from the proposals. | |||
ᨠᩖᩮᩬᩥᩋ | salt /kɯa/ |
Section 5 No. 36. (2 syllables) | ||
ᨠᩖᩮᩥᩬᩋ | Different from the proposals. | |||
ᩈᩰᩡ | to practice /soʔ/ |
Section 5 No. 37 | ||
ᨾᩰ | big /moː/ |
Section 5 No. 38 | ||
ᨪᩰᩬᩡ | to gouge out /sɔʔ/ |
Section 5 No. 39 | ||
ᨩᩢ᩠ᨿ | victory /tɕai/ |
Section 5 No. 40 | ||
ᨶᩲ | in /nai/ |
Section 5 No. 41 | ||
ᨢᩱ | to expose /kʰai/ |
Section 5 No. 42 | ||
ᨴᩱ᩠ᨿ | Thailand /tai/ |
Section 5 No. 43 | ||
ᨠᩮᩬᩨᩡ | Khün /kɤʔ/ |
Section 5.3 No. 22 | ||
ᨠᩮᩨᩬᩡ | Different from proposals. | |||
ᨠᩮᩬᩨ | Khün /kɤː/ |
Section 5.3 No. 23 | ||
ᨠᩮᩨᩬ | Different from proposals. | |||
ᨠᩰᩢ | Khün /ko-/ |
Section 5.3 No. 26 | ||
ᩈᩘ | First syllable of compounds of saṅgha.
/saŋ/ |
Section 5.3 No. 29. Apparently not a possible final syllable, but can be left stranded as a result of line-breaking. | ||
ᨴᩢ᩠ᨦ | whole /taŋ/ |
Section 5.3 No. 30 | ||
ᩌᩥᩴ | edge /him/ |
Section 5.3 No. 31 (Example from Apiradee p53, but different language, different pronunciation, i.e. not /-iŋ/.) | ||
ᨠᩥ᩠ᨦ | /kiŋ/ |
Section 5.3 No. 32 | ||
ᨠᩢ᩠ᨾ | /kam/ |
Section 5.3 No. 34 | ||
ᨠᩢᨾ | /kam/ |
Section 5.3 No. 35 | ||
ᨯᩭ | mountain /dɔːi/ |
Section 5.3 No. 36 |
Other explicit coding sequences are given in Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R (Everson, Hosken & Constable), and these are recorded here. Amended and exploratory material is highlighted in yellow; it is not vouched for by the proposal. The remarks are my own.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
᪓᩠ᨴ | thrice /saːm tiː/ |
Section 2 | ||
ᨲ᩵ᩣ᩠ᨦ᩻ | different in my view /taːŋ taːŋ/ |
Section 7 | ||
ᨲᩣ᩠᩵ᨦ᩻ | Different from the proposals. | |||
ᨲᩣ᩠᩵ᨦ᩻ | Normalisation of the above. | |||
ᨳ᩠ᨶ᩻ᩫᩁ | path /tʰănon/ |
Sections 7 and 14.6 (2 syllables - the second is a single character). | ||
ᨳ᩠ᨶᩫ᩻ᩁ | Different from proposals, which specifically specified the various semantically sensitive positions of mai sam. For this word, the visual position of the marks above is free. | |||
ᨡᩢ᩶᩻ᩬᨦ | belongings /kʰau kʰɔːŋ/ |
Section 7 (2 syllables - the second is a single character) | ||
ᨡᩢᩬ᩶᩻ᨦ | Different from the proposals. | |||
ᨡᩮᩢ᩶ᩣᨡᩬᨦ | belongings
/kʰau kʰɔːŋ/ |
Section 7 (3 syllables - the third is a single character) | ||
ᨡᩮᩢᩣ᩶ᨡᩬᨦ | Different from the proposals. | |||
᪭ᩣ | elephant /tɕaːŋ/ |
Section 11 | ||
ᩉ᩠ᨶᩦ | to flee /niː/ |
Section 14.1 | ||
ᨤ᩠ᩅᩯ᩶ᩁ | to blockade /kʰwɛːn/ |
Section 14.2 (2 syllables - the second is a single character) | ||
ᩉ᩠ᩅᩫ | head /hua/ |
Section 14.3 | ||
ᨯᩢ᩵ᨦ᩠ᨶᩦ᩶ | like this /daŋ niː/ |
Section 14.4 (2 syllables) | ||
ᩉᩥ᩠ᨶ | stone /hin/ |
Section 14.5 | ||
ᨷ᩠᩵ᨾᩦ | to not have /bɔː miː/ |
Section 14.6. The proposal lists MAI KANG as a code point, but it is visually dropped in this compound. I presume the renderer is not intended to suppress the appearance of the character. The upper row drops the MAI KANG from the encoding, so is not the encoding intended, while the lower row uses the stated encoding. Da Lekh fails to arrange the marks above properly; arrangement is a proper challenge for a Tai Tham font. The phonetic syllable boundary is part of the context! | ||
ᨷᩴ᩠᩵ᨾᩦ | to not have /bɔː miː/ |
|||
ᨲᩣ᩠ᨾ | to follow /taːm/ |
Section 14.7 | ||
ᨻ᩠ᨿᩣ᩠ᨵᩥ | sickness /păɲaːt/ |
Section 14.8 | ||
ᨸ᩠ᩃ᩠ᨿ᩵ᩁ | to change /pian/ |
Section 14.9 (2 syllables - the second is a single character) | ||
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ | even though /mɛːn waː/ |
Section 14.9. A sophisticated font might transpose the tone marks. The phonetic syllable boundary should be part of the context. | ||
ᨾᩯ᩠᩶ᨶ᩠ᩅ᩵ᩣ | even though /mɛːn waː/ |
Same as above, but normalised, so not the code point sequence in the proposal. Proposal explicitly stated SAKOT was to have ccc=0, not 9, but ccc=9 was quietly inserted in draft properties and not noticed until too late. | ||
ᩈ᩠ᩅᩯ᩵ | to butt in /swɛː/ |
Section 14.10 | ||
ᩈᩯ᩠᩵ᩅ | to embroider /sɛːw/ |
Section 14.10 (but the proposal has vowel and tone the wrong way round) | ||
ᩈᩯ᩠᩵ᩅ | to embroider /sɛːw/ |
As above, but normalised, so very much not the codepoint sequence in the proposal. | ||
ᩈ᩵ᩯ᩠ᩅ | to embroider /sɛːw/ |
As above, but uncorrected. Arguably, the rendering is unconstrained. | ||
ᨿᩪ | broom, whisk /ɲuː/ |
Section 15 No. 1 | ||
ᨾᩦ | to have /miː/ |
Section 15 No. 2 | ||
ᩉ᩠ᨾᩪ | pig /muː/ |
Section 15 No. 3 | ||
ᩉ᩠ᨾᩦ | bear (n.) /miː/ |
Section 15 No. 4 | ||
ᨹ᩠ᩅᩫ | husband /pʰua/ |
Section 15 No. 5 | ||
ᩉ᩠ᩃᩬᩴ᩵ | to cast (in metal) /lɔː/ |
Section 15 No. 6 | ||
ᨾᩣ | to come /maː/ |
Section 15 No. 7 | ||
ᩉᩱ᩵ | to hit /hai/ |
Section 15 No. 8 | ||
ᨾ᩠ᨿ | Section 15 No. 9 | |||
ᩅ᩠ᨿᨦ | city /wiaŋ/ |
Section 15 No. 10 (2 syllables - the second is a single character) | ||
ᩉᩣ᩠ᨾ | to carry by the handles /haːm/ |
Section 15 No. 11 | ||
ᨯᩣᩴ | black /dam/ |
Section 15 No. 12 | ||
ᨡᩮ᩠ᩅ | Section 15 No. 13 | |||
ᩉ᩠ᨾᩣ | dog /maː/ |
Section 15 No. 14 | ||
ᨠᩕᩣ᩠ᨸ | to prostrate oneself /kʰaːp/ |
Section 15 No. 15. The later addition of SIGN BA to the repertoire makes the correct final consonant here unclear. | ||
ᨻᩕ᩵ᩣᩴ | indefatigable /pʰam/ |
Section 15 No. 16 | ||
ᨻᩕᩣᩴ᩵ | Different from the proposals. The USE diktat at December 2021 does not determine the relative order of the tone mark and mai kang. In some styles the tone mark is associated with and follows mai kang, either above or to the right of it, but in other styles the tone mark sits on the consonant and the mai kang on the spacing vowel. Both encodings are shown here. | |||
ᨻᩕᩣ᩵ᩴ | ||||
ᨠᩕᩬᨦ | garland; Mekong /kʰɔːŋ/ |
Section 15 No. 17 (2 syllables - the second is a single character) | ||
ᩈᩕᩫᨾ᩠ᨱ᩺ | ascetic /sălom/ |
Section 15 No. 18. If the word is interpreted as having two phonetic syllables, then the medial consonant comes between an implicit vowel and an explicit vowel. (2 syllables) | ||
ᩈᩕ᩠ᩅᩫᨾ | to embrace /săluam/ |
Section 15 No. 19 (2 syllables - the second is a single character). Ignore final ᨾ; it makes the spelling ungrammatical. However, a few such spellings do occur in the MFL. | ||
ᩈᩕ᩠ᩅᨾ | to embrace /săluam/ |
Spelling of above in the MFL, so this form's encoding is not given in the proposal. | ||
ᨯᩮᩬᩨᩁ | month /dɯan/ |
Section 15 No. 20 (2 syllables - the second is a single character) | ||
ᨯᩮᩨᩬᩁ | Different from the proposals. | |||
ᩁᩮᩬᩨᩋ | boat /hɯa/ |
Section 15 No. 21 (2 syllables - the second is a single character) | ||
ᩁᩮᩨᩬᩋ | Different from the proposals. | |||
ᩉ᩠ᩃᩮᩬᩨᩋ | to exceed /lɯa/ |
Section 15 No. 22 (2 syllables - the second is a single character) | ||
ᩉ᩠ᩃᩮᩨᩬᩋ | Different from the proposals. | |||
ᩉ᩠ᨾ᩵ᩣᩴ | to eat /mam/ |
Section 15 No. 23 | ||
ᩉ᩠ᨾᩣᩴ᩵ | The USE diktat does not specify whether mai kang or tone mark comes first. Both encodings are shown. | |||
ᩉ᩠ᨾᩣ᩵ᩴ | ||||
ᩈ᩠ᨾᩬᩥ᩻ | very level(?) /sămɤː sămɤː/ |
Section 15 No. 24. Encoding as given, omitting SIGN E, which is depicted in the proposal. Moreover, the word appears to be a misreading of the next but one. | ||
ᩈ᩠ᨾᩨᩬ᩻ | Encoding is different from the proposals. | |||
ᩈ᩠ᨾᩮᩬᩥ᩻ | Section 15 No. 24. SIGN E restored to encoding. | |||
ᩈ᩠ᨾᩮᩥᩬ᩻ | SIGN E restored to a different encoding from the proposals. | |||
ᩈ᩠ᨾ᩻ᩮᩬᩥ | level (adj.) /sămɤː/ |
Probable reading of above. Consequently, the encoding is not vouched for by the proposal. Phonetically, this is one or two syllables, depending on how one counts. | ||
ᩈ᩠ᨾᩮᩥᩬ᩻ | Probable reading of the above. The USE-compliant encodings of the two readings are the same, but each has compatible renderings inconsistent with the other interpretation. | |||
ᩉ᩠ᨾᩮᩬᩨᨦ | mine (n.) /mɯaŋ/ |
Section 15 No. 25 (2 syllables - the second is a single character) | ||
ᩉ᩠ᨾᩮᩨᩬᨦ | Different from the proposals. | |||
ᩉ᩠ᨿᩮᩬᩨᨦ | to despise /ɲɯaŋ/ |
Section 15 No. 26 (2 syllables - the second is a single character) | ||
ᩉ᩠ᨿᩮᩨᩬᨦ | Different from the proposals. | |||
ᩉ᩠ᨾᩫ᩵ᩁ | winter melon (Benincasa
hispida) /mon/ |
Section 15 No. 27 (2 syllables - the second is a single character) | ||
ᩉ᩠ᩃᩣ᩠ᨿ | many /laːi/ |
Section 15 No. 28 | ||
ᩉ᩠ᩃᩮᩬᩨᨦ | yellow /lɯaŋ/ |
Section 15 No. 29 (2 syllables - the second is a single character) | ||
ᩉ᩠ᩃᩮᩨᩬᨦ | Different from the proposals. |
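
Several rows above are described as the normalisation of the row before them. The effect can be reproduced with Python's standard `unicodedata` module. This is only a sketch for checking encodings, and the helper name `is_normalised` is made up for illustration. Canonical reordering sorts adjacent marks of non-zero canonical combining class, so a tone mark (ccc=230) stored before SAKOT (ccc=9) is moved after it.

```python
import unicodedata

SAKOT = "\u1a60"    # TAI THAM SIGN SAKOT, ccc=9
TONE_1 = "\u1a75"   # TAI THAM SIGN TONE-1, ccc=230

def is_normalised(text, form="NFC"):
    """True if `text` is left unchanged by the given normalisation form."""
    return unicodedata.normalize(form, text) == text

# Consonant, SIGN AA, TONE-1, SAKOT, consonant: tone mark stored before SAKOT.
typed = "\u1a32\u1a63" + TONE_1 + SAKOT + "\u1a26"
normalised = unicodedata.normalize("NFC", typed)

print([f"U+{ord(c):04X}" for c in typed])
print([f"U+{ord(c):04X}" for c in normalised])
print(unicodedata.combining(SAKOT), unicodedata.combining(TONE_1))
```

A check of this kind shows whether a given row can survive a normalising text pipeline unchanged, which is why both the unnormalised and the normalised sequences are worth testing in a renderer.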
The actual coding sequences to be used here are open to challenge.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨠᩬᩢᩃ᩠ᨼ᩺ | golf /kɔp/ |
Section 2. The position of RA HAAM is debatable - cf. Thai กอล์ฟ. The first example places it on the second consonant, the second on the first. The third then normalises the spelling of the second. Note that this word consists of two orthographic syllables. | ||
ᨠᩬᩢᩃ᩠᩺ᨼ | ||||
ᨠᩬᩢᩃ᩠᩺ᨼ | ||||
ᨠᩢᩬᩃ᩠ᨼ᩺ | In the December 2021 USE order. | |||
ᨠᩢᩬᩃ᩠᩺ᨼ | ||||
ᨠᩢᩬᩃ᩠᩺ᨼ | ||||
ᨠᩕᩣ᩠ᨼ | graph /kaːp/ (?) |
Section 2. | ||
ᨴᩬᨼ᩠ᨼᩦ᩵ | toffee | Section 2 (2 syllables) | ||
ᨠᨽᩚ | pregnant /kap pʰa?/ |
Section 4 (2 syllables - the first is a single character) | ||
ᩈᨱᩛᩣ᩠ᨶ | shape /san tʰaːn/ |
Section 4 (2 syllables - the first is a single character) | ||
ᩁᨭᩛᨷᩣ᩠ᩃ | government /rat tʰa baːn/ |
Section 4 (3 syllables) | ||
ᩁᩢᨭᩛᨷᩣ᩠ᩃ | government /rat tʰa baːn/ |
Section 4 (3 syllables) | ||
ᩈᨻᩛ | omniscience /sap paʔ/ |
Section 4 (2 syllables - the first is a single character) | ||
ᩋᨾᩛ | mango /ʔam paʔ/ |
Section 4 (2 syllables - the first is a single character) | ||
ᩁᩣᨩᨽᩢ᩠ᨮ | Rajabhat /la:t tɕa pʰat/ |
Section 4 (3 syllables) | ||
ᨷᩢᨱ᩠ᨻᨷᩩᩁᩩᩈ | disciple "banop burus" |
Section 4 (5 syllables) |
The mai kang lai character can be a challenge to a font. The character has a wide range of behaviours, from that of a spacing final character (as in modern Tai Khün fonts) to that of a repha-like character, the old-fashioned behaviour seen in Tai Khün, Thailand and Laos. The MFL dictionary shows an intermediate behaviour, where marks above the following base consonant cause it to be positioned within the previous syllable. This is the style employed by the Da Lekh font.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨴᩘ᩠ᩃᩣ᩠ᨿ | all /taŋ laːi/ |
The ascending tail of SAKOT LA prevents the MAI KANG LAI moving on to a subsequent syllable/word. This prevents fonts exploiting the rphf feature of the Universal Shaping Engine. | ||
ᨴ᩠ᩃᩘᩣ᩠ᨿ | With total disregard for logical order. | |||
ᩈᩘᨥᩮᩣ | Nominative of Pali saṅgha <saṅgho> |
(2 syllables) | ||
ᩁᩘᩈᩦ | ray /raŋ siː/ |
(2 syllables) |
This is mostly a test for readers!
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨶᩣᩴ | to lead /nam/ |
|||
ᨾᨶᩮᩣ | heart, mind /maʔ no:/ |
(2 syllables) | ||
ᨶᩮᩢᩣ | to sew a long stitch /nau/ |
Some fonts may fail here because they handle the ligature in pstf; this worked with HarfBuzz until pstf was moved to before Indic rearrangement. | ||
ᨶᩣ᩠ᨿ | leader /na:i/ |
|||
ᨶ᩵ᩣ᩠ᨶ | Nan
/na:n/ |
|||
ᨶᩣ᩠᩵ᨶ | Using formalism where neither current nor historical speech defines phonetic order. The first of these two keeps user-perceivable characters contiguous, and the second is its normalisation (NFC/NFD). | |||
ᨶᩣ᩠᩵ᨶ | ||||
ᩍᨶ᩠ᨴᩣ | Indra /ʔin ta:/ |
The more usual form lacks U+1A63. (2 syllables - first has one character.) | ||
ᩋᩫᨶ᩠ᨲᩕᩣ᩠ᨿ | danger
/ʔon tʰaʔ la:i/ |
(2 syllables) | ||
ᨶ᩶ᩣᩴ | water /nam/ |
This can be surprisingly hard to achieve in a font. Logic designed to stop Arabic vowel marks wrongly interacting has to be circumvented so that the two marks will interact! | ||
ᨶᩣ᩶ᩴ | The USE rules do not dictate whether the tone mark comes before or after the mai kang. Both the canonically inequivalent forms are given here. | |||
ᨶᩣᩴ᩶ | ||||
ᨶ᩠ᩅᩣ᩠ᨷ | to falsely accuse /nwaːp/ |
MFL p352 | ||
ᨴᩤᩴᨶ᩠ᩅᩣ᩠ᨿ | to foretell /tam nwaːi/ |
NTDPLM p285. Sometimes the writer wants to avoid the ligature! (2 syllables) | ||
ᨲ᩵ᩣᩴᨶ᩠ᩅᩣ᩠ᨿ | to foretell /tam nwaːi/ |
MFL p320, but only in transliteration. Shape of second syllable (ligature plus subscript consonant) is attested elsewhere. (2 syllables) | ||
ᨲᩣ᩵ᩴᨶ᩠ᩅᩣ᩠ᨿ | The USE does not dictate whether mai kang or the tone mark comes first. Both options are given here. | |||
ᨲᩣᩴ᩵ᨶ᩠ᩅᩣ᩠ᨿ | ||||
ᨶᩣ | rice field /naː/ |
An isolated test of the ZWNJ feature above. This form is to be expected in texts teaching the writing system. | ||
ᩉ᩠ᨶ᩶ᩣ | face /naː/ |
Note that the SAKOT prevents ligature formation. | ||
ᩉ᩠ᨶᩣ᩶ | Tone mark above consonant still follows the vowel. |
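
The remarks above call the two orders of tone mark and mai kang canonically inequivalent. A quick check, assuming Python's `unicodedata` tables match the Unicode Character Database: mai kang has combining class 0, so normalisation never swaps it with a neighbouring tone mark, and the two spellings remain distinct strings. The helper name `canonically_equivalent` is introduced here for illustration only.

```python
import unicodedata

MAI_KANG = "\u1a74"  # TAI THAM SIGN MAI KANG
TONE_2 = "\u1a76"    # TAI THAM SIGN TONE-2

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms are equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

base = "\u1a36\u1a63"                   # a consonant followed by SIGN AA
tone_first = base + TONE_2 + MAI_KANG   # tone mark stored before mai kang
kang_first = base + MAI_KANG + TONE_2   # mai kang stored before tone mark

print(unicodedata.combining(MAI_KANG), unicodedata.combining(TONE_2))
print(canonically_equivalent(tone_first, kang_first))
```

Because the two orders are not unified by normalisation, a font or shaping engine has to cope with each order separately, which is why both are listed as test rows.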
These examples are taken from the 'big blue book' pp151-6. Some of these renderings are unusual compared with the native tradition, and are included for that reason. The position of RA HAAM is particularly noteworthy.
The pronunciations given are guesswork where Siamese practice and Lanna script orthography conflict.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨠᩯᩢ᩠ᩈ | gas /kɛs/ |
|||
ᨴᩕᩯ᩠ᨠᨴᩮᩬᩥᩁ᩺ | tractor /tʰɛːk tʰɤː/ |
Slightly complicated set of consonants in first syllable. (2 syllables) | ||
ᨴᩕᩯ᩠ᨠᨴᩮᩥᩬᩁ᩺ | Vowel not as in the proposals. | |||
ᨶᩰᩫ᩠᩶ᨲ | note /noːt/ |
Vowel combination not listed above | ||
ᨷᩕᩰᨴᩦ᩠ᨶ | protein /pʰoː tiːn/ |
Tests reordering - the vowel symbol should appear first. (2 syllables) | ||
ᨼᩥᩅ᩠ᩈ᩺ | fuse /fiu/ |
|||
ᩈᨲᩯᨾ᩠ᨷ᩺ | postage stamp /sa tɛːm/ |
(3 syllables) | ||
ᩈᩮᩥᩁ᩠᩺ᨷ | to serve /sɤːp/ |
Compare the placement of RA HAAM with the previous word. The same contrast may be seen on p155 of the 'big blue book'. (2 syllables) |
These examples are all taken from Graphic Blends at SEAsite. The pronunciations given are Tai Lü.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨴᩢ᩵ᩗᩣ | all /taŋ laːi/ |
This word, in some of its various forms, seems to be the only word containing U+1A57 TAI THAM CONSONANT SIGN LA TANG LAI. I withdraw my previous, surprised, reading of the word shown as containing NGA as the base consonant. |
||
ᨡᨶ᩠ᨵᩣ | spell (magic) /kʰan tʰaː/ |
(2 syllables, first a single character) | ||
ᨣ᩠᩶ᨯᩦ | okay /kɔː diː/ |
A non-breaking space has been appended to avoid truncation. A sophisticated font would slide the vowel under the tone mark. | ||
ᨷ᩠᩶ᨾᩣ | to not come /bau maː/ |
|||
ᨷ᩠᩶ᨾᩣ | Same again, but normalised. | |||
ᨷ᩠᩶ᨯᩣ᩠ᨿ | to not have /bau da:i/ |
|||
ᨧᩢ᩠ᩅᩤ | How big an area? /tsak va:/ |
|||
ᩈᩮ᩠ᩓ᩠ᩅ | deceased /se: lɛu/ |
|||
ᨴᩯ᩠ᨶᩳ | Really, is that true? /tɛː nɔː/ |
|||
ᩓ᩠ᨾᩣ | to look this way /lɛ maː/ |
|||
ᨠᩮ᩠ᩈᩣ | hair /keː saː/ |
|||
ᨻᩱ᩠ᨾᩣ | to come and go /pai maː/ |
|||
ᩈᩮ᩠ᩅ᩶ᩤ | if /seː vaː/ |
|||
ᩈᩮ᩠ᩅᩤ᩶ | Tone mark position not as in the proposals. | |||
ᩅᩮ᩠ᩃᩣ | time /veː laː/ |
Also in Apiradee p49 | ||
ᨵᩤ᩠ᨲᩩ | physical body /tʰaː tuʔ/ |
The vowel on the final consonant is inescapable - there is no way of rewriting the orthographic syllable to escape the combination. | ||
ᨩ᩠ᩓ | in conclusion /tsălɛː/ |
|||
ᨻᩭ᩠ᩅ᩻ᩣ | because /pɔi vaː/ |
The MAI SAM tags the WA as starting a chained syllable. The spelling presumes that a font can decide that the subscript WA goes to the left of the MAI KOY. | ||
ᨻᩭ᩠᩻ᩅᩣ | A purely visual placement of MAI SAM. | |||
ᨻᩭ᩠᩻ᩅᩣ | Normalised form of the above. | |||
ᩈᩫ᩠ᨦᩣ᩠ᨶ | world /suŋ saːn/ |
These words are taken from the MA thesis 'Development of Tai Lue Scripts and Orthography' by Apiradee Techasiriwan (อภิรดี เตชะศิริวรรณ). The pronunciations given are Tai Lü. Comparative material from elsewhere is highlighted in yellow.
Text | Meaning | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨻᩬᩳ᩵ | father /pɔː/ |
p3. Vowel combination not listed above. Spelling is archaic. | ||
ᨻᩳᩬ᩵ | USE vowel ordering. | |||
ᩈᨷ᩷ᩣ᩠ᨿ | content, well /săbaːi/ |
p3. Rare example of a word with this tone mark. (2 syllables, first is a single character.) | ||
ᩈᨷᩣ᩠᩷ᨿ | USE tone positioning. | |||
ᩅ᩠ᨿᩙ | city /weŋ/ |
p4. | ||
ᨣᩪ᩺ | person /kun/ |
p4. Unetymological, phonetic spelling. The mark above is serving as a final consonant, not a cancellation mark. | ||
ᨣ᩺ᩪ | USE ordering as vowels. | |||
᪁᪂ ᨻᩢ᩠ᨶ᩻ᩣ | Sipsongpanna /sip sɔːŋ pan naː/ |
p10. (Number precedes syllable). Example of mai sam marking a double-acting consonant. | ||
᪁᪂ ᨻᩢ᩠᩻ᨶᩣ | Best-looking hack for USE compliance. | |||
ᨻᩱ᩻ᩣ᩠ᨿ | to go to the location /pai paːi/ |
p47. | ||
ᨻᩱᩣ᩠᩻ᨿ | Best-looking hack for USE compliance | |||
ᨻᩱᩣ᩠᩻ᨿ | Normalisation of the above. | |||
ᨩ᩠ᨿᩙᨲᩩᩴ | Kengtung
/tseŋ tuŋ/ |
p53. (2 syllables) Possibly the Chengtung on the Vietnamese border. | ||
ᩅᨲᩛᩩ | matter /wat tʰu/ |
p49. U+1A5B represents subscript HIGH THA rather than high RATHA. This is an issue for a font's repertoire of conjuncts. | ||
ᩅᨲ᩠ᨳᩩ | matter /wat tʰu/ |
The Northern Thai writing of the above. Perhaps this should be rendered as the above when the language is Tai Lü or Lao. | ||
ᨯ᩠ᨿᩴ | one /deu/ |
p53. Assuming the word has TAI THAM SIGN MAI KANG rather than unencoded *TAI THAM CONSONANT SIGN FINAL WA. | ||
ᩉ᩠ᨶᩦᩢ᩶ | debt /niː/ |
p57. | ||
ᩁᩮᩂ᩠ᨠ | auspicious occasion /hɤːk/ |
p79. | ||
ᩁ᩠ᨿ᩺ | to learn /heːn/ |
p118. |
The word typically meaning 'and...not' or 'and...then' may be written with a chained syllable, and this may present challenges to renderers. The form of the letter representing /b/ in a chained syllable presented an encoding challenge. N3207R proposed using the sequence <SAKOT, BA> for it, and using <SAKOT, HIGH PA> for the subscript form corresponding to both BA (common) and HIGH PA (extremely rare) in its rôle as a final (Thai sakot) consonant. During the ISO process, a new character was introduced instead for the special form, SIGN BA, and it is widely assumed that <SAKOT, BA> represents the usual subscript form corresponding to BA, both as a sakot consonant and in the Pali /mp/ and /pp/ intervocalic clusters.
When syllables are chained, shared vowel symbols are not repeated. This leads to ambiguity as to which symbol is dropped.
All the spellings in the table below represent the same careful pronunciation in Northern Thai, namely /kɔː bɔː/. The Tai Lü forms are written with different marks and pronounced with different vowels, but use the same two consonant forms in the stack.
Text | Meaning | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨣᩴᨷᩴ᩵ | and...not, then...not | Full form - 2 syllables, and arguably 2 words. | ||
ᨣᩴᨷᩴ | do. | Univerbated form in MFL (2 syllables) | ||
ᨣᩝᩴ᩵ | do. | First mai kang dropped. | ||
ᨣᩴᩝ᩵ | do. | Second mai kang dropped. | ||
ᨣᩝᩴ | do. | First mai kang dropped. | ||
ᨣᩴᩝ | do. | Second mai kang dropped. |
These words behave slightly oddly.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᩓᩯ | very much /lɛː/ |
Redundant vowel mark | ||
ᩐᩣ | to take /ʔau/ |
Vowel on independent vowel | ||
ᩐ᩵ᩣ | very hot /ʔau/ |
Vowel and tone mark on independent vowel | ||
ᩐᩣ᩵ | USE-compliant order | |||
ᨯᩪᩕᩣ | listen to me /duː haː/ |
Medial consonant between explicit vowels | ||
ᨯᩮᩬᩥᩁᨹᩫᩖᨣᩩᨱ᩺ | March /dɯan pʰon laʔ kun/ |
NTDPLM p259. Double-acting medial consonant with implicit vowel after it. (3 syllables) | ||
ᨯᩮᩥᩬᩁᨹᩫᩖᨣᩩᨱ᩺ | USE-compliant vowel ordering | |||
ᨻᩣᨷᩰᩖ | Pabol (sic) /paː boːn/ |
A mistake for Spanish Pablo seen on Wikipedia, but in light of the above a renderer should render it as intended. | ||
ᨶ᩶ᩭ | little /nɔːi/ |
Tai Khün spelling. | ||
ᨶᩭ᩶ | USE-compliant tone mark sequencing. | |||
ᩉᩖ᩠ᩅᨦ | big /luaŋ/ |
Medial consonant in middle of stack. The proposal classified the final consonant of the stack as a 'medial vowel'. (2 syllables, second a single character) | ||
ᩉᩖ᩠ᩅᩣ | iron /lwaː/ |
Medial consonant in middle of stack. In this case, the WA is very much a consonant. | ||
ᨻᩕ᩠ᨿᩮᩡ | a type of sound /pʰiaʔ/ |
Preposed medial consonant in middle of stack along with a preposed vowel. | ||
ᨠᩩ᩶ᩣ᩠ᨶ᩠ᨦ | to prosper /kaːn kuŋ/ |
The first word in the MFL! Note that there are two final consonants. The SIGN AA prevents a phonetic spelling. | ||
ᨠᩩᩣ᩠᩶ᨶ᩠ᨦ | USE-compliant tone mark placement. | |||
ᩋᩢ᩠ᨭᩛ | a satang coin /ʔat/ |
Two consonants in final consonant position (3 consonants in total) | ||
ᩆᩢᨠ᩠ᨯᩥ᩺ | rank /sak/ |
Consonant-killer also killing explicit vowel above (2 syllables) | ||
ᩆᩢᨠ᩠ᨯᩥ᩼ | rank /sak/ |
Same again, but with KARAN instead of RA HAAM. Some people are using KARAN in Northern Thai instead of RA HAAM! (2 syllables) | ||
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩺ | giant fennel /ma haː hiŋ/ |
Consonant-killer also killing explicit vowel below (4 syllables) | ||
ᨾᩉᩣᩉᩥᨦ᩠ᨣ᩺ᩩ | USE then requires that the killer precede the killed vowel. | |||
ᨾᩉᩣᩉᩥᨦ᩠ᨣᩩ᩼ | Same again, but with KARAN. (4 syllables) | |||
ᩆᩣᩈ᩠ᨲᩕ᩺ | science /saːt/ |
Consonant-killer also killing medial consonant. NT spelling. (2 syllables) | ||
ᩈᩣᩈ᩠ᨲᩕ᩼ | science /saːt/ |
Consonant-killer also killing medial consonant. Tai Khün spelling. (2 syllables) | ||
ᩁᩪ᩠ᨷ | image /huːp/ |
This spelling is archaic in Northern Thailand (but current in Tai Khün) | ||
ᨻᩦ᩠᩵ᨶᩬ᩶ᨦ | relatives /piː nɔːŋ/ |
(2 syllables - second is a single character) | ||
ᩃᩢᩪ | child (progeny) /luːk/ |
USE demands that mai kak (see next) precede most of the vowels that it phonetically follows. | ||
ᩃᩪᩢ | MAI SAT can serve as a final consonant, /k/. This leads to yet more formal vowel combinations. | |||
ᨸᩢᩣ | mouth /paːk/ |
|||
ᨯᩬᩢ | flower /dɔːk/ |
|||
ᨯᩢᩬ | USE-compliant ordering. | |||
ᨯᩢᩬᩡ | USE-compliant ordering. | |||
ᨯᩬᩢᩡ | MAI SAT can even be reinforced by SIGN A. | |||
ᨻ᩠ᩅᩢᩡ | group /puak/ |
|||
ᨲᩯ᩠ᨶᩬᩴ᩵ | wasp, hornet /tɛːn tɔː/ |
A single orthographic syllable. | ||
ᨲᩬᩴ᩵͏ᩯ᩠ᨶ | wasp, hornet /tɔː tɛːn/ |
Should normally be visually identical with the above - the font may be too crude. However, when font colouring is supported, the vowel below should be coloured differently in the Da Lekh Si font; that font is intended to reveal the order of characters. | ||
ᨲᩬᩴ᩵ᩯ᩠ᨶ | Would it be legitimate for this to render differently to the above? | |||
ᩈ᩠ᨶᩫ᩻ | street /sănon/ |
The mai sam represents the final consonant in addition to the epenthetic vowel. | ||
ᨠᨾᩛᩦ | scripture /kam piː/ |
The surprise is that U+1A5B had InSC=Consonant_Final until Unicode 10.0. (2 syllables - the first is a single consonant in the first example.) | ||
ᨶᩥᨻᩛᩣ᩠ᨶ | nirvana /nip paːn/ |
|||
ᨵᨾᩜᩥᨠ | saintly /tʰam miʔ kaʔ/ |
Chiengtung p166. It has 3 syllables - the second is of interest. It may show a problem with U+1A5C having InSC=Consonant_Final until Unicode 10.0. | ||
ᩈᨵᩩ᩠ᨷ | stupa(?) /sătʰup/ |
Chiengtung p166. (2 syllables - the first is a single letter.) This shows the issue with placement of the vowel and 'sakot' consonant also applies to this explicit vowel. | ||
ᩋᩣᨴᩥᨲ᩠ᨲᨵᨾᩜᩮᩣ | Adittadhammo Pali <Ādittadhammo> |
Chiengtung p264. (5 syllables) | ||
ᨬᩣᨱᨵᨾᩜᩮᩣ | Nyanadhammo Pali <Ñāṇadhammo> |
Chiengtung p238. The individual referred to is not the one hyperlinked to. (4 syllables) | ||
ᩅᩥᩈᩮ᩠ᩈ | special /wiʔ seːt/ |
Note the lack of a ligature. (2 syllables) | ||
ᨢ᩶ᩣ | slave /kʰaː/ |
Same character order as in Thai and Lao! | ||
ᨢᩣ᩶ | But not if the USE prevails! | |||
ᩈᩣᩈᨶᩣ | religion /saː saʔ naː/ |
Full (5 chars) and contracted (7 chars) forms. (3 and 2 syllables respectively) |
||
ᩈᩣᩈ᩠ᨶ᩻ᩣ | ||||
ᩈ᩠ᨶ᩻ᩮᩢ᩶ᩣ | javelin /sănau/ |
|||
ᨲᩦ͏ᩣ᩠ᨿ | to beat to death /tiː taːi/ |
Uses CGJ as an invisible MAI SAM to stand for the duplicated consonant. | ||
ᩋᩮᩰᩣᨽᩣᩈ | to illuminate /ʔoː pʰaː saʔ/ |
MFL p919. While the spelling rules call for either just U+1A70 SIGN OO or just the combination of <U+1A6E SIGN E, U+1A63 SIGN AA>, this might conceivably be a private lexicographer's notation indicating that both occur that happened to escape into the published work. The graphical order, left-to-right, in the MFL is SIGN OO, SIGN E, LETTER A, SIGN AA. The 'hacked via ASCII' rendering is wrong. (3 syllables - first is of interest.) | ||
ᩉ᩠ᨾ᩵ᩣᩴ᩻ | Grub's up! /mam mam/ |
|||
ᩉ᩠ᨾᩣᩴ᩵᩻ | It is not clear whether a USE-compliant form should have MAI KANG or the tone mark first. | |||
ᩉ᩠ᨾᩣ᩵ᩴ᩻ | ||||
ᩃᩮᩞ | trickery /leːs/ | Tai Khün spelling, cited in N3384 | ||
ᩋᨶᩣᨳᨷᩥᨱ᩠ᨯᩥᨠᩈᩞ | Anathapindika's Pali <Anāthapiṇḍikassa> |
A rare spelling of the Pali masculine genitive singular ending. Note that SIGN SA starts the final phonetic syllable. (7 syllables - the last one is of interest.) |
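Several of the rows above differ only in the order of the marks, or in an invisible character such as CGJ, so it can help to list a string's code points in encoded order. The following is a minimal Python sketch using only the standard `unicodedata` module. The function name `describe` is an illustrative choice of mine, and the names reported depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for text, in encoded order."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The two encodings of 'slave' from the table: tone mark before SIGN AA
# (as in Thai and Lao), and the USE-compliant order.
for label, word in [("traditional order", "ᨢ᩶ᩣ"), ("USE order", "ᨢᩣ᩶")]:
    print(label)
    for code_point, name in describe(word):
        print("   ", code_point, name)
```

Only the code points and names are printed, so the output is readable even on a console that cannot display Tai Tham.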
These examples are intended to reveal the behaviour of the rendering system, rather than be clear pass or fail tests.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks | ||
---|---|---|---|---|---|---|
Interpretation | ||||||
ᨠ᩠ᨷ | (no meaning) | Interpretation of <SAKOT, BA> and <SAKOT, HIGH PA> respectively. This looks at font behaviour rather than at layout engine behaviour. | ||||
ᨠ᩠ᨸ | (no meaning) | |||||
| ||||||
ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ | with (things) on friendly terms
Pali <samoddamānehi> |
Split using a soft hyphen. (Many syllables.) The text occurs with and without dingbats (U+1AA5) so that one can see whether an inactive soft hyphen affects it. |
||||
᪥᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ | with (things) on friendly terms
Pali <samoddamānehi> |
|||||
ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ | with (things) on friendly terms Pali <samoddamānehi> |
Split using zero width space - this uses the presentation-oriented view that ZWSP is simply a soft hyphen without visible rendering. This test is uninformative if the renderer refuses to make the break. See above for dingbats. (Many syllables) | ||||
᪥᪥ᩈᨾᩮᩣᨴ᩠ᨴᨾᩣᨶᩮᩉᩥ | with (things) on friendly terms
Pali <samoddamānehi> |
|||||
Baseless Marks and Non-alphabetic Bases | ||||||
ᩣ | (no meaning) | Bare vowel symbol | ||||
ᩣ | (no meaning) | Vowel symbol 'on' NBSP | ||||
ᩣ | (no meaning) | Vowel symbol 'on' NBSP with ZWJ. | ||||
ᨷ ◌ᩮ | N/A | And now discourage the use of multiple script runs by the renderer. | ||||
Dependent Consonant Above and Tone Mark - What Chooses the Order? | ||||||
ᨾ᩠ᩅ᩺᩵ | to be fun Khün /mon/ |
Typed as seen. Da Lekh fonts place the glyphs side by side, but the order is as in the Tai Khün manuscript. To be precise, it is an extract from a 1949 edition of the Khemarat Weekly, reproduced in L2/17-120 Figure 4. | ||||
ᨾ᩠ᩅ᩵᩺ | to be fun Khün /mon/ |
Typed with tone mark first. Da Lekh accepts the order, just as Thai does not rearrange THANTHAKHAT (or vowels above) with tone marks. The Da Lekh rendering does not match the Tai Khün manuscript. | ||||
ᨾ᩠ᩅ᩺᩵᩻ | to be lots of fun Khün /mon mon/ |
Not actually attested, but grammatical derivatives of the above. | ||||
ᨾ᩠ᩅ᩵᩺᩻ | to be lots of fun Khün /mon mon/ |
|||||
ᨣᩪ᩺᩻ | everyone Tai Lü /kun kun/ |
Theoretical derivative of the unetymological, phonetic spelling of the word for person. The first mark above is serving as a final consonant, not a cancellation mark. | ||||
|
||||||
ᨭᩮ᩠ᨮ ᨭᩛᩮ | (no meaning) /te:t/ /-t tʰe:/ |
Should be different. (2 syllables) | ||||
ᨱᩮ᩠ᨮ ᨱᩛᩮ | (no meaning) /ne:t/ /-n tʰe:/ |
Should be different. (2 syllables) | ||||
ᨲᩮ᩠ᨮ ᨲᩛᩮ | (no meaning) /te:t/ /-t tʰe:/ |
Should probably be different. (2 syllables) | ||||
ᨻᩮ᩠ᨻ ᨻᩛᩮ | (no meaning) /pe:p/ /-p pe:/ |
Should be different. (2 syllables) | ||||
ᨾᩮ᩠ᨻ ᨾᩛᩮ | (no meaning) /me:p/ /-m pe:/ |
Should be different. (2 syllables) | ||||
ᨠᩮ᩠ᩁ ᨻᩕᩮ | (no meaning) /keːn/ /kʰe/ |
Should be different. (2 syllables) | ||||
ᨠᩮ᩠ᩃ ᨠᩖᩮ | (no meaning) /keːn/ /keː/ |
Should be different. (2 syllables) | ||||
ᨠᩖᩮ ᨠ᩠ᩃᩮ | (no meaning) /keː/ /keː/ |
However, those who don't use MEDIAL LA won't make a visual distinction to show the position of the vowel! (2 syllables) |
||||
ᩈ᩠ᩈ ᩈᩞ ᩔ | (no meaning) | Should be different. (3 syllables) | ||||
ᨾ᩠ᨾ ᨾᩜ | (no meaning) | Should be different. (2 syllables) | ||||
Behaviour of <SAKOT, NYA> | ||||||
ᨬᩮ᩠ᨬ ᨬ᩠ᨬᩮ | (no meaning) /ɲeːn/ /-n ɲeː/ |
Would ideally be different, but this may not be readily and robustly achievable. (2 syllables) | ||||
ᨬᩮ᩠ᨬ ᨬ᩠ᨬᩮ | (no meaning) /ɲeːn/ /-n ɲeː/ |
Instead should be different. (2 syllables) | ||||
ᨱ᩠ᨬ ᨬ᩠ᨬ | (no meaning) | Should these be different? (2 syllables) | ||||
ᨱᩮ᩠ᨬ ᨱ᩠ᨬᩮ | (no meaning) /neːn/ /-n ɲeː/ |
Should these be different? (2 syllables) | ||||
Marks from outside the Tai Tham Block | ||||||
ᩋᩦ๊ | (meaningless syllable in refrain of a song)
/ʔiː/ |
Thai mai tri and mai chattawa are found on tua mueang 'words' on p236 of the big blue book! Of course, these might just be the unencoded THAI-LAO TONES THREE and FOUR. In this particular case, a rendering issue might be alleviated by making the default positions of the tone marks higher than that of the vowels above. | ||||
ᩋᩦ๋ | (meaningless syllable in refrain of a song)
/ʔiː/ |
|||||
Language-Sensitive Forms (Browser Test?) | ||||||
ᩌᩣᩴ ᩌᩣᩴ | bran (written twice) /ham/ |
The top two rows are declared to be in Lao, and the second also has a corresponding style-setting lest the language setting be ignored. The initial consonant takes the form in the Da Lekh family of the consonant form used in that role in Laos and Northeast Thailand, namely , which is only subtly different from U+1A41 TAI THAM LETTER RA. So doing may be improper behaviour, but is seen in fonts. The mai kang should appear on the vowel U+1A63 TAI THAM VOWEL SIGN AA, its usual position outside Thailand. Of course, this won't happen if the font cannot be appropriate for such writing systems. At least one browser has failed to render the final stack properly when it has been the final glyph in the glyph stream; this is why the word is written twice. The bottom row is not marked for language, and shows the same word (and encoding). The Da Lekh font follows the more technically challenging Chiangmai style by default, with the MAI KANG on the consonant. (2 words, so 2 syllables!) |
||||
ᩌᩣᩴ ᩌᩣᩴ | bran (written twice) /ham/ |
|||||
ᩌᩣᩴ ᩌᩣᩴ | bran (written twice) /ham/ |
|||||
Tone before Vowel! | ||||||
ᨣ᩠ᩅ᩵ᩢᩣ᩠ᨶ | when kan waː | p118 - MFL clearly has the tone as the first mark! It may be that these are just typing errors. There are two other examples of tone and then vowel in the dictionary, the same tone and vowel as here. | ||||
ᨣ᩠ᩅ᩵ᩢᩣ | and say kɔʔ waː |
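The soft hyphen and ZWSP rows in the table above come in pairs, with and without leading dingbats (U+1AA5). As a rough sketch of how such variants could be generated for another word, the Python below inserts the break character at a chosen index. The function name `break_variants` and its parameters are hypothetical, and the break position is left to the caller, since a sensible position depends on the syllable structure of the word.

```python
SOFT_HYPHEN = "\u00ad"  # SOFT HYPHEN
ZWSP = "\u200b"         # ZERO WIDTH SPACE
DINGBAT = "\u1aa5"      # the Tai Tham dingbat used in the table (U+1AA5)


def break_variants(word, index, dingbats=3):
    """Build test strings with a break opportunity inserted at index.

    Returns a dict with four entries: the word split by a soft hyphen and
    by ZWSP, each with and without a prefix of `dingbats` dingbats.
    """
    variants = {}
    for name, breaker in (("soft hyphen", SOFT_HYPHEN), ("ZWSP", ZWSP)):
        split = word[:index] + breaker + word[index:]
        variants[name] = split
        variants[name + " with dingbats"] = DINGBAT * dingbats + split
    return variants
```

For example, `break_variants(word, 5)` would place the break after the fifth code point of `word`; whether the renderer then breaks the line there is exactly what the test rows are meant to reveal.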
These words presented problems, now overcome (as of Version 0.05), while I was developing the Da Lekh font to work around the Universal Shaping Engine of mid 2016. (The solution is not entirely compliant with the Unicode standard: dotted circles in the input are sometimes deleted.) They are offered as an aid to font developers fighting unhelpful layout engines; they are not expected to help developers of the core layout engines.
Text | Meaning and Pronunciation | Encoding | Hacked via ASCII | Remarks |
---|---|---|---|---|
ᨠ᩠ᩃ᩻ᩬ᩵ᨾ | Cambodian /kălɔːm/ |
p4 (2 syllables, second a single consonant) | ||
ᨠ᩠ᩃᩬ᩵᩻ᨾ | Hard to interpret encoding. | |||
ᨠ᩠ᩃᩬ᩻᩵ᨾ | USE-compatible, with interpretable rendering specification. | |||
ᨠᩕᩥ᩠᩵ᨦ | suspicious /kʰiŋ/ |
p15 | ||
ᨡᩮᩢ᩶ᩬᩣ᩠ᨦ | belongings /kʰau kʰɔːŋ/ |
p101 - the one syllable form. The first form minimises the disruption to the pattern of first element followed by second element. The second spelling tries sticking in CGJ to advise that the ordering of the marks is not an error. The third spelling follows the principle that if the components cannot be concatenated (with deletion and addition of SAKOT or equivalent as appropriate), then the ordering should be based on the visual layout of the marks. |
||
ᨡᩮᩢ᩶͏ᩬᩣ᩠ᨦ | ||||
ᨡᩮᩬᩢ᩶ᩣ᩠ᨦ | ||||
ᨡᩮᩢᩬᩣ᩠᩶ᨦ | USE (December 2021)-compatible rearrangement of the above - but the final consonant is still incompatible at 2021. | |||
ᨦ᩠ᩅ᩶ᩣ᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ | spastic /ŋwaː ŋwaː soːŋ soːŋ/ |
p168 (2 syllables) | ||
ᨦ᩠ᩅᩣ᩶᩻ ᨪᩰᩫ᩠᩶ᨦ᩻ | Vowel and tone order adjusted to the USE as at December 2021. | |||
ᨴᩯ᩠᩶ᩃ | truth to tell /tɛː lɛː/ |
p318. The first entry has the written vowel with the first consonant, the second with the second, and the third entry is the same as the second but normalised. | ||
ᨴ᩠᩶ᩃᩯ | ||||
ᨴ᩠᩶ᩃᩯ | ||||
ᨳᩮᩬᩥᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ | bruised /tʰɤʔ tʰɤʔ tʰɤːk tʰɤːk/ |
p314. (2 syllables) | ||
ᨳᩮᩥᩬᩡ᩻ ᨳᩮᩥ᩠ᨠ᩻ | With vowel order of the USE as at December 2021 | |||
ᨾᩉᩫᩖᨿᩰᨴᩤ | great army /maʔ hon yoː tʰaː/ |
NTDPLM p511. |
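One row above notes that an entry is "the same as the second but normalised". Whether two encodings are canonically equivalent, and so may be interchanged by normalisation before the renderer sees them, can be checked mechanically. The Python below is a minimal sketch using the standard `unicodedata` module; the function names are illustrative, and the example pairs in the comments are my own choices rather than entries from the table.

```python
import unicodedata


def canonically_equivalent(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def combining_classes(text):
    """Return (code point, canonical combining class) pairs for text."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


# SAKOT (U+1A60) and a tone mark (U+1A76) have different non-zero
# combining classes, so normalisation may reorder them.
print(canonically_equivalent("\u1a60\u1a76", "\u1a76\u1a60"))

# SIGN AA (U+1A63) has combining class 0, so its order relative to a
# tone mark is significant and survives normalisation.
print(canonically_equivalent("\u1a76\u1a63", "\u1a63\u1a76"))

print(combining_classes("\u1a60\u1a76\u1a63"))
```

The practical consequence is that a font cannot rely on receiving SAKOT and a tone mark in the order typed, whereas the order of a tone mark and SIGN AA is a genuine difference in spelling.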
Short Name | Full Reference |
---|---|
N3207R | Everson M., Hosken M. & Constable P. Revised proposal for encoding the Lanna script in the BMP of the UCS, ISO/IEC JTC1/SC2/WG2/N3207R, L2/07-007R |
MFL | Rungrueangsi, Udom (2004) [1991]. Lanna-Thai Dictionary, Princess Mother Version พจนานุกรมล้านนา ~ ไทย ฉบับแม่ฟ้าหลวง ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣ ~ ᨴᩱ᩠ᨿ ᨨᨷᩢ᩠ᨷᨾᩯ᩵ᨼ᩶ᩣᩉᩖ᩠ᩅᨦ [Photchananukrom Lanna ~ Thai, Chabap Maefa Luang] (in Thai) (Revision 1 ed.). Chiang Mai: Rongphim Ming Mueang (โรงพิมพ์มิ่งเมือง). ISBN 974-8359-03-4. |
big blue book | Wacharasat, Bunkhit (2003). Language of Mueang Lanna ᨽᩣᩈᩣᨾᩮᩬᩨᨦᩃ᩶ᩣ᩠ᨶᨶᩣ ภาษาเมืองล้านนา [Phasa Mueang Lanna] (in Thai). ISBN 974-85472-0-5 |
Apiradee | Techasiriwan, Apiradee อภิรดี เตชะศิริวรรณ. พัฒนาการของอักษรและอักขรวิธีในเอกสารไทลื้อ [Patthanakan khong Akson lae Akhara Witi nai Ekasan Thai Lue] Development of Tai Lue Scripts and Orthography. MA Thesis, Chiangmai University (in Thai) |
NTDPLM | Arunrat Wichiankhiao et al. อรุณรัตน์ วิเชียรเขียว (1996). ᨻᨧᨶᩣᨶᩩᨠᩕᩫ᩠ᨾᩃ᩶ᩣ᩠ᨶᨶᩣᨨᨻᩕᩰᩬᩡᨣᩤᩴᨴᩦ᩵ᨷᩕᩤᨠᩫ᩠ᨭᨶᩱᨷᩱᩃᩣ᩠ᨶ พจนานุกรมศัพท์ล้านนาเฉพาะคำที่ปรากฏในใบลาน The Northern Thai Dictionary of Palm-Leaf Manuscripts. ISBN 974-7067-77-2 |
Chiengtung | Chieng Tung: Its Way of Life ᨡᩮᨾᩁᨭᩛᨶᨣᩬᩁᨩ᩠ᨿᨦᨲᩩᨦ [Khemarattha Nakon Cheng Tung] เขมรัฐนครเชียงตุง [Khemarat Nakhon Chiang Tung] (in Thai, Tai Khün, French and English) Chiang Mai: Wat Tha Kradas (วัดท่ากระดาษ) |
L2/17-120 | Wordingham J.R. Corrections to the Indic Syllabic Category for the Tai Tham Script, L2/17-120 |
N3384 | Hosken M. Tai Tham Subjoined Variants, ISO/IEC JTC1/SC2/WG2/N3384, L2/08-073 |
This is Version 2.12 of the web page, which has been written by Richard Wordingham.
Version | Date | Changes |
---|---|---|
1.0 | 14 June 2015 | Initial 'stable' (i.e. abandoned) version. Work had started on 27 February 2015, and there may be earlier versions around. |
1.1 | 25 September 2016 | Converted from XML to HTML (by stripping off XML header) for new website. |
2.0 | 25 October 2016 | Added option to dynamically switch fonts - free font Da Lekh Seri for exposure to rendering engine foibles, and encumbered font Da Lekh for resistance. Both fonts are open source, but I created all the inked glyphs for the Da Lekh Seri font. ('Seri' means beholden to no-one.) Completed references, and improved, pruned and extended the examples. |
2.1 | 26 October 2016 | Corrected typos. Started testing of display bases. |
2.2 | 7 November 2016 | Fixed transliterator bug. Added examples from testing of Da Lekh font work-arounds. Corrected more typos. Tested language sensitivity. |
2.3 | 14 November 2016 | Added styles to force Lao forms. Reorganised 'test and tell'. Added one new test word, for mai kam followed by mai sam. |
2.4 | 14 April 2017 | Improved 'bran' bug alert. Added 'A Tai Tham KH' font with and without ccmp enabled. The radio buttons are hidden, and anyone enabling them would also have to supply the font. Added test for double acting MEDIAL LA. |
2.5 | 8 July 2017 | Added test for tone plus SIGN OY. Added colour fonts to show phonetic position of subscripts relative to vowel. Added "onclick" for radio buttons. |
2.6 | 22 February 2018 | Added test cases for karan on vowels and medial la following preposed vowel. |
2.7 | 12 May 2018 | Added three new fonts - 'A Tai Tham KH', 'Hariphunchai' and my extension of the latter, 'Lamphun'. Added a few more examples of the ᨶᩣ ligature. Added query as to when ᨲᩬᩴ᩵͏ᩯ᩠ᨶ should render properly. Colour for spell-checking is now a reality. |
2.8 | 17 February 2019 | Added test cases for ᩃᩮᩞ and ᨻᩕ᩠ᨿᩮᩡ. |
2.9 | 9 December 2021 | Corrected feature ss99 to ss19. Clarified rôle of Da Lekh Si fonts. Updated repertoire of Da Lekh Seri fonts. Added play area for readers to try the fonts out. Massively duplicated navigation bars to avoid need to scroll to top or bottom. Made the difference between strings that shall be rendered and other possible sequences clearer. Promoted four test and tell cases to test cases - three sequences for displaying marks and one for the combination of RA HAAM and MAI SAM. Linked to my font compiler. |
2.10 | 30 December 2021 | Noted that colour now works even in IE 11 and also in LibreOffice. Added most recent (2019) version of Hariphunchai, dubbed Hariphunchai 4. Added USE-compatible encodings to avoid maligning any fonts that assume a USE-compatible encoding. |
2.12 | 23 January 2022 | (Including changes to 2.11). Fixed miscellaneous typos, including alternative encodings of ᨶ᩶ᩣᩴ. Changed shortcomings of 'my fonts' to shortcomings of 'Da Lekh'. Changed site from HTTP to HTTPS. Changed background for USE encoding from red to orange to avoid clash with coloured fonts. |
This web page has been developed with frequent testing on Firefox Version 54 and occasional viewing using Safari on iPhone (iOS 10.3.2), IE 11 (on Windows 7) and Microsoft Edge (on Windows 10).
Switching fonts has been tested in all these browsers.
You may freely use my four fonts mentioned here without modification, and may freely examine them. See the respective licences for the conditions of use and modification. I do not own all the intellectual property rights for the Da Lekh and Da Lekh Si fonts. The fonts are available as follows:
Name | Font file | Source file | Licence file |
---|---|---|---|
Da Lekh (ᨯᩣᩃᩮ᩠ᨡ) | dalekh.ttf | File dalekh.txt in dalekh.zip. This is also the ultimate source code for the Da Lekh Seri font. See Makefile therein for preprocessing directives. | DejaVu licence |
Da Lekh Si (ᨯᩣᩃᩮ᩠ᨡᩈᩦ) | dalekh_si.ttf | ||
Da Lekh Seri (ᨯᩣᩃᩮ᩠ᨡᩈᩮᩁᩥ) | dalekh_seri.ttf | Either start from the source code for the Da Lekh font, which is subject to the DejaVu licence, or use the preprocessed file dalekh_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code: `cc -E -fdirectives-only -DSERI -x c dalekh.txt | grep -v ^# >| dalekh_seri.txt` | seri_license.htm |
Da Lekh Si Seri (ᨯᩣᩃᩮ᩠ᨡᩈᩦᩈᩮᩁᩥ) | dalekh_si_seri.ttf | Either start from the source code for the Da Lekh font, which is subject to the DejaVu licence, or use the preprocessed file dalekh_si_seri.txt. If the GNU Compiler Collection is available, one may use the following command to generate the immediate 'source' code: `cc -E -fdirectives-only -DSERI -DCOLOUR -x c dalekh.txt | grep -v ^# >| dalekh_si_seri.txt` |
If you wish to have WOFF files, you should either generate them yourself from the font files listed above, or simply copy them from this website.
The fonts are generated from the source code by means of a DIY font compiler that still has many rough edges. However, the source code of the font, although spartanly commented, may make it clearer what the font is attempting to do. I have endeavoured to make reverse engineering unnecessary.
The font Da Lekh is partly intended for my practical use in analysing material in the Tai Tham script. It therefore contains a large set of Latin characters to support transcription and transliteration. It also contains work-arounds so that it may render properly despite problems with rendering engines.
The other purpose of the fonts is to explore issues in making an OpenType font for the Tai Tham script.
The font Da Lekh Seri is an unencumbered font intended for testing rendering engines. It therefore has, besides the glyphs for Tai Tham writing systems, just a bespoke set of (poor) ASCII glyphs; both the extra characters required by Microsoft Office and the characters recommended for the Universal Shaping Engine; and the characters needed for transliteration style (feature ss04) and their closure under NFC. Known existing work-arounds have been removed. This removal is implemented by compiler directives.
The font Da Lekh Si (ᨯᩣᩃᩮ᩠ᨡᩈᩦ) differs from Da Lekh in that it aims to reveal the spelling of words. This is useful when using a spell-checker, for example on Firefox. The ideal is that subscript consonants in the coda of an orthographic syllable would be distinguished from those in the onset by colour, whence the word 'Si' in the name of the font. The colour technology used works in the dominant browsers (Chrome, Safari, Firefox, MS Edge and even Internet Explorer 11) and in the word processor of LibreOffice. The colouring is also applied to chained syllables.
It is possible that Da Lekh Si may be reduced to an optional OpenType feature applied to the Da Lekh font.
The font Da Lekh Si Seri is an unencumbered font that colours glyphs in the same way. Like Da Lekh Seri, it deliberately lacks work-arounds for problems with renderers. It is intended as an aid for the development of the Da Lekh Si font.
The Lamphun font is available under the SIL open font licence; the applicable customisation declares that "Hariphunchai" and "Lamphun" are reserved font names. The font file is lamphun.otf and what I have used as 'source' code to build the font is an untidy mess assembled in lamphun.zip:
Rôle | Name | Remarks |
---|---|---|
Glyphs | Hariphunchai.otf | A version of the font dated 5 May 2014, taken from SourceForge. The 'unique identifier' in the name table is FontForge : Hariphunchai : 5-5-2014. There were later .sfd and .fea files at the same location, but at best they offered improved glyphs compared to Lamphun. This is the file that defines the 'early' Hariphunchai font as used on this web page. |
OTL tables | lamphun.txt | This defines a font with the same glyph numbering, but with blank glyphs. I then replace 7 tables in the early Hariphunchai font with tables from this new font: |
Change log | fontlog.txt | Only for Lamphun. |
Make file | lamphun_makefile | The compiler invoked by '~/oft/parse' is my DIY font compiler. |
It is likely that I will create a variant coloured to indicate spelling.
There are two versions of the Hariphunchai font used on this page. The fonts themselves are distinguished by the unique font identifiers in their name tables.
The early version of the Hariphunchai font, whose 'unique identifier' is FontForge : Hariphunchai : 5-5-2014, is available as Hariphunchai.otf both on SourceForge and within the Lamphun source zip file. The reversibly generated WOFF file is available here.
The 2019 version, whose 'unique identifier' in the name table is TragerStudio : Hariphunchai : 19-5-2019, is available as Hariphunchai4.otf on SourceForge. The reversibly generated WOFF file is available here.
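To check which version of a font file one actually has, the unique font identifier (name ID 3) can be read directly from the name table. The following Python is a minimal sketch using only the standard library. It assumes a plain, uncompressed TrueType or OpenType file (not a WOFF file or a font collection), and the function name `unique_font_identifier` is my own illustrative choice.

```python
import struct


def unique_font_identifier(data):
    """Return the first name ID 3 string in raw sfnt bytes, or None.

    Assumes an uncompressed TTF/OTF file; WOFF and TTC are not handled.
    """
    # Offset table: sfnt version (4 bytes), then the number of tables.
    num_tables = struct.unpack_from(">H", data, 4)[0]
    name_offset = None
    for i in range(num_tables):
        # Each 16-byte table record: tag, checksum, offset, length.
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        if tag == b"name":
            name_offset = offset
            break
    if name_offset is None:
        return None

    # Name table header: format, record count, offset to string storage.
    _format, count, string_offset = struct.unpack_from(">HHH", data, name_offset)
    for i in range(count):
        platform, _encoding, _language, name_id, length, offset = \
            struct.unpack_from(">HHHHHH", data, name_offset + 6 + 12 * i)
        if name_id == 3:
            start = name_offset + string_offset + offset
            raw = data[start:start + length]
            # Assumption: Macintosh-platform strings are Mac Roman;
            # all other platforms are treated as UTF-16BE.
            codec = "mac_roman" if platform == 1 else "utf-16-be"
            return raw.decode(codec, errors="replace")
    return None


# Example use (the file must be supplied by the reader):
#     with open("Hariphunchai.otf", "rb") as f:
#         print(unique_font_identifier(f.read()))
```

A font may carry several name ID 3 records for different platforms; this sketch simply returns the first one it finds.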
The licence is available on SourceForge. The WOFF files, being derivative works, are licensed under the same licence. As the original OTF files can be recovered from them, they preserve the font names.