Nicholas Bodley wrote:
> My primary interest is in correct electronic rendering of text in
> these languages, [...]
So one of your interests is primary? ;-)
> What I'm wondering is whether dictionary lookup is desirable (or
> almost essential) in Chinese, Japanese, and Korean. My knowledge of
> these languages is optimistically described as spotty, but I suspect
> that some, if not all, can look bad if line breaks appear almost
> anywhere. I simply don't know Korean at all, but it seems that in
> Japanese and Chinese, small sequences of characters are better left
> intact, not split across lines.
For Chinese, I have noticed that there are three styles of line breaking,
with an increasing degree of complexity:
1) Just break anywhere in the text, including *before* a punctuation
mark (resulting in a full stop or comma at the beginning of a line), or
after an opening parenthesis or quotation mark (resulting in opening
punctuation at the end of a line). This style is common in the body text of
daily newspapers.
2) Break anywhere between two Chinese logographs, but not before a
punctuation mark, after an opening parenthesis, inside a number written in
Arabic digits, etc. This corresponds to the Unicode line-break algorithm and
to the common practice in typography.
3) As 2, but also don't split two Chinese logographs that are part of the
same compound word, which requires human intervention or a dictionary-based
algorithm. I have heard that this is used in fine typography, but I have
never actually seen it.
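
To make the difference between the three styles concrete, here is a rough
Python sketch. It is only an illustration under my own assumptions, not the
Unicode line-break algorithm: the function name break_positions, the small
punctuation sets, and the compounds word list are all hypothetical, and a
real implementation would need far more character classes and a proper
dictionary.

```python
# Illustrative only: tiny hand-picked punctuation sets, not a full table.
CLOSING = set("，。、；：？！）」』》")   # must not start a line
OPENING = set("（「『《")               # must not end a line
DIGITS = set("0123456789")              # Arabic (ASCII) digits


def break_positions(text, style, compounds=()):
    """Return indices i where a line break may fall between
    text[i - 1] and text[i], for style 1, 2 or 3 as described above."""
    allowed = []
    for i in range(1, len(text)):
        before, after = text[i - 1], text[i]
        if style >= 2:
            # No break before closing punctuation or after opening punctuation.
            if after in CLOSING or before in OPENING:
                continue
            # No break inside a run of Arabic digits.
            if before in DIGITS and after in DIGITS:
                continue
        allowed.append(i)

    if style >= 3:
        # Block every position that falls inside a known compound word.
        blocked = set()
        for word in compounds:
            start = text.find(word)
            while start != -1:
                blocked.update(range(start + 1, start + len(word)))
                start = text.find(word, start + 1)
        allowed = [i for i in allowed if i not in blocked]

    return allowed


sample = "我在2024年去了北京（中国）。"
style1 = break_positions(sample, 1)
style2 = break_positions(sample, 2)
style3 = break_positions(sample, 3, compounds=("北京", "中国"))
```

Style 1 allows a break at every position, style 2 removes the positions
around punctuation and inside the number, and style 3 additionally removes
the positions inside the listed compounds. Matching compounds by plain
substring search is a simplification; a dictionary-based segmenter would
have to resolve overlapping and ambiguous matches.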
_ Marco