--- In Pali@yahoogroups.com, "Ole Holten Pind" <oleholtenpind@...> wrote:
>
> Dear Dmytro,
>
> Your list would be very interesting to me. I regard distribution lists
> to be the way to solid linguistic research of the language of the canon.
>

Do you think it might be a good idea to trim the corpus first? There
are so many repeated formulas, both in doctrinal sections and
elsewhere, that many words show a much higher frequency than they
ought to.

Perhaps one approach would be to take all passages that are repeated
verbatim (such as doctrinal lists and opening formulas) and count each
of them only once in the sample.
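
As a rough sketch of what I mean, here is the trimming in Python. It
assumes the corpus has already been split into passages, which is the
hard part and is not addressed here. The function name
word_frequencies, the dedupe flag and the toy input are all made up for
illustration.

```python
from collections import Counter

def word_frequencies(passages, dedupe=True):
    """Count words; with dedupe=True, a verbatim-repeated passage counts once."""
    seen = set()
    counts = Counter()
    for passage in passages:
        words = passage.split()
        # Normalize whitespace so copies that differ only in spacing or
        # line wrapping compare equal.
        key = " ".join(words)
        if dedupe:
            if key in seen:
                continue  # this passage was already counted
            seen.add(key)
        counts.update(words)
    return counts

# Toy input, invented for illustration (Velthuis transliteration).
passages = [
    "eva.m me suta.m",
    "eka.m samaya.m bhagavaa saavatthiya.m viharati",
    "eva.m me suta.m",
    "eka.m samaya.m bhagavaa raajagahe viharati",
]

raw = word_frequencies(passages, dedupe=False)
trimmed = word_frequencies(passages, dedupe=True)
```

Only passages that are identical after whitespace normalization are
collapsed; near-duplicates that differ by a single word (a place name,
say) are still counted in full, so this is at best a first pass.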

I suppose it would depend on the purpose of the frequency list. For
pedagogical purposes it's probably a good idea that upasa.mkamitvaa
and saavatthiya.m have artificially high rankings. But for studies of
the language itself, such inflated rankings would appear to be
misleading.

best regards,

/Rett