Hi!

Some good news for beginning Pali students and Pali teachers!

Over the last few months, I have been doing some research into word
frequency in the Pali Canon. Brother Jim at Aukana in England produced a
list of unique word *forms* in the Pali Canon on the CSCD and the "count" or
frequency for each of those unique word forms (i.e. dhamma has its own
count, dhammena has its own count, etc.).

I took that list, sorted it by frequency and discovered something amazing.
The top 1000 word *forms* (exactly as you see them in the Pali Canon)
account for 55% of all the words you will see in the Pali Canon.
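
For anyone who wants to repeat this kind of check on their own list, here is
a rough Python sketch of the calculation. The function name and the toy
counts are my own inventions for illustration (they are not real Canon
figures, and this is not Brother Jim's actual method); it simply assumes you
have each word form paired with its count.

```python
def top_n_coverage(form_counts, n):
    """Fraction of all running words covered by the n most frequent forms.

    form_counts maps each unique word form to its number of occurrences.
    """
    total = sum(form_counts.values())
    if total == 0:
        return 0.0
    # Sort the counts from most to least frequent and keep the top n.
    top = sorted(form_counts.values(), reverse=True)[:n]
    return sum(top) / total


# Toy counts, invented for illustration only (NOT real Canon figures).
toy_counts = {"ca": 50, "na": 30, "dhamma": 12, "dhammena": 5, "bhikkhave": 3}

print(f"{top_n_coverage(toy_counts, 2):.0%}")  # prints 80%
```

With the full list of word forms in place of the toy data, calling the
function with n = 1000 should give the kind of figure quoted above.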

Oddly enough, many of these words do not appear at all in the free Pali
courses for beginners.

The Pali Canon on the CSCD has a total of approx. 2,700,000 words.

The total number of unique word forms in the Pali Canon is 152,922.

Total entries in the Paliwords dictionary: 20,119

Here is a summary of the frequency breakdown of word *forms*:

001 - 100 843,592 occurrences (31%!)
101 - 200 169,092
201 - 300 109,124
301 - 400 83,892
401 - 500 66,760
501 - 600 57,025
601 - 700 48,312
701 - 800 42,217
801 - 900 37,467
901 - 1000 33,361
Top 1000 word forms total: 1,490,842 (approx. 55% of all words).
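
The arithmetic in the breakdown above can be double-checked with a short
Python sketch. The bucket totals are copied from the list; the 2,700,000
total is only the approximate word count for the Canon, so the percentages
are approximate too. The names used here are just for illustration.

```python
# Occurrences per block of 100 word forms, copied from the breakdown above.
BUCKET_TOTALS = [
    843_592, 169_092, 109_124, 83_892, 66_760,
    57_025, 48_312, 42_217, 37_467, 33_361,
]

# Approximate total number of words in the Canon (an approximation).
APPROX_CANON_WORDS = 2_700_000


def cumulative_coverage(bucket_totals, total_words, bucket_size=100):
    """Return (top_n, running_total, fraction_of_all_words) for each bucket."""
    rows = []
    running = 0
    for i, count in enumerate(bucket_totals, start=1):
        running += count
        rows.append((i * bucket_size, running, running / total_words))
    return rows


rows = cumulative_coverage(BUCKET_TOTALS, APPROX_CANON_WORDS)
for top_n, running, fraction in rows:
    print(f"top {top_n:4d}: {running:9,d} occurrences ({fraction:.0%})")
```

The last row gives 1,490,842 occurrences for the top 1000 forms, which
matches the total above and rounds to 55%.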

When I began my Pali studies, I became a little bit frustrated. I was
studying lots of words and grammar, but I still had a lot of trouble
actually reading the Pali texts. After a while, it occurred to me that the
beginning Pali courses simply did not contain many of the "most common
words" and "most common word forms".

Obviously, if you study a word, you want to be sure that it is used
frequently in the Pali Canon. By studying the word *forms* you can
learn:
a) important vocabulary
b) important vocabulary exactly as you will see the word in the Pali Canon
and
c) the grammar of that word form in a "meaningful, useful and
easy-to-remember" context.

Personally, I am firmly convinced that this word study list is the "missing
link" for beginning Pali students. After all, it's not hard to memorize 1000
word *forms* and that is 55% of the words *as you see them* in the Pali
Canon.

Keep in mind: many of these words are *also* used in compound words (and the
base words also occur in less common declensions). The 55% figure does *not*
include the use of these words in compound words or in those less common
declensions!

So that you can get a look at the top 1000 word forms in the word list (and
so that anyone can use it!), I would like to upload the list to the "Files"
section as an MS-Works spreadsheet using the "LeedsBit PaliTranslit" font.
The list has the word form and the count for that word form. Hopefully both
teachers and students can use this list to "optimize" their Pali course
work.

I will post it as a spreadsheet so that people can easily sort it by "total
word form occurrences" or alphabetically. The file is 12K zipped and 30K
unzipped.

Would you like me to upload the file? How can I do this?

peace from

Andy

