Hi Andy
This list of word frequencies would be a real help to learners, and I would
love to get a copy of it. It also demonstrates that Pali is a pretty
typical language.
However, knowing the word is only a start - what it means and how it is
used are fairly important information too. What could also be helpful,
beyond a simple word list (and you may have already done this anyway) is to
have detailed information on the *form* (case, gender, etc) as well as the
root(s) so it can be easily traced in dictionaries for meanings etc. One of
the difficulties for beginners is finding the correct entry in dictionaries.
This may not help with selecting the appropriate meaning, or indeed
working out the actual meaning from the suggestions in the dictionaries, for
the passage in question, but it would be a quick and useful tool. How many
of us have been saved grief by a quick look in Whitney's Verb Forms, for
example, in Sanskrit?
On the subject of useful references - you will also be aware of the work of
Yamazaki and Ousaka on Pada indices. These help heaps in finding that
elusive passage.
Robert Didham
>From: "Andy" <721910352@...>
>Reply-To: Pali@yahoogroups.com
>To: <Pali@yahoogroups.com>
>Subject: [Pali] Pali by Numbers - 1
>Date: Sat, 27 Apr 2002 07:34:15 -0700
>
>Hi!
>
>Some good news for beginning Pali students and Pali teachers?
>
>Over the last few months, I have been doing some research into word
>frequency in the Pali Canon. Brother Jim at Aukana in England produced a
>list of unique word *forms* in the Pali Canon on the CSCD and the "count" or
>frequency for each of those unique word forms (i.e. dhamma has a count,
>dhammena has a count, etc.).
>
>I took that list, sorted it by frequency and discovered something amazing.
>The top 1000 word *forms* (exactly as you see them in the Pali Canon)
>account for 55% of all the words you will see in the Pali Canon.
>
>Oddly enough, many of these words do not appear at all in the free Pali
>courses for beginners.
>
>The Pali Canon on the CSCD has a total of approx. 2,700,000 words.
>
>The total number of unique word forms in the Pali Canon is 152,922.
>
>Total entries in the Paliwords dictionary: 20,119
>
>Here is a summary of the frequency breakdown of word *forms*:
>
>001 - 100 843,592 occurrences (31%!)
>101 - 200 169,092
>201 - 300 109,124
>301 - 400 83,892
>401 - 500 66,760
>501 - 600 57,025
>601 - 700 48,312
>701 - 800 42,217
>801 - 900 37,467
>901 - 1000 33,361
>Top 1000 word forms total: 1,490,842 approx. 55% of all words.
>
>When I began my Pali studies, I became a little bit frustrated. I was
>studying lots of words and grammar, but I still had a lot of trouble
>actually reading the Pali texts. After a while, it occurred to me that the
>beginning Pali courses simply did not contain many of the "most common
>words" and "most common word forms".
>
>Obviously, if you study a word you wish to be certain that the word is used
>very frequently in the Pali Canon. By studying the word *forms* you can
>learn:
>a) important vocabulary
>b) important vocabulary exactly as you will see the word in the Pali Canon
>and
>c) the grammar of that word form in a "meaningful, useful and
>easy-to-remember" context.
>
>Personally, I am firmly convinced that this word study list is the "missing
>link" for beginning Pali students. After all, it's not hard to memorize 1000
>word *forms* and that is 55% of the words *as you see them* in the Pali
>Canon.
>
>Keep in mind: many of these words are *also* used in compound words (and the
>basewords occur using less common declensions). The 55% number does *not*
>include the use of these words in compound words and their use with less
>common declensions!
>
>So that you can get a look at the top 1000 word forms in the word list (and
>so that anyone can use it!), I would like to upload the list to the "Files"
>section as an MS-Works spreadsheet using the "LeedsBit PaliTranslit" font.
>The list has the word form and the count for that word form. Hopefully both
>teachers and students can use this list to "optimize" their Pali course
>work.
>
>I will post it as a spreadsheet so that people can easily sort it by "total
>word form occurrences" or alphabetically. The file is 12K zipped and 30K
>unzipped.
>
>Would you like me to upload the file? How can I do this?
>
>peace from
>
>Andy
>
>
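
A note on the arithmetic in the quoted table: the band totals can be summed
and divided by the approximate corpus size to check the 31% and 55% figures.
What follows is only a minimal Python sketch, not the method Andy actually
used. The names BAND_TOTALS, APPROX_TOTAL_WORDS and cumulative_coverage are
made up for illustration, and 2,700,000 is the approximate total given in the
message, not an exact count.

```python
# Band totals copied from the table in the quoted message
# (ranks 1-100, 101-200, ..., 901-1000).
BAND_TOTALS = [843_592, 169_092, 109_124, 83_892, 66_760,
               57_025, 48_312, 42_217, 37_467, 33_361]

# Approximate corpus size quoted in the message (an estimate, not exact).
APPROX_TOTAL_WORDS = 2_700_000


def cumulative_coverage(band_totals, total_words):
    """Return a list of (running_total, share_of_corpus), one per band."""
    rows = []
    running = 0
    for band in band_totals:
        running += band
        rows.append((running, running / total_words))
    return rows


coverage = cumulative_coverage(BAND_TOTALS, APPROX_TOTAL_WORDS)

# Print the running total and the share of the corpus after each band.
for i, (running, share) in enumerate(coverage):
    first, last = i * 100 + 1, (i + 1) * 100
    print(f"{first:4d} - {last:4d}  cumulative {running:>9,}  {share:.1%}")
```

Since the corpus total is itself approximate, the resulting percentages are
probably only reliable to about a whole percent.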