Dear Yong Peng and others,
I think I recently mentioned to Harry/Yong that I am working on a search function for the Pali Text Reader. Part of that effort resulted in the word list mentioned by Yong Peng (which I corrected slightly and sorted according to the Pali alphabet over the last 2 weeks, not manually though :-)). For anyone interested, I set up a subdomain which, although not finished yet, is another result of ongoing Pali Text Reader development. It seems that the small tool set I built for corpus-related analysis of the Pali Canon is evolving into a program of its own:
http://paliconcordance.nibbanam.com
You may have a look for yourself. The problem I may face, I fear, is the webspace available to me: the concordance, if uploaded, is going to be huge (several GBs). Even generating such a concordance once made my computer shut itself down, a safety measure against overheating :-)
Side note: the link above only serves the vowels right now ...
My slightly corrected word list now stands at 960,000+ unique "terms". With some further time spent on the corpus toolset, it won't be a big problem to extract, for instance, all verbs (ending in -*ti) from the list (let me know if you need such a sub-list, Yong!), etc. It's just a question of the proper algorithm. In any case, such a word list is really helpful, be it for developing a simple automatic translation tool or a fast and comprehensive search tool.
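To give a rough idea of what such an extraction could look like, here is a minimal Python sketch. It is not the actual toolset code: the function name and the sample words are made up for illustration, and it assumes the word list is plain text with one word per line. Note that a pure suffix test only yields candidates, since non-verbs such as the noun "gati" also end in -ti.

```python
def words_ending_in(words, suffix="ti"):
    """Return the unique words ending in `suffix`, in first-seen order.

    Hypothetical helper for illustration; a suffix match alone cannot
    tell verbs from other words that happen to end the same way.
    """
    seen = set()
    result = []
    for word in words:
        w = word.strip()  # drop surrounding whitespace and line breaks
        if w and w.endswith(suffix) and w not in seen:
            seen.add(w)
            result.append(w)
    return result


# Made-up sample standing in for the real word list (an assumption).
sample = ["gacchati", "buddho", "karoti", "dhammaṃ", "gacchati", "gati"]
candidates = words_ending_in(sample)
print(candidates)
```

The same function could be fed the lines of an open file instead of a list, so the full word list would not have to be held in memory twice.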
mettâya,
Lennart
----- Original Message -----
From: Ong Yong Peng
To: Pali@yahoogroups.com
Sent: Sunday, February 19, 2006 7:05 AM
Subject: [Pali] Re: self-introduction
Dear Ven. Pandita, Harry, Badra and friends,
it is also one of the objectives of Pali Scope to include the
paradigms of all verbs, nouns and pronouns.
According to the word list generated by Lennart (see msg#9807), there
are more than 900,000 unique words in the Tipitaka. The long list takes
up 26,382 pages when opened in MS Word! I worked that out to be
approximately 976,134 unique word forms. Based on my understanding of
the Pali language and the texts, I worked out a conservative estimate
of 43,000 uninflected words (I suspect the actual number to be less,
there are roughly 20,000 entries in PED).
I suggest that we work together to set these up, which will be very
useful for Pali students.
metta,
Yong Peng.
--- In Pali@yahoogroups.com, Harry Liew wrote:
Please kindly let me know if you need a volunteer.
> 1. To get all possible noun stems and case-endings from the input of a noun
> 2. To get all possible verbal stems, verbal endings, and paradigmatic forms from the input of a conjugated verb
> 3. To set up a dictionary available in the public domain to help
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Paa.li-Parisaa - The Pali Collective
[Homepage] http://www.tipitaka.net
[Files] http://www.geocities.com/paligroup/
[Send Message] pali@yahoogroups.com