Re: Comparison of Pali suttas in different versions

Hi Chanida,

Below is a copy of an email I posted to the Pali group earlier this year (I
could not find the link in the list archive).
It might help you to a certain extent, but it is purely statistical and only
a rough approximation: it compares not individual suttas but whole books.

===
... As you all know the "Tipitaka" consists of various text strata. This is
very obvious of course to anyone reading and comparing the vocabulary, style
and grammatical expressions used in the Vinaya, Sutta and Abhidhamma texts.
Prof. Kingsbury did a statistical analysis on this a couple of years ago (see
here<http://docs.google.com/viewer?a=v&q=cache:YQUw8L4F9WsJ:www.ling.upenn.edu/~kingsbur/inducing.pdf+paul+kingsbury+pali+university+penn&hl=en&gl=us&pid=bl&srcid=ADGEEShpmHsMa8-J_hM6MWDnj1M4JUtuOKd-jORCS-P_zQv0l2PnbgmXEz3CjSBpgz8gMpnlu5W3bi9H6Gq8tr94h6j4RnjmxjJxy34y3hqmjwecS50s97iUa4TFL2sPGhp_VFx5q7vh&sig=AHIEtbQoWszD1QN49uo-4RNw676XN4Apvg>
).

Whenever you use CST4 or similar tools to search for and compare text
snippets, you can see that certain expressions surface again and again in
certain books, while other books contain not a single entry for that
particular word or phrase (take for instance "sabhāv*": you won't find it
in the 4 Nikāya (for obvious reasons), but the Milinda already mentions it,
etc.).

Prof. Kingsbury's approach was very straightforward (but complex); however,
it covered only a small portion of the available books and sorted those few
into just three basic categories (early, middle, late text strata).

Taking a much simpler approach I created the following report which you can
download (see link below). What I was interested in was to map out,
automatically, the relationship (in percentages) between all canonical and
post-canonical books based on their similarities.

Based on that idea I wrote a little program which extracted a-declension
nominative forms, as indicators of a certain semantic proximity (text-chain),
from all 217 books (VRI Tipitaka edition) and compared the books against each
other (217 x 217 = 47089 combinations).
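
For readers who want to try the idea on their own texts, here is a minimal
sketch in Python. It is not the program mentioned above. It assumes that
"a-declension nominative form" can be approximated by any word ending in
"-o", and that the percentage is the shared forms divided by the smaller of
the two vocabularies; both are my assumptions for illustration. The function
names and the sample texts are hypothetical, and the sketch compares each
unordered pair of books once rather than all 217 x 217 combinations.

```python
import re
import unicodedata
from itertools import combinations


def nominative_forms(text):
    """Return the distinct words ending in 'o' (assumed proxy for nom. sing. a-decl.)."""
    # Normalise so that letters with diacritics are single code points.
    text = unicodedata.normalize("NFC", text).lower()
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    return {w for w in words if len(w) > 2 and w.endswith("o")}


def overlap_percent(a, b):
    """Shared forms as a percentage of the smaller vocabulary (an assumed measure)."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def proximity_table(books):
    """Compare every unordered pair of books; highest percentage first."""
    forms = {name: nominative_forms(text) for name, text in books.items()}
    rows = [
        (x, y, overlap_percent(forms[x], forms[y]))
        for x, y in combinations(sorted(forms), 2)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Tiny made-up texts, only to show the call; real input would be whole books.
sample = {
    "A": "buddho dhammo saṅgho bhagavā",
    "B": "buddho dhammo puggalo",
    "C": "nayo hāro",
}
for first, second, percent in proximity_table(sample):
    print(f"{first} - {second}: {percent:.1f}%")
```

A word-ending test like this will also catch forms that are not nominatives
at all, so the numbers are only as good as that proxy.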

I sorted the resulting table by percentage and uploaded it as well (see
below). Of course the results are crude as we are just comparing one
characteristic (nom. sing. a-decl). However, because this test is applied to
the entire range of texts we can still use the percentages as a simple
indicator of proximity. The higher the percentage between two books, the more
vocabulary they share. This is esp. interesting when we compare the
relationship between multiple books. One could play around with this even
more, comparing other grammatical features and then overlaying those
percentages to arrive at an even stronger indicator of the relationship
between the various books.
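
The overlaying idea could be sketched as a weighted average of the
per-feature percentages for each pair of books. This is only one possible
way to combine them, not something I have run on the real data; the function
name, the feature names and the numbers in the usage lines are invented for
illustration.

```python
def overlay_scores(feature_tables, weights=None):
    """Combine per-feature percentages into one score per book pair.

    feature_tables maps a feature name to a dict {(book_a, book_b): percent}.
    Only pairs present in every feature table are combined.
    """
    names = list(feature_tables)
    if not names:
        return {}
    if weights is None:
        weights = {name: 1.0 for name in names}  # plain average by default
    total = sum(weights[name] for name in names)
    shared_pairs = set.intersection(*(set(feature_tables[n]) for n in names))
    return {
        pair: sum(weights[n] * feature_tables[n][pair] for n in names) / total
        for pair in shared_pairs
    }


# Invented percentages for two hypothetical features.
tables = {
    "nom_sg": {("AN", "Pp"): 60.0, ("Netti", "Pe"): 80.0},
    "acc_sg": {("AN", "Pp"): 40.0, ("Netti", "Pe"): 70.0},
}
combined = overlay_scores(tables)
weighted = overlay_scores(tables, weights={"nom_sg": 3.0, "acc_sg": 1.0})
```

Giving one feature a larger weight lets you trust it more than the others
without discarding them.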

However, for my purposes, this first run (it took 2 hours to complete) was
already more than enough. I guess there is tons of information in it,
especially for the lexicographers among you, and you are welcome to re-use
the source code, which I uploaded as well.

But it is quite interesting to see which books form groups in terms of their
"semantic" (vocabulary) proximity. For instance you will see that the 4
Nikaya show a high percentage of similarity, as expected. We can also see
that parts of the AN match the Puggalapannatti, or observe the closeness
between Nettipakarana and Petakopadesa. From here we can go through the list
and discover interesting relationships which may not have been that obvious.

So this might help some of you find the "next best book" to read / study.

Download the report here:

http://www.nibbanam.com/pali_language_tools.html#pprox

mettāya,

Lennart

On Fri, Nov 12, 2010 at 3:09 AM, Poe <jchanida@...> wrote:

>
>
> Dear friends,
>
> May I ask for help, please?
>
> I am looking for books or research papers that compare Pali suttas in
> different versions of the Pali canon, such as the PTS, Siamese, Burmese and
> Sinhalese versions. I myself know only some works that compare the Pali
> canon with the Chinese Agamas or Gandhari texts, but not a comparison among
> versions of the Pali canon. Would very much appreciate your kind
> suggestions. Comparison of Pali canon, either part or whole, is of interest.
> Thank you very much in advance and looking forward to your reply.
>
> With metta,
> Chanida
>
