Re: 28th Unicode Conference-Call for Papers-Orlando, FL-September 7

Hi all - just came across this on the Straight Dope web site at
http://www.straightdope.com/mailbag/mindusscript.html

Comments, currently not really on point for qalam, are available at
http://boards.straightdope.com/sdmb/showthread.php?t=315899

Best,

Barry
-------------------------------------------------

A Staff Report by the Straight Dope Science Advisory Board
How come we can't decipher the Indus script?

10-May-2005

Dear Straight Dope:

I just got a book on ancient civilizations. In the chapter dealing with
written languages, they list Egyptian hieroglyphics, Mesopotamian
pictographs, and Indus script as the three oldest known written
languages. The book goes on to say Indus script has never been
deciphered even though over 2,500 examples of it exist. Maybe I've
watched too many sci-fi movies where a master linguist deciphers alien
languages, but I really thought we had terrestrial languages mastered.
What's the deal with Indus script? Is the art of linguistics still held
hostage by our inability to decipher ancient languages without a "key" à
la the Rosetta Stone? --Troy Dayton, Fargo, ND

SDSTAFF bibliophage replies:

Too much science fiction? No such thing. Star Trek, for example, teaches
us that a good communications officer can send a message that transcends
mere language, especially if she has legs down to here and a hemline up
to there. Mmmmm. Mm-HMMMmmmmm . . . er, sorry. Was I saying something?

Yes, I was. The Indus script, which was written in and around Pakistan
over a period of several centuries centered on 2500 B.C., is the
most famous undeciphered script, but there are many others. Other
mystery writing systems include Linear A (Greece, 1800 B.C.), Zapotec
(Mexico, 500 B.C.), Meroitic (Sudan, 300 B.C.), Isthmian (Central
America, A.D. 200), Rongorongo (Easter Island, A.D. 1800) and Joycean
(Ireland, A.D. 1900). Okay, maybe not that last one.

Why haven't they been deciphered? It's instructive to look at some
deciphered scripts to see what makes the enigmatic writing of the Indus
valley different. Script decipherment is not as easy as it's made out to
be in science fiction--and sometimes not as easy as it's made out to be
in history books. Chances are the impression you took away from school
was that the Rosetta stone made it child's play to decipher Egyptian
hieroglyphics. Not so. How many schools teach that some of the best
minds in the world pored over the Rosetta stone for a quarter century
before it finally revealed its secrets?

One of the biggest obstacles was that the ancient Egyptians used a
writing system unlike anything known when the Rosetta stone was
discovered in 1799. Scholars knew about logographic systems like
Chinese, where there are thousands of symbols, each normally
representing a whole word or idea. They knew about alphabetic systems
like Hebrew and English, where there are typically 20 to 30 symbols,
each normally representing one consonant or vowel. Some scholars may
have known about syllabaries, with several dozen symbols each
representing one syllable, as in Japanese hiragana and katakana. But
Egyptian hieroglyphics had too many distinct symbols to be an alphabet
or syllabary, and too few to be logographic.
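The sign-counting argument can be made concrete with a short sketch. This minimal Python example collects the distinct signs in a transcribed corpus. It then makes a very rough guess at the kind of script from that count. The function names and the cut-offs (40, 100, 1,000) are assumptions chosen for illustration, loosely based on the ranges above. Real inventories overlap, and a small corpus will undercount the signs actually in use.

```python
def sign_inventory(texts):
    """Distinct signs used anywhere in a corpus of transcribed texts."""
    return {sign for text in texts for sign in text}

def guess_script_type(n_signs):
    """Very rough guess from inventory size; cut-offs are illustrative only."""
    if n_signs <= 40:
        return "alphabet"        # e.g. Hebrew or English: 20 to 30 letters
    if n_signs <= 100:
        return "syllabary"       # e.g. hiragana: several dozen signs
    if n_signs <= 1000:
        return "logo-phonetic"   # e.g. Egyptian hieroglyphics
    return "logographic"         # e.g. Chinese: thousands of signs

print(guess_script_type(26))    # alphabet
print(guess_script_type(300))   # logo-phonetic (within the 250 to 400 Indus estimate)
```

The count is only a first filter. It says nothing about which sign stands for which sound, and that is the hard part.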

The decipherment published by Champollion in 1823 (building on work by
many others, including Thomas Young) showed that Egyptian hieroglyphics
were (neglecting some complications) a logo-phonetic system. In such a
writing system, any given symbol can represent either an entire idea or
word, or the sound (or initial sound) of that word. Some simple ideas
can be expressed efficiently with a drawing of the object or an object
it's associated with. But to express an abstract idea that can't be
readily drawn, you can use a string of sounds. Suppose you want to
express the English word "charitable" without an alphabet. You could
draw a picture of a chair and a table (since "chair table" sounds sort
of like "charitable"). This is the rebus principle. Today we may
consider rebus puzzles to be nothing but a silly game, but to the
ancients, they were a natural way to write a language. Other early
scripts, like Mayan hieroglyphs and Mesopotamian cuneiform, are built on
the same principle.

The rebus approach may seem an unwieldy way to write a language, but
it's a step up from non-linguistic pictograms. A picture of a chair and
a table can only convey "chair and table," or at best an idea associated
with a chair and table, such as the act of sitting down at a table. An
abstract concept such as "charitable" is difficult to get across using
pictograms. Writing systems built on the rebus principle fill that
void, but they have a drawback (for us latter-day translators): unlike
pictograms, they work in only one language. For a speaker of Latin, for
example, a picture of a chair (in Latin, sella) and a picture of a table
(mensa) would never suggest the word for charitable (benignus).

I go into such detail about logo-phonetic systems because the Indus
script appears to have about the right number of distinct symbols (250
to 400, depending on who's counting) to use this system. Knowing that,
shouldn't it be easier to decipher the Indus script? Not really--the
decipherers of Egyptian hieroglyphics had the help of the Rosetta stone,
a bilingual or bitext (parallel texts of the same message in the unknown
script and a known script). No bitext for the Indus script has yet been
found.

A bitext is no guarantee that decipherment will be easy. Take the case
of Etruscan writing, found in Italy. At a superficial level the script
is easily deciphered, since the letters are close in form to archaic
Greek and Latin alphabets. But the language remains largely
uninterpreted. What's the difference? Given a piece of Etruscan writing,
we have no difficulty pronouncing the words, but no idea what most of
the words mean (think of a trained politician reading off a
TelePrompTer). The trouble is that Etruscan is apparently unrelated to
any language understood today. Champollion, the decipherer of Egyptian
hieroglyphics, had the advantage of knowing Coptic, which he correctly
suspected was the descendant of the ancient Egyptian language. Etruscan
has left no descendants.

The dozens of Etruscan bitexts (with Latin, Greek, or Phoenician) aren't
very helpful. All they really tell you is that a given block of
mysterious text means such-and-such. There's no sure way to tell which
Etruscan word corresponds to which word in the parallel text, since the
order of ideas and number of words vary widely among the different
languages. All is not lost, however. If, for example, a Latin word
occurs several times in a text and a mystery word occurs the same number
of times in the corresponding Etruscan text, you may be justified in
supposing that they mean the same thing. But beware--often the two
messages in a bilingual text are just paraphrases of each other, not
word-for-word translations. Still, using methods like this, together
with glosses (explicit translations of individual words in the
documents), scholars have been able to determine--or at least make a
reasonable guess at--the meanings of a couple hundred Etruscan words.
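The word-counting idea above can be sketched in Python. This is a toy illustration, not a method taken from the Etruscan literature. It pairs a known word with a mystery word only when both occur the same number of times and no other word on either side shares that count. The tokens in the example are made-up placeholders, not real Etruscan. A paraphrase, rather than a literal translation, would break the matching.

```python
from collections import Counter, defaultdict

def words_by_count(words, min_count):
    """Group words by how many times they occur, skipping rarer words."""
    groups = defaultdict(list)
    for word, count in Counter(words).items():
        if count >= min_count:
            groups[count].append(word)
    return groups

def match_by_frequency(known_words, mystery_words, min_count=2):
    """Guess mystery -> known pairings when exactly one word on each side
    occurs a given number of times. Results are guesses, not translations."""
    known_groups = words_by_count(known_words, min_count)
    mystery_groups = words_by_count(mystery_words, min_count)
    guesses = {}
    for count, known in known_groups.items():
        mystery = mystery_groups.get(count, [])
        # Pair only when the count singles out one word on each side.
        if len(known) == 1 and len(mystery) == 1:
            guesses[mystery[0]] = known[0]
    return guesses

latin = "filius patris filius matris filius".split()
mystery = "QA RU QA SE QA".split()  # placeholder tokens, not real Etruscan
print(match_by_frequency(latin, mystery))  # {'QA': 'filius'}
```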

If we understand the language or a close relative or descendant of the
language, it ought to be pretty easy to decipher the script, right? Not
so fast. The Rongorongo script used on Easter Island after European
contact almost certainly represents Rapa Nui, the well-known Polynesian
language of the Easter Islanders. But no one now remembers how the
script symbols are meant to be read. Steven Fischer recently claimed to
have deciphered Rongorongo, but his critics say "Wrong-o, wrong-o." I
don't know if Fischer is right or wrong, but undeciphered scripts do
seem to invite harebrained analysis. Jacques Guy bluntly calls them
"kook attractors," but even serious scholars aren't immune. Hrozný, who
correctly deciphered Hittite, later went down many wrong paths with
other scripts.

The real kooks are those like Goropius Becanus of the Netherlands, who
in 1580 proved to his satisfaction that Egyptian hieroglyphics
represented Dutch. A Jesuit priest named Heras is one of scores who have
claimed to decipher Indus script. Here's one of his translations: "There
is no feast in the place outside the country of the Minas of the three
fishes of the despised country of the woodpeckers." Whatever you say, padre.

You mention the 2,500 examples of the Indus script. The number of
available texts now exceeds 4,000, but quantity is no indication of ease
of decipherment. Some scripts have been translated with far fewer texts.
Take Palmyrene, the first ancient script ever deciphered. A handful of
inscriptions were found on the walls of the ruins of the city of Palmyra
in Syria. Scholars knew from ancient Greek writers that the language
spoken there was closely related to Syriac, a well-known Semitic
language. The script was obviously derived from the known Aramaic
alphabet but many letters weren't immediately identifiable. Among the
ruins were several bilingual inscriptions in Greek and Palmyrene. If you
know the Aramaic alphabet, it's a fairly simple matter to use the
identifiable Aramaic letters and the similarity of proper names in Greek
and Palmyrene to get a good start. Then you can use your knowledge of
Greek and Syriac to fill in the blanks. Your Syriac is a little rusty,
you say? Not to worry--a decent Syriac dictionary will serve just as
well. Soon after the first decent reproductions of Palmyrene
inscriptions were published in Europe in the 1750s, Barthélemy in France
and Swinton in England independently deciphered them, each taking just a
few hours to finish the job. It was perhaps a bit more challenging than
the cryptogram puzzles you can find in your Sunday paper, but not by
much. Most decipherments, needless to say, are a good deal tougher to
crack than that.

Returning to the matter at hand, is the lack of a bitext for the Indus
script an insurmountable obstacle? Not necessarily. Some scripts have
been deciphered without them, although not without a good deal of
cleverness. Ugaritic writings, like Palmyrene, were found in Syria (in
1929), suggesting that they too might record a Semitic language. About two
dozen symbols were used, suggesting an alphabetic script. Several of the
words were only a single letter long, suggesting Ugaritic used a
consonantal alphabet written without vowels (as was the case with other
early Semitic alphabets such as Hebrew). Applying letter frequency
analysis to the problem, Hans Bauer tentatively assigned the values L
and M to two Ugaritic letters. In Semitic languages, L is common as a
single-letter word, but not so common in suffixes and prefixes; M is the
only letter that is really common in Semitic suffixes, prefixes, and as
single-letter words.
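Bauer's positional reasoning can be sketched as a small frequency count. This Python version rests on several assumptions:

- The texts are already split into words.
- The first and last signs of a longer word stand in for prefixes and suffixes.
- The unknown signs are given arbitrary placeholder letters.

The scoring rule is a simplification for illustration, not Bauer's actual procedure. A sign scores as M-like if it is common in all three positions, and as L-like if it is common alone but rare at word edges.

```python
from collections import Counter

def positional_profile(words):
    """Count, per sign, how often it forms a one-sign word, starts a longer
    word (stand-in for prefixes), or ends one (stand-in for suffixes)."""
    alone, initial, final = Counter(), Counter(), Counter()
    for word in words:
        if len(word) == 1:
            alone[word[0]] += 1
        elif len(word) > 1:
            initial[word[0]] += 1
            final[word[-1]] += 1
    return alone, initial, final

def rank_m_and_l(words):
    """Rank signs as M-like (common in all three positions) and L-like
    (common alone, rare at word edges). Ties break alphabetically."""
    alone, initial, final = positional_profile(words)
    signs = set(alone) | set(initial) | set(final)
    m_score = {s: min(alone[s], initial[s], final[s]) for s in signs}
    l_score = {s: alone[s] - (initial[s] + final[s]) for s in signs}
    m_like = sorted(signs, key=lambda s: (-m_score[s], s))
    l_like = sorted(signs, key=lambda s: (-l_score[s], s))
    return m_like, l_like

# Toy corpus: each string is one word, each character one placeholder sign.
corpus = ["q", "q", "x", "x", "xtr", "rtx", "xz", "q", "tzr"]
m_like, l_like = rank_m_and_l(corpus)
print("M?", m_like[0], "L?", l_like[0])  # M? x L? q
```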

On the assumption that related languages use similar words for common
concepts (much as European languages have father/Vater/pater), Bauer
then used the M and L assignments to search the texts for the expected
Semitic word for "king" (M-L-K or similar) and "kings" (M-L-K-K or
similar). Proceeding along these lines, he found the words for "son" and
the name of the god Ba'al, and so eventually determined the values of
several other letters. His real insight was to guess that the word for
axe might occur in the text inscribed on several axes. He turned out to
be right about that, but chose the wrong phonetic values (he guessed
G-R-Z-N as in Hebrew; the actual Ugaritic form was the related but not
identical H-R-S-N). Édouard Dhorme later corrected the reading and
finished the decipherment. One of the axe inscriptions said, in a
language related to biblical Hebrew, "Unto the high priest doth this axe
belong, wherefore shouldst thou keep thy hands off it!" Or something
like that. It strikes me that Bauer's guess was pretty lucky--I have two
axes in my garage but have yet to inscribe either with the word "axe."
But hey, when the high priest tells me, "Inscribe the word 'axe' on this
axe, chop-chop," I'm not about to wait around for him to axe me politely.
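The search for M-L-K and M-L-K-K amounts to template matching against partly assigned signs. This minimal Python sketch uses hypothetical helper names and a toy corpus written in placeholder signs. Uppercase letters in a template must be signs already assigned that value. Lowercase letters are unknown signs, and a repeated lowercase letter must be the same sign each time. Checking the singular and plural templates against each other narrows down which sign might be K.

```python
def fits(word, template, values):
    """True if `word` (a sequence of signs) fits `template`.

    Uppercase template letters must be signs already assigned that value in
    `values` (sign -> letter). Lowercase letters are variables: each must be
    an unassigned sign, and a repeated variable must be the same sign.
    """
    if len(word) != len(template):
        return False
    bound = {}
    for sign, slot in zip(word, template):
        if slot.isupper():
            if values.get(sign) != slot:
                return False
        elif sign in values or bound.setdefault(slot, sign) != sign:
            return False
    return True

def search(words, template, values):
    """All words in the corpus that fit the template."""
    return [w for w in words if fits(w, template, values)]

values = {"x": "M", "q": "L"}                 # tentative M and L assignments
corpus = ["xqt", "xqtt", "tqx", "xqz", "qq"]  # toy words in placeholder signs
singular = search(corpus, "MLk", values)      # ['xqt', 'xqz']
plural = search(corpus, "MLkk", values)       # ['xqtt']
# A sign that ends both a singular and a plural candidate is the best K guess.
k_guesses = {w[-1] for w in singular} & {w[-1] for w in plural}
print(k_guesses)  # {'t'}
```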

Ugaritic isn't the only language to have been deciphered without a
bilingual. Georg Friedrich Grotefend made considerable progress in
deciphering Persian cuneiform by looking for and finding proper names of
Persian emperors known from ancient Greek and Hebrew sources. (Henry
Rawlinson finished the decipherment in the 1830s.) The point is that
bilinguals aren't necessary to decipher an unknown script. Still, in the
case of Ugaritic and Persian, scholars had a pretty good handle on the
language the script represented before they started work. In the case of
Etruscan, where the language is largely unknown, complete decipherment
thus far has eluded us.

What do we know about the language the Indus script wrote? We can say
little for certain, but the best guess is that it's a language of the
Dravidian family, an idea that has been around since at least the 1920s.
Today most Dravidian speakers live in Sri Lanka and southern India, 800
miles or more from the Indus valley where the bulk of the Indus
inscriptions have been found. But about a hundred thousand speakers of
one Dravidian language, Brahui, live in western Pakistan and neighboring
parts of Iran and Afghanistan, not too far west of the Indus. Contrary
to earlier speculation about recent migrations, linguistic and genetic
analyses show that they have been separated from other Dravidian
speakers for at least several thousand years. Further evidence that
Dravidian or related languages were once spoken in the general area
comes from Linear Elamite inscriptions, found in the ruins of the
ancient city of Susa in southwestern Iran. The script has been
deciphered from a phonetic standpoint because of its similarity to
Mesopotamian cuneiform, but as with Etruscan, the language remains
largely unknown. A significant percentage of words in Linear Elamite
appear to be of Dravidian origin, which could mean it is descended from
a hypothetical Elamo-Dravidian ancestor language, or just that it
borrowed a lot of words from a Dravidian language spoken nearby. In
either case, the Elamite connection makes it seem more likely that a
Dravidian or related language was spoken in the Indus valley when the
inscriptions were made.

Many Indian nationalists, and some serious scholars, believe the Indus
script writes a language of the Indo-Iranian (Aryan) branch of the
Indo-European family, which includes Farsi (modern Persian), Sanskrit
and Hindi. All things considered, this seems unlikely. The inscriptions
go back to about 3200 B.C., which according to mainstream archaeological
thinking is before any Indo-Europeans had come that far southeast.
Another problem is that Indo-European peoples kept domesticated horses,
used chariots, and had other cultural traits not shared with the
ancient Indus civilization. Indeed, according to mainstream
thinking, the arrival of the Indo-Europeans in the Indus valley around
1800 B.C. is more likely to have been the end of the Harappan culture
than the beginning of it.

If the Indus script turns out to write a language that is neither
Indo-European nor Dravidian (or Elamo-Dravidian), then the chances of
deciphering it are slim. In the words of Alice Kober, who helped
decipher Linear B, "an unknown language written in an unknown script
cannot be deciphered, bilingual or no bilingual." There are really no
other decent candidates among known languages, so we would be left with
an unknown language, and the prospects of complete decipherment would be
as poor as with Etruscan.

But faint hope is better than none. Sumerian is a linguistic isolate,
but its script has been phonetically deciphered and the language partly
understood. Most of the cuneiform scripts of Mesopotamia are direct
descendants of the Sumerian script, though they're used to write
unrelated languages. Akkadian (including its Babylonian and Assyrian
dialects) and some other languages written in these related scripts were
amenable to decipherment in part because they belonged to the
well-understood Semitic family. The
similarity of the scripts, the many Sumerian loanwords in these Semitic
languages, and the unusually large number of bilingual texts have
allowed scholars to reconstruct the Sumerian language with considerable
success despite its being unrelated to any known language. No such
combination of circumstances exists for the Indus script, and no
discoveries along these lines are seriously expected.

What will we get if the Indus script is finally deciphered--great
historical works that reveal the local political situation 5,000 years
ago? Classic works of literature like the Egyptian Book of the Dead or
the Mesopotamian epic of Gilgamesh? Insight into ancient religious
practices of the sort revealed by the Ugaritic texts? No to all of the above. The sad
truth is that the longest known Indus inscription is only 17 symbols
long. The bulk of the 4,000 or so Indus inscriptions are believed to be
simple identifying marks. Most of the inscriptions are on seals or seal
impressions, similar to signet rings or rubber stamps. So even if we
decipher the script and the language, chances are we'll discover they
say nothing more fascinating than "government property" or "John Smith"
or "tax paid." As with the revelation that Linear B wrote an archaic
form of Greek, if the Indus script is deciphered, the most interesting
fact learned will be what language the ancient script wrote--that is, if
it writes a language at all.

If it writes a language? They wouldn't call it the "Indus script" if it
weren't a script, would they? Don't be so sure. When the first
inscriptions were discovered in the 1870s in and around the Indus valley
of Pakistan, and when the early cities of Harappa and Mohenjo-Daro were
excavated in the 1920s, archaeologists assumed that civilization and
writing always went together--a complex urban culture couldn't possibly
develop without writing. The Indus sites were urban; ergo, the
inscriptions were writing.

Today we recognize that civilization and writing don't always go
together. The Inca empire, for example, was urban but lacked true
writing. Historian Steve Farmer now questions the assumption that the
Indus script is true writing. In a recent paper, he and two linguists
compare the Indus script with medieval European heraldry. Like heraldry,
they say, the Indus script may consist of discrete conventional elements
that serve as identification marks but don't encode a spoken language.

This controversial idea has some points in its favor. Considering the
corpus of texts as a whole, there's a considerable amount of repetition
among symbols, as would be expected if they wrote a spoken language. But
there's less repetition than expected within the texts, even considering
their brevity. Further, several systems of pictograms from around the
world--for example, the Vinča signs of southeastern Europe, written
about 4000 B.C.--resemble the Indus script in their use of conventional
symbols, but nobody believes they encode a spoken language.
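One simple way to make "less repetition than expected" concrete is to compare two measures. The first is how often a sign repeats within a text. The second is a random baseline that keeps each text's length but draws signs according to their frequency in the whole corpus. The Python sketch below is a toy illustration under that assumption. It is not the statistical analysis Farmer and his coauthors used, and the corpus is made up.

```python
import random
from collections import Counter

def repeat_rate(texts):
    """Fraction of texts in which at least one sign appears more than once."""
    return sum(len(set(t)) < len(t) for t in texts) / len(texts)

def baseline_repeat_rate(texts, trials=1000, seed=0):
    """Monte Carlo baseline: rebuild every text from random draws weighted by
    corpus-wide sign frequencies, keeping each text's length, and average the
    resulting repeat rate over many trials."""
    rng = random.Random(seed)
    freq = Counter(sign for t in texts for sign in t)
    signs = list(freq)
    weights = [freq[s] for s in signs]
    total = 0.0
    for _ in range(trials):
        resampled = [rng.choices(signs, weights=weights, k=len(t)) for t in texts]
        total += repeat_rate(resampled)
    return total / trials

# Toy corpus in placeholder signs: "a" is very common overall,
# yet no single text repeats a sign.
texts = [["a", "b", "c"], ["a", "d", "e"], ["a", "b", "f"], ["a", "c", "d"]]
print(repeat_rate(texts))                     # 0.0
print(round(baseline_repeat_rate(texts), 2))  # well above the observed 0.0 here
```

Real languages are not independent random draws either, so a serious test would need a better baseline. The sketch only shows the shape of the argument.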

Traditionalists have some points in their favor too. The Indus script
was linear, that is, usually written with symbols following one another
in a line, rather than being placed randomly or in some other geometric
pattern. Most writing is linear, though linearity is not unique to writing.
More to the point, the characters often crowd at the end of a line, as
if the writer wanted to avoid breaking up a word. This is a distinctive
feature of true writing. The comparison with heraldry may not hold water
either. Hittite hieroglyphics were initially considered heraldry by
serious linguists but were eventually found to be true writing and
deciphered. Much the same has been said about many other scripts, some
still undeciphered, that were likewise shown to be true writing.

Still, Farmer feels so strongly that the Indus script is not a real
script that he has offered a $10,000 reward for proof that it is true
writing. He will accept as proof an authenticated inscription more than
50 symbols long. Farmer thinks the extant texts are all so short because
they don't write a language. The pro-language side thinks the longer
texts once produced in Harappa and other cities have been lost because
they were written on perishable surfaces. Certainly a long text would be
a great gift to modern science. I just wish they wouldn't use the lame
excuse that they couldn't give it to us because they ran out of Harappan
paper.

Further reading

Lost Languages: The Enigma of the World's Undeciphered Scripts by Andrew
Robinson, 2002

The Story of Decipherment: From Egyptian Hieroglyphs to Maya Script by
Maurice Pope, revised edition, 1999

"The Collapse of the Indus-Script Thesis: The Myth of a Literate
Harappan Civilization" by Steve Farmer, Richard Sproat, and Michael
Witzel, Electronic Journal of Vedic Studies, Dec. 13, 2004. This and
related items can be accessed from Steve Farmer's download page at
www.safarmer.com/downloads/.

--SDSTAFF bibliophage
Straight Dope Science Advisory Board


Staff Reports are researched and written by members of the Straight Dope
Science Advisory Board, Cecil's online auxiliary. Although the SDSAB
does its best, these articles are edited by Ed Zotti, not Cecil, so
accuracywise you'd better keep your fingers crossed.


Copyright 2005 Chicago Reader, Inc. All rights reserved.
No material contained in this site may be republished or reposted
without express written permission.
The Straight Dope is a registered trademark of Chicago Reader, Inc.