Richard Wordingham wrote:
> >By the way, having jettisoned seven of their 35 characters, the
> >authors announce that they have 29 left.  This is a trivial point, of
course, but it does nothing to instill confidence in the care and
> >attentiveness of the authors.

H.M. Hubey wrote:
> Throwing out bad data is sanctified, AFAIK, in stats. Let's ask Richard.

The authors appear to be saying that 35 - 7 = 29.  I'd sooner believe 6 * 9
= 42.
According to some, this is sufficient reason not to pay attention to anything he writes :-)

Let's see how this would take off: Aha, LT cannot even do simple subtraction, so why should we
believe that he has any comprehension of the stochastic processes, metric spaces, vector operations,
stats, blah blah blah, that are necessary to understand how to produce graphs, to align words/sequences,
blah blah blah...

This sounds familiar, doesn't it :-)


Throwing out 'bad data' dispels confidence.  One's supposed to decide on the
analysis before looking at the data, because of the dictum that 'every set of
data is peculiar'.  Of course, that's far easier said than done.  Then
there's the infamous exam question:

'N shots are fired at a circular target, and the positions of impacts on
that target are recorded.  How does one estimate the parameters of the miss
distribution?  (You may assume that the horizontal and vertical
components are independent, normally distributed with zero mean and with
equal variances.)  Now, if none of the shots hit the target, there would be
a court martial instead of a statistical analysis.  How does this affect the
estimates? '

It's infamous because there is no agreement on the correct answer to the
second part.

It's relevant because misses (arguably 'bad data') do affect the estimation
of the standard deviations.
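The effect on the standard deviations can be shown numerically. A minimal sketch (the target radius, true sigma, and sample size below are all invented for illustration): if only the hits are recorded and the misses discarded, a naive estimate of sigma is biased low, because the large-radius impacts are exactly the ones that were thrown away.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_true = 1.0   # true std dev of each component (assumed)
radius = 1.0       # target radius (assumed)
n = 100_000

# Independent N(0, sigma^2) horizontal and vertical components.
x = rng.normal(0, sigma_true, n)
y = rng.normal(0, sigma_true, n)
r = np.hypot(x, y)

# Only impacts on the target are recorded; misses are discarded.
hits = r[r <= radius]

# Naive estimator that ignores the truncation: the miss distance is
# Rayleigh-distributed, with E[R^2] = 2 sigma^2, so take
# sigma_hat = sqrt(mean(r^2) / 2) over the recorded hits.
sigma_naive = np.sqrt(np.mean(hits**2) / 2)

print(f"true sigma  = {sigma_true:.3f}")
print(f"naive sigma = {sigma_naive:.3f}  (biased low: misses discarded)")
```

Since every recorded hit has r <= radius, the naive estimate can never exceed radius/sqrt(2), no matter how large the true sigma is.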

Certainly, but one way is to "fix" the bad data, and the other is to throw it out. Is that not one of the
methods in use?  

In order to "fix" it you need a model; then you use the model to estimate where the data point should
be. But then you don't gain anything, so why bother? I never understood it.
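That point can be made concrete. In the sketch below (all numbers invented), a corrupted point is "fixed" by replacing it with the fitted model's own prediction; refitting with the imputed point then reproduces the original fit exactly, which is the sense in which nothing is gained:

```python
import numpy as np

# Good data on a line y = 2x + noise, plus one corrupted observation.
rng = np.random.default_rng(1)
x = np.arange(10.0)
y = 2 * x + rng.normal(0, 0.1, 10)
y[5] = 50.0                      # the "bad data" point

good = np.ones(10, dtype=bool)
good[5] = False

# Fit the model on the trusted points only.
slope, intercept = np.polyfit(x[good], y[good], 1)

# "Fix" the bad point by replacing it with the model's prediction...
y_fixed = y.copy()
y_fixed[5] = slope * x[5] + intercept

# ...and refit on all 10 points: the imputed point has zero residual
# under the old fit, so the refitted parameters are unchanged.
slope2, intercept2 = np.polyfit(x, y_fixed, 1)
print(abs(slope - slope2), abs(intercept - intercept2))
```

The imputed point sits exactly on the fitted line, so it contributes nothing to the new fit; the estimate is the same as if the point had simply been thrown out.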

It seems like the use of a regularization principle, something also used by linguists e.g. RSC.


> >Only one tree is drawn.  There is no searching of tree space, and so
> >this is not a "best tree" method.
> >
>
> Why would they do that?

Common practice!   Geneticists tend to look for the tree which requires the
fewest changes.  The robustness of such trees I think is another matter, but
it's a way of producing a defensible cladogram when the branching is not
obvious.  As different weightings can produce different results, it doesn't
necessarily help, but if you use a complicated enough program you can apply
Lucifer's GIGO principle (Garbage In, Gospel Out).

I meant "why would they search the tree?" etc. (being facetious at LT's expense).  There are different ways to
create trees. They optimize something. A "search" is not necessarily a literal exhaustive enumeration. It is
merely a way of selecting/creating some tree out of N possible trees.
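To make "fewest changes" concrete: with four taxa there are only three unrooted topologies, so the selection out of N possible trees can be done by enumerating them all and scoring each with the Fitch small-parsimony count. A minimal sketch (the taxa and binary characters below are invented):

```python
# Each taxon is a string of binary character states (hypothetical data).
data = {"A": "000", "B": "001", "C": "110", "D": "111"}

def fitch_score(tree, chars, i):
    """Fitch small-parsimony cost of character i on a rooted tree.
    A tree is either a taxon name (leaf) or a pair (left, right)."""
    def walk(node):
        if isinstance(node, str):              # leaf: the observed state
            return {chars[node][i]}, 0
        (ls, lc), (rs, rc) = walk(node[0]), walk(node[1])
        inter = ls & rs
        if inter:
            return inter, lc + rc              # no change needed here
        return ls | rs, lc + rc + 1            # one change on this branch
    return walk(tree)[1]

def parsimony(tree, chars):
    n_chars = len(next(iter(chars.values())))
    return sum(fitch_score(tree, chars, i) for i in range(n_chars))

# The three unrooted topologies for four taxa, written as rooted pairs.
topologies = [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]

scores = {t: parsimony(t, data) for t in topologies}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Here the "search" really is exhaustive; for realistic numbers of taxa the tree space explodes combinatorially, which is why real programs use heuristic searches that optimize the same criterion without literal enumeration.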


Richard.



To unsubscribe from this group, send an email to:
Nostratica-unsubscribe@yahoogroups.com




-- 
Mark Hubey
hubeyh@...
http://www.csam.montclair.edu/~hubey