> Richard Wordingham wrote:
>
> > > >By the way, having jettisoned seven of their 35 characters, the
> > > >authors announce that they have 29 left. This is a trivial point, of
> > > >course, but it does nothing to instill confidence in the care and
> > > >attentiveness of the authors.
> >
> > H.M. Hubey wrote:
> > > Throwing out bad data is sanctified, AFAIK, in stats. Let's ask
> > > Richard.
> >
> > The authors appear to be saying that 35 - 7 = 29. I'd sooner believe
> > 6 * 9 = 42.
>
> According to some, this is sufficient reason not to pay attention to
> anything he writes :-)
>
> Let's see how this would take off: Aha, LT

It's the authors of the paper, not Larry Trask, who appear to be
incapable of counting. The key point is performance, not potential
capability: LT thinks the authors are sloppy.

> > Throwing out 'bad data' dispels confidence. One's supposed to decide
> > what analysis to perform and then look at the data, because of the
> > dictum that 'every set of data is peculiar'. Of course, that's far
> > easier said than done. Then there's the infamous exam question:
> >
> > 'N shots are fired at a circular target, and the positions of impacts
> > on that target are recorded. How does one estimate the parameters of
> > the miss distribution? (You may assume that the horizontal and
> > vertical components are independent, normally distributed with zero
> > mean and with equal variances.) Now, if none of the shots hit the
> > target, there would be a court martial instead of a statistical
> > analysis. How does this affect the estimates?'
> >
> > It's infamous because there is no agreement on the correct answer to the
> > second part.
> >
> > It's relevant because misses (arguably 'bad data') do affect the
> > estimation of the standard deviations.
>
> Certainly, but one way is to "fix" the bad data, and the other is to
> throw it out. Is that not one of the methods in use?
>
> In order to "fix" it you need a model; then you use the model to
> estimate where the data point should be. But then you don't gain
> anything, so why bother? I never understood it.

I'm not acquainted with 'fixing' the data; is this a technique for
recovering the information that is in the data?

In the example I gave, getting 2 hits out of 10 in itself tells you
something about the accuracy. Using only the dispersion of the two hits
would grossly overestimate the accuracy. (It's a good technique for
biasing the results. I must remember to use it. -:) In practice, some
firings may simply go wrong, so you might have near-normal variation
from targeting errors and minor projectile variations, plus a completely
different source of error from projectiles breaking up in flight.
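
A minimal sketch of the point, in Python. It assumes exactly the model
from the exam question (independent N(0, sigma^2) horizontal and
vertical errors, so the miss radius is Rayleigh-distributed); the target
radius, hit radii, and miss count are invented for illustration, not
data from anywhere:

import math
from scipy.optimize import minimize_scalar

R = 1.0                # target radius (hypothetical)
hits = [0.3, 0.8]      # radii of the 2 recorded impacts (hypothetical)
n_miss = 8             # shots that missed the target entirely

def neg_log_lik(sigma):
    # Rayleigh density term for each observed hit, plus the censored
    # term log P(r > R) = -R**2 / (2 sigma**2) for each miss.
    ll = sum(math.log(r / sigma**2) - r**2 / (2 * sigma**2) for r in hits)
    ll += n_miss * (-R**2 / (2 * sigma**2))
    return -ll

# Naive estimate: treat the 2 hits as if they were the whole sample.
sigma_naive = math.sqrt(sum(r**2 for r in hits) / (2 * len(hits)))

# Censored estimate: the 8 misses are allowed to pull sigma up.
sigma_mle = minimize_scalar(neg_log_lik, bounds=(1e-3, 10.0),
                            method='bounded').x

print("hits only:      sigma ~ %.2f" % sigma_naive)
print("with censoring: sigma ~ %.2f" % sigma_mle)

With these numbers the hits-only estimate comes out several times
smaller than the censored one: precisely the gross overestimate of
accuracy described above.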

Setting misses to 1.01 radii would 'fix' the data ('a miss is as good as a
mile') but would bias the results. On the other hand, using the miss
distances of 'rogue projectiles' would result in a misleading r.m.s.
miss distance.
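
A quick simulation of both distortions (again, every number here is
hypothetical). Clamping misses to 1.01 radii biases the r.m.s. low;
keeping the miss distances of a few 'rogue' break-ups biases it high:

import math, random

random.seed(1)
sigma_true, R, n = 1.5, 1.0, 10000
radii = [math.hypot(random.gauss(0, sigma_true),
                    random.gauss(0, sigma_true)) for _ in range(n)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

clamped = [min(r, 1.01 * R) for r in radii]  # 'a miss is as good as a mile'
rogues = radii + [25.0] * 50                 # 0.5% break-ups at huge range

print("true r.m.s. miss: %.2f" % rms(radii))    # about sigma_true * sqrt(2)
print("clamped:          %.2f" % rms(clamped))  # biased low
print("with rogues:      %.2f" % rms(rogues))   # biased high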

> I meant "why would they search the tree", etc. etc. (being facetious
> about LT). There are different ways to create trees. They optimize
> something. A "search" is not necessarily a literal exhaustive
> enumeration. It is merely a way of selecting/creating some tree out of
> N possible trees.

The issue, which I suppose someone ought to address, is whether it is
'obvious' that the best tree has been chosen.
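
It is certainly not obvious on combinatorial grounds. As a standard
illustration (nothing to do with the paper's own data), the number of
distinct unrooted binary trees on n taxa is (2n-5)!!, which outruns any
literal exhaustive enumeration almost immediately:

def unrooted_trees(n):
    # (2n-5)!! = 3 * 5 * ... * (2n-5): the count of distinct unrooted
    # binary trees on n labelled taxa (valid for n >= 3).
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

for n in (5, 10, 15, 20):
    print(n, unrooted_trees(n))

Ten taxa already give 2,027,025 possible trees, and twenty give about
2.2e20, so any practical 'search' must be heuristic, and a heuristic can
stop at a local optimum without anyone noticing.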

Richard.

P.S. I got in-line comments plus original again.