H.M. Hubey wrote:
> Richard Wordingham wrote:
>
> > > >By the way, having jettisoned seven of their 35 characters, the
> > > >authors announce that they have 29 left.  This is a trivial point, of
> > > >course, but it does nothing to instill confidence in the care and
> > > >attentiveness of the authors.
> >
> > H.M. Hubey wrote:
> > > Throwing out bad data is sanctified, AFAIK, in stats. Let's ask
> > > Richard.
> >
> > The authors appear to be saying that 35 - 7 = 29.  I'd sooner believe
> > that 6 * 9 = 42.
>
> According to some this is sufficient reason not to pay attention to
> anything he writes :-)
>
> Let's see how this would take off: Aha, LT can't count, so nothing he
> writes can be trusted.

It's the authors of the paper, not Larry Trask, who appear to be incapable
of counting.  The key point is their performance, not their potential
capability: LT thinks the authors are sloppy.

> > Throwing out 'bad data' dispels confidence.  One's supposed to decide
> > on the analysis before looking at the data, because of the dictum that
> > 'every set of data is peculiar'.  Of course, that's far easier said
> > than done.  Then there's the infamous exam question:
> >
> > 'N shots are fired at a circular target, and the positions of impacts
> > on that target are recorded.  How does one estimate the parameters of
> > the miss distribution?  (You may assume that the horizontal and
> > vertical components are independent, normally distributed with zero
> > mean and with equal variances.)  Now, if none of the shots hit the
> > target, there would be a court martial instead of a statistical
> > analysis.  How does this affect the estimates?'
> >
> > It's infamous because there is no agreement on the correct answer to the
> > second part.
> >
> > It's relevant because misses (arguably 'bad data') do affect the
> > estimation of the standard deviations.
>
> Certainly, but one way is to "fix" the bad data, and the other is to
> throw it out.  Is that not one of the methods in use?
>
> In order to "fix" it you need a model; then you use the model to
> estimate where the data point should be.  But then you don't gain
> anything, so why bother?  I never understood it.

I'm not acquainted with 'fixing' the data; is this a technique for
recovering the information that is in the data?

Presumably.
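
For what it's worth, here is a toy demonstration of the 'why bother?'
point.  It is only a sketch: the data, the 30% rate of bad values, and
the straight-line model are all invented for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical data set: y depends on x, and 30% of the y values
    # are 'bad' (missing)
    x = rng.normal(size=200)
    y = 2.0 * x + rng.normal(size=200)
    bad = rng.random(200) < 0.3

    # Option 1: throw the bad data out (complete cases only)
    fit_drop = np.polyfit(x[~bad], y[~bad], 1)

    # Option 2: 'fix' the bad y values, imputing them from a model
    # fitted to the good cases
    y_fixed = np.where(bad, np.polyval(fit_drop, x), y)
    fit_fixed = np.polyfit(x, y_fixed, 1)

    print(fit_drop[0], fit_fixed[0])  # essentially the same slope

Single imputation from the same model hands back more or less the
estimate you started with, which seems to be the point of 'why
bother?'.  The usual defence is that imputation preserves the other
columns of a multivariate record for analyses that need complete rows;
this toy example has no other columns, so there is nothing to gain.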


In the example I gave, getting 2 hits out of 10 in itself tells you
something about the accuracy.  Using only the dispersion of the two hits
would grossly overestimate the accuracy.  (It's a good technique for
biasing the results.  I must remember to use it. :-)  In practice, you
might have an issue with some firings going wrong, so you might have
near-normal variation from targeting errors and minor projectile
variations, plus a completely different source of error from projectiles
breaking up in flight.
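
To put numbers on that, here is a minimal sketch under the exam's own
assumptions (independent zero-mean normal components with equal
variances, so the miss distance is Rayleigh).  A miss is then a
censored observation -- all we learn is that r exceeded the target
radius -- and maximising the censored likelihood has a closed form.
The two hit distances and the unit radius below are invented.

    import numpy as np

    def mle_sigma2(hit_distances, n_miss, radius):
        """Censored-Rayleigh MLE of sigma^2 for the target problem.

        Each hit r contributes the Rayleigh density
        (r/sigma^2)*exp(-r^2/(2*sigma^2)); each miss contributes the
        survival probability P(R > radius) = exp(-radius^2/(2*sigma^2)).
        Setting the derivative of the log-likelihood to zero gives the
        closed form returned below.
        """
        hits = np.asarray(hit_distances, dtype=float)
        if hits.size == 0:
            # No hits: the likelihood is monotone increasing in sigma,
            # so no maximiser exists (and, per the exam, there is a
            # court martial instead of an analysis).
            raise ValueError("no hits: no maximum-likelihood estimate")
        return (np.sum(hits**2) + n_miss * radius**2) / (2 * hits.size)

    # 2 hits out of 10 at a unit-radius target
    print(mle_sigma2([0.3, 0.5], n_miss=8, radius=1.0))  # approx. 2.09
    # Naive estimate from the two hits alone -- wildly optimistic
    print(np.sum(np.square([0.3, 0.5])) / (2 * 2))       # approx. 0.085

The second part of the question then turns on the fact that the
analysis is only performed at all when there is at least one hit,
which conditions the sample -- and that is where the disagreement
starts.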

Setting misses to 1.01 radii would 'fix' the data ('a miss is as good as a
mile') but would bias the results.  On the other hand, including miss
distances from 'rogue projectiles' would produce a misleading r.m.s. miss
distance.
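
A quick simulation makes the bias visible.  Everything here is
invented for illustration (sigma = 1.5, unit target radius, 10 shots
per trial), and the estimator is the censored-Rayleigh form from the
sketch above.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, radius, shots = 1.5, 1.0, 10

    fixed, censored = [], []
    for _ in range(2000):
        r = np.hypot(rng.normal(0, sigma, shots),
                     rng.normal(0, sigma, shots))
        hits, n_miss = r[r <= radius], int(np.sum(r > radius))
        if hits.size == 0:
            continue  # the court-martial case
        # 'A miss is as good as a mile': record each miss at 1.01 radii
        patched = np.concatenate([hits, np.full(n_miss, 1.01 * radius)])
        fixed.append(np.sum(patched**2) / (2 * shots))
        # Censored-Rayleigh MLE using the actual hit/miss information
        censored.append((np.sum(hits**2) + n_miss * radius**2)
                        / (2 * hits.size))

    print(sigma**2)        # true value: 2.25
    print(np.mean(fixed))  # the 'fixed' data are badly biased low
    print(np.mean(censored))

The 'fixed' estimate caps every error at just over one radius, so the
worse the shooting, the more flattering the answer.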

> I meant "why would they search the tree", etc. (being facetious at LT's
> expense).  There are different ways to create trees.  They optimize
> something.  A "search" is not necessarily a literal exhaustive
> enumeration.  It is merely a way of selecting or creating some tree out
> of the N possible trees.

The issue, which I suppose someone ought to address, is whether it is
'obvious' that the best tree has been chosen.

All these algorithms optimize something, and that makes the result "best".
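
To illustrate the 'optimize something' point with the smallest
possible case: for four taxa there are only three unrooted tree
shapes, so here the 'search' really is an exhaustive enumeration,
scored by Fitch parsimony.  The taxa, the binary characters, and the
choice of parsimony as the criterion are all invented for
illustration; nothing below comes from the paper under discussion.

    def fitch_cost(pair1, pair2, states):
        # Fitch small-parsimony cost of one character on the unrooted
        # quartet tree (pair1)|(pair2): a cherry whose two leaves
        # disagree costs one change, and a third change is needed if
        # the two cherries' state sets cannot be reconciled.
        cost, sets = 0, []
        for a, b in (pair1, pair2):
            s = {states[a]} & {states[b]}
            if not s:
                s = {states[a], states[b]}
                cost += 1
            sets.append(s)
        if not sets[0] & sets[1]:
            cost += 1
        return cost

    # Hypothetical binary characters (say, presence/absence of a cognate)
    characters = [
        {'A': 1, 'B': 1, 'C': 0, 'D': 0},
        {'A': 0, 'B': 0, 'C': 1, 'D': 1},
        {'A': 1, 'B': 0, 'C': 1, 'D': 0},
        {'A': 1, 'B': 1, 'C': 1, 'D': 0},
    ]

    # All three unrooted quartet topologies
    topologies = [(('A', 'B'), ('C', 'D')),
                  (('A', 'C'), ('B', 'D')),
                  (('A', 'D'), ('B', 'C'))]

    best = min(topologies,
               key=lambda t: sum(fitch_cost(t[0], t[1], c)
                                 for c in characters))
    print(best)  # (('A', 'B'), ('C', 'D')): fewest changes wins

For realistic numbers of taxa the count of possible trees grows
super-exponentially, so real programs optimize by heuristic search
rather than enumeration -- which is exactly why 'is the reported tree
really the best one?' is a fair question.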



Richard.

P.S. I got in-line comments plus original again.




-- 
Mark Hubey
hubeyh@...
http://www.csam.montclair.edu/~hubey