HP3000-L Archives

October 1999, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Wirt Atmar <[log in to unmask]>
Date: Tue, 5 Oct 1999 10:09:36 EDT

John asks:

> Has anyone already programmed a Standard Deviation routine in Powerhouse
>  Quiz?
>  I a) am not very good on Math and
>    b) don't understand Standard Deviations and
>    c) don't wish to reinvent the wheel.

Although I can't help you with Quiz specifically, I can give you the
equations you need (they're not complicated). Let me also bore you a bit with
the philosophy underlying the processes, if you don't mind.

Statistics generally serves three purposes: 1) to compress sampled data into
standard yardsticks, 2) to get some measure of the variation within the
measured population, and 3) to predict the probability that a single result
(a sampled population) could have been created by a postulated process.

All distributions talk of populations.  If you ever hear the words variation,
distribution, statistics or probability, you should always immediately think
of a population.  More than that, you should think of the manner in which the
population distributes itself.

A normal distribution is the most common because it results from each
individual "particle" in the population making many left/right, up/down,
yes/no kind of decisions.  Each decision has exactly a 50/50 chance of being
chosen to be one of the two options at each decision point. If you allow a
large population of particles to repetitively interact in some physical
manner with the environment, so that each particle has made hundreds or
thousands of such left/right decisions, each independent of the others, the
result you get is a "normal" (or Gaussian) distribution of paths taken (and
end results measured).
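You can watch this happen with a small simulation. This is a Python sketch
(not Quiz, which I can't write here), and the particle and step counts are
arbitrary choices for illustration:

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Each "particle" makes many independent 50/50 left/right (-1/+1) decisions;
# its end result is the sum of all those steps.
steps = 1000
particles = [sum(random.choice((-1, 1)) for _ in range(steps))
             for _ in range(2000)]

# The end results cluster around 0 (equal odds each way), and their
# distribution approaches a normal (Gaussian) shape as steps increase.
mean = sum(particles) / len(particles)
var = sum((p - mean) ** 2 for p in particles) / len(particles)
```

With +/-1 steps, the variance of the end results comes out close to the
number of steps taken, which is the random-walk behavior that underlies the
normal distribution.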

The simplest, commonly used yardstick is the mean (or average) of a
population.  The mean is simply the sum of all of the values (end results)
measured divided by the number sampled.  That is

                  mean = SUM[x(1),x(2),....,x(n)]/n

where n is the number of measurements in your sample.
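As a sketch of that formula in Python (the data values here are made up
purely for illustration):

```python
# Hypothetical sample of n = 5 measurements
x = [4.0, 7.0, 5.0, 6.0, 8.0]

n = len(x)
mean = sum(x) / n   # SUM[x(1),x(2),...,x(n)]/n
print(mean)         # 6.0
```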

The most commonly used statistic to describe the variation found within the
population of results is the variance, that is

               pop. variance = SUM([x(i)-mean]^2)/n     for i=1,...,n

using the mean value calculated above.
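Continuing the Python sketch with the same made-up sample, the population
variance is the average of the squared deviations from the mean:

```python
x = [4.0, 7.0, 5.0, 6.0, 8.0]   # hypothetical sample data
n = len(x)
mean = sum(x) / n

# pop. variance = SUM([x(i)-mean]^2)/n  for i = 1,...,n
pop_variance = sum((xi - mean) ** 2 for xi in x) / n
print(pop_variance)  # 2.0
```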

The mean and variance are also the two parameters necessary to completely
specify a normal distribution. Thus, they are automatically important
statistics because of the overwhelming physical commonness of the normal
distribution.

When are they likely to fail as useful statistics?  Quite obviously, when the
distribution of the population is non-normal.  A normal distribution is not
the only distribution with a profound physical basis (although almost all
other distributions can be expressed in
terms of the normal distribution by reworking the basis of measurement, not
unlike subtracting out the effect of gravity in biasing the distribution of
sprayed water from a garden hose).

What's the difference between the "population variance" and the "variance"?

The population variance is the variance calculated for the population of data
points you entered.  This variance is not likely, however, to be truly
indicative of the variance that actually exists in the larger parent
population from which the data was sampled.

In fact, the smaller the number of data points measured, the more the
variance calculated for a given sample tends to understate the variance of
the actual process producing the measured points.

For this reason, a correction is usually imposed such that the sample
population variance is made larger by the factor n/(n-1), where n is the
number of data points in the sampled population,

            unbiased variance = n/(n-1) * [pop. variance above]

The resulting variance is called either the parent population variance or
simply the variance.  The corrected statistic is called an unbiased
estimator.  It is easy to see, however, that as more and more points are
sampled, the two variances converge.  And, of course, that is intuitively
what you would expect.
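In the same Python sketch (same made-up data), the correction is a single
multiplication:

```python
x = [4.0, 7.0, 5.0, 6.0, 8.0]   # hypothetical sample data
n = len(x)
mean = sum(x) / n
pop_variance = sum((xi - mean) ** 2 for xi in x) / n

# unbiased variance = n/(n-1) * [pop. variance]
unbiased_variance = n / (n - 1) * pop_variance
print(unbiased_variance)  # 2.5
```

Note that n/(n-1) is 1.25 for n = 5 but only 1.001 for n = 1000, which is
the convergence of the two variances described above.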

The clear moral is thus: when you refer to the variance of a process, use the
unbiased estimated variance.

The statistic termed "standard deviation" is simply the square root of the
variance.  Likewise, a population standard deviation is the square root of a
population variance.
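Putting it all together in the Python sketch (same made-up data), each
standard deviation is just the square root of the corresponding variance:

```python
import math

x = [4.0, 7.0, 5.0, 6.0, 8.0]   # hypothetical sample data
n = len(x)
mean = sum(x) / n

pop_variance = sum((xi - mean) ** 2 for xi in x) / n
unbiased_variance = n / (n - 1) * pop_variance

# Standard deviations are the square roots of the variances
pop_std_dev = math.sqrt(pop_variance)       # population standard deviation
std_dev = math.sqrt(unbiased_variance)      # (unbiased) standard deviation
print(pop_std_dev, std_dev)
```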

Wirt Atmar
