I've been meaning to answer John Dunlop's question about how to code standard
deviation in Quiz for several days now, but just haven't had the time.
However, as long as this is official math day, let me finish up John's
question. In a previous posting, I gave the relevant equations as:
mean = SUM[x(1),x(2),....,x(n)]/n
pop. variance = SUM([x(i)-mean]^2)/n for i = 1,...,n
unbiased variance = n/(n-1) * [pop. variance above]
std. dev. = sqr(unbiased variance)
These equations, at least in this form, suggest that you will require two
passes to calculate the standard deviation, the first to get the average, and
then the second to calculate the variance of each of the individual entries
against the previously calculated mean. However, that isn't true. If you
multiply the second equation out, substituting the first equation into the
second, you only need to accumulate three variables during your first and
only pass through the data:
the accumulating count
the accumulating sum
the accumulating sum of the squares
While it has been 15 years since I programmed a report in Quiz, and thus I
can't be of any real help in that regard, I can give the equations the way we
programmed them up internally within QueryCalc.
A matrix of six real numbers R0[*] is defined to be a global variable in
QueryCalc. The matrix is composed of these elements:
R0[1] = the accumulating sum of the qualified entries
R0[2] = the accumulating count of the qualified entries
R0[3] = the maximum value so far qualified
R0[4] = the minimum value so far qualified
R0[5] = the accum. sum of squares of qualified entries
R0[6] = the accum. count of all IMAGE/KSAM/MPE records touched
These six variables allows us to calculate the average, sum, max, min,
variance, standard deviation, count, and percent qualified, which are part of
every QueryCalc query question. You should be able to do the same thing in
Quiz.
As each record in the query is touched and/or qualified, these particular
values are updated in QC's search routines. The BASIC code to do this on an
"instanteous" basis (i.e., once for every qualifying instance) is the
following:
5150 R0[1]=R0[1]+V0[K]
5160 R0[5]=R0[5]+V0[K]**2
5170 IF V0[K]>R0[3] THEN DO
5180 R0[3]=V0[K]
5200 DOEND
5210 IF V0[K]<R0[4] THEN DO
5220 R0[4]=V0[K]
5240 DOEND
5270 R0[2]=R0[2]+1
(some lines were deleted in this listing because they are irrelevant to the
process at hand).
Once all of the records in the dataset(s) have been searched (either by using
a chain or a serial read) and we've come to the end of the dataset, the
following code is called to convert these raw accumulated sums into the
standard statistics for display purposes:
1620 IF Q$[1,102;3]="SUM" THEN CONVERT R0[1] TO I$
1630 IF Q$[1,102;3]="NUM" THEN CONVERT R0[2] TO I$
1631 IF Q$[1,102;3]="FND" THEN CONVERT R0[2] TO I$
1640 IF R0[2]=0 THEN A=0
1650 ELSE A=R0[1]/R0[2]
1660 IF Q$[1,102;3]="AVG" THEN CONVERT A TO I$
1670 IF R0[2]<2 OR R0[5]/R0[2]-A**2<0 THEN V=0
1680 ELSE V=R0[2]/(R0[2]-1)*(R0[5]/R0[2]-A**2)
1690 IF Q$[1,102;3]="VAR" THEN CONVERT V TO I$
1700 IF Q$[1,102;3]="DEV" THEN CONVERT SQR(V) TO I$
1710 IF Q$[1,102;3]="MAX" THEN CONVERT R0[3] TO I$
1720 IF Q$[1,102;3]="MIN" THEN CONVERT R0[4] TO I$
1730 IF R0[6]=0 THEN P5=0
1740 ELSE P5=R0[2]/R0[6]*100
1750 IF Q$[1,102;3]="PCT" THEN CONVERT P5 TO I$
Some of the statistics (sum, max, min, count) require no further conversion.
They simply need to be converted to text for display. The others (average,
variance, std. dev., percent qualifying) require some final calculation. A
few limits need to be placed on some of the calculations. An average makes no
sense with less than 1 qualifying record; a std. deviation/variance makes no
sense with less than two entries.
Wirt Atmar
|