LISTSERV - HP3000-L Archives

May 1996, Week 2

HP3000-L@RAVEN.UTC.EDU

Subject:	Re: IMAGE Logging for Historical Purposes -
From:	Wirt Atmar <[log in to unmask]>
Reply To:	[log in to unmask]
Date:	Sat, 11 May 1996 14:11:34 -0400

Ference Nagy asks:
 
>The first is the way System Dictionary stores them. That is too
>complicated. The other is the QueryCalc(R) method. It is far simpler:
>An IMAGE database about the image databases.
 
>Mr. Atmar perhaps forgive me that I saved the data base structure of the
>QueryCalc demo.
 
>* Remark: The unit of the item offset is half-byte. Why, Mr. Atmar?
 
The unit of the item offset in QueryCalc's dictionary is the nibble because
that is the smallest coherent group of bits (4 bits) that anyone uses to
represent any form of data (the "packed" datatype). Every other datatype
occupies an integer multiple of a nibble (bytes or 16-bit words), so
representing the offset in nibbles is rather natural. However, I suspect your
question is engendered by the fact that all IMAGE datatypes must begin and
end on a 16-bit word boundary. IMAGE mandates that rule -- but it doesn't
enforce it, and people have (ill-advisedly) done all kinds of strange things
in IMAGE.
 
The original version of QueryCalc did not use a dictionary, and I was
initially opposed to the idea of putting a dictionary in between QueryCalc
and IMAGE. IMAGE contains its own internal dictionary in the root file, but
it quickly became obvious that there are four valid reasons to put an
auxiliary dictionary into a report writer in addition to the root file, and
each was enough on its own to force us to put a dictionary in QueryCalc. They
are:
 
     o  A dictionary is necessary in order to be able to read KSAM and MPE
        files.
     o  A dictionary allows the easy manipulation of security access levels.
     o  A dictionary allows for the renaming of dataitem, dataset, and
        database names without disturbing the original database.
     o  A dictionary allows for the capacity to correctly parse overloaded
        and forced-fit dataitems.
 
It was the last reason that was the original impetus for QueryCalc's
dictionary. No vendor wants to disappoint a customer, but one of QC's
earliest customers was Northern Telecom -- and they had a dataset that
contained only two items, BLOB1 and BLOB2. The data in the IMAGE record had
been simply transferred over from some earlier (ancient) IBM system without
modification. Alignment with IMAGE's 16-bit word rule is always guaranteed at
the beginning of the record, thus BLOB1 started at zero offset, but BLOB2
began at that particular point where their particular string of dataitems
happened, by coincidence, to line up with another 16-bit boundary. If we used
only the IMAGE root file definition, we couldn't read their data -- and
QueryCalc would have been of no use to them.
 
In order to parse representational, concatenated data from a record string,
you only need to know three items: (i) the offset (where to begin in the
record), (ii) the bit format (the datatype), and (iii) its length.  That
statement's equally true for KSAM, MPE, or IMAGE files; they're all basically
the same at the file level.
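
The three-item rule above can be illustrated with a minimal Python sketch. It
is not QueryCalc's code: the FieldDef class, the function names, the three
datatype labels, and the sample record are all hypothetical, and the packed
format is assumed to carry one decimal digit per nibble with the sign in the
last nibble (0xD taken as negative).

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    name: str
    offset_nibbles: int   # (i) where the field begins, in 4-bit nibbles
    datatype: str         # (ii) "char", "int" (big-endian signed) or "packed"
    length_nibbles: int   # (iii) field length, also in nibbles


def nibbles(record: bytes, offset: int, length: int) -> list[int]:
    """Return `length` nibbles of `record`, starting at nibble `offset`."""
    out = []
    for i in range(offset, offset + length):
        byte = record[i // 2]
        # Even nibble index = high half of the byte, odd = low half.
        out.append(byte >> 4 if i % 2 == 0 else byte & 0x0F)
    return out


def extract(record: bytes, field: FieldDef):
    """Decode one field from a raw record using only offset, type, length."""
    nibs = nibbles(record, field.offset_nibbles, field.length_nibbles)
    if field.datatype == "packed":
        # Assumed layout: decimal digits followed by a single sign nibble.
        *digits, sign = nibs
        value = int("".join(str(d) for d in digits))
        return -value if sign == 0xD else value
    # Byte-oriented types: reassemble whole bytes from pairs of nibbles.
    raw = bytes((hi << 4) | lo for hi, lo in zip(nibs[0::2], nibs[1::2]))
    if field.datatype == "int":
        return int.from_bytes(raw, "big", signed=True)
    if field.datatype == "char":
        return raw.decode("ascii")
    raise ValueError(f"unknown datatype: {field.datatype}")


# Hypothetical 6-byte record: "AB", packed -123, 16-bit integer 501.
record = b"AB" + bytes([0x12, 0x3D]) + (501).to_bytes(2, "big")
print(extract(record, FieldDef("CODE", 0, "char", 4)))      # AB
print(extract(record, FieldDef("QTY", 4, "packed", 4)))     # -123
print(extract(record, FieldDef("TAIL", 5, "packed", 3)))    # -23
print(extract(record, FieldDef("CATEGORY", 8, "int", 4)))   # 501
```

The TAIL field starts at an odd nibble offset, which is exactly the case a
byte- or word-based offset could not describe.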
 
Because QueryCalc is constructed as a spreadsheet on the HP3000, every cell
on the spreadsheet can be regarded as an independent object, capable of
independent optimization. The great majority of the cells in a QueryCalc
spreadsheet will be query questions, each of which may extract information
from up to 10 IMAGE, KSAM, or MPE databases simultaneously.
 
QueryCalc uses now, and always has used, a technique called "late-binding,"
where all optimization decisions are made at the very last moment before a
cell's query questions are launched. A standard query question in QueryCalc
will always look something like:
 
     @using invoices, {get me the} sum of amount when category is 501
      and date ib 950601,951231 and division is 56,47,39,711
 
If CATEGORY, DATE, and DIVISION were all search items, the question is, which
search chain should be used? IMAGE is rather exceptional in the ease with
which the information necessary to make a very well-informed optimization
decision is readily available, at low CPU cost, which makes highly efficient,
dynamic late-binding a reality. In QueryCalc, the optimization rule used for
IMAGE queries is:
 
     (i) the chain length of all of the qualifying search chains is measured
(summed as a total value when more than one search item value is specified, as
in "division is 56,47,39,711")
 
     (ii) the primary chain is given a 30% advantage (simply a guess, but an
estimate made on the likelihood of record locality on the disc; this was a
much more important consideration on the MPE/V boxes than it is now for the
MPE/iX machines)
 
     (iii) the shortest chain is then chosen to be the search chain
 
     (iv) if the shortest chain is still greater than 25% of the length of the
dataset, the search automatically switches to an MR-NOBUF high-speed serial
search.
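
As a rough illustration of rules (i) through (iv), here is a minimal Python
sketch. It is not QueryCalc's implementation. The Candidate class and the
function name are hypothetical, the "30% advantage" is assumed to mean that
the primary chain's length is discounted by 30% before comparison, and rule
(iv) is assumed to compare the undiscounted length against 25% of the
dataset's entry count. The chain lengths in the example are invented.

```python
from dataclasses import dataclass

PRIMARY_ADVANTAGE = 0.30   # rule (ii): assumed to be a 30% length discount
SERIAL_THRESHOLD = 0.25    # rule (iv): fraction of the dataset's entries


@dataclass
class Candidate:
    item: str                 # search item name, e.g. "DIVISION"
    chain_lengths: list[int]  # one chain length per value in the query
    is_primary: bool = False  # True if this is the dataset's primary path


def choose_search_path(candidates: list[Candidate], entry_count: int) -> str:
    """Return the chosen search item's name, or "SERIAL" for a serial read."""
    best_item, best_total, best_score = None, None, None
    for c in candidates:
        total = sum(c.chain_lengths)                    # rule (i)
        score = total
        if c.is_primary:
            score = total * (1 - PRIMARY_ADVANTAGE)     # rule (ii)
        if best_score is None or score < best_score:    # rule (iii)
            best_item, best_total, best_score = c.item, total, score
    # Rule (iv): too long a chain (or no chain at all) means a serial read.
    if best_item is None or best_total > SERIAL_THRESHOLD * entry_count:
        return "SERIAL"
    return best_item


# Invented numbers for "category is 501 ... and division is 56,47,39,711".
candidates = [
    Candidate("CATEGORY", [4000], is_primary=True),
    Candidate("DIVISION", [900, 1200, 700, 400]),
]
print(choose_search_path(candidates, entry_count=100_000))  # CATEGORY
print(choose_search_path(candidates, entry_count=10_000))   # SERIAL
```

Because the chain lengths are read at the moment the query is launched, the
same question can pick a different path on a different day.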
 
KSAM and MPE queries have similar optimization rules, but appropriate to
their particular structures.
 
However, rather than have fixed match patterns, the more common query
question that will appear in a QueryCalc cell is something more like:
 
     @using invoices, sum of amount when category is [c1]
      and date ib [c5],[c6] and division is 56,47,39,711
 
where the match patterns are dynamically taken off of the spreadsheet (in
this case, from cells C1, C5, and C6).  Because this tends to be the common
usage, I have been consistently impressed by the wisdom of implementing
late-binding optimization, which was, in the beginning, nothing more than a
gut feeling. However, because of that decision, QueryCalc now often obtains
query efficiencies 1000 times greater than Query when only three datasets are
"joined."
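
The idea of taking match patterns off the spreadsheet at the last moment can
be sketched in a few lines of Python. This only illustrates the substitution
step under stated assumptions: the bind_cells function, the regular
expression, and the dictionary standing in for the spreadsheet are all
hypothetical, and nothing here reflects how QueryCalc itself parses a cell.

```python
import re

# A cell reference is assumed to look like [c1]: letters, then digits.
CELL_REF = re.compile(r"\[([A-Za-z]+[0-9]+)\]")


def bind_cells(query: str, cells: dict[str, object]) -> str:
    """Replace every [cell] reference with that cell's current value."""
    def lookup(match: re.Match) -> str:
        return str(cells[match.group(1).upper()])
    return CELL_REF.sub(lookup, query)


query = ("@using invoices, sum of amount when category is [c1] "
         "and date ib [c5],[c6] and division is 56,47,39,711")

# Values are read only when the query is about to be launched.
cells = {"C1": 501, "C5": 950601, "C6": 951231}
print(bind_cells(query, cells))

# Changing a cell changes the next launch; the query text stays the same.
cells["C1"] = 777
print(bind_cells(query, cells))
```

Since the bound values are not known until launch time, any choice of search
path has to be deferred until then as well.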
 
I was originally opposed to putting a dictionary into QueryCalc because I
felt, in 1985, that it would soon be possible to "quiesce" an IMAGE database
and add or drop a search key while the database was in operation. If that did
come to pass, because of QueryCalc's late-bound, object-oriented, cellular
construction, a dictionary-less QueryCalc reading only from the root file
would be able to instantly re-optimize itself to the new conditions,
completely automatically and without any requirement of user intervention,
while executing a report.
 
Clearly, that capability hasn't yet come to pass (and now we can't use it),
although DBQUIESCE and the capacity to add and drop indices have moved to the
top of the SIGIMAGE enhancement list recently. Nonetheless, due to the
reasons above, putting a dictionary into QueryCalc was inevitable. Even so,
we did everything we could to make the process as simple and invisible as
possible to the user.
 
Running the program ADDIMAGE simply extracts the structural information out
of the root file and places it into an IMAGE database called QCDICT. All
structural information necessary for optimization is now taken from
QueryCalc's dictionary; however, dynamic information, such as set capacities,
high-water marks, entry counts, and chain lengths, is still gathered as we
originally obtained it so that a maximally optimized, late-bound
determination can be made as to the best possible search path.
 
The pain of a dictionary is that if any modification is made to the structure
of an IMAGE database, the dictionary must be updated. Again, we've tried to
make that as simple and as invisible as possible. But that pain aside, the
dictionary has proven to be far more of a benefit than a cost. One of the
nicer things we've done for the customers recently is add the capacity to
build KSAM datasets on the fly. QueryCalc's detail list report page, which
prints a detail list report not unlike Quiz, Query, or BRW, could previously
print to paper, disc file, or back onto the spreadsheet. It can now also
print into an on-the-fly manufactured KSAM dataset, where all of the named
items on the report page become search field dataitems in the KSAM dataset.
The KSAM dataset is not only populated by the detail list report, but is also
automatically added into QueryCalc's dictionary, so that the new dataset is
instantly available for subsequent queries. The easy, high-speed addition of
transient KSAM datasets that summarize information from multiple
pre-existing (IMAGE, KSAM, or MPE) datasets now allows for the straightforward
construction of reports that were previously extremely difficult to
accomplish -- even when hand-coded by a very good programmer using a 3GL.
 
I realize that this is a somewhat long answer to your original question, but
by making a nibble the minimum offset value in the dictionary, QueryCalc is
allowed to read any form of data that appears on the HP3000.
 
Wirt Atmar