HP3000-L Archives

July 1997, Week 4

HP3000-L@RAVEN.UTC.EDU

Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Subject:
From:
Wirt Atmar <[log in to unmask]>
Reply To:
Date:
Tue, 22 Jul 1997 15:13:58 -0400
Content-Type:
text/plain
Parts/Attachments:
text/plain (74 lines)
Jim Byrne writes:

> We did exactly this during the early days of our company's
>  automation and found that relying solely on programming edits to
>  catch characters from getting into the date field was trusting
>  in human nature just a little too much.  We found that users
>  could and would on occasion enter dates as 84JUL1 for example.
>  So we adopted the zoned signed decimal which allowed Image to
>  catch any input format errors that the programmer allowed.  A
>  negative number was deemed to have been from the previous
>  century.  Now we treat negative numbers as BC dates, not that
>  we have any mind you.
>
>  What we do for parsing the zoned decimal field is overlay the
>  image date field with a character field redefine using the
>  powerhouse dictionary and treat the field as numeric or
>  character as the situation warrants.  More usually  we
>  parse the date field using a series of MOD and DIV statements to
>  extract the portion that we are interested in.
>
>  i.e
>
>  date = 19970722
>
>  current year = floor(date / 10000)
>  current month = floor((mod(date,10000) / 100))
>  current month and day = mod(date,10000)
>  current day of month = mod(date,100)
>
>  In Powerhouse at least these are just as efficient as a
>  substring extract.

Let me slightly disagree one more time with Jim (although none of this should
be taken too seriously). The portion that I disagree most with is the last
sentence. Database designs inevitably reflect the tools that one has
available, and Jim's design makes good use of the functions that are
available in PowerHouse. Nonetheless, a substring extract such as:

     find when important-date(1,6) = 199707

that is inherently a text substring search (from character positions 1 to 6)
will always be enormously more efficient than having to perform any form of
arithmetic on every record retrieved to see if it is a qualifying record or
not. When substrings are used, all that has to be done by the query language
qualifier routine is parse (extract) the requested text field from the
record, beginning from the specified start point in the record and ending at
the stop point). A simple compare is then all that's necessary. Quick,
simple, fast, and very efficient. If the record value doesn't match, you move
on. If it does, you process it.

Similarly, if Jim had had perfect foresight, he would also have foreseen with
crystalline clairvoyance the great advantages twenty years ago of keeping his
dates is X8 format, anxiously waiting 20 years for the arrival of b-trees in
Image. Once b-trees do arrive, if a date is text-formatted, rather than
Z-formatted, you are going to be able to ask a query in this format:

      find important-date = 199708@

and instantly retrieve all of the dates in the dataset for August of 1997. If
the date is formatted in a numeric fashion, you would have to go to all of
the trouble of typing the following:

     find important-date ib 19970801,19970831*

:-).

Wirt Atmar


*All kidding aside, in this latter case, for b-tree retrievals, both are
going to be equally efficient. There will be no advantage for text over
numeric range searches. But that's not true for the first search condition
mentioned.

ATOM RSS1 RSS2