HP3000-L Archives

September 2000, Week 3

HP3000-L@RAVEN.UTC.EDU

Options: Use Monospaced Font
Show Text Part by Default
Show All Mail Headers

Message: [<< First] [< Prev] [Next >] [Last >>]
Topic: [<< First] [< Prev] [Next >] [Last >>]
Author: [<< First] [< Prev] [Next >] [Last >>]

Print Reply
Subject:
From:
Wirt Atmar <[log in to unmask]>
Reply To:
Date:
Sun, 17 Sep 2000 21:40:11 EDT
Content-Type:
text/plain
Parts/Attachments:
text/plain (53 lines)
Ron asks:

> For many years we have had a PMI (Patient Master Index) with a
>  primary key of X6 wihch contains an all numeric identifier in the
>  range of 100000 to (now) about 940000.  This has served us very
>  well, with pretty good retrieval performance, even with a severely
>  overloaded system.
>
>  However, we estimate that we are going to hit the 1000000 mark
>  in issued numbers within the next 14 to 18 months.  Time to think
>  about expanding that X6 to an X8.
>
>  My concern is what it will do to storage efficiency and retrieval
> performance
>  if we just take all those existing key values and change them to X8 with
>  right justify and leading-zero-fill.  I don't like the sort problems if we
>  left
>  justify and pad with spaces (although, to be honest, a numeric sort is
>  very infrequently done).  Right-justify with leading SPACE fill is a
>  possibility,
>  if it gains any advantage.
>
>  Any insight will be highly appreciated.

My advice would be to convert the X6 field to a K2 (an unsigned 32-bit
integer, ranging in allowable values from 0 to 4,294,976,295), rather than an
X8. A K2 field will take up half as much space in your database as an X8
field would, but it will provide you with a much more efficient retrieval
process than a hashed X8 field would. Further, you are guaranteed of no
synonyms so long as:

     o your numbers are sequentially assigned (and it sounds as if they are),

     o and your master dataset's capacity stays larger than your maximum
number.

K2 fields are intrinsically not hashed. The numeric value itself is taken to
be the record number in the master dataset, thus retrievals tend to be quite
fast. The only pain associated with this suggestion is that your numbers
start with 100,000, thus the first 100,000 records (0.4 Mbytes) in the master
dataset will be empty, although that's not much of a penalty nowadays.

Further, if you want to keep your disc usage to a minimum, as your number
usage increases, all you need do to keep current is every so often expand
your master dataset to cover the next hundred thousand or so numbers. So long
as your capacity is greater than your current maximum number, every number
points to a unique record number within the master dataset (no synonyms).

Using K2 datatypes as search items are ideally suited for your form of
application.

Wirt Atmar

ATOM RSS1 RSS2