HP3000-L Archives

June 1998, Week 5

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Jerry Fochtman <[log in to unmask]>
Reply To:
Jerry Fochtman <[log in to unmask]>
Date:
Mon, 29 Jun 1998 11:05:51 -0500
At 12:38 PM 6/28/98 -0400, Ted Ashton wrote:
>Just been learning about integer keys thanks to some excellent papers
>available from Adager's website (thank you, Adager) and got to wondering.
>How will the integer key stuff be handled once Master Datasets can expand
>automatically?  If it goes off actual dataset capacity, it seems that
>response could vary wildly as the dataset expands.  Alternatively, if it
>goes off max set capacity, it seems not unlikely that certain keys could
>map to places as yet unallocated.

Like DDX, MDX has three capacity values: Initial Capacity, Current Capacity
and Maximum Capacity.  There is also an expansion 'increment'.  The initial
capacity is the value used in the hash calculation that determines the
primary address for a given key, regardless of key type.  This 'initial
capacity' does not change unless one changes the 'starting' capacity of the
master set, so the result of the hash calculation is constant for a given
key.  For definition purposes, I'll call the area within the initial
capacity the primary hash area.  The primary entry for every hash value
must be located in the primary hash area of the set.
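
To make the point about the hash input concrete, here is a minimal sketch.
The function name and the modulo formula are illustrative assumptions that
stand in for IMAGE's real address calculation; the only thing the sketch is
meant to show is that initial capacity, not current capacity, feeds the
calculation.

```python
def primary_address(key: int, initial_capacity: int) -> int:
    """Hypothetical stand-in for the hash: maps a key to 1..initial_capacity."""
    return (key - 1) % initial_capacity + 1


initial_capacity = 1000

# Current capacity grows as the set expands, but it is never an input to the
# calculation, so the primary address of a key stays the same.
addresses = {
    current_capacity: primary_address(123456, initial_capacity)
    for current_capacity in (1000, 1500, 2000)
}
print(addresses)
```

Because every address falls between 1 and the initial capacity, no key can
hash into a part of the set that has not been allocated yet.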

The expansion area of a master set, the area between initial and maximum
capacity, is basically an 'overflow' area for secondary entries.  When a new
entry is placed in the master and its primary address is already occupied by
the primary entry of another key, IMAGE searches forward from the primary
address until it locates an open entry.  It places the new entry, which is
now termed a secondary entry, in the open slot and links it back to the
primary entry at its primary address.  These links are called a synonym
chain.

Under various circumstances, this search for an open slot to place the
secondary entry can cause significant performance problems, especially when
entries cluster because of a poor choice of capacity or for any of the other
reasons that have been described in many papers.  The purpose of master set
expansion is to mitigate these long searches for open slots when adding
entries.  After searching some specific number of entries or blocks, IMAGE
will give up trying to place the secondary entry in the primary hash area,
expand the set (if necessary) and place the entry in the expansion area,
linking it back to the entry at the primary address through the synonym
chain.  (Of course, if a synonym chain already exists for the particular
hash value, the entry is linked to the end of that chain.)
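
A rough sketch of that placement decision follows.  The names, the
`search_limit` parameter and the wrap-around at the end of the primary area
are all assumptions made for illustration; the actual limit and search order
are not given above.  The synonym-chain links are left out to keep the
sketch short.

```python
from typing import List, Optional, Tuple


def find_open_slot(slots: List[Optional[str]], home: int,
                   search_limit: int) -> Optional[int]:
    """Look at up to search_limit slots after `home` for an empty one.

    Returns the index of the first open slot, or None if the (hypothetical)
    limit is reached first.
    """
    n = len(slots)
    for step in range(1, min(search_limit, n - 1) + 1):
        candidate = (home + step) % n  # wrap-around is an assumption
        if slots[candidate] is None:
            return candidate
    return None


def place_secondary(slots: List[Optional[str]], expansion: List[str],
                    home: int, search_limit: int, key: str) -> Tuple[str, int]:
    """Place a secondary in the primary area if a slot is found in time,
    otherwise append it to the expansion area."""
    slot = find_open_slot(slots, home, search_limit)
    if slot is not None:
        slots[slot] = key
        return ("primary area", slot)
    expansion.append(key)
    return ("expansion area", len(expansion) - 1)


# A clustered primary area: the only open slot is four entries past home.
primary = ["A", "B", "C", "D", None, "E"]
overflow: List[str] = []
print(place_secondary(primary, overflow, home=0, search_limit=2, key="X"))
print(place_secondary(primary, overflow, home=0, search_limit=4, key="Y"))
```

With a limit of 2 the search gives up and the entry goes to the expansion
area; with a limit of 4 the open slot is found inside the primary area.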

Essentially, the expansion area operates like a detail set.  There is a
High Water Mark (HWM) which denotes the highest used entry, and when this
reaches 'current capacity' the dataset is expanded, so long as it hasn't
reached 'maximum capacity'.  There is also a delete chain used to keep track
of freed entries within the boundary of the HWM.
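
The bookkeeping described above can be sketched roughly as follows.  The
class and attribute names are made up for illustration, sizes are counted
relative to the start of the expansion area, and reusing a freed slot before
advancing the HWM is an assumption borrowed from the usual detail-set
behaviour.

```python
class ExpansionArea:
    """Toy model of the expansion area: HWM, delete chain, increment."""

    def __init__(self, size: int, max_size: int, increment: int) -> None:
        self.size = size            # entries currently allocated
        self.max_size = max_size    # limit implied by maximum capacity
        self.increment = increment  # entries added per expansion
        self.hwm = 0                # high water mark: next never-used slot
        self.free = []              # delete chain: freed slots below the HWM
        self.entries = {}

    def add(self, key: str) -> int:
        if self.free:
            slot = self.free.pop()  # assumption: freed slots are reused first
        else:
            if self.hwm == self.size:
                if self.size >= self.max_size:
                    raise RuntimeError("maximum capacity reached")
                self.size = min(self.size + self.increment, self.max_size)
            slot = self.hwm
            self.hwm += 1
        self.entries[slot] = key
        return slot

    def delete(self, slot: int) -> None:
        del self.entries[slot]
        self.free.append(slot)


area = ExpansionArea(size=2, max_size=4, increment=2)
first, second = area.add("k1"), area.add("k2")
third = area.add("k3")   # HWM hit the allocated size, so the area expands
area.delete(second)
reused = area.add("k4")  # taken from the delete chain, HWM unchanged
print(third, reused, area.size, area.hwm)
```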

However, master dataset expansion should not be used like detail dataset
expansion.  In DDX, most folks set the initial capacity low and the maximum
capacity high and then simply allow the incoming data volume to expand the
set as needed.  This is not the purpose of master dataset expansion!

Its purpose is to provide a sort of 'safety buffer' so sites can schedule a
more appropriate/convenient time to perform capacity changes.  If one were
to set a master dataset's initial capacity low, then a majority of the
entries would end up in the expansion area, and the performance benefits of
a master set would essentially be lost.  So one still has to be careful in
selecting the capacity for a master set in order to obtain optimum
performance.  Only now, should the set start getting too full, or a
clustering of entries start to occur, users no longer wait a long time while
IMAGE serially searches for an open slot.  After a certain amount of
searching, IMAGE will simply place the entry in the expansion area and
continue.
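
A bit of arithmetic shows why a low initial capacity hurts.  The primary
hash area can never hold more entries than the initial capacity, so anything
beyond that has to live in the expansion area.  The function name and the
numbers below are only illustrative.

```python
def min_expansion_fraction(entries: int, initial_capacity: int) -> float:
    """Lower bound on the share of entries that must sit in the expansion
    area, given that the primary hash area holds at most initial_capacity."""
    if entries <= 0:
        return 0.0
    return max(0, entries - initial_capacity) / entries


# Detail-set style sizing: start small and let the data grow the set.
print(min_expansion_fraction(entries=10_000, initial_capacity=1_000))

# Sizing the master for its expected volume keeps the bound at zero.
print(min_expansion_fraction(entries=10_000, initial_capacity=15_000))
```

This is only a lower bound; clustering can push additional entries into the
expansion area even when the initial capacity is adequate.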

There are more nitty-gritty details to this new IMAGE feature, as I've only
tried to provide an overview.  Hopefully there will be more discussion as we
get closer to release 6.0, which is slated to contain this new enhancement
to IMAGE.  And I'm sure that our tech support folks, as well as Adager's,
will be fielding calls from customers and non-customers about this, as well
as responding to performance problems because someone chose to set up their
master sets like their detail sets and now doesn't understand why
performance is getting poor.... ;-)



/jf
                              _\\///_
                             (' o-o ')
___________________________ooOo_( )_OOoo____________________________________

                          Monday, June 29th

           Today in 1767 - Townshend Acts were passed, placing tax on
                           all imports to the colonies.

___________________________________Oooo_____________________________________
                            oooO  (    )
                           (    )  )  /
                            \  (   (_/
                             \_)
