LISTSERV - HP3000-L Archives

HP3000-L Archives

October 1997, Week 5

HP3000-L@RAVEN.UTC.EDU

	LISTSERV Archives
	HP3000-L Home
	HP3000-L October 1997, Week 5

	Log In
	Register

	Subscribe or Unsubscribe

	Search Archives
Options:	Use Monospaced Font Show Text Part by Default Show All Mail Headers
Message:	[<< First] [< Prev] [Next >] [Last >>]
Topic:	[<< First] [< Prev] [Next >] [Last >>]
Author:	[<< First] [< Prev] [Next >] [Last >>]
Subject:	Re: New B-tree feature
From:	Jerry Fochtman <[log in to unmask]>
Reply To:	Jerry Fochtman <[log in to unmask]>
Date:	Fri, 31 Oct 1997 08:50:23 -0600
Content-Type:	text/plain
Parts/Attachments:	text/plain (175 lines)
I've pretty much been sitting on the side-lines watching this thread
but now feel I should help with folks' understanding of the new
b-tree enhancement to IMAGE.

John Alleyn-Day writes:
>  It seems to me that there will be subtle performance hits.  Sorted lists
>  of  names were often done with a KSAM file that used detail set record
>  numbers to find the record.  Any use of Image B-trees will necessititate
>  a separate  master record for each unique name and could result in many
>  more master records being created.

Yes.  For these situations using a 3rd party product with key-word retrieval
is less expensive, in terms of space requirements!


At 09:17 PM 10/30/97 P, Steve Dirickson b894 WestWin wrote:
/snip

>However, the "one extra level" you mention will more often have the
>effect of *reducing* index update activity-sometimes dramatically. If the
>sorted-sequential index were directly on the items in the detail data
>set, the index would have to be changed every time the record moves
>(which, if CIUPDATE is not in use, means every time a search item-any
>search item in the set, whether or not it is the one with the SS
>index-changes value) or when an SS-indexed search item changes value when
>CIUPDATE is in use. With the SS index on the master set, the index only
>has to be updated for a given value if the update is the addition of a
>not-currently-existing value for the search item or is the deletion of
>the last occurrence of that value for the search item.

Yes, the IMAGE b-tree index would not have to be updated if/when the
record moves, but this overlooks the fact that the IMAGE pointers
themselves would still have to be updated. With a separate 3rd party
index on the detail it is necessary to update the 3rd party b-tree
pointers if the record moves. So basically it could be considered a
wash...except for....

Updating a b-tree pointer involves following the tree and updating the
data pointer for the index.  IMAGE, on the other hand, has to update
upwards to 6 different pointers for one detail search item change even
with CIUPDATE. First, it needs to de-link the entry from its old chain,
updating the forward pointer from the prior entry in the chain and the
backward pointer in the next record in the chain.  It also has to search
for and update the master record, at least updating the chain count.
If the entry is on the beginning or end of the chain, there is no preceeding
or following entry to update so the work involves only 2 record (1 detail
and the master).  Obviously only the master is involved if it is a single
entry on the chain.  If its an auto-master with a single detail, IMAGE
will have to maintain the master, possibly involving migrating secondaries
as well.

This same type of processing occurs when placing the entry on the
new path: linking the entry in the new chain in the proper location
and updating or building a new master.

Similar issues occur when maintaining a b-tree structure.  There is
always the possibility of having to split a branch when adding new
leaves to the structure, not unlike IMAGE dealing with migrating
secondaries in a master. However, there isn't the forward/backward
entry pointers to maintain at the entry level in the b-tree.


>In addition, putting a sorted-sequential index directly on the detail
>sets would require significantly more disk space-orders of magnitude
>more. With the SS index on the master set, you get a free "compression
>factor" equal to the number of sets keyed on the automatic master set's
>key item, times the number of occurrences of that value in all sets with
>an index. IOW, instead of having, for example, 5 index files, each of
>which must provide the location of every occurrence of the specified
>value in each of its associated detail sets, say a total of 1000
>occurrences, you have one index file, that specifies the location of the
>value in one set. Simplifying the numbers, that's an effective
>compression of 1000:1 for a 4-byte item, or 100:1 for a 40-byte item.

There clearly is not the 'order of magnitude' of more space needed to
use a b-tree to index a detail for sorted-sequention access.  Overlooked
is the fact that with IMAGE, the b-tree simply provides the next master key.
From that master key is obtained the pointers to the corresponding
details.  These IMAGE pointer chains require 6 half-words for the master
and another 4 half-words for each detail entry.

Generally a b-tree pointer needs 2 half-words for each detail.  So if
you assume that the b-tree overhead for IMAGE's b-tree on the master
is the same as the overhead for a non-IMAGE b-tree on the detail, the
b-tree on the detail generally requires less space to point to the
detail entries.  As same key value volume grows, a leaf is added to
the b-tree branch, requiring a pointer.  But this is cumulatively much
less than the 4 half-words for *each* IMAGE detail's pointers in order
to add a link to IMAGE's detail chain pointers.


>If you want full-text searching of lots of data, or need search options
>beyond the fairly basic capability of the TurboIMAGE B-Tree indices,
>Omnidex or the equivalent is clearly the way to go.

There are also a number of other types of indexing approaches, such as
grouping indexes or composit-type indexes which cannot be achieved
with IMAGE's existing implementation.

What will be interesting to watch will be the performance attributes
of the b-tree as the data volume grows.  Generally, b-tree structures
can also be at risk for long serial searches for open space when one
is creating new indexes.  We've seen this in IMAGE when master entries
'cluster' due to the characteristics of the key values with a particular
capacity when calculating the hash address.

Unfortunately, NM KSAM can experience a similar issue with its internal
index blocks.  When first built, the NMKSAM structure appears to leave
openings every so often.  But as more and more new indexes are added,
NMKSAM has to serially search farther and farther through its index block
structure area to locate an open slot to add the new index.

Folks previously have talked about seeing performance improvements
when changing the capacity of the KSAM files.  Without more closely
examining their situation, I suspect that in fact what essentially
happened is that by re-building the file KSAM again left open space
in the block structure as it created the new file.  Now adding new
indexes appear to be faster and its been attributed to the change in
capacity when in fact it was the re-building which simply added
'breathing space' within the block structure and shortened KSAM's
serial search effort for locating an available index block in the
structure.  Also, the capacity may somehow factor into the number of
or frequency of the open entries left when initially building the
KSAM index structure.

NMKSAM is subjected to the 4GB file limit.  Also, there can be only
one HFS IMAGE b-tree file per master set.  NM KSAM files consist of
2 areas: one for the actual data and the other area containing the
index block structures and overhead.  As such, given the size of the IMAGE
key it would be possible to have a non-jumbo master set which has
an entry volume such that it cannot have an attached IMAGE b-tree
because of the 4GB file limit.  I don't know/can't speak for OMNIDEX,
but..

<plug-alert>

SUPERDEX does not have this same 4GB file limitation for housing its
index structures.  It uses IMAGE detail sets, which if need be can
be jumbo sets.

Also, we've developed an algorithm for DBGENERAL which will determine
the maximum capacity of the attached b-tree and limit any change of
master capacity so as to be within that limit. This will preventing any
run-time aborts due to a full IMAGE b-tree index file.  It will also
tells the user what the maximum possible capacity of a master could be
when it has an attached IMAGE b-tree index.

</end-plug>

IMAGE's b-trees are a great addition to HP's world-class DBMS.  A lot
of sites will certainly benefit from them.  However they do have their
limitations which will primarily impact large data bases, bases that are
approaching the 199 set limit (for adding new masters in order to add
indexes) or those who need more sophisticated indexing technology.





/jf
                              _\\///_
                             (' o-o ')
___________________________ooOo_( )_OOoo____________________________________


     On this day, in 1864 - Nevada became the 36th state.

                         Happy Halloween!!!

___________________________________Oooo_____________________________________
                            oooO  (    )
                           (    )  )  /
                            \  (   (_/
                             \_)
ATOM RSS1 RSS2
RAVEN.UTC.EDU