LISTSERV - HP3000-L Archives

October 1997, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:	Re: New B-tree feature
From:	Wirt Atmar
Date:	Wed, 22 Oct 1997 17:04:12 -0400

Shawn Gordon writes:

> I have read the communicator on this new feature, but am confused about
>  how it is used/implemented.  It seems to indicate that you have to put
>  them on manual master keys.  Can someone describe how you can use these,
>  and in what situations.

Because I didn't see anyone else's response to this, let me say just a few
words about the new b-trees. First of all, you're going to like them -- a lot
-- if you write queries into IMAGE datasets.

The b-trees ARE attached only to master datasets (automatic or manual, it
makes no difference). But the masters are, of course, themselves attached to
detail datasets. What used to be a two-level structure is now three-level.
But you don't have to think about it in that manner. You can't directly
access the information in the b-tree itself. The b-trees are there in support
of queries that you couldn't ask before in IMAGE without a serial search,
such as:

     find name is GOR@
     find city > Chic
     find amount ib 50.00,99.99

Each individual query language that will use the new b-tree feature may use
slightly different syntaxes, but the basic behaviors will be the same,
nonetheless. Each one of these queries is now <drum roll> a chained search
</drum roll>, rather than a serial search, although they may not look like
it.

If the dataitems NAME, CITY, and AMOUNT are search items in a detail dataset
(or key items in a master dataset), and if a b-tree is "attached" to the
master dataset appropriate for each of these search items, then each of the
query formats above will go out and first create what is now called a
"superchain." That is, the process first searches the b-tree attached to the
master dataset for all of the names that begin with GOR and creates a list of
all of the fully-specified names that exist in the master dataset that meet
this criterion (GORDON, GORE, GORFINKLE, GORTON, GORVEY, etc.). The chain for
each of these names is then searched in turn and every qualifying entry is
marked, exactly as it always has been, as if it were just a single name you
were looking for. All of this is done completely automatically, without you
having to do anything special at all.
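
The superchain idea can be sketched in a few lines of Python. This is only an
illustration of the behavior described above, not IMAGE's actual
implementation: a sorted list stands in for the b-tree, a dictionary of
record-number lists stands in for the detail chains, and every name and record
number below is hypothetical.

```python
from bisect import bisect_left

# Hypothetical stand-in for the b-tree: the master's key values, kept sorted.
master_keys = sorted(
    ["GORDON", "GORE", "GORFINKLE", "GORTON", "GORVEY", "GRANT", "ATMAR"]
)

# Hypothetical stand-in for the detail chains: key value -> record numbers.
detail_chains = {
    "ATMAR": [101],
    "GORDON": [7, 42],
    "GORE": [13],
    "GORFINKLE": [],
    "GORTON": [88],
    "GORVEY": [5, 64],
    "GRANT": [21],
}


def keys_with_prefix(sorted_keys, prefix):
    """Return every key that begins with prefix, using an ordered lookup."""
    start = bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


def superchain(prefix):
    """Walk the chain of each matching key and collect the qualifying records."""
    qualifying = []
    for key in keys_with_prefix(master_keys, prefix):
        qualifying.extend(detail_chains.get(key, []))
    return qualifying


print(keys_with_prefix(master_keys, "GOR"))
print(superchain("GOR"))
```

The point of the sketch is that no record outside the matching chains is ever
touched, which is what makes "find name is GOR@" a chained search rather than
a serial one.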

As for turning the b-tree feature on, that too is surprisingly simple. Three
new commands have been added to DBUTIL. They are: ADDINDEX, DROPINDEX, and
REBUILDINDEX. The commands are fully explained in DBUTIL's help. I personally
recommend adding b-tree indexes to every master dataset in your databases --
unless you have some overwhelming reason to argue that no benefit would be
obtained from one or another specific master dataset having a b-tree
attached.

B-trees also offer the possibility of changing the manner by which IMAGE
databases have been used in the past. In the examples I gave above, AMOUNT is
clearly a real number. Ordinarily, it would have been considered a little
declasse to put a hashing index on a real number, but I think that changes
now. An automatic master could be attached to a real number field, such as
AMOUNT, and with a b-tree attached, you can now ask range questions and get
back answers almost immediately.
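
A range question such as "find amount ib 50.00,99.99" can be sketched the same
way. Again, this is an illustration under assumptions, not IMAGE's internals:
the sorted list is a stand-in for the b-tree on the automatic master, the
amounts and record numbers are made up, and "ib" is taken to mean a range that
includes both endpoints.

```python
from bisect import bisect_left, bisect_right

# Hypothetical key values in an automatic master on AMOUNT, kept sorted
# the way a b-tree would present them.
amount_keys = sorted([12.50, 50.00, 75.25, 99.99, 100.00, 250.00])

# Hypothetical detail chains: amount -> record numbers in the detail dataset.
amount_chains = {
    12.50: [1],
    50.00: [2, 9],
    75.25: [4],
    99.99: [6],
    100.00: [3],
    250.00: [8],
}


def keys_in_range(sorted_keys, low, high):
    """Return the keys k with low <= k <= high using two ordered lookups."""
    first = bisect_left(sorted_keys, low)    # first key >= low
    last = bisect_right(sorted_keys, high)   # one past the last key <= high
    return sorted_keys[first:last]


def find_between(low, high):
    """Collect the records on the chain of every key inside the range."""
    records = []
    for key in keys_in_range(amount_keys, low, high):
        records.extend(amount_chains[key])
    return records


print(keys_in_range(amount_keys, 50.00, 99.99))
print(find_between(50.00, 99.99))
```

A hashed lookup alone can only answer "is this exact amount present?"; it is
the ordering of the keys that lets the range be found without reading every
entry.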

What performance increase should you expect with b-trees over serial
searches? In our in-house measurements, we're only seeing about a 100x
increase, but that's simply because our databases are too small. We're not a
production shop, so we don't have enormously large datasets. But if the NAME
field existed in a four million record dataset and the query above only had
thirty records in total for all of the names that began with GOR, you could
expect potentially a 5000x increase in retrieval speed, as compared to an
MR-NOBUF, high-speed serial search. All in all, you have to figure that's not
a bad bit of performance enhancement -- especially given the price of the
enhancement.
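
As a back-of-the-envelope check on those figures, here is a small Python
calculation. It rests on a deliberately crude model that is my assumption and
not a measurement: a serial search pays some per-record cost for every record
in the dataset, a chained search pays a (larger) per-record cost only for the
qualifying records, and the b-tree and master lookups are ignored.

```python
def implied_cost_ratio(total_records, qualifying_records, speedup):
    """How much more one chained read may cost than one serial read
    for the stated speedup to hold, under the crude model:

        speedup = (total_records * serial_cost)
                  / (qualifying_records * chained_cost)
    """
    return total_records / (qualifying_records * speedup)


# Figures from the text: four million records, thirty qualifying, 5000x.
ratio = implied_cost_ratio(4_000_000, 30, 5000)
print(round(ratio, 1))
```

Under this model the 5000x figure corresponds to each chained read costing
roughly 27 times as much as each record read in the serial pass, which is
plausible when the serial pass reads many records per block and each chained
read may need its own disc access.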

CSY is to be sincerely congratulated for the quality of work they've done on
b-trees.

Wirt Atmar
