HP3000-L Archives

December 1997, Week 1

HP3000-L@RAVEN.UTC.EDU

From: Larry Boyd
Date: Sun, 7 Dec 1997 14:12:20 -0600

John wrote:

>I have always been under the impression that one of the major advantages of
>hashed keys is that they are more efficient for lookup than b-trees.  A
>b-tree ALWAYS forces you to traverse several levels of the tree to reach a
>particular record (and the larger the file, the more levels), whereas a
>hashed key with a well-chosen capacity and a reasonable amount of free
>space will usually reach the desired record with just one read.  Why would
>DBGET use the less-efficient b-trees instead?

Actually, it is *possible* that only one level will be traversed, and depending on the size of the physical read, it may still take only one physical I/O.  However, your point is well taken and correct.  Generally, a b-tree lookup will need more than one physical read to find the selected value.
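
To put rough numbers on "how many levels", here is a minimal sketch.  The function name and the fanout of 100 entries per node are my own assumptions for illustration, not figures from IMAGE.  It counts the levels of an idealized, completely full b-tree, so treat the result as a lower bound on reads when nothing is cached.

```python
def btree_levels(num_keys: int, fanout: int) -> int:
    """Levels needed by an idealized, fully packed b-tree.

    Hypothetical helper: returns the smallest number of levels such
    that fanout ** levels >= num_keys. Real trees are rarely full,
    so an actual index may be deeper.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    levels = 1
    capacity = fanout  # keys reachable through a single level
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# Assumed fanout of 100 entries per node, for illustration only.
for n in (100, 10_000, 1_000_000):
    print(n, "keys ->", btree_levels(n, 100), "level(s)")
```

With a small enough file everything fits in one node, which is the one-level case above.  A well-tuned hashed master still usually wins on a single keyed read.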

>Furthermore (and I'm really going to display my ignorance here) isn't the
>b-tree pointing to the key and not the record number, thereby forcing the
>read down the synonym chain anyway?

And this is true... Once the value is found, the master key is taken from the b-tree and used to find the record in the Master dataset.
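
As a rough sketch of that two-step path (this is not IMAGE code; the data and names are made up, a sorted list stands in for the b-tree, and a Python dict stands in for the hashed Master dataset):

```python
from bisect import bisect_left

# Stand-in for the hashed Master dataset: key value -> record.
# All names and values here are hypothetical.
master = {
    "ACME CORP": "cust 1001",
    "ACME TOOLS": "cust 1002",
    "BOYD LTD": "cust 2001",
}

# Stand-in for the b-tree: the master key values kept in sorted order.
index = sorted(master)


def find_by_prefix(prefix: str) -> list[str]:
    """Step 1: locate matching key values in the ordered index.
    Step 2: use each key value for a normal hashed lookup."""
    records = []
    start = bisect_left(index, prefix)  # first key >= prefix
    for key in index[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        records.append(master[key])  # the extra keyed read
    return records


print(find_by_prefix("ACME"))
```

The index search finds key values only; every match still costs a keyed read of the master, which is the extra work John is describing.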

The question to me is, does it significantly affect you for good or for bad?  For example, in a serial read, when you're looking at all the records, it will make a *big* difference.  Additionally, if your machine is already 80% utilized when on-line processing is occurring, you could see a performance hit.  On the other hand, in many cases the effect on performance will not be perceived by the users (and perception is reality).

The benefit of indices, whether on an IMAGE dataset or in a relational db, is that the users do not have to "know" the customer number, for example.  The ease-of-use for the user (and their customer who just called on the telephone and does not have their "account number") is much greater.

The trade-off between ease-of-use and performance is always the issue.  The easier it is to use (for most users), generally the more power it takes.  It will take either more machine instructions or more I/Os, and oftentimes both.  This is why the pressure to continually increase the speed of both of these seems to always increase.  (And once you satisfy the response time for the users, they ask for more functionality -- which decreases performance :)  You can (and should) "tune" the code for better performance, but when you do more work it takes more energy (isn't there an equation about this [E=mc^2]?).

Deciding what to do is why we make the big bucks :)

LB
