From: Jerry Fochtman <[log in to unmask]>
Date: Tue, 5 Aug 2003 11:18:33 -0500

At 06:08 PM 8/4/2003, Wirt Atmar wrote:
>Jason writes:
>
> >  Also, for serial read performance, can anybody comment on the
> >  expected gain or difference between just deleting records vs.
> >  deleting, setting a lower capacity, and repacking?  As an example, let's
> >  say we delete 25% of the records in a dataset with 20 million records.
> >  Fewer records clearly means less time, right?  Even if you don't resize
> >  and repack, right?
>
>No. If you delete 25% of your 20 million records but fail to repack the
>dataset, a serial search will take just as long as it did before.
>
>A serial search begins at the first record of the dataset and proceeds until
>it hits the high-water mark. It doesn't matter if the records in between those
>two points are either active or have been marked deleted.
>
>A repacked dataset, however, will be 25% faster to search serially. All of the
>deleted records have been squeegee'd out of the dataset, so that every record
>that's now present is active -- and the high-water mark will have been moved
>down to the top of those records.
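
To put some rough numbers on that, here's a little back-of-the-envelope
sketch in Python (purely illustrative -- this isn't IMAGE code, and the
16-records-per-block blocking factor is just an assumed value):

def serial_scan_blocks(high_water_mark, records_per_block=16):
    # Every block up to the high-water mark gets read, whether the
    # slots in it are active or marked deleted.
    return -(-high_water_mark // records_per_block)   # ceiling division

print(serial_scan_blocks(20000000))   # 1250000 blocks before any deletes
print(serial_scan_blocks(20000000))   # still 1250000 after deleting 25% -- HWM unchanged
print(serial_scan_blocks(15000000))   # 937500 blocks once a repack lowers the HWM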

I'd like to expand a bit on what Wirt provided by explaining the two
primary options available in the various third-party tools to address
this situation.  The most common approach is to repack along a specific
search path.  This will improve data retrieval performance when your
application performs a key lookup and then retrieves multiple detail
entries following the look-up.
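
As a quick toy model of what a path repack buys you (again just
illustrative Python, not anything a real tool does internally; the "cust"
key name and the 4-entries-per-block figure are invented for the example):

BLOCK_SIZE = 4   # entries per block -- invented, just for illustration

def blocks_touched(entries, key):
    # Count the distinct blocks a chained read for `key` has to visit.
    return len({i // BLOCK_SIZE for i, rec in enumerate(entries)
                if rec is not None and rec["cust"] == key})

def repack_along_path(entries):
    # Drop deleted slots (None) and rewrite the survivors grouped by key
    # value, so each chain ends up in adjacent slots.
    return sorted((r for r in entries if r is not None),
                  key=lambda r: r["cust"])

# Customer "A"'s detail entries scattered among others and deleted slots:
scattered = [{"cust": "A"}, {"cust": "B"}, None, {"cust": "A"},
             {"cust": "C"}, None, {"cust": "A"}, {"cust": "B"}]

print(blocks_touched(scattered, "A"))                     # 2 blocks before
print(blocks_touched(repack_along_path(scattered), "A"))  # 1 block after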

Another repack method involves simply compressing the empty space out of
the file and lowering the high-water mark.  This technique simply moves
adjacent records next to one another until they are all located
consecutively.  Both methods will improve the performance of a serial scan
by lowering the high-water mark and removing the space occupied by the
deleted records.
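
And the same kind of toy model for the compress-only approach
(illustrative Python again; the entry values are made up):

def compress(entries):
    # Slide live entries down over the deleted (None) slots, keeping their
    # existing order; the new high-water mark is just the live-entry count.
    packed = [rec for rec in entries if rec is not None]
    return packed, len(packed)

slots = [{"id": 1}, None, {"id": 7}, None, None, {"id": 3}, {"id": 9}]
packed, hwm = compress(slots)
print(hwm)      # 4 -- a serial scan now stops after 4 entries instead of 7
print(packed)   # entries are packed but still not grouped by any key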

However, while the second compress method can be performed on a dataset
faster than the reorganizing/repacking method, there is no guarantee that
it will improve retrieval performance along a particular search path.
Some folks with very large datasets but only 1-2 entries per key value
periodically use this compress method on their sets, as the added downtime
to conduct a reorganization does not provide a noticeable lookup
performance improvement.

Other sites, with much longer search chains within a key value, find that
having to perform the full detail set reorganization periodically does
indeed improve their application performance.

So, as with most things, it depends on your situation as to which approach
may work best, especially when it comes to very large detail sets.... :-)
