HP3000-L Archives

December 2001, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Roy Brown <[log in to unmask]>
Reply To:
Roy Brown <[log in to unmask]>
Date:
Thu, 27 Dec 2001 05:16:27 -0600
"Steve Dirickson" <[log in to unmask]> wrote in message
news:a0e27e01jg7@enews3.newsguy.com...

Re the attribution-snipped message from Kent Wallace:

> > I have 2 files to load 37 million and 27 million.
> > On my test of the first 5 million records I got 5,200 puts
> > per min., which is what I got in the first 5 million records.
> > I am at 8 million records now.  I am getting about 2000 puts per min.
> > Any ideas why I am slowing down?

> Sounds like at least one of the chains is sorted, and the data is not.
> Because of the way IMAGE adds to sorted chains, inserting random data
> is an O(N^2) operation. If that's the case, pre-sorting the data (in
> DESCENDING order) to the maximum extent possible (requires a stable
> sort if doing multiple sorts on multiple chains) should significantly
> improve your load time.

Nope: as Image searches backwards along the chains, DESCENDING is the
absolute worst thing he could do. It will exactly double the chain
processing time compared with random data, because it forces Image to read
the whole chain each time, instead of the (average) half-a-chain that random
data will encounter.

(End of chain positioning time will be the same for each entry, and I am
assuming that the DBPUT itself will take the same time, whether it is two
details or a detail and a master involved).
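
To put rough numbers on the above, here is a toy model in Python. It is only
a sketch of the backwards chain search as described in this message, not of
Image's actual internals: the chain is a plain list, and the function name
backward_insert_cost and the chain size n are made up for illustration. It
counts how many existing entries get examined while loading the same values
in ascending, random and descending order.

```python
import random

def backward_insert_cost(values):
    """Insert values one by one into an ascending chain, searching from the
    end of the chain backwards; return the total number of entries examined."""
    chain = []
    examined = 0
    for v in values:
        pos = len(chain)
        # Walk backwards until an entry that sorts at or before v is found.
        while pos > 0:
            examined += 1
            if chain[pos - 1] <= v:
                break
            pos -= 1
        chain.insert(pos, v)
    return examined

n = 2000  # illustrative chain length, far smaller than a real load
ascending = list(range(n))
descending = ascending[::-1]
shuffled = ascending[:]
random.Random(1).shuffle(shuffled)

cost_asc = backward_insert_cost(ascending)
cost_rand = backward_insert_cost(shuffled)
cost_desc = backward_insert_cost(descending)

print("ascending :", cost_asc)
print("random    :", cost_rand)
print("descending:", cost_desc)
```

In this model ascending input examines one entry per put (n - 1 in total),
descending input examines the whole chain every time (n * (n - 1) / 2 in
total), and random input lands at roughly half the descending figure.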

ASCENDING is Kent's friend. The sort should be on the sort field and all
that comes after it, if duplicates on chain key/sort key are likely to be
encountered, not just on the sort key itself.
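
As a sketch of what that sort key looks like, assuming a made-up record
layout of (chain key, sort field, then two trailing fields) held as Python
tuples: the key is the sort field plus everything after it in the record.
The field positions and sample values here are hypothetical, and a real load
would do this with whatever sort utility produces the input file.

```python
# Hypothetical record layout: (chain_key, sort_field, trailing_1, trailing_2)
records = [
    ("A100", 20011227, "B", 7),
    ("A100", 20011226, "Z", 1),
    ("A100", 20011227, "A", 9),
    ("B200", 20011225, "C", 3),
]

SORT_FIELD_INDEX = 1  # assumed position of the sort field in the record

def load_order_key(record):
    # The sort field and every field that follows it, in record order.
    return record[SORT_FIELD_INDEX:]

# ASCENDING order, so each new entry belongs at the end of its chain.
presorted = sorted(records, key=load_order_key)

for record in presorted:
    print(record)
```

The two A100 entries that share sort value 20011227 are then ordered by the
trailing fields as well, so neither has to be placed ahead of the other
after it is already on the chain.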

If the relevant detail has only one sorted chain through it, then this is
the field to use; if it has more than one sorted chain, and the sequences
conflict, then go for whichever is likely to give the longest chain - i.e.
is likely to make the fewest key additions[*], or has the most duplicate key
values - all three ways of looking at it are equivalent.

[*] Or would be on an empty database....
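
One rough way to compare candidate paths before the load, sketched in Python
on invented sample data: count the distinct key values per path and divide
into the record count. The function name, the path names and the field
positions are all assumptions for illustration only.

```python
def average_chain_length(records, key_index):
    """Records per distinct key value for the path keyed on key_index."""
    distinct_keys = {record[key_index] for record in records}
    return len(records) / len(distinct_keys)

# Hypothetical detail records: (customer_key, product_key)
records = [
    ("C1", "P1"), ("C1", "P2"), ("C1", "P3"),
    ("C2", "P1"), ("C2", "P4"), ("C2", "P5"),
]

paths = {"customer": 0, "product": 1}  # assumed sorted paths and key positions
lengths = {name: average_chain_length(records, idx) for name, idx in paths.items()}

# Fewest distinct keys means most duplicates, which means the longest chains.
best_path = max(lengths, key=lengths.get)
print(lengths, best_path)
```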

--
Roy Brown

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
