HP3000-L Archives

October 1999, Week 4

HP3000-L@RAVEN.UTC.EDU

From: Jerry Fochtman <[log in to unmask]>
Date: Tue, 26 Oct 1999 09:56:27 -0500
This has been an interesting thread on IMAGE so I thought I'd
provide some thoughts as well.

At 05:41 PM 10/18/1999 -0400, Jim Phillips wrote:
>Does Image optimize data access?

Well, it depends on what you mean by optimize....  Today one can enable
the prefetch flag via DBUTIL, and IMAGE will issue a prefetch for the
next several disc pages of a dataset that follow the address of the
current entry/block/page.  However, IMAGE does not anticipate that
the user will walk a specific path, and it does not attempt to prefetch
the disc pages containing the next entries on that chain.  While not
100% certain, I believe this is the limit of IMAGE's current
optimization related to disc I/O.


>What I have in mind is:
>
>I'm reading a data set of invoices that have the customer number in them.
>The primary key for the invoices is invoice number, and one invoice number
>pertains to only one customer number. I need to read the invoices serially
>because I'm selecting by invoice date, which is not a key item.  As I'm
>reading the invoices, I need to jump out to the customer data set and get
>some info about the customer.  Should I code something like this:
>
>DBGET Mode 2 Invoice
>
>If Invoice.Cust# <> Customer.Cust# then
>    DBFIND Mode 1 Customer
>    DBGET Mode 5 Customer
>End-If
>
>The purpose of this code is to eliminate an unnecessary DBFIND/DBGET to the
>customer data set if that customer's record is already in memory.  What I
>want to know is should I code this, or does Image do something like this
>already?  IOW, if I do the DBFIND/DBGET every single time, am I wasting
>resources by forcing Image to re-read a record that may be in memory
>already, or is Image smart enough to not do the record retrieval if the
>record is already in memory?

Certainly, if the customer number of the current invoice matches the
customer entry already in your application buffer, it is unnecessary to
request another copy of the customer entry via IMAGE.  This is
application-level design, as IMAGE does not know whether the specific
customer entry is in memory or not.  If it is, the IMAGE retrieval would
certainly be faster than if the object page(s) containing the block also
had to be retrieved from disc storage.  However, there is still the cost
of plowing through all the IMAGE processing code unnecessarily when the
program already has the data from a prior retrieval and the customer
number has not changed.  So yes, this is a good design approach.
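
As a sketch, the buffer check looks like this in Python.  Here
fetch_customer is a hypothetical stand-in for the DBFIND mode 1 /
DBGET mode 5 pair (it is not a real API), and the data is invented;
the call counter shows how many retrievals the check avoids when
consecutive invoices share a customer:

```python
CUSTOMERS = {101: {"name": "Acme"}, 102: {"name": "Globex"}}
fetch_count = 0

def fetch_customer(cust_no):
    """Stand-in for DBFIND mode 1 + DBGET mode 5 on the customer set."""
    global fetch_count
    fetch_count += 1
    return CUSTOMERS[cust_no]

def join_invoices(invoices):
    cached_no, cached_entry = None, None
    joined = []
    for inv in invoices:  # serial read of the invoice set (DBGET mode 2)
        if inv["cust_no"] != cached_no:  # the check from the code above
            cached_entry = fetch_customer(inv["cust_no"])
            cached_no = inv["cust_no"]
        joined.append((inv["inv_no"], cached_entry["name"]))
    return joined

invoices = [
    {"inv_no": 1, "cust_no": 101},
    {"inv_no": 2, "cust_no": 101},  # same customer: no second fetch
    {"inv_no": 3, "cust_no": 102},
]
result = join_invoices(invoices)
# fetch_count ends up at 2, not 3: the repeat customer cost nothing extra
```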

IMAGE uses mapped access to a dataset.  As such, it does not really know
whether the disc page containing the desired entry is in real memory or
not; it simply calculates the offset from the mapped file pointer and
references that address.  If the desired object address is not in
memory, a page fault occurs and it is up to the storage manager to
request that secondary storage retrieve it from disc.  So essentially,
when processing returns to IMAGE, as far as the code is concerned the
entry is in memory and processing continues.  Determining whether an
object page is in memory is not an easy task, and it cannot be done
directly, since referencing the page will cause it to be brought in if
it isn't already resident.  That's really part of the beauty of mapped
file access.  Also keep in mind that even if you are retrieving the same
entry again, there is an outside chance that the system needed the real
memory occupied by the page that held the prior retrieval's data.  In
that case the data is no longer in memory and an I/O would occur again.
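
The mechanism can be illustrated with a generic memory-mapped file in
Python.  This is only an illustration of mapped access, not IMAGE's
actual internals; the 32-byte fixed-length record layout is invented
for the example:

```python
import mmap
import os
import struct
import tempfile

# One fixed-length "entry": a 4-byte record number plus a 28-byte payload.
RECORD = struct.Struct("<I28s")

# Build a small stand-in "dataset" file of fixed-length entries.
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
with open(path, "wb") as f:
    for n in range(100):
        f.write(RECORD.pack(n, b"entry %03d" % n))

# Map the file; from here on, reads are just memory references.
f = open(path, "rb")
mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def get_entry(n):
    # Calculate the offset and reference the mapped address directly.
    # If the page holding it isn't resident, the OS takes a page fault
    # and brings it in; this code never knows the difference.
    off = n * RECORD.size
    num, payload = RECORD.unpack(mm[off:off + RECORD.size])
    return num, payload.rstrip(b"\x00")
```

Note that get_entry never checks residency: whether the reference costs
a memory access or a disc I/O is decided entirely by the virtual storage
manager, which is exactly the point made above.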

At 04:41 PM 10/18/1999 -0600, Simonsen, Larry wrote:
>Image will know that the record is possibly in memory and will do a search
>for it before it does a disc read.  However your check will be much faster.
>Unless there are other fields which need to be retrieved which other people
>have updated your check will be the best.

Nope, IMAGE doesn't know whether the object page is in memory or not; it
simply references it and virtual/secondary storage management handles
the rest.  IMAGE no longer does explicit disc reads for dataset blocks;
it simply calculates offset addresses and references the mapped data file.

At 04:48 PM 10/18/1999 -0600, John Krussel wrote:
>Since Image places records Serially (unless there is a delete chain) there
>is probably a very small chance that the same customer placed two orders one
>right after another. In that case your test will almost always fail and
>you'll have to do the Find and the Get. If you read all the entries first,
>sort them by Cust# and then go through them again getting any additional
>data, there is a greater likelihood that you will already have the record
>you want in your buffer. And have to do less reading

This is a good idea under specific circumstances.  I would want to know
the volume of invoices that would have to be sorted.  Certainly one could
select/extract the invoices for the desired date range, sort this smaller
subset by customer number, and then use John's approach to reduce the
chance of paying the retrieval cost of visiting the same customer
information more than once.  SUPRTOOL is an excellent tool for this
purpose.
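
A minimal Python sketch of the select/sort/group idea (on a real system
SUPRTOOL would do the extract and sort; the invoice data and date range
here are invented for the illustration):

```python
from itertools import groupby
from operator import itemgetter

invoices = [
    {"inv_no": 7, "date": "19991015", "cust_no": 102},
    {"inv_no": 3, "date": "19991012", "cust_no": 101},
    {"inv_no": 9, "date": "19991020", "cust_no": 101},
    {"inv_no": 5, "date": "19990901", "cust_no": 103},  # outside range
]

# 1. Select only the invoices in the desired date range.
subset = [i for i in invoices if "19991001" <= i["date"] <= "19991031"]

# 2. Sort the (smaller) subset by customer number.
subset.sort(key=itemgetter("cust_no"))

# 3. Visit each customer once, handling all of its invoices together.
fetches = 0
joined = []
for cust_no, group in groupby(subset, key=itemgetter("cust_no")):
    fetches += 1  # one DBFIND/DBGET pair per distinct customer
    for inv in group:
        joined.append((inv["inv_no"], cust_no))
# fetches is 2 (customers 101 and 102), not one lookup per invoice
```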

At 04:31 PM 10/18/1999 -0700, Peter Chong wrote:
>IIRC, Disk cache will boost speed on disk access, since MPE5?
>but, you could move Dataset to make 1 or 2 extension as much as
>possible to reduce mechanical seek...

Yes, managing the disc storage and extent placement of the dataset
using any of the available disc management tools may help improve
overall I/O performance.


>second, spread sets across volume, like invoice set with one volume and
>customer in the other volume and move close together within volume.

Maybe, but serializing access and minimizing unnecessary retrievals
is probably more cost-effective in terms of overall effort.  We also
don't really know how these items are accessed by other parts of the
application, so it's not clear whether the effort might cause other
issues.


>But, If an invoice set was sort by Invoice date, (most case) you could
>read customer set serially (most case less data) and backward chain
>read from current to old and read backward chain until your cutoff date.
>It will skip old data chain read, mode 6 read until hit the date you want

I don't really think this would help.  First, there is the overall cost
of sorting more data than is needed.  Second, it does not optimize the
retrieval of customer data should a customer have multiple invoices for
the same date or for different dates.

I tend to feel that using SUPRTOOL to qualify the invoices based upon
date, sorting them by customer number, and then accessing the customer
data is probably the more optimum approach when volume is not really
known from one cycle to the next.


At 12:09 AM 10/19/1999 -0400, Tom wrote:
>In one set of bases I know, serially reading the largest set first
>always results in faster reads. If I chain from Part-Master or
>Customer-Master, or even Shipment Header to Shipment-Line, I can expect
>it to take 2.5 to 3 hours. If I extract the Shipment-Line high-speed (no
>manipulation, read-select-write only), sort by what I want, then read it
>serially, chaining to what I need, I can get the same task done in 30
>minutes.

Serially reading through the data increases the likelihood that pages
of the file object are still in memory when they are later accessed by
the random process.  I suspect this is why you see a difference in
performance with this approach.

Certainly working with a subset of data is more efficient than plowing
through the entire file.  Reorganizing the detail set along the access
path being used would probably help this performance and reduce what
appears to be I/O thrashing.  It may also be that the path the set was
last reorganized on is not optimum for this retrieval, in which case
reordering the data indeed makes sense.

At 06:27 PM 10/19/1999 -0400, Tom wrote:
>Also, about an earlier idea, a mode-5 forward chain is alway more
>efficient than a mode-6 backwards read. I believe that an Image read
>fetches about 90,000 bytes ahead for serial and 16,000 for chained read.
>So for mode-6 reads, it's Read 16000, back up 500 or so and 16000
>forward, back up 500 or so and 16000 forward, and so on.

While I don't know whether the prefetch logic differentiates between
forward and backward movement in the set, the quantities mentioned
seem to be related to the old MPE V cache quantums one could set
for serial vs. random I/O.  Those don't apply here, as prefetch
generally works in object-page (4,096-byte) quantities.  I don't know
how many pages IMAGE prefetches at a time, but I suspect it's at least
2 pages, and more likely several more.



/jf
                              _\\///_
                             (' o-o ')
___________________________ooOo_( )_OOoo____________________________________

                        Tuesday, October 26th

          Today in 1825 - The Erie Canal officially opened to traffic.

___________________________________Oooo_____________________________________
                            oooO  (    )
                           (    )  )  /
                            \  (   (_/
                             \_)
