HP3000-L Archives

September 2004, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Roy Brown <[log in to unmask]>
Reply To: Roy Brown <[log in to unmask]>
Date: Mon, 27 Sep 2004 18:00:39 +0100
Content-Type: text/plain
Parts/Attachments: text/plain (116 lines)

In message <[log in to unmask]>,
Venkatraman Ramakrishnan <[log in to unmask]> writes
>Hello Everybody,
>
>We are writing HP COBOL programs to access TurboIMAGE databases on MPE.
>We want to retrieve the records in the sequence in which they were
>inserted into the dataset.

>We have thought of the three options below:
>Option 1: Serial read using DBGET Mode 2
>a) Serial read the dataset using DBGET Mode 2
>b) Process each record
>Question:
>1. Will the sequence be maintained in this case?

Just long enough, perhaps, for you to test the process and decide that
it looks like it would... but not reliably. You cannot rely on a
sequence of Mode 2 DBGETs returning the records in order of addition
for any significant processing task at all.

You will need, as Tracy says, never to have deleted a record (unless you
have been using HWMPUT, and have never filled the dataset to the HWM).

You will also need never to have done, and be sure you will never need
to do, any database maintenance that reloads or resequences the records
(with DBUNLOAD/DBLOAD, Adager, or whatever).
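To see why deletions break the ordering, here is a toy model (in Python, not HP COBOL, and deliberately simplified: it is not the actual IMAGE free-space algorithm) of how reuse of freed entries makes a serial read diverge from insertion order:

```python
# Toy model of why a serial (Mode 2) read can't be trusted to return
# records in insertion order once deletes have occurred: freed entries
# get reused, so a later record can land in an earlier slot.

def put(slots, free, record):
    """Add a record, reusing a freed slot if one exists."""
    if free:
        slots[free.pop()] = record      # later record fills an earlier hole
    else:
        slots.append(record)

def delete(slots, free, index):
    """Delete a record, leaving its slot free for reuse."""
    slots[index] = None
    free.append(index)

slots, free = [], []
for r in ["A", "B", "C"]:
    put(slots, free, r)
delete(slots, free, 1)                  # delete "B"
put(slots, free, "D")                   # "D" reuses B's old slot

serial_order = [r for r in slots if r is not None]
print(serial_order)                     # ['A', 'D', 'C'] -- insertion order was A, C, D
```

Once "D" has taken "B"'s slot, no serial scan can recover the true order of addition, which is exactly the failure mode described above.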

>2. Will there be performance issues when the dataset grows huge, since
>DBGET Mode 2 needs to pass through every record?

There will indeed. SuprTool would be your friend, except for the answer,
above, to 1.

>Option 2: Chain read using DBGET Mode 5
>a) Chain read using a dummy field created
Same problems as above with HWM/reuse/db maintenance...
> which will select all the records in the dataset
*All* the records? As many as with the Mode 2 DBGETs? Chained reads
have more overhead, and so will be slower than Mode 2...
>b) Sort the table using a Sequence based on the above dummy field
A sort as well? Slower still....
>c) Process each record
So, same problems as Option 1, only worse, if you are really getting
every record in the dataset.


Solution (?):

If you want a guaranteed insertion sequence, and this is a new database,
then I suggest that you add a date/time sorted key field to the records.
The time will need to go down to a granularity (hundredths or
thousandths of a second or less) that either ensures that you never have
two records with the same timestamp, or that you really don't care if
you process the records from a given timestamp increment out of
sequence.
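The shape of such a key can be sketched as follows (a Python illustration, not HP COBOL; the field and function names are hypothetical). The idea is a fixed-width, lexically ascending value: date/time down to microseconds, plus a small tie-break counter in case two inserts share a timestamp:

```python
# Sketch of a sortable date/time key fine enough that two inserts are
# very unlikely to collide; a per-process tie-break counter covers the
# case where they do. Names here are illustrative only.
import itertools
from datetime import datetime, timezone

_tie = itertools.count()

def make_ts_key():
    """Fixed-width ascending key: YYYYMMDDHHMMSS + microseconds + counter."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y%m%d%H%M%S%f") + f"{next(_tie) % 1000:03d}"

k1, k2 = make_ts_key(), make_ts_key()
assert k1 < k2        # keys are strictly ascending within this process
assert len(k1) == 23  # 14 + 6 + 3 characters: fits a fixed-width item
```

A fixed-width character key like this sorts correctly as a plain string, which is what you want for an IMAGE sorted key item.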

Alternatively, you could add a counter sorted key field, and number the
records sequentially from a separate counter dataset/file. But if you do
this, then you will be doing two updates for each write (the record
itself, and incrementing the central counter), and the counter may be a
bottleneck if multiple parallel processes need to share it to update
your original dataset.
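The two-updates-per-write pattern, and the serialisation point it creates, can be sketched like this (Python stand-in, not HP COBOL; the lock models the DBLOCK you would need on the counter set):

```python
# Sketch of the counter-dataset idea: every insert does two updates --
# bump the shared counter, then write the record with that sequence
# number as its sorted key. With many parallel writers, the lock on
# the counter is the bottleneck described above.
import threading

class SequenceCounter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()   # stand-in for DBLOCK on the counter set

    def next(self):
        with self._lock:                # update 1: increment the central counter
            self._value += 1
            return self._value

counter = SequenceCounter()
records = []

def writer(payload):
    seq = counter.next()
    records.append((seq, payload))      # update 2: DBPUT with seq as sorted key

threads = [threading.Thread(target=writer, args=(f"rec{i}",)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

seqs = sorted(s for s, _ in records)
assert seqs == list(range(1, 51))       # unique, gapless numbers even under contention
```

Every writer serialises on that one lock, which is why a heavily shared counter can throttle otherwise parallel inserts.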

But either way, you will then have a sorted chain that you can read
down, in Mode 5, and guarantee that you will get the records in addition
sequence.

As the additions to the sorted chain will, by definition, be in key
sequence, there will be no overhead by dint of having a sorted chain
here: sorted chains are only slower when new records have to go into the
chain, not onto the end of it. Yours will always go on the end.

However, reading it will still retrieve every record, every time, and I
get the feeling from your problem description that some selection is
possible. But what selection is that? If you only want the most recent
records every time, you might want to build a B-Tree, so you can chop in
for a given start date/time or counter value.

Or maybe read backwards from end-of-chain to find that start point, and
come forwards again?
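The "chop in" idea is essentially a binary search over ascending keys. Here is the same idea in miniature (Python, purely illustrative; on MPE the B-Tree, or a backwards read from end-of-chain, plays this role):

```python
# Find the first record at or after a given start timestamp/counter
# value, without reading the whole chain. This is what a B-Tree
# 'chop in' buys you over a full serial or chained read.
from bisect import bisect_left

keys = [101, 105, 108, 112, 120, 131]   # ascending sorted-key values

def first_at_or_after(keys, start):
    """Index of the first key >= start (len(keys) if there is none)."""
    return bisect_left(keys, start)

assert first_at_or_after(keys, 110) == 3   # processing starts at key 112
assert first_at_or_after(keys, 101) == 0   # start value present: start there
```

From that start point you read forward down the chain, touching only the records you actually want.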

As ever in these cases, if you tell us what you are trying to do in the
first place (as distinct from describing the problems you are
encountering with your own proposed solution), we may be able to suggest
something much better...

>Option 3: Message file (Option not preferred)

>a) Insert records into message file
>b) Read from message file and process the record

>Please let us know your thoughts on the feasibility of the first two
>options, and which would be preferable and why. We have decided not to
>use message files for this purpose.

Very wise; they do not have remotely the same stability as Image
datasets.

You could envisage writing the records to the database without any extra
keys, I suppose, and then write a pointer to each one into a message
file; that would ensure sequentiality. But the message file might break,
and anyway, once you have read the message file record, it's gone
(usually).

Hmmm... conventional MPE file of the dataset record additions perhaps? A
logfile, in other words. Pretty safe, and could be cleaned out as each
batch was re-read (for whatever purpose you are doing the re-reading),
if this is indeed a one-shot-per-addition process.
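The logfile idea might look like this (a Python sketch, not MPE file system calls; the filename is illustrative only). The point of difference from a message file is that reads are non-destructive, so a batch can be re-read and the file cleaned out only once the batch is safely processed:

```python
# Sketch of the 'logfile' alternative: alongside each DBPUT, append the
# record's key (or address) to a plain sequential file. Reading it does
# not consume records, unlike a message file.
import os
import tempfile

logfile = os.path.join(tempfile.mkdtemp(), "addlog")   # illustrative path

def log_addition(record_key):
    with open(logfile, "a") as f:       # append-only: preserves insertion order
        f.write(record_key + "\n")

def read_batch():
    with open(logfile) as f:            # non-destructive read
        return [line.rstrip("\n") for line in f]

for key in ["K001", "K002", "K003"]:
    log_addition(key)

batch = read_batch()
assert batch == ["K001", "K002", "K003"]   # insertion order preserved
assert read_batch() == batch               # re-readable: nothing was consumed

open(logfile, "w").close()                 # 'clean out' once the batch is processed
assert read_batch() == []
```

If the process crashes between the DBPUT and the log append you could miss an entry, so in practice you would want the append done in the same unit of work as the database write.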

--
Roy Brown        'Have nothing in your houses that you do not know to be
Kelmscott Ltd     useful, or believe to be beautiful'  William Morris

