LISTSERV - HP3000-L Archives

August 1997, Week 3

HP3000-L@RAVEN.UTC.EDU

Subject:	Message File Corruption
From:	Gavin Scott
Date:	Thu, 21 Aug 1997 14:22:18 -0700

Rich asks:
> At one time, message files were susceptible to corruption if
> the accessing program was aborted or the system aborted. If
> you were using a message file to queue records for processing
> and a failure occurred, message file corruption would cause
> loss of such queue records.
>
> Is this still the case today or has this issue been dealt with
> as MPE has moved forward?  Can message files be relied upon to
> retain records if the system or writing program fails? What
> is the current reliability of message files as a persistent
> queueing mechanism?

When I was at Quest, we used message files heavily as a persistent
queueing mechanism, and I recall very few cases where the files
actually became corrupt.  It is thus my impression that message files
make a pretty good persistent queueing mechanism.  Whenever I say
this though, a lot of people start telling horror stories about the
evils of message files and how they are "always getting corrupted".

Back in MPE/IV days when message files were brand new, there certainly
were a lot of bugs, and pre-MPE/XL systems may have tended to be more
subject to corruption of message files on system failure.  Things are
much better on /iX today.  Another problem is that "HP" has never really
understood what a useful utility message files are.  There are a number
of magical things that message files do which have no simple replacement,
such as guaranteed notification of process termination, "pipes", queue
files, persistent queue files, etc.

As far as I know, it's not possible to corrupt a message file on MPE/iX
with anything less than a system abort (i.e. program aborts are not a
problem).  I'm not sure exactly what windows of vulnerability to system
aborts there are, or what sort of corruption (or simply loss of data
that didn't make it to disk before the system failed) is really
possible.  As I recall, message files (now in Native Mode since ~5.0)
do not support being attached to the transaction manager (an unfortunate
oversight, as this ought to have made them virtually failure-proof).

A very common reason that people *think* that message files are corrupt
and unreliable is that they simply don't have a clue about programming
with them.  The most common problem is failure to account for FSERR 151:

CURRENT RECORD WAS LAST RECORD WRITTEN BEFORE SYSTEM CRASHED (FSERR 151)

which is a feature of message files, not an indication of corruption.
When a message file is open for write and the system fails, the next
time the file is opened, the last record in the file is marked with a
flag.  When a program later reads that record, the read "fails" with
the above error.  In fact the read didn't fail at all; the system is
just trying to be helpful and let you know where a system interruption
occurred.  This is useful for writing recovery code, detecting partial
transactions, knowing that the "writer IDs" in the data are now going
to start over again, etc.

For most (simple) applications, the correct logic for reading a message
file is to simply ignore FSERR 151.  A common mistake is to treat the
error as a recoverable error and simply issue another read without
processing the data returned with the "error".  This results in a single
lost record every time the error is encountered, which is a tricky bug
that may never be found.
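
To make the difference concrete, here is a small simulation of the two
reader loops described above.  It is only a sketch in Python, not MPE
code: SimulatedMessageFile, drain_correct and drain_buggy are
hypothetical names invented for illustration, and the only detail taken
from the discussion is that a read can return valid data together with
error 151.

```python
FSERR_CRASH_MARKER = 151  # error number quoted above; everything else is hypothetical


class SimulatedMessageFile:
    """In-memory stand-in for a message file (an assumption, not an MPE API)."""

    def __init__(self, records, flagged_index=None):
        self._records = list(records)
        self._flagged = flagged_index  # record marked "last written before crash"
        self._pos = 0

    def read(self):
        """Destructive read: return (data, error), or None when the queue is empty."""
        if self._pos >= len(self._records):
            return None
        data = self._records[self._pos]
        error = FSERR_CRASH_MARKER if self._pos == self._flagged else 0
        self._pos += 1
        return data, error


def drain_correct(mf):
    """Process every record; treat error 151 as information, not failure."""
    processed, crash_points = [], []
    while True:
        result = mf.read()
        if result is None:
            break
        data, error = result
        if error == FSERR_CRASH_MARKER:
            crash_points.append(data)  # remember where the interruption was
        elif error != 0:
            raise OSError(f"unexpected file system error {error}")
        processed.append(data)  # the data returned with 151 is still valid
    return processed, crash_points


def drain_buggy(mf):
    """The common mistake: retry the read and drop the data that came with 151."""
    processed = []
    while True:
        result = mf.read()
        if result is None:
            break
        data, error = result
        if error != 0:
            continue  # BUG: the record returned with the "error" is lost
        processed.append(data)
    return processed
```

With four records queued and the second one flagged, drain_correct
returns all four records plus the flagged one as a recovery point,
while drain_buggy silently returns only three.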

Message files are subject to loss of data not posted to disk when the
machine fails, just like other types of file.  You can use the (relatively
expensive) FCONTROL 6 operation to force all data up to the current
time to be flushed to disk.
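
For readers more familiar with other systems, the same trade-off can be
sketched by analogy.  This is not MPE code and does not call FCONTROL;
it assumes a POSIX-style system and uses Python's os.fsync as a rough
counterpart of "force everything written so far to disk".  The function
name append_durably is hypothetical.

```python
import os


def append_durably(path, record):
    """Append one record (bytes) and force it to disk before returning.

    Analogy only: flush() empties the program's own buffer and
    os.fsync() asks the operating system to write its buffers to disk.
    Like any forced flush, this costs extra time on every call.
    """
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
```

Calling a flush like this after every record is the safest choice and
the slowest; flushing once per batch or per logical transaction is the
usual compromise.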

G.
