HP3000-L Archives

July 1995, Week 2

HP3000-L@RAVEN.UTC.EDU

Date:
Thu, 13 Jul 1995 10:06:21 -0700
Item Subject: Message text
Jeff wrote:
> On Wed, 12 Jul 1995 18:50:49 -0700 <[log in to unmask]> said:
> >I think this would prevent 50-98% of all instances of "corruption after
> >system failures" that people report these days for Message Files. :-)
>
> True, and I'll take your word for it; but given bad variable BLOCK structure,
> I was concerned about the possibility of records within the block (the block
> length might be invalid, but perhaps contain some legitimate variable records)
> that would be flushed if you couldn't read the block.  But again, I'll take
> your glowing-in-the-dark-by-now word for it :-)
 
Well, if you *are* seeing anything about BAD VARIABLE BLOCK STRUCTURE, then you
probably *do* have a problem with the internal structure of your message files
and they should probably be rebuilt. Please note that I have seen *Circular*
files have this type of problem very often, but MSG files only very rarely.
 
The error I was referring to was FSERR 151, for which the text is:
 
CURRENT RECORD WAS LAST RECORD WRITTEN BEFORE SYSTEM CRASHED
 
Now this is actually a *feature* of message files, which the file system goes
to a lot of trouble to implement for you. Unfortunately it's only marginally
well documented (it would be nice if there was a note in the FREAD doc), and
a lot of people have never heard of it. If you have a message file open with
write access, and the system fails, then you *will* get this error when you
later read data from the file. It does not indicate that there is any problem
with the file, but simply that the record you just read was the last one that
got written to the file before the system crashed.
 
Unfortunately, most generic file reading programs don't know about FSERR 151,
and many programs that are specifically using message files don't either. The
problems arise because the file system has this bit of information it wants
to pass to you, but it has no good way of getting it to you. So when your
program reads a record from the message file that 'was the last one written
before a system failure', the FREAD returns CCL and FSERR 151 to indicate it.
Even though the FREAD returned CCL, the record *was* returned to you, and
also deleted from the message file (unless you had done a non-destructive
read).
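
To make that behaviour concrete, here is a small Python simulation (not MPE
code). The MockMessageFile class and its fread/fcheck methods are hypothetical
stand-ins for the real intrinsics; only the CCE/CCL condition codes and FSERR
151 come from the description above. Returning CCG for an empty file is a
simplification assumed for the sketch, and only destructive reads are modelled.

```python
from collections import deque

# FSERR 151: CURRENT RECORD WAS LAST RECORD WRITTEN BEFORE SYSTEM CRASHED
FSERR_LAST_BEFORE_CRASH = 151


class MockMessageFile:
    """Hypothetical in-memory stand-in for a message file (not the real file system)."""

    def __init__(self, records, crash_marked_index=None):
        self._queue = deque(records)
        # Position (from the original head) of the record that was the last
        # one written before the simulated system failure.
        self._marked = crash_marked_index
        self._reads = 0
        self._fserr = 0

    def fread(self):
        """Destructive read: returns (condition_code, record)."""
        if not self._queue:
            self._fserr = 0
            return "CCG", None  # assumed end-of-file signal for this sketch
        flagged = self._reads == self._marked
        record = self._queue.popleft()  # the record is removed even when flagged
        self._reads += 1
        self._fserr = FSERR_LAST_BEFORE_CRASH if flagged else 0
        return ("CCL" if flagged else "CCE"), record

    def fcheck(self):
        """Return the error number set by the most recent fread."""
        return self._fserr


f = MockMessageFile(["a", "b", "c"], crash_marked_index=1)
results = []
for _ in range(3):
    cc, rec = f.fread()
    results.append((cc, rec, f.fcheck()))
print(results)
```

The second read comes back with CCL and error 151, yet the record "b" is still
delivered and is no longer in the file afterwards.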
 
When a program gets an FSERR 151, it is supposed to ignore it unless the
application has some use for the information that FSERR 151 is giving it.
What usually happens though is that the program aborts when it gets the
unexpected 'error' reading the file. Since the record has actually been read
(and deleted) the user can just restart the program and it will continue
without errors (of course one record from the file has now been lost).
 
So this logic:
 
LOOP
  FREAD MessageFile
  IF condition-code is not CCE THEN
    quit with fatal error
  ELSE
    do something with the data
  ENDIF
ENDLOOP
 
will result in the above symptoms. The program will crash sometime after
a system abort, losing a record in the process. Restarting the program will
'fix' the problem, probably leaving the user thinking evil things about
message files.
 
This logic 'fixes' the problem too but is actually worse:
 
LOOP
  READANOTHER:
  FREAD MessageFile
  IF condition-code is not CCE THEN BEGIN
    FCHECK to get FSERR
    IF FSERR = 151 THEN
      GOTO READANOTHER
    ELSE
      quit with fatal error
  END
  do something useful with the record
ENDLOOP
 
The problem is that the programmer who wrote the above has learned that
FSERR 151s should be ignored, but in his attempt to 'ignore' the error,
he is also throwing away a valid record from the file. A correct way of
dealing with the FSERR 151 is to treat the CCL/FSERR 151 the same as
though a CCE had been returned, for example:
 
LOOP
  FREAD MessageFile
  IF condition-code is not CCE THEN BEGIN
    FCHECK to get FSERR
    IF FSERR <> 151 THEN
      quit with fatal error
  END
  process the record normally
ENDLOOP
 
(Of course, a real program would be checking for EOF, etc., too.)
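
For anyone who wants to see the three loops side by side, here is a rough
Python simulation of them (again, not MPE code). make_reader, the "EOF" pseudo
condition code, and the three strategy function names are all made up for the
sketch; only CCE, CCL and FSERR 151 come from the discussion above. Each
strategy reads the same three records, with the second one flagged as the last
record written before a system failure.

```python
FSERR_LAST_BEFORE_CRASH = 151


def make_reader(records, flagged_index):
    """Build a hypothetical FREAD/FCHECK pair over an in-memory list."""
    state = {"pos": 0, "fserr": 0}

    def fread():
        if state["pos"] >= len(records):
            return "EOF", None  # simplified end-of-file signal (assumption)
        i = state["pos"]
        state["pos"] += 1  # destructive read: the record is consumed
        flagged = i == flagged_index
        state["fserr"] = FSERR_LAST_BEFORE_CRASH if flagged else 0
        return ("CCL" if flagged else "CCE"), records[i]

    def fcheck():
        return state["fserr"]

    return fread, fcheck


def abort_on_any_error(fread, fcheck):
    """First loop: anything other than CCE is fatal."""
    processed = []
    while True:
        cc, rec = fread()
        if cc == "EOF":
            return processed, "ok"
        if cc != "CCE":
            return processed, "aborted"  # rec was already consumed and is lost
        processed.append(rec)


def skip_on_151(fread, fcheck):
    """Second loop: 'ignores' 151 by reading again, dropping a valid record."""
    processed = []
    while True:
        cc, rec = fread()
        if cc == "EOF":
            return processed, "ok"
        if cc != "CCE":
            if fcheck() == FSERR_LAST_BEFORE_CRASH:
                continue  # GOTO READANOTHER: rec is silently discarded
            return processed, "aborted"
        processed.append(rec)


def treat_151_as_success(fread, fcheck):
    """Third loop: CCL with FSERR 151 is handled the same as CCE."""
    processed = []
    while True:
        cc, rec = fread()
        if cc == "EOF":
            return processed, "ok"
        if cc != "CCE" and fcheck() != FSERR_LAST_BEFORE_CRASH:
            return processed, "aborted"
        processed.append(rec)


records = ["r1", "r2", "r3"]
outcomes = {}
for strategy in (abort_on_any_error, skip_on_151, treat_151_as_success):
    fread, fcheck = make_reader(records, flagged_index=1)
    outcomes[strategy.__name__] = strategy(fread, fcheck)
    print(strategy.__name__, outcomes[strategy.__name__])
```

In this simulation only the third strategy processes all three records. The
first stops after r1 with r2 already consumed, and the second finishes cleanly
but never processes r2.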
 
G.
