LISTSERV - HP3000-L Archives

November 1998, Week 3

HP3000-L@RAVEN.UTC.EDU

Subject:	Re: Re[2]: System Shutdown Command Script (take two)
From:	John Korb <[log in to unmask]>
Reply To:	John Korb <[log in to unmask]>
Date:	Fri, 20 Nov 1998 10:42:27 -0500
Content-Type:	text/plain
Parts/Attachments:	text/plain (83 lines)

At 11/20/98 08:30 AM , Simpkins, Terry wrote:
>Paul H. Christidis  says:
>>then I'd like to also include some mechanism
>> for distinguishing restarts due to system failures.
>
>ME TOO!! boy would that be nice for remembering to recover all
>those KSAM files that were open at the time of the crash.
>Of course this happens sooooooo seldom, that you tend to forget
>things.  (or is that the age?)

Our (Navy) application had to determine whether the application was
properly shut down, whether the application aborted, or whether the system
failed while the application was running.

We ended up using a "status" record, some global RINs (resource
identification numbers), some RIN locking mechanisms, and some common code
to ensure that the applications were properly recovered/restarted after
application or system failures.

The mechanisms we used have worked very well with batch applications and
were fairly simple.  The mechanisms used with interactive applications were
more complicated because we ended up adding some "brokering functions" (to
pass primary control of the global RINs to another process when the first
user of the application exits).

Below is a very simplified description of the mechanism used for batch:

o  When the application opens the database (or KSAM file), it updates
   a "status" record to mark the application as "running" and to save
   the current system "cold load ID" value.  The application also locks a
   global RIN.

o  Duplicate runs of the application are prevented because the duplicate
   run can't lock the global RIN.

o  When the application shuts down normally, it updates the "status"
   record, setting the status to "not running" and resetting the saved
   "cold load ID" to zero.

o  If the application aborts, the status record still has the flag set
   that says the application is running, and it has the system "cold
   load ID" under which the application last ran.

o  When the application is restarted, it sees a status record that
   says the application is running, but tries to lock the global RIN
   anyway.  Since no other copy of the application is actually running,
   it CAN lock the global RIN, which tells the application that something
   is wrong.  The application then checks some things and decides that
   the previous run must have aborted.  It performs maintenance, and if
   the maintenance completes successfully, restarts itself for a normal
   run.  If maintenance fails, it makes sure everyone knows about it,
   and posts a PRINTOPREPLY to the console.  The operators know better
   than to reply to the PRINTOPREPLY request, so it sits out there until
   someone fixes the problem.

o  If the system fails while the application is running, recovery works
   much like the application abort recovery above, except that the
   application detects that the current "cold load ID" doesn't match the
   one on the status record, so it assumes that the system failed and
   responds accordingly.

o  Sometimes the application fails and the system is rebooted before
   the problem is fixed.  The application then concludes that it didn't
   fail on its own, but rather that the system failed while it was
   running.  Thus, the recovery code must not do any damage if it
   wrongly assumes that the system failed.
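The batch mechanism above can be modeled as a small state machine. The Python below is a hypothetical sketch, not the original Navy code: `StatusRecord`, `BatchApp`, `simulate_abort`, and the outcome strings are invented for illustration, a `threading.Lock` acquired without blocking stands in for the global RIN, and the cold load ID is simply an integer passed in rather than read from MPE.

```python
from dataclasses import dataclass
import threading


@dataclass
class StatusRecord:
    """Hypothetical persistent status record (assumed to live in the
    database or a file, so it survives aborts and system failures)."""
    running: bool = False
    cold_load_id: int = 0  # 0 means "no run in progress"


class BatchApp:
    """Sketch of the batch start/stop/recovery decisions.

    Assumption: `rin` stands in for the global RIN, and
    `current_cold_load_id` stands in for the system's cold load ID.
    """

    def __init__(self, status: StatusRecord, rin: threading.Lock,
                 current_cold_load_id: int) -> None:
        self.status = status
        self.rin = rin
        self.current_cold_load_id = current_cold_load_id

    def start(self) -> str:
        # Another copy holds the lock: refuse a duplicate run.
        if not self.rin.acquire(blocking=False):
            return "duplicate"
        if not self.status.running:
            # Previous run shut down cleanly: record this run as in progress.
            self.status.running = True
            self.status.cold_load_id = self.current_cold_load_id
            return "normal"
        # We got the lock, yet the record says "running": the previous run
        # did not shut down cleanly. Keep the lock while maintenance runs.
        if self.status.cold_load_id != self.current_cold_load_id:
            return "recover_after_system_failure"
        return "recover_after_app_abort"

    def recovered(self) -> None:
        # Call after maintenance succeeds; the next start() is then normal.
        self.status.running = False
        self.status.cold_load_id = 0
        self.rin.release()

    def shutdown(self) -> None:
        # Normal shutdown: "not running", saved cold load ID reset to zero.
        self.status.running = False
        self.status.cold_load_id = 0
        self.rin.release()

    def simulate_abort(self) -> None:
        # Simulation only: the process dies and its lock is freed, but the
        # status record is left saying "running".
        self.rin.release()


status = StatusRecord()
rin = threading.Lock()
app = BatchApp(status, rin, current_cold_load_id=101)
print(app.start())                          # normal
print(BatchApp(status, rin, 101).start())   # duplicate
app.simulate_abort()
print(BatchApp(status, rin, 101).start())   # recover_after_app_abort
# After a reboot the lock is gone and the cold load ID has changed.
print(BatchApp(status, threading.Lock(), 102).start())  # recover_after_system_failure
```

Note that this sketch cannot tell an application abort followed by a reboot from a true system failure; it reports `recover_after_system_failure` in both cases, which is why the recovery code for that case has to be safe either way.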

The above description is a simplified version of the methodology used, but
it should give you some ideas as to how you might approach determining
whether:

   1) the application shut down normally,
   2) the application failed, or
   3) the system failed while the application was running.

John

--------------------------------------------------------------
John Korb                            email: [log in to unmask]
Innovative Software Solutions, Inc.

The thoughts, comments, and opinions expressed herein are mine
and do not reflect those of my employer(s), or anyone else.
