LISTSERV - HP3000-L Archives

November 1995, Week 4

HP3000-L@RAVEN.UTC.EDU


Subject:	Re: Long Time to recover from System Abort
From:	Steve Cole <[log in to unmask]>
Reply To:	[log in to unmask]
Date:	Mon, 27 Nov 1995 20:26:30 -0500

In a message dated 95-11-27 04:45:32 EST, [log in to unmask] (Bryan
O'Halloran) writes:

>It is taking me around two hours to recover from each system abort and that
>time is viewed by my manglement as excessive.  (MVS people get that way)  I
>have only moved our production system to MPE/iX 5.0 a week ago and so far we
>have had two system aborts, one system hang and we mislaid two of our three
>processors on a reboot.
>
>Breaking the time down a bit further the big problem seems to be composed of
>a few small problems.
>
>(1)  It takes around 30 minutes to reboot an Emerald on the best of days.
>     It is one of the slowest beasts in the HP firmament.
>
>(2)  Dump to disc is still real slow and takes another half hour to toss a
>     dump on to two SCSI discs.
>
>(3)  Being ultra cautious, I had set our databases to Mustrecover and use
>     DBRECOV rather than using dynamic roll back recovery to recover our
>     databases.  This seems to take another 45 minutes.
>
>I have already asked for guidance concerning database recovery but I guess I
>still need to cut down on the time taken to prepare a dump for HP.  Would
>using DDS 2 drives be faster than using the 2 gb SCSI disc drives?
>
>Thanks for any suggestions on improving this time.
>
>Regards
>
>Bryan O'Halloran
>
>

I have experienced exactly what you are describing.  After
analyzing what was going on, I made several changes that
significantly reduced the downtime.

(1) Over a two-year period I tracked the number of System
Aborts and the System Abort number of each.  The analysis
indicated that 80% of the failures were non-repeating.
By taking a dump on every failure we were extending system
downtime for problems that would not recur.  We changed to
dumping the system only on the second occurrence of a failure.

This process not only reduced system downtime but also
improved reliability.  By dumping the system on the second
failure, we applied patches only for problems that were
actually affecting us, rather than applying a patch for
every failure.  Over the years I have found that the fewer
patches you have to apply, the more reliable the OS is.
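
The dump-on-second-occurrence policy described above can be sketched in a few lines of Python.  This is only an illustration, not anything that ran on the systems discussed: the class name AbortLog, the threshold parameter, and the abort numbers are all made up, and it assumes that "second occurrence" means the second or any later abort with the same System Abort number.

```python
from collections import Counter


class AbortLog:
    """Hypothetical tracker for System Abort numbers (illustrative only)."""

    def __init__(self, dump_threshold=2):
        # How many times an abort number must be seen before a dump is taken.
        self.dump_threshold = dump_threshold
        self.counts = Counter()

    def record(self, abort_number):
        """Record one abort; return True if a dump should be taken this time."""
        self.counts[abort_number] += 1
        return self.counts[abort_number] >= self.dump_threshold


log = AbortLog()
# Made-up abort numbers: 101 repeats, 202 and 303 occur only once.
decisions = [(n, log.record(n)) for n in [101, 202, 101, 303, 101]]
for abort_number, take_dump in decisions:
    action = "take dump" if take_dump else "reboot without dump"
    print(f"System Abort {abort_number}: {action}")
```

With this kind of log, a one-off abort costs only the reboot, and the dump time is spent only on failures that have shown they repeat.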

(2) Without knowing the nature of your transactions, the following
comments may or may not apply.  Our environment operated on
300 GB of databases spread across three 99x systems.  We found
the transaction manager to be extremely reliable.  After a failure
we were able to reboot and let the users on.  We had two bases
out of our 300+ set up to do dynamic rollback recovery.  We
experienced no detectable data corruption with this method.

Each database and transaction needs to be evaluated.  If recovery
is required after every failure, then rollback recovery is a faster way.

With regard to using DDS tape rather than 2 GB disc: I think that whatever
you use will be faster than DDS.

Steve Cole
Outer Banks Solutions
