HP3000-L Archives

December 1995, Week 2

HP3000-L@RAVEN.UTC.EDU

Date: Fri, 8 Dec 1995 09:12:57 PST

Yes, we've had several incidents similar to this where the RC was
unable to specifically point to a device/cable/DA/etc. as the culprit.
Occasionally, our local office has "shotgunned" fixes - i.e., blindly
replaced components outside the disk mech itself (you didn't state what
model disk drive was involved) - even replacing the servo and
read/write controllers on the drive.  Sometimes, however -
especially with our older Eagles (793x/FL) - the mech has had to be
replaced after all other attempts to fix the problem failed.  Even then
(and this is outside the scope of your question, but I have to mention
it!) we've avoided system or volumeset reloads by using DTDUTIL via
SYSDIAG to snap the data off the disk to tape first, then reloading the
data back to the drive after the mech was replaced.  This won't work on
LDEV1 (I also don't know if SCSI devices are supported - ours are all
HP-FL), and it's doubtful in the event of a head crash or other physical
media damage.  Also, the system must be started in single-user,
single-disk mode.  This has saved us hours of reloading and recovering
data and databases!!  I'd recommend that you talk to your CE or other
knowledgeable support person about this utility.  If it's not on your
system, ask HP for it!!
 
Good luck!
 
 
=======================================================================
Lee Gunter                           Voice:  503-375-4498
HMO Oregon                             FAX:  503-588-4350
PO Box 12625                        E-Mail:  [log in to unmask]
Salem, OR 97309  USA        Private E-Mail:  [log in to unmask]
=======================================================================
 
The opinions expressed, here, are mine and mine alone, and do not
necessarily reflect those of my employer.
 
 
 
 
______________________________ Reply Separator _________________________________
Subject: System Never Came Back from dinner
Author:  [log in to unmask] at ~INTERNET
Date:    12/7/95 6:36 PM
 
 
Last night I was not able to dial in to our HP3000 967/RX MPE/iX 5.0.  I
thought that our modem was hung and decided to come in a couple of hours
earlier this morning and perhaps reset the modems and DTC's before the
users began logging on.
 
To my surprise I discovered that the system was hung and would not respond
to my attempts to log on.  The only inputs from the console that I could
manage were Cntl A and Cntl B.  The console indicators as well as the
lights on the CPU indicated that the machine was running but the last
message on the console was from 22:12 and the fact that the DAT tape had
not been ejected indicated that the backup had not been performed.
 
So I went through the Cntl B /TC sequence, started the memory dump, and
called HPRC.  They indicated that they would be logging on to check the
dump and I proceeded to reset and reboot the system.  The system was in
operational condition before the users began logging on, but we had to
restart some jobs that did not get to finish.
 
Some hours later the HPRC called back saying that the dump indicated that
ldev 37 was the reason for the hang and that they were considering replacing
it, although no errors were logged against that device and no errors were
detected when they 'exercised' the device.   Since we were hesitant to halt
our production and thus lose today's data entry and all of the recovery
work, the HPRC disk specialists and dump analysts had another huddle and
decided that the situation did not warrant changing the ldev.   They
indicated that while they were 100% sure that actions involving ldev 37
were the reason for the hang, they had no idea what the initial cause was
nor could they rule out the possibility that it was software.  The
potential list included: a bad controller, bad cable, bad SCSI terminator,
intermittent power supply, faulty SCSI driver, etc.
 
While I feel better dumping all this here, I also have that sinking feeling
that this thing will occur again, perhaps at an even more critical
time.
 
Has anyone else out there had anything similar?  Did it recur?  Was it
ever resolved? etc?  ... ?
 
Thanks
 
Paul H. Christidis
