HP3000-L Archives

December 1995, Week 2

HP3000-L@RAVEN.UTC.EDU

From: "Paul H. Christidis"
Date: Thu, 7 Dec 1995 18:23:17 PST

Last night I was not able to dial in to our HP3000 967/RX running MPE/iX 5.0.  I
thought that our modem was hung and decided to come in a couple of hours
earlier this morning and perhaps reset the modems and DTCs before the
users began logging on.
 
To my surprise I discovered that the system was hung and would not respond to
my attempts to log on.  The only inputs from the console that I could
manage were Cntl A and Cntl B.  The console indicators as well as the
lights on the CPU indicated that the machine was running but the last
message on the console was from 22:12 and the fact that the DAT tape had
not been ejected indicated that the backup had not been performed.
 
So I went through the Cntl B /TC sequence, started the memory dump, and
called HPRC.  They indicated that they would be logging on to check the
dump and I proceeded to reset and reboot the system.  The system was in
operational condition before the users began logging on, but we had to
restart some jobs that did not get to finish.
 
Some hours later the HPRC called back saying that the dump indicated that
ldev 37 was the reason for the hang and that they were considering replacing
it, although no errors were logged against that device and no errors were
detected when they 'exercised' the device.  Since we were hesitant to halt
our production and thus lose today's data entry and all of the recovery
work, the HPRC disk specialists and dump analysts had another huddle and
decided that the situation did not warrant changing the ldev.   They
indicated that while they were 100% sure that actions involving ldev 37
were the reason for the hang, they had no idea what the initial cause was
nor could they rule out the possibility that it was software.  The
potential list included: a bad controller, bad cable, bad SCSI terminator,
intermittent power supply, faulty SCSI driver, etc.
 
While I feel better dumping all this here, I also have that sinking feeling
that this thing will occur again, perhaps at an even more critical
time.
 
Has anyone else out there had anything similar?  Did it recur?  Was it
ever resolved?
 
Thanks
 
Paul H. Christidis
