HP3000-L Archives

March 2000, Week 3

HP3000-L@RAVEN.UTC.EDU

Subject: Re:
From: Stan Sieler
Date: Tue, 21 Mar 2000 18:48:14 -0800

> "DISKTRKP", that normally runs at 11:00 PM, consuming 99.3% of one of the CPUs.
> It became obvious that this was the reason for our long job queue.  I contacted
> the RC and after a few attempts we were able to 'kill' the process and 'free up'

It would have been interesting to get a couple of stack traces, to see what
the process was doing.

> The next day I got an early morning call from one of the programmers and I was
...
> the 'impeded' state, most of them running 'vtserver'.  It became apparent that

Sounds like the network was hung.

> (Programmers reported
> that their session would also hang after the 'bye' command was issued).

Did anyone notice whether *any* commands worked (e.g., "showtime")?  If anyone
had tried a non-logoff, non-process-running command (or even "<return>") and
gotten a hang, that would point even more to a network problem.

> I issued the shutdown command and proceeded to re-boot the system.   Just after
> the 'Interact with IPL' prompt the system displayed a couple of messages
> indicating that it had begun the 'booting' process and then displayed:
>      IPL error: Bad IPL checksum
> it re-displayed the 'boot' menu and "WARN C5F0" at the console status line.

C5F0 = PRI IPL FAULT

(Primary IPL fault ... whatever that really means!)

> I repeated the process a few times with the same result and thus I called the RC

Any tries with a power cycle in there?

> again.  After some additional diagnostics and attempts to 'bypass' the error it
> was concluded that the only remaining option was to reboot from the latest SLT
> tape and reload the data.

Why reload?  Why not try an UPDATE?

> To my surprise the utility, RR, begun reading the tape, read the directory and
> begun performing all kinds of disk accesses, disk accesses that seemed more
...
> /Rant ON
> I was very upset, needless to say.  It took longer to restore those 5 files of a
> few thousand sectors than it took to store the entire tape of 115,000,000
> sectors.  AND IT DOES NOT MAKE ANY SENSE.  The information needed to determine
> where the 'newer' file resides (tape or disk) exists in the directory that IS
> stored in the beginning of the tape.   I can accept the fact that some

Is this some kind of online backup?  If so, I'd be slightly more
sympathetic towards RR.  If not, then I definitely agree with you!
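
To make the quoted point about the tape directory concrete, here is a minimal
sketch in Python. It assumes, purely for illustration, that the directory at
the start of the tape and the directory on disk can each be read as a mapping
from file name to last-modified time. The function name, the data layout, and
the file names are all hypothetical; this is not how RR or any HP utility is
known to work.

```python
from datetime import datetime


def files_to_restore(tape_directory, disk_directory):
    """Pick files whose tape copy is newer than the disk copy, or absent on disk.

    Both arguments are hypothetical {file name: modification time} mappings.
    Only the two directories are consulted; no file data is read.
    """
    selected = []
    for name, tape_mtime in tape_directory.items():
        disk_mtime = disk_directory.get(name)
        if disk_mtime is None or tape_mtime > disk_mtime:
            selected.append(name)
    return sorted(selected)


# Hypothetical sample data, invented for this example.
tape = {
    "DATA1.PUB.PROD": datetime(2000, 3, 20, 23, 0),
    "DATA2.PUB.PROD": datetime(2000, 3, 20, 23, 0),
    "DATA3.PUB.PROD": datetime(2000, 3, 19, 8, 30),
}
disk = {
    "DATA1.PUB.PROD": datetime(2000, 3, 18, 9, 0),   # older than tape
    "DATA3.PUB.PROD": datetime(2000, 3, 21, 7, 15),  # newer than tape
}

print(files_to_restore(tape, disk))
```

Under these assumptions the selection is a single pass over the tape
directory, which is the crux of the complaint: choosing the files should cost
far less than reading the tape itself.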

> what caused the problem.  Is the "Bad IPL checksum" error indicating disk
> corruption?

That would be my first guess.

> Is the impediment of the 'VTSERVER' indicating the same?

Possibly, and possibly not.

> What about the 'runaway' "DISKTRKP"?.

Can't tell at this point.


SS
Stan Sieler
www.allegro.com/sieler/wanted/index.html          www.allegro.com/sieler
