HP3000-L Archives

March 2005, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From: John Clogg <[log in to unmask]>
Reply To: John Clogg <[log in to unmask]>
Date: Wed, 23 Mar 2005 10:43:59 -0800
Content-Type: text/plain

We had a similar problem with DLT drives.  In our case it was not caused
by the drive itself, so I don't think the type of medium matters.  This
may or may not relate to your situation, but here is what was happening:

As a prelude to performing the synchronization point of our online
backup each night, we shut down all jobs and sessions (except the backup
job) and shut down networking (NETCONTROL STOP).  We learned that any
time you stop and restart networking, it is necessary to restart the STM
monitor processes used by the CSTM diagnostics, so we run STMSHUT.DIAG
before shutting down the network, and run STMSTART.DIAG after restarting
it.  This was the source of the problem.  When the diagnostic monitor is
started by STMSTART (or when the system is started), it goes through a
hardware mapping process.  That process runs diagnostics on peripherals
as it encounters them.  If you mount a tape while that is happening, the
diagnostic process and the AVR (automatic volume recognition) process
get into some kind of deadlock, and a reboot of the system is the only
way to get the drive back.

The solution for us was to forbid any tape mounts or other use of the
tape drives for several minutes after restarting the diagnostic monitor.
It's kind of like not swimming after eating.  
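
The "no tape mounts for several minutes" rule could also be enforced by the job that restarts the monitor rather than by memory. What follows is only a minimal sketch of such a guard, not what was actually run: the class name `TapeMountGuard`, its methods, and the 300-second quiet period are all assumptions made for illustration (the text above says only "several minutes").

```python
import time


class TapeMountGuard:
    """Hypothetical helper: blocks tape use for a quiet period after
    the diagnostic monitor has been restarted."""

    def __init__(self, quiet_seconds=300.0, clock=time.monotonic):
        # quiet_seconds is an assumed value standing in for "several minutes".
        self.quiet_seconds = quiet_seconds
        self._clock = clock
        self._restarted_at = None  # no restart recorded yet

    def monitor_restarted(self):
        """Record the moment the diagnostic monitor was restarted."""
        self._restarted_at = self._clock()

    def seconds_until_safe(self):
        """Seconds left before a tape mount is allowed (0.0 if allowed now)."""
        if self._restarted_at is None:
            return 0.0
        elapsed = self._clock() - self._restarted_at
        return max(0.0, self.quiet_seconds - elapsed)

    def may_mount(self):
        """True once the quiet period has fully elapsed."""
        return self.seconds_until_safe() == 0.0
```

A backup wrapper would call `monitor_restarted()` right after the monitor is started and then sleep for `seconds_until_safe()` before asking for a tape. The clock is injectable so the guard can be checked without actually waiting.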

I hope this is helpful to someone!

John
-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]] On
Behalf Of Dave Powell, MMfab
Sent: Tuesday, March 22, 2005 3:25 PM
To: [log in to unmask]
Subject: Hung tape enhancement request

If it's not too late to make requests...   How about fixing whatever causes
our tape drive to hang about 2 or 3 times a year?  HP seems to think it is
tied in with the ghost-session problem that others have mentioned.  Might be,
but we never have anything hang except the tape drive.

Details from last night's hang:
A500,  MPE 7.5 pp2 (just updated Saturday),  DDS-3, an old tape (used about 20
times with no prior problems).
Drive now shows as 'UNAVAIL', owned by 'SYS'.
Backup ran normally.  Finished 11:05 pm.
Drive failed to come back on line.  Backup job noticed the drive was
"UNAVAIL", waited about 10 times longer than it usually takes, then started
the verify anyway, at 11:16 pm.
Verify said "DEVICE UNAVAILABLE  (FSERR 55)" and " VSTORE ENCOUNTERED FOPEN
FAILURE ON DEVICE FILE "T"  (S/R 2213)".  Then it just sat there.
At 2:51 a watchdog job concluded that the backup job was hung, did a showdev 7
(it was 'UNAVAIL', owned by 'SYS'), issued a few abortios on it (but never got
the "no io to abort" message), then abortjobed it.  Job aborted ok first time.

This morning I fed it ever-increasing abortio while-loops, and finally got the
'no io to abort' message after about 2,600 total abortios.  But that does not
mean that there were 2600 ios to abort.  If I do a single abortio at the
console, I still don't get the 'no io to abort' message.  Further while-loops
in jobs with no pauses seem to get the warning after between 50 and several
hundred repetitions.  2600 just happens to be how many total tests it took for
me to get impatient enough to feed it a big enough while-loop with no pauses.
Tape drive is STILL 'UNAVAIL', owned by 'SYS'.
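
The repeated-abortio experiment described above amounts to a bounded retry loop that stops at the first 'no io to abort' reply. A rough sketch of that loop is below. The function `abort_until_clear`, the `issue_abortio` callable, and the default cap are hypothetical names and values chosen for illustration; how the command is really issued and its reply captured is left to the caller. As noted above, getting the message did not free the drive, so this only counts attempts.

```python
def abort_until_clear(issue_abortio, max_attempts=5000):
    """Call issue_abortio() until its reply mentions 'no io to abort'.

    issue_abortio is a caller-supplied (hypothetical) function that issues
    one abort request and returns the reply text as a string.
    Returns the number of attempts used, or None if the cap was reached.
    """
    for attempt in range(1, max_attempts + 1):
        reply = issue_abortio()
        # Case-insensitive match on the message quoted in the text above.
        if "no io to abort" in reply.lower():
            return attempt
    return None
```

Capping the attempts keeps a watchdog job from spinning forever when the message never appears.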

Dave ("planning to reboot tonight") Powell,  MMfab

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
