HP3000-L Archives

February 2003, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
"Atwood, Tim (DVM)" <[log in to unmask]>
Reply To:
Atwood, Tim (DVM)
Date:
Fri, 7 Feb 2003 15:13:00 -0800
Need help diagnosing a possible problem with the drives and/or directories on
a user volume, if at all possible without bringing the computer down.

HP3000 928RX.
MPE/iX 6.0 PP2.
Old HP3000 disc enclosure with (4) 2GB and (1) 4GB drives mounted as
"USER_VOL_SET_1".
LDEVs 11,12,13 & 14 = 2GB, LDEV 15 = 4GB

User volume is used for Backup to disc. Using Orbit Backup+/iX. System
volume set gets backed up to file(s) on the user volume set.

At 6:30 PM the backup is running. Drives in USER_VOL_SET_1 are being
accessed by Orbit Backup.
Power is accidentally cut off to the HP6000 enclosure containing the drives
in the user volume set.
All other equipment on this HP3000 remains on.
Orbit Backup of course immediately hangs.
Problem is quickly recognized and the user volume discs are powered back on.
Discs appear to spin up as normal. But Backup does not un-hang.
Worse - everything else on the HP3000 now hangs. Within moments entire
system is completely frozen. No system halt messages, just hung.
I reboot the HP3000.

When HP3000 first comes back up, it does not recognize LDEV 15, the 4GB
drive within the user volume set. LDEV 15 unavailable for mounting.
30 seconds later HP3000 seems to recognize LDEV 15 all by itself. It gives
the usual mount messages which would appear if one had suddenly attached a
user volume to a running system.
Once HP3000 is back up and running, I do :DSTAT ALL and LISTF on user volume
files. Everything appears OK.

I re-stream the backup to run at 1:00 AM (It would interfere with jobs
running between 7:00 and midnight).

At 1:00 AM Orbit Backup starts.
As soon as the backup attempts to access USER_VOL_SET_1, the HP3000 halts
with the following system abort:

SYSTEM ABORT 2052
FROM SUBSYSTEM 145
SYSTEM HALT 7, $0804

I reboot the system. It has been running fine since just after 2:00 AM, but I
have not done anything to access USER_VOL_SET_1 yet today for fear of
crashing the system again.

I am unsure how to proceed with checking and fixing whatever is bad on
USER_VOL_SET_1. This is a mission-critical 24/7 production computer. The
downtimes which have already occurred put me well over my allowed downtime
metrics. Another outage would really p... off a lot of people.

So is there any way to safely do the following without bringing the computer
down?
1. Check the drives for potential problems - bad sectors, etc.
2. Check the directory structure on USER_VOL_SET_1.
3. Check the file structure on USER_VOL_SET_1.
4. Otherwise diagnose the problem.
5. Purge and rebuild any corrupted directory structure on USER_VOL_SET_1 if
appropriate.

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
