HP3000-L Archives

August 1999, Week 2

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Sletten Kenneth W KPWA <[log in to unmask]>
Reply To:
Sletten Kenneth W KPWA <[log in to unmask]>
Date:
Mon, 9 Aug 1999 22:32:25 -0700
For all who are running DLT4000 drives on their HP 3000s,
connected via SE-SCSI (not FW):

I don't have time to give all the background on this problem before
leaving for HP World, but since we finally have hard confirmation
of the failure from the HP Expert Center, those with DLT4000
mechs may be interested to know:


About five months ago we first experienced a VSTORE
failure on one of our daily backups.  This first happened not too
long after I installed MPE 5.5 PP6 earlier this year (but no hard
evidence of direct correlation).  Since we switched from DDS-2
to DLT4000 a year or so ago, we had been doing a "rolling
VSTORE";  i.e.:  we did not VSTORE every tape (thinking "Oh;
DLT should be quite a bit more reliable than DDS"....  NOT SO
FAST !!!).  So it's hard to say when we would have seen the
first failure if we had been running VSTORE on EVERY tape.

Anyway, seemingly out of the blue we started getting apparently
random, occasional VSTORE failures.  After the first two we
changed our backup procedures to VSTORE every last backup.
At which point this whole problem started its long descent into
MANAGER.SYS  hell...  or at least purgatory (since we never had
to try and RESTORE from any of the tapes that failed VSTORE,
and I made special SLT's with full backup when our exposure
even with Mirroring was starting to get out of hand).

For a couple weeks maybe 50 percent of the tapes failed VSTORE.
....  Then we went OVER A MONTH with ZERO failures.
....  Then we had a week where four tapes in a row failed VSTORE.
....  After that we had a week where 100 percent passed VSTORE.
....  After that we seemed to oscillate back and forth randomly:
VSTORE pass / fail ratio remained roughly 50-50;  but no "pattern".
....  After that I started wanting not to have to think about it....

It was somewhere after the above that the writer had to admit
he had not the *foggiest* idea of what might be going on...  and
avoided the problem by going to work on his house for three weeks
(I know:  inexcusable;  to leave staff with this problem).

A call to the HP RC was launched somewhere in the above sequence
(later than it should have been (you don't want to know) ).
Considerable back and forth with the RC.  Got one patch that
supposedly fixed a similar problem for one site running Road Runner
(all our VSTORE failures were with vanilla STORE).   This patch
was no help to us....

VSTORE failures occurred on many different new and next to new
DLT cartridges.  Once VSTORE failed on a tape it continued to fail
on that tape....  until a new backup overwrote the failing one.
Tapes that failed ALWAYS passed VSTORE if another test database
backup was done on them right after the initial VSTORE failure.

Sent HP RC a sample DLT cartridge that failed VSTORE.  I just
got the word today that the Expert Center confirmed the failure;
fails same way on their machine.  Problem is in the hands of the
Lab for a patch to fix.


THE KICKER  (I have *no* idea how):    Latest best guess from
the Expert Center / Lab is that this VSTORE (RESTORE too, as
far as anyone knows) failure is DATA DEPENDENT !!!!...:  They
*think* if you have just the right combination of a large number of
relatively small files and / or one or more large files with a lot of
white space, it can trigger this failure (!!!! (end of "!" quota) ).
If the lab is right, at least this provides some explanation of the
seemingly random, wide swings in how often failures occurred:
The number of files and the size of many of them would change
incrementally from one backup to the next;  perhaps pushing us
from "pass VSTORE" to fail and then back again on a day-to-day
basis.

The RC only knows of THREE sites in the world including ours
that have seen something like this (why; *why* did our site have
to be so privileged ?!?!)....  If and when we get a patch that fixes
the problem, I will report the patch number....  If anybody else out
there has been getting strange DLT4000 (again:  *Not* DLT7000
on FW-SCSI) VSTORE failures, I would be interested in hearing
from you....

Ken Sletten
