I presume there is some "lock-out" method going on during purge and rename?
And that there was some sort of (online) backup method on the original that
completed just before the cursed event?
I may sound obvious but one never knows.
BT
Tracy Johnson
MSI Schaevitz Sensors
> -----Original Message-----
> From: HP-3000 Systems Discussion [mailto:[log in to unmask]]On
> Behalf Of Baker, Mike L.
> Sent: Thursday, July 29, 2004 2:27 PM
> To: [log in to unmask]
> Subject: [HP3000-L] Frustration Level = Max
>
>
> We have two hp3k's, 969 and 989. Both share disc space on a
> va7100. We have 6 of the scsi/fiber routers (2 - 969, 4 -
> 989). Last night again, both boxes "froze". Solution (well,
> not really), one by one, powercycle those 6 routers. Both
> boxes "appear" to pick up where they left off. But not
> really. There are "gaps" in record counts in datasets. This
> problem has been happening since we went to the va7100 the
> end of June 2003. Every few weeks, we experience this
> "freeze". Our hardware vendor (Service Express), working
> with our software vendor (HP), just changed out a bunch of
> hardware two weeks ago on a Sunday morning. The problem is
> still happening. I guess what our hardware support vendor
> and HP don't seem to realize is THAT IT IS VERY CRITICAL THAT
> OUR PRODUCTION HP989 NOT GO DOWN - AT ALL. Even this
> "freeze" is unacceptable. Why has the problem not been
> solved yet? This is very frustrating. I'm sure the answer
> is that we are causing it ourselves, we are using the box in
> a way it was not intended to be used. Most of you that have
> seen my posts to the list know that we have an application
> that was written in a unique way. About 300 or so clients,
> so about 300 or so image databases (of course, NO logging
> with that amount). Batch environment. Make a copy of the
> database (using adager), post the changes to the copy, if
> everything is ok, purge the original and rename the "trial"
> database as the original. That's basically how it works, and
> there are some other fun things that I won't go into, for
> 300+ clients each night, plus maybe a couple of times during
> the day depending on what's happening (i.e. copy, post,
> purge, rename). And this on a va7100 (autoraid = pain [when
> you copy, post, purge, rename all the time]).
>
> Anyway, to make a long story short, all you mpe internal
> folks. For this "hang" thing, do you think there is any way
> that at night, when there are all the copies going on, and we
> know the disc queue writes peg at 100% or more for long
> periods of time, do you think that whatever tries to keep
> track of what is queued to go runs out of queue space, and
> the boxes just freeze? Why does this happen to the 969 at
> the same time, the development box that is sitting there
> minding its own business? This is just frustrating,
> baffling and perplexing all at the same time. And everyone
> here just wants the problem to go away. So, we have called
> the hardware vendor back, I still have not heard what the
> next step is. The big bosses want concrete answers, not
> "feelings", before they spend money. We have Beachglen
> monitoring the system now too, so hopefully they can find
> some trend or some thing that will bring us closer to a
> resolution. We are on 7.0, they looked at the patches,
> etc...all that stuff. Replaced a bunch of hardware and
> controllers. We know we need to revamp the app to do less
> disc i/o, that is not an easy task with two programmers, and
> an army working on the unix/oracle side on the "new" app that
> was supposed to be ready 5 years ago.
>
> Sorry to ramble, folks. Just trying to make sense out of
> this on a box that has been so reliable in the past. Not
> used to this.
>
> Mike "just wanted to share the pain" Baker
>
> * To join/leave the list, search archives, change list settings, *
> * etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
>