HP3000-L Archives

July 2004, Week 5

HP3000-L@RAVEN.UTC.EDU

Subject:
From: "Johnson, Tracy" <[log in to unmask]>
Reply To: Johnson, Tracy
Date: Thu, 29 Jul 2004 15:06:42 -0400
Content-Type: text/plain
I presume there is some "lock-out" method going on during purge and rename?
And that there was some sort of (online) backup method on the original that 
completed just before the cursed event?  

It may sound obvious, but one never knows.

BT


Tracy Johnson
MSI Schaevitz Sensors 

> -----Original Message-----
> From: HP-3000 Systems Discussion [mailto:[log in to unmask]]On
> Behalf Of Baker, Mike L.
> Sent: Thursday, July 29, 2004 2:27 PM
> To: [log in to unmask]
> Subject: [HP3000-L] Frustration Level = Max
> 
> 
> We have two hp3k's, 969 and 989.  Both share disc space on a 
> va7100.  We have 6 of the scsi/fiber routers (2 - 969, 4 - 
> 989).  Last night again, both boxes "froze".  Solution (well, 
> not really), one by one, powercycle those 6 routers.  Both 
> boxes "appear" to pick up where they left off.  But not 
> really.  There are "gaps" in record counts in datasets.  This 
> problem has been happening since we went to the va7100 the 
> end of June 2003.  Every few weeks, we experience this 
> "freeze".  Our hardware vendor (Service Express), working 
> with our software vendor (HP),  just changed out a bunch of 
> hardware two weeks ago on a Sunday morning.  The problem is 
> still happening.  I guess what our hardware support vendor
> and HP don't seem to realize is THAT IT IS VERY CRITICAL THAT 
> OUR PRODUCTION HP989 NOT GO DOWN - AT ALL.  Even this 
> "freeze" is unacceptable.  Why has the problem not been 
> solved yet?  This is very frustrating.  I'm sure the answer 
> is that we are causing it ourselves, we are using the box in 
> a way it was not intended to be used.  Most of you that have 
> seen my posts to the list know that we have an application 
> that was written in a unique way.  About 300 or so clients, 
> so about 300 or so image databases (of course, NO logging 
> with that amount).  Batch environment.  Make a copy of the 
> database (using adager), post the changes to the copy, if 
> everything is ok, purge the original and rename the "trial" 
> database as the original.  That's basically how it works, and 
> there are some other fun things that I won't go into, for 
> 300+ clients each night, plus maybe a couple of times during 
> the day depending on what's happening (i.e. copy, post, 
> purge, rename).  And this on a va7100 (autoraid = pain [when 
> you copy, post, purge, rename all the time]).
> 
> Anyway, to make a long story short, all you mpe internal 
> folks.  For this "hang" thing, do you think there is any way 
> that at night, when there are all the copies going on, and we 
> know the disc queue writes peg at 100% or more for long 
> periods of time, do you think that whatever tries to keep 
> track of what is queued to go runs out of queue space, and
> the boxes just freeze?  Why does this happen to the 969 at 
> the same time, the development box that is sitting there 
> minding its own business?  This is just frustrating,
> baffling and perplexing all at the same time.  And everyone 
> here just wants the problem to go away.  So, we have called 
> the hardware vendor back, I still have not heard what the 
> next step is.  The big bosses want concrete answers, not 
> "feelings", before they spend money.  We have Beachglen 
> monitoring the system now too, so hopefully they can find 
> some trend or some thing that will bring us closer to a 
> resolution.  We are on 7.0, they looked at the patches, 
> etc...all that stuff.  Replaced a bunch of hardware and 
> controllers.  We know we need to revamp the app to do less 
> disc i/o, that is not an easy task with two programmers, and 
> an army working on the unix/oracle side on the "new" app that 
> was supposed to be ready 5 years ago.
> 
> Sorry to ramble, folks.  Just trying to make sense out of 
> this on a box that has been so reliable in the past.  Not 
> used to this.
> 
> Mike "just wanted to share the pain" Baker
> 
> * To join/leave the list, search archives, change list settings, *
> * etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
> 
