LISTSERV - HP3000-L Archives

July 2004, Week 5

HP3000-L@RAVEN.UTC.EDU


Subject:	Re: Frustration Level = Max
From:	George Willis <[log in to unmask]>
Reply To:	George Willis <[log in to unmask]>
Date:	Thu, 29 Jul 2004 14:30:56 -0500

Mike - I can't help you with the 'hang'; it sounds like you need to
further isolate the problem to the routers, bandwidth capacity, or
disc hardware (we use a cheap, reliable HP Model 12H smart array on our
production systems). However, I can offer a thought to help with the
overhead of copying, purging, and renaming databases. On our system we
are one client with 9 rather large databases that log to a single
transaction log. Furthermore, we are shadowing real-time transactions
to another HP box at our DRC site. You may want to revisit setting up
Image Logging on some of your 300 databases so that you can reduce the
disc I/O overhead of managing your trial databases and depend on the
reliable DBRECOVER methods of database recovery. Yes, there is some
overhead incurred, but it might be mitigated with more memory/CPU.
Coupled with online backup software, you will then be able to spend
time managing the system with some ease and concentrate on Oracle apps.
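
To illustrate the general idea of relying on a transaction log rather than
full trial copies, here is a minimal sketch in Python. It is not how Image
Logging or DBRECOVER work internally; the "database" is a plain dict, and
the names `log_transaction`, `recover`, and `txn.log` are hypothetical. It
only shows the principle: each change is appended to a log, and recovery
replays the log on top of the last backup.

```python
import json
import os
import tempfile

def log_transaction(log_path, key, value):
    """Append one change to the log instead of copying the whole database."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"key": key, "value": value}) + "\n")

def recover(backup, log_path):
    """Rebuild current state by replaying logged changes over a backup copy."""
    data = dict(backup)  # start from the last backup; leave the backup untouched
    with open(log_path) as log:
        for line in log:
            entry = json.loads(line)
            data[entry["key"]] = entry["value"]
    return data

def demo():
    # Hypothetical data, for illustration only.
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "txn.log")
        backup = {"acct-1": 100}
        log_transaction(log_path, "acct-1", 150)
        log_transaction(log_path, "acct-2", 40)
        return recover(backup, log_path)
```

The I/O cost per change is one small append, whatever the size of the
database, which is the trade-off against copying each database in full.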

George Willis
Fayez Sarofim & Co.
713-308-2803


-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]] On
Behalf Of Baker, Mike L.
Sent: Thursday, July 29, 2004 1:27 PM
To: [log in to unmask]
Subject: [HP3000-L] Frustration Level = Max

We have two hp3k's, 969 and 989.  Both share disc space on a va7100.  We
have 6 of the scsi/fiber routers (2 - 969, 4 - 989).  Last night again,
both boxes "froze".  Solution (well, not really), one by one, powercycle
those 6 routers.  Both boxes "appear" to pick up where they left off.
But not really.  There are "gaps" in record counts in datasets.  This
problem has been happening since we went to the va7100 the end of June
2003.  Every few weeks, we experience this "freeze".  Our hardware
vendor (Service Express), working with our software vendor (HP),  just
changed out a bunch of hardware two weeks ago on a Sunday morning.  The
problem is still happening.  I guess what our hardware support vendor
and HP don't seem to realize is THAT IT IS VERY CRITICAL THAT OUR
PRODUCTION HP989 NOT GO DOWN - AT ALL.  Even this "freeze" is
unacceptable.  Why has the problem not been solved yet?  This is very
frustrating.  I'm sure the answer is that we are causing it ourselves,
we are using the box in a way it was not intended to be used.  Most of
you that have seen my posts to the list know that we have an application
that was written in a unique way.  About 300 or so clients, so about 300
or so image databases (of course, NO logging with that amount).  Batch
environment.  Make a copy of the database (using adager), post the
changes to the copy, if everything is ok, purge the original and rename
the "trial" database as the original.  That's basically how it works,
and there are some other fun things that I won't go into, for 300+
clients each night, plus maybe a couple of times during the day
depending on what's happening (i.e. copy, post, purge, rename).  And
this on a va7100 (autoraid = pain [when you copy, post, purge, rename
all the time]).
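
The copy, post, purge, rename cycle described above can be sketched roughly
as follows. This is an illustration in Python using ordinary files, not the
Adager or MPE commands actually involved; the names `post_with_trial_copy`
and `append_line` and the file name `client001.db` are hypothetical.

```python
import os
import shutil
import tempfile

def post_with_trial_copy(db_path, apply_changes):
    """Copy the database, post changes to the copy, then swap the copy in."""
    trial_path = db_path + ".trial"
    shutil.copyfile(db_path, trial_path)   # copy: one full read and one full write
    try:
        apply_changes(trial_path)          # post: changes touch only the trial copy
    except Exception:
        os.remove(trial_path)              # posting failed: original is untouched
        raise
    os.replace(trial_path, db_path)        # purge + rename: trial replaces original

def append_line(path, line="new-record\n"):
    """Stand-in for the nightly posting step."""
    with open(path, "a") as f:
        f.write(line)

def demo():
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "client001.db")
        with open(db, "w") as f:
            f.write("record-1\n")
        post_with_trial_copy(db, append_line)
        with open(db) as f:
            return f.read()
```

Note that every run rewrites the whole file even when the posting step
changes only a few records, which is where the heavy disc I/O comes from
when this is repeated for 300+ databases.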

Anyway, to make a long story short, all you mpe internal folks.  For
this "hang" thing, do you think there is any way that at night, when
there are all the copies going on, and we know the disc queue writes peg
at 100% or more for long periods of time, do you think that whatever
tries to keep track of what is queued to go runs out of queue space, and
the boxes just freeze?  Why does this happen to the 969 at the same
time, the development box that is sitting there minding its own
business?  This is just frustrating, baffling and perplexing all at the
same time.  And everyone here just wants the problem to go away.  So, we
have called the hardware vendor back, I still have not heard what the
next step is.  The big bosses want concrete answers, not "feelings",
before they spend money.  We have Beachglen monitoring the system now
too, so hopefully they can find some trend or some thing that will bring
us closer to a resolution.  We are on 7.0, they looked at the patches,
etc...all that stuff.  Replaced a bunch of hardware and controllers.  We
know we need to revamp the app to do less disc i/o, that is not an easy
task with two programmers, and an army working on the unix/oracle side
on the "new" app that was supposed to be ready 5 years ago.

Sorry to ramble, folks.  Just trying to make sense out of this on a box
that has been so reliable in the past.  Not used to this.

Mike "just wanted to share the pain" Baker

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
