In looking at your ITRC entry from March, it appears you have a lot of things involved here that could cause a problem:
- 1TB raw/740GB usable VA7100
- Brocade 2800 SAN switch
- Vicom A5814A-003 SCSI-FC routers
- 989KS 8x50GB ldevs=400GB (one volumeset)
- 969KS 4x60GB ldevs=240GB (one volumeset)
The VA should have a minimum of HP18 firmware; HP19 is available and fixes some performance issues.
The Brocade should have 2.6.0 firmware, and you should have single-initiator zoning in effect to keep the SCSI-FC routers from seeing each other. Letting them see each other will cause all kinds of bad things. See Jim Hawkins' article:
http://jazz.external.hp.com/mpeha/papers/router_paper01.htm
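To make that concrete, here is a rough sketch of what single-initiator zoning could look like from the Brocade 2800 telnet CLI (Fabric OS 2.6.x), using port-based zone members. Every alias, zone, and domain,port value below is made up for illustration; the point is simply that each zone pairs exactly one router with a VA7100 port, so no two routers ever share a zone:

```
aliCreate "rtr_989_1", "1,0"
aliCreate "rtr_969_1", "1,1"
aliCreate "va7100_p1", "1,8"
zoneCreate "z_rtr_989_1", "rtr_989_1; va7100_p1"
zoneCreate "z_rtr_969_1", "rtr_969_1; va7100_p1"
cfgCreate "prod_cfg", "z_rtr_989_1; z_rtr_969_1"
cfgSave
cfgEnable "prod_cfg"
```

You would repeat the aliCreate/zoneCreate pair for each of your six routers and add every zone to the config before enabling it.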
The Vicom A5814A-003 routers should be on 8.01.0C.
Your ldevs are very large, as are your volume sets. With FWSCSI I have found that 50-70GB per FWSCSI card and <200GB per volume set is best for performance.
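If it helps to see the numbers, here is a quick back-of-the-envelope check (a throwaway Python sketch, using the ldev figures from the list above) of how far each volume set is over that 200GB guideline:

```python
# Volume-set sizes from the ITRC entry above (GB per ldev x ldev count).
volume_sets = {
    "989KS": 8 * 50,  # 8 x 50GB ldevs = 400GB in one volume set
    "969KS": 4 * 60,  # 4 x 60GB ldevs = 240GB in one volume set
}

MAX_VOLSET_GB = 200  # rule-of-thumb ceiling per volume set

for box, size_gb in volume_sets.items():
    over = size_gb - MAX_VOLSET_GB
    print(f"{box}: {size_gb}GB in one volume set, {over}GB over the guideline")
```

Both boxes come out well over the suggested ceiling, which lines up with the advice above.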
My suspicion is your zoning is allowing your SCSI-FC routers to trade maps.
HTH,
Guy Paul, BCFP & MCP
SAN/Storage
> -----Original Message-----
> From: HP-3000 Systems Discussion
> [mailto:[log in to unmask]] On Behalf Of Baker, Mike L.
> Sent: Thursday, July 29, 2004 12:27 PM
> To: [log in to unmask]
> Subject: Frustration Level = Max
>
> We have two hp3k's, 969 and 989. Both share disc space on a
> va7100. We have 6 of the scsi/fiber routers (2 - 969, 4 -
> 989). Last night again, both boxes "froze". Solution (well,
> not really), one by one, powercycle those 6 routers. Both
> boxes "appear" to pick up where they left off. But not
> really. There are "gaps" in record counts in datasets. This
> problem has been happening since we went to the va7100 the
> end of June 2003. Every few weeks, we experience this
> "freeze". Our hardware vendor (Service Express), working
> with our software vendor (HP), just changed out a bunch of
> hardware two weeks ago on a Sunday morning. The problem is
> still happening. I guess what our hardware support vendor
> and HP don't seem to realize is THAT IT IS VERY CRITICAL THAT
> OUR PRODUCTION HP989 NOT GO DOWN - AT ALL. Even this
> "freeze" is unacceptable. Why has the problem not been
> solved yet? This is very frustrating. I'm sure the answer
> is that we are causing it ourselves, we are using the box in
> a way it was not intended to be used. Most of you that have
> seen my posts to the list know that we have an application
> that was written in a unique way. About 300 or so clients,
> so about 300 or so image databases (of course, NO logging
> with that amount). Batch environment. Make a copy of the
> database (using adager), post the changes to the copy, if
> everything is ok, purge the original and rename the "trial"
> database as the original. That's basically how it works, and
> there are some other fun things that I won't go into, for
> 300+ clients each night, plus maybe a couple of times during
> the day depending on what's happening (i.e. copy, post,
> purge, rename). And this on a va7100 (autoraid = pain [when
> you copy, post, purge, rename all the time]).
>
> Anyway, to make a long story short, all you mpe internal
> folks. For this "hang" thing, do you think there is any way
> that at night, when there are all the copies going on, and we
> know the disc queue writes peg at 100% or more for long
> periods of time, do you think that whatever tries to keep
> track of what is queued to go runs out of queue space, and
> the boxes just freeze? Why does this happen to the 969 at
> the same time, the development box that is sitting there
> minding its own business? This is just frustrating,
> baffling and perplexing all at the same time. And everyone
> here just wants the problem to go away. So, we have called
> the hardware vendor back, I still have not heard what the
> next step is. The big bosses want concrete answers, not
> "feelings", before they spend money. We have Beachglen
> monitoring the system now too, so hopefully they can find
> some trend or some thing that will bring us closer to a
> resolution. We are on 7.0, they looked at the patches,
> etc...all that stuff. Replaced a bunch of hardware and
> controllers. We know we need to revamp the app to do less
> disc i/o, that is not an easy task with two programmers, and
> an army working on the unix/oracle side on the "new" app that
> was supposed to be ready 5 years ago.
>
> Sorry to ramble, folks. Just trying to make sense out of
> this on a box that has been so reliable in the past. Not
> used to this.
>
> Mike "just wanted to share the pain" Baker
>
>
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *