In looking at your ITRC entry from March, it appears you have a lot of things involved here that could cause a problem:
- 1TB raw/740GB usable VA7100
- Brocade 2800 SAN switch
- Vicom A5814A-003 SCSI-FC routers
- 989KS 8x50GB ldevs=400GB (one volumeset)
- 969KS 4x60GB ldevs=240GB (one volumeset)
The VA should have a minimum of HP18 firmware; HP19 is available and fixes some performance issues.
The Brocade should have 2.6.0 firmware, and you should have single-initiator zoning in effect to keep the SCSI-FC routers from seeing each other. Letting them see each other will cause all kinds of bad things. See Jim Hawkins' article:
http://jazz.external.hp.com/mpeha/papers/router_paper01.htm
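To make that concrete, here is a rough sketch of what single-initiator zoning could look like from the Brocade 2800 telnet CLI (Fabric OS 2.6.x), using port-based zone members. Every alias, zone, and domain,port value below is made up for illustration; the point is simply that each zone pairs exactly one router with a VA7100 port, so no two routers ever share a zone:

```
aliCreate "rtr_989_1", "1,0"
aliCreate "rtr_969_1", "1,1"
aliCreate "va7100_p1", "1,8"
zoneCreate "z_rtr_989_1", "rtr_989_1; va7100_p1"
zoneCreate "z_rtr_969_1", "rtr_969_1; va7100_p1"
cfgCreate "prod_cfg", "z_rtr_989_1; z_rtr_969_1"
cfgSave
cfgEnable "prod_cfg"
```

You would repeat the aliCreate/zoneCreate pair for each of your six routers and add every zone to the config before enabling it.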
The Vicom A5814A-003 routers should be on 8.01.0C.
Your ldevs are very large, as are your volume sets. With FWSCSI I have found that 50-70GB per FWSCSI card and <200GB per volume set is best for performance.
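If it helps to see the numbers, here is a quick back-of-the-envelope check (a throwaway Python sketch, using the ldev figures from the list above) of how far each volume set is over that 200GB guideline:

```python
# Volume-set sizes from the ITRC entry above (GB per ldev x ldev count).
volume_sets = {
    "989KS": 8 * 50,  # 8 x 50GB ldevs = 400GB in one volume set
    "969KS": 4 * 60,  # 4 x 60GB ldevs = 240GB in one volume set
}

MAX_VOLSET_GB = 200  # rule-of-thumb ceiling per volume set

for box, size_gb in volume_sets.items():
    over = size_gb - MAX_VOLSET_GB
    print(f"{box}: {size_gb}GB in one volume set, {over}GB over the guideline")
```

Both boxes come out well over the suggested ceiling, which lines up with the advice above.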
My suspicion is your zoning is allowing your SCSI-FC routers to trade maps.
HTH,
Guy Paul, BCFP & MCP
SAN/Storage
> -----Original Message-----
> From: HP-3000 Systems Discussion
> [mailto:[log in to unmask]] On Behalf Of Baker, Mike L.
> Sent: Thursday, July 29, 2004 12:27 PM
> To: [log in to unmask]
> Subject: Frustration Level = Max
>
> We have two hp3k's, 969 and 989. Both share disc space on a
> va7100. We have 6 of the scsi/fiber routers (2 - 969, 4 -
> 989). Last night again, both boxes "froze". Solution (well,
> not really), one by one, powercycle those 6 routers. Both
> boxes "appear" to pick up where they left off. But not
> really. There are "gaps" in record counts in datasets. This
> problem has been happening since we went to the va7100 the
> end of June 2003. Every few weeks, we experience this
> "freeze". Our hardware vendor (Service Express), working
> with our software vendor (HP), just changed out a bunch of
> hardware two weeks ago on a Sunday morning. The problem is
> still happening. I guess what our hardware support vendor
> and HP don't seem to realize is THAT IT IS VERY CRITICAL THAT
> OUR PRODUCTION HP989 NOT GO DOWN - AT ALL. Even this
> "freeze" is unacceptable. Why has the problem not been
> solved yet? This is very frustrating. I'm sure the answer
> is that we are causing it ourselves, we are using the box in
> a way it was not intended to be used. Most of you that have
> seen my posts to the list know that we have an application
> that was written in a unique way. About 300 or so clients,
> so about 300 or so image databases (of course, NO logging
> with that amount). Batch environment. Make a copy of the
> database (using adager), post the changes to the copy, if
> everything is ok, purge the original and rename the "trial"
> database as the original. That's basically how it works, and
> there are some other fun things that I won't go into, for
> 300+ clients each night, plus maybe a couple of times during
> the day depending on what's happening (i.e. copy, post,
> purge, rename). And this on a va7100 (autoraid = pain [when
> you copy, post, purge, rename all the time]).
>
> Anyway, to make a long story short, all you mpe internal
> folks. For this "hang" thing, do you think there is any way
> that at night, when there are all the copies going on, and we
> know the disc queue writes peg at 100% or more for long
> periods of time, do you think that whatever tries to keep
> track of what is queued to go runs out of queue space, and
> the boxes just freeze? Why does this happen to the 969 at
> the same time, the development box that is sitting there
> minding its own business? This is just frustrating,
> baffling and perplexing all at the same time. And everyone
> here just wants the problem to go away. So, we have called
> the hardware vendor back, I still have not heard what the
> next step is. The big bosses want concrete answers, not
> "feelings", before they spend money. We have Beachglen
> monitoring the system now too, so hopefully they can find
> some trend or some thing that will bring us closer to a
> resolution. We are on 7.0, they looked at the patches,
> etc...all that stuff. Replaced a bunch of hardware and
> controllers. We know we need to revamp the app to do less
> disc i/o, that is not an easy task with two programmers, and
> an army working on the unix/oracle side on the "new" app that
> was supposed to be ready 5 years ago.
>
> Sorry to ramble, folks. Just trying to make sense out of
> this on a box that has been so reliable in the past. Not
> used to this.
>
> Mike "just wanted to share the pain" Baker
>
>
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *