HP3000-L Archives

August 2004, Week 1

HP3000-L@RAVEN.UTC.EDU

From: Bill Cadier <[log in to unmask]>
Date: Sat, 7 Aug 2004 15:21:44 -0600

Craig writes:

> Everyone,
>
> I am wondering where to look.
>
> I have a 979/400 which is being hammered, night and day.
>
> For some reason, in the afternoon, the system seems to lock up...
>
> The console receives a bunch of NETIPC errors and a bunch of sessions get
> logged off.
>
> The system then continues on its merry way.
>
> I have looked at system tables with TBLMON, NETTOOL.NET/RESOURCE/DISPLAY,
> nothing there.  LINKCONTROL @,A shows no errors.
>
> Any ideas? My next step is to turn on network logging and do an NMDUMP.
>
> I am running out of ideas, this one is frustrating.
>
> TIA,
>
> -Craig

I thought I'd reply to this message rather than the other one, where you suspect a
memory leak. It would be hard to say that a memory leak caused the symptoms described,
but it could, I suppose.

If a process is leaking memory, that is, if the number of malloc() calls it makes is
greater than the number of free() calls (a typical cause), then that would use up that
process's heap space. The only effect that I think could be seen externally would
be a diminishing amount of transient space. Eventually a process leaking memory
in this manner will terminate, and that will clean up the problem, releasing transient
space. If that process abort also causes the session to drop and the connection is
via VT, then you'll probably also see the VTERR 42's on the console.
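
To make the malloc()/free() imbalance concrete, here is a minimal sketch in C. It is
not taken from any program on your system; the names tracked_malloc, tracked_free,
handle_request and outstanding_blocks are hypothetical and exist only to count calls.
The point is just that every request taking the "leak" path leaves one more block
allocated that nothing can ever free.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, for illustration only: how many blocks were
   obtained and released through the two wrappers below. */
static long alloc_calls = 0;
static long free_calls = 0;

/* malloc() wrapper that counts successful allocations. */
static void *tracked_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        alloc_calls++;
    return p;
}

/* free() wrapper that counts releases of non-NULL blocks. */
static void tracked_free(void *p)
{
    if (p != NULL) {
        assert(free_calls < alloc_calls);  /* never release more than we got */
        free_calls++;
        free(p);
    }
}

/* Handles one imaginary request of n bytes (n must be at least 1).
   When `leak` is nonzero the buffer is never released: the only pointer
   to it goes out of scope on return, so the heap block is lost. */
static int handle_request(size_t n, int leak)
{
    char *buf = tracked_malloc(n);
    if (buf == NULL)
        return -1;
    buf[0] = '\0';          /* stand-in for real work on the buffer */
    if (!leak)
        tracked_free(buf);
    return 0;
}

/* Blocks still allocated: grows without bound if requests keep leaking. */
static long outstanding_blocks(void)
{
    return alloc_calls - free_calls;
}
```

Calling handle_request(64, 0) any number of times leaves outstanding_blocks() at zero,
while each call to handle_request(64, 1) raises it by one. In a long-running process
that growth is what eventually exhausts the heap.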

Would that cause the system to lock up? Hard to say. If a large number of processes
all leaked memory, and that resulted in transient space rapidly running low, I think you
would see other problems, such as jobs and sessions unable to log on, or process
creation or fork calls failing, and so forth.

You said the system was on 6.5 power patch 4. What version of PDC is installed?
Does the system have either MPEKXY0 or the later MPEMXQ1 installed? If not,
you might consider first installing the software patch and then also installing the latest
PDC (processor dependent code) when convenient.

On pre-7.5 Hawk boxes (979, 989) it was possible for the system to experience short
"freezes" when the runway bus got extremely busy. The freeze is caused by the busy
bus blocking a PDC call on the monarch (CPU 0). The most common PDC call on the
Hawks is the chassis update call that cycles the FxFF display and checks for overtemp
conditions. That happens every 2 seconds on the monarch and is the likeliest candidate
to block if the bus is busy. So in 39.43 PDC that call was relocated out of ROM space
so the call would not generate a bus transaction that could block. And in the kernel we
also made an adjustment, similar to one made in HPUX, to avoid busying the bus. The
s/w patch by itself will help, but the complete fix is to have both the patch and a PDC
revision of 39.43 or later.

If the system is seeing this, what you will find is that system time may drift slowly
backwards. That's because we do all time calculations on the monarch. PDC calls have
to be made with interrupts disabled, so if the PDC call blocks on a busy bus, interrupts
are held off. One of those is the heartbeat interrupt, and missing it can lead to time
drifting backwards or slowing (cue the physics jokes :-).
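
A rough way to picture the lost time is the toy model below, in C. It is a sketch
under stated assumptions, not the MPE/iX kernel's actual timekeeping: the function
name software_clock_ms, the tick period, and the rule that at most one heartbeat
stays latched while interrupts are disabled are all hypothetical choices made for
illustration.

```c
#include <assert.h>

/* Toy model (assumption, not kernel code): a heartbeat interrupt is due
   every tick_ms, and the software clock advances by tick_ms per heartbeat
   actually serviced. While interrupts are disabled, during the window
   [freeze_start_ms, freeze_start_ms + freeze_len_ms), at most ONE
   heartbeat is assumed to stay latched; any further ones are lost.
   Returns the time the software clock shows after elapsed_ms of real time. */
static long software_clock_ms(long elapsed_ms, long tick_ms,
                              long freeze_start_ms, long freeze_len_ms)
{
    assert(tick_ms > 0);

    long counted = 0;      /* heartbeats the software clock has seen */
    int pending = 0;       /* 1 if a heartbeat is latched, waiting   */
    long freeze_end_ms = freeze_start_ms + freeze_len_ms;

    for (long t = tick_ms; t <= elapsed_ms; t += tick_ms) {
        if (t >= freeze_start_ms && t < freeze_end_ms) {
            pending = 1;   /* latched once; repeats inside the freeze are lost */
        } else {
            counted += 1 + pending;   /* this beat plus any latched one */
            pending = 0;
        }
    }
    return counted * tick_ms;
}
```

With a 10 ms tick over one second of real time, a freeze shorter than one tick period
costs nothing in this model, because the single missed heartbeat is latched and
serviced late. A 50 ms freeze starting at 105 ms covers five heartbeats, only one of
which survives, so the software clock ends up 40 ms behind the real elapsed time.
Repeated freezes would keep adding to that lag.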

I do not recall whether session disconnects were ever a symptom of this, but they
could be if the "freeze" exceeds any configured network timer value.

I should add, too, that this is a very old problem; I worked on it in 1999. But I noticed
that neither MPEKXY0 nor the later MPEMXQ1 is on a 6.5 power patch, so I thought
I'd toss this out as a possible explanation for what you are seeing.

hth

Bill
hp/vCSY
===========================================
  Reply to: bill . vcsy -at- comcast . net
===========================================

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
