HP3000-L Archives

January 2004, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Goetz Neumann <[log in to unmask]>
Reply To:
Goetz Neumann <[log in to unmask]>
Date:
Thu, 22 Jan 2004 02:33:35 -0600
 Harpreet SINGH wrote:

> On HP-967 I got the below error. When I tried to do the Memory Dump and
> again I got the same error. I could not find any errors in the Log files.
> Every time I am getting the same errors.
>
> Please advise what I have to do. The System halted 3 times in a Day. The
> Hardware engineer has checked the logfile and run the Diagnostics on
> Disks, but could not find anything...
>
>  System Abort 1457 from Subsystem 102
>  Secondary Status: info = -55 Subsys = 107
>  System Halt 7, $b05B1
>  FLTBF07
>  FLT0195
>  FLT183

An SA1457 means that a system process (not a user application
program process) terminated due to a trap. The abort can have a
very wide range of causes; it is one of the most 'generic' SAs
that can happen.  Often it is a networking process that hits
this error, but other system processes can get a trap, too.

Per MSGUTIL, the secondary status (-55, 107) decodes as follows:

Enter SUBSYSTEM # [<cr> = quit] >107

Virtual Space Management

Enter MESSAGE # [<cr> = quit] >55
-------------------------------------------------------------------------------
A DATA MEMORY PROTECTION trap was detected because of an invalid target address
alignment.  Virtual Space Management message 55
-------------------------------------------------------------------------------
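As an aside, the way the numbers in the abort banner map onto the two
MSGUTIL prompts can be sketched in Python. This is only an illustration:
the helper names (parse_abort, msgutil_inputs) are made up here, and the
regular expressions assume the banner layout exactly as quoted in this
thread. The only rule taken from the session above is that MSGUTIL was
given the Subsys value and the info value without its sign.

```python
import re

# Banner text as quoted in this thread (first two lines only).
BANNER = """System Abort 1457 from Subsystem 102
Secondary Status: info = -55 Subsys = 107"""

def parse_abort(text):
    """Hypothetical helper: extract the numbers from an abort banner."""
    primary = re.search(r"System Abort\s+(\d+)\s+from Subsystem\s+(\d+)", text)
    secondary = re.search(r"info\s*=\s*(-?\d+)\s+Subsys\s*=\s*(\d+)", text)
    if primary is None or secondary is None:
        raise ValueError("banner does not match the expected layout")
    return {
        "abort": int(primary.group(1)),
        "subsystem": int(primary.group(2)),
        "info": int(secondary.group(1)),
        "info_subsystem": int(secondary.group(2)),
    }

def msgutil_inputs(status):
    """Return (SUBSYSTEM #, MESSAGE #) as they were typed into MSGUTIL above."""
    return status["info_subsystem"], abs(status["info"])

status = parse_abort(BANNER)
print(msgutil_inputs(status))  # (107, 55)
```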

Message 55 means that some code is trying to access an object at a
misaligned (e.g. odd) address.
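To illustrate what an alignment check of this kind looks like, here is
a small Python sketch. It assumes the usual rule that an access of N
bytes (N a power of two) must start at an address that is a multiple of
N; the function name and the sample addresses are made up for the
example and are not taken from any dump.

```python
def is_aligned(address, access_size):
    """True if address is a multiple of access_size (a power of two, in bytes)."""
    if access_size <= 0 or (access_size & (access_size - 1)) != 0:
        raise ValueError("access_size must be a positive power of two")
    # For a power-of-two size, the low bits of an aligned address are all zero.
    return (address & (access_size - 1)) == 0

# Illustrative addresses only: a 4-byte access at each of them.
for addr in (0x1000, 0x1001, 0x1002):
    print(hex(addr), is_aligned(addr, 4))
```

Only the first address passes for a 4-byte access; the odd one fails
for any access wider than a single byte.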

What I find sort of hard to believe is that you are saying that
the SAME system abort happens when you try to take a memory dump.
The ISL > DUMP tool does not run in a process environment, i.e.
there is only one program executing, and no system process to trap.

Nevertheless if ISL > DUMP also gives you a system abort, then
this is ugly because with a SA1457 you really need to get some
more information from a dump to find the cause.

So the options are to either
- troubleshoot the DUMP problem, so that you can successfully
   get a memory dump for analysis. E.g. looking at the messages
   that the DUMP tool gives, you might be able to determine
   whether it is possibly aborting when trying to collect
   (swap/transient) information from a particular disk LDEV.
   It might also spit out a useful error message before hitting
   the SA.
  or
- try to collect information with ISL > SAT rather than DUMP,
   i.e. do your CTRL-B TC, then boot from primary and at the ISL
   prompt launch SAT instead of DUMP.
   SAT will at least allow you to get a stack trace of the abort.

   Depending on how many CPUs you have (I know a 967 only has one
   but I am giving more general advice here) you would repeat the
   following

   nmsat > cpu 0
   nmsat > tr,i,d ; dr
   nmsat > cpu 1
   nmsat > tr,i,d ; dr
   ...
   [repeat for each CPU, up to cpu (num_cpu - 1)]
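If you want the full list of commands written out for a given number of
CPUs, the repetition above can be sketched as a few lines of Python.
This only prints the text to type at the nmsat prompt; it does not talk
to SAT in any way, and the function name is made up for the example.

```python
def sat_trace_commands(num_cpu):
    """Return the commands to type at the nmsat prompt for CPUs 0..num_cpu-1."""
    if num_cpu < 1:
        raise ValueError("num_cpu must be at least 1")
    lines = []
    for cpu in range(num_cpu):
        lines.append(f"cpu {cpu}")     # switch to this CPU
        lines.append("tr,i,d ; dr")    # stack trace, then registers
    return lines

# Example: a two-CPU system.
for line in sat_trace_commands(2):
    print("nmsat >", line)
```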

One of these traces (from the tr,i,d commands) from one particular
CPU will have a line at the top showing the procedure name
'system_abort'. You could get additional information by switching
back to that CPU :

   nmsat > cpu X

Since in your case the process running on that CPU got a
data memory protection trap, there should be a trap marker
(interrupt marker) in the stack trace, and you can also
collect the register contents as they were when the trap
happened by :

   nmsat > lev 0,1
   nmsat > dr

Provided with that information, your HP Response Center should
usually be able to narrow down the possible causes of the SA1457
quite a bit. (There can be exceptions to that rule of thumb,
though; for example, a corrupted stack can mean that you cannot
get any good stack trace.)


HTH,

Goetz

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
