HP3000-L Archives

February 2010, Week 4

HP3000-L@RAVEN.UTC.EDU

From: Robert Mpe <[log in to unmask]>
Reply To: Robert Mpe <[log in to unmask]>
Date: Tue, 23 Feb 2010 14:25:32 -0800
Content-Type: text/plain

Thanks for the responses.

I rebooted again today and found out there is no PDT option under the
Service Menu.

I found a reference to deleting the log file called memlog.

It worked. Now STM tells me the memory log is empty.

I will keep monitoring and replace the memory stick if it shows more
problems.

Procedure used to re-create a new memlog:

XEQ SH.HPBIN.SYS -L          (from the MPE CI: get into POSIX)
cd /var/stm/logs/os          # change to the directory where the logs are
ls -l                        # look at the files
rm memlog.old                # only if you have an old one saved
mv memlog memlog.old         # rename the current log
touch memlog                 # create a new, empty file
chmod 644 memlog             # set the permissions (rw-r--r--)
ls -l                        # check the result
exit                         # leave the shell

Btw, the same thing can be done for "log1.raw.cur".
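Since the same rotation applies to memlog and log1.raw.cur, it can be wrapped in a small shell function. This is a minimal sketch, assuming the MPE/iX POSIX shell accepts standard POSIX function syntax; rotate_log is a hypothetical helper (not an HP-supplied tool) and has not been tried on a 928LX.

```shell
# rotate_log DIR FILE  (hypothetical helper, not part of STM)
# Keeps the current DIR/FILE as DIR/FILE.old and starts a new, empty FILE
# with mode 644, mirroring the manual steps above. The body runs in a
# subshell, so the cd does not change the caller's working directory.
rotate_log() (
  dir=$1
  file=$2
  cd "$dir" || exit 1
  if [ ! -f "$file" ]; then
    echo "rotate_log: $dir/$file not found" >&2
    exit 1
  fi
  rm -f "$file.old"              # discard an older saved copy, if any
  mv "$file" "$file.old" || exit 1
  touch "$file"                  # new, empty log
  chmod 644 "$file"              # rw-r--r--
)
```

For example, from the POSIX shell:

    for f in memlog log1.raw.cur; do rotate_log /var/stm/logs/os "$f"; done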

Thanks again and Cheers!

~Robert


-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]] On Behalf
Of Gary S Robillard
Sent: Tuesday, February 23, 2010 8:17 AM
To: [log in to unmask]
Subject: Re: How to clear Memory Log

Hello All,

I don't believe that the 928LX has a PDT (Page Deallocation Table) in PDC
(Processor Dependent Code).  So the deallocation is due to the logs in the
memlog file.

Since the DIMM appears to have a solid, repeatable single-bit error, it is
going to keep being added to the memlog file and being deallocated by the
memory management software.

You might want to consider replacing the DIMM with the error, or adding some
thresholding to your CSTM job ...
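One way to add that thresholding is to have the daily job save the STM memory log display to a file and report only entries whose error count exceeds a limit. A minimal sketch, assuming the saved text uses the "Slot:" and "Error Count:" lines shown in Robert's original message at the bottom of this thread; mem_errors_over and the threshold value are hypothetical, not CSTM features.

```shell
# mem_errors_over THRESHOLD FILE  (hypothetical helper)
# FILE is assumed to hold saved STM memory-log output with lines such as
#   Slot:             3A
#   Error Count:      2
# Prints "SLOT: N errors" for every entry whose count exceeds THRESHOLD.
mem_errors_over() {
  awk -v max="$1" '
    $1 == "Slot:"                   { slot = $2 }
    $1 == "Error" && $2 == "Count:" { if ($3 + 0 > max + 0) print slot ": " $3 " errors" }
  ' "$2"
}
```

The job can then stay quiet unless the function prints something.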

The PDT article in the MPE 5.0 Communicator (note the last section, "MPD and
Current Systems", as it explains how the pages are deallocated on systems
without a PDT):

Chapter 10 Technical Articles

Memory Page Deallocation (MPD)
Steve Flynn, Systems Technology Division

This article presents an overview of Memory Page Deallocation, a new
feature available with MPE/iX Release 5.0.  It does not cover detailed
operation.

When an HP 3000 is upgraded to MPE/iX 5.0, it also benefits from the MPD
software.  Most of the MPD operations described below work in a similar
manner on such systems; the last section of this article discusses the
minor exceptions.

Memory Failures

Memory boards are subject to two types of failures, hard errors and soft
errors.  Hard errors are caused by a single chip failure within a memory
board, causing failures on all words associated with that chip.  Soft
errors occur when a bit within a word changes value although the hardware
is not permanently damaged.  They are typically caused by alpha particles
emitted by radioactive decay in the packaging material surrounding the
chip.

HP's current memory design is single-bit correct, double-bit detect.  It
is important to note that our ECC design does not perform error
correction on the memory cell itself, but fixes the value in the cache
line.  The memory cell still contains the failure.  If this is a soft
failure, the data in memory is corrected when the cache line is written
back to memory.  If this is a hard failure, the memory cell is always in
error.
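To make "single-bit correct, double-bit detect" concrete, here is a toy extended Hamming code that protects 4 data bits with 4 check bits. It is a textbook SEC-DED scheme chosen for illustration, not HP's actual ECC (which protects much wider words), and the function names are hypothetical. Like the hardware described above, secded_check only reports what it would correct; it does not rewrite the stored value.

```shell
# Toy extended Hamming (8,4) SEC-DED code, for illustration only.
# Bit 0 holds overall parity; bits 1..7 follow the classic Hamming layout
# p1 p2 d1 p4 d2 d3 d4, so a single flipped bit's position equals the syndrome.

secded_encode() {   # secded_encode DATA(0-15): prints the 8-bit codeword
  d=$1
  d1=$(( d & 1 )); d2=$(( (d >> 1) & 1 )); d3=$(( (d >> 2) & 1 )); d4=$(( (d >> 3) & 1 ))
  p1=$(( d1 ^ d2 ^ d4 )); p2=$(( d1 ^ d3 ^ d4 )); p4=$(( d2 ^ d3 ^ d4 ))
  c=$(( (p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) | (d2 << 5) | (d3 << 6) | (d4 << 7) ))
  par=0; i=1
  while [ "$i" -le 7 ]; do              # overall parity of bits 1..7
    par=$(( par ^ ((c >> i) & 1) )); i=$(( i + 1 ))
  done
  echo $(( c | par ))                   # store it in bit 0
}

secded_check() {    # secded_check CODEWORD: ok | corrected bit N | double-bit detected
  c=$1; s=0; par=0; i=0
  while [ "$i" -le 7 ]; do
    b=$(( (c >> i) & 1 ))
    par=$(( par ^ b ))                  # overall parity; 0 for a valid word
    if [ "$b" -eq 1 ]; then s=$(( s ^ i )); fi   # syndrome: XOR of set positions
    i=$(( i + 1 ))
  done
  if [ "$s" -eq 0 ] && [ "$par" -eq 0 ]; then echo "ok"
  elif [ "$par" -eq 1 ]; then echo "corrected bit $s"   # one flip: s locates it
  else echo "double-bit detected"                       # two flips: parity even, s nonzero
  fi
}
```

For example, secded_encode 11 prints 170; flipping bit 5 (170 ^ 32 = 138) gives "corrected bit 5", while flipping bit 3 as well (130) gives "double-bit detected".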

In either case, if another failure were to occur in the same word, the
error would go from single-bit correct to double-bit detect and cause the
system to fail the next time the word is read.  The purpose of page
deallocation is to permanently remove from use the memory pages that
contain single- or double-bit errors.

Components of MPD

MPD provides a mechanism by which memory pages containing errors can be
made unavailable for system use.  A memory page is 4 KB in size and is
deallocated if it contains one of the following errors:

   *   Solid single-bit error

   *   A soft failure re-occurring within a 24-hour period

   *   A double-bit error
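Because deallocation works on whole 4 KB (4096-byte, 2^12) pages, the page that holds a failing address is found by dropping the address's low 12 bits. A minimal sketch, assuming the "Error Addr" value in an STM memory log is a physical byte address; page_of is a hypothetical helper. Applied to the address 0x4cd1068 from Robert's log further down, it gives page 0x4cd1, which starts at 0x4cd1000.

```shell
# page_of ADDRESS  (hypothetical helper)
# Prints the 4 KB page number holding ADDRESS and that page's base address.
# ADDRESS may be decimal or 0x-prefixed hex; shell arithmetic accepts both.
page_of() {
  addr=$(( $1 ))
  page=$(( addr >> 12 ))                    # 4096 = 2^12 bytes per page
  printf 'page 0x%x (base address 0x%x)\n' "$page" $(( page << 12 ))
}
```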

Numerous system components work together to implement memory page
deallocation:

Page Deallocation Table (PDT). This table contains an entry for each
memory page that has been deallocated, at some point in time, due to an
error.  Each entry contains the address and the nature of the error
(single- or double-bit).

One important feature of this table is that it is implemented in
non-volatile RAM (NVRAM), thus preserving deallocated pages between system
boots.

NOTE: Older systems do not implement the PDT.

Memory Selftest. Each time the system is reset, the memory selftest
executes.  If it finds a double-bit error, the address is entered into the
PDT along with the fact that this was a double-bit error.

MEMLOGP. The Memory Logging Process, MEMLOGP, periodically (every hour by
default) checks the status of each memory controller on the system for
occurrences of single-bit errors.

MEMDIAG/LOGTOOL. Information about deallocated pages is kept in two
places: the PDT, which is NVRAM based, and the MEMLOGP memory log file,
which is disk based.
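Keeping those two places in step is a set difference: at startup (step 3 of System Startup below), MEMLOGP adds any PDT entries that the memory log file does not already contain. A minimal sketch of that merge, assuming both are represented as plain text files with one page address per line; the real PDT lives in NVRAM and the real log is not in this format, and sync_pdt_into_log is a hypothetical name.

```shell
# sync_pdt_into_log PDT_FILE LOG_FILE  (hypothetical; text-file stand-ins)
# Appends to LOG_FILE every line of PDT_FILE that LOG_FILE lacks, so pages
# found by selftest end up recorded in the disk-based log as well.
sync_pdt_into_log() {
  pdt=$1
  log=$2
  [ -f "$log" ] || : > "$log"             # start from an empty log if needed
  # -F fixed strings, -x whole-line match, -v keep non-matching lines.
  # grep exits 1 when nothing is new; treat that as success.
  grep -Fxv -f "$log" "$pdt" >> "$log" || [ $? -eq 1 ]
}
```

Running it a second time adds nothing, since every PDT entry is then already in the log.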

MEMDIAG and LOGTOOL can be used to display the contents of the memory
log file.  Information such as memory board slot number, physical address,
page number and error type is displayed.  The size of the PDT and the
number of entries currently in the table are also displayed.

O/S Memory Manager. The O/S memory manager is involved during two phases:
system boot and while the system is running.

During the early portion of boot, the memory manager reads the PDT and
deallocates any pages found there.

Once the system is up, the memory manager provides services to MEMLOGP to
allow pages to be deallocated online.

Predictive. HP Predictive Support analyzes internal error logs on disk
drives, system log files and memory logs for error trends.  When an error
rate exceeds its threshold, an EVENT is generated.  HP Response Center
Engineers and Customer Engineers analyze event information and take
appropriate action to solve the problem.

MEMSCAN is a software module within Predictive which scans system memory
log files.  MEMSCAN provides page deallocation trending information to
support engineers, such as PDT table size status and identification of
boards or banks that have a significant number of pages deallocated.
Bank deallocation or board replacement is recommended if the total
number of deallocated pages exceeds a certain threshold.

GENERAL OPERATION

MPD comes into effect while the system is being started as well as when it
is online.

During system startup, memory is tested and any pages with bad locations
are made unavailable to the system.

While the system is online, an attempt is made to correct memory locations
containing soft errors (scrubbing) and to deallocate, online, pages that
contain solid errors.

System Startup. The following shows the general system startup flow that
occurs with respect to MPD.

   1.  Memory selftest executes.  If any double-bit errors are discovered
       during testing, and there is not an entry in the PDT corresponding
       to this address, an entry is made.

   2.  During the boot process, the Operating System obtains the contents
       of the PDT.  Each page in the PDT is made unavailable for
       allocation by the system's memory manager.

   3.  MEMLOGP reads the PDT and adds any new PDT entries (discovered by
       selftest) that are not already contained in the memory log file.

Online Operation. The following shows the operation of MPD while the system
is online.

   1.  MEMLOGP wakes up and reads the memory controller status register
       and determines whether a single-bit error has been logged.

   2.  MEMLOGP requests the O/S memory manager to release the page for
       testing.

   3.  If the O/S cannot release the page, MEMLOGP logs the error in the
       memory log file as it does today.

   4.  If the O/S does release the page, MEMLOGP performs a scrubbing
       operation (write/read test) on the page.

   5.  If the single-bit error is reproduced (hard error), the page is
       entered into the PDT and memory log file.  A request is made to
       the O/S memory manager to make this page unavailable for system
       use.

   6.  If the single-bit error is not reproduced (soft error) and another
       soft error WAS DETECTED at this location within 24 hours, the page
       is entered into the PDT and memory log file.  A request is made to
       the O/S memory manager to make this page unavailable for system
       use.

MPD and Current Systems

The one exception to MPD operation is that older systems were not
designed with a Page Deallocation Table.  Because of this, the system
startup routine is slightly different.  During system startup, if the
memory selftest detects a double-bit error, the system does not boot
(same operation as today), unlike the 3000 991/995.  However, while the
system was running, MEMLOGP was keeping track of deallocated pages in its
disk-based memory log file.  During startup, these pages are deallocated
before the system comes up.
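The online steps 1 to 6 in this article amount to a small decision procedure for each single-bit error MEMLOGP finds. A minimal sketch of that decision only, with the hardware and O/S interactions reduced to yes/no inputs; memlogp_action is a hypothetical name, and the outcome for a first-time soft error (logged, page left in use) is an assumption, since the article does not spell it out. On a system without a PDT, such as the 928LX in this thread, the deallocation would presumably be recorded only in the disk-based log.

```shell
# memlogp_action RELEASED REPRODUCED SOFT_WITHIN_24H  (each "yes" or "no")
#   RELEASED         did the O/S release the page for testing? (step 2)
#   REPRODUCED       did the scrub (write/read test) show the error again? (step 4)
#   SOFT_WITHIN_24H  was another soft error seen here in the last 24 hours? (step 6)
# Prints the action the article describes for that combination.
memlogp_action() {
  released=$1 reproduced=$2 recent=$3
  if [ "$released" != yes ]; then
    echo "log only"                           # step 3: page could not be released
  elif [ "$reproduced" = yes ]; then
    echo "deallocate: hard error"             # step 5: add to PDT and log
  elif [ "$recent" = yes ]; then
    echo "deallocate: repeated soft error"    # step 6: add to PDT and log
  else
    echo "log only"                           # assumption: first soft error, page stays in use
  fi
}
```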

Thanks, 

Gary Robillard  
----- Original Message ----- 
From: "Raymond D Legault" <[log in to unmask]> 
To: [log in to unmask] 
Sent: Tuesday, February 23, 2010 6:43:23 AM GMT -07:00 US/Canada Mountain 
Subject: Re: [HP3000-L] How to clear Memory Log 

HP3000 A/N-Class - How to clear entries in PDT table? 

DocId: MPEKBRC00017083   Updated: 7/20/05 4:01:00 AM 

PROBLEM 

What is the procedure to clear the Page Deallocation Table (PDT) on the 
HP3000 A-Class and N-Class series servers? 

For a brief summary of the Page Deallocation Table (PDT), refer to document
ID TCKBCA00000264 (Enabling/Disabling/Verifying Page Deallocation Table).


CONFIGURATION 

A-Class N-Class 

RESOLUTION 

Shut system down. 
Restart system to the Boot Menu. 
At the Main Menu prompt, enter service. 
At the Service Menu prompt, enter pdt to display the PDT entries. 
At the Service Menu prompt, enter pdt clear to clear the entries; the
following is displayed:

   Execution of this command will clear the Page Deallocation Table and then
   hard boot the system (memory will be reconfigured on boot) Continue? (Y/N) >

Enter y; the following is displayed:

   Resetting ...
   ********** VIRTUAL FRONT PANEL
   ********** SYSTEM BOOT DETECTED
   LEDs:   RUN     ATTENTION   FAULT   REMOTE   POWER
           FLASH   OFF         OFF     ON       ON
   LED state: Running non-OS code (i.e. BOOT OR DIAGNOSTICS).

Next, the Main Menu is displayed.  From now on, restart the system as you
normally do.


Ray 

-----Original Message----- 
From: Robert Mpe [mailto:[log in to unmask]] 
Sent: Monday, February 22, 2010 2:22 PM 
Subject: How to clear Memory Log 

How to clear Memory Log 

Friends, 

I have a 928LX on MPE/iX 6.5 PP3 with all patches applied.

We had a memory error 2 weeks ago: 

  Memory Controller in Slot 3A 
  ========================================================== 
  Slot:             3A 
  Error Type:       Single/hard: solid, repeatable single-bit error. 
  Page Status:      Deallocated: page is no longer in use. 
  Bit Num / Bank:   29 / 0 
  Logged By:        Memlogd 
  First Detected:   Sat Feb  6 14:08:18 2010 
  Last Detected:    Sat Feb  6 14:10:18 2010 
  Error Count:      2 
  Error Addr:       0x4cd1068 
  ========================================================== 

I have had other memory errors for which the "Page Status" was "Active".
For those, I can get into CSTM, run logtool and use the 'CL' command to
clear the log.
But with the Deallocated status above, the ClearLog command does not work.

I run a home-made STM diagnostic job every day to check the hardware 
status and I am getting tired of looking at this entry. 

The system has been rebooted twice since the memory error. 

Any idea how to clear this memory log? 

Thanks in Advance, 

~Robert 

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
