HP3000-L Archives

February 2010, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject: Re: How to clear Memory Log
From: Gary Robillard <[log in to unmask]>
Reply To: Gary Robillard <[log in to unmask]>
Date: Tue, 23 Feb 2010 22:50:29 +0000
Hi Robert,

 

You could also run stmshut.diag.sys, delete the memlog file, and then run stmstart.diag.sys.

When diagmond restarts the memlogd process, it should create the memlog file if it does not exist.

I just prefer to have the memlogd process stopped rather than deleting the file out from under it.

 

Thanks,

Gary
 
> Date: Tue, 23 Feb 2010 14:25:32 -0800
> From: [log in to unmask]
> Subject: Re: How to clear Memory Log
> To: [log in to unmask]
> 
> Thanks for the responses.
> 
> I rebooted again today and found out there is no PDT option under the
> Service Menu.
> 
> I found a reference to deleting the log file called memlog.
> 
> It worked. Now STM tells me the memory log is empty.
> 
> I will keep monitoring and replace the memory stick if it shows more
> problems.
> 
> Procedure used to re-create a new memlog:
> 
> Xeq Sh.Hpbin.Sys -L (get into the POSIX shell)
> cd /var/stm/logs/os (change dir where the logs are)
> ls -l (look at the files)
> rm memlog.old (if you have an old one saved)
> mv memlog memlog.old (rename)
> touch memlog (create a new file)
> chmod 644 memlog (set the attributes)
> ls -l
> exit
> 
> Btw, the same thing can be done for "log1.raw.cur".
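For anyone who would rather script the rotation than type it, here is a minimal Python sketch of the same steps. It assumes a host where Python is available, which nothing in this thread establishes for MPE/iX, and `rotate_log` is a hypothetical helper name. It does not stop or restart the STM daemons, so the point about stopping memlogd first (stmshut / stmstart) still applies.

```python
import os
import tempfile
from pathlib import Path


def rotate_log(log_dir, name="memlog", mode=0o644):
    """Keep one saved copy of a log file and start a fresh, empty one.

    Mirrors the manual steps: rm NAME.old, mv NAME NAME.old,
    touch NAME, chmod 644 NAME.
    """
    log_dir = Path(log_dir)
    current = log_dir / name
    saved = log_dir / (name + ".old")
    if current.exists():
        # os.replace overwrites an existing NAME.old, so no separate rm is needed
        os.replace(current, saved)
    current.touch()            # create the new, empty log
    os.chmod(current, mode)    # rw-r--r--
    return current, saved


if __name__ == "__main__":
    # Demonstration in a scratch directory, not in /var/stm/logs/os
    with tempfile.TemporaryDirectory() as scratch:
        (Path(scratch) / "memlog").write_text("old entries")
        new_log, old_log = rotate_log(scratch)
        print(new_log.name, new_log.stat().st_size, old_log.name)
```

Passing `name="log1.raw.cur"` would apply the same rotation to the other log mentioned above.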
> 
> Thanks again and Cheers!
> 
> ~Robert
> 
> 
> -----Original Message-----
> From: HP-3000 Systems Discussion [mailto:[log in to unmask]] On Behalf
> Of Gary S Robillard
> Sent: Tuesday, February 23, 2010 8:17 AM
> To: [log in to unmask]
> Subject: Re: How to clear Memory Log
> 
> Hello All, 
> 
> 
> 
> I don't believe that the 928LX has a PDT (Page Deallocation Table) in PDC
> (Processor Dependent Code). So the deallocation is due to the logs in the
> memlog file.
>
> Since the DIMM appears to have a solid, repeatable single-bit error, it is
> going to keep being added to the memlog file and being deallocated by the
> memory management software.
>
> You might want to consider replacing the DIMM with the error, or adding some
> thresholding to your CSTM job ...
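One possible reading of "thresholding", sketched in Python: report an entry only when its error count has grown since the previous run, so a known, already-deallocated page stops showing up every day. This is an illustration, not a CSTM feature. The field names come from the report quoted in the original question further down; `parse_entries`, `entries_to_report`, and the `seen_counts` dictionary are hypothetical, and capturing the report text from the CSTM job is left out.

```python
SAMPLE_REPORT = """\
Memory Controller in Slot 3A
==========================================================
Slot:             3A
Error Type:       Single/hard: solid, repeatable single-bit error.
Page Status:      Deallocated: page is no longer in use.
Bit Num / Bank:   29 / 0
Logged By:        Memlogd
First Detected:   Sat Feb  6 14:08:18 2010
Last Detected:    Sat Feb  6 14:10:18 2010
Error Count:      2
Error Addr:       0x4cd1068
==========================================================
"""


def parse_entries(report_text):
    """Split a report into one dict of 'Field: value' pairs per entry."""
    entries, current = [], {}
    for raw in report_text.splitlines():
        line = raw.strip()
        if line and set(line) == {"="}:      # a row of '=' delimits an entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and value.strip():
            current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def entries_to_report(entries, seen_counts, min_new_errors=1):
    """Return entries whose Error Count rose by at least min_new_errors.

    seen_counts maps Error Addr -> count from the previous run and is
    updated in place; persisting it between runs is up to the caller.
    """
    flagged = []
    for entry in entries:
        addr = entry.get("Error Addr")
        count = int(entry.get("Error Count", "0"))
        if count - seen_counts.get(addr, 0) >= min_new_errors:
            flagged.append(entry)
        seen_counts[addr] = count
    return flagged
```

With this, the entry is flagged the first time it is seen and then stays quiet until its count increases again.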
> 
> 
> 
> The PDT article in the MPE 5.0 Communicator:
>
> (Note the last paragraph, "MPD and Current Systems", as it explains how the
> pages are deallocated on systems without a PDT):
> 
> Chapter 10 Technical Articles
>
> Memory Page Deallocation (MPD)
>
> Steve Flynn
> Systems Technology Division
>
> MPD and Current Systems
>
> This article presents an overview of Memory Page Deallocation, a new
> feature available with MPE/iX Release 5.0. It does not cover detailed
> operation.
>
> When an HP 3000 is upgraded to MPE/iX 5.0, it also benefits from the MPD
> software. Most of the MPD operations described below operate in a
> similar manner. Please refer to the last section of this article for a
> discussion of the minor exceptions to MPD operation.
>
> Memory Failures. Memory boards are subject to two types of failures, hard
> errors and soft errors. Hard errors are caused by a single chip failure
> within a memory board, causing failures on all words associated with that
> chip. Soft errors occur when a bit within a word changes value. This is
> typically caused by decaying alpha particles from the surrounding casing
> material on the chip.
> 
> HP's current memory design is single-bit correct, double-bit detect. It
> is important to note that our ECC design does not perform error
> correction on the memory cell itself, but fixes the value in the cache
> line. The memory cell still contains the failure. If this is a soft
> failure, the data in memory is corrected when the cache line is written
> back to memory. If this is a hard failure, the memory cell is always in
> error.
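To make "single-bit correct, double-bit detect" concrete, here is a small Python sketch using a textbook extended Hamming (8,4) code. The article does not say which code or word width HP's memory controllers use, so the code choice, the 4-bit data width, and the names `encode` and `decode` are all assumptions for illustration.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry data bits
PARITY_POSITIONS = (1, 2, 4)    # Hamming parity bits; position 0 is overall parity


def _syndrome(bits):
    """XOR of the positions (1..7) whose bit is set; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if bits[pos]:
            s ^= pos
    return s


def encode(nibble):
    """Encode a 4-bit value into an 8-bit SECDED codeword (list of 0/1)."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    s = _syndrome(bits)
    for p in PARITY_POSITIONS:
        bits[p] = 1 if s & p else 0      # choose parity bits so the syndrome is 0
    bits[0] = sum(bits[1:]) % 2          # make the overall parity even
    return bits


def decode(bits):
    """Return (value, status); value is None for an uncorrectable error."""
    s = _syndrome(bits)
    overall = sum(bits) % 2
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity means exactly one bit flipped; s is its position
        # (s == 0 means the overall parity bit itself flipped).
        bits = list(bits)
        bits[s] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits flipped.
        return None, "double-bit error detected"
    value = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return value, status
```

Flipping any one bit of `encode(v)` decodes back to `v` with status "corrected". Flipping any two bits is reported as detected but not corrected, which is the situation the next paragraph describes.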
> 
> In either case, if another failure were to occur on the same word, it
> would go from single-bit correct to double-bit detect and cause the
> system to fail the next time the word is read. The purpose of page
> deallocation is to permanently remove those pages from memory that
> contain single or double bit errors.
>
> Components of MPD
>
> MPD provides a mechanism where memory pages containing errors can be made
> unavailable for system use. A memory page is 4k bytes in size and is
> deallocated if it contains one of the following errors:
> 
> * Solid single-bit error
> 
> * A soft failure re-occurring within a 24-hour period
> 
> * A double-bit error
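The page size and the three criteria above can be written down as a short Python sketch. The error-kind labels and function names are hypothetical, and treating exactly 24 hours as still "within" the period is an assumption, since the article does not state the boundary. The example address is the Error Addr from the report in the original question.

```python
PAGE_SIZE = 4096  # "A memory page is 4k bytes in size"


def page_of(address):
    """Return (page number, byte offset within the page) for a physical address."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def should_deallocate(kind, hours_since_previous_soft=None):
    """Apply the three deallocation criteria listed in the article.

    kind: 'solid-single', 'soft-single', or 'double' (labels are hypothetical).
    hours_since_previous_soft: hours since an earlier soft error at the same
    location, or None if there was none.
    """
    if kind in ("solid-single", "double"):
        return True
    if kind == "soft-single":
        # Assumption: exactly 24 hours still counts as "within a 24-hour period".
        return (hours_since_previous_soft is not None
                and hours_since_previous_soft <= 24)
    raise ValueError("unknown error kind: %r" % (kind,))


page, offset = page_of(0x4CD1068)   # page 0x4cd1 (19665), offset 0x68 (104)
```

So the single address in that report takes one 4k page out of service, not the whole DIMM.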
> 
> Numerous system components work together to implement memory page
> deallocation:
>
> Page Deallocation Table (PDT). This is a table that contains an entry for
> each memory page that has been deallocated, at some point in time, due to
> an error. Each entry contains the address and the nature of the error
> (single or double-bit).
>
> One important feature of this table is that it is implemented in
> Non-Volatile RAM, thus preserving deallocated pages between system boots.
>
> NOTE Older systems do not implement the PDT.
>
> Memory Selftest. Each time the system is reset, the memory selftest
> executes. If it finds a double-bit error, the address is entered into the
> PDT along with the fact that this was a double-bit error.
>
> MEMLOGP. The Memory Logging Process, MEMLOGP, is a process that
> periodically (every hour by default) checks the status of each memory
> controller on the system for occurrences of single-bit errors.
>
> MEMDIAG/LOGTOOL. Information about deallocated pages is kept in two places,
> the PDT, which is NVRAM based, and the MEMLOGP memory log file, which is
> disk based.
>
> MEMDIAG and LOGTOOL can be used to display the contents of the memory
> logfile. Information such as memory board slot number, physical address,
> page number and error type is displayed. The size of the PDT and number
> of entries currently in the table are also displayed.
>
> O/S Memory Manager. The O/S memory manager is involved during two phases,
> system boot and while the system is running.
>
> During the early portion of boot, the memory manager reads the PDT and
> deallocates any pages found there.
>
> Once the system is up, the memory manager provides services to MEMLOGP to
> allow pages to be deallocated online.
>
> Predictive. HP Predictive Support analyzes internal error logs on disk
> drives, system log files and memory logs for error trends. When an error
> rate exceeds its threshold, an EVENT is generated. HP Response Center
> Engineers and Customer Engineers analyze event information and take
> appropriate action to solve the problem.
>
> MEMSCAN is a software module within Predictive which scans system memory
> log files. MEMSCAN provides page deallocation trending information to
> support engineers such as PDT table size status and identification of
> boards or banks that have a significant number of pages deallocated.
> Bank deallocation or board replacement recommendations occur if the total
> number of deallocated pages exceeds a certain threshold.
>
> GENERAL OPERATION
>
> PD comes into effect while the system is being started as well as when it
> is online.
> 
> During system startup, memory is tested and any pages with bad locations
> are made unavailable to the system.
>
> While the system is online, an attempt is made to correct memory locations
> containing soft errors (scrubbing) and to deallocate, online, pages that
> contain solid errors.
>
> System Startup. The following shows the general system startup flow that
> occurs with respect to MPD.
>
> 1. Memory selftest executes. If any double-bit errors are discovered
> during testing, and there is not an entry in the PDT corresponding
> to this address, an entry is made.
>
> 2. During the boot process, the Operating System obtains the contents
> of the PDT. Each page in the PDT is made unavailable for
> allocation by the system's memory manager.
>
> 3. MEMLOGP reads the PDT and adds any new PDT entries (discovered by
> selftest) which are not contained in the memory logfile.
>
> Online Operation. The following shows the operation of MPD while the
> system is online.
> 
> 1. MEMLOGP wakes up and reads the memory controller status register
> and determines whether a single-bit error has been logged.
> 
> 2. MEMLOGP requests the O/S memory manager to release the page for
> testing.
> 
> 3. If the O/S cannot release the page, MEMLOGP logs the error in the
> memory log file as it does today.
> 
> 4. If the O/S does release the page, MEMLOGP performs a scrubbing
> operation (write/read test) on the page.
> 
> 5. If the single-bit error is reproduced (hard error), the page is
> entered into the PDT and memory log file. A request is made to
> the O/S memory manager to make this page unavailable for system
> use.
> 
> 6. If the single-bit error is not reproduced (soft error) and another
> soft error WAS DETECTED at this location within 24 hours, the page
> is entered into the PDT and memory log file. A request is made to
> the O/S memory manager to make this page unavailable for system
> use.
>
> MPD and Current Systems
>
> The one exception to MPD operation is that older systems were not
> designed with a Page Deallocation Table. Because of this, the system
> startup routine is slightly different. During system startup if the
> memory selftest detects a double-bit error, the system does not boot
> (same operation as today), unlike the 3000 991/995. But, while the
> system was running, MEMLOGP was keeping track of deallocated pages in its
> disk-based memory log file. During startup, these pages are deallocated
> before the system comes up.
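The online steps and the exception for systems without a PDT can be modelled in a few lines of Python. This is a toy model for following the logic, not how MEMLOGP is implemented. The class, method names, and return labels are hypothetical, time is a plain hour counter, and `can_release` and `reproduces` stand in for the memory manager's answer and the scrub result. The article does not say what is recorded when a soft error does not recur within 24 hours, so the "scrubbed" outcome is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class MpdModel:
    has_pdt: bool = True                               # False on older systems
    pdt: set = field(default_factory=set)              # NVRAM page table
    memlog: set = field(default_factory=set)           # disk log of deallocated pages
    logged_only: list = field(default_factory=list)    # errors logged, page kept
    last_soft: dict = field(default_factory=dict)      # page -> hour of last soft error
    unavailable: set = field(default_factory=set)      # pages withheld from use

    def on_single_bit_error(self, page, now_hours, can_release, reproduces):
        """Steps 2-6 of the online operation for one reported single-bit error."""
        if not can_release:                  # step 3: O/S cannot release the page
            self.logged_only.append(page)
            return "logged"
        if reproduces:                       # steps 4-5: scrub reproduced the error
            return self._deallocate(page)
        previous = self.last_soft.get(page)  # step 6: soft error, check recurrence
        self.last_soft[page] = now_hours
        if previous is not None and now_hours - previous <= 24:
            return self._deallocate(page)
        return "scrubbed"

    def _deallocate(self, page):
        if self.has_pdt:
            self.pdt.add(page)
        self.memlog.add(page)
        self.unavailable.add(page)
        return "deallocated"

    def boot(self):
        """Startup: use the PDT if there is one, otherwise the disk-based log."""
        self.unavailable = set(self.pdt if self.has_pdt else self.memlog)
        return self.unavailable
```

With `has_pdt=False`, a page deallocated online comes back as unavailable after `boot()` only because it is in `memlog`; once that set is emptied, `boot()` returns nothing. In this model that is why a system without a PDT stops reporting the deallocation after the memlog file is removed.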
> 
> 
> 
> 
> 
> 
> 
> Thanks, 
> 
> Gary Robillard  
> ----- Original Message ----- 
> From: "Raymond D Legault" <[log in to unmask]> 
> To: [log in to unmask] 
> Sent: Tuesday, February 23, 2010 6:43:23 AM GMT -07:00 US/Canada Mountain 
> Subject: Re: [HP3000-L] How to clear Memory Log 
> 
> HP3000 A/N-Class - How to clear entries in PDT table? 
> 
> DocId: MPEKBRC00017083   Updated: 7/20/05 4:01:00 AM 
> 
> PROBLEM 
> 
> What is the procedure to clear the Page Deallocation Table (PDT) on the 
> HP3000 A-Class and N-Class series servers? 
> 
> For a brief summary of Page Deallocation Table (PDT), refer to the document
> ID 
> TCKBCA00000264 (Enabling/Disabling/Verifying Page Deallocation Table). 
> 
> 
> CONFIGURATION 
> 
> A-Class N-Class 
> 
> RESOLUTION 
> 
> Shut system down. 
> Restart system to the Boot Menu. 
> At the Main Menu prompt, enter service. 
> At the Service Menu prompt, enter pdt to display the PDT entries. 
> At the Service Menu prompt, enter pdt clear to clear entries;
> the following is displayed:
>
>   Execution of this command will clear the Page Deallocation Table and then
>   hard boot the system (memory will be reconfigured on boot) Continue? (Y/N) >
>
> Enter y; the following is displayed:
>
>   Resetting ... .. ..
>   ********** VIRTUAL FRONT PANEL **********
>   SYSTEM BOOT DETECTED
>   LEDs:      RUN    ATTENTION  FAULT  REMOTE  POWER
>              FLASH  OFF        OFF    ON      ON
>   LED state: Running non-OS code (i.e. BOOT OR DIAGNOSTICS).
>
> Next, the Main Menu is displayed. From now on, restart the system like you
> normally do.
> 
> 
> Ray 
> 
> -----Original Message----- 
> From: Robert Mpe [mailto:[log in to unmask]] 
> Sent: Monday, February 22, 2010 2:22 PM 
> Subject: How to clear Memory Log 
> 
> How to clear Memory Log 
> 
> Friends, 
> 
> I have a 928LX on 6.5 PP3 with all patches applied.
> 
> We had a memory error 2 weeks ago: 
> 
>   Memory Controller in Slot 3A 
>   ========================================================== 
>   Slot:             3A 
>   Error Type:       Single/hard: solid, repeatable single-bit error. 
>   Page Status:      Deallocated: page is no longer in use. 
>   Bit Num / Bank:   29 / 0 
>   Logged By:        Memlogd 
>   First Detected:   Sat Feb  6 14:08:18 2010 
>   Last Detected:    Sat Feb  6 14:10:18 2010 
>   Error Count:      2 
>   Error Addr:       0x4cd1068 
>   ========================================================== 
> 
> I have had other memory errors where the "Page Status" was "Active".
> For those, I can get into CSTM, run logtool and use the 'CL' command to
> clear the log.
> But with the above Deallocated status, the ClearLog command does not work.
> 
> I run a home-made STM diagnostic job every day to check the hardware 
> status and I am getting tired of looking at this entry. 
> 
> The system has been rebooted twice since the memory error. 
> 
> Any idea how to clear this memory log? 
> 
> Thanks in Advance, 
> 
> ~Robert 
> 
> * To join/leave the list, search archives, change list settings, *
> * etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
 		 	   		  