Thanks for the responses.
I rebooted again today and found out there is no PDT option under the
Service Menu.
I found a reference to deleting the log file called memlog.
It worked. Now STM tells me the memory log is empty.
I will keep monitoring and replace the memory stick if it shows more
problems.
Procedure used to re-create a new memlog:
Xeq Sh.Hpbin.Sys -L (get into the POSIX shell)
cd /var/stm/logs/os (change dir where the logs are)
ls -l (look at the files)
rm memlog.old (if you have an old one saved)
mv memlog memlog.old (rename)
touch memlog (create a new file)
chmod 644 memlog (set the attributes)
ls -l
exit
Btw, the same thing can be done for "log1.raw.cur".
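For anyone who wants to repeat this, the steps above could be wrapped in a small shell function. This is only a sketch: rotate_stm_log is a made-up name, and it assumes nothing else needs to be stopped or restarted around the rename, which I have not verified.

```shell
# Sketch: rotate one STM log file, keeping a single previous copy as <name>.old.
# rotate_stm_log is a hypothetical helper, not part of STM.
#   $1 = directory holding the log, $2 = log file name
rotate_stm_log() {
    log="$1/$2"
    if [ ! -f "$log" ]; then
        echo "no such log: $log" >&2
        return 1
    fi
    rm -f "$log.old"                  # drop any previously saved copy
    mv "$log" "$log.old" || return 1  # keep the current log as a backup
    touch "$log" || return 1          # create a new, empty log
    chmod 644 "$log"                  # rw-r--r--
}
```

It would be called as: rotate_stm_log /var/stm/logs/os memlog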
Thanks again and Cheers!
~Robert
-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]] On Behalf
Of Gary S Robillard
Sent: Tuesday, February 23, 2010 8:17 AM
To: [log in to unmask]
Subject: Re: How to clear Memory Log
Hello All,
I don't believe that the 928LX has a PDT (Page Deallocation Table) in PDC
(Processor Dependent Code). So the deallocation is due to the logs in the
memlog file.
Since the DIMM appears to have a solid, repeatable single-bit error, the
affected page is going to keep being added to the memlog file and
deallocated by the memory management software.
You might want to consider replacing the DIMM with the error, or adding some
thresholding to your CSTM job ...
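One possible reading of "thresholding", as a rough sketch only: have the job flag a memory log entry only when its error count is above a limit. The function name entries_over_threshold is made up, and the sketch assumes the job's output contains "Slot:" and "Error Count:" lines like the entry in the original post further down this thread.

```shell
# Sketch: read a memory log report on stdin and print "<slot> <count>" for
# each entry whose Error Count exceeds the limit given as $1.
# entries_over_threshold is a hypothetical helper; the "Slot:" and
# "Error Count:" labels are taken from the report shown in this thread.
entries_over_threshold() {
    awk -v limit="$1" '
        $1 == "Slot:"                   { slot = $2 }
        $1 == "Error" && $2 == "Count:" { if ($3 + 0 > limit + 0) print slot, $3 }
    '
}
```

With a limit of 1, the entry from the original post (Slot 3A, Error Count 2) would be reported; with a limit of 5 it would be suppressed.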
The PDT article in the MPE 5.0 Communicator
(note the last paragraph, "MPD and Current Systems", as it explains how the
pages are deallocated on systems without a PDT):
Chapter 10 Technical Articles

Memory Page Deallocation (MPD)
Steve Flynn
Systems Technology Division

MPD and Current Systems

This article presents an overview of Memory Page Deallocation, a new
feature available with MPE/iX Release 5.0. It does not cover detailed
operation.

When an HP 3000 is upgraded to MPE/iX 5.0, it also benefits from the MPD
software. Most of the MPD operations described below operate in a
similar manner. Please refer to the last section of this article for a
discussion of the minor exceptions to MPD operation.

Memory Failures. Memory boards are subject to two types of failures, hard
errors and soft errors. Hard errors are caused by a single chip failure
within a memory board, causing failures on all words associated with that
chip. Soft errors occur when a bit within a word changes value. This is
typically caused by decaying alpha particles from the surrounding casing
material on the chip.

HP's current memory design is single-bit correct, double-bit detect. It
is important to note that our ECC design does not perform error
correction on the memory cell itself, but fixes the value in the cache
line. The memory cell still contains the failure. If this is a soft
failure, the data in memory is corrected when the cache line is written
back to memory. If this is a hard failure, the memory cell is always in
error.
In either case, if another failure were to occur on the same word, it
would go from single-bit correct to double-bit detect and cause the
system to fail the next time the word is read. The purpose of page
deallocation is to permanently remove those pages from memory that
contain single or double bit errors.

Components of MPD

MPD provides a mechanism where memory pages containing errors can be made
unavailable for system use. A memory page is 4k bytes in size and is
deallocated if it contains one of the following errors:
* Solid single-bit error
* A soft failure re-occurring within a 24-hour period
* A double-bit error
Numerous system components work together to implement memory page
deallocation:

Page Deallocation Table (PDT). This is a table that contains an entry for
each memory page that has been deallocated, at some point in time, due to
an error. Each entry contains the address and the nature of the error
(single or double-bit).
One important feature of this table is that it is implemented in
Non-Volatile RAM, thus preserving deallocated pages between system boots.
NOTE: Older systems do not implement the PDT.

Memory Selftest. Each time the system is reset, the memory selftest
executes. If it finds a double-bit error, the address is entered into the
PDT along with the fact that this was a double-bit error.

MEMLOGP. The Memory Logging Process, MEMLOGP, is a process that
periodically (every hour by default) checks the status of each memory
controller on the system for occurrences of single-bit errors.

MEMDIAG/LOGTOOL. Information about deallocated pages is kept in two
places: the PDT, which is NVRAM based, and the MEMLOGP memory log file,
which is disk based. MEMDIAG and LOGTOOL can be used to display the
contents of the memory logfile. Information such as memory board slot
number, physical address, page number and error type is displayed. The
size of the PDT and number of entries currently in the table are also
displayed.

O/S Memory Manager. The O/S memory manager is involved during two phases,
system boot and while the system is running.
During the early portion of boot, the memory manager reads the PDT and
deallocates any pages found there.
Once the system is up, the memory manager provides services to MEMLOGP to
allow pages to be deallocated online.

Predictive. HP Predictive Support analyzes internal error logs on disk
drives, system log files and memory logs for error trends. When an error
rate exceeds its threshold, an EVENT is generated. HP Response Center
Engineers and Customer Engineers analyze event information and take
appropriate action to solve the problem.
MEMSCAN is a software module within Predictive which scans system memory
log files. MEMSCAN provides page deallocation trending information to
support engineers such as PDT table size status and identification of
boards or banks that have a significant number of pages deallocated.
Bank deallocation or board replacement recommendations occur if the total
number of deallocated pages exceeds a certain threshold.

GENERAL OPERATION

MPD comes into effect while the system is being started as well as when it
is online.
During system startup, memory is tested and any pages with bad locations
are made unavailable to the system.
While the system is online, an attempt is made to correct memory locations
containing soft errors (scrubbing) and to deallocate pages online that
contain solid errors.

System Startup. The following shows the general system startup flow that
occurs with respect to MPD.
1. Memory selftest executes. If any double-bit errors are discovered
during testing, and there is not an entry in the PDT corresponding
to this address, an entry is made.
2. During the boot process, the Operating System obtains the contents
of the PDT. Each page in the PDT is made unavailable for
allocation by the system's memory manager.
3. MEMLOGP reads the PDT and adds any new PDT entries (discovered by
selftest) which are not contained in the memory logfile.

Online Operation. The following shows the operation of MPD while the
system is online.
1. MEMLOGP wakes up and reads the memory controller status register
and determines whether a single-bit error has been logged.
2. MEMLOGP requests the O/S memory manager to release the page for
testing.
3. If the O/S cannot release the page, MEMLOGP logs the error in the
memory log file as it does today.
4. If the O/S does release the page, MEMLOGP performs a scrubbing
operation (write/read test) on the page.
5. If the single-bit error is reproduced (hard error), the page is
entered into the PDT and memory log file. A request is made to
the O/S memory manager to make this page unavailable for system
use.
6. If the single-bit error is not reproduced (soft error) and another
soft error WAS DETECTED at this location within 24 hours, the page
is entered into the PDT and memory log file. A request is made to
the O/S memory manager to make this page unavailable for system
use.

MPD and Current Systems

The one exception to MPD operation is that older systems were not
designed with a Page Deallocation Table. Because of this, the system
startup routine is slightly different. During system startup, if the
memory selftest detects a double-bit error, the system does not boot
(same operation as today), unlike the 3000 991/995. But, while the
system was running, MEMLOGP was keeping track of deallocated pages in its
disk-based memory log file. During startup, these pages are deallocated
before the system comes up.
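
To make the six online-operation steps in the article easier to follow, here is a rough sketch of the decision they describe, written as a shell function. It is an illustration only, not HP's code: the name mpd_online_action and its yes/no arguments are made up, and the final "keep" case (soft error with no repeat in 24 hours) is my assumption, since the article does not spell it out.

```shell
# Sketch of the decision MEMLOGP makes after finding a logged single-bit error.
# mpd_online_action is a hypothetical name; arguments are "yes" or "no":
#   $1  the O/S memory manager released the page for testing
#   $2  the scrub (write/read test) reproduced the single-bit error
#   $3  another soft error was detected at this location within 24 hours
mpd_online_action() {
    if [ "$1" != yes ]; then
        echo "log-only"      # step 3: page not released, just log the error
    elif [ "$2" = yes ]; then
        echo "deallocate"    # step 5: hard error, page made unavailable
    elif [ "$3" = yes ]; then
        echo "deallocate"    # step 6: repeated soft error within 24 hours
    else
        echo "keep"          # assumption: scrubbed page stays in use
    fi
}
```

For the 928LX entry in this thread (solid, repeatable single-bit error), the call would be "mpd_online_action yes yes no", which gives "deallocate". On a system without a PDT, that deallocation is recorded only in the memory log file.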
Thanks,
Gary Robillard
----- Original Message -----
From: "Raymond D Legault" <[log in to unmask]>
To: [log in to unmask]
Sent: Tuesday, February 23, 2010 6:43:23 AM GMT -07:00 US/Canada Mountain
Subject: Re: [HP3000-L] How to clear Memory Log
HP3000 A/N-Class - How to clear entries in PDT table?
DocId: MPEKBRC00017083 Updated: 7/20/05 4:01:00 AM
PROBLEM
What is the procedure to clear the Page Deallocation Table (PDT) on the
HP3000 A-Class and N-Class series servers?
For a brief summary of the Page Deallocation Table (PDT), refer to document
ID TCKBCA00000264 (Enabling/Disabling/Verifying Page Deallocation Table).
CONFIGURATION
A-Class N-Class
RESOLUTION
Shut system down.
Restart system to the Boot Menu.
At the Main Menu prompt, enter service.
At the Service Menu prompt, enter pdt to display the PDT entries.
At the Service Menu prompt, enter pdt clear to clear entries;
the following is displayed:
Execution of this command will clear the Page Deallocation Table and then
hard boot the system (memory will be reconfigured on boot) Continue? (Y/N) >
Enter y; the following is displayed:
Resetting ... .. ..
********** VIRTUAL FRONT PANEL **********
SYSTEM BOOT DETECTED
LEDs:  RUN    ATTENTION  FAULT  REMOTE  POWER
       FLASH  OFF        OFF    ON      ON
LED state: Running non-OS code (i.e. BOOT OR DIAGNOSTICS).
Next, the Main Menu is displayed. From now on, restart the system like you
normally do.
Ray
-----Original Message-----
From: Robert Mpe [mailto:[log in to unmask]]
Sent: Monday, February 22, 2010 2:22 PM
Subject: How to clear Memory Log
Friends,
I have a 928LX on 6.5 PP3 with all patches applied.
We had a memory error 2 weeks ago:
Memory Controller in Slot 3A
==========================================================
Slot: 3A
Error Type: Single/hard: solid, repeatable single-bit error.
Page Status: Deallocated: page is no longer in use.
Bit Num / Bank: 29 / 0
Logged By: Memlogd
First Detected: Sat Feb 6 14:08:18 2010
Last Detected: Sat Feb 6 14:10:18 2010
Error Count: 2
Error Addr: 0x4cd1068
==========================================================
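As a side note, the page behind this entry can be worked out from the address. This is a sketch that assumes "Error Addr" is a physical byte address and uses the 4k (4096-byte) page size given in the MPD article quoted elsewhere in this thread; page_of_addr is a made-up helper name.

```shell
# Sketch: split a byte address into a 4096-byte page number and an offset.
# page_of_addr is a hypothetical helper. $1 is substituted textually into the
# arithmetic expansion, so hex input such as 0x4cd1068 is accepted.
page_of_addr() {
    echo "page $(( $1 / 4096 )) offset $(( $1 % 4096 ))"
}

page_of_addr 0x4cd1068   # prints: page 19665 offset 104
```

So under those assumptions the deallocated page is page 19665 (0x4cd1), and the failing word sits 104 bytes into it.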
I have had other memory errors for which the "Page Status" was "Active".
For those, I can get into CSTM, run logtool and use the 'CL' command to
clear the log.
But with the above Deallocated status, the ClearLog command does not work.
I run a home-made STM diagnostic job every day to check the hardware
status and I am getting tired of looking at this entry.
The system has been rebooted twice since the memory error.
Any idea how to clear this memory log?
Thanks in Advance,
~Robert
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *