HP3000-L Archives

November 1999, Week 5

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
James Clark <[log in to unmask]>
Reply To:
James Clark <[log in to unmask]>
Date:
Mon, 29 Nov 1999 09:59:15 -0500
My understanding of memory, not including my own, is that it is short-lived.
A quick check of memory will often come back OK where an extensive check
returns errors, because in the quick check the stored item is read back
before it has a chance to change. Your memory cards have what are called
"refresh cycles", in which the memory controller goes through its memory and
refreshes the contents. (still working on good fast memory) If the memory
fades faster than it is refreshed, you start to get errors, which are often
corrected by the error-correcting logic on the card ($$$ memory); plain
parity can only detect them. I believe the memory manager also keeps a
counter of which locations have gone bad, and the utility programs check
these counters to look for patterns that would indicate a memory location
about to fail. The other reason for the counters is that memory can go bad
temporarily because of "sun spot activity", someone shaving with an electric
razor, or the cleaning people getting too close to the box with their
equipment. The list could be endless, depending on your environment. So if
the system registers too many errors, it downs the memory segment for later
extensive validation, most likely at the next reboot.
So to make a long story short, you may have seen the contents of your
counters which indicated possible bad memory, but when you rebooted and/or
told the system to validate the memory, everything checked out OK and the
counters were zeroed out for the next time.
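
To make the counter idea concrete, here is a toy sketch in Python of how I picture it. This is only my mental model, not how MPE/iX or SYSDIAG actually does it: the class name, the card labels and the threshold of 3 are all made up for illustration.

```python
from collections import Counter


class MemoryErrorTracker:
    """Toy model: count errors per card and deallocate past a threshold."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # hypothetical limit, not an HP value
        self.counts = Counter()      # errors logged per card
        self.deallocated = set()     # cards currently taken out of service

    def record_error(self, card):
        """Log one error; down the card once it reaches the threshold."""
        if card in self.deallocated:
            return
        self.counts[card] += 1
        if self.counts[card] >= self.threshold:
            self.deallocated.add(card)

    def revalidate(self, passes_test):
        """At reboot: re-test every card that has logged errors.

        Cards that pass are put back in service and their counters are
        zeroed; cards that fail are (or stay) deallocated.
        """
        for card in list(self.counts):
            if passes_test(card):
                self.deallocated.discard(card)
                del self.counts[card]
            else:
                self.deallocated.add(card)

    def report(self):
        """Summarize the counters, loosely in the spirit of MEMRPT."""
        if not self.counts:
            return "NO MEMORY ERRORS HAVE BEEN DETECTED"
        return ", ".join(
            f"{card}: {n}" for card, n in sorted(self.counts.items())
        )


tracker = MemoryErrorTracker(threshold=3)
for _ in range(3):
    tracker.record_error("card2")    # repeated errors on one card
tracker.record_error("card5")        # a single transient error elsewhere
print(tracker.deallocated)           # only card2 has been downed
tracker.revalidate(lambda card: True)  # everything passes after the reboot
print(tracker.report())
```

If every card passes the extensive test at reboot, the counters are gone and the report comes back clean, which would match what you are seeing now.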

James

-----Original Message-----
From: HP-3000 Systems Discussion [mailto:[log in to unmask]]On
Behalf Of Jim Phillips
Sent: Monday, November 29, 1999 8:59 AM
To: [log in to unmask]
Subject: Memory Problems . . . or Not?


Some time ago (back in July, I believe), we had a discussion about memory
problems and how to identify them using SYSDIAG/LOGTOOL/MEMRPT. I ran MEMRPT
on my system and found that two 64 MB cards had been deallocated because of
"Single/Hard Error"s.  There followed a discussion about how long my
system had been up and running, what to do about the memory problems, etc.

Well, over the Thanksgiving (US of A) Holiday, in between giving thanks to
my imperialistic ancestors for killing Indians and robbing them blind and
gorging myself to the point of regurgitation (okay, I'm just kidding about
the "imperialistic ancestors" bit, I'm actually of Italian descent
(majority) with some English and German mixed in for good measure - the
gorging part is pretty close to reality), I brought the system down so I
could check the memory boards and see exactly which ones were being
deallocated, my goal being to report to the vendor(s) with the idea of
getting them replaced.

Imagine my chagrin when this morning MEMRPT reports that:

 ===========================================================================
                *** NO MEMORY ERRORS HAVE BEEN DETECTED ***
 ===========================================================================

Hmmmmm.  Is this a turkey-induced hallucination or could I have "fixed" this
problem by removing and reinserting the memory boards?  Has anyone seen
memory errors that disappeared when the board was reseated?  Although I am
happy that we now have the full 512 MB of memory available (we need all we
can get), I am wondering if the problem is truly fixed or just waiting to
reappear during the next ethno-centric holiday.


Jim Phillips                            Manager of Information Systems
E-Mail: [log in to unmask]     Therm-O-Link, Inc.
Phone: (330) 527-2124                   P. O. Box 285
  Fax: (330) 527-2123                   Garrettsville, Ohio  44231
