This morning I was paged by a remote user, two time zones away, who could
not connect to our 969/220. After I was unable to connect from home, I had
to drive in to work.
The last message on the console was from yesterday, around 6:00 PM. There
was some sporadic disc activity and the status display was alternating
between 'FFFF' and 'F1FF', but the console was not responding. I finally
had to reboot the machine. The server came up without any errors and seemed
to work fine. Then, about 2 hours later, while I was looking through
$stdlists and log files, it 'froze up' again (this time the display
alternated between 'FFFF' and 'F6FF'). I had to reboot again. This time I
took a memory dump, ran 'fscheck' (it came back clean), condensed a few of
the discs, and then allowed the users back on. The system has been running
for over 3 hours now, without any 'ill' effects.
I've examined the log file created between the two reboots (fewer records)
and noticed about 80 errors similar to the one below (the subsystem varied
between 111 and 134). Does anyone have any ideas?
==============================================================================
WED, MAY 12, 2004 6:50 AM LOG2597.PUB.SYS SYSTEM (PIN 90)
I/O ERROR
PRODUCT NAME: PDEV:
LDEV: DEVICE CLASS: 15
I/O EVENT CLASS: Software LLIO STATUS: $00050086
MPE/XL I/O Status: Proc. Num. = 5, Error Num. = 0, Subsystem = 134
RETRY SCHEME: Summarized Retries WILL RETRY: NO
I/O RESULT: I/O Failed RUN AUTODIAG: NO
RETRY COUNT: 0 MGR PORT NUM.: $FFFFFA49
TRANS. NUM.: $0 # HDWR BYTES: 0
HARDWARE STATUS:
No hardware status was logged.
DATA LEN: 72 MGR CODE: 134
TAG DEFINITION NOT GIVEN - FIELD WILL BE DISPLAYED IN HEX:
      1  2  3  4  5  6  7  8  9 10 11 12
     == == == == == == == == == == == ==
  1: 03 00 02 01 00 05 00 86 02 39 00 00   . . . . . . . . . 9 . .
 13: 00 A5 00 00 00 00 00 02 00 06 00 00   . . . . . . . . . . . .
 25: 00 00 00 00 00 00 10 00 00 00 00 00   . . . . . . . . . . . .
 37: 00 00 10 00 8D 11 A0 00 00 29 CD DC   . . . . . . . . . ) . .
 49: 02 00 05 89 00 00 0B B8 00 B4 00 7A   . . . . . . . . . . . z
 61: 00 00 00 00 00 00 00 00 40 40 00 00   . . . . . . . . @ @ . .
==============================================================================
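For anyone wanting to see how the roughly 80 errors split across subsystems, here is a minimal sketch. It assumes the formatted log output has been captured to text and that every entry carries an "MPE/XL I/O Status" line laid out exactly like the one above. The function name tally_subsystems is made up for illustration, and the sample text is just the single status line from the entry shown.

```python
import re
from collections import Counter

# Matches the "MPE/XL I/O Status" line as it appears in the entry above.
# Assumption: every I/O error entry in the formatted log uses this layout.
STATUS_RE = re.compile(
    r"Proc\. Num\. = (\d+), Error Num\. = (\d+), Subsystem = (\d+)"
)


def tally_subsystems(log_text):
    """Count I/O error entries per subsystem number (hypothetical helper)."""
    counts = Counter()
    for match in STATUS_RE.finditer(log_text):
        counts[int(match.group(3))] += 1
    return counts


# The one status line taken from the log entry quoted above.
sample = "MPE/XL I/O Status: Proc. Num. = 5, Error Num. = 0, Subsystem = 134"

for subsystem, count in sorted(tally_subsystems(sample).items()):
    print(f"subsystem {subsystem}: {count} error(s)")
```

Run against the full captured log text, this would show whether the errors cluster on one subsystem number or are spread across the 111 to 134 range.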
I also found out that, a few hours before the first signs of trouble were
discovered, someone began scanning the network with the tool "WhatsUp
Gold" in an attempt to identify the different hosts on our subnet. The
tool takes an IP address range and tries to get a response using 'ping',
'http' and 'telnet'. The scan ran all night and was terminated around the
same time that my HP3K came up for the second time.
I wonder if anyone knows that tool, and whether they know of any problems
that could be caused on the HP3K when the above 'protocols' are thrown at
the machine every minute, for about 20 hours.
Regards
Paul Christidis
* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *