HP3000-L Archives

January 2002, Week 3

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Doug Werth <[log in to unmask]>
Reply To: Doug Werth <[log in to unmask]>
Date: Tue, 15 Jan 2002 10:44:27 -0500
Content-Type: text/plain

Gilles writes:

> The first thing to check is whether or not your system is being bombarded
> by lack of heartbeat signals from your dtc's.
>
> Type:
>
> :linkcontrol @;status=all
> Linkname: DTSLINK   Linktype: IEEE8023  Linkstate: CONNECTED
> Physical Path:              56/56
> Current Station Address:    08-00-09-98-18-D3
> Default Station Address:    08-00-09-98-18-D3
> Current Receive Filter:     broad(1) any(0) k_pckts(1) x_pckts(0)
> Current Multicast Addresses:
>   09-00-09-00-00-01  09-00-09-00-00-02  09-00-09-00-00-03
>   09-00-09-00-00-04
> Transmits no error         2472  Receives no error         7375
> Transmit byte count      332989  Receive byte count      951822
> Transmits error               0  Receives error               0
> Transmits deferred            1  Carrier losses               0
> Transmits 1 retry             0  CRC errors                   0
> Transmits >1 retry            0  Frame losses                 0
> Trans 16 collisions           0  Whole byte errors            0
> Trans late collision          0  Size range errors            0
> 802 chip restarts             0  Receives dropped             0
> Heartbeat losses              0  Receives broadcast        6605
>                                   Receives multicast           0
>
> You should see Heartbeat losses of 0 or very close to 0.

Gilles is correct that you should check for Heartbeat losses on the LAN
card. Heartbeat losses on the system card cause slow network throughput, most
noticeable in large file transfers. But the LINKCONTROL statistics only show
whether the transceiver on the HP3000 system itself is failing to provide SQE
heartbeat.
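
If you capture the LINKCONTROL output to a text file, the counter can be
pulled out with a few lines of script. The following is only a sketch in
Python: the function name is made up for illustration, and it assumes the
counter appears as "Heartbeat losses" followed by a number, as in the
output quoted above. It covers only the host's own transceiver, not the DTCs.

```python
import re
from typing import Optional


def heartbeat_losses(linkcontrol_output: str) -> Optional[int]:
    """Return the 'Heartbeat losses' counter, or None if it is not present.

    Hypothetical helper: assumes the layout shown in the quoted
    :linkcontrol @;status=all output.
    """
    match = re.search(r"Heartbeat losses\s+(\d+)", linkcontrol_output)
    return int(match.group(1)) if match else None


# One line taken from the quoted LINKCONTROL output.
sample = "Heartbeat losses              0  Receives broadcast        6605"
print(heartbeat_losses(sample))
```

A result of 0, or something very close to it, is what you want to see here.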

Lack of SQE Heartbeat on DTCs can cause system performance problems and is
not reported by the LINKCONTROL command. A DTC 'complains' to the host
system that it is missing SQE. The host system, your HP3000, will log the
heartbeat loss events to special log files stored on LDEV 1. These log
events occur continuously, resulting in an I/O bottleneck on the system disk.
On some systems you can actually hear the system disk getting constant
usage.

How do you diagnose whether you are subject to this problem? Frequently the
process that is logging the errors appears as the top DISC consumer in SOS
or Glance/iX. Or a system process will continually appear in the list of
active processes shown by the :SHOWQ command.

   :showq;active

    DORMANT                                   RUNNING

   Q  PIN   JOBNUM                           Q  PIN   JOBNUM

                                             A   39
                                             C  M163  #S9136
                                             C  M183  #S9140
                                             D  U189  #J6036

A stack trace of PIN 39 would look something like this:

   $8 ($a3) nmdebug > pin #40;tr,i,d
          PC=a.0017399c enable_int+$2c
   NM* 0) SP=41643df0 RP=a.00789004 notify_dispatcher.block_current_process+$338
   NM  1) SP=41643df0 RP=a.00870cd8 find_obj_cache_desc+$170
   NM  2) SP=41643d70 RP=a.001baa64 wait_for_active_port+$e8
   NM  3) SP=41643c70 RP=a.001bb6c8 receive_from_port+$544
   NM  4) SP=41643bf0 RP=a.0075f5e4 extend_receive+$494
   NM  5) SP=416439f0 RP=a.00a6cce0 xm_w_commitrecord+$1a0
   NM  6) SP=416438b0 RP=a.00954e2c xm_end_system_trans+$340
   NM  7) SP=416437b0 RP=a.00990fec sm_pin_eof+$448
   NM  8) SP=416436b0 RP=a.00a4245c sm_write_eof+$110     <--------------
   NM  9) SP=41643530 RP=a.00a42688 sm_cntl_64+$104
   NM  a) SP=416434b0 RP=a.00ee0490 tm_control_common+$e2c
   NM  b) SP=416433f0 RP=a.01558b14 tm_ord_fix_buf_disc+$250
   NM  c) SP=416432f0 RP=a.0143676c fcontrol_nm+$fa8
   NM  d) SP=41643230 RP=a.01435790 ?fcontrol_nm+$8
            export stub: a.01436f7c FCONTROL+$50         <----------------
   NM  e) SP=41642d70 RP=a.01436ef8 ?FCONTROL+$8
            export stub: a.01e9440c tio_dtcm.p_write_eof+$f0
   NM  f) SP=41642cf0 RP=a.01ebff1c tio_dtcm.x_log+$638   <---------------
   NM 10) SP=41642c70 RP=a.01ec51a0 tio_dtcm+$42a0
   NM 11) SP=416424b0 RP=a.01ec0eec ?tio_dtcm+$8
            export stub: a.00748d74 io_receive+$e0
   NM 12) SP=41642330 RP=a.0074c818 io_mgr_process+$320
   NM 13) SP=416422b0 RP=a.0099c358 outer_block+$154
   NM 14) SP=41642130 RP=a.00000000
        (end of NM stack)

Reviewing the stack points to the performance problem. Not only is this
process logging the heartbeat loss events, it is also forcing the records to
be posted to disk immediately via the FCONTROL. This is where the performance
problem lies.
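
If you save stack traces to a file, a rough script can flag the ones that
match this pattern. This is only a sketch in Python under stated assumptions:
the helper names are invented, and the three marker procedures
(tio_dtcm.x_log, FCONTROL, sm_write_eof) are simply the frames arrowed in the
trace above, not an official signature of the problem.

```python
import re

# Procedures arrowed in the example trace; treating these three as the
# signature is an assumption made for illustration.
MARKERS = ("tio_dtcm.x_log", "FCONTROL", "sm_write_eof")

# Matches "a.<8 hex digits> <name>+$<offset>", dropping a leading '?'.
FRAME_RE = re.compile(r"a\.[0-9a-f]{8}\s+\??([\w.]+)\+\$[0-9a-f]+")


def procedures_in_trace(trace: str) -> list:
    """Return the procedure names found in a trace, in order of appearance."""
    return FRAME_RE.findall(trace)


def looks_like_dtc_logging(trace: str) -> bool:
    """True if every marker procedure appears somewhere in the trace."""
    names = set(procedures_in_trace(trace))
    return all(marker in names for marker in MARKERS)


# A few frames copied from the trace in this message.
sample_trace = """
   NM  8) SP=416436b0 RP=a.00a4245c sm_write_eof+$110
   NM  d) SP=41643230 RP=a.01435790 ?fcontrol_nm+$8
            export stub: a.01436f7c FCONTROL+$50
   NM  f) SP=41642cf0 RP=a.01ebff1c tio_dtcm.x_log+$638
"""
print(looks_like_dtc_logging(sample_trace))
```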

Another way to check for this problem is to look for the
log files themselves. The system writes one set of log files for each DTC
configured on the system. The names of the log files are HxxxxxxA.PUB.SYS
and HxxxxxxB.PUB.SYS, where 'xxxxxx' represents the last 6 characters of the
12-digit Ethernet/MAC address of the DTC. For instance, if the MAC address is
08-00-09-00-75-BD, then the file name will be H0075BDA.PUB.SYS.
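
The naming rule is mechanical, so it can be written down as a small function.
This is a sketch in Python; the function name is hypothetical and the only
rule it encodes is the one described above (an 'H', the last 6 hex digits of
the MAC address, then 'A' or 'B', in PUB.SYS).

```python
import re
from typing import Tuple


def dtc_log_file_names(mac: str) -> Tuple[str, str]:
    """Return the A and B log file names for a DTC's MAC address."""
    # Drop separators such as '-' or ':' and normalise to upper case.
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(digits) != 12:
        raise ValueError("expected a 12-digit MAC address, got %r" % mac)
    suffix = digits[-6:]
    return ("H%sA.PUB.SYS" % suffix, "H%sB.PUB.SYS" % suffix)


print(dtc_log_file_names("08-00-09-00-75-BD"))
```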

   :listf [log in to unmask],2
   ACCOUNT=  SYS         GROUP=  PUB

   FILENAME  CODE  ------------LOGICAL RECORD-----------  ----SPACE----
                     SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

   H0075BDA*           1W  FB           5      66010   1      256  1  *

If the EOF of this file is very large, then you should verify the SQE
settings on the transceiver connected to that DTC.
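
With several DTCs it may be easier to scan a captured LISTF,2 listing than to
read it by eye. The sketch below, in Python, pulls the file name, EOF and
LIMIT out of one listing line. The function name is invented, the column
layout is assumed to match the sample line above, and no threshold is built
in because "very large" is a judgment call for your own system.

```python
import re
from typing import Optional, Tuple

# Layout assumed from the sample LISTF,2 line in this message.
LISTF2_RE = re.compile(
    r"^\s*([A-Z][A-Z0-9]{0,7})\*?\s+"  # file name, optional trailing '*'
    r"(?:\S+\s+)?"                      # optional file code column
    r"(\d+[BW])\s+(\S+)\s+"             # record size and record type
    r"(\d+)\s+(\d+)"                    # EOF and LIMIT
)


def parse_listf2_line(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (file name, EOF, LIMIT), or None for headers and other lines."""
    match = LISTF2_RE.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(4)), int(match.group(5))


sample_line = "   H0075BDA*           1W  FB           5      66010   1      256  1  *"
print(parse_listf2_line(sample_line))
```

Comparing EOF against LIMIT for each file then gives a quick picture of which
DTCs are generating log records.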

Doug.

Doug Werth                             Beechglen Development Inc.
[log in to unmask]                               Cincinnati, Ohio

* To join/leave the list, search archives, change list settings, *
* etc., please visit http://raven.utc.edu/archives/hp3000-l.html *
