HP3000-L Archives

May 1995, Week 2

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Eero Laurila <[log in to unmask]>
Reply-To: Eero Laurila <[log in to unmask]>
Date: Sat, 13 May 1995 03:11:37 GMT
Content-Type: text/plain
Parts/Attachments: text/plain (267 lines)

This article is lengthy, but for those interested in the insides of VT
it may be interesting reading material.  It is the same article I sent
to our internal newsletter on 95/04/21.
 
The bottom line is that the problem is a defect that has been in VT ever
since MPE/XL 1.0.  We did not hit the problem in CSY despite very,
_VERY_ extensive testing during the entire 5.0 build and test cycle.
Of course, it raised its ugly head right after the MR date...  The first
time we saw it was after updating from X.50.50 to the "rename" build
C.50.00.  Fix information is at the end of the article.
 
Cheers,
:-)  Eero Laurila - HP CSY Networking lab, NS services, VT.
 
 
 
VT ghosts under DSDAD on 5.0 push, session gone - SR#4701-285262
By:  Eero Laurila - CSY Networking lab, NS services.
 
 
As of the 5.0 push release we have seen some sites experiencing the
infamous problem of ghost vtservers left under dsdad.  Although it may
seem that some previous release's ghost-fix was dropped, that's not the
case.  The problem has been identified and fixed; see the fix info at
the end of this message.  Neither the 5.0 push nor the pull patch has
gone to general release yet - however, I wanted to bring this to your
attention to avoid re-inventing the wheel.
 
The problem was first seen intermittently at one 5.0 push beta customer,
and later on (after 5.0 push MR) CSY's high-speed NS test ring started
hitting it somewhat consistently.  This was the main source of the data
and helped us nail down the problem quickly.
 
Please contact the Network Expert Center before obtaining the patch so
that we can track the beta-test cycle and make the patches GR when
appropriate.  Instructions on how to identify this problem follow below.
 
 
PROBLEM CAUSE:
-------------
The problem is a defect in the VT code that has existed ever since VT
was written - a lack of information exchange between the vtserver
process and the VT ldm.  The reason we are seeing this on 5.0 is
something that changed in HLIO and device close processing (timing).
 
When vt_ldm gets a (write) request from HLIO, it builds a
vts_io_request(write) msg, puts it in its send buffer (if there is
enough space) and, if the buffer is not full, sets a timer for 0.5
seconds to wait for more data from HLIO (i.e. it tries to concatenate
many messages into one send for better performance).  This happens
with all fwrites to vt_ldm.
 
In the case of these hangs, vt_ldm had received a write request from
HLIO to print the logoff banner "CPU=5. Connect=...".  However, before
this was sent out, the device_close came down from HLIO, so vt_ldm
built a vt-termination request message and concatenated it to the tail
of the write message - i.e. the buffer looked like this:
 
   +-------------------------------------+-----------------------+
   | VTS-write request for logoff banner | VTS-terminate request |
   +-------------------------------------+-----------------------+
 
...and now the ldm passed the pointer to, and the length of, this
concatenated message buffer to vtserver.
 
VTSERVER tries to figure out whether the ldm is sending out a
termination request by checking the message buffer for the VTS-msg
type/primitive, since it needs to initiate termination processing as
well.  However, this code has always been at fault: it has only ever
checked the msg type/primitive field of the FIRST MESSAGE(!).  It has
never tried to traverse the entire buffer and check all the messages.
As such, whenever the termination request was not the first message in
the buffer, the vtserver (in all previous versions/releases) has never
seen the termination requested by the ldm.
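
The essence of the defect, as a hedged C sketch (the field names follow
the record dumps further below; the field widths are my assumptions,
not the actual source):

   /* Assumed msg_header layout, consistent with the 8-byte header
      shown by 'ft' later in this article. */
   typedef struct {
       unsigned short packet_length;  /* length of this one message */
       unsigned char  protocol_id;
       unsigned char  message_type;   /* 2 = write, 0 = terminate (per dumps) */
       unsigned char  null;
       unsigned char  primitive;      /* 1 = write, 2 = terminate (per dumps) */
       unsigned short request_count;
   } msg_header;

   static int is_terminate(const msg_header *h)
   {
       return h->message_type == 0 && h->primitive == 2;
   }

   /* The defect in essence: only the head of the buffer is inspected. */
   int termination_in_buffer_buggy(const char *buf, int total_len)
   {
       (void)total_len;                               /* ignored - the bug */
       return is_terminate((const msg_header *)buf);  /* first msg only(!) */
   }

   /* A complete check would walk message by message, advancing by
      packet_length until the whole ipcsend buffer is consumed. */
   int termination_in_buffer_full(const char *buf, int total_len)
   {
       int off = 0;
       while (off < total_len) {
           const msg_header *h = (const msg_header *)(buf + off);
           if (is_terminate(h))
               return 1;
           off += h->packet_length;
       }
       return 0;
   }

As the fix section at the end explains, the shipped fix does not do
this full traversal either - it avoids the per-ipcsend scan with a
flag instead.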
 
 
HOW TO IDENTIFY THIS PROBLEM:
-----------------------------
The ghost vtserver pin's stack trace is the typical one of an idle vtserver:
 
$23c ($4d) nmdebug > pin #77
$23d ($4d) nmdebug > tr,i,d
       PC=a.0016d70c enable_int+$2c
NM* 0) SP=41836fb0 RP=a.0029aa5c notify_dispatcher.block_current_process+$2f0
NM  1) SP=41836fb0 RP=a.0029cea8 notify_dispatcher+$264
NM  2) SP=41836f30 RP=a.0019dbd0 wait_for_active_port+$ec
NM  3) SP=41836e30 RP=a.0019e850 receive_from_port+$534
NM  4) SP=41836db0 RP=a.003341e8 extend_receive+$494
NM  5) SP=41836bb0 RP=a.00e78ff4 nowait_io_comp.get_any_io+$84
NM  6) SP=41836a30 RP=a.00e7a098 nowait_io_comp+$2c8
NM  7) SP=41836830 RP=a.001f1cdc ?nowait_io_comp+$8
         export stub: a.00252818 IOWAIT+$bc
NM  8) SP=418363b0 RP=a.00252748 ?IOWAIT+$8
         export stub: 2c4.0002f508 wait_for_completion+$98
NM  9) SP=418362f0 RP=2c4.00032b94 main+$90
NM  a) SP=41836230 RP=2c4.00032a30 ?main+$8
         export stub: 2c4.000189c4 _start+$138
NM  b) SP=418361b0 RP=2c4.00000000
     (end of NM stack)
 
Before going further, find the vtserver version number and open the
correct symbol file, giving it the name "NSSRVSOM".  The naming has
changed recently and is far more straightforward now; however, the 5.0
push MR'd version still has the files from the old build process, and
as such the correlation between VT versions and symbol files is less
obvious.  Let it suffice to say that the version number you are likely
to see on 5.0 push is B0011001 or B0011002 for vtserver, and the symbol
file that was sent out is SVTBB001.NSRV0000.TELESUP.  To obtain the
version number you can either run the customer's vtserver.net.sys
program file, use "nmmaint,6", or locate it in the dump.
 
For any patched versions (recent 4.0 patches, all 5.0 pull and push) the
symbol-file naming convention for VT symbols is SVTnnnnn.NSRV0000.TELESUP,
where "nnnnn" is the last 5 digits of the vtserver program version number.
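
In other words, the new-convention file name can be derived
mechanically from the version string.  A small illustrative C helper
(verify the result against the patch info-file):

   #include <stdio.h>
   #include <string.h>

   /* e.g. "B001003A" -> "SVT1003A.NSRV0000.TELESUP"
      (assumes the version string is at least 5 characters long) */
   void vt_symbol_file_name(const char *version, char *out)
   {
       sprintf(out, "SVT%s.NSRV0000.TELESUP",
               version + strlen(version) - 5);
   }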
A word of warning - VT has gone through several major changes impacting
data structures all over the place, so I cannot stress enough the
importance of having the correct symbol file open.  Please make sure
that from now on you always have the matching symbol file when reading
a dump - the symbol files are distributed and installed with the NSS
patches, as are the NS services macros - see the patch info-file(s).
 
Once you have the correct symbol file open, proceed from here:
 
 
  1) Locate the vtserver process and its header record.  The pointer
     to the header record is kept at vtserver's entry_dp+14 (or
     entry_dp+8; this changed around(?) 4.0 version B000809B).  On 5.0
     pull and push the released version has the header at entry_dp+14;
     all the patches have it at entry_dp+8.  The way to make sure
     you're at the right spot is that the first two fields at the
     pointed-to location should be dsdad's port id and pin number -
     see the sample below:
 
     nmdebug > fv [ns_vt_entry_dp+14] 'nssrvsom:headerrecord'
     RECORD
        DSDADPORTID : ffff9e11
        DSDADPIN    : 5e
        NUMSERVICES : 1
        WAITEDHEAD  : 452d0298
        TSBANCHOR   : 452d02b8
        STACKLOG    : 452d0018
     END
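
     For orientation, the record can be pictured roughly like this in
     C (the field widths are my assumptions - the symbol file is
     authoritative):

        /* Assumed C mirror of 'headerrecord', illustration only. */
        typedef struct {
            unsigned int  dsdadportid;   /* e.g. ffff9e11 above */
            unsigned int  dsdadpin;      /* e.g. 5e above */
            unsigned int  numservices;
            void         *waitedhead;    /* head of the waitentry chain */
            void         *tsbanchor;
            void         *stacklog;
        } headerrecord;

        /* The step 1 sanity check: the candidate address is right if
           the first two fields match dsdad's port id and pin. */
        int looks_like_header(const headerrecord *h,
                              unsigned int portid, unsigned int pin)
        {
            return h->dsdadportid == portid && h->dsdadpin == pin;
        }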
 
  2) From the header record, get the WAITEDHEAD pointer and format the
     waitentry chain until you get to "ENTRYTYPE : SERVERPORT":
 
     ...traverse waitentry-chain until...
     nmdebug > fv 467c64a8 'nssrvsom:waitentry'
     RECORD
        NEXTENTRY  : 0
        ENTRY_SIZE : 20
        DESCRIPTOR : b
        DATALENGTH : 10
        IOCCODE    : 2
        CSTATION   : 0
        ENTRYTYPE : SERVERPORT
           PORT :
              ID     : ffff7f91
              MSGBUF : 467c64c8
     END
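
     What you are doing by hand here is a plain linked-list walk; in C
     terms (an assumed, partial layout with an illustrative constant):

        /* Assumed partial C mirror of 'waitentry' - only the fields
           the traversal needs are shown. */
        enum { ENTRYTYPE_SERVERPORT = 1 };       /* illustrative value */

        typedef struct waitentry {
            struct waitentry *nextentry;         /* 0 terminates the chain */
            /* ... entry_size, descriptor, datalength, ioccode ... */
            int               entrytype;
        } waitentry;

        /* Step 2 as a loop: start at WAITEDHEAD, stop at SERVERPORT. */
        const waitentry *find_serverport(const waitentry *e)
        {
            while (e != 0 && e->entrytype != ENTRYTYPE_SERVERPORT)
                e = e->nextentry;
            return e;              /* 0 if no SERVERPORT entry exists */
        }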
 
 
  3) Using the MSGBUF address from above, you can format the vt_ldm's
     ipcsend request message as follows (note that I added 8 to the
     MSGBUF address to skip the NS message header stuff):
 
     nmdebug > fv 467c64c8+8 'nssrvsom:help_msg_type,1'
 
     PACKED RECORD
        MSG_LENGTH : e
        MSG_TYPE   : 5     <- ipcsend request
        AM_ENV_ID  : 1
           SEND_REQ_ID    : 1
           REPLY_SUBQUEUE : 3
           FLAGS          : [ ]
           PACKET_LENGTH  : 4e           <<-- note the length
           MSG            : b.8707076c   <<-- and address
     END
 
  4) The message address (MSG) points to vt_ldm's vts-msg buffer, and
     all the messages there are ready-to-send vts-format messages.
     Formatting the first message shows:
 
     nmdebug > fv b.8707076c 'nssrvsom:msg_header'
     PACKED RECORD
        PACKET_LENGTH : 42
        PROTOCOL_ID   : 2
        MESSAGE_TYPE  : 2  <<--+-- VTS-write request
        NULL          : 0      |
        PRIMITIVE     : 1  <<--+
        REQUEST_COUNT : 7c
     END
 
 
      ...now you're starting to see something that gives a clue.  The
      first message at the pointed-to location is $42 bytes long,
      although the entire ipcsend request from vt_ldm had a length of
      $4e... i.e. there are still #12 more bytes in the buffer.  The
      next message can be formatted using the same type; just add the
      first msg's length to the pointer:
 
     nmdebug > fv b.8707076c+42 'nssrvsom:msg_header'
     PACKED RECORD
        PACKET_LENGTH : c
        PROTOCOL_ID   : 2
        MESSAGE_TYPE  : 0  <<--+-- VTS-terminate request
        NULL          : 0      |
        PRIMITIVE     : 2  <<--+
        REQUEST_COUNT : 7d
     END
 
 
      There may or may not be more messages in the buffer.  The way you
      know you've seen them all is to add the packet lengths together:
      once the individual packet lengths sum up to the IPCSEND request
      length from vt_ldm, you have seen them all.
 
      FYI, the VTS-terminate request is of the following type, #12 bytes:
 
     nmdebug > ft 'terminate_request' m
 
     TERMINATE_REQUEST =
        PACKED RECORD
        HEADER : MSG_HEADER ($0.0 @ 8.0);
        KIND   : BYTE ($8.0 @ 1.0);
        NULL   : BYTE ($9.0 @ 1.0);
        REASON : SHORTINT ($a.0 @ 2.0);
        END ($0.0 @ c.0)
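
      In C terms this maps onto the msg_header sketch from earlier as
      follows (assumed field widths, matching the offsets in the 'ft'
      output):

         /* header at $0 (8 bytes), kind at $8, null at $9, reason at
            $a..$b - total $c = #12 bytes. */
         typedef struct {
             msg_header    header;
             unsigned char kind;
             unsigned char null;
             short         reason;
         } terminate_request;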
 
 
      If you find an ipcsend request from vt_ldm with a buffer where
      the termination request message is anything other than the first
      message, you have hit this problem.
 
 
FIX INFORMATION TEXT (from SR):
-------------------------------
Since it makes no sense to change VTSERVER to search through the ldm's
entire send buffer (which can contain up to 2.5 Kbytes of data, i.e.
over 200 concatenated msgs to search through - at EVERY ipcsend),
another method was chosen.
 
VT_LDM now has a flag "terminate_request_in_buffer" in its pda that
gets set when the ldm builds a termination request.  When the ldm
later builds the ipcsend request, it checks the pda flag and, if it
is set, uses a flag (borrowed from the ipcsend flags) to indicate
that one of the messages in the msg buffer is a termination request.
 
This saves vtserver the effort of searching the buffer: it just checks
the flag and, if it is set, initiates termination processing.  Before
doing the actual ipcsend, vtserver removes the "borrowed" flag so that
netipc never sees it - it exists only between vt_ldm and vtserver.
This way is also foolproof: vtserver will no longer miss the ldm's
termination requests.
 
In order to make sure that the changes in vt_ldm don't break SNA/DHCF,
which uses the same ldm, the "borrowed" flag is only used if the
server process is a vtserver.  A sketch of the scheme follows below.
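
As a C sketch (all names here are invented; this only mirrors the
description above, not the actual VT source):

   #define IPC_FLAG_TERM_IN_BUF 0x8000u   /* "borrowed" ipcsend flag bit */

   typedef struct { unsigned short flags; /* ... */ } ipcsend_request;
   typedef struct { int terminate_request_in_buffer; /* ... */ } ldm_pda;

   /* vt_ldm side: mark the request when a terminate msg was queued.
      Done only for vtserver peers, so SNA/DHCF is unaffected. */
   void ldm_mark_ipcsend(ldm_pda *pda, ipcsend_request *req,
                         int server_is_vtserver)
   {
       if (pda->terminate_request_in_buffer && server_is_vtserver)
           req->flags |= IPC_FLAG_TERM_IN_BUF;
   }

   /* vtserver side: one flag test instead of scanning up to 2.5 Kbytes
      of concatenated messages; strip the borrowed bit before the real
      ipcsend so that netipc never sees it. */
   int vtserver_note_termination(ipcsend_request *req)
   {
       int term = (req->flags & IPC_FLAG_TERM_IN_BUF) != 0;
       req->flags &= (unsigned short)~IPC_FLAG_TERM_IN_BUF;
       return term;   /* caller initiates termination processing if set */
   }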
 
Fix in versions/patches as of 4/11/95:
 
4.0     : none
4.5     : none
5.0 pull: '3A' patch NSSDDQ0, vtserver and ldm versions B001003A
5.0 push: '0A' patch NSSDDQ3, vtserver and ldm versions B001100A
 
Submitter-Name        Cert-Date   Cert-Doc-ID        Document-ID
CSY DISTRIBUTION      95/04/21    CSYB0015540005     001554:0005
