HP3000-L Archives

April 1999, Week 5

HP3000-L@RAVEN.UTC.EDU

Date: Thu, 29 Apr 1999 00:49:03 -0400
Hello Friends @ 3000-L,

Re: Re: How do I kill a hung session ?

Can't wait for ":abortproc" ?  

I usually try to stay away from the argumentative topics on the list, but
this one pulled me in.

</rant-on>

Yes, I work on an NT Workstation and enjoy using Task Manager's ability to
kill processes.  I am sure there is no correlation between the presence of
this tool and software vendors being less concerned about the quality
of their software.  It just seems interesting to me that I use Task Manager
to kill looping and hung processes more often on a single-user NT machine
than I use abortjob for any reason on all 3 of my multi-user 3000 systems.

Watch out for that Kill... Killing the wrong process on NT is darn
painful... Oh, I hear someone clamoring for an example...   OK, last
month I did a Task Manager Kill of the wrong process and hit my mail client
(yes, the one I receive 3000-L on). Oops, corrupted directory files, and as
I mentioned already to Jeff Kell, there are those folks who back up and those
who wish they had...  Last good backup 2/18/99, lost 40 days of 3000-L
messages from all of you good folks... ARGH!  Oh, and here is the one Jeff
knows about... 3 weeks later I take a power fail, same mail client. Yes, oops,
corrupted directory files, and yes, once again last good backup 2/18/99, lost
30 days of 3000-L messages this time... Have I backed up yet?  Maybe not, and
can you believe last night I did a del *.* on "c:"?  No backup yet; after all,
it is only a PC...  If something goes wrong I can always reboot, right?

I am sure there is at least one good mail client running on a 3000, where we
have a file system which flushes memory to disk and closes files without
corrupting them when a process is terminated.  Yes, Lars and several folks
out here have already pointed me in the right direction...

If the functionality of abortprocess is implemented on the 3000 then the  
question is: Do you want it to work 100% of the time ???  If the answer is  
Yes, then the answer is also "Yes, I am willing to accept file corruption as
a result of performing an abortprocess on a system process".  I personally
prefer a kinder / gentler abortprocess... and I am not willing to accept
file corruption as a result of abortprocess, kill, abortjob or any other  
action I perform on my system. I guess the 3000 has spoiled me, and yes I
do run backups on my 3000s!

</rant-off>

Now, what's a person to do with these hung sessions, especially the sticky
ones that manage to survive a network restart...

First a little background... :abortjob is a message sent to the CI's message
port... If the CI is impeded on some other resource, it is not going to be
able to go back and read its message port... hence we are hung...  A 2nd
case: the CI got the abort message from the port and has sent a kill to all
of its child processes, and one or more of them are impeded on some other
resource and are not able to return a reply to the creating process; hence the
child and the creating process are hung... There are many other variations
of this, and there are multiple resources on which we can be impeded, but
this is pretty much what it is all about...
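
To make the first case concrete, here is a toy model in Python (not MPE/iX
code).  The names command_interpreter, port and resource are made up for the
illustration: a queue stands in for the CI's message port and a lock stands
in for "some other resource".  It only shows the general shape of the hang,
under the assumption that the CI reads its port after it gets the resource.

```python
import queue
import threading

def command_interpreter(port, resource, log):
    """Toy stand-in for a CI: it needs a resource before it reads its port."""
    with resource:                  # blocks here while someone else holds it
        log.append("ci: got resource")
    msg = port.get()                # only reached once the resource was free
    log.append(f"ci: received {msg}")

def demo(wait=0.5):
    port = queue.Queue()            # hypothetical stand-in for the message port
    resource = threading.Lock()     # hypothetical stand-in for the other resource
    log = []

    resource.acquire()              # something else is holding the resource
    ci = threading.Thread(target=command_interpreter,
                          args=(port, resource, log), daemon=True)
    ci.start()

    port.put("ABORT")               # the abort request is queued on the port...
    ci.join(wait)
    hung = ci.is_alive()            # ...but the CI is still stuck on the resource
    pending = port.qsize()          # so the abort message is still unread

    resource.release()              # once the resource frees up, the abort is seen
    ci.join(5.0)
    return hung, pending, log

if __name__ == "__main__":
    print(demo())
```

The abort message sits unread in the queue for as long as the lock is held,
which is the "hence we are hung" situation described above.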

OK, back to the question: what's a person to do with these hung sessions,
especially the sticky ones that manage to survive a network restart...

#1  HP has worked VERY hard to address session hangs in DTS/DTC Terminal  
    connections, in NS-VT Virtual Terminal connections, and in TELNET
    Terminal connections.

    We at HP invite you to be part of the solution and install the General
    Release patches which include fixes to known problems...

      - NSSFD16 5.5 GENERAL RELEASE (Included on power patch)
      - PTDFD37 5.5 GENERAL RELEASE (Included on power patch)
      - DTCEDT5 5.5 GENERAL RELEASE  
      - DTSFDA3 5.5 GENERAL RELEASE

#2 With minimal effort, it is possible to take a cursory look at a process
   hang (look at all processes in the family) with debug, and for the HP RC to
   review this to see if it matches up with a problem we are aware of or
   maybe a problem we already have fixed in a Beta Test patch.  If you copy
   this data into a post to 3000-L, as time/resources permit, I may be
   able to look at it as well.

   Example:  Here #S54 is hung (CNTRL-S on print catalog.pub.sys)

:showjob
...
#S54    EXEC        18  18       SAT 12:25A  JHATHOME,MANAGER.SYS  
...

Find all of the pins for #S54:

:showproc ;pin=1;system;tree
QPRI  CPUTIME   STATE  JOBNUM  PIN  (PROGRAM) STEP
...
B152  0:00.235  WAIT   S54         57   (JSMAIN.PUB.SYS)
C200  0:00.241  WAIT   S54           91   :PRINT catalog.pub.sys;page=0
C152  0:00.092  WAIT   S54           70   (VTSERVER.NET.SYS)
...

:debug  
DEBUG/iX C.16.01  

HPDEBUG Intrinsic at: a.00a60064 hxdebug+$e4
$1 ($60) nmdebug > pin #57;tr,d,i
       PC=a.0017e70c enable_int+$2c
NM* 0) SP=418428f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM  1) SP=418428f0 RP=a.002c603c notify_dispatcher+$264
NM  2) SP=41842870 RP=a.001aee90 wait_for_active_port+$ec
NM  3) SP=41842770 RP=a.001afb20 receive_from_port+$544
NM  4) SP=418426f0 RP=a.003647e4 extend_receive+$494
NM  5) SP=418424f0 RP=a.00d0c5d4 jsm_get_command+$58
NM  6) SP=418423b0 RP=a.00d0c568 ?jsm_get_command+$8
         export stub: 120.00007744  
NM  7) SP=418422f0 RP=120.00000000  
     (end of NM stack)
$2 ($39) nmdebug > pin #91;tr,d,i  
       PC=a.0017e70c enable_int+$2c
NM* 0) SP=418444b0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM  1) SP=418444b0 RP=a.002c603c notify_dispatcher+$264
NM  2) SP=41844430 RP=a.001aee90 wait_for_active_port+$ec
NM  3) SP=41844330 RP=a.001afb20 receive_from_port+$544
NM  4) SP=418442b0 RP=a.003647e4 extend_receive+$494
NM  5) SP=418440b0 RP=a.003531f4 rendezvousio.get_specific+$158
NM  6) SP=41843f70 RP=a.00353558 rendezvousio+$1d8
NM  7) SP=41843eb0 RP=a.0035334c ?rendezvousio+$8
         export stub: a.019745a8 sm_term_vt.blocked_write+$194
NM  8) SP=41843b30 RP=a.019746b4 sm_term_vt.sm_write+$7c
NM  9) SP=41843ab0 RP=a.01974770 sm_term_vt+$54
NM  a) SP=418439f0 RP=a.019746c8 ?sm_term_vt+$8
         export stub: a.003cf544 tm_terminal.tm_write+$3a8
NM  b) SP=41843970 RP=a.003d0f40 tm_terminal+$15c
NM  c) SP=418437b0 RP=a.00eb7b1c FWRITE+$8b8
NM  d) SP=418436f0 RP=a.00eb7230 ?FWRITE+$8
         export stub: a.00e7a500 tprint+$7cc
NM  e) SP=41843370 RP=a.00a64f7c hxprint+$2c0
NM  f) SP=41842ef0 RP=a.00a775d8 exec_cmd+$a3c
NM 10) SP=41842e30 RP=a.00a76b68 ?exec_cmd+$8
         export stub: a.00a79558 try_exec_cmd+$c8
NM 11) SP=41842db0 RP=a.00a768e0 command_interpret+$318
NM 12) SP=41842930 RP=a.00a76594 ?command_interpret+$8
         export stub: a.00a7a088 xeqcommand+$194
NM 13) SP=41842330 RP=a.00a79ee0 ?xeqcommand+$8
         export stub: 100.000067dc main_ci+$64
NM 14) SP=418422b0 RP=100.0000751c PROGRAM+$290
NM 15) SP=41842230 RP=100.00000000  
     (end of NM stack)
$3 ($5b) nmdebug > pin #70;tr,d,i  
       PC=a.0017e70c enable_int+$2c
NM* 0) SP=418431f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM  1) SP=418431f0 RP=a.002c603c notify_dispatcher+$264
NM  2) SP=41843170 RP=a.003a30c4 ipc_impede+$274
NM  3) SP=41843070 RP=a.003a2e3c ?ipc_impede+$8
         export stub: a.0181da28 sk_block+$1c4
NM  4) SP=41842f30 RP=a.0181e2e8 sk_block_for_completion+$fc
NM  5) SP=41842df0 RP=a.0184a78c sk_send+$308
NM  6) SP=41842cb0 RP=a.017fec94 IPCSEND+$820
NM  7) SP=41842bf0 RP=a.017fe440 ?IPCSEND+$8
         export stub: 2c1.0001dcac  
NM  8) SP=41842970 RP=2c1.0002a294  
NM  9) SP=41842630 RP=2c1.0002e13c  
NM  a) SP=41842330 RP=2c1.0002e6e8  
NM  b) SP=418422b0 RP=2c1.0002fb2c  
NM  c) SP=41842230 RP=2c1.0002f9a4  
         export stub: 2c1.00014c80  
NM  d) SP=418421b0 RP=2c1.00000000  
     (end of NM stack)
$4 ($46) nmdebug > exit
:

#3 If from the above data we are not able to identify this session hang
   as a known problem, then it is time to take the memory dump, submit the
   SR, etc... This may sound like a lot of work, but this is how we at HP
   are able to collect sufficient data to resolve these types of problems  
   and put out the fix in the next patch to benefit you and probably benefit  
   our other friends out here in 3000-L land...
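
If you collect traces like the ones in #2 often, a small script can pull out,
for each pin, the first routine on the stack that is not part of the
dispatcher, which makes it easier to compare one hang against another.  The
following Python sketch is only a text filter over pasted debug output; the
function name first_wait_routine and the choice to skip names starting with
"notify_dispatcher" are my own assumptions, not an HP tool, and the sample is
an abridged copy of the pin 57 and pin 70 traces above.

```python
import re

# "pin #57;tr,d,i" on the debug prompt line introduces each trace.
PIN_RE = re.compile(r"pin #(\d+)")
# Stack frames that carry a symbol name, e.g. "... RP=a.001aee90 wait_for_active_port+$ec".
FRAME_RE = re.compile(
    r"^NM\*?\s+[0-9a-f]+\)\s+SP=\S+\s+RP=\S+\s+(\S+?)\+\$[0-9a-f]+")

def first_wait_routine(trace_text, skip_prefixes=("notify_dispatcher",)):
    """Map each pin to the first named frame that is not a skipped routine."""
    result = {}
    pin = None
    for line in trace_text.splitlines():
        found_pin = PIN_RE.search(line)
        if found_pin:
            pin = int(found_pin.group(1))
            continue
        frame = FRAME_RE.match(line.strip())
        if frame and pin is not None and pin not in result:
            name = frame.group(1)
            if not name.startswith(skip_prefixes):
                result[pin] = name
    return result

SAMPLE = """\
$1 ($60) nmdebug > pin #57;tr,d,i
       PC=a.0017e70c enable_int+$2c
NM* 0) SP=418428f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM  1) SP=418428f0 RP=a.002c603c notify_dispatcher+$264
NM  2) SP=41842870 RP=a.001aee90 wait_for_active_port+$ec
$3 ($5b) nmdebug > pin #70;tr,d,i
       PC=a.0017e70c enable_int+$2c
NM* 0) SP=418431f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM  1) SP=418431f0 RP=a.002c603c notify_dispatcher+$264
NM  2) SP=41843170 RP=a.003a30c4 ipc_impede+$274
"""

if __name__ == "__main__":
    for pin, routine in first_wait_routine(SAMPLE).items():
        print(pin, routine)
```

On the sample this reports wait_for_active_port for pin 57 and ipc_impede for
pin 70.  It is no substitute for the full trace, which is still what should
go into the post or the SR.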

I hope this helps,  
  and yes the tape is in the drive and I am backing up tonight!

Regards,

James Hofmeister
Hewlett Packard
Worldwide Technology Network Expert Center
P.S. My Ideals are my own, not necessarily my employer's.











  
