Hello Friends @ 3000-L,
Re: Re: How do I kill a hung session ?
Can't wait for ":abortproc" ?
I usually try to stay away from the argumentative topics on the list, but
this one pulled me in.
</rant-on>
Yes, I work on an NT Workstation and enjoy using Task Manager's ability to
kill processes. I am sure there is no correlation between the presence of
this tool and software vendors being less concerned about the quality
of their software. It just seems interesting to me that I use Task Manager
to kill looping and hung processes more often on a single-user NT machine
than I use abortjob for any reason on all 3 of my multi-user 3000 systems.
Watch out for that Kill... Killing the wrong process on NT is darn
painful... Oh, I hear someone clamoring for an example... OK, last
month I did the Task Manager Kill of a wrong process and hit my mail client
(yes, the one I receive 3000-L on). Oops, corrupted directory files, and as
I mentioned already to Jeff Kell, there are those folks who back up and those
who wish they had... Last good backup 2/18/99, lost 40 days of 3000-L
messages from all of you good folks... ARGH! Oh, and here is the one Jeff
knows about... 3 weeks later I take a power fail, same mail client. Yes, oops,
corrupted directory files, and yes, once again last good backup 2/18/99, lost
30 days of 3000-L messages this time... Have I backed up yet? Maybe not, and
can you believe last night I did a del *.* on "c:"? No backup yet, after all
it is only a PC... If something goes wrong I can always reboot, right?
I am sure there is at least one good mail client running on a 3000, where we
have a file system which flushes memory to disk and closes files without
corrupting them when a process is terminated. Yes, Lars and several folks
out here have already pointed me in the right direction...
If the functionality of abortprocess is implemented on the 3000 then the
question is: Do you want it to work 100% of the time ??? If the answer is
Yes, then the answer is also "Yes, I am willing to accept file corruption as
a result of performing an abortprocess on a system process". I personally
prefer a kinder / gentler abortprocess... and I am not willing to accept
file corruption as a result of abortprocess, kill, abortjob or any other
action I perform on my system. I guess the 3000 has spoiled me, and yes I
do run backups on my 3000s!
</rant-off>
Now, what's a person to do with these hung sessions, especially the sticky
ones that manage to survive a network restart...
First a little background... :abortjob is a message sent to the CI's message
port... If the CI is impeded on some other resource, it is not going to be
able to go back and read its message port... hence we are hung... A 2nd
case is that the CI got the abort message from the port and has sent a kill to
all of its child processes, and one or more of them are impeded on some other
resource and are not able to return a reply to the creating process, hence the
child and the creating process are hung... There are many other variations
of this, and there are multiple resources on which we can be impeded, but
this is pretty much what it is all about...
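To make the first case concrete, here is a toy model in Python. It is only a sketch of the idea described above, not how MPE/iX is implemented: the "CI" is a thread, its message port is a queue, and the "other resource" is an event. All of the names (run_ci, port, resource) are made up for illustration.

```python
import queue
import threading


def run_ci(port, resource, log, started):
    """Toy CI: it must get `resource` before it can go back to its port."""
    started.set()
    log.append("CI: waiting on resource")
    resource.wait()              # impeded here; nobody is reading the port
    log.append("CI: resource obtained")
    msg = port.get()             # only now does the CI look at its port
    log.append(f"CI: read {msg} from port")


def demo():
    port = queue.Queue()         # stands in for the CI's message port
    resource = threading.Event() # stands in for whatever the CI is impeded on
    started = threading.Event()
    log = []

    ci = threading.Thread(target=run_ci, args=(port, resource, log, started))
    ci.start()
    started.wait()

    port.put("ABORT")            # the abort is just a message in the queue
    ci.join(timeout=0.2)
    hung = ci.is_alive()         # still alive: the abort has not been read
    pending = port.qsize()       # the abort message is still sitting there

    resource.set()               # the impeding resource finally frees up
    ci.join()
    return hung, pending, log


if __name__ == "__main__":
    hung, pending, log = demo()
    print("hung after abort:", hung, "| unread messages:", pending)
    for entry in log:
        print(entry)
```

The point of the sketch: sending the abort succeeds right away, but nothing happens until the CI gets unstuck from the other resource and returns to read its port.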
OK, back to the question: what's a person to do with these hung sessions,
especially the sticky ones that manage to survive a network restart...
#1 HP has worked VERY hard to address session hangs in DTS/DTC Terminal
connections, in NS-VT Virtual Terminal connections, and in TELNET
Terminal connections.
We at HP invite you to be part of the solution and install the General
Release patches which include fixes to known problems...
- NSSFD16 5.5 GENERAL RELEASE (Included on power patch)
- PTDFD37 5.5 GENERAL RELEASE (Included on power patch)
- DTCEDT5 5.5 GENERAL RELEASE
- DTSFDA3 5.5 GENERAL RELEASE
#2 With minimal effort, it is possible to take a cursory look at a process
hang (look at all processes in the family) with debug, and for the HP RC to
review this to see if it matches up with a problem we are aware of or
maybe a problem we have already fixed in a Beta Test patch. If you copy
this data into a post to 3000-L, as time/resources permit, I may be
able to look at it as well.
Example: In my example #S54 is hung (CTRL-S on print catalog.pub.sys)
:showjob
...
#S54 EXEC 18 18 SAT 12:25A JHATHOME,MANAGER.SYS
...
Find all of the pins for #S54:
:showproc ;pin=1;system;tree
QPRI CPUTIME STATE JOBNUM PIN (PROGRAM) STEP
...
B152 0:00.235 WAIT S54 57 (JSMAIN.PUB.SYS)
C200 0:00.241 WAIT S54 91 :PRINT catalog.pub.sys;page=0
C152 0:00.092 WAIT S54 70 (VTSERVER.NET.SYS)
...
:debug
DEBUG/iX C.16.01
HPDEBUG Intrinsic at: a.00a60064 hxdebug+$e4
$1 ($60) nmdebug > pin #57;tr,d,i
PC=a.0017e70c enable_int+$2c
NM* 0) SP=418428f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM 1) SP=418428f0 RP=a.002c603c notify_dispatcher+$264
NM 2) SP=41842870 RP=a.001aee90 wait_for_active_port+$ec
NM 3) SP=41842770 RP=a.001afb20 receive_from_port+$544
NM 4) SP=418426f0 RP=a.003647e4 extend_receive+$494
NM 5) SP=418424f0 RP=a.00d0c5d4 jsm_get_command+$58
NM 6) SP=418423b0 RP=a.00d0c568 ?jsm_get_command+$8
export stub: 120.00007744
NM 7) SP=418422f0 RP=120.00000000
(end of NM stack)
$2 ($39) nmdebug > pin #91;tr,d,i
PC=a.0017e70c enable_int+$2c
NM* 0) SP=418444b0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM 1) SP=418444b0 RP=a.002c603c notify_dispatcher+$264
NM 2) SP=41844430 RP=a.001aee90 wait_for_active_port+$ec
NM 3) SP=41844330 RP=a.001afb20 receive_from_port+$544
NM 4) SP=418442b0 RP=a.003647e4 extend_receive+$494
NM 5) SP=418440b0 RP=a.003531f4 rendezvousio.get_specific+$158
NM 6) SP=41843f70 RP=a.00353558 rendezvousio+$1d8
NM 7) SP=41843eb0 RP=a.0035334c ?rendezvousio+$8
export stub: a.019745a8 sm_term_vt.blocked_write+$194
NM 8) SP=41843b30 RP=a.019746b4 sm_term_vt.sm_write+$7c
NM 9) SP=41843ab0 RP=a.01974770 sm_term_vt+$54
NM a) SP=418439f0 RP=a.019746c8 ?sm_term_vt+$8
export stub: a.003cf544 tm_terminal.tm_write+$3a8
NM b) SP=41843970 RP=a.003d0f40 tm_terminal+$15c
NM c) SP=418437b0 RP=a.00eb7b1c FWRITE+$8b8
NM d) SP=418436f0 RP=a.00eb7230 ?FWRITE+$8
export stub: a.00e7a500 tprint+$7cc
NM e) SP=41843370 RP=a.00a64f7c hxprint+$2c0
NM f) SP=41842ef0 RP=a.00a775d8 exec_cmd+$a3c
NM 10) SP=41842e30 RP=a.00a76b68 ?exec_cmd+$8
export stub: a.00a79558 try_exec_cmd+$c8
NM 11) SP=41842db0 RP=a.00a768e0 command_interpret+$318
NM 12) SP=41842930 RP=a.00a76594 ?command_interpret+$8
export stub: a.00a7a088 xeqcommand+$194
NM 13) SP=41842330 RP=a.00a79ee0 ?xeqcommand+$8
export stub: 100.000067dc main_ci+$64
NM 14) SP=418422b0 RP=100.0000751c PROGRAM+$290
NM 15) SP=41842230 RP=100.00000000
(end of NM stack)
$3 ($5b) nmdebug > pin #70;tr,d,i
PC=a.0017e70c enable_int+$2c
NM* 0) SP=418431f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM 1) SP=418431f0 RP=a.002c603c notify_dispatcher+$264
NM 2) SP=41843170 RP=a.003a30c4 ipc_impede+$274
NM 3) SP=41843070 RP=a.003a2e3c ?ipc_impede+$8
export stub: a.0181da28 sk_block+$1c4
NM 4) SP=41842f30 RP=a.0181e2e8 sk_block_for_completion+$fc
NM 5) SP=41842df0 RP=a.0184a78c sk_send+$308
NM 6) SP=41842cb0 RP=a.017fec94 IPCSEND+$820
NM 7) SP=41842bf0 RP=a.017fe440 ?IPCSEND+$8
export stub: 2c1.0001dcac
NM 8) SP=41842970 RP=2c1.0002a294
NM 9) SP=41842630 RP=2c1.0002e13c
NM a) SP=41842330 RP=2c1.0002e6e8
NM b) SP=418422b0 RP=2c1.0002fb2c
NM c) SP=41842230 RP=2c1.0002f9a4
export stub: 2c1.00014c80
NM d) SP=418421b0 RP=2c1.00000000
(end of NM stack)
$4 ($46) nmdebug > exit
:
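If you capture traces like the ones above to a text file, a small script can boil them down before you post them. The following Python sketch is my own illustration, not an HP tool: it assumes the trace text looks exactly like the listing above, pulls the routine names out of each "NM n)" frame per PIN, and guesses what each process is blocked in by skipping the notify_dispatcher frames at the top of the stack. That skip rule is only a heuristic, and the function names are hypothetical.

```python
import re

# "... nmdebug > pin #57;tr,d,i" starts a new trace for that PIN.
PIN_RE = re.compile(r"nmdebug > pin #(\d+)")
# "NM* 0) SP=... RP=a.002c3b98 routine+$324" -> capture "routine+$324".
# Frames with no symbol name after RP= do not match and are skipped.
FRAME_RE = re.compile(r"^NM\*?\s+[0-9a-f]+\)\s+SP=[0-9a-f]+\s+RP=\S+\s+(\S+)")

SAMPLE = """\
$1 ($60) nmdebug > pin #57;tr,d,i
PC=a.0017e70c enable_int+$2c
NM* 0) SP=418428f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM 1) SP=418428f0 RP=a.002c603c notify_dispatcher+$264
NM 2) SP=41842870 RP=a.001aee90 wait_for_active_port+$ec
NM 3) SP=41842770 RP=a.001afb20 receive_from_port+$544
$3 ($5b) nmdebug > pin #70;tr,d,i
NM* 0) SP=418431f0 RP=a.002c3b98 notify_dispatcher.block_current_process+$324
NM 1) SP=418431f0 RP=a.002c603c notify_dispatcher+$264
NM 2) SP=41843170 RP=a.003a30c4 ipc_impede+$274
NM 3) SP=41843070 RP=a.003a2e3c ?ipc_impede+$8
"""


def parse_traces(text):
    """Return {pin: [routine names, top of stack first]}."""
    traces = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        pin = PIN_RE.search(line)
        if pin:
            current = int(pin.group(1))
            traces[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            # Drop the "+$offset" suffix and the "?" export-stub prefix.
            name = frame.group(1).split("+$")[0].lstrip("?")
            traces[current].append(name)
    return traces


def blocked_in(routines):
    """Heuristic: first routine below the dispatcher frames, or None."""
    for name in routines:
        if not name.startswith("notify_dispatcher"):
            return name
    return None


if __name__ == "__main__":
    for pin, routines in parse_traces(SAMPLE).items():
        print(f"pin {pin}: blocked in {blocked_in(routines)}")
```

Run against the excerpt, this reports PIN 57 waiting in wait_for_active_port and PIN 70 waiting in ipc_impede, which is the same thing you see by reading the stacks by eye. It does not replace sending the full trace to HP.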
#3 If from the above data we are not able to identify this session hang
as a known problem, then it is time to take the memory dump, submit the
SR, etc... This may sound like a lot of work, but this is how we at HP
are able to collect sufficient data to resolve these types of problems
and put out the fix in the next patch to benefit you and probably benefit
our other friends out here in 3000-L land...
I hope this helps,
and yes the tape is in the drive and I am backing up tonight!
Regards,
James Hofmeister
Hewlett Packard
Worldwide Technology Network Expert Center
P.S. My Ideals are my own, not necessarily my employer's.