LISTSERV - HP3000-L Archives

May 1999, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:	Re: How do I kill a hung session ?
From:	Frank Letts <[log in to unmask]>
Reply To:	Frank Letts <[log in to unmask]>
Date:	Thu, 6 May 1999 07:59:13 -0500
Content-Type:	text/plain
Well, you can write a script that does a ps -fu 'user', greps out the
process names that you want to waste, cuts the PID field, and then does a
kill -9 on each.  All awk and grep, pretty simple.
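
A minimal sketch of the kind of script described above, assuming a POSIX
shell and a ps that accepts -f and -u.  The function names matching_pids
and kill_matching, and the 5-second grace period, are hypothetical and only
for illustration.  Unlike a straight kill -9, this version sends the default
TERM signal first and escalates to -9 only for processes that are still
alive afterwards.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -f` style output on stdin and print the PID
# (field 2) of every line that matches the pattern given as $1.
# The pattern is passed through the environment rather than on the awk
# command line, so the awk process does not match itself in the ps listing;
# the calling shell's own PID is skipped as well.
matching_pids() {
    PAT="$1" awk -v self="$$" \
        'NR > 1 && $2 != self && $0 ~ ENVIRON["PAT"] { print $2 }'
}

# Hypothetical wrapper: signal every process of user $1 whose ps line
# matches pattern $2.  TERM first, then KILL for anything still running.
kill_matching() {
    user=$1
    pattern=$2
    pids=$(ps -fu "$user" | matching_pids "$pattern")
    [ -n "$pids" ] || return 0

    # $pids is deliberately unquoted so that each PID becomes its own argument.
    kill $pids 2>/dev/null      # default signal is TERM
    sleep 5                     # assumed grace period
    for pid in $pids; do
        # kill -0 only tests whether the process still exists.
        kill -0 "$pid" 2>/dev/null && kill -9 "$pid"
    done
    return 0
}
```

It would be called as, for example, kill_matching frank hungprog.  Like the
grep approach, the pattern is matched against the whole ps line, so a
pattern that also matches a user name or terminal name will select more
processes than intended.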

Frank Letts
http://freeweb.pdq.net/fbt1

----------
> From: Richard Gambrell <[log in to unmask]>
> To: [log in to unmask]
> Subject: Re: How do I kill a hung session ?
> Date: Wednesday, May 05, 1999 9:59 PM
>
> While I agree with the general tone and the need for careful, pragmatic
> conservatism about any Killproc procedure, I would like to say a word
> about system administration.  There are good sys admins and bad sys
> admins, but usually they at least know they are a sys admin and are
> responsible for the system and its data.  One of the major problems with
> Windows and PCs is that everyone becomes a sys admin, but doesn't know it
> and/or isn't very good at it.
>
>         A good Unix sys admin will take the time to try to resolve a
> problem process/session without using kill -9.  The gratuitous use of
> kill -9 is simply unethical or worse.  Furthermore, a careful use of
> kill -9 on a process tree can resolve problems better than a reboot,
> which will kill the processes anyway, but arbitrarily, that is, not
> necessarily in the optimal manner.
>
>         Similarly, a good MPE sys admin will try to resolve a
> process/session problem before doing an abortio/abortjob/killsess.
>
>         Finally, one of the wonders of MPE is that the abortjob does a
> super job of cleaning up the process tree for you, whereas in Unix, the
> sys admin needs to know just how to apply kill (without using -9) to the
> process tree, otherwise one can really mess things up.  I've seen many
> cases where the wrong order of killing processes or a failure to follow
> other procedures leads to a situation where a kill -9 is needed, when it
> wouldn't have been if superior technique had been used.  Repeating: MPE's
> abortjob is a wonder...
>
>         Just my 0.01 worth.
>
> Richard G.
>
> Newman, Kevin wrote:
> >
> > Gavin,
> >
> > In general, I would agree with this; however, in this particular case,
> > I know what was happening with this session.  I know what it was
> > accessing, what it was trying to do and the conditions of it becoming
> > hung.  I feel that I have enough knowledge about the 3000 that I would
> > feel safe doing a kill on this process.  I know that others on this
> > list would also be able to determine if they would want to risk
> > performing a kill like this.  I don't think that the general user
> > should have access to do a kill like this, and I definitely would not
> > let a 'unix only' person get anywhere near this command on any 3000
> > that I had anything to do with.  I still think that it is needed, and
> > maybe the kill should describe what you are about to do, what resources
> > are being held, and possible side effects; then ask if you really want
> > to risk destabilizing your system.  If you say yes, kill it.  Set some
> > flag to show that a process was forced dead, and if a DUMP is done and
> > sent to the RC, they could easily see that the system is dirty due to
> > someone killing off processes.  At that point, they could turn it back
> > to the customer, and not waste any more of their time on it.
> >
> > Just my opinions,
> >
> > Kevin
> >
> > btw: the following is part of the SR that James identified as being
> > linked to my problem.
> >
> > I looked this one over and it looks to me like it matches up with
> > SR 5003282111.  I do not see an indication that this problem is
> > fixed, so now is time to talk to our friends at the Response
> > Center and find out what needs to happen to solve this one.
> >
> > -------------------------------------------------------------------
> > Problem Text
> >
> > A session hang can occur if the Break key is hit in a very small
> > timing window, when processes are running under the CI.
> >
> > Cause Text
> >
> > A deadlock can occur in this timing window involving the CI process
> > and one of its child processes.  The deadlock is between the CI
> > Var Table lock and the Terminal PACB lock.
> > -------------------------------------------------------------------
> >
> > Gee, that looks like a bug to me!
> >
> > > -----Original Message-----
> > > From: Gavin Scott [SMTP:[log in to unmask]]
> > > Sent: Thursday, April 29, 1999 11:56 AM
> > > To:   [log in to unmask]
> > > Subject:      Re: How do I kill a hung session ?
> > >
> > > Scott writes:
> > > > Bottom line, :ABORTPROCESS will not abort any process that is
> > > > critical.  Period.
> > >
> > > Ergo, :ABORTPROCESS will not abort any process which :ABORTJOB could
> > > not abort.  It's just a more fine-grained :ABORTJOB, not a magical
> > > "process-be-gone" command.
> > >
> > > In my opinion this is a step backwards in that it destabilizes the
> > > 3000 platform by allowing operations staff to take potshots at
> > > arbitrary processes in a carefully designed and constructed process
> > > tree.  The developers of the system probably never tested what
> > > happens if arbitrary processes get killed at arbitrary points in
> > > their execution, thus increasing the chances of data corruption and
> > > other problems resulting from use of this new command.
> > >
> > > Further, the only mechanism available to a developer to protect
> > > herself from :ABORTPROC is to set the process "critical" for its
> > > entire execution.  This means of course that the process will not be
> > > abortable *at all* now, and any error in the program will result in
> > > the entire system crashing with a SA1458 (Process Aborting While
> > > Critical).  Hardly an improvement.
> > >
> > > Users are asking for a product that "Kills bugs dead permanently
> > > right now", but, as has been pointed out, this is not practical.
> > >
> > > The problem is not the inability to 'kill' certain processes.  In
> > > fact there are two problems:
> > >
> > > 1) Processes get stuck in states where they cannot be aborted.
> > >
> > >    This is not a deficiency in the :ABORT[JOB|PROC|whatever]
> > >    commands.  It is a result of a complex system which is either
> > >    buggy or not designed to avoid these situations.  If you don't
> > >    want non-abortable stuck processes, ask HP and the other software
> > >    developers to ensure that this doesn't happen.  Of course if you
> > >    want more money spent on this, you'll have to expect something
> > >    else to suffer.
> > >
> > > 2) Users can't tell *why* something is stuck and why they can't abort
> > >    it.
> > >
> > >    There seems to be a standard human response of "if you can't
> > >    understand it, try to make it go away".  Several people today have
> > >    asked for more information as to why processes are stuck.  I
> > >    suggest (actually I suggested to HP several days ago) that if
> > >    there were a TELESUP type utility that people could run which
> > >    would explain to them why a process was not currently abortable,
> > >    this would practically eliminate the need for a "super" :ABORTPROC
> > >    command.  Either users would accept the stuck nature of the
> > >    process once they understand exactly why it is stuck, or they
> > >    would complain to HP (or whomever) about the stuck process and ask
> > >    for the associated "bug" to be fixed.  The utility would give
> > >    enough information for the user to feel confident that they
> > >    understand exactly what the process is doing and why it is stuck
> > >    (this means a textual explanation, not just a bunch of stack
> > >    traces) and also the technical information that a developer would
> > >    need to investigate and fix why the condition occurred in the
> > >    first place.
> > >
> > > G.
>
> --
> Richard Gambrell
> Database Administrator and Consultant to Computing Services
> University of Tennessee at Chattanooga, Dept. 4454
> 113 Hunter Hall, 615 McCallie Ave. Chattanooga, TN 37403-2598
> UTC phone: 423-755-4551 fax: 423-755-4025
> UTC e-mail: [log in to unmask]
> Business or private email: [log in to unmask]