While I agree with the general tone and the need for careful, pragmatic
conservatism about any Killproc procedure, I would like to say a word about
system administration. There are good sys admins and bad sys admins, but
usually they at least know they are sys admins and are responsible for the
system and its data. One of the major problems with Windows and PCs is that
everyone becomes a sys admin, but doesn't know it and/or isn't very good at
it.
A good Unix sys admin will take the time to try to resolve a problem
process/session without using kill -9. The gratuitous use of kill -9 is
simply unethical or worse. Furthermore, a careful use of kill -9 on a process
tree can resolve problems better than a reboot, which will kill the process
anyway, but arbitrarily, that is, not necessarily in the optimal manner.
Similarly, a good MPE sys admin will try to resolve a process/session
problem before doing an abortio/abortjob/killsess.
Finally, one of the wonders of MPE is that the abortjob does a super job of
cleaning up the process tree for you, whereas in Unix, the sys admin needs to
know just how to apply kill (without using -9) to the process tree, otherwise
one can really mess things up. I've seen many cases where the wrong order
of killing processes or a failure to follow other procedures leads to a
situation where a kill -9 is needed, when it wouldn't have been if superior
technique had been used. Repeating: MPE's abortjob is a wonder...
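To make the Unix side concrete, here is a minimal sketch of the "ask nicely
first" approach, not a definitive procedure. It assumes the problem program
was started in its own process group, so that one signal reaches the parent
and its children together. The name stop_gently and the 5 second grace
period are hypothetical, chosen only for illustration. The right order for
signalling a real process tree depends on the application, and this sketch
does not attempt to capture that.

```python
import os
import signal
import subprocess


def stop_gently(proc, grace=5.0):
    """Send SIGTERM to the process group of `proc`, wait up to `grace`
    seconds, and fall back to SIGKILL (kill -9) only if it has not exited.

    Returns the name of the last signal that was sent.
    Assumption: `proc` leads its own process group (see usage note).
    """
    pgid = os.getpgid(proc.pid)

    # Polite request: processes can catch SIGTERM and clean up after themselves.
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # Last resort: SIGKILL cannot be caught, so no cleanup code runs.
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
        return "SIGKILL"
```

For example, start the job with subprocess.Popen(cmd, start_new_session=True)
so that it leads its own process group, then call stop_gently(proc). The
return value tells you whether kill -9 was actually needed.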
Just my 0.01 worth.
Richard G.
Newman, Kevin wrote:
>
> Gavin,
>
> In general, I would agree with this; however, in this particular case, I
> know what was happening with this session. I know what it was
> accessing, what it was trying to do and the conditions of it becoming
> hung. I feel that I have enough knowledge about the 3000 that I would
> feel safe doing a kill on this process. I know that others on this list
> would also be able to determine if they would want to risk performing a
> kill like this. I don't think that the general user should have access
> to do a kill like this, and I definitely would not let a 'unix only'
> person get anywhere near this command on any 3000 that I had anything to
> do with. I still think that it is needed, and maybe the kill should
> describe what you are about to do, what resources are being held, and
> possible side effects; then ask if you really want to risk destabilizing
> your system. If you say yes, kill it. Set some flag to show that a
> process was forced dead, and if a DUMP is done and sent to the RC, they
> could easily see that the system is dirty due to someone killing off
> processes. At that point, they could turn it back to the customer, and
> not waste any more of their time on it.
>
> Just my opinions,
>
> Kevin
>
> btw: the following is part of the SR that James identified as being
> linked to my problem.
>
> I looked this one over and it looks to me like it matches up with
> SR 5003282111. I do not see an indication that this problem is
> fixed, so now is time to talk to our friends at the Response
> Center and find out what needs to happen to solve this one.
>
> -------------------------------------------------------------------
> Problem Text
>
> A session hang can occur if the Break key is hit in a very small
> timing window, when processes are running under the CI.
>
> Cause Text
>
> A deadlock can occur in this timing window involving the CI process
> and one of its child processes. The deadlock is between the CI
> Var Table lock and the Terminal PACB lock.
> -------------------------------------------------------------------
>
> Gee, that looks like a bug to me!
>
> > -----Original Message-----
> > From: Gavin Scott [SMTP:[log in to unmask]]
> > Sent: Thursday, April 29, 1999 11:56 AM
> > To: [log in to unmask]
> > Subject: Re: How do I kill a hung session ?
> >
> > Scott writes:
> > > Bottom line, :ABORTPROCESS will not abort any process that is critical.
> > > Period.
> >
> > Ergo, :ABORTPROCESS will not abort any process which :ABORTJOB could not
> > abort. It's just a more fine-grained :ABORTJOB, not a magical
> > "process-be-gone" command.
> >
> > In my opinion this is a step backwards in that it destabilizes the 3000
> > platform by allowing operations staff to take potshots at arbitrary
> > processes in a carefully designed and constructed process tree. The
> > developers of the system probably never tested what happens if arbitrary
> > processes get killed at arbitrary points in their execution, thus
> > increasing the chances of data corruption and other problems resulting
> > from use of this new command.
> >
> > Further, the only mechanism available to a developer to protect herself
> > from :ABORTPROC is to set the process "critical" for its entire execution.
> > This means of course that the process will not be abortable *at all* now,
> > and any error in the program will result in the entire system crashing
> > with a SA1458 (Process Aborting While Critical). Hardly an improvement.
> >
> > Users are asking for a product that "Kills bugs dead permanently right
> > now", but, as has been pointed out, this is not practical.
> >
> > The problem is not the inability to 'kill' certain processes. In fact
> > there are two problems:
> >
> > 1) Processes get stuck in states where they cannot be aborted.
> >
> > This is not a deficiency in the :ABORT[JOB|PROC|whatever] commands.
> > It is a result of a complex system which is either buggy or not
> > designed to avoid these situations. If you don't want non-abortable
> > stuck processes, ask HP and the other software developers to ensure
> > that this doesn't happen. Of course if you want more money spent on
> > this, you'll have to expect something else to suffer.
> >
> > 2) Users can't tell *why* something is stuck and why they can't abort it.
> >
> > There seems to be a standard human response of "if you can't understand
> > it, try to make it go away". Several people today have asked for more
> > information as to why processes are stuck. I suggest (actually I
> > suggested to HP several days ago) that if there was a TELESUP type
> > utility that people could run which would explain to them why a
> > process was not currently abortable, that this would practically
> > eliminate the need for a "super" :ABORTPROC command. Either users
> > would accept the stuck nature of the process once they understand
> > exactly why it is stuck, or they would complain to HP (or whomever)
> > about the stuck process and ask for the associated "bug" to be fixed.
> > The utility would give enough information for the user to feel
> > confident that they understand exactly what the process is doing and
> > why it is stuck (this means a textual explanation, not just a bunch
> > of stack traces) and also the technical information that a developer
> > would need to investigate and fix why the condition occurred in the
> > first place.
> >
> > G.
--
Richard Gambrell
Database Administrator and Consultant to Computing Services
University of Tennessee at Chattanooga, Dept. 4454
113 Hunter Hall, 615 McCallie Ave. Chattanooga, TN 37403-2598
UTC phone: 423-755-4551 fax: 423-755-4025
UTC e-mail: [log in to unmask]
Business or private email: [log in to unmask]