HP3000-L Archives

May 1999, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Richard Gambrell <[log in to unmask]>
Reply To:
Richard Gambrell <[log in to unmask]>
Date:
Wed, 5 May 1999 22:59:47 -0400
While I agree with the general tone and the need for careful, pragmatic
conservatism about any Killproc procedure, I would like to say a word about
system administration.  There are good sys admins and bad sys admins, but
usually they at least know they are a sys admin and are responsible for the
system and its data.  One of the major problems with Windows and PCs is that
everyone becomes a sys admin, but doesn't know it and/or isn't very good at
it.

        A good Unix sys admin will take the time to try to resolve a problem
process/session without using kill -9.  The gratuitous use of kill -9 is
simply unethical or worse. Furthermore, a careful use of kill -9 on a process
tree can resolve problems better than a reboot, which will kill the processes
anyway, but arbitrarily, that is, not necessarily in the optimal manner.

        Similarly, a good MPE sys admin will try to resolve a process/session
problem before doing an abortio/abortjob/killsess.

        Finally, one of the wonders of MPE is that the abortjob does a super job of
cleaning up the process tree for you, whereas in Unix, the sys admin needs to
know just how to apply kill (without using -9) to the process tree, otherwise
one can really mess things up.  I've seen many cases where killing processes
in the wrong order, or failing to follow other procedures, leads to a
situation where a kill -9 is needed when it wouldn't have been if better
technique had been used. Repeating: MPE's abortjob is a wonder...
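
        For the Unix side of that comparison, here is a minimal Python
sketch of taking a process tree down politely. It is an illustration under
stated assumptions, not a recommended procedure: the leaf-first order, the
5-second grace period and the helper names (read_parent_map, leaf_first,
terminate_tree) are all made up for this example, and the right order for a
real application depends on how its processes cooperate. It sends SIGTERM
first and falls back to SIGKILL (kill -9) only for a process that is still
there after the grace period.

```python
import os
import signal
import subprocess
import time


def read_parent_map():
    """Return {pid: ppid} for all processes, parsed from `ps` output."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid="],
        capture_output=True, text=True, check=True,
    ).stdout
    parents = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 2:
            parents[int(fields[0])] = int(fields[1])
    return parents


def leaf_first(root, parents):
    """List root and its descendants with every child before its parent."""
    children = {}
    for pid, ppid in parents.items():
        if pid != ppid:  # ignore a self-parented entry such as pid 0
            children.setdefault(ppid, []).append(pid)
    order = []

    def visit(pid):
        for child in sorted(children.get(pid, [])):
            visit(child)
        order.append(pid)

    visit(root)
    return order


def is_alive(pid):
    """True if the pid still exists (an unreaped zombie also counts)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permission
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def terminate_tree(root, grace=5.0):
    """SIGTERM each process leaf-first; SIGKILL only what outlives `grace`.

    Returns the pids that had to be killed with SIGKILL.
    """
    forced = []
    for pid in leaf_first(root, read_parent_map()):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            continue  # already gone
        deadline = time.monotonic() + grace
        while is_alive(pid) and time.monotonic() < deadline:
            time.sleep(0.1)
        if is_alive(pid):
            try:
                os.kill(pid, signal.SIGKILL)  # last resort
                forced.append(pid)
            except ProcessLookupError:
                pass  # exited between the check and the kill
    return forced
```

Calling terminate_tree(pid) on the top process of a hung tree returns the
list of pids that needed the last resort. An empty list means the whole tree
went away on SIGTERM alone.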

        Just my 0.01 worth.

Richard G.

Newman, Kevin wrote:
>
> Gavin,
>
> In general, I would agree with this; however, in this particular case, I
> know what was happening with this session.  I know what it was
> accessing, what it was trying to do and the conditions of it becoming
> hung.  I feel that I have enough knowledge about the 3000 that I would
> feel safe doing a kill on this process.  I know that others on this list
> would also be able to determine if they would want to risk performing a
> kill like this.  I don't think that the general user should have access
> to do a kill like this, and I definitely would not let a 'unix only'
> person get anywhere near this command on any 3000 that I had anything to
> do with.  I still think that it is needed, and maybe the kill should
> describe what you are about to do, what resources are being held, and
> possible side effects; then ask if you really want to risk destabilizing
> your system.  If you say yes, kill it.  Set some flag to show that a
> process was forced dead, and if a DUMP is done and sent to the RC, they
> could easily see that the system is dirty due to someone killing off
> processes.  At that point, they could turn it back to the customer, and
> not waste any more of their time on it.
>
> Just my opinions,
>
> Kevin
>
> btw: the following is part of the SR that James identified as being
> linked to my problem.
>
> I looked this one over and it looks to me like it matches up with
> SR 5003282111.  I do not see an indication that this problem is
> fixed, so now is time to talk to our friends at the Response
> Center and find out what needs to happen to solve this one.
>
> -------------------------------------------------------------------
> Problem Text
>
> A session hang can occur if the Break key is hit in a very small
> timing window, when processes are running under the CI.
>
> Cause Text
>
> A deadlock can occur in this timing window involving the CI process
> and one of its child processes.  The deadlock is between the CI
> Var Table lock and the Terminal PACB lock.
> -------------------------------------------------------------------
>
> Gee, that looks like a bug to me!
>
> > -----Original Message-----
> > From: Gavin Scott [SMTP:[log in to unmask]]
> > Sent: Thursday, April 29, 1999 11:56 AM
> > To:   [log in to unmask]
> > Subject:      Re: How do I kill a hung session ?
> >
> > Scott writes:
> > > Bottom line, :ABORTPROCESS will not abort any process that is critical.
> > > Period.
> >
> > Ergo, :ABORTPROCESS will not abort any process which :ABORTJOB could not
> > abort.  It's just a more fine-grained :ABORTJOB, not a magical
> > "process-be-gone" command.
> >
> > In my opinion this is a step backwards in that it destabilizes the 3000
> > platform by allowing operations staff to take potshots at arbitrary
> > processes in a carefully designed and constructed process tree.  The
> > developers of the system probably never tested what happens if arbitrary
> > processes get killed at arbitrary points in their execution, thus
> > increasing the chances of data corruption and other problems resulting
> > from use of this new command.
> >
> > Further, the only mechanism available to a developer to protect herself
> > from :ABORTPROC is to set the process "critical" for its entire execution.
> > This means of course that the process will not be abortable *at all* now,
> > and any error in the program will result in the entire system crashing
> > with a SA1458 (Process Aborting While Critical).  Hardly an improvement.
> >
> > Users are asking for a product that "Kills bugs dead permanently right
> > now", but, as has been pointed out, this is not practical.
> >
> > The problem is not the inability to 'kill' certain processes.  In fact
> > there are two problems:
> >
> > 1) Processes get stuck in states where they cannot be aborted.
> >
> >    This is not a deficiency in the :ABORT[JOB|PROC|whatever] commands.
> >    It is a result of a complex system which is either buggy or not
> >    designed to avoid these situations.  If you don't want non-abortable
> >    stuck processes, ask HP and the other software developers to ensure
> >    that this doesn't happen.  Of course if you want more money spent on
> >    this, you'll have to expect something else to suffer.
> >
> > 2) Users can't tell *why* something is stuck and why they can't abort it.
> >
> >    There seems to be a standard human response of "if you can't understand
> >    it, try to make it go away".  Several people today have asked for more
> >    information as to why processes are stuck.  I suggest (actually I
> >    suggested to HP several days ago) that if there was a TELESUP type
> >    utility that people could run which would explain to them why a
> >    process was not currently abortable, that this would practically
> >    eliminate the need for a "super" :ABORTPROC command.  Either users
> >    would accept the stuck nature of the process once they understand
> >    exactly why it is stuck, or they would complain to HP (or whomever)
> >    about the stuck process and ask for the associated "bug" to be fixed.
> >    The utility would give enough information for the user to feel
> >    confident that they understand exactly what the process is doing and
> >    why it is stuck (this means a textual explanation, not just a bunch
> >    of stack traces) and also the technical information that a developer
> >    would need to investigate and fix why the condition occurred in the
> >    first place.
> >
> > G.

--
Richard Gambrell
Database Administrator and Consultant to Computing Services
University of Tennessee at Chattanooga, Dept. 4454
113 Hunter Hall, 615 McCallie Ave. Chattanooga, TN 37403-2598
UTC phone: 423-755-4551 fax: 423-755-4025
UTC e-mail: [log in to unmask]
Business or private email: [log in to unmask]
