HP3000-L Archives

April 1999, Week 5

HP3000-L@RAVEN.UTC.EDU

Subject:
From: "Newman, Kevin:" <[log in to unmask]>
Reply To: Newman, Kevin:
Date: Thu, 29 Apr 1999 15:27:14 -0400
Content-Type: text/plain

Gavin,

In general, I would agree with this; however, in this particular case, I
know what was happening with this session.  I know what it was
accessing, what it was trying to do and the conditions of it becoming
hung.  I feel that I have enough knowledge about the 3000 that I would
feel safe doing a kill on this process.  I know that others on this list
would also be able to determine if they would want to risk performing a
kill like this.  I don't think that the general user should have access
to do a kill like this, and I definitely would not let a 'unix only'
person get anywhere near this command on any 3000 that I had anything to
do with.  I still think that it is needed, and maybe the kill should
describe what you are about to do, what resources are being held, and
possible side effects; then ask if you really want to risk destabilizing
your system.  If you say yes, kill it.  Set some flag to show that a
process was forced dead, and if a DUMP is done and sent to the RC, they
could easily see that the system is dirty due to someone killing off
processes.  At that point, they could turn it back to the customer, and
not waste any more of their time on it.
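
To make the proposal above concrete, here is a minimal sketch of a kill
that describes what it is about to do, asks for confirmation, and leaves
a flag behind.  It is only an illustration in Python, not anything that
exists on MPE: the names Process, System, describe_kill and guarded_kill
are all hypothetical, and the "kill" just flips a field on a toy object.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Toy stand-in for a process; not a real MPE structure."""
    pin: int
    held_resources: list
    alive: bool = True


@dataclass
class System:
    """Toy system state that remembers every forced kill."""
    forced_kills: list = field(default_factory=list)

    @property
    def is_dirty(self):
        # The flag someone reading a dump could check first.
        return bool(self.forced_kills)


def describe_kill(proc):
    """Build the warning text shown before the kill is attempted."""
    held = ", ".join(proc.held_resources) or "none known"
    return (
        f"About to force-kill PIN {proc.pin}.\n"
        f"Resources currently held: {held}\n"
        "This may destabilize the system. Continue?"
    )


def guarded_kill(system, proc, confirm):
    """Kill proc only if confirm(warning_text) returns True.

    Returns True if the process was killed, False if the caller declined.
    """
    if not confirm(describe_kill(proc)):
        return False
    proc.alive = False
    system.forced_kills.append(proc.pin)  # mark the system as dirty
    return True
```

An interactive caller could pass something like
lambda text: input(text + " (yes/no) ") == "yes" as the confirm
argument; keeping the prompt separate from the kill keeps the logic
easy to check.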

Just my opinions,

Kevin

btw: the following is part of the SR that James identified as being
linked to my problem.

I looked this one over and it looks to me like it matches up with
SR 5003282111.  I do not see an indication that this problem is
fixed, so now is time to talk to our friends at the Response
Center and find out what needs to happen to solve this one.

-------------------------------------------------------------------
Problem Text

A session hang can occur if the Break key is hit in a very small
timing window, when processes are running under the CI.

Cause Text

A deadlock can occur in this timing window involving the CI process
and one of its child processes.  The deadlock is between the CI
Var Table lock and the Terminal PACB lock.
-------------------------------------------------------------------

Gee, that looks like a bug to me!
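
For anyone who has not run into this kind of deadlock before, the
general pattern in the Cause Text can be sketched with two ordinary
locks.  This is a generic Python illustration, not MPE code: the two
locks are only named after the ones in the SR, and which process takes
which lock first is an assumption made for the example.  Timeouts are
used so the demo reports the stall instead of hanging.

```python
import threading


def opposite_order_attempt(timeout=0.2):
    """Two workers take the same two locks in opposite order.

    Returns a dict saying whether each worker got its second lock.
    """
    var_table_lock = threading.Lock()      # stand-in for the CI Var Table lock
    terminal_pacb_lock = threading.Lock()  # stand-in for the Terminal PACB lock
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    acquired_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()  # the "timing window": each holds one lock
            got = second.acquire(timeout=timeout)
            acquired_second[name] = got
            both_tried.wait()       # keep the first lock until both have tried
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker,
                         args=("ci", var_table_lock, terminal_pacb_lock)),
        threading.Thread(target=worker,
                         args=("child", terminal_pacb_lock, var_table_lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acquired_second


def same_order_attempt():
    """Both workers take the locks in the same order, so neither stalls."""
    var_table_lock = threading.Lock()
    terminal_pacb_lock = threading.Lock()
    finished = []

    def worker(name):
        with var_table_lock:
            with terminal_pacb_lock:
                finished.append(name)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in ("ci", "child")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished
```

In the first function neither worker ever gets its second lock, which
without the timeout would be a permanent hang; in the second, a fixed
lock order lets both finish.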

> -----Original Message-----
> From: Gavin Scott [SMTP:[log in to unmask]]
> Sent: Thursday, April 29, 1999 11:56 AM
> To:   [log in to unmask]
> Subject:      Re: How do I kill a hung session ?
>
> Scott writes:
> > Bottom line, :ABORTPROCESS will not abort any process that is
> > critical.  Period.
>
> Ergo, :ABORTPROCESS will not abort any process which :ABORTJOB could
> not abort.  It's just a more fine-grained :ABORTJOB, not a magical
> "process-be-gone" command.
>
> In my opinion this is a step backwards in that it destabilizes the
> 3000 platform by allowing operations staff to take potshots at
> arbitrary processes in a carefully designed and constructed process
> tree.  The developers of the system probably never tested what
> happens if arbitrary processes get killed at arbitrary points in
> their execution, thus increasing the chances of data corruption and
> other problems resulting from use of this new command.
>
> Further, the only mechanism available to a developer to protect
> herself from :ABORTPROC is to set the process "critical" for its
> entire execution.  This means of course that the process will not be
> abortable *at all* now, and any error in the program will result in
> the entire system crashing with a SA1458 (Process Aborting While
> Critical).  Hardly an improvement.
>
> Users are asking for a product that "Kills bugs dead permanently right
> now", but, as has been pointed out, this is not practical.
>
> The problem is not the inability to 'kill' certain processes.  In fact
> there are two problems:
>
> 1) Processes get stuck in states where they cannot be aborted.
>
>    This is not a deficiency in the :ABORT[JOB|PROC|whatever] commands.
>    It is a result of a complex system which is either buggy or not
>    designed to avoid these situations.  If you don't want
>    non-abortable stuck processes, ask HP and the other software
>    developers to ensure that this doesn't happen.  Of course if you
>    want more money spent on this, you'll have to expect something
>    else to suffer.
>
> 2) Users can't tell *why* something is stuck and why they can't
>    abort it.
>
>    There seems to be a standard human response of "if you can't
>    understand it, try to make it go away".  Several people today have
>    asked for more information as to why processes are stuck.  I
>    suggest (actually I suggested to HP several days ago) that if
>    there was a TELESUP type utility that people could run which would
>    explain to them why a process was not currently abortable, that
>    this would practically eliminate the need for a "super" :ABORTPROC
>    command.  Either users would accept the stuck nature of the
>    process once they understand exactly why it is stuck, or they
>    would complain to HP (or whomever) about the stuck process and ask
>    for the associated "bug" to be fixed.  The utility would give
>    enough information for the user to feel confident that they
>    understand exactly what the process is doing and why it is stuck
>    (this means a textual explanation, not just a bunch of stack
>    traces) and also the technical information that a developer would
>    need to investigate and fix why the condition occurred in the
>    first place.
>
> G.
