HP3000-L Archives

March 1999, Week 2

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Thomas Madigan <[log in to unmask]>
Reply To: Thomas Madigan <[log in to unmask]>
Date: Sat, 13 Mar 1999 00:51:57 -0500
Content-Type: text/plain
Parts/Attachments: text/plain (100 lines)
X-no-Archive:yes

Ross:

I completely understand and sympathize with your situation.  MPE has always
been of the philosophy that a process keeps running until it terminates of
its own free will.  Only limited outside intervention is allowed -- even
from the so-called "system manager."

This post is very reminiscent of one that I made in the spring of 1998, and
I am repeating it because of a very similar circumstance.  I was signed on
today developing a command file in QEDIT.  When everything looked good, I
:XEQ'd that command file. I looked at another window (at GLANCE/iX) and
realized that I really wanted to run the command file in a lower queue
because it searched a large number of files for a particular string and
gobbled up lots of CPU time.  I immediately smacked the BREAK key and the
screen just stared back at me.  A quick look at the GLANCE session showed
that my command file process (actually the root CI process) was now
"impeded" and that it was apparently waiting on itself forever.  I tried
the standard :ABORTJOB command.  Still staring back at me.  Tried NSCONTROL
KILLSESS #S{whatever}.  Nada!  I ran (as fast as my portly bod would allow
me) to the computer room and at the system console did the =ABORTJOB thing.
Still more nada!!  I then called HPRC on a Friday afternoon (when a lot of
systems seem to go down -- strange!) and was told:  "You'll have to reboot
the system."

<cussing and fuming>

"Reboot the system."  Intentionally shut down a *production* system and
interrupt the work of several hundred users to terminate ONE stuck process.

<shout>

TOTALLY UNACCEPTABLE!!

</shout>

If this were a lowly Windows 95 or NT system, I would have hit
CTRL-ALT-DEL, clicked on the offender and selected "End Task."  Or, on
my NT box, I would have right-clicked the taskbar, brought up Task Manager,
right-clicked the offender's icon and selected "End Process."  If that
didn't work, I would proceed to "reboot the system."  I expect that kind of
behavior from a Windows box.  "Rebooting the system" is standard operating
procedure for a Windows box.  I don't expect to have to take that kind of
drastic measure for an OS whose users regularly throw around the buzz
phrases "24 X 7," "guaranteed maximum uptime," "high availability," etc.
For many years, the various permutations of Unix have had the ability to
kill off a stuck process:

        kill -9 {pid}

Once you type in the "kill" command, that process is GONE.  Everyone else on
the system keeps chugging right along and is happily unaware that the
system manager just killed a single process out of many hundreds of
processes.  Nobody lost any work (with the exception of the session that
was just terminated); nobody lost any time waiting on the system manager to
intentionally crash the system and restart it (which is what "rebooting the
system" effectively is).
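On Unix, `kill -9 <pid>` sends signal 9 (SIGKILL), which the target process can neither catch nor ignore, and only that one process goes away. A minimal sketch of the same mechanics, assuming a POSIX system (it relies on `os.fork`) and using Python in place of the shell: a forked child stands in for the stuck process, the parent kills it with SIGKILL and reaps it, and the parent keeps running. The function name `kill_stuck_process` is hypothetical.

```python
import os
import signal


def kill_stuck_process() -> int:
    """Fork a child that blocks forever, kill it with SIGKILL, report how it died."""
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the "stuck" process; it never exits on its own.
        try:
            while True:
                signal.pause()
        finally:
            os._exit(1)  # never let the child fall back into the parent's code

    # Parent: the equivalent of typing "kill -9 <pid>" at a shell prompt.
    os.kill(pid, signal.SIGKILL)

    # Reap the child; the parent (everyone else on the system) keeps running.
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else -1


if __name__ == "__main__":
    sig = kill_stuck_process()
    print(f"child terminated by {signal.Signals(sig).name}; parent still running")
```

Nothing outside the one killed process is disturbed; the parent simply collects the child's exit status and carries on.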

I've heard the argument advanced that allowing the ability to abort a
single process could corrupt files or even entire databases.  I'll
grudgingly admit that it is possible, although it is a remote possibility,
that killing a process could corrupt files if certain conditions were met
(critical update in progress; pages not yet posted back to disk).  What
could be more corrupting, however, than to kill off *everything* and throw
*everyone's* schedule out of whack while "rebooting the system"?  I've yet
to see a case, even on a "shaky" OS such as Unix, where terminating one
process corrupted anything.  Of course, if you are silly enough to terminate the
root process, you get what you deserve!
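The corruption risk conceded above comes from killing a process while an update is still in memory. Common Unix practice is to send the catchable SIGTERM first (a plain `kill <pid>` sends SIGTERM by default), so a well-behaved process can finish writing before it exits, and to fall back on the uncatchable SIGKILL (`kill -9`) only when that fails. A minimal POSIX-only Python sketch of the difference follows; the `terminate` function, the `pages.dat` file, and the "critical update" string are hypothetical stand-ins, not anything from MPE or HP.

```python
import os
import signal
import tempfile


def terminate(sig: int) -> str:
    """Send `sig` to a child holding an unwritten update; return what reached disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pages.dat")  # hypothetical data file
        ready_r, ready_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                os.close(ready_r)
                pending = "critical update"  # in memory only, not yet on disk

                def flush_and_exit(signum, frame):
                    # SIGTERM can be caught: post the pending data, then exit cleanly.
                    with open(path, "w") as f:
                        f.write(pending)
                    os._exit(0)

                signal.signal(signal.SIGTERM, flush_and_exit)
                os.write(ready_w, b"1")  # tell the parent the handler is installed
                while True:
                    signal.pause()
            finally:
                os._exit(1)  # never let the child fall back into the parent's code

        os.close(ready_w)
        os.read(ready_r, 1)  # wait until the child can handle signals
        os.close(ready_r)
        os.kill(pid, sig)
        os.waitpid(pid, 0)
        if not os.path.exists(path):
            return ""  # SIGKILL: no handler ran, so the update never reached disk
        with open(path) as f:
            return f.read()
```

With SIGTERM the handler runs and the pending string lands in the file; with SIGKILL the process dies before it can write anything, which is exactly the "pages not yet posted back to disk" case.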

It is long past time for HP to reweigh the slight possibility
of corrupting files against the reality of having to bring everyone's
production work to a screeching halt to terminate one offender.  Especially
if MPE is truly meant to be a "high availability" operating system.  IMHO,
"rebooting the system" is a LAST RESORT measure.  If I ever feel the need
to practice "rebooting the system," I'll whip together a few NT boxes,
throw on a couple of Micro$oft apps and wait for the damn thing to crash.
I shouldn't have to wait long.

See you on the other side, Ross.

</cussing and fuming>

At 04:33 PM 3/12/99 -0800, Ross Warner wrote:
>New Beginner again,
>
>I am very stuck here I think. I have somehow initiated these jobs
>below and can't abort them.
>
>
>#J460   WAIT:1   8  10S LP       FRI  3:26P  REPORTS,REPORT00.FIRES
>#J461   WAIT:2   8  10S LP       FRI  3:55P  REPORTS,REPORT00.FIRES
>
>I have issued  - ABORTJOB #J460 and ABORTJOB #J461 and it doesn't stop
>them.
>I have issued  - = ABORTJOB #J460 and =ABORTJOB #J461 (with the CTRL
>A) and no go.

        [rest snipped]
