LISTSERV - HP3000-L Archives

Doug writes:
> [...]I have always felt there is a need for a kill process
> command, with the caveat that a process that is set critical must
> be left alone.

And therein lies the technical problem in providing the :NUKEPROC command.
I believe that *all* blocked processes are marked "critical".  The only
reason you're able to abort *any* blocked process today is that the abort
commands know how to cancel certain kinds of pending I/O, thus allowing
the process to wake up, cease to be critical, and then notice the abort
request and act on it.

Once a process enters "system code", a system manager *cannot* know what
the effect of just "aborting" that process will be.  And because there
is no way to block a process in user code, *all* hung (blocked as opposed
to looping) processes are in system code while they are blocked.  I
realize many people feel it's their fundamental right to shoot themselves
in the foot with the weapon of their choice whenever they feel like it,
but in this case it's just not practical.

If you want Unix, you know where to find it.

A related issue is that processes cannot be "killed"; they can only be
"requested" to commit suicide at their earliest convenience.  So it's
easy to ask a process to go away, but the hard part is being able to
wake up a blocked process so that it can act on that pending request.
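
To make the "request, don't kill" point concrete, here is a toy model in
Python. It is only a sketch and has nothing to do with how MPE actually
implements aborts: the Process class, its field names, and demo() are all
made up for illustration. It shows that setting the abort request does
nothing by itself; the process only acts on it after something wakes it
from whatever it is blocked on.

```python
import threading

class Process:
    """Toy model of a process that can only be *asked* to terminate."""

    def __init__(self):
        self.abort_requested = threading.Event()  # the pending abort request
        self.resource = threading.Event()         # whatever the process blocks on
        self.state = "created"

    def run(self):
        self.state = "blocked"
        self.resource.wait()      # blocked here; the abort flag is not examined
        if self.abort_requested.is_set():
            self.state = "aborted"    # the process itself acts on the request
            return
        self.state = "finished"

def demo():
    p = Process()
    t = threading.Thread(target=p.run)
    t.start()
    p.abort_requested.set()       # asking is the easy part
    t.join(timeout=0.2)
    still_blocked = t.is_alive()  # the request is pending, the process still blocked
    p.resource.set()              # stand-in for "cancel the pending I/O"
    t.join()
    return still_blocked, p.state
```

Calling demo() reports that the process was still blocked after the abort
was requested, and that it ended up aborted only once the resource wait
was released.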

Today's abort commands can abort processes that are blocked for certain
reasons, most notably I/O of one sort or another.  Unfortunately there
are lots of other resources that a process can block on (file and db
locks, semaphores controlling many different kinds of operating system
structures, etc.) but teaching the abort code how to extricate a process
from each of these cases would be quite expensive compared to just finding
the bugs that cause the hangs in the first place.

On the other hand, "looping" processes should be easier to kill, and
the new HP :ABORTPROC[ESS?] command should take care of these unless
they are continuously critical for some reason (though I've seen at
least one major 3rd party tool that seems to stay critical all the time).

> The question I have is this. What difference in potential corruption is
> there with an abortproc command versus rebooting the system?

If you shut down (or even just halt) the system as a whole, then things
like the Transaction Manager (XM) ensure that the system remains in a
consistent state.  If you just arbitrarily release resources owned by
processes, then the structures protected by the locks you've just freed
may be in an inconsistent state.  If you continue to let the system run
after that, all of your integrity and security may be out the window,
and there's no way of knowing what will happen.  This is the same reason
why virtually all unexpected failures and errors within the operating
system itself result in instant System Aborts: doing so limits the
damage that might be caused by the unknown state that the system has
gotten itself into somehow.
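
A small illustration of the lock problem, again as a hypothetical Python
sketch rather than anything resembling real MPE internals (the Accounts
class and its "sum must stay 100" invariant are invented for the example).
A lock holder is stopped halfway through an update, and then the lock is
freed from outside without the update being finished or undone.

```python
class Accounts:
    """Two balances whose sum must stay 100; the lock protects that invariant."""

    def __init__(self):
        self.a, self.b = 100, 0
        self.locked = False

    def transfer_steps(self, amount):
        # Written as a generator so the update can be stopped between its halves.
        self.locked = True
        self.a -= amount
        yield                 # the lock holder "hangs" here, mid-update
        self.b += amount
        self.locked = False

    def consistent(self):
        return self.a + self.b == 100

def force_release_demo():
    acc = Accounts()
    holder = acc.transfer_steps(30)
    next(holder)              # holder takes the lock and does half the update
    acc.locked = False        # lock freed from outside; update never completed
    return acc.locked, acc.consistent()
```

After force_release_demo() the lock is free, so other processes would be
allowed in, but the structure it protected no longer satisfies its
invariant. Nothing in the lock itself records that fact.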

> Historically, rebooting an MPE system to solve a problem was almost
> unheard of. Now it is commonplace, and worse, an accepted practice.

Historically an HP3000 was a much simpler world than it is today.  MPE
plus COBOL, IMAGE, VPLUS, spoolfiles, printers, a tape drive, and serially
connected terminals was all you needed to run a business.  MPE/V systems
quite happily ran large organizations with significant numbers of users
on only a megabyte or two of memory and 1 MHz or so of CPU.  Today we have
all sorts of new things like networks, Posix, and all of the things that
they bring with them.

The offer was made: Good, Fast, and Cheap; pick any two.  People went for
fast and cheap for some reason.

G.