HP3000-L Archives

December 1998, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Scott McClellan <[log in to unmask]>
Reply To:
Scott McClellan <[log in to unmask]>
Date:
Fri, 4 Dec 1998 09:20:00 -0500
Content-Type:
text/plain
Parts/Attachments:
text/plain (44 lines)
I am not sure that I put my original comments in the proper context.

The main point I got from the original posting was that the author
wanted to know how to logoff 1000+ users "as fast as possible". In
other words, I thought the key part of the question had to do
with "speed".

All of the normal methods, =LOGOFF, =SHUTDOWN, etc, broadcast a
kill message to all concurrent users simultaneously. I have done
a lot of testing over the years with > 1000 users (for various
increased capacity projects). From practical experience I know that
it can take a "very long time" to shut down a heavily loaded system.
From that experience I know that if you broadcast a kill message
to all the users at once, they will all try to die *AT THE SAME TIME*.
This results in two problems.

* First of all there is an enormous amount of "thrashing" that occurs.
  This is obviously a function in part of how much memory you have, but
  I think it will be an issue regardless (even if maxed out - though
  I admit I have not tried it recently e.g. w/current max memory).
* Second there is a lot of "single threading" on various semaphores
  within the OS (eg: the PCL and the JMAT SIR). Nothing has changed
  in the OS in the last few years that would help this much.

The bottom line (IMO) is that the thrashing activity introduces
an enormous amount of overhead and the single threading limits
the amount of progress that is made concurrently anyway.

From experience, it is MUCH, MUCH, MUCH faster to shut down a very
large number of users "a few at a time". On old hardware, with
not enough memory (us Lab guys often have to test on whatever is
available :), a brute-force =LOGOFF with 1200 users could take
(several) hours. An organized shutdown (e.g. abortjob 100 users,
wait 2 minutes, repeat) would only take about 20-25 minutes. I am sure that
the results are different if you are on a current high-end system
(which you should be if you have > 1000 users) and max memory, but the
theory is almost certainly still valid.
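
To make the "a few at a time" idea concrete, here is a rough sketch of the
batching logic in Python. It is only an illustration, not something that
was run on the systems described above. The names batches, staged_abort
and abort_one are hypothetical; abort_one stands in for whatever actually
issues an ABORTJOB against one session. The defaults (100 sessions per
batch, 120 second pause) are the figures from the example above.

```python
import time
from typing import Callable, Iterator, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def staged_abort(
    sessions: Sequence[str],
    abort_one: Callable[[str], None],   # hypothetical: aborts one session
    batch_size: int = 100,              # assumption: figure from the post
    pause_seconds: float = 120.0,       # assumption: "wait 2 minutes"
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Abort sessions in batches, pausing between batches.

    Returns the number of batches processed. There is no pause before
    the first batch or after the last one.
    """
    count = 0
    for chunk in batches(sessions, batch_size):
        if count:
            sleep(pause_seconds)  # let the previous batch finish dying
        for session in chunk:
            abort_one(session)
        count += 1
    return count
```

With 1200 sessions and these defaults there are 12 batches and 11 pauses,
i.e. 22 minutes of waiting, which lines up with the 20-25 minute figure.
The sleep parameter is there only so the pause can be replaced in a dry run.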

For the record, the same thing is true with logging them back on.
It is slower to simultaneously STARTSESS 1200 users than it is to
start a few, wait, start a few, wait, etc. Again this observation
comes from testing on less than optimal hardware, but I would expect
it to hold (to a lesser extent) on the current high-end systems.
