HP3000-L Archives

July 1999, Week 2

HP3000-L@RAVEN.UTC.EDU

From: Wirt Atmar <[log in to unmask]>
Date: Sat, 10 Jul 1999 18:33:54 EDT
Content-Type: text/plain
Parts/Attachments: text/plain (358 lines)

Martin writes:

> Does anyone have not-too-technical explanation of how the queues work
>  together, and possibly an explanation for what was happening? Are there any
>  configuration changes that could be made to give my jobs an increased slice
>  of the action without slowing down the users' sessions too much (ie to the
>  extent that they phone me up to complain)?
>
>  Also, at the end of the day the users logged off and the previously static
>  jobs in the 'D' queue jumped back into life. Is this relevant?

This is the explanation that I use in our training classes, which tend to be
filled with accountants, bookkeepers, CEOs, and the like, many of them the
system managers for their machines. The explanation is straightforward and
easily understood, and although I tend to kid around perhaps too much, it is
not patronizing. It is a fairly accurate, albeit brief, explanation of how
queues and schedulers work in any computer.


THE OLD DAYS

Twenty years ago, one of the questions that newcomers were pretty much always
asked when they attended their first HP3000 user group meeting was: given
your size machine, how many jobs and sessions can your system be executing at
any one time?

The answers would come back: 4, 12, 30, 100, geez, I don't know.

The correct answer was: 1. No matter how big or small your system was, it had
only one processor and it could only be doing one thing at a time. What gives
you the illusion -- and it's only an illusion -- that your machine is
processing many jobs simultaneously is the fact that all HP3000s are
time-sharing devices, where each job and session gets a "time-slice" of the
processor's attention.

If it's your turn, you have the processor's complete and undivided attention.
During your time-slice, the processor is processing only your code -- and no
one else's. Every other process is standing dead still.

The corollary to that question was: Given your size of machine and the number
of discs that you have connected, how many different files can you be reading
and writing to at the same time?

The answer is again the same: 1. We can only be reading from one file at a
time, no matter how many discs we have. And if we move the disc heads on one
drive, we're putting them where somebody else doesn't want them to be.

Both of these answers are important to understanding how a multiuser machine
works and how to use it efficiently.


TAKIN' IT TO THE BANK

For the next ten minutes, imagine your von Neumann-architected, transactional
engine (your HP3000) to be a bank. Imagine also that your single processor is
a single teller, standing behind a row of teller windows. The windows are
labelled A, B, C, D, and E. All of the windows face out into the lobby of the
bank, with the exception of Window A. It's behind a wall that separates the
lobby from the bank's employees. The teller can move from window to window
with ease but customers can only come in the front door and line up at any of
the windows marked B through E.

Window A is reserved for extremely high-priority bank business only, items
such as time-keeping and memory management. These processes tend to be
absolutely important to running the bank, but they're also of extremely short
duration. When they need to be processed, the teller has to momentarily
abandon whatever customer he's servicing, move over, and process the
high-priority system task.

To do this, he has to first hand back to the customer all of his
transactional data -- a large stack of papers -- and then tell the customer:
"Hold on, I'll be right back." The teller then zips over to the A window,
does whatever is necessary, and returns almost immediately, gone so short a
period of time that the customer really never notices.

That priority pattern is how all of the windows are arranged, each window
representing a queue on the HP3000. Window B is reserved for the really high
priority lobby customers, the VIPs. Their transactions are rated very
important, but they're also designed to be very short. In real life, virtually
no one ever shows up at this window. Almost no one
programs processes into the B queue because there are very few processes that
are short enough and important enough that they should take priority over the
humans who sit at terminals. It is the humans and their processes that are
assigned to the C queue.

The C window is primarily intended for the people keying in data at terminals.
Each of these customers carries in a transaction bundle that's relatively
small, one that will do a few calculations and update one, two, three or four
datasets, but that's about it. The C queue is designed for people to get into
the bank, do their business, and leave.

The single teller/processor not only works on a time-slice basis, where the
customer who's at the window in the C queue gets his undivided attention for
as long as his time-slice lasts; he also works on an interrupt basis.
If a higher priority customer shows up at either the B or A windows, the
teller will mark his current place, zip over and process the higher-priority
request, and then return.

With the exception of these interrupts, which tend to be very short, the
customer at the C window gets the teller's undivided attention, right up to
the last moment of his time slice. At that point, the teller hands the
customer back all of his papers and asks that he go to the back of the line,
creating a "circular queue." The teller then begins processing the next
customer in line.

If there are five customers at Window C, each one gets 20% of the teller's
time (and as a consequence, each customer takes five times longer to process
his material than if he were the only customer in the bank). This circular
process continues with a customer having his transaction worked on feverishly
by the teller for a bit of time, then moving to the back of the line, and
then worked on, and then to the back, etc., until eventually the transaction
is complete. At that point, the customer leaves the bank, satisfied.
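
To see the mechanics behind the metaphor, here is a tiny, purely illustrative
sketch of that circular queue in Python (which has nothing to do with MPE;
it's simply convenient pseudocode that runs). The time-slice length, the
interrupt rhythm, and the amount of work per customer are all invented
numbers.

from collections import deque

# Five C-window customers, each needing 10 units of work in all.
c_queue = deque({"name": f"customer-{i}", "work_left": 10} for i in range(1, 6))

TIME_SLICE = 2      # units of teller attention per turn at the window
clock = 0

def interrupt_pending(t):
    # Pretend a Window-A task (a clock tick, say) shows up every 7th unit.
    return t > 0 and t % 7 == 0

while c_queue:
    customer = c_queue.popleft()        # front of the line gets the teller
    for _ in range(TIME_SLICE):
        if interrupt_pending(clock):
            clock += 1                  # brief detour to Window A, then back
        customer["work_left"] -= 1
        clock += 1
        if customer["work_left"] == 0:
            break
    if customer["work_left"] > 0:
        c_queue.append(customer)        # back of the line: the circular queue
    else:
        print(f'{customer["name"]} leaves the bank at t={clock}')

Run it and all five customers walk out near the end of the run rather than at
t=10: with five people in line, each one's transaction takes roughly five
times as long as it would in an empty bank.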

C-Window customers (who should only be the humans sitting at terminals)
don't enter the bank very often. They only do so when they press either the
RETURN key or the ENTER key, and that's very rarely done.
There are billions of teller cycles in between each of those key presses.
It's entirely possible that the teller can be sitting there for long periods
of time with no customers at any of the windows, just twiddling his thumbs to
the rhythm of the bank's clock.

However, most of the time, if there aren't customers at the C-window, there
will be some at the D or E Windows. These customers aren't any less
important, they just have different characteristics. Most often these
customers are automatically-scheduled jobs and as a consequence, they tend to
be patient. They don't mind a few interruptions. They also tend to be massive
in their requirements compared to the humans at the C Window.

These D and E customers don't update just one or two records in one or two
datasets. Instead, they touch everything. And unlike the C customer, who gets
in and gets out and doesn't show up again for days (in teller time), the D
customer, as soon as he finishes updating or extracting information from one
dataset, is immediately ready to go again, so these customers have a tendency
to keep the teller/processor quite busy.

In fact, if they don't work to keep the teller 100% busy, they're not doing
their job properly. If there are D and E customers waiting in their
respective lines (queues), but the teller is just sitting there, twiddling
his thumbs, waiting for somebody to give him the data he needs to get on with his
calculations, it means that everyone is being delayed while the discs stumble
and fumble around, trying to find the data the teller needs for the current
customer.

When this condition gets to its worst possible conclusion, the circumstance
is known as "thrashing". A D-window customer is sent to the back of the queue
and the next customer in line takes his place, but without any of his records
or data in his hands. Rather than have the teller reach into his memory and
immediately remember where he left off with this particular customer, the
teller instead has to go look up all of his files, get everything set up
again, and then begin processing his data. All of this extra work takes up
time during the D-customer's time slice. Reading data off of the discs is at
least 100,000 times slower than reading it out of the memory directly.

Under the worst possible conditions, the teller just gets all of the
customer's data loaded into his memory when the customer's time-slice ends --
and he has to start the process all over again with the next guy in line.
Under this worst of all possible worlds, no useful work is getting done. The
teller is twiddling his thumbs, the discs are busier than heck, and the line
of D-customers is growing.
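
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
The access times, the slice length, and the size of the working set are all
assumed values chosen only to illustrate the shape of the problem, not
measurements from any real machine.

memory_access = 100e-9    # assume ~100 nanoseconds per memory reference
disc_access   = 10e-3     # assume ~10 milliseconds per disc seek-and-read
print(f"disc is roughly {disc_access / memory_access:,.0f}x slower than memory")

time_slice      = 50e-3   # assume a 50 ms slice of the teller's time
pages_to_reload = 8       # working set flushed out since this customer's last turn

reload_time = pages_to_reload * disc_access
useful_time = max(time_slice - reload_time, 0)
print(f"{reload_time * 1000:.0f} ms of a {time_slice * 1000:.0f} ms slice spent "
      f"reloading; {useful_time * 1000:.0f} ms left for real work")

With those made-up figures, the reload alone outlasts the slice: the teller
spends the whole turn fetching papers and none of it doing arithmetic, which
is thrashing in a nutshell.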

The key to breaking this condition is not to shoot (abort) some of the
D-customers, although that does often help. Rather, the more socially
acceptable solution is to plug more memory into the teller's brain. Doing
that allows the teller to remember instantly where he left off with each
customer as they appear at the window for their turn.

"Thrashing" is not nearly as much of a problem as it used to be. The new crop
of tellers are very bright kids, capable of adding up long columns of numbers
with amazing speed. They also tend to be memory-rich nowadays, so they don't
have to refer back to files held on discs nearly as often as they used to.
The consequence of both of these attributes is that a teller nowadays can get
a lot more work done during each customer's time slice than they did in the
old days.


STOPPING A LINE FROM MOVING

There is nothing inherently slower about a D process than a B process. The
teller's queues are only priority-based, not speed-based. A single E Window
customer, running in the middle of the night in an otherwise empty bank, will
get 100% of the teller's attention and process its tasks just as fast as a B
Window customer ever would. Assigning the customers to the various windows is
simply a method of assigning them relative priorities, not speed of
processing.

It's just that when a B or C Window customer walks through the front door,
the teller is obliged to stop processing the D and E Window customers and go
service them instead. Normally, there's only one teller in the bank and
he has to service all of the customers.

If the queues are said to be "non-overlapping", the teller will
preferentially service the C Window customers right up to the point where
there are none remaining in line. Only at that point will the teller drop
down to the D Window and service the customers lined up there.

Clearly, under these conditions, the D customers won't move when there are C
customers to be serviced.
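
In scheduler terms, "non-overlapping" simply means strict priority: the teller
always serves the front of the highest-priority line that isn't empty. A
minimal Python sketch of that selection rule follows; the customer names are
invented, and Window A is left out because it's handled by interrupt rather
than by standing in line.

from collections import deque

queues = {"B": deque(), "C": deque(), "D": deque(), "E": deque()}

def next_customer(queues):
    # Strict, non-overlapping priority: scan B, C, D, E in order and
    # take the front of the first line that has anyone in it.
    for name in ("B", "C", "D", "E"):
        if queues[name]:
            return name, queues[name].popleft()
    return None, None     # empty bank: the teller twiddles his thumbs

queues["C"].extend(["order entry", "invoice lookup"])
queues["D"].extend(["nightly report"])

print(next_customer(queues))   # ('C', 'order entry')
print(next_customer(queues))   # ('C', 'invoice lookup')
print(next_customer(queues))   # ('D', 'nightly report') -- only now does D move

As long as anyone at all is standing at Window C, the nightly report never
comes up, which is exactly the starvation described next.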

How do you get the C customers to so completely starve the D and E people of
all available time? Quite easily. Simply keep the line of C customers long.
An easy way to do this is to run a lot of large tasks, such as database
updates or reports, in the C queue. But if you do this, you're violating the
unspoken rules of etiquette that govern every time-sharing, time-slicing
system.

The system was designed so as to maximally perpetuate the illusion that the
humans sitting at terminals have the machine all to themselves. If their
transactions are purposefully kept short and to the point, that illusion is
perfectly maintainable for a surprisingly large number of people.

During the time when there are no C customers in line, the teller doesn't
take a break. He immediately drops down and processes the D customers, if
there are any. If there aren't, he processes the E customers. Remember, these
are the patient customers. They don't care if the teller has to leave them
for a few minutes. If the teller has to spend 10% of his time in the C queue,
their processing time is only extended by 10% overall, and hardly anyone will
even notice.


OVERLAPPING QUEUES

The rules are very clear-cut if the queues are "non-overlapping." The teller
spends all of his time in the highest queue, processing whatever customers
are there until they're completely done.

There is, however, a certain level of injustice associated with this protocol.
If there should be one or two customers in the C queue who are completely
hogging the teller's time, no one in the D or E queues will get any attention
at all.

To rectify this situation, the priority levels of the customers in the
various queues can be dynamically altered. There are any number of
"optimization" algorithms that you can imagine that could be used. One simple
protocol is that if a particular customer stays in a queue for a very long
time, the teller can say, "Gawd, it's you again!", and drop his priority down
a bit, sometimes substantially, so that this particular C customer now may
have a priority similar to a D- (or even E-) level customer. The customer is
still technically in the C queue, but he isn't getting the same level of
respect he used to. This reassignment of priority is called "decay." The idea
behind this simple notion is the hope that a new C Window customer that just
walked in the door will have a very short transaction and will get in and out
quite quickly. If things prove to be otherwise, it's to everyone's advantage
to start dropping this new customer's priority. The quick in-and-out
customers that are still coming through the doors will still think that the
machine is totally theirs -- and this substantial customer won't notice all
that much of a difference in overall processing time.

A second thing that can be done is to "overlap" the priorities of the various
queues so that the D and E customers just don't sit there, indefinitely,
forever. Rather, their individual priorities can also be dynamically altered
to "percolate" up so that eventually, they too, actually get a bit of the
teller's time.

They can't take a lot of the teller's time and attention. If they did, the
meanings of the various queues would evaporate -- and everyone would be made
equal and equally fighting for time. But a little bit of time, once in a
while, doesn't do any harm. It keeps their processes moving, albeit not very
quickly. But it may be enough to keep them somewhat happy.
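
A toy model of both ideas -- decay for the customer who overstays his welcome
at Window C, and a slow upward percolation for the patient D and E customers
-- might look like the following. The base priorities, step sizes, and limits
are invented for the example; MPE's actual algorithm is more involved.

# Lower number = more important, so "decay" means the number grows.
BASE = {"C": 150, "D": 200, "E": 240}    # invented base priorities

class Customer:
    def __init__(self, name, queue):
        self.name, self.queue = name, queue
        self.priority = BASE[queue]
        self.slices_used = 0

    def after_slice(self):
        # Decay: a C customer who keeps coming back loses standing,
        # though never below the D base in this toy model.
        self.slices_used += 1
        if self.queue == "C" and self.slices_used > 3:
            self.priority = min(self.priority + 10, BASE["D"])

    def while_waiting(self):
        # Percolation: patient D and E customers creep upward,
        # but never quite reach the C base.
        if self.queue in ("D", "E"):
            self.priority = max(self.priority - 1, BASE["C"] + 1)

hog = Customer("month-end posting run in the C queue", "C")
for _ in range(6):
    hog.after_slice()
print(hog.queue, hog.priority)       # still "C", but drifting toward D territory

patient = Customer("nightly extract in the D queue", "D")
for _ in range(30):
    patient.while_waiting()
print(patient.queue, patient.priority)   # crept up toward, but not past, the C range

The hog is still nominally a C customer, but the short in-and-out transactions
coming through the door now beat him to the teller, while the patient D job
has quietly earned itself an occasional turn.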


ADDING MORE TELLERS

The newest idea in bank design is called "symmetric multiple processors"
(SMP). This isn't at all the same architecture as "massively parallel
processing" (MPP). In the latter design, you really only have one task before
you, something like seeking out alien intelligence in radio signals, looking
for patterns in the cosmic chaos. Tasks such as these are eminently
partitionable and can be parcelled out to billions of small processors, all
of whom are doing essentially the same thing on very small chunks of data.

Rather, our new banks are designed differently. While there remains only one
front lobby door, where all transactional customers enter and leave, we now
can add one or two or three new tellers. Most importantly, each of these new
tellers is essentially an independent agent, each with his own set of
windows, labelled B through E, one set along the north wall of the bank, one
along the east wall, and so on.

When a customer comes into the front door, that customer is directed to one
of the walls. And he'll stay there until he's completely done and leaves the
building. A customer never moves between tellers, once he's in line.

In this new architecture of symmetric multiprocessors, it's as if there were
several banks in the same building. It's just that as each new customer walks
in the door, he's assigned by the "monarch" teller to one of the various
tellers' walls (including possibly his own). This additional responsibility
doesn't really bother the teller who's been assigned monarch duties all that
much; all he's trying to do is to distribute the load among all of the
tellers as evenly as possible.
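
The monarch's dispatching duty, as described here, amounts to little more than
sending each new arrival to the shortest line and leaving him there. A
throwaway sketch of that assignment rule (the wall names and customers are
invented):

tellers = {"north": [], "east": [], "south": []}   # each teller's line

def admit(customer, tellers):
    # Monarch duty: send the new arrival to the least-loaded wall,
    # where he stays until his work is completely done.
    wall = min(tellers, key=lambda w: len(tellers[w]))
    tellers[wall].append(customer)
    return wall

for name in ["payroll", "order entry", "invoicing", "ad-hoc query"]:
    print(name, "->", admit(name, tellers))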

However, adding a new row of windows and putting a teller behind them doesn't
increase overall processing capacity as much as you might think. There
remains only one set of cash drawers, the discs, and all of the tellers have
to read and write from the same set of drawers. As a consequence, they tend
to bump into each other a bit.

Because of this interference, going from one teller to two only increases
overall productivity by a factor of about 1.8, not the 2x that you might
expect. Doubling the number of tellers again, to four, is even worse, netting
you only about a further 1.6x increase. And by the time you get to 16 tellers
behind their
16 individual rows of windows, they're spending so much time running into
each other that productivity gains may well have become negative. While
you're spending money like water at this point trying to redesign the bank to
increase overall throughput, your tellers are running into each other at such
a rate that overall processing productivity actually drops.
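
The arithmetic behind those diminishing returns can be mimicked with a crude
contention model. The interference figure below is an assumed constant, picked
only because it happens to land near the 1.8x figure mentioned above; it is
not a measurement of any real machine.

def relative_throughput(tellers, interference=0.10):
    # Each teller contributes one unit of work, minus a penalty for
    # bumping into every other teller at the shared cash drawers (the
    # discs). Purely illustrative; real contention is messier.
    bumping = interference * tellers * (tellers - 1)
    return max(tellers - bumping, 0.0)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} tellers -> about {relative_throughput(n):.1f}x one teller")

The model is deliberately crude (real machines don't fall all the way to
zero), but the shape is the point: each doubling buys less than the one
before, and past some number of tellers it buys less than nothing.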


THE MORALS

The morals of all this?

     o Don't fiddle too much (or at all) with the queuing structure that was
assigned by HP. Like all optimization algorithms where you don't know in
advance the topography of the optimization response surface, you can spend a
lot of time fiddling and do yourself no good -- and possibly some substantial
amount of harm. In general, on the average, HP's settings are pretty good,
and that's all you can expect, in general, on the average.

     o Rather, take the time to assign your tasks to the queues based on
their real needs. Sessions are automatically assigned to the C queue and jobs
to the D queue. Running a massive report once in a while in the C queue
won't do any harm, but don't make it a habit. And especially, don't let a lot
of people do that. Set your processes up so that important, immediately
necessary reports run in the D queue -- and those that can wait are run in
the E.

Above all else, we want to maintain the illusion to the people out there on
the terminals that the machine is solely theirs. These are real people, who
have real lives, who have kids, who want to go home at night, and most
importantly, they are the people who are talking to customers.
Lightning-quick responsiveness is all we want for these guys.

     o If the poor processor who has to run the entire place by himself is
having to spend too much time accessing his discs because critically
important information necessary for each customer is constantly being flushed
from his memory, then by all means invest in more main memory. It can often
mean a dramatic overall performance increase, sometimes much more than adding
a second processor.

     o Run your middle-of-the-night batch jobs single file, setting your job
limit just one above the number of background jobs. If there's only one job
running, it is irrelevant which queue it runs in. But what does matter is
that the job should (ideally) make the discs go "clunk" one time, read in all
of the necessary information, and then process that information at maximal
speed, from memory, at 100% CPU utilization rates. In this way, the job isn't
fighting with anyone else for the processor's attention while performing
trivial tasks such as file reads and writes, the discs are quiet, and the
constant flush and reload of the processor's main memory is avoided. This
scenario is
particularly true for large data extractions, which are often the reports
that tell you whether or not you're making money.

Five massive jobs run single file can occasionally execute in half or a third
of the time they would take running simultaneously. Although thrashing
has been greatly ameliorated over the years in the HP3000, it isn't
completely gone, and never will be. If you do things wrongly enough, you can
resurrect it from its deep slumber.
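
One last toy calculation, with entirely made-up costs, shows why single file
can win by that kind of margin: run together, the five jobs keep flushing one
another's data out of memory, so each job ends up re-reading its data several
times over; run one at a time, each job pays for its data exactly once.

cpu_work_per_job = 60.0   # assumed seconds of pure computation per job
one_time_load    = 20.0   # assumed seconds to read the job's data off disc once
thrash_factor    = 6.0    # assumed: run together, each job re-reads its data ~6x
jobs             = 5

# Single file: load the data once, then run flat out from memory.
serial_total = jobs * (one_time_load + cpu_work_per_job)

# All at once: the same CPU work, but far more disc traffic per job.
concurrent_total = jobs * (one_time_load * thrash_factor + cpu_work_per_job)

print(f"single file : {serial_total:.0f} s")
print(f"all at once : {concurrent_total:.0f} s")
print(f"penalty     : {concurrent_total / serial_total:.1f}x")   # a bit over 2x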

Wirt Atmar
