HP3000-L Archives

February 2002, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From: Wirt Atmar <[log in to unmask]>
Date: Tue, 26 Feb 2002 22:17:57 EST
Content-Type: text/plain

Stan writes:

> > A 997/800
>  > 380GB of disc
>  > 700+ active online users
>  > 45 jobs at any given time
>  >
>  > The system spends on AVERAGE 9% CPU on memory management, and there
>  > are extended periods where the system spends over 20% CPU on
>  > memory management for hours at a time.
>
>  First, is there a performance problem?
>
>  Sure...there are some stated oddities about how much CPU is spent
>  on memory management...but that only matters if there's a performance
>  problem!
>
>  What's the "speedometer" show (control-B on ldev 20), on average, over a
>  few minutes?  If it's 70 (F7FF) or less, then you aren't CPU bound ...
>  can probably afford that extra CPU usage.
>
>  > Page fault rate on average over a 24 hour period is 60 per second, but
>  > it is not uncommon to be over 100 per second. (slightly lower than I
>  > would expect)
>
>  Depending upon locality, there's a chance that adding more memory might
>  drop that page fault rate, which would increase pressure on the CPU.
>  (Also, adding more memory might increase memory manager overhead.)
>
>  If the data locality isn't great, then adding more memory might not affect
>  the page fault rate ... so no benefit, but it *could* increase MM overhead
>  (bad).  That would be the primary scenario where adding more memory could
> hurt.
>
>  > What I find amazing is that not only is HP telling them that memory will
>  > only make matters worse, but that adding more CPUs will also not
>  > help.
>
>  We can't tell from the data presented so far.

Stan's final line is exactly the sentence I was going to write, but I was
going to emphasize different attributes of the problem.

The question is what are the 700+ users doing? If they're all doing
heads-down data entry, taking telephone orders as fast as they can, you can
certainly expect a great deal of I/O as each of their entries is posted to
disk. But you shouldn't expect too many page faults because of them. Such
users don't require a great deal of space in main memory.

On the other hand, if the 700+ users are merely occasional terminal users,
such as would occur at nurses' stations, you shouldn't expect much I/O or
many page faults. Most of the time, the terminals would simply be idle.

In either case, however, presuming that all of the users are running a
well-designed, conservative program, such as a BASIC/FORTRAN/COBOL
IMAGE-based inquiry and update process, each user probably requires only a
quarter megabyte of main memory. But let's be conservative and say that each
user requires on average 2MB of main memory; then the total user community
is consuming, all told, about 1.5GB of memory to keep everyone live in RAM,
a situation that would seem to require no page faults at all, given that you
have 10GB of RAM.
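
For what it's worth, here is the arithmetic behind those numbers written out
as a small Python sketch. The 700 sessions and the 10GB of RAM are the
figures above; the 2MB-per-session number is the deliberately conservative
guess, not a measurement:

    # Back-of-the-envelope memory sizing (assumed figures from the
    # discussion above: 700 sessions, 2 MB apiece, 10 GB of main memory).
    sessions = 700
    mb_per_session = 2
    total_ram_mb = 10 * 1024

    user_footprint_mb = sessions * mb_per_session    # 1400 MB, about 1.4 GB
    remaining_mb = total_ram_mb - user_footprint_mb  # about 8.6 GB left over

    print(f"User community needs roughly {user_footprint_mb / 1024:.1f} GB")
    print(f"Left for jobs and everything else: {remaining_mb / 1024:.1f} GB")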

Thus the question becomes: who in the heck is finding the remaining 8.5GB to
be too small? My first assumption would be the 45 jobs that are running in
parallel. If those jobs are reports, meaning anything from printing labels to
billings to complex financial extractions, it's been demonstrated over and
over again that running such jobs in single-file, so that the job limit is
set to one job per system processor, rather than running them all
simultaneously, very often dramatically increases throughput by minimizing
the very I/O that you're complaining about.

Ten massive jobs will often run substantially faster when run one at a time
than when run in parallel, most especially when each job has to push
some of the previous jobs' data out of main memory to make room for its own
data. It's all of the wasted I/O and system pauses attendant to these
multiple parallel jobs that so lengthens the wall time of a run of parallel
jobs. Because you're getting page faults in a situation where you should have
better than 8GB of free memory, some combination of processes must be doing
exactly that. I would never recommend to anyone that they allow 45 jobs to
run simultaneously. Indeed, what I tell our customers is to keep their job
limit to the number of background jobs that must run + the number of
processors they own.
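
To make that rule of thumb concrete, here is a small Python sketch of the
same arithmetic. The processor and background-job counts in it are
hypothetical, chosen only to illustrate the calculation; substitute your
own numbers:

    # Rule-of-thumb job limit: the background jobs that must always run,
    # plus one batch job per processor. (Counts below are hypothetical.)
    processors = 4          # illustrative CPU count, not the real machine's
    background_jobs = 5     # spoolers, schedulers, etc. that must stay up
    waiting_reports = 45    # the jobs that would otherwise all run at once

    job_limit = background_jobs + processors
    print(f"Suggested job limit: {job_limit}")
    print(f"Reports then run single-file, {processors} at a time,")
    print(f"rather than all {waiting_reports} in parallel.")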

The difference between the C-queue users and the D-queue jobs is very
straightforward. A C-queue user wants the system to appear as if he's the
only user, thus you want his process(es) to remain in main memory and not to
have to be reloaded from disc often. Nonetheless, between the times that the
user pushes his RETURN or ENTER key, billions of machine cycles pass by. To
the CPU, even the most intensely active typist looks like nothing more than
an occasional nuisance.

But jobs are different. As soon as an executing job has finished with one
database transaction, it's immediately ready to proceed with the next. Not
only do you want to keep its processes resident in main memory, you also want
to keep its read source data there.

But you can never -- or at least you should never -- try to constrain a
process's write data. Any data that's written to a database should be posted
immediately, regardless of whether that data was created by a job or a
session. While it is possible to put the IMAGE databases into an autodefer
state and post nothing to the discs until you request it, and thus
dramatically cut down on I/O, doing this is the shortest road to hell that I
know of that's paved with any sort of intention, good, bad or merely dumb.

But this second kind of I/O shouldn't result in page faults either. You're
simply writing data onto the discs from the same memory space that you
already own. If something is causing a massive number of page faults,
throwing more memory at the problem may alleviate it a bit, but you'd be a
lot better off understanding why *anything* is causing you page faults now.

Besides, it would also be a great deal cheaper.

Wirt Atmar
