HP3000-L Archives

April 1995, Week 3

HP3000-L@RAVEN.UTC.EDU

Subject:
From:          Larry Byler <[log in to unmask]>
Reply-To:      Larry Byler <[log in to unmask]>
Date:          Thu, 20 Apr 1995 01:40:10 GMT
Content-Type:  text/plain
Parts/Attachments: text/plain (116 lines)

Hello all.  It was good to meet some of the faces behind the names at IPROF.
 
Currently, there is a limit of just under 10,000 spool files on one MPE/iX
system.  This is a magic number, set by a CONST declaration.  I don't know
where this number came from; it was determined by other NMS development
team members.
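 
To make that concrete, the limit is a single compile-time constant.  A
one-line sketch in C (the real declaration lives in MPE/iX source, not
C, and the identifier and exact value here are my inventions -- the
actual limit is "just under 10,000"):
 
    #define MAX_SPOOL_FILES 9999    /* hypothetical name for the CONST */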
 
Many customers have hit this limit and are asking us to increase it.  And
it's trivial to change the CONST declaration.  But there are performance
and system availability considerations, especially with the SPOOLF
command.  An elegant and ideal solution will take lots of time and effort,
but there are quick hits that will address some of the considerations but
not others.  So I'd like your collective advice on how best to proceed.
 
To understand the problem, you need to know something about the spool file
directory (SPFDIR) and how LISTSPF and SPOOLF (ALTER and DELETE only)
work.  My apologies in advance for the detail here.  If you're already
familiar with these topics, you can skip directly to the bullets near the
end.
 
The SPFDIR is an MPE table.  Access is managed by a Native Mode semaphore.
Everyone needing SPFDIR access must queue on the semaphore.  This includes
users of the above two commands, spooler processes wishing to open files
for printing (or closing them afterward), user processes wishing to create
spool files, and jobs logging on that need to create $STDLISTs.
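 
As a rough C analogy of that single choke point (a pthread mutex standing
in for the NM semaphore; all of the names are invented, since the real
structures are internal to MPE/iX):
 
    #include <pthread.h>
 
    static pthread_mutex_t spfdir_sem = PTHREAD_MUTEX_INITIALIZER;
 
    /* Every SPFDIR accessor -- LISTSPF, SPOOLF, spooler processes opening
       and closing files, processes creating spool files, jobs creating
       $STDLISTs -- funnels through the same lock: */
    void spfdir_access(void)
    {
        pthread_mutex_lock(&spfdir_sem);   /* queue here with everyone else */
        /* ... read or modify the SPFDIR ... */
        pthread_mutex_unlock(&spfdir_sem);
    }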
 
We are concerned only with wildcarded specifications (such as LISTSPF O@,
possibly filtered by a selection equation).  Let's start with LISTSPF.
The command first locks the SPFDIR semaphore, then traverses the entire
SPFDIR.  If you specify a seleq, entries that do not match are discarded.
Entries that fail the spool file security check (SM=any, AM=any in acct,
other=creator) are also discarded.  The remaining entries are cached in a
temporary object and the semaphore is released.  The display is then
formatted from the entries in the temporary object.
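 
In C-flavored pseudocode, reusing spfdir_sem from the sketch above (the
entry type and both filter functions are invented stand-ins for the
seleq and security checks):
 
    #include <stddef.h>
 
    typedef struct spf_entry { int id; int prio; char dev[9]; } spf_entry;
 
    extern spf_entry      *spfdir;        /* stand-in for the MPE table   */
    extern size_t          spfdir_count;
    extern pthread_mutex_t spfdir_sem;
    extern int matches_seleq(const spf_entry *e);    /* seleq filter      */
    extern int passes_security(const spf_entry *e);  /* SM/AM/creator     */
 
    /* LISTSPF: lock, scan, filter, cache, unlock -- then format lock-free. */
    size_t listspf_scan(spf_entry *cache)
    {
        size_t n = 0;
        pthread_mutex_lock(&spfdir_sem);
        for (size_t i = 0; i < spfdir_count; i++) {
            if (!matches_seleq(&spfdir[i]))    continue;  /* seleq miss    */
            if (!passes_security(&spfdir[i]))  continue;  /* security miss */
            cache[n++] = spfdir[i];         /* the "temporary object"      */
        }
        pthread_mutex_unlock(&spfdir_sem);
        return n;     /* display is formatted from cache, no lock held */
    }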
 
SPOOLF ;ALTER starts out in a similar fashion.  It locks the semaphore and
traverses the SPFDIR, filtering out unwanted (seleq) and unavailable
(security) entries, and caching the remaining entries in the temporary
object.  But since SPOOLF changes entries, it maintains the lock while it
changes the SPFDIR entries specified in the temporary object.  It then
releases the lock, but re-acquires it individually for each changed spool
file as it writes the changes to the file's File LABel eXtension (FLABX).
After this step, all locks are released.  If the ;SHOW option was used,
entries are formatted out of the temporary object, as for LISTSPF.
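 
In the same invented C, the shape of ;ALTER -- the point being where the
lock is held (the requeue and FLABX helpers are hypothetical):
 
    extern void requeue_for_device(spf_entry *e);
    extern void requeue_for_priority(spf_entry *e);
    extern void write_flabx(const spf_entry *e);
 
    void spoolf_alter(spf_entry *cache, size_t n)
    {
        pthread_mutex_lock(&spfdir_sem);
        for (size_t i = 0; i < n; i++) {
            requeue_for_device(&cache[i]);     /* dequeue/requeue #1 */
            requeue_for_priority(&cache[i]);   /* dequeue/requeue #2 */
        }                                      /* SPFDIR locked throughout */
        pthread_mutex_unlock(&spfdir_sem);
 
        for (size_t i = 0; i < n; i++) {       /* FLABX writes: the lock is
                                                  re-acquired per file */
            pthread_mutex_lock(&spfdir_sem);
            write_flabx(&cache[i]);
            pthread_mutex_unlock(&spfdir_sem);
        }
    }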
 
SPOOLF ;DELETE operates the same until the selected entries are cached.
It then releases the lock and calls file system intrinsics to delete each
file (and its SPFDIR entry) specified in the cache.  The file system locks
and unlocks the SPFDIR for each file.
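 
And ;DELETE, where the per-file locking happens down in the file system
rather than in spooler code (fs_purge_spool_file is an invented stand-in
for the file system intrinsics):
 
    extern void fs_purge_spool_file(const spf_entry *e);  /* locks/unlocks
                                                             the SPFDIR and
                                                             runs the XM
                                                             directory work */
    void spoolf_delete(const spf_entry *cache, size_t n)
    {
        /* no SPFDIR lock held here; each deletion takes it briefly */
        for (size_t i = 0; i < n; i++)
            fs_purge_spool_file(&cache[i]);
    }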
 
The initial scanning/caching step is fairly fast.  For 9000 spool files,
it takes around 10-12 seconds on a Series 930 with 24 Mbytes of memory
(worst case test), and about four seconds on a Series 957RX with 128
Mbytes of memory.  ALTERing the SPFDIR entries is another matter.  Each
entry must be dequeued and requeued twice, once for its device, and once
more for its priority, and all while the SPFDIR is locked.  This takes
real time.  On the 930, 9000 spool files probably required around 81
minutes (I extrapolated this based on my results for 1000, 2000,...,6000
spool files).  On the 957 it required 10 minutes.
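 
(Back-of-the-envelope, from the numbers above: 81 minutes over 9000 files
is roughly 0.54 seconds per ALTERed entry on the 930, and 10 minutes over
9000 is roughly 67 milliseconds per entry on the 957 -- all of it spent
with the SPFDIR locked.)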
 
This affects all would-be SPFDIR accessors.  Once the lock has been
released, other processes can contend for the SPFDIR.  But the poor soul
who does a SPOOLF ;DELETE of those 9000 spool files may as well go out
for lunch.  On the 930, it took almost two hours to delete 9000 spool
files.  I didn't measure the time on the 957, but it was significant
(minutes, not seconds).  The reason is that the MPE directory management
(of files in OUT.HPSPOOL) that accompanies file deletion is all done
under XM (the MPE/iX Transaction Manager), and we are talking *large*
transactions.  Again, during most of this time
other processes can acquire the SPFDIR; only the command user is
penalized.
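 
(Again back-of-the-envelope: almost two hours over 9000 files works out
to roughly 0.8 seconds per deleted file on the 930, nearly all of it XM
directory overhead rather than spooling code.)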
 
Numbers like this are why I have resisted increasing the spool file
limit.  The time wasters are not part of spooling code, so there's not
much I can do directly to affect their execution time.  The SPFDIR
queueing is done via MPE symbol table management calls, and the file
system (and XM) is used to delete spool files.
 
So why am I taking up your time with this?  Because it's becoming more
and more of a customer hot button -- and because there are some
band-aids (below) I can apply fairly easily that will solve some (not
all) of the problem.  I'd like your feedback:  Is this the right way to
go (or at least a good start)?  Bear in mind that the choice is the
band-aids soon, or a full (and as yet unknown) solution at some future
(also unknown) date.
 
Here's what we can do quickly:
 
o   Increase the spool file limit to 50000.  Simply requires changing a
    CONST.  We don't want to go higher than that because of a current
    limitation in XM.
 
o   Currently SPOOLF ;ALTER dequeues and requeues *even if the priority
    or device specification doesn't change*.  Do a SPOOLF O@;SHOW as an SM
    user (remember, the command defaults to ALTER) and you wait -- and
    every other SPFDIR user waits with you.  (Note that this is not a
    problem if you are a vanilla user with access to only a few spool
    files, even if the system has many thousands of files -- the time is
    spent only on the cached files, those that survive the cut).
 
    It's easy to avoid the requeueing step if neither priority nor device
    changes (as would be true for SPOOLF O@;SHOW); see the first sketch
    after this list.  On the 957 with 30000 spool files, the command
    released the lock after 22 seconds and began the file display at 1:39
    (the interval was spent updating FLABXs).  But if you *do* change
    either attribute, and you can access all those files, everyone will
    still wait (10 minutes with 9000 spool files; I did not take data
    with 30000 spool files).
 
o   Apply a heuristic to the DELETE function -- reverse the order of the
    individual file deletions (the second sketch after this list).
    Because of the way spool file entries are cached for deletion and XM
    directory transactions are logged, this should substantially reduce
    the wait time for the DELETE user.
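 
Minimal sketches of those last two band-aids, in the same invented C as
before (these show the intended shape, not actual MPE/iX code):
 
    #include <string.h>
 
    /* Band-aid 2: skip the dequeue/requeue when nothing changes.  The
       new_prio/new_dev parameters are hypothetical; "not specified" on
       the command would map to "no change" here. */
    void spoolf_alter_fast(spf_entry *cache, size_t n,
                           int new_prio, const char *new_dev)
    {
        pthread_mutex_lock(&spfdir_sem);
        for (size_t i = 0; i < n; i++) {
            if (cache[i].prio == new_prio &&
                strcmp(cache[i].dev, new_dev) == 0)
                continue;              /* SPOOLF O@;SHOW case: no requeue */
            requeue_for_device(&cache[i]);
            requeue_for_priority(&cache[i]);
        }
        pthread_mutex_unlock(&spfdir_sem);
        /* per-file FLABX updates follow, as before */
    }
 
    /* Band-aid 3: delete the cached entries in reverse order -- the
       heuristic aimed at how the XM directory transactions get logged. */
    void spoolf_delete_reversed(const spf_entry *cache, size_t n)
    {
        for (size_t i = n; i-- > 0; )      /* walk the cache backward */
            fs_purge_spool_file(&cache[i]);
    }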
 
Thanks for any comments.
 
-Larry "MPE/iX Spoolers 'R' Us^H^HMe" Byler-
