HP3000-L Archives

January 1999, Week 1

HP3000-L@RAVEN.UTC.EDU

Subject:
From:     John Korb <[log in to unmask]>
Reply To: John Korb <[log in to unmask]>
Date:     Thu, 7 Jan 1999 18:34:54 -0500
There have been some really interesting off-line questions, comments, and
war stories.  In reviewing my answer to a particularly interesting
off-line post, I decided that perhaps I should post part of my reply to
the list.  So, here it comes!

------
Each application was completely redesigned and rewritten when "webified".
No new features or capabilities were added (initially).  The whole idea was
to port the existing functionality over to the web where the user interface
would be supported by the browser.  Initial versions DID NOT USE JAVASCRIPT.

After a number of "proof of concept" programs were written, with the form
data, url data, and cookie data parsers being extensively tested and
debugged in the process, a final "demo" application was coded.  The data
flow through the demo application became the blueprint for further
development.  It wasn't as fast as hoped, but it was generic enough to
lock down certain blocks of code.  Today, the program which acts as the form
input/output translator, collecting and piping the form data from the web
server to the application server process, has matured enough that it hasn't
been touched in the better part of a year.
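
To give a feel for what those parsers deal with, here is a rough C sketch
(the production code is SPLash!, so this is only an illustration) of
decoding "application/x-www-form-urlencoded" data: split the pairs on
'&', split name from value on '=', and undo the '+' and '%xx' escapes.
The buffer sizes and the printf at the end are just for demonstration.

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <ctype.h>

   /* Decode '+' and '%xx' escapes in place (the output never grows). */
   static char *url_decode(char *s)
   {
       char *in = s, *out = s;

       while (*in) {
           if (*in == '+') {
               *out++ = ' ';
               in++;
           } else if (*in == '%' && isxdigit((unsigned char)in[1]) &&
                      isxdigit((unsigned char)in[2])) {
               char hex[3] = { in[1], in[2], '\0' };
               *out++ = (char)strtol(hex, NULL, 16);
               in += 3;
           } else {
               *out++ = *in++;
           }
       }
       *out = '\0';
       return s;
   }

   /* Walk "name=value&name=value..." and print each decoded pair. */
   static void parse_form_data(char *data)
   {
       char *pair;

       for (pair = strtok(data, "&"); pair != NULL; pair = strtok(NULL, "&")) {
           char *eq = strchr(pair, '=');
           if (eq == NULL)
               continue;
           *eq = '\0';
           printf("%s = %s\n", url_decode(pair), url_decode(eq + 1));
       }
   }

   int main(void)
   {
       /* For a POST the web server puts the byte count in CONTENT_LENGTH
          and the form data itself on stdin. */
       char *len_str = getenv("CONTENT_LENGTH");
       long len = len_str ? atol(len_str) : 0;
       char *buf = malloc((size_t)len + 1);

       if (buf == NULL)
           return 1;
       len = (long)fread(buf, 1, (size_t)len, stdin);
       buf[len] = '\0';
       parse_form_data(buf);
       free(buf);
       return 0;
   }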

One of the most frustrating parts of the experience during the "proof of
concept" and "demo application" phases was tweaking the HTML code.  First,
100 bytes of HTML code were often spread over three pages of source code,
as the 100 byte string was created tag by tag, parameter by parameter.
There were a lot of tables, each with various options like pixel widths,
alignment settings, background colors, etc.  Reading and tweaking the HTML
was a nightmare.

Some on the list may remember that last spring I was asking for opinions on
the use of message catalogs with web applications.  The research I did back
then and the experience I had maintaining HTML within the source code led
me to conclude that storing the HTML within a message catalog was the way
to go.

A couple of months of working with message catalog files containing
virtually all of the HTML convinced me that the small performance penalty
of message catalogs was more than offset by the benefits: both the HTML
and the SPLash! source code became easier to write and maintain, and with
almost all of the HTML removed from the SPLash! source code, the number
of lines of SPLash! dropped by at least 2/3, the readability of the
SPLash! code improved, and changes to both HTML and SPLash! source were
easier to implement.

Today, almost all of the HTML code is stored in message catalogs.  This has
affected the applications in the following ways:

   1) The HTML can be "tweaked" without having to recompile the SPLash!
      code - a big plus.

   2) The parameter substitution feature of message catalogs is heavily
      used, saving me a lot of coding (very few calls to ASCII or
      DASCII); a sketch of the idea follows this list.

   3) All the HTML code for a particular page can be placed together in
      the message catalog file instead of being spread across multiple
      pages of SPLash! source code (boy did that ever make changing the
      HTML easier!).

   4) A slight increase in CPU utilization is incurred by using the
      GENMESSAGE intrinsic (yes, the old, reliable GENMESSAGE rather than
      the newer intrinsics).  The CPU time performance penalty has averaged
      about 3%.
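
To show the flavor of item 2, here is a small C sketch of positional
parameter substitution into an HTML fragment kept outside the program.
GENMESSAGE and the real catalog format live on MPE and are not reproduced
here; the "!" markers, the sample message text, and the buffer sizes are
invented purely for illustration.

   #include <stdio.h>
   #include <string.h>

   /* A stand-in for one catalog message: an HTML fragment with "!"
      positional parameter markers (the real HTML lives in the message
      catalog, not in the source). */
   static const char *msg_row =
       "<TR><TD ALIGN=RIGHT>!</TD><TD>!</TD><TD>!</TD></TR>\n";

   /* Substitute each "!" in tmpl with the next string in params, writing
      the result to out (the caller guarantees out is large enough).
      This only imitates what GENMESSAGE's parameter substitution does;
      it is not the intrinsic itself. */
   static void fill_message(char *out, const char *tmpl,
                            const char *params[], int nparams)
   {
       int next = 0;

       while (*tmpl) {
           if (*tmpl == '!' && next < nparams) {
               strcpy(out, params[next]);
               out += strlen(params[next]);
               next++;
               tmpl++;
           } else {
               *out++ = *tmpl++;
           }
       }
       *out = '\0';
   }

   int main(void)
   {
       const char *params[] = { "1234", "Widget, large", "17.95" };
       char html[512];

       fill_message(html, msg_row, params, 3);
       fputs(html, stdout);   /* the real program writes its HTML to the
                                 output file read by the interface */
       return 0;
   }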

The toughest part of rewriting an application to make it web-ready is the
report writer, if it allows the user to gradually refine a search.  This
is because in an iterative report writer the search results have to be
saved from screen to screen, and the application server process has to be
set up to "remember" what the user has currently selected, allow the user
to gradually refine the search, and even let them manually remove
selected records from those they have chosen.
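
One way to picture that "remember between screens" problem: park the
currently selected record keys in a small state file named after the
transaction, and read it back on the next hit.  The C sketch below is
only an illustration - the file naming, the one-key-per-line format, and
the idea that the transaction id comes from a cookie or hidden field are
assumptions, not a description of the actual implementation.

   #include <stdio.h>

   /* Save the currently selected record keys, one per line, in a state
      file named after the transaction id (e.g. "TS00042").  Both the
      naming scheme and the format are invented for this sketch. */
   static void save_selection(const char *txn_id, const long keys[], int n)
   {
       char name[40];
       FILE *f;
       int i;

       sprintf(name, "TS%.32s", txn_id);
       f = fopen(name, "w");
       if (f == NULL)
           return;
       for (i = 0; i < n; i++)
           fprintf(f, "%ld\n", keys[i]);
       fclose(f);
   }

   /* Read the keys back on the next screen; returns how many were saved
      (0 on the first screen, when no state file exists yet). */
   static int load_selection(const char *txn_id, long keys[], int max)
   {
       char name[40];
       FILE *f;
       int n = 0;

       sprintf(name, "TS%.32s", txn_id);
       f = fopen(name, "r");
       if (f == NULL)
           return 0;
       while (n < max && fscanf(f, "%ld", &keys[n]) == 1)
           n++;
       fclose(f);
       return n;
   }

   int main(void)
   {
       long picked[3] = { 101, 205, 317 }, again[10];
       int i, n;

       save_selection("00042", picked, 3);     /* user checked 3 records   */
       n = load_selection("00042", again, 10); /* next screen: restore them */
       for (i = 0; i < n; i++)
           printf("still selected: %ld\n", again[i]);
       return 0;
   }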


My personal feeling is that the following are the primary causes of the
poor web performance we've experienced:

   1) The NCSA server is not known for either speed or efficiency.  Some day
      I'll have the authorization to work with Apache (which I've installed
      and run a few tests with, but which I'm not authorized to use for
      development or production) and will see how it fits into the equation.

   2) The path the data follows involves the following chain of processes
      (ignoring what happens on the client); a rough C sketch of the
      interface program's part of this relay appears a little further
      below:

      a) Web server receives data

      b) Web server creates new shell process for the cgi request and
         writes data to the new shell's stdin
         (overhead: create new shell process, create stdin, stdlist,
         stderr for new shell, open new shell's stdin for output, write
         of data just received to the new shell's stdin, open shell's
         stdlist for input)

      c) cgi shell script invokes SPLash! web interface program, passing
         some information (such as environment variables, etc.) through
         the INFO parameter.
         (overhead: open shell script file.  multiple reads from shell
         script file, Create new interface program process, open $STDIN
         and $STDLIST)

      d) Interface program opens and reads $STDIN to get the data, then
         takes data from INFO parameter and formulates the request.
         (overhead: open $STDIN [again, but with proper options], read
         data from $STDIN [same data that was written to new shell's
          stdin in (b) above])

      e) Interface program opens the message file to the application
         server.
         (overhead: open of message file)

      f) Interface program creates the new output file for the application
         server.
         (overhead: build the new output file)

      g) Interface program opens the new output file.
         (overhead: open the file)

      h) Interface program writes the request to the message file to the
         application server.
         (overhead: write to message file)

      i) Interface program closes the message file to the application
         server, then closes $STDIN.
         (overhead: two file closes)

      j) Application server reads the request from the message file.
         (overhead: a file read)

      k) If this is a multi-step transaction, the application server program
         opens, then reads the transaction state file.
         (overhead: open file, multiple reads, possible close file)

      l) Application server writes html to the output file.
         (overhead: multiple writes)

      m) Application server closes the output file.
         (overhead: file close)

      n) Interface program reads from the output file and writes to
         $STDLIST (which then goes to the shell's stdlist).
         (overhead: multiple reads and writes)

      o) On EOF on the output file, Interface program closes the output
         file, purging the file upon close.
         (overhead: close output file, disposition=delete)

      p) Interface program terminates.
         (overhead: two file closes [$STDIN, $STDLIST], process
         termination)

      q) Shell script ends, shell terminates.
         (overhead: close shell script, close stdin, stdlist, stderr,
         process termination)

      r) Web server reads from interface program's/shell's STDLIST/stdlist
         and writes to web client.
         (overhead: multiple reads and writes)

      s) Web server hits EOF on stdlist, closes stdlist.
         (overhead: close of stdlist)

The problem is that it takes so much work to get the data to the
application server when that application server runs on the HP 3000.
There are multiple process creations, multiple file opens and closes, and
the data keeps getting handed off (read and written again and again).
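
For the curious, here is a very rough C outline of the interface
program's part of that relay (roughly steps d through p).  The real
program is SPLash! and uses MPE file intrinsics; the file names REQMSG
and OUTnnnn, the record layout, and the way the program waits for the
application server are simplified or invented here.

   #include <stdio.h>
   #include <stdlib.h>
   #include <unistd.h>

   int main(void)
   {
       char request[4096], line[512], outname[32];
       char *len_str = getenv("CONTENT_LENGTH");
       long len = len_str ? atol(len_str) : 0;
       FILE *msgfile, *outfile;

       /* (d) read the form data the web server wrote to our stdin */
       if (len < 0)
           len = 0;
       else if (len >= (long)sizeof(request))
           len = sizeof(request) - 1;
       len = (long)fread(request, 1, (size_t)len, stdin);
       request[len] = '\0';

       /* (f) build a scratch output file for the server's reply */
       sprintf(outname, "OUT%ld", (long)getpid());
       outfile = fopen(outname, "w");
       if (outfile == NULL)
           return 1;
       fclose(outfile);

       /* (e, h, i) hand the request (and the output file name) to the
          application server through its request file; on MPE this is a
          message file, which provides the IPC semantics */
       msgfile = fopen("REQMSG", "a");
       if (msgfile == NULL)
           return 1;
       fprintf(msgfile, "%s %s\n", outname, request);
       fclose(msgfile);

       /* (j - m) the application server reads the request, writes the
          HTML page into the output file, and closes it.  The real
          interface waits for that; the synchronization is elided here. */

       /* (n) copy the finished page to stdout, which the shell's stdlist
          carries back to the web server and on to the browser */
       outfile = fopen(outname, "r");
       if (outfile == NULL)
           return 1;
       while (fgets(line, sizeof(line), outfile) != NULL)
           fputs(line, stdout);

       /* (o, p) done with the scratch file: delete it and exit */
       fclose(outfile);
       remove(outname);
       return 0;
   }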

If instead of running an application server you have a plain old garden
variety program that just reads the web form data from $STDIN and writes
to $STDLIST, you lose the persistent DBOPEN that the application server
approach gives you, and the DBOPEN and DBCLOSE required for each browser
interaction take up a lot of system resources and take a long time to
execute.

In some respects the single-program approach is simpler.  Unfortunately,
it is usually slower and requires more system resources.  In brief, its
operation is (a rough C sketch follows the list):
   1) Web server launches shell
   2) Shell script launches program
   3) Program reads data and decides what to do (via $STDIN)
   4) Program opens database
   5) Program processes against database
   6) Program closes database
   7) Program writes output to web server (via $STDLIST)
   8) Program terminates
   9) Shell script ends
   10) Shell terminates
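
The same lifecycle in rough C form.  The db_open and db_close routines
below are stand-ins for DBOPEN and DBCLOSE (not the real intrinsics), and
"STORE" is a made-up database name; the point is simply that both calls
run on every single hit.

   #include <stdio.h>

   /* Stand-ins for DBOPEN and DBCLOSE -- in the real program these are
      the database intrinsics, and they run on every single request. */
   static int db_open(const char *base)
   {
       fprintf(stderr, "DBOPEN %s\n", base);
       return 1;
   }

   static void db_close(int db)
   {
       fprintf(stderr, "DBCLOSE %d\n", db);
   }

   /* The entire life of one standalone CGI request (steps 3 through 8). */
   int main(void)
   {
       char form[4096];
       size_t n = fread(form, 1, sizeof(form) - 1, stdin);   /* step 3 */
       int db;

       form[n] = '\0';
       db = db_open("STORE");                                /* step 4 */
       printf("Content-type: text/html\n\n");                /* steps 5, 7 */
       printf("<HTML><BODY>Received %lu bytes of form data.</BODY></HTML>\n",
              (unsigned long)n);
       db_close(db);                                         /* step 6 */
       return 0;                                             /* step 8 */
   }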

So, assuming we are talking about an old BASIC application that I rewrote
in SPLash!, what do we have for overhead compared to the BASIC application
(for a single transaction)?

                                    BASIC  Web App    Web
                                            Server  Standalone
   Process creations:................         2        2
   Process terminations:.............         2        2
   Temporary files created:.......... 1                1
   Permanent files created:..........         1
   Temporary files deleted:.......... 1                1
   Permanent files deleted:..........         1
   File opens:....................... 1       12       7
   File closes:...................... 1       12       7
   DBOPENs:..........................                  1
   DBCLOSES:.........................                  1
   Reads shell script?:..............         Y        Y
   Reads of input data:.............. 1       2        1
   Writes of input data:.............         2        1
   Reads of output data:............. 1       2        1
   Writes of output data:............         2        1
   Reads of transaction hold:........ 1       1        1
   Writes of transaction hold data:.. 1       1        1
      Note:  creations of system files (stdin, stdlist, stderr) not counted.

Gee, the web applications sure have to do a lot more, don't they?  Also,
look at the number of file opens and closes and process creations and
terminations - all CPU and resource gobblers.

Now, if you wrote your own special-purpose web server to handle
transactions from start to finish, performance could skyrocket.  Or, the
web server could keep a pool of application server processes active as
son processes and pass the client form data directly to the proper son
(the one that acts as the application server for that client's
application), with the son process keeping the database open all the time
and simply suspending when it has finished processing a transaction.
That alone would eliminate a lot of overhead.  Something like the
following (a rough sketch in C follows the list):

   1) Webserver gets input from client.

   2) Web server maps the target URL to a son process (an application
      server process running as a son process under the web server).

   3) Web server writes the client's input to the stdin of the proper son
      process.

   4) Son process already has DB opened, processes transaction.

   5) Son writes output page to its stdlist, then waits for next transaction.

   6) Webserver transfers output page data from son's stdlist to the web
      client.
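
A rough C sketch of such a son process.  The one-request-per-line
protocol on stdin, and the notion that the database is opened once at
startup, are assumptions for illustration - the real mechanism would be
whatever the web server uses to talk to its sons.

   #include <stdio.h>
   #include <string.h>

   int main(void)
   {
       char request[4096];

       /* DBOPEN would happen here, once, when the son is created */

       /* (3) read what the web server writes to our stdin, one
          transaction per line in this sketch */
       while (fgets(request, sizeof(request), stdin) != NULL) {
           request[strcspn(request, "\n")] = '\0';

           /* (4) process the transaction against the already-open DB */

           /* (5) write the output page, then wait for the next one */
           printf("Content-type: text/html\n\n");
           printf("<HTML><BODY>Processed: %s</BODY></HTML>\n", request);
           fflush(stdout);
       }

       /* web server closed our stdin: shut down, closing the DB once */
       return 0;
   }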

Gee, wouldn't that cut the level of complexity down a few notches!

Oh well, take care,

John

--------------------------------------------------------------
John Korb                            email: [log in to unmask]
Innovative Software Solutions, Inc.

The thoughts, comments, and opinions expressed herein are mine
and do not reflect those of my employer(s), or anyone else.
