HP3000-L Archives

April 1997, Week 3

HP3000-L@RAVEN.UTC.EDU

From: Jeff Vance
Date: Mon, 21 Apr 1997 16:26:50 -0700

Hi all,

Based on Stan's suggestion and the real need that Java has to support
filenames with a '$', here is the beginning of a dialog on what it will
take to support more special characters in POSIX filenames.

POSIX requires a filename to accept A-Z, a-z, 0-9, "_", "-" and ".", with
the rule that the name cannot begin with a "-".  So far this is all we have
implemented in MPE.
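
As a rough illustration of the rule just described, here is a minimal Python
sketch of a check for one filename component. The names PORTABLE_CHARS and
is_portable_name are made up for this example; this is not MPE code.

```python
import string

# Characters named in the paragraph above: A-Z, a-z, 0-9, "_", "-" and ".".
PORTABLE_CHARS = set(string.ascii_letters + string.digits + "_-.")


def is_portable_name(name: str) -> bool:
    """Return True if `name` is one filename component that uses only the
    characters listed above and does not begin with "-"."""
    if not name or name.startswith("-"):
        return False
    return all(ch in PORTABLE_CHARS for ch in name)
```

With this check a name such as "Foo$Bar" is rejected, which is the Java case
that motivates the discussion.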

I have saved the private email conversations with a number of folks on this
list and will use them as a starting point for this discussion.  Below
are my thoughts on extending POSIX filenames, in no particular order:

----------------------------------------------------------------------

I believe that the extra filename characters should be always available.
(I don't want to re-visit the infamous posix "switch" scenarios).  That is,
a system or system manager does not somehow "enable" extended characters
in posix names -- they are just always available after release x.y.

I believe that the CI should be able to access any name that the shell can
via the MPE-escaped syntax.  IOWs this is not something that should be
implemented in the shell or posix utilities, but rather in the generic
name parsing and pattern matching NL code.

I think we need to consider backwards compatibility carefully because
there are scenarios where we could easily break a script or program that
used to work.  Some examples are shown below.

I think that all printable chars (except maybe very few) should be allowed
in a filename.  This is necessary to make MPE an easier porting target.
I believe this, even if it makes selecting files with "unusual" chars
difficult via the CI or shell.

If a script or program accepts a filename from the user, say, via script
parameters, then extra robustness may be necessary when referencing this name.
Eg. the passed in filename may contain the quote character.  This adds
more weight to the need for the CI to provide some kind of quoting function.

Today, CM commands are parsed differently from NM commands.  Filenames
supplied to CM commands are initially parsed by the old MYCOMMAND intrinsic.
This implies that an extended name character may already be in use as a
token delimiter.  Eg. the delim passed to MYCOMMAND may be "=;," yet the
filename passed in may be "./fee;fie,foo=bar".
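
To make the delimiter clash concrete, here is a small Python sketch of a
splitter that, like the behavior described above, knows nothing about escaping.
The function split_on_delims is hypothetical and does not reproduce the real
MYCOMMAND interface; it only shows what happens when name characters are also
delimiters.

```python
def split_on_delims(text: str, delims: str) -> list[str]:
    """Split `text` at every character found in `delims`.

    There is no escaping or quoting, so a delimiter inside a filename
    always ends the current token.
    """
    tokens, current = [], []
    for ch in text:
        if ch in delims:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


# The single intended filename is cut into four tokens.
print(split_on_delims("./fee;fie,foo=bar", "=;,"))
# -> ['./fee', 'fie', 'foo', 'bar']
```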

We have a central routine that does simple pattern matching.  However this
routine, without modification, cannot distinguish between the pattern
"F@X#" and a file with that exact name.

The above few paragraphs point strongly to supporting an "escape" character
that ignores (or escapes) the meaning of a special (or meta) character. Eg:
   LISTFILE ./f@     -- should list all files beginning with "f", however,
   LISTFILE ./f\@    -- should only list the file named "f@".
In the second example "\" is the escape character.
The point is that if we support an escape character it becomes pervasive
in the code (CI and intrinsics) associated with command line parsing,
filename parsing and pattern matching.
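
A minimal Python sketch of an escape-aware matcher follows. It assumes
simplified wildcard meanings ("@" matches any run of characters, "?" any one
character, "#" any one digit) and "\" as the escape character; the real MPE
pattern rules are more detailed, and pattern_to_regex and matches are names
invented for this example.

```python
import re


def pattern_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    A character preceded by `escape` is always taken literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped: literal char
            i += 2
            continue
        if ch == "@":
            parts.append(".*")       # assumed: zero or more of any char
        elif ch == "?":
            parts.append(".")        # assumed: exactly one char
        elif ch == "#":
            parts.append("[0-9]")    # assumed: exactly one digit
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole of `name` matches `pattern`."""
    return pattern_to_regex(pattern).fullmatch(name) is not None
```

Under these assumptions matches("f@", "foo") is true, while matches(r"f\@", "foo")
is false and matches(r"f\@", "f@") is true, which mirrors the two LISTFILE
examples.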


Issues:
-------
1) intrinsics like FLABELINFO, FOPEN, FRENAME, (more) expect a delimited
filename and support the MPE-escaped syntax.  Today an application can
FOPEN("./abc$zz ") and open ./abc.  Tomorrow the $ would be considered part
of the name so fopen would try to open "./abc$zz" (assuming blank is still
not a legal name char).  Is this OK, since most applications use space, null
or cr as the filename terminator?  However, I know that the shell uses a
"$" as a name terminator.  (Yes, this makes the Java work more interesting!
And in MKS' defense, we recommended that char to them, for some reason?)
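
The change in issue 1 can be sketched in Python as a scan that stops at the
first character outside the legal set. The TODAY and TOMORROW sets and the
scan_name function are assumptions for illustration ("/" is included so that
a path can be scanned); they are not the intrinsics' real parsing code.

```python
import string

# Assumed legal characters today, plus "/" as the path separator.
TODAY = set(string.ascii_letters + string.digits + "_-./")
# Hypothetical extended set that also accepts "$".
TOMORROW = TODAY | {"$"}


def scan_name(buffer: str, legal: set) -> str:
    """Return the leading run of legal characters in `buffer`.

    The first character outside `legal` acts as the name terminator.
    """
    end = 0
    while end < len(buffer) and buffer[end] in legal:
        end += 1
    return buffer[:end]


print(scan_name("./abc$zz ", TODAY))     # -> ./abc
print(scan_name("./abc$zz ", TOMORROW))  # -> ./abc$zz
```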

2) The CI has to parse out a filename from the command line to do I/O
redirection.  Today,
   :echo abc>./def$hij
writes "abc$hij" to file ./def. Tomorrow it would write "abc" to the file
./def$hij.  Is this OK?
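
A toy Python model of the redirection case is shown here. It assumes that the
CI pulls the run of legal name characters after ">" out of the command line and
leaves whatever follows as part of the command text; split_redirect and BASE
are invented names, and the real CI algorithm may differ.

```python
import string

# Assumed legal name characters today, plus "/" for paths.
BASE = set(string.ascii_letters + string.digits + "_-./")


def split_redirect(command_text: str, legal: set):
    """Return (remaining_text, target_file) for the first ">" redirect.

    The target is the run of legal characters after ">"; anything left
    over is appended back onto the command text.
    """
    head, sep, rest = command_text.partition(">")
    if not sep:
        return command_text, None
    end = 0
    while end < len(rest) and rest[end] in legal:
        end += 1
    return head + rest[end:], rest[:end]


print(split_redirect("abc>./def$hij", BASE))          # -> ('abc$hij', './def')
print(split_redirect("abc>./def$hij", BASE | {"$"}))  # -> ('abc', './def$hij')
```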

3) The CI and shell interpret certain characters as metachars, like "@",
"?", "#", "[", "]", "<", ">", ">>", "!", "![", "!!", etc. Assuming these
chars will be legal for a posix filename, there needs to be a way to treat
the character as a simple char - not a metachar.  This is usually called
"escaping" or quoting.

Using single or double quotemarks to escape the meaning of a metachar seems
reasonable except that MYCOMMAND knows nothing special about quotes, and the
NM parser allows all command line tokens to be quoted.  Eg.
  :PRINT abc   and  :PRINT "abc"
are equivalent today.  So, today,
  :LISTFILE './ab@z', 2
shows all files beginning with "ab" and ending with "z".  Tomorrow, the
same command could show the single file named "./ab@z", if we want to define
the quotes as being significant.

Using an explicit escape character, like "\", also seems reasonable.  However,
"\" could be the filename delimiter used in a call to FOPEN.  "\" could
be the delimiter seen by the CI in i/o redirection filename extraction.
Any code that calls MYCOMMAND passing in filenames with a "\" in the delimiter
list may break.

It should be noted that an escape character is a CI or shell concept, NOT
part of a filename syntax.  I don't want FOPEN, HPFOPEN, etc. to recognize
an escape character; however, 3rd party products will probably want to support
the same escape character.

Common uses of an escape character in the CI would be if the target filename
contained:
  wildcard chars,
  either quote char,
  command line token delimiters (comma, semicolon, space, equalsign),
  variable dereferencing char (!, ![...], !!, etc),
  I/O redirection symbols,
  parentheses
and probably more.
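
One possible shape for a quoting helper covering the characters listed above
is sketched below in Python. The CI_SPECIAL set is an assumption built from
that list (plus the escape character itself), and escape_name is a hypothetical
name, not an existing CI function.

```python
# Assumed set of characters the CI would treat specially: wildcards,
# both quote characters, token delimiters, "!", redirection symbols,
# brackets, parentheses, and the escape character itself.
CI_SPECIAL = set("@?#[]<>!\"',;= ()\\")


def escape_name(name: str, escape: str = "\\") -> str:
    """Prefix every special character in `name` with `escape`."""
    return "".join(escape + ch if ch in CI_SPECIAL else ch for ch in name)


print(escape_name("./fee;fie,foo=bar"))  # -> ./fee\;fie\,foo\=bar
```

A script that accepts a filename as a parameter could pass it through such a
helper before building a command line from it.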

4) There are some internal issues: if we teach our filename parser to
recognize an escape character (like "\") then we need to know what to do
with all the "\"s in the filename.  If we strip them out and then pass the
filename to our pattern matching routine, the "\@" becomes "@" and this
will not produce the correct result.

OTOH, if we leave the "\"s in the filename and teach our pattern matcher about
escaped chars, then we need to tell the directory code to strip them out
before accessing the directory, which is contrary to the directory philosophy.
We can have our filename parser return both names (with and without "\"s) and
use the correct version for various operations.  This seems ok, but affects
lots of code (both NM and CM)!
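
The "return both names" idea can be sketched in Python as follows. The function
parse_escaped is hypothetical and assumes "\" as the escape character: the
first result keeps the escapes for pattern matching, the second has them
stripped for directory access.

```python
def parse_escaped(name: str, escape: str = "\\"):
    """Return (pattern_form, directory_form) for an escaped filename.

    pattern_form is the name as typed, escapes intact; directory_form
    has each escape removed and the escaped character kept literally.
    """
    literal = []
    i = 0
    while i < len(name):
        if name[i] == escape and i + 1 < len(name):
            literal.append(name[i + 1])  # keep the escaped char only
            i += 2
        else:
            literal.append(name[i])
            i += 1
    return name, "".join(literal)


pattern_form, directory_form = parse_escaped("./f\\@")
print(pattern_form)    # -> ./f\@
print(directory_form)  # -> ./f@
```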


Jeff Vance, CSY

