HP3000-L Archives

April 1997, Week 4

HP3000-L@RAVEN.UTC.EDU

Subject:
From:
Reply To:
Date: Wed, 23 Apr 1997 14:49:36 +0530
Content-Type: text/plain
Jeff Vance wrote:
>
> Hi all,
>
> Based on Stan's suggestion and the real need that Java has to support
> filenames with a '$', here is the beginning of a dialog on what it will
> take to support more special characters in POSIX filenames.
>
> POSIX requires a filename to accept A-Z, a-z, 0-9, "_", "-" and ".", with
> the rule that the name cannot begin with a "-".  So far this is all we have
> implemented in MPE.
>
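
To make the current rule concrete, here is a minimal sketch in Python of a check for a single filename component against the character set Jeff lists. The function name is hypothetical and this is not MPE code; it only restates the rule quoted above.

```python
import re

# Character set quoted above: A-Z, a-z, 0-9, "_", "-" and ".",
# with the extra rule that a name may not begin with "-".
# is_portable_name is a hypothetical helper, not an MPE routine.
_PORTABLE = re.compile(r"[A-Za-z0-9_.][A-Za-z0-9_.-]*")

def is_portable_name(name: str) -> bool:
    """Return True if `name` is a legal single component under that rule."""
    return _PORTABLE.fullmatch(name) is not None
```

Under this rule a name such as "abc$zz" is rejected, which is exactly the restriction the Java port runs into.
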
> I have saved the private email conversations with a number of folks on this
> list and will use them as a starting point for this discussion.  Below
> are my thoughts on extending POSIX filenames, in no particular order:
>
> ----------------------------------------------------------------------
>
> I believe that the extra filename characters should be always available.
> (I don't want to re-visit the infamous posix "switch" scenarios).  That is,
> a system or system manager does not somehow "enable" extended characters
> in posix names -- they are just always available after release x.y
>
> I believe that the CI should be able to access any name that the shell can
> via the MPE-escaped syntax.  IOWs this is not something that should be
> implemented in the shell or posix utilities, but rather in the generic
> name parsing and pattern matching NL code.
>
> I think we need to consider backwards compatibility carefully because
> there are scenarios where we could easily break a script or program that used to
> work.  Some examples are shown below.
>
> I think that all printable chars (except maybe very few) should be allowed
> in a filename.  This is necessary to make MPE an easier porting target.
> I believe this, even if it makes selecting files with "unusual" chars
> difficult via the CI or shell.
>
> If a script or program accepts a filename from the user, say, via script
> parameters, then extra robustness may be necessary when referencing this name.
> Eg. the passed in filename may contain the quote character.  This adds
> more weight to the need for the CI to provide some kind of quoting function.
>
> Today, CM commands are parsed differently from NM commands.  Filenames
> supplied to CM commands are initially parsed by the old MYCOMMAND intrinsic.
> This implies that an extended name character may be being used as the
> token delimiter.  Eg. the delim passed to MYCOMMAND may be "=;," yet the
> filename passed in may be "./fee;fie,foo=bar".
>
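
The delimiter clash is easy to see with a naive tokenizer. The Python sketch below is only a model of delimiter-based splitting, not MYCOMMAND's actual behaviour; the function name is made up for illustration.

```python
def split_on_delims(text: str, delims: str) -> list[str]:
    """Split `text` at every character found in `delims`.

    A simplified stand-in for a delimiter-driven command parser;
    it knows nothing about filenames, quotes or escapes.
    """
    tokens = []
    current = []
    for ch in text:
        if ch in delims:
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens

# The single filename from the example above comes back as four tokens.
print(split_on_delims("./fee;fie,foo=bar", "=;,"))
```

With the delimiter list "=;," the name "./fee;fie,foo=bar" is split into "./fee", "fie", "foo" and "bar", so the parser never sees it as one filename.
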
> We have a central routine that does simple pattern matching.  However this
> routine, without modification, cannot distinguish between the pattern
> "F@X#" and a file with that exact name.
>
> The above few paragraphs point strongly to supporting an "escape" character
> that ignores (or escapes) the meaning of a special (or meta) character. Eg:
>    LISTFILE ./f@     -- should list all files beginning with "f", however,
>    LISTFILE ./f\@    -- should only list the file named "f@".
> In the second example "\" is the escape character.
> The point is that if we support an escape character it becomes pervasive
> in the code (CI and intrinsics) associated with command line parsing,
> filename parsing and pattern matching.
>
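
A rough Python sketch of what an escape-aware pattern matcher might look like. The wildcard semantics here are simplified assumptions ("@" matches any run of characters, "?" any single character, "#" a single digit), and the function names are hypothetical; this is not the central MPE routine.

```python
import re

def pattern_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed, simplified semantics: "@" = any run of characters,
    "?" = any one character, "#" = one digit. A character preceded
    by the escape character is always taken literally.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped: literal
            i += 2
            continue
        if ch == "@":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch == "#":
            out.append("[0-9]")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out))

def matches(pattern: str, name: str) -> bool:
    return pattern_to_regex(pattern).fullmatch(name) is not None
```

With this sketch, matches("f@", "foo") is true, while matches("f\\@", "foo") is false and matches("f\\@", "f@") is true. Without escapes, the pattern "F@X#" cannot select only the file literally named "F@X#"; with them, "F\\@X\\#" can.
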
> Issues:
> -------
> 1) intrinsics like FLABELINFO, FOPEN, FRENAME, (more) expect a delimited
> filename and support the MPE-escaped syntax.  Today an application can
> FOPEN("./abc$zz ") and open ./abc.  Tomorrow the $ would be considered part
> of the name so fopen would try to open "./abc$zz" (assuming blank is still
> not a legal name char).  Is this OK, since most applications use space, null
> or cr as the filename terminator?  However, I know that the shell uses a
> "$" as a name terminator.  (yes this makes the Java work more interesting!,
> and in MKS' defense, we recommended that char to them, for some reason?)
>
> 2) The CI has to parse out a filename from the command line to do I/O
> redirection.  Today,
>    :echo abc>./def$hij
> writes "abc$hij" to file ./def. Tomorrow it would write "abc" to the file
> ./def$hij.  Is this OK?
>
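
The change in behaviour can be modelled with a small Python sketch. It assumes the redirection target simply ends at the first character that is not a legal name character, with the remainder staying in the command text, as the example above describes. The names and character sets are illustrative, not the CI's actual code.

```python
import string

# Legal name characters "today" (plus "/" and "." for paths) and a
# hypothetical "tomorrow" set that also accepts "$".
TODAY = string.ascii_letters + string.digits + "_-./"
TOMORROW = TODAY + "$"

def split_redirect(cmd: str, name_chars: str):
    """Return (command_text, output_file) for a command containing ">".

    The filename is the longest run of legal name characters after ">";
    anything after that run stays part of the command text.
    """
    head, sep, rest = cmd.partition(">")
    if not sep:
        return cmd, None
    n = 0
    while n < len(rest) and rest[n] in name_chars:
        n += 1
    return head + rest[n:], rest[:n]

print(split_redirect("echo abc>./def$hij", TODAY))
print(split_redirect("echo abc>./def$hij", TOMORROW))
```

With the TODAY set the command text is "echo abc$hij" and the file is "./def"; with "$" added the command text is "echo abc" and the file is "./def$hij".
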
> 3) The CI and shell interpret certain characters as metachars, like "@",
> "?", "#", "[", "]", "<", ">", ">>", "!", "![", "!!", etc. Assuming these
> chars will be legal for a posix filename, there needs to be a way to treat
> the character as a simple char - not a metachar.  This is usually called
> "escaping" or quoting.
>
> Using single or double quotemarks to escape the meaning of a metachar seems
> reasonable except that MYCOMMAND knows nothing special about quotes, and the
> NM parser allows all command line tokens to be quoted.  Eg.
>   :PRINT abc   and  :PRINT "abc"
> are equivalent today.  So, today,
>   :LISTFILE './ab@z', 2
> shows all files beginning with "ab" and ending with "z".  Tomorrow, the
> same command could show the single file named "./ab@z", if we want to define
> the quotes as being significant.
>
> Using an explicit escape character, like "\", also seems reasonable.  However,
> "\" could be the filename delimiter used in a call to FOPEN.  "\" could
> be the delimiter seen by the CI in i/o redirection filename extraction.
> Any code that calls MYCOMMAND passing in filenames with a "\" in the
> delimiter list may break.
>
> It should be noted that an escape character is a CI or shell concept, NOT
> part of a filename syntax.  I don't want FOPEN, HPFOPEN, etc. to recognize
> an escape character; however, 3rd party products will probably want to
> support the same escape character.
>
> Common uses of an escape character in the CI would be if the target filename
> contained:
>   wildcard chars,
>   either quote char,
>   command line token delimiters (comma, semicolon, space, equalsign),
>   variable dereferencing char (!, ![...], !!, etc),
>   I/O redirection symbols,
>   parentheses
> and probably more.
>
> 4) There are some internal issues: if we teach our filename parser to
> recognize an escape character (like "\") then we need to know what to do
> with all the "\"s in the filename.  If we strip them out and then pass the
> filename to our pattern matching routine, the "\@" becomes "@" and this
> will not produce the correct result.
>
> OTOH, if we leave the "\"s in the filename, teach our pattern matcher about
> escaped chars, then we need to tell the directory code to strip them out
> before accessing the directory, which is contrary to the directory philosophy.
> We can have our filename parser return both names (with and without "\"s) and
> use the correct version for various operations.  This seems ok, but affects
> lots of code (both NM and CM)!
>

This can be solved by making the escape character apply only to a
metacharacter that immediately follows it. Thus:
        abc\@   would mean abc@
while
        abc@    would retain its original meaning
        abc\a   would mean abc\a
so the "\" character can continue to be used as a delimiter.
To represent abc\@, use abc\\@.
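
A minimal Python sketch of this rule, under some assumptions of mine: the metacharacter set is cut down to the three wildcards, "\\" is read as a literal backslash that does not escape what follows, and all names are hypothetical. It also produces the two forms Jeff mentions in issue 4: the name as the directory would see it, and a flag per character saying whether it is still a wildcard.

```python
METACHARS = set("@?#")   # assumed subset; the real list would be longer
ESCAPE = "\\"

def tokenize(name: str):
    """Split a name into (char, is_literal) pairs.

    A backslash escapes only a following metacharacter or another
    backslash; before any other character it is an ordinary character.
    """
    tokens = []
    i = 0
    while i < len(name):
        ch = name[i]
        nxt = name[i + 1] if i + 1 < len(name) else ""
        if ch == ESCAPE and nxt and (nxt in METACHARS or nxt == ESCAPE):
            tokens.append((nxt, True))       # escaped: plain character
            i += 2
        else:
            tokens.append((ch, ch not in METACHARS))
            i += 1
    return tokens

def directory_form(name: str) -> str:
    """The name with escapes removed, as the directory code would see it."""
    return "".join(ch for ch, _ in tokenize(name))

def has_wildcards(name: str) -> bool:
    """True if any unescaped metacharacter remains."""
    return any(not literal for _, literal in tokenize(name))
```

Here directory_form("abc\\@") gives "abc@" with no wildcards left, "abc@" still counts as a pattern, and "abc\\a" passes through unchanged, so a "\" used as a delimiter before ordinary characters is not disturbed.
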


> Jeff Vance, CSY
>
> --

--
