Jeff Vance wrote:
>
> Hi all,
>
> Based on Stan's suggestion and the real need that Java has to support
> filenames with a '$', here is the beginning of a dialog on what it will
> take to support more special characters in POSIX filenames.
>
> POSIX requires a filename to accept A-Z, a-z, 0-9, "_", "-" and ".", with
> the rule that the name cannot begin with a "-". So far this is all we have
> implemented in MPE.
>
> I have saved the private email conversations with a number of folks on this
> list and will use them as a starting point for this discussion. Below
> are my thoughts on extending POSIX filenames, in no particular order:
>
> ----------------------------------------------------------------------
>
> I believe that the extra filename characters should be always available.
> (I don't want to re-visit the infamous posix "switch" scenarios). That is,
> a system or system manager does not somehow "enable" extended characters
> in posix names -- they are just always available after release x.y.
>
> I believe that the CI should be able to access any name that the shell can
> via the MPE-escaped syntax. IOWs this is not something that should be
> implemented in the shell or posix utilities, but rather in the generic
> name parsing and pattern matching NL code.
>
> I think we need to consider backwards compatibility carefully because
> there are scenarios where we could easily break a script or program that used to
> work. Some examples are shown below.
>
> I think that all printable chars (except maybe very few) should be allowed
> in a filename. This is necessary to make MPE an easier porting target.
> I believe this, even if it makes selecting files with "unusual" chars
> difficult via the CI or shell.
>
> If a script or program accepts a filename from the user, say, via script
> parameters, then extra robustness may be necessary when referencing this name.
> Eg. the passed in filename may contain the quote character. This adds
> more weight to the need for the CI to provide some kind of quoting function.
>
> Today, CM commands are parsed differently from NM commands. Filenames
> supplied to CM commands are initially parsed by the old MYCOMMAND intrinsic.
> This implies that an extended name character may be being used as the
> token delimiter. Eg. the delim passed to MYCOMMAND may be "=;," yet the
> filename passed in may be "./fee;fie,foo=bar".
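A minimal sketch of the delimiter problem, in Python purely for
illustration. The function split_on_delims is a hypothetical stand-in
for a delimiter-driven parser; it is not MYCOMMAND and makes no claim
about how that intrinsic is actually implemented.

```python
def split_on_delims(text, delims):
    """Split text at every character found in delims.

    Hypothetical stand-in for a delimiter-driven command parser.
    """
    tokens, current = [], []
    for ch in text:
        if ch in delims:
            # A delimiter ends the current token, even inside a filename.
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


print(split_on_delims("./fee;fie,foo=bar", "=;,"))
# ['./fee', 'fie', 'foo', 'bar']
```

The single filename comes back as four tokens, which is the breakage
described above.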
>
> We have a central routine that does simple pattern matching. However this
> routine, without modification, cannot distinguish between the pattern
> "F@X#" and a file with that exact name.
>
> The above few paragraphs point strongly to supporting an "escape" character
> that ignores (or escapes) the meaning of a special (or meta) character. Eg:
> LISTFILE ./f@ -- should list all files beginning with "f", however,
> LISTFILE ./f\@ -- should only list the file named "f@".
> In the second example "\" is the escape character.
> The point is that if we support an escape character it becomes pervasive
> in the code (CI and intrinsics) associated with command line parsing,
> filename parsing and pattern matching.
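To make the LISTFILE example concrete, here is a rough Python sketch of
a pattern matcher that honours an escape character. The wildcard
meanings are simplified assumptions for this sketch ("@" = any run of
characters, "?" = any one character, "#" = any one digit), and the
names pattern_to_regex and matches are made up. In this version the
escape makes whatever character follows it literal, which is only one
of the possible choices.

```python
import re


def pattern_to_regex(pattern, escape="\\"):
    """Translate a wildcard pattern into a compiled regex.

    Assumed, simplified wildcard meanings:
      @  any run of characters, including none
      ?  any single character
      #  any single digit
    The escape character makes the next character literal.
    """
    wild = {"@": ".*", "?": ".", "#": "[0-9]"}
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Escaped character: match it literally, skip both chars.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        out.append(wild.get(ch, re.escape(ch)))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def matches(pattern, name):
    """True if the whole name matches the pattern."""
    return pattern_to_regex(pattern).fullmatch(name) is not None


names = ["f", "foo", "f@", "g@"]
print([n for n in names if matches("f@", n)])    # ['f', 'foo', 'f@']
print([n for n in names if matches("f\\@", n)])  # ['f@']
```

With the escape, the pattern F\@X\# matches only the file named F@X#,
while the unescaped F@X# still acts as a pattern.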
>
> Issues:
> -------
> 1) intrinsics like FLABELINFO, FOPEN, FRENAME, (more) expect a delimited
> filename and support the MPE-escaped syntax. Today an application can
> FOPEN("./abc$zz ") and open ./abc. Tomorrow the $ would be considered part
> of the name so fopen would try to open "./abc$zz" (assuming blank is still
> not a legal name char). Is this OK, since most applications use space, null
> or cr as the filename terminator? However, I know that the shell uses a
> "$" as a name terminator. (Yes, this makes the Java work more interesting!
> And in MKS' defense, we recommended that char to them, for some reason.)
>
> 2) The CI has to parse out a filename from the command line to do I/O
> redirection. Today,
> :echo abc>./def$hij
> writes "abc$hij" to file ./def. Tomorrow it would write "abc" to the file
> ./def$hij. Is this OK?
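Issues 1 and 2 both come down to where the scan for the end of a name
stops. A small Python sketch of that, under stated assumptions: the
legal character sets, scan_name and split_redirect are all hypothetical,
"/" is added to the legal set so a path scans as one unit, and the rule
about a leading "-" is ignored. It does not reflect the real FOPEN or CI
code.

```python
import string

# Characters accepted in a name today, per the list quoted earlier, plus
# "/" so that paths scan as one unit (an assumption of this sketch).
LEGAL_TODAY = set(string.ascii_letters + string.digits + "_-./")
LEGAL_TOMORROW = LEGAL_TODAY | {"$"}


def scan_name(text, legal):
    """Return the longest leading run of legal name characters."""
    end = 0
    while end < len(text) and text[end] in legal:
        end += 1
    return text[:end]


def split_redirect(command, legal):
    """Split 'text>target' into (text written, target file).

    Whatever follows the scanned file name is treated as part of the
    text, mirroring the behaviour described in issue 2.
    """
    text, _, rest = command.partition(">")
    target = scan_name(rest, legal)
    return text + rest[len(target):], target


print(scan_name("./abc$zz ", LEGAL_TODAY))              # ./abc
print(scan_name("./abc$zz ", LEGAL_TOMORROW))           # ./abc$zz
print(split_redirect("abc>./def$hij", LEGAL_TODAY))     # ('abc$hij', './def')
print(split_redirect("abc>./def$hij", LEGAL_TOMORROW))  # ('abc', './def$hij')
```

Adding one character to the legal set silently changes which file is
opened or written, with no error to warn the caller.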
>
> 3) The CI and shell interpret certain characters as metachars, like "@",
> "?", "#", "[", "]", "<", ">", ">>", "!", "![", "!!", etc. Assuming these
> chars will be legal for a posix filename, there needs to be a way to treat
> the character as a simple char - not a metachar. This is usually called
> "escaping" or quoting.
>
> Using single or double quotemarks to escape the meaning of a metachar seems
> reasonable except that MYCOMMAND knows nothing special about quotes, and the
> NM parser allows all command line tokens to be quoted. Eg.
> :PRINT abc and :PRINT "abc"
> are equivalent today. So, today,
> :LISTFILE './ab@z', 2
> shows all files beginning with "ab" and ending with "z". Tomorrow, the
> same command could show the single file named "./ab@z", if we want to define
> the quotes as being significant.
>
> Using an explicit escape character, like "\", also seems reasonable. However,
> "\" could be the filename delimiter used in a call to FOPEN. "\" could
> be the delimiter seen by the CI in i/o redirection filename extraction.
> Any code that calls MYCOMMAND passing in filenames with a "\" in the
> delimiter list may break.
>
> It should be noted that an escape character is a CI or shell concept, NOT
> part of a filename syntax. I don't want FOPEN, HPFOPEN, etc. to recognize
> an escape character; however, 3rd party products will probably want to
> support the same escape character.
>
> Common uses of an escape character in the CI would be if the target filename
> contained:
> wildcard chars,
> either quote char,
> command line token delimiters (comma, semicolon, space, equalsign),
> variable dereferencing char (!, ![...], !!, etc),
> I/O redirection symbols,
> parentheses
> and probably more.
>
> 4) There are some internal issues: if we teach our filename parser to
> recognize an escape character (like "\") then we need to know what to do
> with all the "\"s in the filename. If we strip them out and then pass the
> filename to our pattern matching routine, the "\@" becomes "@" and this
> will not produce the correct result.
>
> OTOH, if we leave the "\"s in the filename, teach our pattern matcher about
> escaped chars, then we need to tell the directory code to strip them out
> before accessing the directory, which is contrary to the directory philosophy.
> We can have our filename parser return both names (with and without "\"s) and
> use the correct version for various operations. This seems ok, but affects
> lots of code (both NM and CM)!
>
This can be solved by making the escape apply only to a metacharacter
that immediately follows it. Thus:
  abc\@   would mean the literal name abc@
  abc@    would retain its original meaning
  abc\a   would mean abc\a, since "a" is not a metacharacter
That way the "\" character can continue to be used as a delimiter.
So, to represent abc\@, use abc\\@.
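The rule above can be sketched in Python as follows. This is only one
reading of it: the set of wildcards is an assumed subset, the names
tokenize and show are made up, and "\\" is taken to mean a literal "\"
so that the "@" after it keeps its wildcard meaning.

```python
ESCAPE = "\\"
WILDCARDS = set("@?#")           # assumed subset of the metacharacters
ESCAPABLE = WILDCARDS | {ESCAPE}


def tokenize(pattern):
    """Return (char, is_literal) pairs.

    A backslash escapes only an escapable character that directly
    follows it; before anything else it stays an ordinary character.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        nxt = pattern[i + 1:i + 2]   # "" at the end of the string
        if ch == ESCAPE and nxt in ESCAPABLE:
            tokens.append((nxt, True))
            i += 2
        else:
            tokens.append((ch, ch not in WILDCARDS))
            i += 1
    return tokens


def show(pattern):
    """Render tokens, marking active wildcards with angle brackets."""
    return "".join(c if lit else "<" + c + ">" for c, lit in tokenize(pattern))


print(show("abc\\@"))    # abc@
print(show("abc@"))      # abc<@>
print(show("abc\\a"))    # abc\a
print(show("abc\\\\@"))  # abc\<@>
```

Because a "\" in front of an ordinary character is left alone, existing
names and delimiter lists that contain "\" keep their current meaning.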
> Jeff Vance, CSY
>
> --