URL2Link Rexx Program

/*          ReXX       */
/*  #!/usr/local/bin/rexx  */

/*   Getting this running NOTES:
     ****************************

     For  IBM/Rexx standard interpreters, put
      /*   ReXX  */  as first in this script,
    For UNIX style systems, put the #!/...  'shebang' line
    as first in file, correcting the interpreter
    path/filename as needed.
    For variations from Rexx standard Library functions
    that need corrections (Lines, LineIn), see
    'Function Corrections' comments, way down there.
    These are known to be needed for OS/2 / AS400 interpreters,
    and somewhat current GPL Regina interpreters.
    If you printed this from an html web page and  experience
    traps for unmatched '/*' - '*/' pairs, check for a '*' and '/'
    pair split by an end-of-line, and rejoin them.
    It has been tested to 'auto process' itself into html
    that runs when processed back to plain text.
*/


/*     General NOTES:
       **************

       Based on prototype ReXX filter script
       from 'Teach Yourself ReXX in 21 DAYS'
       By Bill and Esther Schindler
       p. 271-272

       ReXX Language Association:  http://www.rexxla.org

       Delete #!/usr/local/bin/rexx  'shebang' line
       for use on non-UNIX-like systems.
       Make sure /* REXX */ comment is leftmost in
       the topmost line.

       (C) Copyright D.E. Legan,
       You may modify or use this program as you wish,
       as long as you do not remove my credit for
       originally writing it.
       If you make any juicy improvements,
       let me know! :-)
       If you figure out the full implications of the GPL,
       let me know.
       I assume no liability for the results of this program,
       legal or otherwise.

       D.E. Legan   12 Feb. 2001

       mailto:leganii@surfree.com

       http://www.lafn.org/~aw585/index.html

       http://www.lafn.org/~aw585/dlegan.html

       Thanks to Rollin White (mailto:rollin@scoug.com)
       for a few tweaks.

*/


/*   More General Notes:
     *******************

   url2link   URL to HTML converter for text file input
              27 Jan 2001  d.e. legan
   __
   Help:
   ----
   url2link -h

   Typical use, at the command line prompt:
   -----------
   url2link < input.txt  >  output.htm

   List all substitutions made in text and URLs:

   'url2link -Lr',  'url2link -Lx', 'url2link -Ll', or 'url2link -Lxl'
   __

   Purpose:  to do about 90% of the work in converting
   -------   a simple text file to HTML.

   Used 'HTML Visual Quick Reference' by Scharf / pub. Que
   as a reference
   __

   To 'auto process':  url2link -r url2link
   -----------------           or
                       url2link -r url2link.cmd
   This should prepare it fairly cleanly for further
   processing towards conversion of this source text to HTML.
   See comments on 'self processing' below in the source for
   specific changes to make this come off neatly.
   This is probably a good test of handling special,
   weird cases.
   __

   *Known* Limitations:
    ------------------
   0) The first line, and the 'Lines' routine may have
      to be changed depending on OS you are using.
   1) All of URL must be on one line, no continuation on next line
   2) URL must have one of the 'schema' i.e. http:, or ftp:
      set up in schemix stem variable.
   3) URL must end with ' ' (space) or End of line,
      with a possible trailing '.', ',' '"', "'",
      or ';' before the space or EOL.
   4) Characters preceeded by % in URLs will be converted
      to %xx, xx = hexidecimal digits, (0-9, A-F) EXCEPT
      spaces and hexidecimal digits themselves.
      (which shouldn't need escaping this way anyway.)
      To embed a a space, you must put in by hand
      '%20' (hex. 32).
      (all '%' escaped hex codes for characters should be
      passed unchanged in URLs)
      To embed a '%' (as might occur in some e-mail addresses
      to other networks that go through gateways)
      use '%%%' (three (3) '%' in a row).
      This escaping should be extremely usefull for embedding
      Get methods inside URLs.
   5) Reserved/Special characters must be in the
      set up in stem variables in subroutine Datastructures.
   6) Might want to add ability to do arbitrary inclusion of files
        or macro substitution.
   7) Some routines currently might be optimized by
      converting tail recursion to iteration.
   8) This was used as tool to learn HTML by me.
      Some of the features may seem foolish in retrospect,
      so bear with me.  :-)

   In short, this is tool that does not completely automate
   the process of converting raw text to HTML, but takes care
   of much of the dog work associated with the process.
   Derived as a simplified version of the earlier
   'url2html' program.
   __

  D.E. Legan   03 Feb. 2001
  mailto:leganii@surfree.com

*/

/***********************************************************************
     Start of Actual Program
 ***********************************************************************/

/*  Since there shouldn't be any program errors,
    bunch up all the traps in one handler.
    This may not do much more than the defaults,
    but it puts in place a skeleton to customize it more
    if it is ever needed.
    D.E.L.  09/Aug./2001
*/

SIGNAL  ON   SYNTAX     NAME   Handler  ;
SIGNAL  ON   NOVALUE    NAME   Handler  ;
SIGNAL  ON   ERROR      NAME   Handler  ;
SIGNAL  ON   FAILURE    NAME   Handler  ;
SIGNAL  ON   HALT       NAME   Handler  ;
/*  SIGNAL  OFF  NOTREADY   NAME   Handler  ;  */
SIGNAL  OFF  NOTREADY    ;

/*
/*  Alternate if ever needed:  */
/*  only these conditions are valid with this type of handler:   */
/*  (one that returns after doing "it's thing")  */
CALL  ON  ERROR      NAME   HandlerR  ;
CALL  ON  FAILURE    NAME   HandlerR  ;
CALL  ON  HALT       NAME   HandlerR  ;
CALL  ON  NOTREADY   NAME   HandlerR  ;
*/

/*  For simplified exposing of these variables in functions:  */
basics  =  'myself STDIN STDOUT STDERR n nshadow  '  ,
             'clearscreen TRUE  FALSE  ON  OFF YES NO  '  ,
              'splitchar oddCharsInURLs dirseperator  helptest ',
              'hiddenstuff.'  ;
/*  say 'basics='basics'=' ;  exit  ;  */

/* find out what the name of this program is:
    (This routine also sets some OS / interpreter specific
    parameters)
*/
myself  =  WhoAmI( )  ;

switches     =  '-hsLrxltnp'  ; /*  switches are on or off, there or not  */
parameters   =  'e'  ;          /*  parameters take arguments  */
options      =  switches  ||  parameters  ;
/*  And now the Biggie:         */

USAGE  =  'USAGE:  'myself' ['options'=X] [[<file.in>|-] [<file.out>|-]]'  ;

/*  ----By convention, 'USAGE' should be near the head of the program   */



/*   Handle command line & switches    ***************************/

PARSE ARG  infile  outfile  scratch  ;
infile  =  Strip( infile )  ;

IF   1 >< Verify( infile, helptest, 'M', 1 )  THEN
  CALL  Help  infile  ;   /*  First check for help   */

/*  Initialize switch flags:  */
PARSE VAR OFF  =1 reservetrans     =1 debugmine  =1 extendedtrans  ,
               =1 backslashtrans   =1 listsubs  ,
               =1 extra            =1 national   =1 punctuation
/*         --  using parse to do a multiple assignment,
               instead of:

reservetrans     =  OFF  ;
debugmine        =  OFF  ;
extendedtrans    =  OFF  ;
backslashtrans   =  OFF  ;
listsubs         =  OFF  ;
 */
splitchar        =  ''   ;
escaper          =  '\' ;

/*  Have made the option processing more ReXXcentric, should only
    require only one pass to process each freestanding
    clump of switches on the
    command line.
    D.E.L.  12/March/2001
    Even more so, only with UNIX twist,
    D.E.L.  14/July/2002
*/
DO  WHILE( GetOpts( infile )   )

  IF '--'  =  infile THEN  /*  End of options with '--'  */
    DO
    PARSE VALUE  outfile scratch  WITH  infile outfile  scratch  ;
    infile  =  Strip( infile )  ;
    Leave  ;
    END  ;

  IF '-'  ==  infile THEN   /*  '-' is a valid infile spec.  */
    Leave  ;

  /*  Start of processing 'parameters', 'options' with 'arguments'  */

  parameter  =  ''  ;

  cutpoint  = Verify( infile, parameters, 'MATCH', 1 )  ;
  IF  0 < cutpoint  THEN
      DO
      PARSE VAR infile    infile =(cutpoint) parameter +1 infile2  ;
      IF  '' << optarg  THEN
        infile  =  infile || infile2  ;
      ELSE
        optarg  =  infile2  ;
      END  ;

  IF  (( '' =  parameter )  &  ( '' >< optarg ))  THEN
    DO
    SAY  'Spurious argument 'optarg', with no parameter specified.'  ;
    infile  =  '-help'  ;
    CALL Help  ;
    END  ;

  IF  (( '' >< parameter )  &  ( '' =  optarg ))  THEN
    DO
    optarg  =  outfile  ;
    outfile   =  ''  ;
    IF  '' = optarg  THEN
      DO
      SAY 'Parameter 'parameter' must have an argument!'  ;
      infile  =  '-help'  ;
      CALL Help  ;
      END   ;
    END  ;   /*  parameter argument location/extraction  */

  CALL  Help  infile  ;  /*  First check for help   */
           /*  this had to be moved to after processing of
               optarg / parameter values.
            */

  IF  '' >< parameter  THEN   /* There is a parameter with argument  */
    SELECT
    /*  Process parameters, with ARGUMENTs   here   */
    /*  Should only be one parameter per option cluster */
    /*  with possible argument following either ':', '=' or a space  */
    WHEN  'e' == parameter  THEN
      DO
      /*  Must handle this before testing for unacceptable options.  */
      escaper  =  optarg  ;
      IF  1 < Length( optarg )  THEN
        DO
        infile  =  '-help'  ;
        CALL Help  ;
        END  ;
      END  ;
    OTHERWISE
      SAY 'Something seems to be wrong with parameter set up.'  ;
      infile  =  '-help'  ;
      CALL Help  ;
    END  ;

  /*  End of processing 'parameters', 'options' with 'arguments'  */

  /*  START of Hidden switch processing   */

  IF  Inside( '?', infile )  THEN     /*  hidden switch   */
    DO                 /*  Help synonymn, for the totally lost   */
    /*  CALL LineOut  outfile  ;  */
    infile  =  '-help'  ;
    CALL Help  ;
    END  ;

  IF  Inside( 'd', infile )  THEN    /*  hidden switch    */
    DO                       /* secret debug switch, for me, D.E.L.  */
    debugmine     =  intent  ;
    END  ;

  IF  Inside( 'Q',  infile )  THEN   /*  hidden switch  */
    DO                /* secret switch to query version and source info  */
    CALL Whereami  ;
    END  ;

  /*  END of Hidden switches   */

  /*  Verify valid, non-hidden options  */

       /*  Parameters, with arguments, should be sliced out of
           infile at this point   */
  IF Verify( infile, options ) > 0  THEN  /*  check for invalid switches  */
    DO
    /*  CALL LineOut  outfile  ;  */
    infile  =  '-help'  ;
    CALL Help  ;
    END  ;

  /*  End of Verifying valid, non-hidden switches  */

  /*  Process cleartext switches, with no arguments here  */

  IF  Inside( 'h', infile )  THEN
    DO
    /*  CALL LineOut  outfile  ;  */
    infile  =  '-help'  ;
    CALL Help  ;
    END  ;

  IF  Inside( 's',  infile )  THEN
    DO
    splitchar  =  n  ;
    END  ;

  IF  Inside( 'r',  infile )  THEN
    DO
    reservetrans  =  intent  ;
    END  ;

  IF  Inside( 'x',  infile )  THEN
    DO
    reservetrans   =  intent  ;
    extendedtrans  =  intent  ;
    END  ;

  IF  Inside( 'l',  infile )  THEN
    DO
    reservetrans    =  intent  ;
    backslashtrans  =  intent  ;
    END  ;

  IF  Inside( 'L',  infile )  THEN
    DO
    listsubs      =  intent  ;
    END  ;

  IF  Inside( 't',  infile )  THEN
    DO
    extra         =  intent  ;
    END  ;

  IF  Inside( 'n',  infile )  THEN
    DO
    national      =  intent  ;
    END  ;

  IF  Inside( 'p',  infile )  THEN
    DO
    punctuation   =  intent  ;
    END  ;


  PARSE VALUE  outfile scratch  WITH  infile outfile  scratch  ;
  infile  =  Strip( infile )  ;

END  ;   /*  DO WHILE( processing command line options  )  */


/*  Set up I/O  */
SELECT
WHEN  outfile  =  ''  THEN
  outfile  =  STDOUT  ;
WHEN  outfile  =  '-'  THEN
  outfile  =  STDOUT  ;
OTHERWISE
  NOP  ;
END  /*  SELECT  */

SELECT
WHEN  infile  =  ''  THEN
  infile  =  STDIN  ;
WHEN  infile  =  '-'  THEN
  infile  =  STDIN  ;
OTHERWISE
  NOP  ;
END  /*  SELECT  */


IF  debugmine  THEN
  DO
  SAY 'infile is *'infile'*'  ;
  SAY 'outfile is *'outfile'*'  ;
  END

/*    End of switch command line processing ***********************/

/*****************************************************************/

CALL  DataStructures   ;  /*  VERY Important call here   */

/*****************************************************************/

/*   Preliminary Steps now complete - Begin generating HTML  */


/*  Main Processing Loop                    */
lastline  =  ''  ;


DO  WHILE  Lines( infile )

/*
See the Function Corrections section below.
See Daney, Lines() is not
generally Boolean
only for OS/2 and Regina
therefore, the test

Also in Regina Lines only seems to work after a read at the
end of the file returns a null string.
Previously, I tried a Blines() function that tried to deal
with this, but some variations of this failed with OS/2,
and also it seemed excessively complex.
So the LineIn() call was moved to the end of the loop,
and then these tests seemed to work fine, were reasonably fast
and fairly simple.

It also seems to be critical if the NOTREADY trap is set or not.

 */

  linethat  =  LineIn( infile )  ;

  linethat  =  Strip( linethat, 'T' )  ;  /*  remove trailing blanks  */

  /*  Here, look at each possibility, and process appropriately    */
  IF  '' >< linethat  THEN
    DO

    /*  Now process each reserved character */
    /*  Those that cannot be in an URL      */
    IF  reservetrans  THEN
      DO cindex = 1 TO reservechar.0
        linethat  =  Reserver( cindex, linethat )  ;
      END  cindex

    /****    The most important call:       ****/
    linethat  =  Process( linethat )  ;



    END   /*  IF  */


  linethat  =  Translate( linethat, cleanout, dirtyin )  ;
             /*  cleaning up the non-ASCII characters used
                 to simplify conflicts between freestanding
                 and HTML meaningfull uses.
              */
  CALL LineOut outfile, linethat  ;

END       /*  while  */


EXIT

/******************************************************************
*******************************************************************
          END OF MAIN PROGRAM
          START OF SUBROUTINES
*******************************************************************
******************************************************************/

Process:  PROCEDURE  EXPOSE  fperc reservetrans  ,
                         extendedtrans backslashtrans ,
                         escaper splitchar  ,
                         oddCharsInURLs  ,
                         schemix.  ,
                         specialchar.  specialnum. ,
                         spexialchar.  spexialnum. ,
                         specialvector. spexialvector. ,
                         specialvectors. spexialevector. ;

/*  This is the most important subroutine!!  */

PARSE  ARG  linehere  ;

/*  schemalength  =  0  ;  */
firstpart  =  linehere  ;   /*  default if there is no URL on line */
url        =  ''  ;
lastpart   =  ''  ;




/* This part has gone through many convoluted versions.
   Part of the reason was a desire to deal cleanly with
   various character cases for the schema.
   Philosophical conclusion to date:
   You should simply use Parse to split up data,
   not to *decide* if it should be split up
   (or should of been split up).
*/
IF  Inside( ':', linehere )  THEN   /*  No colon, no schema!   */
DO

  PARSE UPPER VAR linehere  shadowline  ;
      /* Destroying case dependence.
         Since several parses are to be done, this allows
         PARSE VAR ....   instead of
         PARSE UPPER VAR ....
         and should speed things up since the PARSE does
         not have to do upper casing along with everything else.
         Tests seem to indicate 'PARSE VAR' is 3/5's the time of a
         'PARSE UPPER VAR', so do it once here.
         'PARSE UPPER VAR' is itself about half the time of
         a call to Tranlate( <defaults to upper> ) and assignment.
         NOTE:  The URL schema in the data setup, MUST be
                uppercase.
        */

  urlpoint  =  shadowline  ;  /* set an initial value  */


  DO index  =  1 TO schemix.0

    /* Search for the schema closest to the start of the string   */
    /* in this loop.                                              */
    /* Note that shadowline, uppercased above, is used,           */
    /* with the uppercase schema so that the numeric location     */
    /* will be case insensitive.                                  */
    /* Note that the position of the colon triggering this        */
    /* is ignored, since something like -                         */
    /*  ........:   http://.                                    */
    /* would make it erroneous data to consider.                  */


    PARSE VAR shadowline  urlframe (schemix.index)  .  ;
    /*  urlframe  == (one of the) string(s) that frames/brackets
                     the URL.
        If schemix.index is not found, the PARSE match point moves
        to the end of the record, and urlframe captures the complete
        line/record.
     */

    IF  urlframe <<  urlpoint  THEN
      DO
      urlpoint  =  urlframe  ;
      schemalength  =  schemix.length.index  ;
      END
    /*  In strict comparison ('<<'), the longer string is greater
        if it is an extension of the shorter.
        That is, it consists of the shorter concatenated to
        some more characters.
        This comparison moves the urlpoint to the leftmost
        schema.
     */

  END  index


  /* Having found the first schema, now split things apart.  */
  urlpoint  =  Length( urlpoint )  ;

  /*  Now, use the actual line, not the uppercased 'shadowline',
      used for searching in a case insensitive way.
   */
  PARSE  VAR linehere   =1 .          +(urlpoint) url ' '   lastpart ,
                        =1 firstpart              (url) .  ;

  /* trickiness needed, the straight ahead:
  PARSE  VAR linehere   =1 firstpart  +(urlpoint) url ' '   lastpart  ;
     causes firstpart to slurp up the whole record
     if 0 == urlpoint, that is the url is first thing on the line.
     This seems to be a result of '+(urlpoint)' when urlpoint = 0
     produces the equivalent of a backup in the PARSE template.
     (backups causing the preceeding variable to 'grab' the rest of
     line/record.  This is to facilitate multiple assignments
     with PARSE.).

     If no URL was found, 'urlpoint' will be the length of the line,
     so the '+(urlpoint)' match point will be past the end of the record,
     squeezing 'url' and 'lastpart' to be empty strings.
     On the second pass of the parse, '(url)' will, due to one
     of PARSES peculiarities, put the end point of 'firstpart' at the
     end of the record.  (An empty string is 'found' at the end of
     the record by PARSE, matching nothing in a finitely sized string.)
   */

   trimmer    =  ' '  ;
   littlebit  =  ''   ;
   textlabel  =  ''   ;
   SELECT
   WHEN Inside( '|', url )  THEN
     DO
     /*  Now grab the text label for the link:  */
     PARSE VAR url   url '|' textlabel '|' sp '|' littlebit  ;
     IF  1 < Length( sp )  THEN  /* may want to change this in the future  */
       DO
       SAY 'Invalid space character (it is a string, not a single character.)'
       SAY ' URL='url'='  ;
       SAY ' text label='textlabel'='  ;
       SAY ' space character (the problem)='sp'='  ;
       SAY ' text abutted to URL='littlebit'='  ;
       EXIT  ;
       END  ;
     END  ;
   WHEN  '' >< oddCharsInURLs  & ( '' >< url )  THEN
     DO
     cutpoint  =  Verify( url, oddCharsInURLs, 'M', schemalength )  ;
     IF  0 < cutpoint  THEN
       DO
       PARSE VAR url   url =(cutpoint) littlebit  ;
       trimmer  =  ''  ;
       END  ;    /*  IF  */
     END  ;      /*  WHEN  */
   OTHERWISE
     NOP  ;
   END  ;

   SELECT     /* use select to allote for future complications  */
   WHEN  '' = lastpart  THEN
     lastpart  =  littlebit  ;
   WHEN  '' >< lastpart  THEN
     lastpart  =  littlebit lastpart  ;
   OTHERWISE
     NOP  ;
   END  ;    /*  SELECT  */

END  /* IF  */


/*  Here, recursively handle multiple URLs on one line  */
IF  '' >< lastpart  THEN
  lastpart  =  trimmer || Process( lastpart )  ;
/*                 --  calling this routine itself.    */
/*  If not for the above, only the first URL would be processed  */

/*  At this point, firstpart is assured to not have any
    part of an URL in it, so this would be an ideal point
    to process it for special and resesrved characters that
    might be part of an URL.
*/
IF '' ><  firstpart  THEN
    firstpart  =  FreeText( firstpart )  ;

/*   This DO block replaced with the FreeText() function call
  DO

  /*  order of evaluation is important on these things in this DO block!  */
  /*  handle these first, so that '&', and ';' don't cause trouble  */
  IF  extendedtrans  THEN
    firstpart  =  Specialsx( firstpart )  ;

  IF  backslashtrans  THEN
    firstpart  =  Specialbsx( firstpart )  ;

  IF  reservetrans  THEN
    DO cindex = 1 TO specialchar.0
    firstpart  =  Specials( cindex, firstpart )  ;
    END  cindex

  END  /*  IF  */
*/     /*  FreeText  */

IF  '' >< url  THEN
  DO

  /*  url       = schema ||  url  ;  */

  /*  process url for hex escapes here    */
  /*  order of evaluation is important on these things!  */
  /*  Convert '%symbols;'  to hex encoded '%xx' in URL:     */
  IF  extendedtrans  THEN
    url  =  Percent2sX( url )  ;

  /*  Convert '\x' character strings to hex escapes:   */
  IF  backslashtrans  THEN
    url  =  Percent2bsX( url )  ;

  /*  Convert characters preceeded with '%' to hex escapes in URl.
      ('%%%' -> '%'):    */
  url  =  Percent2X( url )  ;


  /*
  strend  =  Right( url, 1 )  ;
    Let's do this in a more sophisticated way:    :-)
   */
  PARSE VAR url   . ''   -1  strend   ;

  /*  add more characters to terminate URL string here as needed  */
  SELECT
  /*  Adjust these spacings to preferences     */
  /*
  These branchs no longer used:
  WHEN  strend == "'"  THEN
    DO
    /*  This branch is specificly to enable self processing  */
    lastpart  =  "' " ||  lastpart  ;
    url  =  Strip( url, 'T', "'" )  ;
    END
  WHEN strend == '"'  THEN
    DO
    /*  This branch is specificly to enable self processing  */
    lastpart  =  '"' ||  lastpart  ;
    url  =  Strip( url, 'T', '"' )  ;
    END
             End of eliminated branches
   */
  WHEN  strend == '.'  THEN
    DO
    lastpart  =  '.' ||  lastpart  ;
    url  =  Strip( url, 'T', '.' )  ;
    END
  WHEN  strend == ';'  THEN
    DO
    lastpart  =  ';' ||  lastpart  ;
    url  =  Strip( url, 'T', ';' )  ;
    END
  /*  Another branch eliminated:
  WHEN  strend == ','  THEN     /*  adding ','  to the terminators  */
    DO
    lastpart  =  ',' ||  lastpart  ;
    url  =  Strip( url, 'T', ',' )  ;
                            /*  Apparently, quoting the T is sometimes
                                needed with some interpreters with ',',
                                it tries to interprete 'T' as a varaible
                                otherwise, sometimes.
                             */
    END
                      end of branch eliminated    */
  OTHERWISE
    NOP  ;
    /*
    IF  '' >< lastpart  THEN
      lastpart  =  ' ' || lastpart  ;
    */
  END
  /* add other trailing charters here and above as needed  */


  /*  Now process the text label for the link:  */
  IF  '' >< textlabel  THEN
    DO
    IF  '' >< sp  THEN
      textlabel  =  translate( textlabel, ' ', sp )  ;
    ELSE
      textlabel  =  translate( textlabel, ' ', '_' )  ;
    textlabel    =  FreeText( textlabel )  ;
    END  ;
  ELSE
    DO
    textlabel  =  url  ;
    END  ;
  /*   original:
  url  =  "<A HREF='" || url || "'>" || url || '</A>'   ;
  suggested change by Rollin White:

  url  =  "<A HREF=""" || url || """>" || url || '</A>'   ;
  */

  IF  schemalength  < Length( url )  THEN
    url  =  "<A HREF=""" || url || """>" || splitchar || textlabel || ,
               splitchar || '</A>'   ;
    /*  modified to allow text label for the link.
    url  =  "<A HREF=""" || url || """>" || splitchar || url || ,
               splitchar || '</A>'   ;
     */
    /*  This will skip null URLs, that are just schema standing alone
        in text, making it easier to process this script by itself.  :-)
     */

  END


linehere  =  firstpart || url || lastpart  ;

RETURN  linehere  ;

FreeText:
/*  Bundle some function calls for processing free standing text
    into one function call
    The EXPOSE list is basicly borrowed from 'Process:'  -
 */
PROCEDURE  EXPOSE  fperc reservetrans  ,
                         extendedtrans backslashtrans ,
                         escaper splitchar  ,
                         oddCharsInURLs  ,
                         schemix.  ,
                         specialchar.  specialnum. ,
                         spexialchar.  spexialnum. ,
                         specialvector. spexialvector. ,
                         specialvectors. spexialevector. ;


PARSE ARG  bitoftext  ;


  /*  order of evaluation is important on these things in this DO block!  */
  /*  handle these first, so that '&', and ';' don't cause trouble  */
  IF  extendedtrans  THEN
    bitoftext  =  Specialsx( bitoftext )  ;

  IF  backslashtrans  THEN
    bitoftext  =  Specialbsx( bitoftext )  ;

  IF  reservetrans  THEN
    DO cindex = 1 TO specialchar.0
    bitoftext  =  Specials( cindex, bitoftext )  ;
    END  cindex

RETURN  bitoftext  ;




Reserver:  PROCEDURE  EXPOSE  reservechar.  reservenum.    ;

/*  This is mostly to handle freestanding, literal angle brackets  */

PARSE  ARG  reserved,  linehere  ;


PARSE  VAR linehere  firstpart  (reservechar.reserved)  lastpart    ;

/*  Here, recursively handle multiple instances on one line  */
IF  '' >< lastpart   THEN
  lastpart  =  Reserver( reserved, lastpart )  ;
/*  If not for the above, only the first instance would be processed  */

/*  IF  0 < Pos( rchar, linehere )  THEN  */

IF  linehere >< firstpart  THEN
     linehere  =  firstpart  ||  reservenum.reserved  ||  lastpart  ;
     /*  linehere  =  firstpart  ||  '&#'  ||  rnum || ';'  ||  lastpart  ; */


RETURN  linehere  ;


Specials:  PROCEDURE  EXPOSE  specialchar.  specialnum.    ;

/* this is for freestanding characters that might have special
   meaning if they appeared in a URL.
*/

PARSE  ARG  specialized,  linehere  ;


PARSE  VAR linehere  firstpart  (specialchar.specialized)  lastpart    ;


/*  Here, recursively handle multiple instances on one line  */
IF  '' >< lastpart   THEN
  lastpart  =  Specials( specialized, lastpart )  ;
/*  If not for the above, only the first instance would be processed  */

/*  IF  0 < Pos( rchar, linehere )  THEN  */

IF  linehere >< firstpart  THEN
     linehere  =  firstpart  ||  specialnum.specialized  ||  lastpart  ;
     /*  linehere  =  firstpart  ||  '&#'  ||  rnum || ';'  ||  lastpart  ; */


RETURN  linehere  ;


Specialsx:  PROCEDURE  EXPOSE  specialvector.  ;

/* This is for '&abbrev/backslashcode;'  to decimal translations    */

PARSE  ARG  linehere  ;

/*  PARSE  VAR linehere  firstpart  '&' guts ';'  lastpart ,  */
PARSE  VAR linehere  firstpart  '&' +0 guts ';'  lastpart ,
                =1       .      '&'  .   ';'  +0    lptest ,
                =1   falsepatch          (lptest)  ;


lastpartorg  =  lastpart  ;
IF  '' >< lastpart  THEN
  lastpart  =  Specialsx( lastpart )  ;

IF  '' >< lptest  THEN
  DO

  /*  guts  =  '&' || Translate( guts ) || ';'  ;    */
  /*  guts  =  Translate( guts ) || ';'  ;  */
  PARSE UPPER VAR guts  guts  ;
  guts  =  guts';'  ;

  SELECT
  WHEN  '' >< specialvector.guts  THEN
    linehere  =  firstpart || specialvector.guts || lastpart  ;
  WHEN  lastpartorg >< lastpart  THEN
    linehere  =  falsepatch || ';' || lastpart  ;
  OTHERWISE
    NOP  ;
  END  /*  SELECT  */

  END  /*  IF  */


RETURN  linehere  ;


Specialbsx:  PROCEDURE  EXPOSE  specialvectors. escaper  ;

/*  This is for prefix escaped substitutions, originally '\'  */

PARSE  ARG  linehere  ;

/* PARSE  VAR linehere  firstpart  '\' guts +4  lastpart  ; */
PARSE  VAR linehere  firstpart  (escaper) guts +4  lastpart  ;

/* IF  '' >< guts  &  '\' >< guts  THEN */
IF  '' >< guts  &  escaper >< guts  THEN
  DO  index  =  3  TO 1 BY -1

  /*  guts  =  Translate( guts )  ;  */
  PARSE UPPER VAR guts  guts  ;

  IF  '' >< specialvectors.guts  THEN
    DO
    IF  '' >< lastpart  THEN
      lastpart  =  Specialbsx( lastpart )  ;
    return  firstpart || specialvectors.guts  || lastpart  ;
    END

    /* PARSE  VAR linehere  firstpart  '\' guts +(index)  lastpart  ; */
    PARSE  VAR linehere  firstpart  (escaper) guts +(index)  lastpart  ;

  END  index


RETURN  linehere  ;


Percent2X:  PROCEDURE  EXPOSE  fperc  ;

/* to precede a character by '%' and convert to %xx  format
   in URL (xx == hex encoded number).
*/


PARSE  ARG  linehere  ;



PARSE  VAR linehere  firstpart  '%' +1 kar +1   lastpart,
                   =1 fp       '%'  +1 xkar +2  lp  ;

DO WHILE( '' >< kar )

  SELECT
  WHEN  '%%' = xkar  THEN
     linehere  =  fp  ||  fperc  ||  lp  ;  /*  '%%%' passed as '%' */
  WHEN  0 = Datatype( xkar, 'X' )  THEN  /* xkar is not a hex number! */
     linehere  =  firstpart  ||  fperc  ||  C2X( kar )  ||  lastpart  ;
  OTHERWISE   /*  When kar  specifies an already escaped Hex. char. code */
     linehere  =  firstpart  ||  fperc  ||  kar  ||  lastpart  ;
  END

  PARSE  VAR linehere  firstpart  '%' +1 kar +1   lastpart,
                  =1 fp '%' +1 xkar +2 lp     ;

END   /*  while  */

RETURN  linehere  ;


Percent2sX:  PROCEDURE  EXPOSE  spexialvector. ;

/*  convert in URL  '%symbols;'  to hex encoded '%xx'    */

PARSE  ARG  linehere  ;



PARSE  VAR linehere  firstpart  '%'  +0  guts  ';' +0  lptest ,
                 =1  .          '%'      .     ';'      lastpart ,
                 =1  falsepatch                (lptest)    ;

lastpartorg  =  lastpart  ;
IF  '' >< lastpart  THEN
  lastpart  =  Percent2sX( lastpart )  ;

IF  '' >< lptest  THEN
  DO

  /* guts  =   Translate( guts ) || ';'  ; */
         /* Translate() defaults to uppercase */
  PARSE UPPER VAR guts  guts  ;
  guts  =  guts';'  ;
  SELECT
  WHEN  '' >< spexialvector.guts  THEN
    linehere  =  firstpart || spexialvector.guts || lastpart  ;
  WHEN  lastpartorg >< lastpart  THEN
    linehere  =  falsepatch || ';' || lastpart  ;
  OTHERWISE
    NOP  ;
  END   /*  SELECT  */

  END  /*  IF  */

RETURN  linehere  ;


Percent2bsX:  PROCEDURE  EXPOSE  spexialevector.  escaper  ;


/*  convert '\symbol'  to '%xx' string in URL   */

PARSE  ARG  linehere  ;

/* PARSE  VAR linehere  firstpart  '\' guts +4  lastpart  ; */
PARSE  VAR linehere  firstpart  (escaper) guts +4  lastpart  ;

/* IF  '' >< guts  &  '\' >< guts  THEN */
IF  '' >< guts  &  escaper >< guts  THEN
  DO  index  =  3  TO 1 BY -1

  /*  guts  =  Translate( guts )  ;  */
  PARSE UPPER VAR guts  guts  ;

  IF  '' >< spexialevector.guts  THEN
    DO
    IF  '' >< lastpart  THEN
      lastpart  =  Percent2bsx( lastpart )  ;
    return  firstpart || spexialevector.guts  || lastpart  ;
    END

    /* PARSE  VAR linehere  firstpart  '\' guts +(index)  lastpart  ; */
    PARSE  VAR linehere  firstpart  (escaper) guts +(index)  lastpart  ;

  END  index


RETURN  linehere  ;


WhoAmI:  PROCEDURE  EXPOSE  (basics)  ;

/*
    Here take care of interpreter specific stuff:
    Borrowing some ideas from Rexxtry ;-)
*/

PARSE version  particular '_'  .

IF 'REXX-Regina'  ==  particular  THEN
  DO
  /*  Some defaults for Regina:   */
  STDIN   =  ''  ;
  STDOUT  =  ''  ;
  STDERR  =  ''  ;
  /*       These are handled where more specificly attributable:
  STDIN   =  '/dev/stdin'  ;
  STDOUT  =  '/dev/stdout'  ;
  STDERR  =  '/dev/stderr'  ;
  /*  Or possibly:
  STDIN   =  '/dev/fd/0'  ;
  STDOUT  =  '/dev/fd/1'  ;
  STDERR  =  '/dev/fd/2'  ;
  */
  */
  END
ELSE
  DO
  STDIN   =  'STDIN:'   ;
  STDOUT  =  'STDOUT:'  ;
  STDERR  =  'STDERR:'  ;
  END

TRUE   =  1  ;
FALSE  =  0  ;
ON     =  TRUE    ;
OFF    =  FALSE   ;
YES    =  TRUE    ;
NO     =  FALSE   ;

environment  =  address()  ;
/* Storing the original environment for future use,
   such as possible retreival of of environmental variables,
   with value()
   Where appropriate below, this should be overridden.
 */

/*  Find out what is being executed!  */
PARSE  SOURCE  os . pgmName  ;

/*
   n will be the end of line character for various OSs for 'here' documents.
   See Help, Header, Footer and How functions for examples.
   clearscreen  will clear the screnn
*/

/*  os  =  Translate( os )  ;  */
PARSE UPPER VAR os  os  ;

SELECT
WHEN  'OS/2'  = os  THEN
  DO
  n  =  '0D 0A'x  ;
  clearscreen  =  'CLS'  ;
  environment  =  'OS2ENVIRONMENT'  ;
  END
WHEN abbrev( os,  'Windows' ) THEN
  DO
  n  =  '0D 0A'x  ;
  clearscreen  =  'CLS'  ;
  END
WHEN  'DOS'  = os  THEN
  DO
  n  =  '0D 0A'x  ;
  clearscreen  =  'CLS'  ;
  END
WHEN  'UNIX'  = os  THEN
  DO
  n  =  '0A'x  ;
  clearscreen  =  'clear'  ;
  STDIN   =  '/dev/stdin'  ;
  STDOUT  =  '/dev/stdout'  ;
  STDERR  =  '/dev/stderr'  ;
  /*  Or possibly:
  STDIN   =  '/dev/fd/0'  ;
  STDOUT  =  '/dev/fd/1'  ;
  STDERR  =  '/dev/fd/2'  ;
  */
  END
WHEN  'LINUX' = os  THEN
  DO
  n  =  '0A'x  ;
  clearscreen  =  'clear'  ;
  STDIN   =  '/dev/stdin'  ;
  STDOUT  =  '/dev/stdout'  ;
  STDERR  =  '/dev/stderr'  ;
  /*  Or possibly:
  STDIN   =  '/dev/fd/0'  ;
  STDOUT  =  '/dev/fd/1'  ;
  STDERR  =  '/dev/fd/2'  ;
  */
  END
WHEN  'AIX'   = os  THEN
  DO
  n  =  '0A'x  ;
  clearscreen  =  'clear'  ;
  STDIN   =  '/dev/stdin'  ;
  STDOUT  =  '/dev/stdout'  ;
  STDERR  =  '/dev/stderr'  ;
  /*  Or possibly:
  STDIN   =  '/dev/fd/0'  ;
  STDOUT  =  '/dev/fd/1'  ;
  STDERR  =  '/dev/fd/2'  ;
  */
  END
WHEN  'CMS'   = os  THEN
  DO
  /* n  =  ' ' || D2C( 10 )  ;  /* Does anyone really know?  */  */
  n  =  ' ' || '85'x      ;  /* According to some UUASC e-list traffic  */
  clearscreen  =  'VMFCLEAR'  ;
  END
WHEN  'MAC'   = os  THEN    /*  and various 8 bit machines :-)  */
  DO
  n  =  '0D'x  ;
  /*  ??? clearscreen  =  'clear'  ;  */
  END
OTHERWISE
  DO
  n  =  '0D 0A'x  ;
  clearscreen  =  'clear'  ;    /*  ?????          */
  END
END  /* SELECT */

/*  where  =  directory()  ;  */
where  =  pgmName  ;

SELECT
WHEN  Inside( '\', where )  THEN
  dirseperator  =  '\'  ;    /*  PC proprietary  */
WHEN  Inside( '/', where )  THEN
  dirseperator  =  '/'  ;   /*   UNIX   */
WHEN  Inside( '.', where )  THEN
  dirseperator  =  '.'  ;    /* some IBM mainframes  */
WHEN  Inside( ':', where )  THEN
  dirseperator  =  ':'  ;    /* some Macs?  */
OTHERWISE
  NOP  ;
END  ;

helptest  =  '-+'  ;
IF ( '/' >< dirseperator )  Then
  helptest  =  helptest'/'  ;

nshadow  =  Translate( n, , n, ' ' )  ;
/* ....for use with record/line oriented functions   */


pgmName  =  Translate( pgmName, '  ', '/\' )  ;  /*  typical dir. seperators */
              /*  Note that this may not work on certain IBM mainframe
                  OSs where '.' is effectively the directory seperator
               */

nameCnt  =  Words( pgmName )  ;   /* count spaced words and               */

pgmName  =  Word( pgmName, nameCnt )  ;  /* get the last one  */

pgmName  =  Translate( pgmName, ' ', '.' )  ;  /*  Now strip extensions  */

PARSE  VAR  pgmName  pgmName  .  ;

RETURN  pgmName  ;




HandlerR:

/*  Type 2 Exception handler - returns to program  */
/*  By default this is inactive in this program, just here
    for possible future use, as needed.
*/
/*  Basicly ripped off of Daney p. 205    */
/*  "Programming in REXX", (c) 1992, Charles Daney,
     J. Ranade IBM Series, McGraw-Hill, Inc.
 */

ttype         =  Condition( 'C' )  ;

IF  'SYNTAX'  =  ttype  THEN
  SAY 'REXX error 'rc' ('errortext( rc )' ) occured in line 'sigl'.'  ;

IF  'ERROR' = ttype  |  'FAILURE' = ttype  THEN
  SAY 'Command return code: 'rc' occured in line 'sigl'.'  ;

IF  SourceLine() > 0  THEN
  SAY  '=====> 'sourceline( sigl )  ;

SAY  'Of 'sourceline()' total lines.'  ;

/*
signal on syntax
signal restart
*/

tdescription  =  Condition( 'D' )  ;
tinstruction  =  Condition( 'I' )  ;
tstate        =  Condition( 'S' )  ;

SAY 'Trap type: 'ttype  ;
SAY 'Trap description: 'tdescription  ;
SAY 'Trap instruction: 'tinstruction ;
SAY 'Trap state: 'tstate  ;

/*  EXIT  ;    /*   HandlerR   */   */
RETURN   ;     /*   HandlerR    */


Handler:

/*  Type 1 Exception handler  - No Return   */
/*  Basicly ripped off of Daney p. 205    */
/*  "Programming in REXX", (c) 1992, Charles Daney,
     J. Ranade IBM Series, McGraw-Hill, Inc.
 */

ttype         =  Condition( 'C' )  ;

IF  'FAILURE' = ttype  THEN
  SAY 'Command return code: 'rc' occured in line 'sigl'.'  ;

IF  SourceLine() > 0  THEN
  SAY  '=====> 'sourceline( sigl )  ;

SAY  'Of 'sourceline()' total lines.'  ;

/*
signal on syntax
signal restart
*/

tdescription  =  Condition( 'D' )  ;
tinstruction  =  Condition( 'I' )  ;
tstate        =  Condition( 'S' )  ;

SAY 'Trap type: 'ttype  ;
SAY 'Trap description: 'tdescription  ;
SAY 'Trap instruction: 'tinstruction ;
SAY 'Trap state: 'tstate  ;

EXIT  ;    /*   Handler   */




Whereami:  PROCEDURE  EXPOSE  (basics)   ;


/*  Here take care of interpreter specific stuff:         */
/*  Borrowing some ideas from Rexxtry ;-)  */

PARSE VERSION  version  ;

PARSE  SOURCE  source  ;

clearscreen  ;

SAY  ,
''n,
version n,
''n,
source n,
''n  ;            /*  end of 'HERE' document  */

EXIT  ;


RETURN  ;


/*
Lowercase:

PARSE ARG casetarget  ;

  casegoal  =  Translate( casetarget, lowers, uppers )  ;
                           / *  ( string, to, from )    * /

return casegoal  ;

EXIT  ;
*/




Inside:   Procedure  ;

/*  This returns a TRUE / FALSE value if string 'letter'
    is inside ( a substring ) of string 'string'.
    It is used mainly for processing command line switches.
    It uses the same 'needle' in 'haystack' ordering
    of parameters as ReXX's Pos command.

    This is really just a boolean version of the
    CountStr function, and so has been rewritten
    to use methods I developed in that article.
*/

/*
PARSE ARG  letter, string  ;

RETURN  Pos( letter, string )  >  0  ;
*/

PARSE ARG  needle, haystack   =1  subhaystack (needle)  .  ;

RETURN   subhaystack << haystack  ;

GetOpts:
PROCEDURE   EXPOSE  infile  intent optarg  (basics)  ;

PARSE ARG   optclump  ;

/*  a default value:  */
optarg  =  ''  ;
intent  =  OFF  ;
IF  1 >< Verify( optclump, helptest, 'M', 1 )  THEN
  RETURN OFF  ;

cutpoint  =  Verify( optclump, ':=', 'M', 1 )  ;
IF 0 < cutpoint  THEN
  PARSE VAR optclump   optclump  =(cutpoint) . +1  optarg  ;


SELECT
WHEN  '' = optclump  THEN
  RETURN OFF  ;
WHEN  Abbrev( optclump, '-', 1 )  THEN
  DO
  intent  =  ON  ;
  infile    =  optclump  ;
  END  ;
WHEN  Abbrev( optclump, '+', 1 )  THEN
  DO
  intent  =  OFF  ;
  infile    =  optclump  ;
  END  ;
WHEN  ( Abbrev( optclump, '/', 1 )  & ( '/' >< dirseperator ) )  THEN
  DO
  intent  =  ON  ;
  infile    =  optclump  ;
  END  ;
OTHERWISE
  RETURN  OFF  ;
END  ;

RETURN  ON  ;


Help:  PROCEDURE  EXPOSE  USAGE  infile (basics)  ;
/*  Be flexible accepting help    */


rescueCry  =  Translate( infile, '-', '/' )  ;
                           /*  Translate( string, to, from )    */
rescueCry   =  Strip( rescueCry, 'L', '-' )  ;

PARSE UPPER VAR rescueCry  =1 rescueCry  =5 .  ;

rescueCry = Strip( rescueCry )  ;



IF  Abbrev( 'HELP', rescueCry, 1 ) | ,
    Abbrev( 'HHHH', rescueCry, 1 ) | ,
    Abbrev( '????', rescueCry, 1 )
THEN
  DO

  /*
  clearscreen  ;
  */

  CALL  LineOut  STDERR,  ,
  USAGE n,
  ''n,
  ' 'myself', a simple tool to aid converting flat text to HTML.'n,
  '   Typical use: 'myself' -xl < input.txt  >  output.htm'n,
  '   Switches:'n,
  '    -h  -  Help (this) message'n,
  '           Reserve/special character processing:'n,
  '    -s  -  Split the portions of the anchor onto seperate lines'n,
  '    -r  -  Simple reserve character processing'n,
  '    -x  -  Extended reserve character processing'n,
  '    -l  -  Backslashed reserve character processing'n,
  '    -L  -  List schema / reserve/special character substitutions'n,
  '    -eX -  Make ''X'' instead of ''\'' the escape character'n,
  '    -t  -  Allow RFC 1738 extra characters in URLs. (!*''(),)'n,
  '    -n  -  Allow RFC 1738 national characters in URLs. ({}\^[]`)'n,
  '    -p  -  Allow RFC 1738 punctuation characters in URLs. (<>"#)'n,
  '   For all:'n,
  '   "URL|someClabel|C|"  -> "some label" linked to URL'n,
  '   "C" defaults to "_", "someClabel" defaults to URL'n,
      ;   /*  end of 'Here' document  */


  EXIT  ;

  END  /*  IF  */


RETURN  ;


/****************************************************************/
/*    Function Corrections **************************************/
/*    Uncomment as appropriate  *********************************/
/****************************************************************/

/*  */
/*  For OS/2, AS400, and sometimes Regina:  */
Lines:
PROCEDURE   ;

PARSE ARG  infile  ;

RETURN  0 < 'LINES'( infile )  ;
/* These calls to the built-in functions must be in upper case.  */

/*  END OS/2, AS400 special function defs.  */
/*  */

/*
/*  Sometimes these are needed for Regina, if lines() seems to act up:  */
Lines:
PROCEDURE  EXPOSE  (basics)  ;

PARSE ARG  infile  ;

hiddenstuff.infile.line  =  'LINEIN'( infile )  ;
/* These calls to the built-in functions must be in upper case.  */

IF  '' == hiddenstuff.infile.line  THEN
  IF  0 = 'LINES'( infile )  THEN
/* These calls to the built-in functions must be in upper case.  */
    RETURN  FALSE  ;

RETURN  TRUE  ;

LineIn:
PROCEDURE  EXPOSE  (basics)  ;

PARSE ARG  infile  ;

RETURN  hiddenstuff.infile.line  ;

/*  End Regina special function defs.  */
 */



/****************************************************************/
/*    Initialize Data *********************************/
/****************************************************************/

DataStructures:   /*  PROCEDURE   ......not!   */
                              /*  This must all be available. */

/* What would be known in FORTRAN as the BLOCK DATA   */
/*    or was that DATA BLOCK?  Been a long time ago...... */



/*  lowers  =  Xrange( 'a', 'z' )  ;  */
/*  uppers  =  Xrange( 'A', 'Z' )  ;  */
/*
/*  See  Daney p.xxx  -  This is portable across ASCII to EBCIDC:  */
lowers  =  Xrange( 'a', 'i' ) || Xrange( 'j', 'r') || Xrange( 's', 'z' )  ;
/*  uppers  =  Translate( lowers )  ;  */
/*  Faster:    */
PARSE UPPERS VAR lowers  uppers  ;
*/

/*  Initialize schema array  */
/*  These should be arranged in ascending order if any one is a
    substring of another, with some processing schemes,
    also they should be uppercase:  */
schemix.    =  ''  ;         /*  set the compound variable default value  */
schemix.1   =  'HTTP://'  ;
schemix.2   =  'HTTPS://'  ;
schemix.3   =  'FTP://'  ;
schemix.4   =  'FILE://'  ;
schemix.5   =  'MAILTO:'  ;
schemix.6   =  'NEWS:'  ;
schemix.7   =  'NNTP:'  ;
schemix.8   =  'TELNET://'  ;
schemix.9   =  'SNEWS:'  ;       /* easier to handle snews here  */
schemix.10  =  'NEWSPOST://'  ;
schemix.11  =  'NEWSREPLY://'  ;
schemix.12  =  'SNEWSPOST://'  ;
schemix.13  =  'SNEWSREPLY://'  ;
schemix.14  =  'IKSD://'  ;    /* Internet Kermit Server Daemon    */
schemix.15  =  'RLOGIN://'  ;
schemix.16  =  'TN3270://'  ;
schemix.17  =  'TN5250://'  ;
schemix.18  =  'GOPHER://'  ;
schemix.19  =  'WAIS://'  ;
schemix.20  =  'FINGER://'  ;
schemix.21  =  'CSO://'  ;     /* CSO? I can't even find an RFC
                                    for this :-)
                                    it seems related to gopher, mentioned in
                                    gopher RFCs.  Some predecessor to LDAP?
                                    (was listed as a URL Lynx could cope with)
                                 */
    /*  add more schema here as appropriate   */
schemix.0   =  21  ;   /*  keep number of elements in '.0'
                             as is traditional in ReXX
                        */

DO  I = 1   TO  schemix.0
  /*  Used to spot null URLs that are just a schema  */
  schemix.length.I  =  Length( schemix.I )  ;
END  I  ;




IF  listsubs  THEN
  DO
  SAY  'Schema recognized:'
  DO  index = 1 TO schemix.0
    SAY  '  'schemix.index  ;
  END  index
  END

/*    characters with questionable URL use status, official status
       (theory):

From RFC 1738:
extra          = "!" | "*" | "'" | "(" | ")" | ","
national       = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
punctuation    = "<" | ">" | "#" | "%" | <">
*/

/*  What will be allowed in practice:  */

oddCharsInURLs  =  ''  ;     /*  default status   */
IF  \ extra  THEN
   oddCharsInURLs  =  oddCharsInURLs || "!*'(),"  ;
IF  \ national  THEN
   oddCharsInURLs  =  oddCharsInURLs || "{}\^[]`"  ;
    /* strictly, should be: "{}\^~[|]`"  ,
       but '~' is in too many URLs in path specification
       and '|' is being used for link text
           (in the future may want to allow for alternate symbol for this)
       and '\' is being used for some character substitution sequences
     */
IF  \ punctuation  THEN
   oddCharsInURLs  =  oddCharsInURLs || '<>"#'  ;
   /* strictly, should be: '<>#%"'  ,
      but '%' is used for hex encoding and so not included here  */

/*  Now specificly exempt some characters:   */
oddCharsInURLs  =  Translate( oddCharsInURLs, , escaper, ' ' )  ;
oddCharsInURLs  =  Space( oddCharsInURLs, 0 )  ;
/*   ---  This may need extension in the future  */

IF  listsubs  THEN
  DO
  SAY  'Odd characters not allowed in URLs:'
  SAY  '  "'oddCharsInURLs'"'  ;
  END


/*  Initialize reserve characters to be translated
    These are tricky ones that are used until final
    cleanup toward the end
    The actual values, 130-132, is pretty arbitrary as long
    as they are not in the normally printable or control ASCII values
    They may have to be adjusted to deal with non-standard ASCII
    values.
*/
fake    =  128      /*  128 < fake +1 & fake + last fake char < 256  */
famp    =  D2C( fake + 1 )  ;    /*  Fake ampersand  */
fhash   =  D2C( fake + 2 )  ;    /*  Fake hash, sharp or pound sign  */
fsemi   =  D2C( fake + 3 )  ;    /*  Fake semicolon  */
fperc   =  D2C( fake + 4 )  ;    /*  Fake percent  */
fcolon  =  D2C( fake + 5 )  ;    /*  Fake colon  */
fampfhash  =  famp || fhash  ;

/* For translation just before printing the final output:  */
dirtyin   =  famp || fhash || fsemi || fperc || fcolon  ;
cleanout  =  '&'  ||  '#'  ||  ';'  ||  '%'  ||   ':'   ;

/*  These are for outside URLs, plain text      */
/*  Here handling characters guaranteed not to be in URLs   */
/*  These are the main trouble makers!   */

reservechar.    =  ''   ;   /*  again, set the default  */
reservechar.1   =  '<'  ;
reservechar.2   =  '>'  ;
/*  add more characters here as appropriate   */
reservechar.0   =  2    ;

/*  now set up an array of the decimal values of the characters:  */
/*  These are default values that can be over ridden below.       */

reservenum.  =  ''  ;        /*  set default   */
DO nindex  =  1  TO  reservechar.0
  tnumber  =  C2D( reservechar.nindex )  ;
  tnumber  =  Right( tnumber, 3, '0' )  ;
  reservenum.nindex  =  fampfhash || tnumber || fsemi  ;
END  nindex
reservenum.0   =  reservechar.0  ;  /*  redundant loop, but ready for  */
reservenum.1   =  famp'lt'fsemi  ;  /*  growth in the future  */
reservenum.2   =  famp'gt'fsemi  ;

IF  reservetrans  THEN
IF  listsubs  THEN
  DO
  SAY 'Reserve character translations (-r, -x, -l):'  ;
  SAY 'in all text:'  ;
  DO  index = 1 TO reservechar.0
    SAY '  'reservechar.index 'goes to',
          Translate( reservenum.index, cleanout, dirtyin )  ;
  END  index
  END

/*  Now these are for characters that will not be inside an
    URL, but generally have special meaning in URLs
*/
specialchar.     =  ''  ;    /* initialize compound variable  */

specialchar.1    =  '&'   ;
specialchar.2    =  '"'   ;
specialchar.3    =  '#'   ;
specialchar.4    =  ';'   ;
specialchar.5    =  '%'   ;
specialchar.6    =  ':'   ;
specialchar.7    =  '?'   ;
specialchar.8    =  '@'   ;

specialchar.0    =   8     ;

specialnum.  =  ''  ;      /*  set default  */
DO nindex  =  1  TO  specialchar.0  /* set defaults for &#nnn; trans.  */
  tnumber  =  C2D( specialchar.nindex )  ;
  tnumber  =  Right( tnumber, 3, '0' )  ;
  specialnum.nindex  =  fampfhash || tnumber || fsemi  ;
END  nindex
specialnum.0   =  specialchar.0  ;
specialnum.1   =  famp'amp'fsemi  ;
specialnum.2   =  famp'quot'fsemi  ;


IF  reservetrans  THEN
IF  listsubs  THEN
  DO
  SAY 'Special character translations (-r, -l, -x):'  ;
  SAY '(inside free text):'  ;
  DO  index = 1 TO specialchar.0
    SAY '  'specialchar.index 'goes to',
          Translate( specialnum.index, cleanout, dirtyin )  ;
  END  index
  END


/*  seperating these to simplify expanding all   */


IF  extendedtrans  THEN   /*  &code;  -->  &#dec.number;   */
DO                              /*  %code;  -->  %#hex.number    */

  specialstr.     =  ''     ;     /* initialize compound variable   */

  specialstr.1    =  '&NUL;'  ;   /* and standard control chars.  */
  specialstr.2    =  '&SOH;'  ;   /*  shifted off by 1 due to Rexx  */
  specialstr.3    =  '&STX;'  ;    /* convention vrs.  ASCII values  */
  specialstr.4    =  '&ETX;'  ;
  specialstr.5    =  '&EOT;'  ;
  specialstr.6    =  '&ENQ;'  ;
  specialstr.7    =  '&ACK;'  ;
  specialstr.8    =  '&BEL;'  ;
  specialstr.9    =  '&BS;'  ;
  specialstr.10   =  '&HT;'  ;
  specialstr.11   =  '&LF;'  ;
  specialstr.12   =  '&VT;'  ;
  specialstr.13   =  '&NP;'  ;
  specialstr.14   =  '&CR;'  ;
  specialstr.15   =  '&SO;'  ;
  specialstr.16   =  '&SI;'  ;
  specialstr.17   =  '&DLE;'  ;
  specialstr.18   =  '&DC1;'  ;
  specialstr.19   =  '&DC2;'  ;
  specialstr.20   =  '&DC3;'  ;
  specialstr.21   =  '&DC4;'  ;
  specialstr.22   =  '&NAK;'  ;
  specialstr.23   =  '&SYN;'  ;
  specialstr.24   =  '&ETB;'  ;
  specialstr.25   =  '&CAN;'  ;
  specialstr.26   =  '&EM;'  ;
  specialstr.27   =  '&SUB;'  ;
  specialstr.28   =  '&ESC;'  ;
  specialstr.29   =  '&FS;'  ;
  specialstr.30   =  '&GS;'  ;
  specialstr.31   =  '&RS;'  ;
  specialstr.32   =  '&US;'  ;
  specialstr.33   =  '&SP;'  ; /*  and include space with the other whitespace
                                 */
  specialstr.34   =  '&TAB;'  ;   /*  alternate tab symbol  */

  specialstr.35   =  '&\T;' ; /* here define some common C world escapes */
  specialstr.36   =  '&\R;' ;
  specialstr.37   =  '&\F;' ;    /*  form feed */
  specialstr.38   =  '&\A;' ;    /* bell */
  specialstr.39   =  '&\E;' ;    /* escape */
  specialstr.40   =  '&\B;' ;    /* backspace  */
  specialstr.41   =  '&\N;' ;    /* 'newline'  =  form feed  */
  specialstr.42   =  '&\0;' ;    /* null  */
  specialstr.43   =  '&\V;' ;    /* vertical tab  */
  specialstr.44   =  '&\S;' ;    /* space (from perl)  */
  specialstr.45   =  '&\\;' ;    /* backslash itself  */


  specialstr.0    =   45  ;

/*     .........setting up defaults       */




  specialnums.   =  ''  ;      /* initializing the compound variable  */

  DO nindex  =  1  TO  specialstr.0
    nvalue  =  nindex - 1  ;
    nvalue  =  Right( nvalue, 3, '0' )  ;
    specialnums.nindex  =  fampfhash || nvalue || fsemi  ;
  END  nindex

  /*  Now the irregular stuff:   */

  specialnums.34  =  fampfhash'009'fsemi  ;    /* alternate horizontal Tab  */

  specialnums.35  =  fampfhash'009'fsemi  ;    /* horizontal tab */
  specialnums.36  =  fampfhash'013'fsemi  ;   /* carriage return */
  specialnums.37  =  fampfhash'011'fsemi  ;   /* form feed */
  specialnums.38  =  fampfhash'007'fsemi  ;    /* bell */
  specialnums.39  =  fampfhash'027'fsemi  ;   /* escape */
  specialnums.40  =  fampfhash'008'fsemi  ;    /* backspace   */
  specialnums.41  =  fampfhash'010'fsemi  ;   /* linefeed ('newline')  */
  specialnums.42  =  fampfhash'000'fsemi  ;    /* NUL  */
  specialnums.43  =  fampfhash'011'fsemi  ;   /* Vertical Tab  */
  specialnums.44  =  fampfhash'032'fsemi  ;    /* alternate space  */
  specialnums.45  =  fampfhash'092'fsemi  ;    /* backslash  */


  specialnums.0   =  45  ;





  spexialnum.   =  ''  ;    /* again, initializing the compound variable  */

  DO  index  =  1    TO  specialstr.0
    spexialstr.index   =  Translate( specialstr.index, '%', '&' )  ;
    spexialnum.index   =  Strip( specialnums.index, 'L', famp )  ;
    spexialnum.index   =  Strip( spexialnum.index, 'L', fhash )  ;
    spexialnum.index   =  Strip( spexialnum.index, 'T', fsemi )  ;
    spexialnum.index   =  D2X( spexialnum.index )  ;
    spexialnum.index   =  Right( spexialnum.index, 2, '0' )  ;
    spexialnum.index   =  fperc || spexialnum.index  ;
  END  index

  spexialnum.0   =  specialchar.0  ;
  spexialchar.0  =  spexialnum.0  ;

/*  .......finishing off specialnum's with some special cases    */



  /* so we can do things associatively  !!!  */
    specialvector.  =  ''   ;    /* initializing these 2 compound variables */
    spexialvector.  =  ''  ;

  DO  index   =  1 TO specialstr.0  ;
    vindex  =  specialstr.index  ;
    specialvector.vindex  =  specialnums.index  ;
    vindex  =  spexialstr.index  ;
    spexialvector.vindex  =  spexialnum.index  ;
  END  index  ;



IF  extendedtrans  THEN
IF  listsubs  THEN
  DO
  SAY  'Extended translations (-x):'  ;
  SAY  'In free text:'  ;
  DO  index = 1 TO specialstr.0
    SAY '  'Left( specialstr.index, 5 ) 'goes to',
       Translate( specialnums.index, cleanout, dirtyin )  ;
  END  index
  SAY  'Inside an URL:'  ;
  DO  index = 1 TO specialstr.0
    SAY '  'Left( spexialstr.index, 5 ) 'goes to',
       Translate( spexialnum.index, cleanout, dirtyin )  ;
  END  index
  END


END   /*   extended translations  */


IF  backslashtrans  THEN   /*  \code  -->  &#dec.number;  in body */
DO                               /*  \code  --> %hex#  in URL  */

  /* escaper         =  '\'   ;     /*  the escape character       */  */

  specialesc.     =  ''      ;   /* initializing this compound variable  */

  specialesc.1    =  escaper'NUL'  ;   /* and standard control chars.  */
  specialesc.2    =  escaper'SOH'  ;   /*  shifted off by 1 due to Rexx  */
  specialesc.3    =  escaper'STX'  ;    /* convention vrs.  ASCII values  */
  specialesc.4    =  escaper'ETX'  ;
  specialesc.5    =  escaper'EOT'  ;
  specialesc.6    =  escaper'ENQ'  ;
  specialesc.7    =  escaper'ACK'  ;
  specialesc.8    =  escaper'BEL'  ;
  specialesc.9    =  escaper'BS'  ;
  specialesc.10   =  escaper'HT'  ;
  specialesc.11   =  escaper'LF'  ;
  specialesc.12   =  escaper'VT'  ;
  specialesc.13   =  escaper'NP'  ;
  specialesc.14   =  escaper'CR'  ;
  specialesc.15   =  escaper'SO'  ;
  specialesc.16   =  escaper'SI'  ;
  specialesc.17   =  escaper'DLE'  ;
  specialesc.18   =  escaper'DC1'  ;
  specialesc.19   =  escaper'DC2'  ;
  specialesc.20   =  escaper'DC3'  ;
  specialesc.21   =  escaper'DC4'  ;
  specialesc.22   =  escaper'NAK'  ;
  specialesc.23   =  escaper'SYN'  ;
  specialesc.24   =  escaper'ETB'  ;
  specialesc.25   =  escaper'CAN'  ;
  specialesc.26   =  escaper'EM'  ;
  specialesc.27   =  escaper'SUB'  ;
  specialesc.28   =  escaper'ESC'  ;
  specialesc.29   =  escaper'FS'  ;
  specialesc.30   =  escaper'GS'  ;
  specialesc.31   =  escaper'RS'  ;
  specialesc.32   =  escaper'US'  ;
  specialesc.33   =  escaper'SP'  ; /*  and include space with the other
                                        whitespace */
  specialesc.34   =  escaper'TAB'  ;   /*  alternate tab symbol  */

  specialesc.35   =  escaper'T' ; /* here define some common C world escapes */
  specialesc.36   =  escaper'R' ;
  specialesc.37   =  escaper'F' ;    /*  form feed */
  specialesc.38   =  escaper'A' ;    /* bell */
  specialesc.39   =  escaper'E' ;    /* escape */
  specialesc.40   =  escaper'B' ;    /* backspace  */
  specialesc.41   =  escaper'N' ;    /* 'newline'  =  form feed  */
  specialesc.42   =  escaper'0' ;    /* null  */
  specialesc.43   =  escaper'V' ;    /* vertical tab  */
  specialesc.44   =  escaper'S' ;    /* space (from perl)  */
  specialesc.45   =  escaper'\' ;    /* backslash itself  */
  specialesc.46   =  escaper || escaper  ;


  specialesc.0    =   46  ;

/*     .........setting up defaults       */



  specialnums.  =  ''  ;     /* initializing a compound variable  */
  DO nindex  =  1  TO  specialesc.0
    nvalue  =  nindex - 1  ;
    nvalue  =  Right( nvalue, 3, '0' )  ;
    specialnums.nindex  =  fampfhash || nvalue || fsemi  ;
  END  nindex

  /*  Now the irregular stuff:   */

  specialnums.34  =  fampfhash'009'fsemi  ;    /* alternate horizontal Tab  */

  specialnums.35  =  fampfhash'009'fsemi  ;    /* horizontal tab */
  specialnums.36  =  fampfhash'013'fsemi  ;   /* carriage return */
  specialnums.37  =  fampfhash'011'fsemi  ;   /* form feed */
  specialnums.38  =  fampfhash'007'fsemi  ;    /* bell */
  specialnums.39  =  fampfhash'027'fsemi  ;   /* escape */
  specialnums.40  =  fampfhash'008'fsemi  ;    /* backspace   */
  specialnums.41  =  fampfhash'010'fsemi  ;   /* linefeed ('newline')  */
  specialnums.42  =  fampfhash'000'fsemi  ;    /* NUL  */
  specialnums.43  =  fampfhash'011'fsemi  ;   /* Vertical Tab  */
  specialnums.44  =  fampfhash'032'fsemi  ;    /* alternate space  */
  specialnums.45  =  fampfhash'092'fsemi  ;    /* backslash  */
  specialnums.46  =  fampfhash || Right( C2D( escaper ), 3, '0' ) || fsemi  ;

  specialnums.0   =  46  ;





  spexialenum.   =  ''  ;    /* initializing a compound variable  */

  DO  index  =  1    TO  specialesc.0
    spexialenum.index   =  Strip( specialnums.index, 'L', famp )  ;
    spexialenum.index   =  Strip( spexialenum.index, 'L', fhash )  ;
    spexialenum.index   =  Strip( spexialenum.index, 'T', fsemi )  ;
    spexialenum.index   =  D2X( spexialenum.index )  ;
    spexialenum.index   =  Right( spexialenum.index, 2, '0' )  ;
    spexialenum.index   =  fperc || spexialenum.index  ;
  END  index

  spexialenum.0   =  specialchar.0  ;

/*  .......finishing off specialnum's with some special cases    */



  /* so we can do things associatively  !!!  */
    specialvectors.  =  ''  ;    /* initializing these 2 compound variables */
    spexialevector.  =  ''  ;

  DO  index   =  1 TO specialesc.0  ;
    vindex  =  specialesc.index  ;
    specialvectors.vindex  =  specialnums.index  ;
    spexialevector.vindex  =  spexialenum.index  ;
  END  index  ;


IF  backslashtrans  THEN
IF  listsubs  THEN
  DO
  SAY  '"Backslash" translations (-l):'  ;
  SAY  'In free text:'  ;
  DO  index = 1 TO specialesc.0
    SAY '  'Left( specialesc.index, 4 ) 'goes to',
       Translate( specialnums.index, cleanout, dirtyin )  ;
  END  index
  SAY  'Inside an URL:'  ;
  DO  index = 1 TO specialesc.0
    SAY '  'Left( specialesc.index, 4 ) 'goes to',
       Translate( spexialenum.index, cleanout, dirtyin )  ;
  END  index
  END



END   /*   backslash escaped translations  */

/*  finished setting up data structures **********************/

IF  listsubs  THEN
  EXIT  ;


RETURN  ;






/*

Dallas's home page
*/

Dallas E. Legan

mailto:aw585@lafn.org

Valid HTML 4.01!