regular expressions

Help
r corak
2006-10-06
2013-05-23
  • r corak
    2006-10-06

    How would I specify a LOCATE command equivalent to grep -i '\<word\>'?

    • First, let us all make certain that we understand what your grep parameters would accomplish.  As I understand it:
      -i specifies that case is to be ignored.
      \<  and \> respectively match the beginning and end of a word (a "word" is made up of letters, digits, and the underscore; other characters, as well as the beginning and end of the line, terminate words).
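      For comparison, the same search can be sketched with Python's re module (used here purely as an illustration; THE does not provide it). With re.ASCII set, \w means letters, digits, and underscore, and \b matches at a transition between a word character and a non-word character or the start/end of the text. Because the target consists only of word characters, a leading \b behaves like grep's \< and a trailing \b like \>. The helper name grep_word_lines is hypothetical.

```python
import re


def grep_word_lines(lines, word):
    """Return the lines that contain `word` as a whole word, ignoring case.

    Roughly mimics grep -i with \\< and \\> around the word (hypothetical helper).
    """
    # re.escape guards against regex metacharacters in `word`;
    # re.ASCII restricts \w (and therefore \b) to [A-Za-z0-9_].
    pattern = re.compile(r"\b" + re.escape(word) + r"\b",
                         re.IGNORECASE | re.ASCII)
    return [line for line in lines if pattern.search(line)]


sample = ["The Word is here", "password", "word_list", "(WORD)", "swords"]
print(grep_word_lines(sample, "word"))  # ['The Word is here', '(WORD)']
```

      Note that "word_list" is rejected because the underscore is itself a word character, exactly as with grep's \>.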

      WHAT YOU NEED TO DO:
      (1) In order to have the case insensitivity specified by the -i option, issue the primary command [SET] CASE M I.

      For the next step, however, I cannot think of any primary command, or short sequence of primary commands, that would accomplish your goal.

      I believe that the only way to solve this problem is to write a short macro that repeatedly locates the next occurrence of the characters in <word> until it finds an occurrence with non-word characters immediately before and after it (counting the beginning and end of the zone as non-word characters).
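      The search strategy just described can be sketched outside the editor as follows. This is Python, not REXX, and find_word and its parameters are hypothetical names chosen to mirror the macro's zone and case handling. It assumes ASCII text (so that upper-casing cannot shift column positions) and 1-based columns, as in THE.

```python
# Letters of either case, digits and underscore count as word characters.
WORD_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_")


def find_word(line, word, left_zone=1, right_zone=None, ignore_case=True):
    """Return the 1-based column of the first whole-word occurrence of
    `word` inside the zone [left_zone, right_zone], or 0 if there is none.

    The zone edges count as non-word characters (hypothetical helper;
    assumes ASCII text so upper() keeps column positions)."""
    if right_zone is None:
        right_zone = len(line)
    if ignore_case:
        line, word = line.upper(), word.upper()
    zone = line[left_zone - 1:right_zone]      # only the zone is searched
    start = zone.find(word)
    while start != -1:
        end = start + len(word)                # index just past the match
        ok_start = start == 0 or zone[start - 1] not in WORD_CHARS
        ok_end = end == len(zone) or zone[end] not in WORD_CHARS
        if ok_start and ok_end:
            return left_zone + start           # convert to a 1-based column
        start = zone.find(word, start + 1)     # resume just past this hit
    return 0


print(find_word("swords word", "word"))                          # 8
print(find_word("xwordx", "word", left_zone=2, right_zone=5))    # 2
```

      The second call shows the zone rule: inside columns 2 to 5 the text is just "word", so the zone edges act as word boundaries.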

      Based on your request, this macro would work like the LOCATE command, which only identifies the line in which the target appears.
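      The line-oriented behavior (report the line, not the column) can be sketched the same way. This Python sketch is illustrative only: locate_word is a hypothetical name, it ignores the zone settings, and it assumes no wrap-around past the end of the file. Returning None leaves the decision about the current line to the caller, much as the macro restores the starting line when STAY is ON.

```python
import re


def locate_word(lines, current, word, ignore_case=True):
    """Return the 0-based index of the first line after `current` that
    contains `word` as a whole word, or None if there is no such line.

    Like LOCATE, the search starts on the line following the current one
    and reports only the line (hypothetical helper, no wrap-around)."""
    flags = re.ASCII | (re.IGNORECASE if ignore_case else 0)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    for index in range(current + 1, len(lines)):
        if pattern.search(lines[index]):
            return index
    return None  # caller decides where the current line stays


text = ["password", "keyword here", "a word", "WORD"]
print(locate_word(text, 0, "word"))  # 2
```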

      Here is my first cut at such a macro.  I have only had time to conduct cursory testing, so there may very well be bug(s) in it.  Please post any further comments on this macro in this thread.

      /*REXX Macro to locate the next occurrence of a "word"              */
      /* A word is defined as a sequence of the characters A-Z, a-z, 0-9,  */
      /* and/or underscore.                                                */

      SIGNAL ON NOVALUE

      /* Letters of either case are word characters, whatever the CASE setting */
      word_chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_0123456789'

      /* Obtain editor settings which affect the operation of this macro   */

      'EXTRACT/ZONE/'

      left_zone = zone.1
      right_zone = zone.2

      'EXTRACT/CASE/'

      locate_case = LEFT( case.2, 1 ) /* I for Ignore, R for Respect */

      'EXTRACT /STAY/'
      stay = stay.1

      'EXTRACT /MSGMODE/'
      msgmode = msgmode.1

      /*--------------------------*/
      /* Obtain user parameter(s) */
      /*--------------------------*/

      PARSE ARG word excess

      /* Validate parameters */

      IF excess <> ''
      THEN  DO
         'EMSG More than one word specified!'
         EXIT 16
      END

      IF word = ''
      THEN  DO
         'EMSG One word must be specified!'
         EXIT 16
      END

      ii = VERIFY( word, word_chars )
      IF ii > 0
      THEN  DO
         'EMSG Invalid character, `'SUBSTR(word,ii,1)'`, in position' ii 'of word `'word'`!'
         EXIT 16
      END

      /*------------------------*/
      /* Remember current state */
      /*------------------------*/

      'EXTRACT /LINE/'
      start_line = line.1

      /*-------------------*/
      /* Search for <word> */
      /*-------------------*/

      DO UNTIL found > 0 | eof

         found = 0
         eof = 0
         'SET MSGMODE OFF'  /* Suppress error message(s) from following commands */
         'LOCATE /'word'/'
         locate_RC = RC
         'SET MSGMODE' msgmode /* Restore user's message mode setting */

         IF locate_RC > 1
         THEN /* reached end of file */ eof = 1
         ELSE /* found the characters in <word> */ DO
            'EXTRACT/CURLINE/'
            line = curline.3
            word_len = LENGTH( word )
            target = word
            IF locate_case = 'I' /* CASE IGNORE: compare in upper case */
            THEN  DO
               line = TRANSLATE( line )   /* one-argument TRANSLATE upper-cases */
               target = TRANSLATE( word )
            END
            ii = left_zone
            jj = POS( target, line, ii ) /* starting position of word */

            /* WHILE (not UNTIL), so a line with no match never runs the body */
            DO  WHILE jj > 0 & found = 0

               valid_start = 0
               valid_end = 0

               IF jj = left_zone
               THEN valid_start = 1
               ELSE valid_start = VERIFY( SUBSTR( line, jj-1, 1 ), word_chars )

               IF ( valid_start )
               THEN /* start of word is valid */ DO

                  kk = jj + word_len - 1 /* ending position of word */

                  IF  kk = right_zone
                  THEN  valid_end = 1
                  ELSE  valid_end = VERIFY( SUBSTR( line, kk+1, 1 ), word_chars )

                  IF ( valid_end ) & kk <= right_zone
                  THEN found = jj  /* end of word is also valid and inside the zone */

               END /* start of word is valid */

               ii = jj + 1 /* resume search just past this occurrence */
               jj = POS( target, line, ii ) /* starting position of next occurrence */

            END /* DO WHILE jj > 0 & found = 0 */

         END /* found the characters in <word> */

      END /* DO UNTIL found > 0 | eof */

      /* Report the outcome once, after the search has finished */
      IF found = 0
      THEN /* word not found */ DO
         'EMSG Error 0017: Word not found "'word'", RC='locate_RC
         IF stay = 'ON' THEN 'LOCATE :'start_line
      END /* word not found */
      ELSE  'MSG Word `'word'` found in column' found', RC='locate_RC

      EXIT locate_RC

    • r corak
      2006-10-13

      Firstly, CASE IGNORE doesn't do anything for LOCATE RE.  Experiment: Start with a file with upper case contents. Set CASE I, as appropriate.  Try LOCATE RE /x/.  This will not locate an upper case X.

      Secondly, your macro is impressive. I dare say we could go even further and replace the LOCATE command with a LOCATE.THE macro. Seriously, though: with all the power that regular expressions provide, wouldn't a case-insensitive LOCATE RE be a productivity boon, without requiring large macros like the one you provided?
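      Until LOCATE RE honors CASE IGNORE, one possible workaround is to write the case-insensitivity into the expression itself, spelling each letter as a bracket class such as [xX]. This assumes THE's regular-expression syntax supports bracket expressions, which I have not verified. The Python sketch below only shows how such a pattern could be generated from a word; case_insensitive_pattern is a hypothetical name, and it assumes the word contains only letters, digits and underscores, as the macro above requires.

```python
import re


def case_insensitive_pattern(word):
    """Spell each ASCII letter of `word` as a bracket class such as [wW],
    so the pattern matches regardless of case even in an engine with no
    case-insensitive flag. Other characters are copied unchanged, so
    `word` is assumed to hold only letters, digits and underscores."""
    parts = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            parts.append("[" + ch.lower() + ch.upper() + "]")
        else:
            parts.append(ch)
    return "".join(parts)


print(case_insensitive_pattern("x"))                              # [xX]
print(bool(re.search(case_insensitive_pattern("x"), "UPPER X")))  # True
```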

    • LesK
      2007-11-06

      CASE IGNORE certainly works as advertised during LOCATE in version 3.2.