Re: [netrexx-pipelines] REGEX stage now with "context" and complete(?)


<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Welcome back.  Thank you for your thoughts.<br>
    </p>
    <p>Answers / thoughts interlaced below.<br>
    </p>
    <div class="moz-cite-prefix">On 8/2/2020 12:58 AM, Colin wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:b53...@im...">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <font face="Century Gothic">I've been away for a while and am just
        catching up on some old emails.<br>
        <br>
        If I read the diagram correctly I could say BEFORE &lt;n&gt;
        AFTER &lt;m&gt; to get the same effect as CONTEXT.  Is that a
        fair statement or am I misunderstanding the CONTEXT option?<br>
      </font></blockquote>
    <font face="Century Gothic">True, if m = n.</font><br>
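    <p>A minimal Java sketch of that equivalence, not the stage's actual code. The class and method names are hypothetical, and using <tt>find()</tt> (grep-style, match anywhere in the record) is an assumption about how the stage applies the RegEx.</p>
    <pre>
```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the REGEX stage's real code.
class ContextSketch {
    // Returns the selected records joined by newlines: every record that
    // matches, plus up to `before` records ahead of it and `after` behind it.
    static String select(String[] records, String regex, int before, int after) {
        Pattern p = Pattern.compile(regex);
        boolean[] keep = new boolean[records.length];
        for (int i = 0; i != records.length; i++) {
            // Assumption: a record matches if the pattern occurs anywhere in it.
            if (p.matcher(records[i]).find()) {
                int from = Math.max(0, i - before);
                int to = Math.min(records.length - 1, i + after);
                for (int j = from; j != to + 1; j++) keep[j] = true;
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i != records.length; i++) {
            if (keep[i]) out.append(records[i]).append('\n');
        }
        return out.toString();
    }

    // CONTEXT k modelled as BEFORE k AFTER k.
    static String context(String[] records, String regex, int k) {
        return select(records, regex, k, k);
    }

    public static void main(String[] args) {
        String[] recs = {"a", "b", "MATCH", "c", "d"};
        System.out.print(context(recs, "MATCH", 1));
    }
}
```
    </pre>
    <p>Here <tt>context(recs, "MATCH", 1)</tt> and <tt>select(recs, "MATCH", 1, 1)</tt> return the same records, which is the m = n case.</p>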
    <blockquote type="cite"
      cite="mid:b53...@im..."><font
        face="Century Gothic"> <br>
        To your questions, here are my 2 cents worth if I am not too
        late to chime in. . . .</font><br>
      <ul>
        <li>Would it ever help to put matched records out on the
          tertiary stream without the context records?  Unmatched ones
          are already on the secondary.</li>
      </ul>
      <p>Interesting idea.  At the moment I don't think I have an
        opinion either way.  As the number of streams increases, does it
        increase the complexity and reduce the readability of a PIPE?<br>
      </p>
    </blockquote>
    This has been added as a <b>Tertiary</b> option.  It does increase
    the complexity.  I don't think it would be used often, and certainly
    not casually.  (I'd like an easier-to-read syntax for multiple
    pipes, such as a distinction between labels for stage input and
    output, but that train left decades ago.)<br>
    <blockquote type="cite"
      cite="mid:b53...@im...">
      <p> </p>
      <ul>
        <li>GREP has an option to just put out the COUNT of matched
          records.  Do we have any use for this? ( REGEX  string | COUNT
          LINES does the same.)</li>
      </ul>
      <p>Would there be a huge performance benefit to providing this
        functionality?  If not, then my vote leans towards "no".  REGEX
        string | COUNT LINES should suffice. <br>
      </p>
    </blockquote>
    Sure, there would be some performance hit.  These days, with fewer than
    a few thousand records, it probably wouldn't be noticeable.  (A guess, not
    profiled.)  It likely would not be difficult to add the option, but I have a
    couple of things with higher priority.<br>
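    <p>For a sense of what a built-in count would amount to, here is a rough Java sketch that counts matching records in one pass, which is the result "REGEX string | COUNT LINES" produces. The class and method names are hypothetical, and <tt>find()</tt> semantics are assumed.</p>
    <pre>
```java
import java.util.regex.Pattern;

// Hypothetical illustration of a COUNT-style option; not part of the stage.
class CountSketch {
    // Counts the records in which the pattern occurs at least once.
    static int countMatches(String[] records, String regex) {
        Pattern p = Pattern.compile(regex);
        int n = 0;
        for (String r : records) {
            if (p.matcher(r).find()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] recs = {"alpha", "beta", "gamma"};
        System.out.println(countMatches(recs, "a$"));
    }
}
```
    </pre>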
    <blockquote type="cite"
      cite="mid:b53...@im...">
      <p> <br>
      </p>
      <ul>
        <li>Possibly the regex_string should be a delimited string. 
          This because a potential REGEX_CHANGE stage would have two
          delimited strings.</li>
      </ul>
      <p>I would have thought that regex_string would have to be a
        delimited string, as wouldn't you need to handle the case of a
        blank in the middle of your regex_string?  E.g.    regex a b
        c    vs    regex / a b c/  to find all strings
        (space)a(space)b(space)c<br>
      </p>
    </blockquote>
    Good thought, but it is the last operand, so spaces are OK.  But for
    compatibility it may be a good idea.  "Least surprise" principle. <br>
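    <p>To illustrate what a delimited regex_string would buy, here is a small Java sketch. It assumes the usual delimited-string convention (the first non-blank character is the delimiter and the string runs to its next occurrence); the class and method names are hypothetical and this is not the stage's parser.</p>
    <pre>
```java
import java.util.regex.Pattern;

// Hypothetical delimited-string parser for illustration only.
class DStringSketch {
    // Returns the text between the first non-blank character (the delimiter)
    // and its next occurrence, so blanks inside the delimiters are preserved.
    static String parse(String arg) {
        String s = arg.trim();
        if (s.isEmpty()) return "";
        char delim = s.charAt(0);
        int end = s.indexOf(delim, 1);
        if (end == -1) {
            throw new IllegalArgumentException("missing closing delimiter");
        }
        return s.substring(1, end);
    }

    public static void main(String[] args) {
        String regex = parse("/ a b c/");
        // The leading blank is part of the pattern, so "a b c" alone fails.
        System.out.println(Pattern.compile(regex).matcher("x a b c").find());
        System.out.println(Pattern.compile(regex).matcher("a b c").find());
    }
}
```
    </pre>
    <p>With the delimiters, the leading blank in <tt>/ a b c/</tt> is unambiguous, whichever position the operand takes.</p>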
    <blockquote type="cite"
      cite="mid:b53...@im...">
      <p> </p>
      <ul>
        <li>Should this be named GREP?  Which term would be more or less
          familiar to our users?  Both? with one an alias of the other?</li>
      </ul>
      <p>Personally my preference is for REGEX_MATCH and REGEX_CHANGE
        since it more closely matches (pun intended) how the matching
        takes place than GREP does.  Having one the alias of the other
        works too.<br>
      </p>
    </blockquote>
    At the moment, it is <b>REGEX</b> with <b>GREP</b> as an alias.  <br>
    <blockquote type="cite"
      cite="mid:b53...@im...">
      <p> </p>
      <font face="Century Gothic">Just my thoughts from the rookie.  :-)<br>
      </font></blockquote>
    <p><font face="Century Gothic">Thank you.</font></p>
    <p><font face="Century Gothic">Here is the current header
        documentation.  Last change 7/1:</font></p>
    <p><font face="Century Gothic">
        <blockquote type="cite"><tt>/** regex<br>
            <br>
&gt;&gt;--+--REGEX--+--+--------------------------+--regex_string-(1)---&gt;&lt;<br>
                +--GREP---+  +-(--| options_string |--)-+<br>
            <br>
            options_string:<br>
               +----------------------------+<br>
            |--v-+------------------------+-+--|<br>
                 +-Numbers----------------+ (2)<br>
                 +-Before-+-1------+------+ (3)<br>
                 |        +-number-+      |<br>
                 +-After-+-1------+-------+ (3)<br>
                 |       +-number-+       |<br>
                 +-Context-+-1------+-----+ (4)<br>
                 |         +-number-+     |<br>
                 +-NOSeparator------------+ (5)<br>
                 +-Separator-+-/--/----+--+ (5)<br>
                 |           +-DString-+  |<br>
                 +-Tertiary---------------+ (6)<br>
            <br>
             NetRexx Pipelines only.<br>
             Records matching the RegEx are put out on primary output.<br>
             Records not matching are put out on secondary, if
            connected, or discarded.<br>
            <br>
            (1) Regex_string is a Java RegEx expression. Null string
            passes all records.<br>
            (2) Records are prefaced with record number, 10 characters,
            right justified.<br>
            (3) Number of records put out before (Before) or after (After) a matching record.<br>
            (4) Number of records put out before and after a matching
            record.<br>
            (5) Inserted before a group of "before records" or the found
            record with "after records."<br>
            (6) Send all matching records (no numbers) to tertiary
            output stream, if connected.<br>
            <br>
            */</tt></blockquote>
        <br>
      </font></p>
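    <p>Notes (1) and (2) can be sketched in Java as follows. This is an illustration under assumptions, not the stage's code: the names are hypothetical, the number is assumed to be the 1-based input record number, and no blank is assumed between the 10-character number and the record.</p>
    <pre>
```java
import java.util.regex.Pattern;

// Hypothetical illustration of the Numbers option and the null-string rule.
class NumbersSketch {
    // Returns matching records, each prefixed with its record number
    // right-justified in 10 characters, joined by newlines.
    static String numbered(String[] records, String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i != records.length; i++) {
            // An empty pattern is found in every string, so a null
            // regex_string passes all records.
            if (p.matcher(records[i]).find()) {
                out.append(String.format("%10d", i + 1));
                out.append(records[i]).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] recs = {"x", "ab", "b"};
        System.out.print(numbered(recs, "b"));
        System.out.print(numbered(recs, ""));
    }
}
```
    </pre>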
    <blockquote type="cite"
      cite="mid:b53...@im..."><font
        face="Century Gothic"> <br>
        Cheers<br>
        Colin K<br>
        <br>
      </font><br>
      <div class="moz-cite-prefix">On 2020-06-30 20:19, Jeff Hennick
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:586...@Je...">
        <meta http-equiv="content-type" content="text/html;
          charset=UTF-8">
        <p>I have added the CONTEXT <i>number</i> option.  This reports
          not only the matching record, but some before and after it
          also.  Also added are BEFORE and AFTER to get contextual
          records in one direction.  There is an optional SEPARATOR to
          set off the groups of records.  It defaults to "--".</p>
        <p> </p>
        <blockquote type="cite"><tt>/** regex<br>
            <br>
            &gt;&gt;--<b>REGEX</b>--+--------------------------+--<i>regex_string</i>-(1)---&gt;&lt;<br>
                       +-(--| <i>options_string</i> |--)-+<br>
            <br>
            <b>options_string</b>:<br>
               +----------------------------+<br>
            |--v-+------------------------+-+--|<br>
                 +-NUMBERS----------------+ (2)<br>
                 +-BEFORE-+-<i>0</i>------+------+ (3)<br>
                 |        +-<i>number</i>-+      |<br>
                 +-AFTER-+-<i>0</i>------+-------+ (3)<br>
                 |       +-<i>number</i>-+       |<br>
                 +-CONTEXT-+-<i>0</i>------+-----+ (4)<br>
                 |         +-<i>number</i>-+     |<br>
                 +-NOSEPARATOR------------+<br>
                 +-SEPARATOR-+- -- ----+--+<br>
                 |           +-<i>DString</i>-+  |<br>
            <br>
             Records matching the RegEx are put out on primary output<br>
             Records not matching are put out on secondary, if
            connected, or discarded.<br>
            <br>
            (1) string is a Java RegEx expression. null string passes all
            records.<br>
            (2) lines are prefaced with line number, 10 characters,
            right justified<br>
            (3) number of records put out after a matching record<br>
            (4) number of records put out before and after a matching
            record<br>
            <br>
            */<br>
          </tt></blockquote>
        <br>
        This brings it up, functionally, almost to GNU GREP 3.4 (minus
        all of its file input options).
        <p>A few things for discussion:</p>
        <ul>
          <li>Would it ever help to put matched records out on the
            tertiary stream without the context records?  Unmatched ones
            are already on the secondary.</li>
          <li>GREP has an option to just put out the COUNT of matched
            records.  Do we have any use for this? ( REGEX  string |
            COUNT LINES does the same.)<br>
          </li>
          <li>Possibly the regex_string should be a delimited string. 
            This because a potential REGEX_CHANGE stage would have two
            delimited strings.</li>
          <li>Should this be named GREP?  Which term would be more or
            less familiar to our users?  Both? with one an alias of the
            other?</li>
        </ul>
        (Oops, just spotted a bug: BEFORE, etc., without a number works
        if it is the last option, but not otherwise.  Something to fix
        in the morning.)<br>
        <p>Jeff<br>
        </p>
        <br>
        <pre class="moz-quote-pre" wrap="">_______________________________________________
netrexx-pipelines mailing list
<a class="moz-txt-link-abbreviated" href="mailto:net...@li..." moz-do-not-send="true">net...@li...</a>
<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/netrexx-pipelines" moz-do-not-send="true">https://lists.sourceforge.net/lists/listinfo/netrexx-pipelines</a>
</pre>
      </blockquote>
      <br>
      <br>
    </blockquote>
  </body>
</html>
