From: Jeff H. <Je...@Je...> - 2020-08-02 12:38:12
<html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> </head> <body> <p>Welcome back. Thank you for your thoughts.<br> </p> <p>Answers / thoughts interlaced below.<br> </p> <div class="moz-cite-prefix">On 8/2/2020 12:58 AM, Colin wrote:<br> </div> <blockquote type="cite" cite="mid:b53...@im..."> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <font face="Century Gothic">I've been away for a while and am just catching up on some old emails.<br> <br> If I read the diagram correctly, I could say BEFORE &lt;n&gt; AFTER &lt;m&gt; to get the same effect as CONTEXT. Is that a fair statement, or am I misunderstanding the CONTEXT option?<br> </font></blockquote> <font face="Century Gothic">True, if m = n.</font><br> <blockquote type="cite" cite="mid:b53...@im..."><font face="Century Gothic"> <br> To your questions, here are my 2 cents worth if I am not too late to chime in. . . .</font><br> <ul> <li>Would it ever help to put matched records out on the tertiary stream without the context records? Unmatched ones are already on the secondary.</li> </ul> <p>Interesting idea. At the moment I don't think I have an opinion either way. As the number of streams increases, does that increase the complexity and reduce the readability of a PIPE?<br> </p> </blockquote> This has been added as a <b>Tertiary</b> option. It does increase the complexity. I don't think it would be used often and certainly not casually. (I'd like an easier-to-read syntax for multiple pipes, such as a distinction between labels for stage input and output. But that train left decades ago.)<br> <blockquote type="cite" cite="mid:b53...@im..."> <p> </p> <ul> <li>GREP has an option to just put out the COUNT of matched records. Do we have any use for this? ( REGEX string | COUNT LINES does the same.)</li> </ul> <p>Would there be a huge performance benefit to providing this functionality? If not, then my vote leans towards "no". REGEX string | COUNT LINES should suffice. 
<br> </p> </blockquote> Sure, there would be some performance hit. These days, with fewer than a few thousand records, it probably wouldn't be noticeable. (A guess, not profiled.) But the option is likely not difficult to add. I have a couple of things with higher priority.<br> <blockquote type="cite" cite="mid:b53...@im..."> <p> <br> </p> <ul> <li>Possibly the regex_string should be a delimited string. This because a potential REGEX_CHANGE stage would have two delimited strings.</li> </ul> <p>I would have thought that regex_string would have to be a delimited string, as wouldn't you need to handle the case of a blank in the middle of your regex_string? E.g. regex a b c vs regex / a b c/ to find all strings (space)a(space)b(space)c.<br> </p> </blockquote> Good thought, but it is the last operand, so spaces are OK. But for compatibility, a delimited string may be a good idea: the "least surprise" principle. <br> <blockquote type="cite" cite="mid:b53...@im..."> <p> </p> <ul> <li>Should this be named GREP? Which term would be more or less familiar to our users? Both? with one an alias of the other?</li> </ul> <p>Personally, my preference is for REGEX_MATCH and REGEX_CHANGE, since they more closely match (pun intended) how the matching takes place than GREP does. Having one as the alias of the other works too.<br> </p> </blockquote> At the moment, it is <b>REGEX</b> with <b>GREP</b> as an alias. <br> <blockquote type="cite" cite="mid:b53...@im..."> <p> </p> <font face="Century Gothic">Just my thoughts from the rookie. :-)<br> </font></blockquote> <p><font face="Century Gothic">Thank you.</font></p> <p><font face="Century Gothic">Here is the current header documentation. 
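Last change 7/1:</font></p> <p><font face="Century Gothic"> <blockquote type="cite"><tt>/** regex<br> <br>
&gt;&gt;--+--REGEX--+--+--------------------------+--regex_string-(1)---&gt;&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;+--GREP---+&nbsp;&nbsp;+-(--| options_string |--)-+<br>
<br>
options_string:<br>
&nbsp;&nbsp;&nbsp;+----------------------------+<br>
|--v-+------------------------+-+--|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-Numbers----------------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(2)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-Before-+-1------+------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-number-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-After-+-1------+-------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-number-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-Context-+-1------+-----+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(4)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-number-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-NOSeparator------------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(5)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-Separator-+-/--/----+--+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(5)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-DString-+&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-Tertiary---------------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(6)<br>
<br> NetRexx Pipelines only.<br> Records matching the RegEx are put out on primary output.<br> Records not matching are put out on secondary, if connected, or discarded.<br> <br> (1) Regex_string is a Java RegEx expression. A null string passes all records.<br> (2) Records are prefaced with the record number, 10 characters, right justified.<br> (3) Number of records put out before (Before) or after (After) a matching record.<br> (4) Number of records put out before and after a matching record.<br> (5) Separator is inserted before a group of "before records" or before the found record with its "after records."<br> (6) Send all matching records (no numbers) to tertiary output stream, if connected.<br> <br> */</tt></blockquote> <br> </font></p> <blockquote type="cite" cite="mid:b53...@im..."><font face="Century Gothic"> <br> Cheers<br> Colin K<br> <br> </font><br> <div class="moz-cite-prefix">On 2020-06-30 20:19, Jeff Hennick wrote:<br> </div> <blockquote type="cite" cite="mid:586...@Je..."> <meta http-equiv="content-type" content="text/html; charset=UTF-8"> <p>I have added the CONTEXT <i>number</i> option. This reports not only the matching record, but also some records before and after it. Also added are BEFORE and AFTER to get contextual records in one direction. There is an optional SEPARATOR to set off the groups of records. 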
It defaults to "--".</p> <p> </p> <blockquote type="cite"><tt>/** regex<br> <br>
&gt;&gt;--<b>REGEX</b>--+--------------------------+--<i>regex_string</i>-(1)---&gt;&lt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-(--| <i>options_string</i> |--)-+<br>
<br>
<b>options_string</b>:<br>
&nbsp;&nbsp;&nbsp;+----------------------------+<br>
|--v-+------------------------+-+--|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-NUMBERS----------------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(2)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-BEFORE-+-<i>0</i>------+------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-<i>number</i>-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-AFTER-+-<i>0</i>------+-------+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(3)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-<i>number</i>-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-CONTEXT-+-<i>0</i>------+-----+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(4)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-<i>number</i>-+&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-NOSEPARATOR------------+<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-SEPARATOR-+- -- ----+--+<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+-<i>DString</i>-+&nbsp;&nbsp;|<br>
<br> Records matching the RegEx are put out on primary output.<br> Records not matching are put out on secondary, if connected, or discarded.<br> <br> (1) regex_string is a Java RegEx expression. A null string passes all records.<br> (2) lines are prefaced with line number, 10 characters, right justified<br> (3) number of records put out before (BEFORE) or after (AFTER) a matching record<br> (4) number of records put out before and after a matching record<br> <br> */<br> </tt></blockquote> <br> This brings it up, functionally, almost to GNU GREP 3.4 (minus all of its file input options). <p>A few things for discussion:</p> <ul> <li>Would it ever help to put matched records out on the tertiary stream without the context records? Unmatched ones are already on the secondary.</li> <li>GREP has an option to just put out the COUNT of matched records. Do we have any use for this? ( REGEX string | COUNT LINES does the same.)<br> </li> <li>Possibly the regex_string should be a delimited string. This because a potential REGEX_CHANGE stage would have two delimited strings.</li> <li>Should this be named GREP? Which term would be more or less familiar to our users? Both? with one an alias of the other?</li> </ul> (Oops, just spotted a bug: BEFORE, etc., without a number works if it is the last option, but not otherwise. 
Something for the morning fix.)<br> <p>Jeff<br> </p> <br> <pre class="moz-quote-pre" wrap="">_______________________________________________ netrexx-pipelines mailing list <a class="moz-txt-link-abbreviated" href="mailto:net...@li..." moz-do-not-send="true">net...@li...</a> <a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/netrexx-pipelines" moz-do-not-send="true">https://lists.sourceforge.net/lists/listinfo/netrexx-pipelines</a> </pre> </blockquote> <br> </blockquote> </body> </html>