Re: [netrexx-pipelines] REGEX stage now with "context" and complete(?)

I've been away for a while and am just catching up on some old emails.

If I read the diagram correctly, I could say BEFORE <n> AFTER <n> to get 
the same effect as CONTEXT <n>.  Is that a fair statement, or am I 
misunderstanding the CONTEXT option?
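One way to check that reading is a small Java sketch of grep-style context selection. Java is used because the stage works with Java RegEx. The sketch rests on three assumptions. First, CONTEXT n means the same count of records on both sides of a match, as GNU grep's -C NUM does relative to -B NUM -A NUM. Second, a record matches if the pattern is found anywhere in it (Matcher.find()). Third, the separator goes only between groups that are not contiguous. ContextSketch, select and context are hypothetical names, not the stage's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ContextSketch {
    // Hypothetical model of the primary output: each matching record plus
    // `before` records ahead of it and `after` records behind it, with
    // `separator` (if not null) written between non-contiguous groups.
    static List<String> select(List<String> records, String regex,
                               int before, int after, String separator) {
        Pattern p = Pattern.compile(regex);
        boolean[] keep = new boolean[records.size()];
        for (int i = 0; i < records.size(); i++) {
            if (p.matcher(records.get(i)).find()) {
                int from = Math.max(0, i - before);
                int to = Math.min(records.size() - 1, i + after);
                for (int j = from; j <= to; j++) {
                    keep[j] = true; // overlapping windows simply merge
                }
            }
        }
        List<String> out = new ArrayList<>();
        int last = -1; // index of the last record written, -1 if none yet
        for (int i = 0; i < records.size(); i++) {
            if (!keep[i]) {
                continue;
            }
            if (last >= 0 && i > last + 1 && separator != null) {
                out.add(separator); // gap since the previous group
            }
            out.add(records.get(i));
            last = i;
        }
        return out;
    }

    // Under the assumption above, CONTEXT n is just BEFORE n AFTER n.
    static List<String> context(List<String> records, String regex, int n, String separator) {
        return select(records, regex, n, n, separator);
    }

    public static void main(String[] args) {
        List<String> recs = List.of("a", "b", "match1", "c", "d", "e", "match2", "f");
        System.out.println(context(recs, "match", 1, "--"));    // [b, match1, c, --, e, match2, f]
        System.out.println(select(recs, "match", 1, 1, "--"));  // same as above
    }
}
```

Under these assumptions, BEFORE n AFTER n and CONTEXT n produce identical output. BEFORE n AFTER m with different n and m is something CONTEXT alone cannot express.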

To your questions, here are my 2 cents' worth, if I am not too late to 
chime in...

  * Would it ever help to put matched records out on the tertiary stream
    without the context records?  Unmatched ones are already on the
    secondary.

Interesting idea.  At the moment I don't have an opinion either way.  
Does each additional stream increase the complexity and reduce the 
readability of a PIPE?

  * GREP has an option to just put out the COUNT of matched records.  Do
    we have any use for this? (REGEX string | COUNT LINES does the same.)

Would there be a huge performance benefit to providing this 
functionality?  If not, then my vote leans towards "no".  REGEX string | 
COUNT LINES should suffice.
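For what it's worth, the counting itself is trivial. Below is a hypothetical Java sketch of what REGEX string | COUNT LINES computes when no context options are given, using the same find-anywhere assumption. CountSketch and countMatches are illustrative names only. The sketch says nothing about per-record overhead between stages, which is where any performance benefit of a built-in COUNT would come from.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CountSketch {
    // Hypothetical stand-in for `REGEX string | COUNT LINES` with no context
    // options: count the records in which the pattern is found anywhere.
    static long countMatches(List<String> records, String regex) {
        Pattern p = Pattern.compile(regex);
        return records.stream().filter(r -> p.matcher(r).find()).count();
    }

    public static void main(String[] args) {
        List<String> recs = List.of("alpha", "beta", "gamma", "alphabet");
        System.out.println(countMatches(recs, "^alpha")); // 2
        System.out.println(countMatches(recs, ""));       // 4, a null string passes all records
    }
}
```

One difference to keep in mind is that GNU grep -c counts only the matching lines. With BEFORE, AFTER or CONTEXT in effect, COUNT LINES after REGEX would presumably also count the context records and separators.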

  * Possibly the regex_string should be a delimited string.  This is
    because a potential REGEX_CHANGE stage would have two delimited strings.

I would have thought that regex_string would have to be a delimited 
string; otherwise, how would you handle a blank in the middle of your 
regex_string?  E.g.  regex a b c  vs.  regex / a b c/  (the latter finds 
all strings (space)a(space)b(space)c).
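Below is a minimal Java sketch of the basic delimited-string form. It assumes the CMS Pipelines convention that the first non-blank character is the delimiter and that the string ends at the next occurrence of that character; the hexadecimal and binary forms are not handled. parseDelimited is a hypothetical helper. The sketch shows that / a b c/ keeps its leading blank, and that two delimited strings, as a REGEX_CHANGE would need, can be read one after the other. Using String.replaceAll for the change step is likewise an assumption, not a settled design.

```java
import java.util.regex.Pattern;

public class DelimitedSketch {
    // Hypothetical parser for the basic delimited-string form: the first
    // non-blank character is the delimiter, and the string runs up to the
    // next occurrence of that character. Returns {string, rest of operands}.
    // A regex that itself contains '/' would simply use another delimiter.
    static String[] parseDelimited(String operands) {
        String s = operands.stripLeading();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("missing delimited string");
        }
        char delim = s.charAt(0);
        int end = s.indexOf(delim, 1);
        if (end < 0) {
            throw new IllegalArgumentException("missing closing delimiter " + delim);
        }
        return new String[] { s.substring(1, end), s.substring(end + 1) };
    }

    public static void main(String[] args) {
        // The leading blank inside the delimiters is kept as part of the regex.
        String regex = parseDelimited("/ a b c/")[0];
        Pattern p = Pattern.compile(regex);
        System.out.println("[" + regex + "]");           // [ a b c]
        System.out.println(p.matcher("x a b c").find()); // true
        System.out.println(p.matcher("xa b c").find());  // false

        // A proposed REGEX_CHANGE would read two delimited strings in a row.
        String[] first = parseDelimited("/(\\w+)@(\\w+)/ /$2 at $1/");
        String[] second = parseDelimited(first[1]);
        System.out.println("joe@example".replaceAll(first[0], second[0])); // example at joe
    }
}
```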

  * Should this be named GREP?  Which term would be more or less
    familiar to our users?  Both? with one an alias of the other?

Personally, my preference is for REGEX_MATCH and REGEX_CHANGE, since 
those names more closely match (pun intended) how the matching takes 
place than GREP does.  Having one be an alias of the other works too.

Just my thoughts from the rookie.  :-)

Cheers
Colin K

On 2020-06-30 20:19, Jeff Hennick wrote:
>
> I have added the CONTEXT /number/ option.  This reports not only the 
> matching record, but some before and after it also.  Also added are 
> BEFORE and AFTER to get contextual records in one direction.  There is 
> an optional SEPARATOR to set off the groups of records.  It defaults 
> to "--".
>
>> /** regex
>>
>> >>--REGEX--+--------------------------+--regex_string-(1)---><
>>            +-(--| options_string |--)-+
>>
>> options_string:
>>    +----------------------------+
>> |--v-+------------------------+-+--|
>>      +-NUMBERS----------------+ (2)
>>      +-BEFORE-+-0------+------+ (3)
>>      |        +-number-+      |
>>      +-AFTER-+-0------+-------+ (3)
>>      |       +-number-+       |
>>      +-CONTEXT-+-0------+-----+ (4)
>>      |         +-number-+     |
>>      +-NOSEPARATOR------------+
>>      +-SEPARATOR-+- -- ----+--+
>>      |           +-DString-+  |
>>
>>  Records matching the RegEx are put out on the primary output.
>>  Records not matching are put out on the secondary output, if
>>  connected, or discarded.
>>
>> (1) regex_string is a Java regular expression. A null string passes
>>     all records.
>> (2) records are prefixed with their line number, 10 characters wide,
>>     right-justified
>> (3) number of records put out before (BEFORE) or after (AFTER) a
>>     matching record
>> (4) number of records put out before and after a matching record
>>
>> */
>
> This brings it up, functionally, almost to GNU GREP 3.4 (minus all of 
> its file input options).
>
> A few things for discussion:
>
>   * Would it ever help to put matched records out on the tertiary
>     stream without the context records?  Unmatched ones are already on
>     the secondary.
>   * GREP has an option to just put out the COUNT of matched records. 
>     Do we have any use for this? (REGEX string | COUNT LINES does
>     the same.)
>   * Possibly the regex_string should be a delimited string. This is
>     because a potential REGEX_CHANGE stage would have two delimited
>     strings.
>   * Should this be named GREP?  Which term would be more or less
>     familiar to our users?  Both? with one an alias of the other?
>
> (Oops, just spotted a bug: BEFORE, etc., without a number works if it 
> is the last option, but not otherwise.  Something for the morning fix.)
>
> Jeff
>
>
>
> _______________________________________________
> netrexx-pipelines mailing list
> net...@li...
> https://lists.sourceforge.net/lists/listinfo/netrexx-pipelines
