Re: [Gramps-devel] [Gramps-users] Regular expression

On Thursday 26 May 2011 19.24.18, John Ralls wrote:
> On May 26, 2011, at 8:44 AM, John Ralls wrote:
> > On May 26, 2011, at 3:37 AM, Rob Healey wrote:
> >> Greetings:
> >> 
> >> I did not even know about the [] and (), so I am grateful that someone
> >> asked the question...
> >> 
> >> Sincerely yours,
> >> Rob G. Healey
> >> 
> >> 
> >> On Thu, May 26, 2011 at 3:27 AM, doug <do...@o2...> wrote:
> >> 
> >> On 25/05/11 21:08, Serge Noiraud wrote:
> >> > On 25/05/2011 20:36, doug wrote:
> >> >> On 25/05/11 18:44, Peter Landgren wrote:
> >> >>> Hi,
> >> >>> 
> >> >>> I'm definitely not an expert on regular expressions, so I
> >> >>> need some help:
> >> >>> I would like to easily find people with names spelled
> >> >>> with one or two "s":
> >> >>> Like Nilson and Nilsson in the same person filter search.
> >> >>> 
> >> >>> /Peter
> >> >> 
> >> >> Does this work?
> >> >> 
> >> >> \s*[a-rt-zA-Z]*[s|ss]\w*
> >> > 
> >> > I don't really know how it works in gramps, but the solution
> >> > should be :
> >> > (s|ss)
> >> > 
> >> > The [] matches only one character: from a to z or from A to Z.
> >> > The () matches one of several alternatives: in our case, s or ss.
> >> > 
> >> >> Doug
> >> 
> >> Ah! Thanks for that. I hadn't appreciated the difference
> >> between [] and ().
> > 
> > Better and easier to use a lazy quantifier: \b[a-zA-Z]+?(s|ss)[a-z]*\b.
> > Note that \w adds [0-9_], and you probably don't want that when you're
> > matching names. I trust that the code behind this has re.M set so that
> > [a-z] will be interpreted correctly (i.e., not literally, but as any
> > unicode alphabetic character).  "\b" means word boundary, and is better
> > than \s (whitespace) for isolating words... especially "zero or more"
> > whitespace (\s*).
> 
> Oops, that's wrong. There isn't any unicode magic in [a-z] with re.M, so
> the only way to make it work with non-ascii characters is
> \b\w+?(s|ss)\w*\b . Python 3 is supposed to support POSIX character
> classes, so eventually you'll be able to use
> \b[[:alpha:]]+?(s|ss)[[:alpha:]]*\b, which will avoid matching numbers and
> underscores.
> 
> Regards,
> John Ralls
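The difference between the suggestions above is easy to check in Python's `re` module. Gramps is written in Python, but that its person filter hands the pattern straight to `re` is an assumption here. The sketch compares Doug's `[s|ss]` with Serge's `(s|ss)`, and John's `\w` form with `[^\W\d_]`, a common standard-library approximation of "letters only". The stdlib `re` module does not accept POSIX classes such as `[[:alpha:]]`, which is why a workaround like this is needed. The constant names and the `matches()` helper are hypothetical, chosen only for illustration.

```python
import re

# Doug's suggestion uses a character class: [s|ss] matches ONE character from
# the set {"s", "|"}. Inside [], "|" is a literal and the repeated "s" adds nothing.
CLASS_PATTERN = re.compile(r"Nil[s|ss]on")

# Serge's suggestion uses a group: (s|ss) matches the string "s" OR the string "ss".
GROUP_PATTERN = re.compile(r"Nil(s|ss)on")

# John's \w form: in Python 3, \w in a str pattern matches Unicode letters
# such as "Å" by default, but it also matches digits and "_".
WORD_PATTERN = re.compile(r"\b\w+?(s|ss)\w*\b")

# Letters-only variant (an assumed stand-in for [[:alpha:]], which the stdlib
# re module does not support): "a word character that is not a digit and not
# an underscore".
LETTER_PATTERN = re.compile(r"\b[^\W\d_]+?(s|ss)[^\W\d_]*\b")


def matches(pattern, name):
    """Return True if the whole of `name` matches `pattern` (hypothetical helper)."""
    return pattern.fullmatch(name) is not None


for name in ["Nilson", "Nilsson", "Nil|on"]:
    print(name, "class:", matches(CLASS_PATTERN, name), "group:", matches(GROUP_PATTERN, name))

for name in ["Åkesson", "Nilsson_2"]:
    print(name, r"\w:", matches(WORD_PATTERN, name), "letters:", matches(LETTER_PATTERN, name))
```

Run as-is, the character class accepts "Nilson" and even "Nil|on" but rejects "Nilsson", while the group accepts both spellings and rejects "Nil|on". Both `\w` and `[^\W\d_]` accept the non-ASCII "Åkesson", but only the `\w` version also accepts "Nilsson_2".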

Thanks for all input.

But I needed a very simple regular expression. I wanted to find persons whose
surnames are spelled slightly differently. There are four versions of "Eriksson":
Erikson
Eriksson
Ericson
Ericsson

Which I get with:
eri[ck](s|ss)on

Regards,

Peter
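Peter's final pattern can be checked the same way. This is a minimal sketch that assumes case-insensitive matching (`re.IGNORECASE`), because the pattern is lowercase while the surnames start with "E". The thread does not say whether Gramps applies the expression to the whole name or searches anywhere inside it, so both `fullmatch` and `search` are shown. `eri[ck]s{1,2}on` is included as an equivalent way to write `(s|ss)` with a bounded quantifier. "Erickson" and "Eriksson-Lind" are made-up test names, and the helper names are hypothetical.

```python
import re

# Peter's pattern: "eri", then one letter from {c, k}, then "s" or "ss", then "on".
# Assumption: matching is case-insensitive, so "eri" also matches "Eri".
PATTERN = re.compile(r"eri[ck](s|ss)on", re.IGNORECASE)

# Equivalent form: s{1,2} means "one or two s" without an alternation group.
PATTERN_QUANT = re.compile(r"eri[ck]s{1,2}on", re.IGNORECASE)

# The four spellings from the thread plus two hypothetical test names.
SURNAMES = ["Erikson", "Eriksson", "Ericson", "Ericsson", "Erickson", "Eriksson-Lind"]


def whole_match(pattern, name):
    """True if the entire name matches (hypothetical helper)."""
    return pattern.fullmatch(name) is not None


def found_anywhere(pattern, name):
    """True if the pattern occurs somewhere in the name (hypothetical helper)."""
    return pattern.search(name) is not None


for name in SURNAMES:
    print(name, "fullmatch:", whole_match(PATTERN, name), "search:", found_anywhere(PATTERN, name))
```

With `search`, the pattern also hits a double-barrelled name such as "Eriksson-Lind". Anchoring it (`^eri[ck](s|ss)on$` in Python, or using `fullmatch`) restricts the hits to the four plain spellings, and "Erickson" is rejected either way because the "ck" is followed by "s" only after the "k".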
