regexkit-discussion Mailing List for RegexKit
Status: Beta
Brought to you by: jengelhart
Messages per month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 | | | | | | | | | | | | 1 |
| 2008 | | 2 | 4 | | | | | 2 | | | | |
| 2009 | | | 2 | | | | | | | 1 | | |
From: CB <id...@gm...> - 2009-10-18 12:57:50
Is RegexKit a dead project? It desperately needs updating for Snow Leopard.
From: John E. <joh...@gm...> - 2009-03-17 18:38:36
On Sat, Mar 14, 2009 at 6:51 PM, Aaron Warner <pap...@gm...> wrote:
> Hello,
> The following code block results in an infinite loop. The subject string
> that is commented out seems to work OK. I've tried the same expression on
> other engines without this problem. I should note that this expression comes
> from the TextMate C code bundle. Am I missing anything here?
> {
>     // capture preprocessor macro
>     NSString *regexString
>         = @"(?ms)((?x)^\\s*\\#\\s*(define)\\s+((?<id>[a-zA-Z_][a-zA-Z0-9_]*))(?:(\\()(\\s*\\g{id}\\s*((,)\\s*\\g{id}\\s*)*(?:\\.\\.\\.)?)(\\)))?.*?(?=(?://|/\\*))|$)";
>     NSString *subjectString = @"/";
>     // NSString *subjectString = @"#define HELLO(a,b) hello(a,b) // say hello";
>     NSUInteger matchCount = 0;
>
>     // make sure regexString doesn't cause an infinite loop
>     RKEnumerator *matchEnumerator = [subjectString
>         matchEnumeratorWithRegex:regexString];
>     while([matchEnumerator nextRanges] != NULL) {
>         if(++matchCount > 10) {
>             assert(0);
>         }
>     }
> }
This is a bug in RegexKit.framework <= 0.6. It's a rather simple fix
if you're willing to edit the sources. The problem, and a fix, are
covered in the following forum message:
http://sourceforge.net/forum/forum.php?thread_id=2660738&forum_id=731002
A bug has also been opened on this issue:
http://sourceforge.net/tracker/index.php?func=detail&aid=1958025&group_id=204582&atid=990188
I really should get around to patching this and a few other
miscellaneous bugs in RegexKit.framework; I just haven't had the time.
Most of my effort has gone into RegexKitLite (which uses the
system-supplied ICU regex matcher, not PCRE) because it's fairly
popular with iPhone developers: it's much smaller, since the system
supplies the regex matcher in a shared library, and it isn't affected
by Apple's 'no external frameworks' limitation.
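The failure mode in this thread is an enumerator that keeps finding the same zero-length match. With the subject `/`, the only thing the pattern can match is the empty `$` alternative at the end of the string, so a loop that resumes exactly where the previous match ended never moves forward. Below is a minimal sketch of one common remedy, written against the POSIX `<regex.h>` API rather than PCRE; it is not the patch from the forum thread above, and `count_matches` is a hypothetical helper.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of RegexKit: counts the matches of
   `pattern` (POSIX extended syntax) in `subject`. After a zero-length
   match the search resumes one byte further on; without that step a
   pattern such as "x|$" would match the same empty string at the end
   of the subject forever. Returns -1 if the pattern does not compile.
   Stepping one byte assumes single-byte characters; UTF-8 input would
   need to skip a whole character instead. */
int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    size_t length = strlen(subject);
    size_t offset = 0;
    int count = 0;
    regmatch_t m;

    /* REG_NOTBOL: a restarted search is not at the start of the line. */
    while (offset <= length &&
           regexec(&re, subject + offset, 1, &m,
                   offset > 0 ? REG_NOTBOL : 0) == 0) {
        count++;
        size_t end = offset + (size_t)m.rm_eo;
        /* Empty match: step past it. Non-empty: resume at its end. */
        offset = (m.rm_so == m.rm_eo) ? end + 1 : end;
    }

    regfree(&re);
    return count;
}
```

With `"x|$"` and `"/"` this counts the single empty match and stops; dropping the `+ 1` turns it into the same kind of endless loop reported above.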
From: Aaron W. <pap...@gm...> - 2009-03-14 22:51:36
Hello,
The following code block results in an infinite loop. The subject
string that is commented out seems to work OK. I've tried the same
expression on other engines without this problem. I should note that
this expression comes from the TextMate C code bundle. Am I missing
anything here?
{
    // capture preprocessor macro
    NSString *regexString = @"(?ms)((?x)^\\s*\\#\\s*(define)\\s+((?<id>[a-zA-Z_][a-zA-Z0-9_]*))(?:(\\()(\\s*\\g{id}\\s*((,)\\s*\\g{id}\\s*)*(?:\\.\\.\\.)?)(\\)))?.*?(?=(?://|/\\*))|$)";
    NSString *subjectString = @"/";
    // NSString *subjectString = @"#define HELLO(a,b) hello(a,b) // say hello";
    NSUInteger matchCount = 0;
    // make sure regexString doesn't cause an infinite loop
    RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];
    while([matchEnumerator nextRanges] != NULL) {
        if(++matchCount > 10) {
            assert(0);
        }
    }
}
Thanks,
kube
From: Mark P. <msp...@gm...> - 2008-08-16 03:58:05
I examined the numerical value of NSScannedOption and decided to give
this a try in RegexKitLite.m:
#ifndef iPhone
else { if((splitStrings = rkl_realloc(&scratchBuffer[1], splitStringsSize, (NSUInteger)NSScannedOption)) == NULL) { goto exitNow; } } // <<<< original line 579
#else
else { if((splitStrings = rkl_realloc(&scratchBuffer[1], splitStringsSize, (NSUInteger)1)) == NULL) { goto exitNow; } } // 1 == NSScannedOption, i.e. (1<<0)
#endif
Now the link_example.m test app installs and runs on a real phone.
Whether this actually solves a problem with the code running on an
iPhone, I do not know. I'm a newcomer to the Mac, and fixing one
problem may cause another. IOW, my fix is naive.
I also decided to try to figure out where NSScannedOption originates,
and found this:
$ sudo find /usr /System/ /Developer/ -name \*.h | xargs grep NSScannedOption 2>/dev/null
/System//Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h: NSScannedOption = (1<<0),
/Developer//Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.0.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSZone.h: NSScannedOption = (1<<0),
/Developer//Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator2.0.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h: NSScannedOption = (1<<0),
/Developer//SDKs/MacOSX10.4u.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h: NSScannedOption = (1<<0)
/Developer//SDKs/MacOSX10.5.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h: NSScannedOption = (1<<0),
indicating that NSScannedOption lies completely within NSZone.h, at
least in the likely directory trees.
My guess is that the build process for the iPhone simulator config
differs from the device config in some way that includes NSZone.h for
the former, but not the latter. Or some such.
On Aug 15, 2008, at 3:52 PM, Mark Petrovic wrote:
> Good day.
>
> I'm attempting to use the RegexKitLite code in an iPhone app. When
> I attempt to build the application for deployment to a real phone,
> Xcode produces an error to the effect:
>
> error: 'NSScannedOption' undeclared (first use in this function)
> RegexKitLite.m line 579
>
> I'm fairly new to Mac and iPhone programming, and I suspect this
> class is not available in the iPhone SDK. Is there a workaround for
> this?
>
> I would note that building the app to run in the simulator works
> fine. Only when I build for the real device do I run into this
> unresolved reference.
>
> Thanks.
From: Mark P. <msp...@gm...> - 2008-08-15 22:52:57
Good day.
I'm attempting to use the RegexKitLite code in an iPhone app. When I attempt to build the application for deployment to a real phone, Xcode produces an error to the effect:
error: 'NSScannedOption' undeclared (first use in this function)
RegexKitLite.m line 579
I'm fairly new to Mac and iPhone programming, and I suspect this class is not available in the iPhone SDK. Is there a workaround for this?
I would note that building the app to run in the simulator works fine. Only when I build for the real device do I run into this unresolved reference.
Thanks.
From: John E. <joh...@gm...> - 2008-03-06 22:24:26
On Mar 6, 2008, at 7:50 AM, Jonathan Dann wrote:
> Hi All,
>
> I added a working regex to my app to search through a string, but
> noticed that I was getting a log like this:
>
> CFPropertyListCreateFromXMLData(): Old-style plist parser: missing
> semicolon in dictionary.
>
> This doesn't cause a crash but happens with any regex I try, even the
> simplest test case below shows up this error:
>
> - (void)findMatches;
> {
> NSString *body = @"section section";
> NSString *regex = @"section";
> NSRange range = [body rangeOfRegex:regex];
> }
>
> I've added the #import <RegexKit/RegexKit.h> to my .pch file.
>
> Is this a known error? Or even a problem?
It's a known problem that got past me when developing 0.6.0. The fix
is trivial: the problem lies in the framework's
Resources/English.lproj/Localizable.strings file. I would recommend
opening the 'master' framework version (likely
/Developer/Local/Frameworks/RegexKit.framework/Versions/A/Resources/English.lproj/Localizable.strings)
with Xcode and using Xcode's find and replace in Regular Expression
mode, searching for the regex "$ (quote dollar-sign). Each line is
required to end in a semicolon, and this search will find all the
lines that end in just a quote; you'll need to add a ; (semicolon) to
those lines. As I recall, there are about three or four.
Once fixed, you'll need to rebuild your app so that the fixed version
is copied into your app's bundle. That should clear up the problem
for you. If it doesn't, another possibility is that you're running
Safari AdBlock and running into an oddity of how input managers work.
You'll need to copy the fixed version of Localizable.strings to
/Library/InputManagers/Safari\ AdBlock/Safari\ AdBlock.bundle/Contents/Frameworks/RegexKit.framework/Versions/A/Resources/English.lproj/Localizable.strings
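The manual fix above amounts to finding every line that ends in `"` without a terminating `;`. The same check can be scripted; here is a minimal sketch using the POSIX `<regex.h>` API with the same `"$` pattern. `missing_semicolon` is a hypothetical helper, and it assumes each line has already been read as ASCII or UTF-8 text with its newline stripped (a .strings file saved as UTF-16 would need converting first).

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `line` ends in a double quote (the
   regex "$ used in Xcode's find), meaning the terminating semicolon is
   missing; 0 if it does not; -1 if the pattern fails to compile.
   Assumes ASCII/UTF-8 text with the trailing newline already removed. */
int missing_semicolon(const char *line)
{
    regex_t re;
    if (regcomp(&re, "\"$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int result = (regexec(&re, line, 0, NULL, 0) == 0) ? 1 : 0;
    regfree(&re);
    return result;
}
```

Like the Xcode search, this does not look past trailing whitespace, so a line ending in `" ` (quote, space) would not be flagged.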
From: Jonathan D. <j.p...@gm...> - 2008-03-06 12:51:15
Hi All,
I added a working regex to my app to search through a string, but
noticed that I was getting a log like this:
CFPropertyListCreateFromXMLData(): Old-style plist parser: missing
semicolon in dictionary.
This doesn't cause a crash but happens with any regex I try, even the
simplest test case below shows up this error:
- (void)findMatches;
{
NSString *body = @"section section";
NSString *regex = @"section";
NSRange range = [body rangeOfRegex:regex];
}
I've added the #import <RegexKit/RegexKit.h> to my .pch file.
Is this a known error? Or even a problem?
Thanks,
Jon
From: John E. <joh...@gm...> - 2008-03-05 02:09:53
On Mar 4, 2008, at 5:54 PM, Jonathan Dann wrote:
> Hi All,
>
> I found RegexKit after being pointed to it by a reply on the cocoa-dev
> list; it's ace. I was then informed that PCRE may not yet support
> proper word breaks (sorry if that's not the correct terminology) in
> scripts like Kanji. Is this still the case? I'm expecting to support
> languages other than English in my app, and this may end up being a
> headache for me.
Unicode is pretty complex, and I'm an English-only speaker, so I'm at
a disadvantage as to what a "word break" means in this precise
context. I'm pretty sure it's not "space or tab" :).
PCRE has a build-time option for supporting UTF-8 Unicode and,
optionally, Unicode properties (the \p{} & \P{} syntax). The PCRE
built into RegexKit includes both, so it's as Unicode-enabled as PCRE
can get. Foundation uses UTF-16 as its abstract representation of
text, even though internally it may keep the text in whatever format
happens to be most convenient for it. PCRE, however, uses only UTF-8
to represent text, so things like NSRange values for a piece of text
can have two wildly different values depending on whether they are
for the UTF-8 or the UTF-16 representation. RegexKit tries to hide
these differences from you and "do the right thing", which generally
translates into: "If it has to do with Foundation (NSString, etc.),
then all values are in UTF-16; otherwise, raw, low-level byte buffer
access is in UTF-8." More specific details are covered in the
RKRegex class documentation.
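To illustrate why the two representations disagree, here is a minimal C sketch that converts a byte offset into a UTF-8 buffer (the unit PCRE works in) into a count of UTF-16 code units (the unit NSRange uses). `utf8_to_utf16_offset` is a hypothetical helper, not RegexKit's internal conversion; it assumes valid UTF-8 and an offset that falls on a character boundary. Characters outside the Basic Multilingual Plane take four bytes in UTF-8 but two UTF-16 code units (a surrogate pair), which is why they count twice.

```c
#include <stddef.h>

/* Hypothetical helper: converts a byte offset into a UTF-8 buffer into
   the equivalent number of UTF-16 code units. Assumes `utf8` is valid
   UTF-8 and `byte_offset` lies on a character boundary; the lead byte
   of each sequence determines its length. */
size_t utf8_to_utf16_offset(const unsigned char *utf8, size_t byte_offset)
{
    size_t units = 0;
    size_t i = 0;
    while (i < byte_offset) {
        unsigned char lead = utf8[i];
        if (lead < 0x80)      { i += 1; units += 1; } /* ASCII */
        else if (lead < 0xE0) { i += 2; units += 1; } /* 2-byte sequence */
        else if (lead < 0xF0) { i += 3; units += 1; } /* 3-byte, rest of BMP */
        else                  { i += 4; units += 2; } /* 4-byte, surrogate pair */
    }
    return units;
}
```

For example, in "héllo" the byte offset just after "é" is 3, but the corresponding UTF-16 offset is 2.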
I'd recommend looking at http://regexkit.sourceforge.net/Documentation/pcre/pcresyntax.html#SEC5
and http://regexkit.sourceforge.net/Documentation/pcre/pcrepattern.html#SEC3
(the section regarding Unicode). There is probably a Unicode
property that covers "word break", or something like it. For example,
\p{L} matches any Unicode letter, so something like \p{L}+ could be
used to match all 'words', but again this is from an English only
speaker.
The other operators typically used for matching "word breaks" are \b
(a zero-width word-boundary assertion) and \B (its logical opposite).
However, under PCRE a 'word character' is an ASCII word character
only, though I would think that the equivalent could be fashioned out
of the \p{} Unicode properties. The ICU documentation is unclear on
what \b matches precisely, other than a 'word to non-word character
transition', and 'seems' to point to a different ICU API for doing
'better word boundary analysis':
http://www.icu-project.org/userguide/regexp.html
and http://www.icu-project.org/userguide/boundaryAnalysis.html .
A zero-order approximation gives me the impression that PCRE and ICU
are equally capable at finding 'simple' words (those composed of
letters), but neither is capable of complex word breaking by itself,
such as recognizing "that's" as a single word, since the ' is clearly
part of it.
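To make the ASCII-only limitation concrete, the following minimal C sketch counts runs of ASCII word characters `[A-Za-z0-9_]`, which is roughly what `\w` and `\b` see when PCRE uses its default character tables. `count_ascii_word_runs` is a hypothetical helper. Every byte of a multi-byte UTF-8 character ends a run, so an accented word gets split, and "that's" is split at the apostrophe as described above.

```c
#include <stddef.h>

/* Hypothetical helper: counts maximal runs of ASCII word characters
   [A-Za-z0-9_]. Any other byte, including every byte of a multi-byte
   UTF-8 character, ends the current run. */
size_t count_ascii_word_runs(const char *s)
{
    size_t runs = 0;
    int in_word = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        int is_word = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '_';
        if (is_word && !in_word)
            runs++;            /* a new run starts here */
        in_word = is_word;
    }
    return runs;
}
```

"naive cafe" yields 2 runs, but the UTF-8 spelling "naïve café" yields 3 ("na", "ve", "caf").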
I'd recommend the PCRE mailing list for a more informed opinion, which
can be found at http://lists.exim.org/mailman/listinfo/pcre-dev There
you'll find the developers and likely someone who can give you a more
authoritative answer than my speculation.
Looking to the future, the latest release started a move to generalize
which regex pattern matching library is used, with the first obvious
candidate being the ICU library that ships with Mac OS X. I have ICU
pattern matching working in extremely rough form now, but I'm not
terribly happy with the way it's going. The ICU library presents a
number of problems, ones that I didn't really take into consideration
when I first started this project. The first is that the ICU regex API
was clearly not designed with multi-threading in mind. Using the C
API, when a regex is compiled, a "regex matcher" is returned. You then
"set" the string (which /must/ be UTF-16) that the regex matches.
This mixes the compiled regex with the state of what's being matched
and how far into the string the current match is. This requires
"compiling" the regex for each thread, with an awful lot of overhead
and per-thread information. It's a massive inconvenience, and right
now I'm not sure it's actually worth all the effort. There's also the
fact that every string needs to be converted into UTF-16 before it
can be matched by ICU. While PCRE requires everything to be converted
to UTF-8, I've found that in practice this isn't actually a problem,
as most of the time strings seem to be kept in UTF-8 or a UTF-8
compatible encoding (i.e., ASCII). RegexKit expends a lot of effort to
get access to the raw NSString buffer to avoid the constant allocation
and destruction of temporary strings for a one-time match, but that
raw buffer obviously has to be in a UTF-8 compatible encoding for that
to happen. It would seem that using ICU would require an almost
constant conversion of strings to UTF-16 for one-time matches, but
this is very much application and usage sensitive.
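The threading problem described above comes from one object holding both the compiled pattern and the per-match state. The minimal C sketch below shows the opposite split: an immutable pattern that can be shared, and a separate match state that each thread (or caller) owns. All names are hypothetical, and a literal substring search via `strstr` stands in for a real compiled regex; this is not how ICU or RegexKit is structured internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable "compiled pattern": nothing in it changes
   during matching, so one instance can be shared between threads.
   A literal substring stands in for a real compiled regex. */
struct pattern {
    const char *literal;
};

/* Hypothetical per-caller match state: the subject and how far
   matching has progressed. Each thread owns its own instance. */
struct match_state {
    const struct pattern *pattern;
    const char *subject;
    size_t next;
};

void match_state_init(struct match_state *st, const struct pattern *p,
                      const char *subject)
{
    st->pattern = p;
    st->subject = subject;
    st->next = 0;
}

/* Returns the offset of the next occurrence, or -1 when there are none
   left. Only `st` is modified; the shared pattern is only read. */
long match_next(struct match_state *st)
{
    size_t len = strlen(st->pattern->literal);
    if (len == 0 || st->next > strlen(st->subject))
        return -1;
    const char *hit = strstr(st->subject + st->next, st->pattern->literal);
    if (hit == NULL)
        return -1;
    size_t at = (size_t)(hit - st->subject);
    st->next = at + len;
    return (long)at;
}
```

Two `match_state` values can walk different subjects with the same `pattern` without interfering, so the pattern never needs to be "compiled" once per thread.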
On top of those issues, the ICU library that ships with OS X is
'technically' not supported for developer use; it's mostly there for
Apple's internal use. So by using it, you could essentially be
considered to be using an 'unpublished, private API'. The regex
syntax of PCRE is also much richer than the regex syntax provided by
ICU. Examples include named subcaptures (not just $1 numbers, but
$name for a subpattern, very handy), conditional subpatterns,
recursive subpatterns, subpattern subroutines, etc. From what I can
tell, PCRE has every feature of the ICU regex pattern matcher, and a
lot more. The only area I'm not entirely sure about is the particulars
of the \p{} Unicode property support; the ICU documentation only says
that it supports it, but gives no other details as to what exactly is
supported. In general, the PCRE regex engine also tends to be one of
the fastest regex matchers.
I've attached my consolidated ICU Regex C API header file in case the
ICU library is a drop-dead requirement for you. It covers the C API of
just the ICU Regex matcher, nothing else. It's an all-in-one file;
nothing else needs to be included (I think; at least nothing from the
other ICU headers). You'll need to link against /usr/lib/libicu.dylib
(typically just a -licu to the compiler/linker). The ICU
documentation can be found at http://www.icu-project.org/apiref/icu4c/uregex_8h.html
Hopefully it's enough to give you a head start if that's the route
you need to go. Again, because of the other problems I outlined above,
I'm not sure whether the next version of RegexKit will include ICU
support, even though I've got it limping along right now. It probably
will make it in eventually, but it's sort of iffy for the next release.
From: Jonathan D. <j.p...@gm...> - 2008-03-04 22:54:38
Hi All,
I found RegexKit after being pointed to it by a reply on the cocoa-dev list; it's ace. I was then informed that PCRE may not yet support proper word breaks (sorry if that's not the correct terminology) in scripts like Kanji. Is this still the case? I'm expecting to support languages other than English in my app, and this may end up being a headache for me.
Thanks,
Jon
From: John E. <joh...@gm...> - 2008-02-03 01:34:25
On Feb 2, 2008, at 8:07 PM, Aria Stewart wrote:
> There's a lot of MacOS-isms in RegexKit. I've spent most of a morning
> hacking them out, though there's a handful I don't know what to do
> about
> (pthread_*_np functions especially, since they don't exist under
> Linux).
> Any ideas there for reducing or avoiding the need for them would be
> awesome.
Unfortunately I don't have access to a Linux machine, so it's hard for
me to test against it. :) I have some FreeBSD machines that I do
most of my GNUstep testing against, and it (obviously) works there.
It also helps me weed out and contain the OS X-specific stuff. OS X is
(obviously) the primary development platform for RegexKit, but I try
to keep it portable and usable under GNUstep (hence the number of
#define ENABLE_*-like flags for turning bits and pieces on and off,
like garbage collection, DTrace, etc.).
The pthread_*_np functions are "non-portable" and usually
implementation specific, but the one that's used (pthread_main_np) is
fairly common. It is isolated in the RegexKitPrivateThreads.h file, in
the macro/function RKIsMainThread. For example, the Solaris version
ends up being:
RKREGEX_STATIC_INLINE int RKIsMainThread(void) { return(thr_main() == 1); }
I'm sure Linux has a similar function.
Actually, I just did a quick check to see where RKIsMainThread is
called and it's... currently not. So, it would probably be safe to
put something like
#define RKIsMainThread() abort()
which will trip any uses of it (which there should be none of). I
can't remember specifically why this was originally put in, but the
need has obviously changed.
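Since RKIsMainThread is currently unused, the abort() stub above is the simplest option. If a replacement were ever needed on a platform without pthread_main_np, such as Linux, one portable approach is to record pthread_self() early in main() and later compare against it with pthread_equal(). The following is a minimal sketch; `rk_record_main_thread` and `rk_is_main_thread` are hypothetical names, not RegexKit functions, and the approach is only correct if the recording call runs on the main thread before any other thread starts.

```c
#include <pthread.h>

/* Hypothetical replacement state: the thread ID recorded at startup. */
static pthread_t rk_main_thread;
static int rk_main_thread_recorded = 0;

/* Call once, early in main(), before any other threads are started. */
void rk_record_main_thread(void)
{
    rk_main_thread = pthread_self();
    rk_main_thread_recorded = 1;
}

/* Returns 1 if the caller is the recorded main thread, 0 if it is some
   other thread, and -1 if rk_record_main_thread() was never called. */
int rk_is_main_thread(void)
{
    if (!rk_main_thread_recorded)
        return -1;
    return pthread_equal(pthread_self(), rk_main_thread) ? 1 : 0;
}
```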
>
> Also, why does RegexKit depend on the GUI portions of AppKit? GNUstep,
> for example, has no NSShadow class, and I can't see a reason for its
> use
> in RegexKit. Any comments?
This almost certainly comes from the function
'RKErrorForCompileInitFailure' in RKRegex.m. If you take a look at
it, you'll see there's an #if check to separate out the 'extra'
Mac OS X Cocoa bits from what GNUstep provides. Maybe I botched it
this time around (this is a new function, as I split out the old
exception throwing behavior from the newer NSError reporting
functionality).
The reason for using NSShadow is that an NSAttributedString is created
that 'highlights' the character at which PCRE detected the error,
using a red text shadow and increased kerning around that character to
further emphasize it. It's primarily intended for GUIs as an enhanced
way of highlighting the problem.
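Where NSShadow is not available, as Aria reports for GNUstep, a plain-text rendition of the same idea is to print the pattern with a caret under the character where compilation failed. This is a minimal C sketch, not what RegexKit does; `format_error_caret` is a hypothetical helper, and it assumes one byte per display column (ASCII patterns).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "pattern\n<spaces>^" into buf so that the
   caret sits under the character at error_offset (an offset equal to
   the pattern length points just past the end). Returns 0 on success,
   or -1 if the offset is out of range or buf is too small. Assumes one
   byte per display column. */
int format_error_caret(const char *pattern, size_t error_offset,
                       char *buf, size_t buf_size)
{
    size_t len = strlen(pattern);
    if (error_offset > len)
        return -1;
    /* Room for: pattern, '\n', error_offset spaces, '^', '\0'. */
    if (buf_size < len + error_offset + 3)
        return -1;
    int n = snprintf(buf, buf_size, "%s\n%*s^", pattern, (int)error_offset, "");
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}
```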
>
> Aria Stewart
From: Aria S. <are...@nb...> - 2008-02-03 01:07:32
There's a lot of MacOS-isms in RegexKit. I've spent most of a morning hacking them out, though there's a handful I don't know what to do about (pthread_*_np functions especially, since they don't exist under Linux). Any ideas there for reducing or avoiding the need for them would be awesome.
Also, why does RegexKit depend on the GUI portions of AppKit? GNUstep, for example, has no NSShadow class, and I can't see a reason for its use in RegexKit. Any comments?
Aria Stewart
From: Alfonso G. <hup...@gm...> - 2007-12-26 01:59:42
The first error I ran up against was the undefined PTHREAD_RWLOCK_INITIALIZER in readWriteLock = PTHREAD_RWLOCK_INITIALIZER;, which I replaced with pthread_rwlock_init(&readWriteLock, NULL); as that would seem to be the intended purpose, just as it is in the line following. The next error I ran up against was the lack of thread affinity support in Tiger, at which point I gave up on it and am now attempting to integrate 0.4beta into my project. However, since my project is a plugin, the framework requires an adjustment to its installation directory setting.
I don't know how well it works as of yet, but the API looks to be very useful. I am particularly interested in your library for its simple type conversion and sub-expression extraction facility, but for the amount of time spent trying to integrate it, I might have written (and still might write) a hand-crafted routine for this update and a RegexKit version for the next (which should be a few days from now).
I also hope whatever ails the ML archive is fixed by this report.
Alfonso Guerra
President
Apokalypse Software Corp.
Mori - Your notes, organized.
Clockwork - On time, in style.
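POSIX specifies PTHREAD_RWLOCK_INITIALIZER for initializing statically allocated read-write locks, so the runtime pthread_rwlock_init() call Alfonso switched to is a reasonable substitute when the macro is unavailable. If the lock is created lazily, pthread_once() is one common way to guarantee the initialization runs exactly once, even when several threads arrive at the same time. The following is a minimal C sketch; `shared_read_write_lock` and the other names are hypothetical, not RegexKit's.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lazily initialized lock, set up via pthread_once(). */
static pthread_rwlock_t read_write_lock;
static pthread_once_t read_write_lock_once = PTHREAD_ONCE_INIT;
static int read_write_lock_status = -1; /* set by init_read_write_lock() */

/* Runs exactly once, no matter how many threads call the getter. */
static void init_read_write_lock(void)
{
    read_write_lock_status = pthread_rwlock_init(&read_write_lock, NULL);
}

/* Returns the shared lock, initializing it on first use, or NULL if
   pthread_once() or pthread_rwlock_init() failed. */
pthread_rwlock_t *shared_read_write_lock(void)
{
    if (pthread_once(&read_write_lock_once, init_read_write_lock) != 0)
        return NULL;
    return read_write_lock_status == 0 ? &read_write_lock : NULL;
}
```

Callers then use pthread_rwlock_rdlock()/pthread_rwlock_wrlock() and pthread_rwlock_unlock() on the returned pointer as usual.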