regexkit-discussion Mailing List for RegexKit
Status: Beta
Brought to you by: jengelhart
Archive (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2007 | | | | | | | | | | | | 1 |
| 2008 | | 2 | 4 | | | | | 2 | | | | |
| 2009 | | | 2 | | | | | | | 1 | | |
From: CB <id...@gm...> - 2009-10-18 12:57:50
Is RegexKit a dead project? It desperately needs updating for Snow Leopard.
From: John E. <joh...@gm...> - 2009-03-17 18:38:36
On Sat, Mar 14, 2009 at 6:51 PM, Aaron Warner <pap...@gm...> wrote:
> Hello,
> The following code block results in an infinite loop. The subject string
> that is commented out seems to work OK. I've tried the same expression on
> other engines without this problem. I should note that this expression comes
> from the TextMate C code bundle. Am I missing anything here?
> {
>     // capture preprocessor macro
>     NSString *regexString = @"(?ms)((?x)^\\s*\\#\\s*(define)\\s+((?<id>[a-zA-Z_][a-zA-Z0-9_]*))(?:(\\()(\\s*\\g{id}\\s*((,)\\s*\\g{id}\\s*)*(?:\\.\\.\\.)?)(\\)))?.*?(?=(?://|/\\*))|$)";
>     NSString *subjectString = @"/";
>     // NSString *subjectString = @"#define HELLO(a,b) hello(a,b) // say hello";
>     NSUInteger matchCount = 0;
>
>     // make sure regexString doesn't cause an infinite loop
>     RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];
>     while([matchEnumerator nextRanges] != NULL) {
>         if(++matchCount > 10) {
>             assert(0);
>         }
>     }
> }

This is a bug in RegexKit.framework <= 0.6. It's a rather simple fix if you're willing to edit the sources. The problem, and a fix, is covered in the following forum message:

http://sourceforge.net/forum/forum.php?thread_id=2660738&forum_id=731002

There is also a bug report open on this issue:

http://sourceforge.net/tracker/index.php?func=detail&aid=1958025&group_id=204582&atid=990188

I really should get around to patching this and a few other miscellaneous bugs in RegexKit.framework; I just haven't gotten to it yet. Most of my effort has gone into RegexKitLite (which uses the system-supplied ICU regex matcher, not PCRE) because it's fairly popular with iPhone developers: it's much smaller, since the system supplies the regex matcher in a shared library, and it isn't affected by Apple's "no external frameworks" limitation.
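A minimal sketch of the general problem, not the patch from the forum thread above: a pattern that can match the empty string (here the `|$` alternative can match at the end of the subject) will loop forever if each search resumes exactly where the previous empty match ended. A common guard is to step forward after a zero-length match. The sketch uses POSIX `<regex.h>` instead of PCRE, and `count_matches` is a hypothetical helper. It advances by one byte, so it is not UTF-8 aware, and unlike Perl-style engines it does not first retry at the same position for a non-empty match.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count matches of `pattern` in `subject`, guarding
   against the infinite loop that a zero-length match can cause.
   Returns -1 if the pattern fails to compile. */
static int count_matches(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return -1;
    }

    int count = 0;
    size_t offset = 0;
    size_t length = strlen(subject);

    while (offset <= length) {
        regmatch_t m;
        /* After the first search we are no longer at the start of the line. */
        int eflags = (offset > 0) ? REG_NOTBOL : 0;
        if (regexec(&re, subject + offset, 1, &m, eflags) != 0) {
            break;
        }
        count++;

        size_t start = offset + (size_t)m.rm_so;
        size_t end = offset + (size_t)m.rm_eo;
        /* Resume after a non-empty match; after an empty match, step one
           byte further so the same empty match cannot be found again. */
        offset = (end > start) ? end : end + 1;
    }

    regfree(&re);
    return count;
}
```

With this guard, `count_matches("$", "/")` finds the single empty match at the end of the one-character subject and then stops instead of spinning.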
From: Aaron W. <pap...@gm...> - 2009-03-14 22:51:36
Hello,

The following code block results in an infinite loop. The subject string that is commented out seems to work OK. I've tried the same expression on other engines without this problem. I should note that this expression comes from the TextMate C code bundle. Am I missing anything here?

{
    // capture preprocessor macro
    NSString *regexString = @"(?ms)((?x)^\\s*\\#\\s*(define)\\s+((?<id>[a-zA-Z_][a-zA-Z0-9_]*))(?:(\\()(\\s*\\g{id}\\s*((,)\\s*\\g{id}\\s*)*(?:\\.\\.\\.)?)(\\)))?.*?(?=(?://|/\\*))|$)";
    NSString *subjectString = @"/";
    // NSString *subjectString = @"#define HELLO(a,b) hello(a,b) // say hello";
    NSUInteger matchCount = 0;

    // make sure regexString doesn't cause an infinite loop
    RKEnumerator *matchEnumerator = [subjectString matchEnumeratorWithRegex:regexString];
    while([matchEnumerator nextRanges] != NULL) {
        if(++matchCount > 10) {
            assert(0);
        }
    }
}

Thanks,
kube
From: Mark P. <msp...@gm...> - 2008-08-16 03:58:05
I examined the numerical value of NSScannedOption and decided to give this a try in RegexKitLite.m:

    #ifndef iPhone
    else { if((splitStrings = rkl_realloc(&scratchBuffer[1], splitStringsSize, (NSUInteger)NSScannedOption)) == NULL) { goto exitNow; } }  // <<<< original line 579
    #else
    else { if((splitStrings = rkl_realloc(&scratchBuffer[1], splitStringsSize, (NSUInteger)1)) == NULL) { goto exitNow; } }
    #endif

Now the test app link_example.m installs and runs on a real phone. Whether this actually solves the problem of running the code on an iPhone, I do not know. I'm a newcomer to the Mac, and fixing one problem may cause another; in other words, my fix is naive.

I also decided to try to figure out where NSScannedOption originates, and found this:

    $ sudo find /usr /System/ /Developer/ -name \*.h | xargs grep NSScannedOption 2>/dev/null
    /System//Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h:    NSScannedOption = (1<<0),
    /Developer//Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS2.0.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSZone.h:    NSScannedOption = (1<<0),
    /Developer//Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator2.0.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h:    NSScannedOption = (1<<0),
    /Developer//SDKs/MacOSX10.4u.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h:    NSScannedOption = (1<<0)
    /Developer//SDKs/MacOSX10.5.sdk/System/Library/Frameworks/Foundation.framework/Versions/C/Headers/NSZone.h:    NSScannedOption = (1<<0),

This indicates that NSScannedOption lies completely within NSZone.h, at least in the likely directory trees. My guess is that the build process for the iPhone Simulator configuration differs from the device configuration in some way that includes NSZone.h for the former but not the latter. Or some such.

On Aug 15, 2008, at 3:52 PM, Mark Petrovic wrote:
> Good day.
>
> I'm attempting to use the RegexKitLite code in an iPhone app. When
> I attempt to build the application for deployment to a real phone,
> Xcode produces an error to the effect:
>
> error: 'NSScannedOption' undeclared (first use in this function)
> RegexKitLite.m line 579
>
> I'm fairly new to Mac and iPhone programming, and I suspect this
> class is not available in the iPhone SDK. Is there a workaround for
> this?
>
> I would note that building the app to run in the simulator works
> fine. Only when I build for the real device do I run into this
> unresolved reference.
>
> Thanks.
From: Mark P. <msp...@gm...> - 2008-08-15 22:52:57
Good day.

I'm attempting to use the RegexKitLite code in an iPhone app. When I attempt to build the application for deployment to a real phone, Xcode produces an error to the effect:

error: 'NSScannedOption' undeclared (first use in this function)
RegexKitLite.m line 579

I'm fairly new to Mac and iPhone programming, and I suspect this class is not available in the iPhone SDK. Is there a workaround for this?

I would note that building the app to run in the simulator works fine. Only when I build for the real device do I run into this unresolved reference.

Thanks.
From: John E. <joh...@gm...> - 2008-03-06 22:24:26
On Mar 6, 2008, at 7:50 AM, Jonathan Dann wrote:
> Hi All,
>
> I added a working regex to my app to search through a string, but
> noticed that I was getting a log like this:
>
> CFPropertyListCreateFromXMLData(): Old-style plist parser: missing
> semicolon in dictionary.
>
> This doesn't cause a crash but happens with any regex I try, even the
> simplest test case below shows up this error:
>
> - (void)findMatches;
> {
>     NSString *body = @"section section";
>     NSString *regex = @"section";
>     NSRange range = [body rangeOfRegex:regex];
> }
>
> I've added the #import <RegexKit/RegexKit.h> to my .pch file.
>
> Is this a known error? Or even a problem?

It's a known problem that got past me when developing 0.6.0. The fix is trivial.

The problem lies in the framework's Resources/English.lproj/Localizable.strings file. I would recommend opening the "master" framework version (likely /Developer/Local/Frameworks/RegexKit.framework/Versions/A/Resources/English.lproj/Localizable.strings) with Xcode and using Xcode's find and replace in Regular Expression mode, searching for the regex "$ (a double quote followed by a dollar sign). Each entry line is required to end in a semicolon; this regex finds all the lines that end in just a quote, and you'll need to add a semicolon to those lines. As I recall, there are three or four of them. Once they're fixed, you'll need to rebuild your app so that the fixed version is copied into your app's bundle.

That should clear up the problem for you. If it doesn't, another possibility is that you're running Safari AdBlock and running into an oddity of how input managers work. You'll need to copy the fixed version of Localizable.strings to

/Library/InputManagers/Safari\ AdBlock/Safari\ AdBlock.bundle/Contents/Frameworks/RegexKit.framework/Versions/A/Resources/English.lproj/Localizable.strings
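The `"$` search described above flags entry lines that end in a quote instead of a semicolon. As a rough, hypothetical C equivalent, assuming the file's text is UTF-8 or ASCII with one `"key" = "value";` entry per line, the helper below counts such lines. Unlike the literal `"$` regex, it also ignores trailing whitespace (including a `\r` from CRLF line endings), so it catches a few more cases.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count lines whose last non-whitespace character is a
   double quote, i.e. "key" = "value" entries missing the terminating ';'.
   Assumes UTF-8 or ASCII text with one entry per line. */
static size_t count_unterminated_entries(const char *text)
{
    size_t count = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *newline = strchr(line, '\n');
        const char *end = newline ? newline : line + strlen(line);

        /* Trim trailing whitespace (including '\r' from CRLF files). */
        while (end > line && isspace((unsigned char)end[-1])) {
            end--;
        }
        if (end > line && end[-1] == '"') {
            count++;
        }

        if (newline == NULL) {
            break;
        }
        line = newline + 1;
    }
    return count;
}
```

Comment lines such as `/* ... */` end in `/` and are therefore not counted.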
From: Jonathan D. <j.p...@gm...> - 2008-03-06 12:51:15
Hi All,

I added a working regex to my app to search through a string, but noticed that I was getting a log like this:

CFPropertyListCreateFromXMLData(): Old-style plist parser: missing semicolon in dictionary.

This doesn't cause a crash but happens with any regex I try; even the simplest test case below shows this error:

- (void)findMatches;
{
    NSString *body = @"section section";
    NSString *regex = @"section";
    NSRange range = [body rangeOfRegex:regex];
}

I've added the #import <RegexKit/RegexKit.h> to my .pch file.

Is this a known error? Or even a problem?

Thanks,
Jon
From: John E. <joh...@gm...> - 2008-03-05 02:09:53
On Mar 4, 2008, at 5:54 PM, Jonathan Dann wrote:
> Hi All,
>
> I found RegexKit after being pointed to it by a reply on the cocoa-dev
> list, it's ace. I was then informed that PCRE may not yet support
> proper word breaks (sorry if that's not the correct terminology) in
> scripts like Kanji. Is this still the case? I'm expecting to support
> more languages other than English in my app and this may end up being a
> headache for me.

Unicode is pretty complex, and I'm an English-only speaker, so I'm at a disadvantage as to what a "word break" means in this precise context. I'm pretty sure it's not "space or tab" :).

PCRE has a build-time option for supporting UTF-8 Unicode and, optionally, Unicode properties (the \p{} and \P{} syntax). The PCRE built into RegexKit includes both, so it's as Unicode-enabled as PCRE can get.

Foundation uses UTF-16 as its abstract representation of text, even though internally it may keep the text in whatever format happens to be most convenient. PCRE, however, uses only UTF-8 to represent text, so things like NSRange values for a piece of text can have wildly different values depending on whether they are for the UTF-8 or the UTF-16 representation. RegexKit tries to hide these differences from you and "do the right thing", which generally translates into: "If it has to do with Foundation (NSString, etc.), then all values are in UTF-16; otherwise, raw, low-level byte buffer access is in UTF-8." More specific details are covered in the RKRegex class documentation.

I'd recommend looking at http://regexkit.sourceforge.net/Documentation/pcre/pcresyntax.html#SEC5 and http://regexkit.sourceforge.net/Documentation/pcre/pcrepattern.html#SEC3 (the section regarding Unicode). There is probably a Unicode property that covers "word break", or something like it. For example, \p{L} matches any Unicode letter, so something like \p{L}+ could be used to match all "words", but again, this is from an English-only speaker.
The other operators typically used for matching "word breaks" are \b (a zero-width word boundary assertion) and \B (its logical opposite). However, under PCRE a "word character" is an ASCII word character only, though I would think that the equivalent could be fashioned out of the \p{} Unicode properties. The ICU documentation is unclear on precisely what \b matches, other than a "word to non-word character transition", and seems to point to a different ICU API for doing "better word boundary analysis": http://www.icu-project.org/userguide/regexp.html and http://www.icu-project.org/userguide/boundaryAnalysis.html.

A zero-order approximation gives me the impression that PCRE and ICU are equally capable of finding "simple" words (those composed of letters), but neither is capable of complex word breaking by itself, such as handling something like "that's", in which the ' is clearly part of the word. I'd recommend the PCRE mailing list for a more informed opinion, which can be found at http://lists.exim.org/mailman/listinfo/pcre-dev. There you'll find the developers, and likely someone who can give you a more authoritative answer than my speculation.

Looking toward the future, the latest release started a move to generalize which regex pattern-matching library is used, the first obvious candidate being the ICU library that ships with Mac OS X. I have ICU pattern matching working in extremely rough form now, but I'm not terribly happy with the way it's going. The ICU library presents a number of problems, ones that I didn't really take into consideration when I first started this project.

The first is that the ICU regex API was clearly not designed with multithreading in mind. Using the C API, when a regex is compiled, a "regex matcher" is returned. You then "set" the string (which /must/ be UTF-16) that the regex matches. This mixes the state of what's being matched, and how far into the string the current match is, with the compiled regex itself.
This requires "compiling" the regex for each thread, with an awful lot of overhead and per-thread information. It's a massive inconvenience, and right now I'm not sure it's actually worth all the effort.

There's also the fact that every string needs to be converted into UTF-16 before it can be matched by ICU. While PCRE requires everything to be converted to UTF-8, I've found that in practice this isn't actually a problem, as most of the time strings seem to be kept in UTF-8 or a UTF-8-compatible encoding (i.e., ASCII). RegexKit expends a lot of effort getting access to the raw NSString buffer to avoid the constant allocation and destruction of temporary strings for a one-time match, but that raw buffer obviously has to be in a UTF-8-compatible encoding for that to happen. It would seem that using ICU would require an almost constant conversion of strings to UTF-16 for one-time matches, but this is very much application- and usage-sensitive.

On top of those issues, the ICU library that ships with OS X is "technically" not supported for developer use; it's mostly there for Apple's internal use. So by using it, you could be considered to be using an "unpublished, private API".

The regex syntax of PCRE is also much richer than the regex syntax provided by ICU. Examples include named subcaptures (not just $1 numbers, but $name for a subpattern, very handy), conditional subpatterns, recursive subpatterns, subpattern subroutines, etc. From what I can tell, PCRE has every feature of the ICU regex pattern matcher, and a lot more. The only area I'm not entirely sure of is the particulars of the \p{} Unicode property support; the ICU documentation only says that it supports it, but gives no other details as to exactly what is supported. In general, the PCRE regex engine tends to be one of the fastest regex matchers to boot.

I've attached my consolidated ICU regex C API header file in case the ICU library is a drop-dead requirement for you.
It covers the C API of the ICU regex matcher only, nothing else. It's an all-in-one file; nothing else needs to be included (I think, at least not from the ICU headers). You'll need to link against /usr/lib/libicu.dylib (typically just -licu to the compiler/linker). The ICU documentation can be found at http://www.icu-project.org/apiref/icu4c/uregex_8h.html. Hopefully it's enough to give you a head start if that's the route you need to go.

Again, I'm not sure the next version of RegexKit will include ICU support, even though I've got it limping along right now, because of the other problems I outlined above. It probably will make it in eventually, but it's sort of iffy for the next release.
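To make the UTF-8 versus UTF-16 point concrete: once non-ASCII characters appear, the same position in a string has a different offset in each encoding, and characters outside the Basic Multilingual Plane take two UTF-16 code units (a surrogate pair). The following C sketch converts a UTF-8 byte offset (the kind PCRE reports) into a UTF-16 code-unit offset (the kind an NSRange uses). It is a hypothetical helper, not RegexKit's internal code, and it assumes valid UTF-8 and an offset that falls on a character boundary.

```c
#include <stddef.h>

/* Hypothetical helper: convert a byte offset into a UTF-8 buffer into the
   equivalent UTF-16 code-unit offset. Assumes the bytes are valid UTF-8 and
   that byteOffset falls on a character boundary. */
static size_t utf8_to_utf16_offset(const unsigned char *utf8, size_t byteOffset)
{
    size_t i = 0;
    size_t units = 0;

    while (i < byteOffset) {
        unsigned char lead = utf8[i];
        if (lead < 0x80) {          /* U+0000..U+007F: 1 byte, 1 unit */
            i += 1;
            units += 1;
        } else if (lead < 0xE0) {   /* U+0080..U+07FF: 2 bytes, 1 unit */
            i += 2;
            units += 1;
        } else if (lead < 0xF0) {   /* U+0800..U+FFFF: 3 bytes, 1 unit (e.g. most Kanji) */
            i += 3;
            units += 1;
        } else {                    /* U+10000 and up: 4 bytes, surrogate pair */
            i += 4;
            units += 2;
        }
    }
    return units;
}
```

For example, each character of "日本" is 3 bytes in UTF-8 but a single UTF-16 unit, so byte offset 6 corresponds to UTF-16 offset 2.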
From: Jonathan D. <j.p...@gm...> - 2008-03-04 22:54:38
Hi All,

I found RegexKit after being pointed to it by a reply on the cocoa-dev list; it's ace. I was then informed that PCRE may not yet support proper word breaks (sorry if that's not the correct terminology) in scripts like Kanji. Is this still the case? I'm expecting to support languages other than English in my app, and this may end up being a headache for me.

Thanks,
Jon
From: John E. <joh...@gm...> - 2008-02-03 01:34:25
On Feb 2, 2008, at 8:07 PM, Aria Stewart wrote:
> There's a lot of MacOS-isms in RegexKit. I've spent most of a morning
> hacking them out, though there's a handful I don't know what to do about
> (pthread_*_np functions especially, since they don't exist under Linux).
> Any ideas there for reducing or avoiding the need for them would be
> awesome.

Unfortunately I don't have access to a Linux machine, so it's hard for me to test against it. :) I have some FreeBSD machines that I do most of my GNUstep testing on, and it (obviously) works there. It also helps me weed out and contain the OS X stuff. OS X is (obviously) the primary development platform for RegexKit, but I try to keep it portable and usable under GNUstep (hence the number of #define ENABLE_*-like flags for turning bits and pieces on and off, like garbage collection, DTrace, etc.).

The pthread_*_np functions are "non-portable" and usually implementation-specific, but the one that's used (pthread_main_np) is fairly common. This is isolated in the file RegexKitPrivateThreads.h, in the macro/function RKIsMainThread. For example, the Solaris version ends up being:

    RKREGEX_STATIC_INLINE int RKIsMainThread(void) { return thr_main(); }

I'm sure Linux has a similar function. Actually, I just did a quick check to see where RKIsMainThread is called and it's... currently not. So it would probably be safe to put something like

    #define RKIsMainThread() abort()

which will trip any uses of it (of which there should be none). I can't remember specifically why this was originally put in, but the need has obviously changed.

> Also, why does RegexKit depend on the GUI portions of AppKit? GNUstep,
> for example, has no NSShadow class, and I can't see a reason for its use
> in RegexKit. Any comments?

This almost certainly comes from the function RKErrorForCompileInitFailure in RKRegex.m.
If you take a look at it, you'll see there's an #if check to separate out the "extra" Mac OS X Cocoa bits from what GNUstep provides. Maybe I botched it this time around (this is a new function, as I split out the old exception-throwing behavior from the newer NSError reporting functionality). The reason for the use of NSShadow is that an NSAttributedString is created that highlights the character at which PCRE detected the error, using a red text shadow and increased kerning around that character to further emphasize it. It's primarily intended for GUIs as an enhanced way of highlighting the problem.

> Aria Stewart
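For platforms without pthread_main_np(), one portable approach is to record pthread_self() once, early in main() before any other threads start, and later compare the current thread against it with pthread_equal(). This is a hypothetical sketch, not what RegexKitPrivateThreads.h does; the function names are made up for illustration, and the -1 "unknown" result loosely mirrors the Solaris thr_main() convention.

```c
#include <pthread.h>

/* Hypothetical replacement for pthread_main_np(): remember the thread that
   called rk_record_main_thread() and compare against it later. */
static pthread_t rk_main_thread;
static int rk_main_thread_recorded = 0;

/* Call once from main(), before any other threads are created. */
static void rk_record_main_thread(void)
{
    rk_main_thread = pthread_self();
    rk_main_thread_recorded = 1;
}

/* Returns 1 on the recorded thread, 0 on any other thread,
   and -1 if rk_record_main_thread() was never called. */
static int rk_is_main_thread(void)
{
    if (!rk_main_thread_recorded) {
        return -1;
    }
    return pthread_equal(pthread_self(), rk_main_thread) ? 1 : 0;
}
```

The weak point of this design is that a library cannot always guarantee it is initialized on the main thread, which is why a platform call such as pthread_main_np() is preferable where it exists.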
From: Aria S. <are...@nb...> - 2008-02-03 01:07:32
There's a lot of MacOS-isms in RegexKit. I've spent most of a morning hacking them out, though there's a handful I don't know what to do about (pthread_*_np functions especially, since they don't exist under Linux). Any ideas there for reducing or avoiding the need for them would be awesome.

Also, why does RegexKit depend on the GUI portions of AppKit? GNUstep, for example, has no NSShadow class, and I can't see a reason for its use in RegexKit. Any comments?

Aria Stewart
From: Alfonso G. <hup...@gm...> - 2007-12-26 01:59:42
The first error I ran up against was the undefined PTHREAD_RWLOCK_INITIALIZER in

    readWriteLock = PTHREAD_RWLOCK_INITIALIZER;

which I replaced with

    pthread_rwlock_init(&readWriteLock, NULL);

as that would seem to be the intended purpose, just as it is in the following line. The next error I ran up against was the lack of thread affinity support in Tiger, at which point I gave up on it and am now attempting to integrate 0.4beta into my project. However, since my project is a plugin, the framework requires an adjustment to its installation directory setting.

I don't know how well it works as of yet, but the API looks to be very useful. I am particularly interested in your library for its simple type conversion and sub-expression extraction facility, but for the amount of time spent trying to integrate it, I might have written (and still might write) a hand-crafted routine for this update and a RegexKit version for the next (which should be a few days from now).

I also hope whatever ails the mailing list archive is fixed by this report.

Alfonso Guerra
President
Apokalypse Software Corp.
Mori - Your notes, organized.
Clockwork - On time, in style.
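PTHREAD_RWLOCK_INITIALIZER is intended for statically allocated locks and belongs in a declaration. It usually expands to a brace-enclosed initializer, so it generally cannot be used in a plain assignment such as `readWriteLock = PTHREAD_RWLOCK_INITIALIZER;`. pthread_rwlock_init() is the run-time alternative. The minimal C sketch below shows both forms; `struct cache` and the `cache_*` functions are hypothetical names, not RegexKit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Statically allocated lock: the initializer macro belongs in the declaration. */
static pthread_rwlock_t staticLock = PTHREAD_RWLOCK_INITIALIZER;

/* Hypothetical structure whose lock must be initialized at run time. */
struct cache {
    pthread_rwlock_t readWriteLock;
    int value;
};

/* Returns 0 on success, or an error number from pthread_rwlock_init(). */
static int cache_init(struct cache *c)
{
    c->value = 0;
    return pthread_rwlock_init(&c->readWriteLock, NULL);
}

/* Reads the value under a shared (read) lock; returns -1 if locking fails. */
static int cache_read(struct cache *c)
{
    if (pthread_rwlock_rdlock(&c->readWriteLock) != 0) {
        return -1;
    }
    int value = c->value;
    pthread_rwlock_unlock(&c->readWriteLock);
    return value;
}

/* Releases the resources acquired by cache_init(). */
static void cache_destroy(struct cache *c)
{
    pthread_rwlock_destroy(&c->readWriteLock);
}
```

A lock set up with pthread_rwlock_init() should be paired with pthread_rwlock_destroy() once it is no longer in use. A statically initialized lock normally lives for the whole program.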