The latest beta release of Java 1.4 has a regular expression library (java.util.regex) that is *MUCH* faster than PerlHelp's. Strongly recommended!!! :D
In fact, Meghan Pike already wrote a component for finding Date elements in text using the java.util.regex package, but we decided not to add it yet because we didn't feel ready to go on to 1.4.
I am wondering again about stepping up to 1.4, but Eric and Gann had some reasons to hold off on that. Can you guys remind me? I can't find the emails we exchanged about this. Gate (http://gate.ac.uk) is using 1.4, and they are really concerned with stability and all that. Is there still a reason to wait?
The PerlHelp stuff was based on gnu.regexp, which Eric notes was a memory hog and slow. Since most of the functions we needed anyway were quite simple, I rewrote the expressions as methods in the latest version of opennlp.common.PerlHelp. Did you use that version?
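To illustrate the kind of rewrite described above, here is a minimal sketch of replacing two simple regular expressions with plain methods. The class and method names are hypothetical and are not the actual opennlp.common.PerlHelp API; the sketch only shows the general idea of trading a regex engine for a character loop when the pattern is trivial.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, NOT the real opennlp.common.PerlHelp.
final class SimpleStringChecks {
    private SimpleStringChecks() {}

    // Hand-written stand-in for matching the whole string against [0-9]+
    static boolean isAllDigits(String s) {
        if (s.length() == 0) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    // Hand-written stand-in for splitting on runs of whitespace,
    // ignoring leading and trailing whitespace.
    static List<String> splitOnWhitespace(String s) {
        List<String> tokens = new ArrayList<String>();
        int start = -1; // index where the current token began, -1 if none
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                if (start >= 0) {
                    tokens.add(s.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                start = i;
            }
        }
        if (start >= 0) {
            tokens.add(s.substring(start));
        }
        return tokens;
    }
}
```

Methods like these avoid compiling or interpreting a pattern at all, which is probably where the memory and speed savings over a general regex engine would come from, at the cost of flexibility.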
When we last discussed this, 1.4 wasn't final, so that was the primary reason for holding off.
Unfortunately, my application can't move to 1.4 right away because none of the major DB vendors we integrate with (Oracle, M$) have released JDBC drivers that work with JDBC 3.0, which is now part of 1.4. So, upgrading has gotten harder than it used to be because the dependency set for large scale applications has become more multi-faceted.
A couple of solutions are possible:
1. Use reflection to dynamically load an implementation of PerlHelp that uses the java.util.regex stuff when 1.4 is present and the gnu.regexp stuff when it's not.
2. Create a terminal release branch for grok/maxent for 1.3 VMs and move the head over to the 1.4 stuff.
Eric
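Option 1 could be sketched roughly as follows. Everything here is an assumption made for illustration: the RegexBackend interface, both implementations, and the RegexBackends.load method are hypothetical names, and the fallback class is only a placeholder for a gnu.regexp-based implementation (it does exact string comparison so the sketch compiles without gnu.regexp on the classpath). It is also written against a current JDK; on a 1.3 VM the instantiation would have to use Class.newInstance() instead.

```java
import java.util.regex.Pattern;

// Hypothetical common interface that PerlHelp-style code would call.
interface RegexBackend {
    boolean matches(String regex, String input);
    String name();
}

// Only loaded when the probe for java.util.regex succeeds.
final class JdkRegexBackend implements RegexBackend {
    public boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }
    public String name() {
        return "java.util.regex";
    }
}

// Placeholder for a gnu.regexp-based backend (assumption: the real one
// would delegate to gnu.regexp; this one only compares strings exactly).
final class ExactMatchBackend implements RegexBackend {
    public boolean matches(String regex, String input) {
        return regex.equals(input);
    }
    public String name() {
        return "fallback";
    }
}

final class RegexBackends {
    private RegexBackends() {}

    // Probe for probeClass; if it exists, instantiate preferredImpl by name,
    // otherwise instantiate fallbackImpl. Only class names are passed in, so
    // this factory never links against the 1.4-only class directly.
    static RegexBackend load(String probeClass, String preferredImpl, String fallbackImpl) {
        String impl;
        try {
            Class.forName(probeClass);
            impl = preferredImpl;
        } catch (ClassNotFoundException e) {
            impl = fallbackImpl;
        }
        try {
            return (RegexBackend) Class.forName(impl).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load regex backend " + impl, e);
        }
    }
}
```

A caller would do something like RegexBackends.load("java.util.regex.Pattern", "JdkRegexBackend", "ExactMatchBackend") once at startup and keep the returned backend around. The cost of this approach is that both implementations have to be maintained and tested, which is the trade-off against option 2.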
Do you mean the switch statements for Sentence Detection?
Still, nothing beats a real regexp in flexibility! :D I brought this up coz' the EnglishBasicAffix analysis was really slowing down the POSTagging process. java.util.regex was so quick, I hardly noticed any time penalty. :)
But maybe this can wait till it's final...
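For anyone who hasn't tried java.util.regex yet, here is a minimal sketch of the kind of suffix check I mean. The class name, the method, and the suffix list are made up for illustration and are not the actual EnglishBasicAffix logic; the point is only that the Pattern is compiled once and then reused for every token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example, not the real EnglishBasicAffix code.
final class SuffixSketch {
    private SuffixSketch() {}

    // Compiled once and shared; Pattern objects are immutable and thread-safe.
    private static final Pattern SUFFIX = Pattern.compile("(ing|ed|ly|s)$");

    // Returns the matched suffix, or the empty string if none matches.
    static String suffixOf(String token) {
        Matcher m = SUFFIX.matcher(token);
        return m.find() ? m.group(1) : "";
    }
}
```

Calling SuffixSketch.suffixOf(token) inside the tagging loop then only creates a Matcher per token, never a new Pattern.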
I agree that we should move over to 1.4 sometime in the near future, and I think we should just take Eric's second suggestion of creating a branch of Grok/OpenNLP for 1.3. The problem is that I simply can't do it right now. So, are there any volunteers to do some repository maintenance? Otherwise, I'll probably get to it in late May.