
New java.util.regex for Java 1.4

Wilson Na
2002-03-07
2002-03-11
  • Wilson Na

    Wilson Na - 2002-03-07

    The latest beta release of Java 1.4 has a regular expressions library *MUCH* faster than PerlHelp's. Strongly recommended!!! :D

     
    • Jason Baldridge

      Jason Baldridge - 2002-03-07

      In fact, Meghan Pike already wrote a component for finding Date elements in text using the java.util.regex package, but we decided not to add it yet because we didn't feel ready to go on to 1.4. 

      I am wondering again about stepping up to 1.4, but Eric and Gann had some reasons to hold off on that.  Can you guys remind me?  I can't find the emails we exchanged about this.  Gate (http://gate.ac.uk) is using 1.4, and they are really concerned with stability and all that.... is there still reason to wait?

      The PerlHelp stuff was based on gnu.regexp, which Eric notes was a memory hog and slow.  Since most of the functions we needed anyway were quite simple, I rewrote the expressions as methods in the latest version of opennlp.common.PerlHelp.  Did you use that version?

       
      • Eric Friedman

        Eric Friedman - 2002-03-07

        When we last discussed this, 1.4 wasn't final, so that was the primary reason for holding off.

        Unfortunately, my application can't move to 1.4 right away because none of the major DB vendors we integrate with (Oracle, M$) have released JDBC drivers that work with JDBC 3.0, which is now part of 1.4.  So upgrading has gotten harder than it used to be, because the dependency set for large-scale applications has become more multi-faceted.

        A couple of solutions are possible: 

        1.  Use reflection to dynamically load an implementation of PerlHelp that uses the java.util.regex stuff when 1.4 is present and the gnu.regexp stuff when it's not.

        2.  Create a terminal release branch for grok/maxent for 1.3 VMs and move the head over to the 1.4 stuff.

        Eric
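
For illustration, the first option (picking an implementation by reflection) might look roughly like the sketch below. All names here (RegexBackendLoader, RegexBackend, JdkBackend, FallbackBackend) are hypothetical and are not taken from opennlp.common.PerlHelp. The fallback only does a literal comparison and stands in for wherever the gnu.regexp code would go. In a real 1.3-compatible build, the java.util.regex implementation would presumably have to live in its own source file compiled against 1.4, so that a 1.3 VM never tries to load it.

```java
import java.util.regex.Pattern;

public class RegexBackendLoader {

    /** Hypothetical minimal interface; the real PerlHelp API may differ. */
    public interface RegexBackend {
        boolean matches(String regex, String input);
        String name();
    }

    /** Backend for 1.4+ VMs, built on java.util.regex. */
    public static class JdkBackend implements RegexBackend {
        public boolean matches(String regex, String input) {
            return Pattern.matches(regex, input);
        }
        public String name() {
            return "java.util.regex";
        }
    }

    /** Placeholder for the pre-1.4 path; literal comparison only (assumption). */
    public static class FallbackBackend implements RegexBackend {
        public boolean matches(String regex, String input) {
            return regex.equals(input);
        }
        public String name() {
            return "fallback";
        }
    }

    /** True if the named class can be loaded by this VM. */
    static boolean classPresent(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Chooses a backend by name at runtime and instantiates it reflectively. */
    public static RegexBackend load() {
        String outer = RegexBackendLoader.class.getName();
        String impl = classPresent("java.util.regex.Pattern")
                ? outer + "$JdkBackend"
                : outer + "$FallbackBackend";
        try {
            return (RegexBackend) Class.forName(impl)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            // If anything goes wrong, degrade to the fallback.
            return new FallbackBackend();
        }
    }

    public static void main(String[] args) {
        RegexBackend backend = load();
        System.out.println(backend.name() + ": "
                + backend.matches("\\d{4}-\\d{2}-\\d{2}", "2002-03-07"));
    }
}
```

Callers would only ever see the interface, so the choice of regex library stays an implementation detail behind load().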

         
      • Wilson Na

        Wilson Na - 2002-03-08

        Do you mean the switch statements for Sentence Detection?

        Still, nothing beats a real regexp in flexibility! :D I brought this up because the EnglishBasicAffix analysis was really slowing down the POS tagging process. java.util.regex was so quick, I hardly noticed any time penalty. :)

        But maybe this can wait till it's final...
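
As a rough idea of what affix matching with java.util.regex could look like, here is a minimal sketch. The class name SuffixSketch and the suffix list (ing, ed, ly, s) are made up for illustration and are not the actual EnglishBasicAffix rules. The one point it tries to show is that the Pattern is compiled once and reused for every token, rather than recompiled on each call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuffixSketch {

    // Compiled once and shared; hypothetical suffix list, not Grok's real rules.
    private static final Pattern SUFFIX =
            Pattern.compile("^(\\w+?)(ing|ed|ly|s)$");

    /** Returns the matched suffix, or null if the token has none of them. */
    public static String suffixOf(String token) {
        Matcher m = SUFFIX.matcher(token);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] tokens = {"walking", "quickly", "cats", "the"};
        for (String token : tokens) {
            System.out.println(token + " -> " + suffixOf(token));
        }
    }
}
```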

         
        • Jason Baldridge

          Jason Baldridge - 2002-03-11

          I agree that we should move over to 1.4 sometime in the near future, and I think we should just take Eric's second suggestion of creating a branch of Grok/OpenNLP for 1.3.  The problem is that I simply can't do it right now.  So, are there any volunteers to do some repository maintenance?  Otherwise, I'll probably get to it in late May.

           
