Menu

A nice optimization for LuaString.raweq.

2014-01-16
2014-07-03
  • Stefan Reich

    Stefan Reich - 2014-01-16

    First, cache hash codes (add this to LuaString.java):

    /** stefan: Cached hash code. Added for optimization */
    private int hashCode;
    

    Replace LuaString's hashCode method:

    public int hashCode() {
    if (hashCode == 0) {
      int h = m_length;  /* seed */
      int step = (m_length>>5)+1;  /* if string is too long, don't hash all its chars */
      for (int l1=m_length; l1>=step; l1-=step)  /* compute hash */
          h = h ^ ((h<<5)+(h>>2)+(((int) m_bytes[m_offset+l1-1] ) & 0x0FF ));
      hashCode = h;
    }
        return hashCode;
    }
    

    That was the preparation.

    And now for the master piece!

    Add this to LuaString.java:

    private static final boolean profile = false;
    public static int invocations, notSameInvocations, notSameLengthInvocations, rawCompares;
    public static boolean dumpRaweq = false;
    
    public static final boolean makeStringsMoreEqualOptimization = true;
    

    Make m_bytes and m_offset not final:

    /** The bytes for the string */
    public /*final*/ byte[] m_bytes;
    
    /** The offset into the byte array, 0 means start at the first byte */
    public /*final*/ int    m_offset;
    

    Then replace raweq:

    public boolean raweq( LuaString s ) {
    if (profile)
      ++invocations;
    if ( this == s )
            return true;
    if (profile)
      ++notSameInvocations;
        if ( s.m_length != m_length )
            return false;
    if (profile)
      ++notSameLengthInvocations;
    if ( s.m_bytes == m_bytes && s.m_offset == m_offset )
            return true;
        if ( s.hashCode() != hashCode() )
            return false;
    if (profile)
      ++rawCompares;
    if (dumpRaweq)
      System.out.println(this + " / " + s);
        for ( int i=0; i<m_length; i++ )
            if ( s.m_bytes[s.m_offset+i] != m_bytes[m_offset+i] )
                return false;
    
    if (makeStringsMoreEqualOptimization) {
      if (s.m_bytes.hashCode() < m_bytes.hashCode()) {
        m_bytes = s.m_bytes;
        m_offset = s.m_offset;
      } else {
        s.m_bytes = m_bytes;
        s.m_offset = m_offset;
      }
    }
    
        return true;
    }
    

    Turns out that this eliminates almost all raw string comparisons. In my tests, I have about 100 raw comparisons in over 1 million calls to raweq.

    Oh, and before it, raweq was the function that most time was spent in. Actually, it's still that function after the optimization, so there is probably room for optimizing somewhere else (less table lookups?).

     
  • vanyatka

    vanyatka - 2014-07-03

    Awesomely noted about hashcode caching, Stefan!
    However, it looks like authors do imply a degree of mutability by omitting this optimisation.

     

Log in to post a comment.