You can subscribe to this list here.
2006 |
Jan
|
Feb
|
Mar
(414) |
Apr
(123) |
May
(448) |
Jun
(180) |
Jul
(17) |
Aug
(49) |
Sep
(3) |
Oct
(92) |
Nov
(101) |
Dec
(64) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2007 |
Jan
(132) |
Feb
(230) |
Mar
(146) |
Apr
(146) |
May
|
Jun
|
Jul
(34) |
Aug
(4) |
Sep
(3) |
Oct
(10) |
Nov
(12) |
Dec
(24) |
2008 |
Jan
(6) |
Feb
|
Mar
|
Apr
|
May
(1) |
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(11) |
Nov
(4) |
Dec
|
2009 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(1) |
Oct
|
Nov
|
Dec
|
From: Bryan T. <tho...@us...> - 2009-09-24 14:47:04
|
Update of /cvsroot/cweb/junit-ext/src/java/junit/framework In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv11369/src/java/junit/framework Modified Files: TestCase2.java Log Message: modified assertSameIteratorAnyOrder(...) to work without requiring that the objects implement Comparable. It was relying on a TreeMap, which uses Comparable. It now relies on a HashMap. Of course, if the object does not implement hashCode() and equals() correctly then this will cause unit tests for that object to break. Index: TestCase2.java =================================================================== RCS file: /cvsroot/cweb/junit-ext/src/java/junit/framework/TestCase2.java,v retrieving revision 1.28 retrieving revision 1.29 diff -C2 -d -r1.28 -r1.29 *** TestCase2.java 4 Dec 2007 15:03:03 -0000 1.28 --- TestCase2.java 24 Sep 2009 14:46:56 -0000 1.29 *************** *** 913,1005 **** /** ! * Verifies that the iterator visits the specified objects in some ! * arbitrary ordering and that the iterator is exhausted once all ! * expected objects have been visited. The implementation uses a ! * selection without replacement "pattern". */ - static public void assertSameIteratorAnyOrder - ( Comparable[] expected, - Iterator actual - ) - { - - assertSameIteratorAnyOrder - ( "", - expected, - actual - ); - } /** ! * Verifies that the iterator visits the specified objects in some ! * arbitrary ordering and that the iterator is exhausted once all ! * expected objects have been visited. The implementation uses a ! * selection without replacement "pattern". ! * ! * FIXME Write test cases for this one. ! * ! * FIXME Can we we this without requiring the objects to implement ! * {@link Comparable}? */ ! static public void assertSameIteratorAnyOrder ! ( String msg, ! Comparable[] expected, ! Iterator actual ! ) ! { ! // Populate a map that we will use to realize the match and ! // selection without replacement logic. ! final int nrange = expected.length; ! ! java.util.Map range = new java.util.TreeMap(); ! ! for( int j=0; j<nrange; j++ ) { ! ! range.put ! ( expected[ j ], ! expected[ j ] ! ); ! ! } ! ! // Do selection without replacement for the objects visited by ! // iterator. ! ! for( int j=0; j<nrange; j++ ) { ! ! if( ! actual.hasNext() ) { ! ! fail( msg+": Index exhausted while expecting more object(s)"+ ! ": index="+j ! ); ! ! } ! ! Object actualObject = actual.next(); ! ! if( range.remove( actualObject ) == null ) { ! ! fail( "Object not expected"+ ! ": index="+j+ ! ", object="+actualObject ! ); ! ! } ! ! } ! if( actual.hasNext() ) { ! fail( "Iterator will deliver too many objects." ! ); ! } } //************************************************************ --- 913,1073 ---- /** ! * Verifies that the iterator visits the specified objects in some arbitrary ! * ordering and that the iterator is exhausted once all expected objects ! * have been visited. The implementation uses a selection without ! * replacement "pattern". */ + @SuppressWarnings("unchecked") + static public void assertSameIteratorAnyOrder(final Object[] expected, + final Iterator actual) { + + assertSameIteratorAnyOrder("", expected, actual); } /** ! * Verifies that the iterator visits the specified objects in some arbitrary ! * ordering and that the iterator is exhausted once all expected objects ! * have been visited. The implementation uses a selection without ! * replacement "pattern". */ + @SuppressWarnings("unchecked") + static public void assertSameIteratorAnyOrder(final String msg, + final Object[] expected, final Iterator actual) { ! // Populate a map that we will use to realize the match and ! // selection without replacement logic. ! final int nrange = expected.length; ! java.util.Map range = new java.util.HashMap(); ! for (int j = 0; j < nrange; j++) { ! range.put(expected[j], expected[j]); ! } ! ! // Do selection without replacement for the objects visited by ! // iterator. ! ! for (int j = 0; j < nrange; j++) { ! ! if (!actual.hasNext()) { ! ! fail(msg + ": Index exhausted while expecting more object(s)" ! + ": index=" + j); ! ! } ! ! Object actualObject = actual.next(); ! ! if (range.remove(actualObject) == null) { ! ! fail("Object not expected" + ": index=" + j + ", object=" ! + actualObject); ! ! } ! ! } ! ! if (actual.hasNext()) { ! ! fail("Iterator will deliver too many objects."); ! ! } } + + // /** + // * Verifies that the iterator visits the specified objects in some + // * arbitrary ordering and that the iterator is exhausted once all + // * expected objects have been visited. The implementation uses a + // * selection without replacement "pattern". + // */ + // + // static public void assertSameIteratorAnyOrder + // ( Comparable[] expected, + // Iterator actual + // ) + // { + // + // assertSameIteratorAnyOrder + // ( "", + // expected, + // actual + // ); + // + // } + // + // /** + // * Verifies that the iterator visits the specified objects in some + // * arbitrary ordering and that the iterator is exhausted once all + // * expected objects have been visited. The implementation uses a + // * selection without replacement "pattern". + // * + // * FIXME Write test cases for this one. + // * + // * FIXME Can we we this without requiring the objects to implement + // * {@link Comparable}? + // */ + // + // static public void assertSameIteratorAnyOrder + // ( String msg, + // Comparable[] expected, + // Iterator actual + // ) + // { + // + // // Populate a map that we will use to realize the match and + // // selection without replacement logic. + // + // final int nrange = expected.length; + // + // java.util.Map range = new java.util.TreeMap(); + // + // for( int j=0; j<nrange; j++ ) { + // + // range.put + // ( expected[ j ], + // expected[ j ] + // ); + // + // } + // + // // Do selection without replacement for the objects visited by + // // iterator. + // + // for( int j=0; j<nrange; j++ ) { + // + // if( ! actual.hasNext() ) { + // + // fail( msg+": Index exhausted while expecting more object(s)"+ + // ": index="+j + // ); + // + // } + // + // Object actualObject = actual.next(); + // + // if( range.remove( actualObject ) == null ) { + // + // fail( "Object not expected"+ + // ": index="+j+ + // ", object="+actualObject + // ); + // + // } + // + // } + // + // if( actual.hasNext() ) { + // + // fail( "Iterator will deliver too many objects." + // ); + // + // } + // + // } //************************************************************ |
From: Bryan T. <tho...@us...> - 2008-11-10 16:29:55
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv8787 Modified Files: .cvsignore Log Message: cvsignore file. Index: .cvsignore =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/.cvsignore,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** .cvsignore 9 Oct 2008 13:13:20 -0000 1.3 --- .cvsignore 10 Nov 2008 16:29:37 -0000 1.4 *************** *** 3,4 **** --- 3,5 ---- lgpl-utils-1.0-b2-dev.jar lgpl-utils-1.0-b3-dev.jar + lgpl-utils-1.0-b4-dev.jar |
From: Bryan T. <tho...@us...> - 2008-11-07 21:52:03
|
Update of /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/mg4j/util In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv2472/src/test/it/unimi/dsi/mg4j/util Added Files: TestBloomFilter2.java Removed Files: BloomFilterTest.java Log Message: modified bloom filter serialization and created a new revision. renamed the bloom filter class to avoid possible class path issues. --- NEW FILE: TestBloomFilter2.java --- package it.unimi.dsi.mg4j.util; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import java.io.IOException; import java.io.ObjectInputStream; import java.io.ObjectOutputStream; import java.util.HashSet; import java.util.Random; import junit.framework.TestCase; public class TestBloomFilter2 extends TestCase { public void testAdd() { BloomFilter2 bloomFilter = new BloomFilter2( 10, 30 ); // High precision assertTrue( bloomFilter.add( "test" ) ); assertFalse( bloomFilter.add( "test" ) ); assertTrue( bloomFilter.add( "foo" ) ); assertTrue( bloomFilter.add( "bar" ) ); assertEquals( 3, bloomFilter.size() ); bloomFilter.clear(); assertTrue(bloomFilter.add(new int[] { 0, 1 })); assertFalse(bloomFilter.add(new int[] { 0, 1 })); assertTrue(bloomFilter.add(new int[] { 1, 2 })); assertTrue(bloomFilter.add(new int[] { 1, 0 })); assertEquals( 3, bloomFilter.size() ); } // public void testAdd2() { // // BloomFilter bloomFilter = new BloomFilter( 10, 30 ); // High precision // // // correct rejections (must succeed) // assertFalse("3",bloomFilter.contains(new int[]{3},0,1)); // assertFalse("5",bloomFilter.contains(new int[]{5},0,1)); // assertFalse("7",bloomFilter.contains(new int[]{7},0,1)); // assertFalse("4",bloomFilter.contains(new int[]{4},0,1)); // assertFalse("9",bloomFilter.contains(new int[]{9},0,1)); // // // add keys. // assertTrue("3",bloomFilter.add(new int[]{3},0,1)); // assertTrue("5",bloomFilter.add(new int[]{5},0,1)); // assertTrue("7",bloomFilter.add(new int[]{7},0,1)); // // // false positive tests (should succeed with resonable errorRate). // assertTrue("3",bloomFilter.contains(new int[]{3},0,1)); // assertTrue("5",bloomFilter.contains(new int[]{5},0,1)); // assertTrue("7",bloomFilter.contains(new int[]{7},0,1)); // // // false positive tests with non-zero offset (should succeed with resonable errorRate). // assertTrue("3",bloomFilter.contains(new int[]{0,3},1,1)); // assertTrue("5",bloomFilter.contains(new int[]{0,5},1,1)); // assertTrue("7",bloomFilter.contains(new int[]{0,7},1,1)); // // // correct rejections (must succeed) // assertFalse("4",bloomFilter.contains(new int[]{4},0,1)); // assertFalse("9",bloomFilter.contains(new int[]{9},0,1)); // // // correct rejections with non-zero offset (must succeed) // assertFalse("4",bloomFilter.contains(new int[]{0,4},1,1)); // assertFalse("9",bloomFilter.contains(new int[]{0,9},1,1)); // // // correct rejections with non-zero offset (must succeed) // assertFalse("4",bloomFilter.contains(new int[]{3,4},1,1)); // assertFalse("9",bloomFilter.contains(new int[]{5,9},1,1)); // // } public void testConflicts() { BloomFilter2 bloomFilter = new BloomFilter2( 1000, 10 ); // Low precision // LongOpenHashSet longs = new LongOpenHashSet(); HashSet<Long>longs = new HashSet<Long>(); Random random = new Random( 0 ); for( int i = 1000; i-- != 0; ) { final long l = random.nextLong(); longs.add( l ); bloomFilter.add( Long.toBinaryString( l ) ); } /* * Note: I see bloomFile.size() coming in at 999 sometimes. This is a * minimum size so I expect that the error is allowable and that the * test could be better. */ assertEquals( longs.size(), bloomFilter.size() ); } public void testSerialization() throws IOException { BloomFilter2 bloomFilter = new BloomFilter2( 10, 30 ); // High precision assertTrue( bloomFilter.add( new int[] { 0, 1 } ) ); assertFalse( bloomFilter.add( new int[] { 0, 1 } ) ); assertTrue( bloomFilter.add( new int[] { 1, 2 } ) ); assertTrue( bloomFilter.add( new int[] { 1, 0 } ) ); assertEquals( 3, bloomFilter.size() ); assertTrue( bloomFilter.contains( new int[] { 0, 1 } ) ); assertTrue( bloomFilter.contains( new int[] { 1, 2 } ) ); assertTrue( bloomFilter.contains( new int[] { 1, 0 } ) ); final byte[] buf = toByteArray(bloomFilter); bloomFilter = fromByteArray(buf); assertEquals( 3, bloomFilter.size() ); assertTrue( bloomFilter.contains( new int[] { 0, 1 } ) ); assertTrue( bloomFilter.contains( new int[] { 1, 2 } ) ); assertTrue( bloomFilter.contains( new int[] { 1, 0 } ) ); assertFalse( bloomFilter.contains( new int[]{ 2, 2 } ) ); } public void testSerialization2() throws IOException { BloomFilter2 bloomFilter = new BloomFilter2( 10, 30 ); // High precision assertTrue( bloomFilter.add( "test" ) ); assertTrue( bloomFilter.add( "foo" ) ); assertTrue( bloomFilter.add( "bar" ) ); assertEquals( 3, bloomFilter.size() ); final byte[] buf = toByteArray(bloomFilter); bloomFilter = fromByteArray(buf); assertEquals( 3, bloomFilter.size() ); assertTrue( bloomFilter.contains( "test" ) ); assertTrue( bloomFilter.contains( "foo" ) ); assertTrue( bloomFilter.contains( "bar" ) ); assertFalse( bloomFilter.contains( "baz" ) ); } protected byte[] toByteArray(BloomFilter2 bloomFilter) throws IOException { ByteArrayOutputStream baos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(baos); oos.writeObject(bloomFilter); oos.flush(); oos.close(); byte[] bloomBytes = baos.toByteArray(); return bloomBytes; } protected BloomFilter2 fromByteArray(byte[] buf) throws IOException { ByteArrayInputStream bais = new ByteArrayInputStream(buf); try { ObjectInputStream ois = new ObjectInputStream(bais); BloomFilter2 bloomFilter = (BloomFilter2) ois.readObject(); return bloomFilter; } catch (Exception ex) { IOException ex2 = new IOException("Could not read bloom filter: " + ex); ex2.initCause(ex); throw ex2; } } } --- BloomFilterTest.java DELETED --- |
From: Bryan T. <tho...@us...> - 2008-11-07 21:52:03
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv2472 Modified Files: build.properties Log Message: modified bloom filter serialization and created a new revision. renamed the bloom filter class to avoid possible class path issues. Index: build.properties =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/build.properties,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** build.properties 9 Oct 2008 13:13:20 -0000 1.2 --- build.properties 7 Nov 2008 21:51:58 -0000 1.3 *************** *** 2,4 **** build.dir=ant-build build.docs=docs ! jar.name=lgpl-utils-1.0-b3-dev.jar \ No newline at end of file --- 2,4 ---- build.dir=ant-build build.docs=docs ! jar.name=lgpl-utils-1.0-b4-dev.jar \ No newline at end of file |
From: Bryan T. <tho...@us...> - 2008-11-07 21:52:03
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/util In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv2472/src/java/it/unimi/dsi/mg4j/util Modified Files: Fast.java Added Files: BloomFilter2.java Removed Files: BloomFilter.java Log Message: modified bloom filter serialization and created a new revision. renamed the bloom filter class to avoid possible class path issues. Index: Fast.java =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/util/Fast.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** Fast.java 3 Jan 2007 15:09:03 -0000 1.1 --- Fast.java 7 Nov 2008 21:51:58 -0000 1.2 *************** *** 26,30 **** * <code>it.unimi.dsi.mg4j.util.Fast</code> in the mg4j project. Changes have * been restricted commenting out aspects of the code that are not required ! * to support {@link BloomFilter}. * * <P>This class contains static optimised utility methods that are used by all --- 26,30 ---- * <code>it.unimi.dsi.mg4j.util.Fast</code> in the mg4j project. Changes have * been restricted commenting out aspects of the code that are not required ! * to support {@link BloomFilter2}. * * <P>This class contains static optimised utility methods that are used by all --- NEW FILE: BloomFilter2.java --- package it.unimi.dsi.mg4j.util; /* * MG4J: Managing Gigabytes for Java * * Copyright (C) 2004-2006 Sebastiano Vigna * * This library is free software; you can redistribute it and/or modify it * under the terms of the GNU Lesser General Public License as published by the Free * Software Foundation; either version 2.1 of the License, or (at your option) * any later version. * * This library is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License * for more details. * * You should have received a copy of the GNU Lesser General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ import java.io.Externalizable; import java.io.IOException; import java.io.ObjectInput; import java.io.ObjectOutput; import java.util.Arrays; import java.util.Random; /** * A Bloom filter derived directly from * <code>it.unimi.dsi.mg4j.util.BloomFilter</code> in the mg4j project. The * primary changes are: * <ul> * <li>Removed use of the MersenneTwister random number generator to remove a * dependency on the Colt project.</li> * <li>Removed the main() routine, including all of the various dependencies * that it was dragging in.</li> * <li>Removed the use of LongArrays from the <code>fastutil</code> project. * The code uses {@link Arrays#fill(long[], long)} instead to clear the state on * reset.</li> * <li>Custom serialization (not compatible, new serialization UUID).</li> * <li>Made various public fields private since they can not be final with an * {@link Externalizable} implementation and added access methods for their * data.</li> * <li>Changed the class name to avoid possible class path confusions. </li> * </ul> * * <P> * Instances of this class represent a set of character sequences or * primitive-type arrays (with false positives) using a Bloom filter. Because of * the way Bloom filters work, you cannot remove elements. * * <p> * The intended usage is that character sequences and arrays should <em>not</em> * be mixed (albeit in principle it could work). A Bloom filter is rather * agnostic with respect to the data it contains—all it needs is a * sequence of hash functions that can be applied to the data. * * <P> * Bloom filters have an expected error rate, depending on the number of hash * functions used, on the filter size and on the number of elements in the * filter. This implementation uses a variable optimal number of hash functions, * depending on the expected number of elements. More precisely, a Bloom filter * for <var>n</var> character sequences with <var>d</var> hash functions will * use ln 2 <var>d</var><var>n</var> ≈ 1.44 <var>d</var><var>n</var> * bits; false positives will happen with probability 2<sup>-<var>d</var></sup>. * The maximum number of bits supported is {@link #MAX_BITS}. * * <P> * Hash functions are generated at creation time using a mix of universal * hashing and shift-add-xor hashing (for the latter, see <i>Performance in * practice of string hashing functions</i>, by M.V. Ramakrishna and Justin * Zobel, <i>Proc. of the Fifth International Conference on Database Systems for * Advanced Applications</i>, 1997, pages 215−223). * * <p> * Each hash function uses {@link #NUMBER_OF_WEIGHTS} random integers, which are * cyclically multiplied by the character codes in a character sequence. The * resulting integers are then summed with a shifted hash and XOR-ed together. * * <P> * This class exports access methods that are similar to those of * {@link java.util.Set}, but it does not implement that interface, as too many * non-optional methods would be unimplementable (e.g., iterators). * * <P> * A main method makes it easy to create serialised Bloom filters starting from * a list of terms. * * @author Sebastiano Vigna, who wrote the original implementation. * @author tho...@us..., who hacked the implementation down * to have fewer dependencies. */ public class BloomFilter2 implements Externalizable { /** * */ private static final long serialVersionUID = -880117129471136320L; private transient static final boolean DEBUG = false; // private static final long serialVersionUID = 2L; public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { final int version = in.readInt(); if (version != VERSION0) throw new IOException("Unknown version=" + version); size = in.readInt(); m = in.readLong(); d = in.readInt(); bits = new long[(int)m / Long.SIZE]; for(int i=0; i<bits.length; i++) { bits[i] = in.readLong(); } weight = new int[d][NUMBER_OF_WEIGHTS]; for(int i=0; i<d; i++) { weight[i] = new int[NUMBER_OF_WEIGHTS]; for(int j=0; j<NUMBER_OF_WEIGHTS; j++) { weight[i][j] = in.readInt(); } } init = new int[d]; for(int i=0; i<d; i++) { init[i] = in.readInt(); } } public void writeExternal(ObjectOutput out) throws IOException { out.writeInt(VERSION0); out.writeInt(size); out.writeLong(m); out.writeInt(d); // m/Long.SIZE [bits] for(int i=0; i<bits.length; i++) { out.writeLong(bits[i]); } // weight[d][#weights] for (int i = 0; i < d; i++) { for (int j = 0; j < NUMBER_OF_WEIGHTS; j++) { out.writeInt(weight[i][j]); } } // init[d] for (int i = 0; i < d; i++) { out.writeInt(init[i]); } } /** * Initial {@link Externalizable} serialization format. */ private static transient final int VERSION0 = 0x0; /** The number of elements currently in the filter. It may be * smaller than the actual number of additions because of false positives. */ private int size; /** The maximum number of bits in a filter (limited by array size and bits in a long). */ public transient static final long MAX_BITS = (long)Long.SIZE * Integer.MAX_VALUE; private transient final static long LOG2_LONG_SIZE = 6; private transient final static long BIT_INDEX_MASK = ( 1 << LOG2_LONG_SIZE ) - 1; /** The number of weights used to create hash functions. */ final public static transient int NUMBER_OF_WEIGHTS = 16; /** The number of bits in this filter. */ private long m; /** The number of bits in this filter. */ final public long m() {return m;} /** The number of hash functions used by this filter. */ private int d; /** The number of hash functions used by this filter. */ final public int d() {return d;} /** The underlying bit vector. */ private long[] bits; /** The random integers used to generate the hash functions. */ private int[][] weight; /** The random integers used to initialise the hash functions. */ private int[] init; /** The natural logarithm of 2, used in the computation of the number of bits. */ transient private final static double NATURAL_LOG_OF_2 = Math.log( 2 ); /** * De-serialization ctor. */ public BloomFilter2() { } /** Creates a new high-precision Bloom filter a given expected number of elements. * * <p>This constructor uses a number of hash functions that is logarithmic in the number * of expected elements. This usually results in no false positives at all. * * @param n the expected number of elements. */ public BloomFilter2( final int n ) { this( n, Fast.mostSignificantBit( n ) + 1 ); } /** Creates a new Bloom filter with given number of hash functions and expected number of elements. * * @param n the expected number of elements. * @param d the number of hash functions; under obvious uniformity and indipendence assumptions, * if the filter has not more than <code>n</code> elements, * false positives will happen with probability 2<sup>-<var>d</var></sup>. */ public BloomFilter2( final int n, final int d ) { this.d = d; final long wantedNumberOfBits = (long)Math.ceil( n * ( d / NATURAL_LOG_OF_2 ) ); if ( wantedNumberOfBits > MAX_BITS ) throw new IllegalArgumentException( "The wanted number of bits (" + wantedNumberOfBits + ") is larger than " + MAX_BITS ); bits = new long[ (int)( ( wantedNumberOfBits + Long.SIZE - 1 ) / Long.SIZE ) ]; m = bits.length * (long)Long.SIZE; if ( DEBUG ) System.err.println( "Number of bits: " + m ); // The purpose of Random().nextInt() is to generate a different seed at each invocation. // final MersenneTwister mersenneTwister = new MersenneTwister( new Random().nextInt() ); final Random rand = new Random(); weight = new int[ d ][]; init = new int[ d ]; for( int i = 0; i < d; i++ ) { weight[ i ] = new int[ NUMBER_OF_WEIGHTS ]; init[ i ] = rand.nextInt(); for( int j = 0; j < NUMBER_OF_WEIGHTS; j++ ) weight[ i ][ j ] = rand.nextInt(); } } /** Returns the value of the bit with the specified index in the specified array. * * <p>This method (and its companion {@link #set(long[], long)}) are static * so that the bit array can be cached by the caller in a local variable. * * @param index the bit index. * @return the value of the bit of index <code>index</code>. */ private static boolean get( long[] bits, long index ) { return ( bits[ (int)( index >> LOG2_LONG_SIZE ) ] & ( 1L << ( index & BIT_INDEX_MASK ) ) ) != 0; } /** Sets the bit with specified index in the specified array. * * @param index the bit index. * @see #get(long[], long) */ private boolean set( long[] bits, long index ) { final int unit = (int)( index >> LOG2_LONG_SIZE ); final long mask = 1L << ( index & BIT_INDEX_MASK ); final boolean result = ( bits[ unit ] & mask ) != 0; bits[ unit ] |= mask; return result; } /** Hashes the given sequence with the given hash function. * * @param s a character sequence. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>s</code> for the hash function <code>k</code>. */ private long hash( final CharSequence s, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + s.charAt( i ) * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given byte array with the given hash function. * * @param a a byte array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final byte[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + a[ i ] * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given short array with the given hash function. * * @param a a short array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final short[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + a[ i ] * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given character array with the given hash function. * * @param a a character array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final char[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + a[ i ] * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given int array with the given hash function. * * @param a an int array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to the slice of <code>a</code> for the hash function <code>k</code>. */ private long hash( final int[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + a[ i ] * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given long array with the given hash function. * * @param a a long array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final long[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + a[ i ] * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given float array with the given hash function. * * @param a a float array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final float[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + Float.floatToRawIntBits( a[ i ] ) * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Hashes the given double array with the given hash function. * * @param a a double array. * @param l the length of <code>s</code>. * @param k a hash function index (smaller than {@link #d}). * @return the position in the filter corresponding to <code>a</code> for the hash function <code>k</code>. */ private long hash( final double[] a, final int l, final int k ) { final int[] w = weight[ k ]; long h = init[ k ]; int i = l; while( i-- != 0 ) h ^= ( h << 5 ) + Double.doubleToRawLongBits( a[ i ] ) * w[ i % NUMBER_OF_WEIGHTS ] + ( h >>> 2 ); return ( h & 0x7FFFFFFFFFFFFFFFL ) % m; } /** Checks whether the given character sequence is in this filter. * * <P>Note that this method may return true on a character sequence that has * not been added to the filter. This will happen with probability 2<sup>-<var>d</var></sup>, * where <var>d</var> is the number of hash functions specified at creation time, if * the number of the elements in the filter is less than <var>n</var>, the number * of expected elements specified at creation time. * * @param s a character sequence. * @return true if <code>s</code> (or some element * with the same hash sequence as <code>s</code>) is in the filter. */ public boolean contains( final CharSequence s ) { int i = d, l = s.length(); long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( s, l, i ) ) ) return false; return true; } /** Checks whether the given byte array is in this filter. * * @param a a byte array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final byte[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given short array is in this filter. * * @param a a short array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final short[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given character array is in this filter. * * @param a a character array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final char[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given int array is in this filter. * * @param a an int array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final int[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given long array is in this filter. * * @param a a long array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final long[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given float array is in this filter. * * @param a a float array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final float[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Checks whether the given double array is in this filter. * * @param a a double array. * @return true if <code>a</code> (or some element * with the same hash sequence as <code>a</code>) is in the filter. * @see #contains(CharSequence) */ public boolean contains( final double[] a ) { int i = d, l = a.length; long bits[] = this.bits; while( i-- != 0 ) if ( ! get( bits, hash( a, l, i ) ) ) return false; return true; } /** Adds a character sequence to the filter. * * @param s a character sequence. * @return true if this filter was modified (i.e., neither <code>s</code> nor any * other element with the same hash sequence as <code>s</code> was already in this filter). */ public boolean add( final CharSequence s ) { int i = d, l = s.length(); long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( s, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a byte array to the filter. * * @param a a byte array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final byte[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a short array to the filter. * * @param a a short array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final short[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a character array to the filter. * * @param a a character array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final char[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds an int array to the filter. * * @param a an int array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final int[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a long array to the filter. * * @param a a long array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final long[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a float array to the filter. * * @param a a float array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final float[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Adds a double array to the filter. * * @param a a double array. * @return true if this filter was modified (i.e., neither <code>a</code> nor any * other element with the same hash sequence as <code>a</code> was already in this filter). */ public boolean add( final double[] a ) { int i = d, l = a.length; long bits[] = this.bits; boolean alreadySet = true; while( i-- != 0 ) alreadySet &= set( bits, hash( a, l, i ) ); if ( ! alreadySet ) size++; return ! alreadySet; } /** Clears this filter. */ public void clear() { /*Long*/Arrays.fill( bits, 0 ); size = 0; } /** Returns the size of this filter. * * <p>Note that the size of a Bloom filter is only a <em>lower bound</em> * for the number of distinct elements that have been added to the filter. * False positives might make the number returned by this method smaller * than it should be. * * @return the size of this filter. */ public long size() { return size; } // public static void main( final String[] arg ) throws IOException, JSAPException, NoSuchMethodException { // // final SimpleJSAP jsap = new SimpleJSAP( BloomFilter.class.getName(), "Creates a Bloom filter reading from standard input a newline-separated list of terms.", // new Parameter[] { // new FlaggedOption( "bufferSize", IntSizeStringParser.getParser(), "64Ki", JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of the I/O buffer used to read terms." ), // new FlaggedOption( "encoding", ForNameStringParser.getParser( Charset.class ), "UTF-8", JSAP.NOT_REQUIRED, 'e', "encoding", "The term file encoding." ), // new UnflaggedOption( "bloomFilter", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename for the serialised front-coded list." ), // new UnflaggedOption( "size", JSAP.INTSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The size of the filter (i.e., the expected number of elements in the filter; usually, the number of terms)." ), // new UnflaggedOption( "precision", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The precision of the filter." ) // }); // // JSAPResult jsapResult = jsap.parse( arg ); // if ( jsap.messagePrinted() ) return; // // final int bufferSize = jsapResult.getInt( "bufferSize" ); // final String filterName = jsapResult.getString( "bloomFilter" ); // final Charset encoding = (Charset)jsapResult.getObject( "encoding" ); // // BloomFilter filter = new BloomFilter( jsapResult.getInt( "size" ), jsapResult.getInt( "precision" ) ); // final ProgressLogger pl = new ProgressLogger(); // pl.itemsName = "terms"; // pl.start( "Reading terms..." ); // MutableString s = new MutableString(); // FastBufferedReader reader = new FastBufferedReader( new InputStreamReader( System.in, encoding ), bufferSize ); // while( reader.readLine( s ) != null ) { // filter.add( s ); // pl.lightUpdate(); // } // pl.done(); // // BinIO.storeObject( filter, filterName ); // } } --- BloomFilter.java DELETED --- |
From: Bryan T. <tho...@us...> - 2008-10-13 19:51:36
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv27585 Modified Files: build.xml Log Message: added line to delete the old jar to the clean target. Index: build.xml =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/build.xml,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** build.xml 8 Oct 2008 12:46:14 -0000 1.2 --- build.xml 13 Oct 2008 19:49:55 -0000 1.3 *************** *** 15,18 **** --- 15,19 ---- <target name="clean"> + <delete dir="${jar.name}"/> <delete dir="${build.dir}"/> </target> |
From: Bryan T. <tho...@us...> - 2008-10-09 13:13:47
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/io In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv25575/src/java/it/unimi/dsi/mg4j/io Modified Files: OutputBitStream.java InputBitStream.java Log Message: Done. Modified InputBitStream and OutputBitStream to have ctor variants that avoid a reflection test on the underlying stream. This was causing approximately 5000 MethodNotFoundException instances to be thrown per second. Introduced a new lgpl-utils jar for this change (1.1-b3). The change will be rolled into dsiutils by its author. Index: OutputBitStream.java =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/io/OutputBitStream.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** OutputBitStream.java 10 Jul 2007 16:12:56 -0000 1.1 --- OutputBitStream.java 9 Oct 2008 13:13:20 -0000 1.2 *************** *** 163,173 **** ! /** Creates a new output bit stream wrapping a given output stream with a specified buffer size. * * @param os the output stream to wrap. * @param bufSize the size in byte of the buffer; it may be 0, denoting no buffering. */ ! public OutputBitStream( final OutputStream os, final int bufSize ) { this.os = os; if ( bufSize != 0 ) { --- 163,189 ---- ! /** Creates a new output bit stream wrapping a given output stream with a specified buffer size. ! * ! * @param os the output stream to wrap. ! * @param bufSize the size in byte of the buffer; it may be 0, denoting no buffering. ! */ ! public OutputBitStream( final OutputStream os, final int bufSize ) { ! this( os, bufSize, true/*reflectionTest*/); ! } ! /** Creates a new output bit stream wrapping a given output stream with a specified buffer size. * * @param os the output stream to wrap. * @param bufSize the size in byte of the buffer; it may be 0, denoting no buffering. + * @param reflectionTest + * when <code>true</code>, an instance of this class will try + * to cast the underlying byte stream to a + * {@link RepositionableStream} and to fetch by reflection the + * {@link java.nio.channels.FileChannel} underlying the given + * input stream, in this order (the traditional behavior). When + * <code>false</code> the reflection test will not be made in + * order to reduce the overhead of the ctor. */ ! public OutputBitStream( final OutputStream os, final int bufSize, final boolean reflectionTest ) { this.os = os; if ( bufSize != 0 ) { *************** *** 178,182 **** if ( os instanceof RepositionableStream ) repositionableStream = (RepositionableStream)os; ! if ( repositionableStream == null ) { try { --- 194,199 ---- if ( os instanceof RepositionableStream ) repositionableStream = (RepositionableStream)os; ! ! if(reflectionTest) { if ( repositionableStream == null ) { try { *************** *** 189,192 **** --- 206,210 ---- catch( ClassCastException e ) {} } + } } Index: InputBitStream.java =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/io/InputBitStream.java,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** InputBitStream.java 10 Jul 2007 16:12:56 -0000 1.1 --- InputBitStream.java 9 Oct 2008 13:13:20 -0000 1.2 *************** *** 207,211 **** * @param bufSize the size in byte of the buffer; it may be 0, denoting no buffering. */ ! public InputBitStream( final InputStream is, final int bufSize ) { this.is = is; if ( ! ( this.noBuffer = bufSize == 0 ) ) this.buffer = new byte[ bufSize ]; --- 207,235 ---- * @param bufSize the size in byte of the buffer; it may be 0, denoting no buffering. */ ! public InputBitStream(final InputStream is, final int bufSize) { ! ! this(is, bufSize, true/* reflectionTest */); ! ! } ! ! /** ! * Creates a new input bit stream wrapping a given input stream with a ! * specified buffer size. ! * ! * @param is ! * the input stream to wrap. ! * @param bufSize ! * the size in byte of the buffer; it may be 0, denoting no ! * buffering. ! * @param reflectionTest ! * when <code>true</code>, an instance of this class will try ! * to cast the underlying byte stream to a ! * {@link RepositionableStream} and to fetch by reflection the ! * {@link java.nio.channels.FileChannel} underlying the given ! * input stream, in this order (the traditional behavior). When ! * <code>false</code> the reflection test will not be made in ! * order to reduce the overhead of the ctor. ! */ ! public InputBitStream( final InputStream is, final int bufSize,final boolean reflectionTest ) { this.is = is; if ( ! ( this.noBuffer = bufSize == 0 ) ) this.buffer = new byte[ bufSize ]; *************** *** 213,216 **** --- 237,241 ---- if ( is instanceof RepositionableStream ) repositionableStream = (RepositionableStream)is; + if(reflectionTest) { if ( repositionableStream == null ) { try { *************** *** 223,226 **** --- 248,252 ---- catch( ClassCastException e ) {} } + } } |
From: Bryan T. <tho...@us...> - 2008-10-09 13:13:29
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv25575 Modified Files: .cvsignore build.properties Log Message: Done. Modified InputBitStream and OutputBitStream to have ctor variants that avoid a reflection test on the underlying stream. This was causing approximately 5000 MethodNotFoundException instances to be thrown per second. Introduced a new lgpl-utils jar for this change (1.1-b3). The change will be rolled into dsiutils by its author. Index: .cvsignore =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/.cvsignore,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** .cvsignore 2 Oct 2008 16:51:36 -0000 1.2 --- .cvsignore 9 Oct 2008 13:13:20 -0000 1.3 *************** *** 2,3 **** --- 2,4 ---- ant-build lgpl-utils-1.0-b2-dev.jar + lgpl-utils-1.0-b3-dev.jar Index: build.properties =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/build.properties,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** build.properties 2 Oct 2008 16:51:36 -0000 1.1 --- build.properties 9 Oct 2008 13:13:20 -0000 1.2 *************** *** 2,4 **** build.dir=ant-build build.docs=docs ! jar.name=lgpl-utils-1.0-b2-dev.jar \ No newline at end of file --- 2,4 ---- build.dir=ant-build build.docs=docs ! jar.name=lgpl-utils-1.0-b3-dev.jar \ No newline at end of file |
From: Bryan T. <tho...@us...> - 2008-10-08 13:11:38
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1008 Modified Files: build.xml Log Message: bug fix to the ant script for building the jar. Index: build.xml =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/build.xml,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** build.xml 2 Oct 2008 16:51:36 -0000 1.1 --- build.xml 8 Oct 2008 12:46:14 -0000 1.2 *************** *** 32,36 **** </javac> <!-- copy resources. --> ! <copy toDir="${build.dir}/classes"> <fileset dir="src/java"> <exclude name="**/*.java"/> --- 32,36 ---- </javac> <!-- copy resources. --> ! <copy toDir="${build.dir}"> <fileset dir="src/java"> <exclude name="**/*.java"/> |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:52
|
Update of /cvsroot/cweb/lgpl-utils In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1144 Modified Files: .cvsignore Added Files: build.properties build.xml Log Message: added a custom variant of the dsi/fastutils ByteFrontCodedArrayList to facilitate faster serialization. this will be replaced by a subclass once there is a release of fastutils which makes it possible to write that subclass (there needs to be protected method to rebuild the pointer array). Index: .cvsignore =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/.cvsignore,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** .cvsignore 9 Nov 2007 12:23:20 -0000 1.1 --- .cvsignore 2 Oct 2008 16:51:36 -0000 1.2 *************** *** 1 **** --- 1,3 ---- target + ant-build + lgpl-utils-1.0-b2-dev.jar --- NEW FILE: build.xml --- <project name="lgpl-utils" default="jar" basedir="."> <property file="build.properties"/> <path id="build.classpath"> <fileset dir="lib"> <include name="**/*.jar"/> </fileset> </path> <path id="runtime.classpath"> <pathelement location="${build.dir}"/> <path refid="build.classpath"/> </path> <target name="clean"> <delete dir="${build.dir}"/> </target> <target name="prepare"> <mkdir dir="${build.dir}"/> </target> <target name="compile" depends="prepare"> <mkdir dir="${build.dir}"/> <javac destdir="${build.dir}" classpathref="build.classpath" verbose="off" target="1.5"> <src path="src/java"/> <src path="src/test"/> </javac> <!-- copy resources. --> <copy toDir="${build.dir}/classes"> <fileset dir="src/java"> <exclude name="**/*.java"/> </fileset> </copy> </target> <!-- overview="../bigdata/overview.html" windowtitle="bigdata®" --> <!-- <bottom><![CDATA[<i>Copyright © 2006-2008 Cognitive Web. All Rights Reserved.</i>]]></bottom> --> <target name="javadoc" depends="prepare"> <mkdir dir="${build.docs}/api"/> <javadoc destdir="${build.docs}/api" defaultexcludes="yes" author="true" version="true" use="true" classpathref="build.classpath"> <packageset dir="src/java"/> <doctitle><![CDATA[<h1>lgpl-utils</h1>]]></doctitle> <tag name="todo" scope="all" description="TODO:"/> <tag name="issue" scope="all" description="ISSUE:"/> <!--tag name="FIXME" scope="all" description="FIXME:"/--> <link href="http://java.sun.com/j2se/1.5.0/docs/api/"/> <!--<link href="http://openrdf.org/doc/sesame/api/"/>--> </javadoc> </target> <target name="jar" depends="compile"> <jar destfile="${jar.name}"> <fileset dir="${build.dir}"/> <manifest> <!--<attribute name="Main-Class" value="com/bigdata/rdf/rio/TestRioIntegration"/>--> </manifest> </jar> </target> </project> --- NEW FILE: build.properties --- cweb.dir=.. build.dir=ant-build build.docs=docs jar.name=lgpl-utils-1.0-b2-dev.jar |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:49
|
Update of /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/fastutil/bytes In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1144/src/test/it/unimi/dsi/fastutil/bytes Added Files: TestCustomByteArrayFrontCodedList.java Log Message: added a custom variant of the dsi/fastutils ByteFrontCodedArrayList to facilitate faster serialization. this will be replaced by a subclass once there is a release of fastutils which makes it possible to write that subclass (there needs to be protected method to rebuild the pointer array). --- NEW FILE: TestCustomByteArrayFrontCodedList.java --- /** The Notice below must appear in each file of the Source Code of any copy you distribute of the Licensed Product. Contributors to any Modifications may add their own copyright notices to identify their own contributions. License: The contents of this file are subject to the CognitiveWeb Open Source License Version 1.1 (the License). You may not copy or use this file, in either source code or executable form, except in compliance with the License. You may obtain a copy of the License from http://www.CognitiveWeb.org/legal/license/ Software distributed under the License is distributed on an AS IS basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. Copyrights: Portions created by or assigned to CognitiveWeb are Copyright (c) 2003-2003 CognitiveWeb. All Rights Reserved. Contact information for CognitiveWeb is available at http://www.CognitiveWeb.org Portions Copyright (c) 2002-2003 Bryan Thompson. Acknowledgements: Special thanks to the developers of the Jabber Open Source License 1.0 (JOSL), from which this License was derived. This License contains terms that differ from JOSL. Special thanks to the CognitiveWeb Open Source Contributors for their suggestions and support of the Cognitive Web. Modifications: */ /* * Created on Oct 1, 2008 */ package it.unimi.dsi.fastutil.bytes; import java.io.ByteArrayInputStream; import java.io.ByteArrayOutputStream; import it.unimi.dsi.fastutil.objects.ObjectListIterator; import junit.framework.TestCase; /** * Unit tests for {@link CustomByteArrayFrontCodedList}. * * @author <a href="mailto:tho...@us...">Bryan Thompson</a> * @version $Id$ */ public class TestCustomByteArrayFrontCodedList extends TestCase { /** * */ public TestCustomByteArrayFrontCodedList() { } /** * @param arg0 */ public TestCustomByteArrayFrontCodedList(String arg0) { super(arg0); } public void test() { _test(100000); } /* * test harness from here to the end. */ private static long seed = System.currentTimeMillis(); private static java.util.Random r = new java.util.Random( seed ); private static byte genKey() { return (byte)(r.nextInt()); } private static void fatal( String msg ) { // System.out.println( msg ); // System.exit( 1 ); fail(msg); } private static void ensure( boolean cond, String msg ) { if ( cond ) return; fatal( msg ); } private static boolean contentEquals( java.util.List x, java.util.List y ) { if ( x.size() != y.size() ) return false; for( int i = 0; i < x.size(); i++ ) if ( ! java.util.Arrays.equals( (byte[])x.get( i ), (byte[])y.get( i ) ) ) return false; return true; } private int l[]; private byte[][] a; private void _test( int n ) { int c; l = new int[ n ]; a = new byte[n][]; for( int i = 0; i < n; i++ ) l[i] = (int)(Math.abs(r.nextGaussian())*32); for( int i = 0; i < n; i++ ) a[i] = new byte[l[i]]; for( int i = 0; i < n; i++ ) for( int j = 0; j < l[i]; j++ ) a[i][j] = genKey(); CustomByteArrayFrontCodedList m = new CustomByteArrayFrontCodedList( it.unimi.dsi.fastutil.objects.ObjectIterators.wrap( a ), r.nextInt( 4 ) + 1 ); it.unimi.dsi.fastutil.objects.ObjectArrayList t = new it.unimi.dsi.fastutil.objects.ObjectArrayList( a ); //System.out.println(m); //for( i = 0; i < t.size(); i++ ) System.out.println(ARRAY_LIST.wrap((byte[])t.get(i))); /* Now we check that m actually holds that data. */ ensure( contentEquals( m, t ), "Error (" + seed + "): m does not equal t at creation" ); /* Now we check cloning. */ ensure( contentEquals( m, (java.util.List)m.clone() ), "Error (" + seed + "): m does not equal m.clone()" ); /* Now we play with iterators. */ { ObjectListIterator i; java.util.ListIterator j; Object J; i = m.listIterator(); j = t.listIterator(); for( int k = 0; k < 2*n; k++ ) { ensure( i.hasNext() == j.hasNext(), "Error (" + seed + "): divergence in hasNext()" ); ensure( i.hasPrevious() == j.hasPrevious(), "Error (" + seed + "): divergence in hasPrevious()" ); if ( r.nextFloat() < .8 && i.hasNext() ) { ensure( java.util.Arrays.equals( (byte[])i.next(), (byte[])j.next() ), "Error (" + seed + "): divergence in next()" ); } else if ( r.nextFloat() < .2 && i.hasPrevious() ) { ensure( java.util.Arrays.equals( (byte[])i.previous(), (byte[])j.previous() ), "Error (" + seed + "): divergence in previous()" ); } ensure( i.nextIndex() == j.nextIndex(), "Error (" + seed + "): divergence in nextIndex()" ); ensure( i.previousIndex() == j.previousIndex(), "Error (" + seed + "): divergence in previousIndex()" ); } } { Object previous = null; Object I, J; int from = r.nextInt( m.size() +1 ); ObjectListIterator i; java.util.ListIterator j; i = m.listIterator( from ); j = t.listIterator( from ); for( int k = 0; k < 2*n; k++ ) { ensure( i.hasNext() == j.hasNext(), "Error (" + seed + "): divergence in hasNext() (iterator with starting point " + from + ")" ); ensure( i.hasPrevious() == j.hasPrevious() , "Error (" + seed + "): divergence in hasPrevious() (iterator with starting point " + from + ")" ); if ( r.nextFloat() < .8 && i.hasNext() ) { ensure( java.util.Arrays.equals( (byte[])i.next(), (byte[])j.next() ), "Error (" + seed + "): divergence in next() (iterator with starting point " + from + ")" ); //System.err.println("Done next " + I + " " + J + " " + badPrevious); } else if ( r.nextFloat() < .2 && i.hasPrevious() ) { ensure( java.util.Arrays.equals( (byte[])i.previous(), (byte[])j.previous() ), "Error (" + seed + "): divergence in previous() (iterator with starting point " + from + ")" ); } } } /* * Standard serialization. */ try { java.io.ByteArrayOutputStream os = new ByteArrayOutputStream(); java.io.ObjectOutputStream oos = new java.io.ObjectOutputStream(os); oos.writeObject(m); oos.close(); ByteArrayInputStream is = new ByteArrayInputStream(os.toByteArray()); java.io.ObjectInputStream ois = new java.io.ObjectInputStream(is); CustomByteArrayFrontCodedList m2 = (CustomByteArrayFrontCodedList) ois.readObject(); ois.close(); contentEquals(m, m2); } catch(Exception e) { throw new AssertionError(e); } /* * Direct byte[] (de-)serialization. */ try { CustomByteArrayFrontCodedList m2 = new CustomByteArrayFrontCodedList( m.size(),m.ratio(),m.getArray().clone()); contentEquals(m, m2); } catch(Exception e) { throw new AssertionError(e); } ensure( contentEquals( m, t ), "Error (" + seed + "): m does not equal t after save/read" ); System.out.println("Test OK"); return; } // public static void main( String args[] ) { // int n = Integer.parseInt(args[1]); // if ( args.length > 2 ) r = new java.util.Random( seed = Long.parseLong( args[ 2 ] ) ); // // try { // if ("speedTest".equals(args[0]) || "speedComp".equals(args[0])) speedTest( n, "speedComp".equals(args[0]) ); // else if ( "test".equals( args[0] ) ) test(n); // } catch( Throwable e ) { // e.printStackTrace( System.err ); // System.err.println( "seed: " + seed ); // } // } } |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:48
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/fastutil/bytes In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1144/src/java/it/unimi/dsi/fastutil/bytes Added Files: CustomByteArrayFrontCodedList.java Log Message: added a custom variant of the dsi/fastutils ByteFrontCodedArrayList to facilitate faster serialization. this will be replaced by a subclass once there is a release of fastutils which makes it possible to write that subclass (there needs to be protected method to rebuild the pointer array). --- NEW FILE: CustomByteArrayFrontCodedList.java --- /* * fastutil: Fast & compact type-specific collections for Java * * Copyright (C) 2002, 2003, 2004, 2005, 2006 Sebastiano Vigna * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * */ package it.unimi.dsi.fastutil.bytes; import it.unimi.dsi.fastutil.objects.AbstractObjectListIterator; import it.unimi.dsi.fastutil.objects.AbstractObjectList; import it.unimi.dsi.fastutil.objects.ObjectListIterator; import it.unimi.dsi.fastutil.ints.IntArrays; import java.io.DataOutput; import java.io.Serializable; import java.util.Iterator; import java.util.Collection; import java.util.NoSuchElementException; /** * Compact storage of lists of arrays using front coding. * * <P> * This class stores immutably a list of arrays in a single large array using * front coding (of course, the compression will be reasonable only if the list * is sorted lexicographically—see below). It implements an immutable * type-specific list that returns the <var>i</var>-th array when calling * {@link #get(int) get(<var>i</var>)}. The returned array may be freely * modified. * * <P> * Front coding is based on the idea that if the <var>i</var>-th and the (<var>i</var>+1)-th * array have a common prefix, we might store the length of the common prefix, * and then the rest of the second array. * * <P> * This approach, of course, requires that once in a while an array is stored * entirely. The <def>ratio</def> of a front-coded list defines how often this * happens (once every {@link #ratio()} arrays). A higher ratio means more * compression, but means also a longer access time, as more arrays have to be * probed to build the result. Note that we must build an array every time * {@link #get(int)} is called, but this class provides also methods that * extract one of the stored arrays in a given array, reducing garbage * collection. See the documentation of the family of <code>get()</code> * methods. * * <P> * By setting the ratio to 1 we actually disable front coding: however, we still * have a data structure storing large list of arrays with a reduced overhead * (just one integer per array, plus the space required for lengths). * * <P> * Note that the typical usage of front-coded lists is under the form of * serialized objects; usually, the data that has to be compacted is processed * offline, and the resulting structure is stored permanently. Since the pointer * array is not stored, the serialized format is very small. * * <H2>Implementation Details</H2> * * <P> * All arrays are stored in a large array. A separate array of pointers indexes * arrays whose position is a multiple of the ratio: thus, a higher ratio means * also less pointers. * * <P> * More in detail, an array whose position is a multiple of the ratio is stored * as the array length, followed by the elements of the array. The array length * is coded by a simple variable-length list of <var>k</var>-1 bit blocks, * where <var>k</var> is the number of bits of the underlying primitive type. * All other arrays are stored as follows: let <code>common</code> the length * of the maximum common prefix between the array and its predecessor. Then we * store the array length decremented by <code>common</code>, followed by * <code>common</code>, followed by the array elements whose index is greater * than or equal to <code>common</code>. For instance, if we store <samp>foo</samp>, * <samp>foobar</samp>, <samp>football</samp> and <samp>fool</samp> in a * front-coded character-array list with ratio 3, the character array will * contain * * <pre> * <b>3</b> f o o <b>3</b> <b>3</b> b a r <b>5</b> <b>3</b> t b a l l <b>4</b> f o o l * </pre> * * <H2>Limitations</H2> * * <P> * All arrays are stored in a large array: thus, the compressed list must not * exceed {@link java.lang.Integer#MAX_VALUE} elements. Moreover, iterators are * less efficient when they move back, as * {@link java.util.ListIterator#previous() previous()} cannot build * incrementally the previous array (whereas ({@link java.util.ListIterator#next() next()} * can). * <P> * Note: This class was derived from <code>ArrayFrontCodedList</code>, which * is part of fastutils. The name of the class has been changed to prevent * classpath problems. The class has a new {@link #serialVersionUID} and the * serialization logic has been modified to allow serialization against * {@link DataOutput}. The test code has been isolated in a junit test suite. */ public class CustomByteArrayFrontCodedList extends AbstractObjectList<byte[]> implements Serializable, Cloneable { /** * */ private static final long serialVersionUID = -2532468860579334765L; // The value for the original impl. // public static final long serialVersionUID = -7046029254386353130L; /** The number of arrays in the list. */ protected int n; /** The ratio of this front-coded list. */ protected int ratio; /** The array containing the compressed arrays. */ protected byte[] array; /** The pointers to entire arrays in the list. */ transient protected int[] p; /** Creates a new front-coded list containing the arrays returned by the given iterator. * * @param arrays an iterator returning arrays. * @param ratio the desired ratio. */ public CustomByteArrayFrontCodedList( final Iterator arrays, final int ratio ) { if ( ratio < 1 ) throw new IllegalArgumentException( "Illegal ratio (" + ratio + ")" ); byte[] array = ByteArrays.EMPTY_ARRAY; int[] p = IntArrays.EMPTY_ARRAY; byte[][] a = new byte[ 2 ][]; int curSize = 0, b = 0, common, length, minLength; while( arrays.hasNext() ) { a[ b ] = (byte[])arrays.next(); length = a[ b ].length; if ( n % ratio == 0 ) { p = IntArrays.grow( p, n / ratio + 1 ); p[ n / ratio ] = curSize; array = ByteArrays.grow( array, curSize + count( length ) + length, curSize ); curSize += writeInt( array, length, curSize ); System.arraycopy( a[ b ], 0, array, curSize, length ); curSize += length; } else { minLength = a[ 1 - b ].length; if ( length < minLength ) minLength = length; for( common = 0; common < minLength; common++ ) if ( a[ 0 ][ common ] != a[ 1 ][ common ] ) break; length -= common; array = ByteArrays.grow( array, curSize + count( length ) + count( common ) + length, curSize ); curSize += writeInt( array, length, curSize ); curSize += writeInt( array, common, curSize ); System.arraycopy( a[ b ], common, array, curSize, length ); curSize += length; } b = 1 - b; n++; } this.ratio = ratio; this.array = ByteArrays.trim( array, curSize ); this.p = IntArrays.trim( p, ( n + ratio - 1 ) / ratio ); } /** Creates a new front-coded list containing the arrays in the given collection. * * @param c a collection containing arrays. * @param ratio the desired ratio. */ public CustomByteArrayFrontCodedList( final Collection c, final int ratio ) { this( c.iterator(), ratio ); } /** Reads a coded length. * @param a the data array. * @param pos the starting position. * @return the length coded at <code>pos</code>. */ private static int readInt( final byte a[], int pos ) { if ( a[ pos ] >= 0 ) return a[ pos ]; if ( a[ pos + 1 ] >= 0 ) return ( - a[ pos ] - 1 ) << 7 | a[ pos + 1 ]; if ( a[ pos + 2 ] >= 0 ) return ( - a[ pos ] - 1 ) << 14 | ( - a[ pos + 1 ] - 1 ) << 7 | a[ pos + 2 ]; if ( a[ pos + 3 ] >= 0 ) return ( - a[ pos ] - 1 ) << 21 | ( - a[ pos + 1 ] - 1 ) << 14 | ( - a[ pos + 2 ] - 1 ) << 7 | a[ pos + 3 ]; return ( - a[ pos ] - 1 ) << 28 | ( - a[ pos + 1 ] - 1 ) << 21 | ( - a[ pos + 2 ] - 1 ) << 14 | ( - a[ pos + 3 ] - 1 ) << 7 | a[ pos + 4 ]; } /** Computes the number of elements coding a given length. * @param length the length to be coded. * @return the number of elements coding <code>length</code>. */ @SuppressWarnings("unused") private static int count( final int length ) { if ( length < ( 1 << 7 ) ) return 1; if ( length < ( 1 << 14 ) ) return 2; if ( length < ( 1 << 21 ) ) return 3; if ( length < ( 1 << 28 ) ) return 4; return 5; } /** Writes a length. * @param a the data array. * @param length the length to be written. * @param pos the starting position. * @return the number of elements coding <code>length</code>. */ private static int writeInt( final byte a[], int length, int pos ) { final int count = count( length ); a[ pos + count - 1 ] = (byte)( length & 0x7F ); if ( count != 1 ) { int i = count - 1; while( i-- != 0 ) { length >>>= 7; a[ pos + i ] = (byte)( - ( length & 0x7F ) - 1 ); } } return count; } /** Returns the ratio of this list. * * @return the ratio of this list. */ public int ratio() { return ratio; } /** Computes the length of the array at the given index. * * <P>This private version of {@link #arrayLength(int)} does not check its argument. * * @param index an index. * @return the length of the <code>index</code>-th array. */ private int length( final int index ) { final byte[] array = this.array; final int delta = index % ratio; // The index into the p array, and the delta inside the block. int pos = p[ index / ratio ]; // The position into the array of the first entire word before the index-th. int length = readInt( array, pos ); if ( delta == 0 ) return length; // First of all, we recover the array length and the maximum amount of copied elements. int common; pos += count( length ) + length; length = readInt( array, pos ); common = readInt( array, pos + count( length ) ); for( int i = 0; i < delta - 1; i++ ) { pos += count( length ) + count( common ) + length; length = readInt( array, pos ); common = readInt( array, pos + count( length ) ); } return length + common; } /** Computes the length of the array at the given index. * * @param index an index. * @return the length of the <code>index</code>-th array. */ public int arrayLength( final int index ) { ensureRestrictedIndex( index ); return length( index ); } /** Extracts the array at the given index. * * @param index an index. * @param a the array that will store the result (we assume that it can hold the result). * @param offset an offset into <code>a</code> where elements will be store. * @param length a maximum number of elements to store in <code>a</code>. * @return the length of the extracted array. */ private int extract( final int index, final byte a[], final int offset, final int length ) { final int delta = index % ratio; // The delta inside the block. final int startPos = p[ index / ratio ]; // The position into the array of the first entire word before the index-th. int pos, arrayLength = readInt( array, pos = startPos ), prevArrayPos, currLen = 0, actualCommon; if ( delta == 0 ) { pos = p[ index / ratio ] + count( arrayLength ); System.arraycopy( array, pos, a, offset, Math.min( length, arrayLength ) ); return arrayLength; } int common = 0; for( int i = 0; i < delta; i++ ) { prevArrayPos = pos + count( arrayLength ) + ( i != 0 ? count( common ) : 0 ); pos = prevArrayPos + arrayLength; arrayLength = readInt( array, pos ); common = readInt( array, pos + count( arrayLength ) ); actualCommon = Math.min( common, length ); if ( actualCommon <= currLen ) currLen = actualCommon; else { System.arraycopy( array, prevArrayPos, a, currLen + offset, actualCommon - currLen ); currLen = actualCommon; } } if ( currLen < length ) System.arraycopy( array, pos + count( arrayLength ) + count( common ), a, currLen + offset, Math.min( arrayLength, length - currLen ) ); return arrayLength + common; } public byte[] get( final int index ) { return getArray( index ); } /** * @see #get(int) */ public byte[] getArray( final int index ) { ensureRestrictedIndex( index ); final int length = length( index ); final byte a[] = new byte[ length ]; extract( index, a, 0, length ); return a; } /** Stores in the given array elements from an array stored in this front-coded list. * * @param index an index. * @param a the array that will store the result. * @param offset an offset into <code>a</code> where elements will be store. * @param length a maximum number of elements to store in <code>a</code>. * @return if <code>a</code> can hold the extracted elements, the number of extracted elements; * otherwise, the number of remaining elements with the sign changed. */ public int get( final int index, final byte[] a, final int offset, final int length ) { ensureRestrictedIndex( index ); ByteArrays.ensureOffsetLength( a, offset, length ); final int arrayLength = extract( index, a, offset, length ); if ( length >= arrayLength ) return arrayLength; return length - arrayLength; } /** Stores in the given array an array stored in this front-coded list. * * @param index an index. * @param a the array that will store the content of the result (we assume that it can hold the result). * @return if <code>a</code> can hold the extracted elements, the number of extracted elements; * otherwise, the number of remaining elements with the sign changed. */ public int get( final int index, final byte[] a ) { return get( index, a, 0, a.length ); } public int size() { return n; } public ObjectListIterator<byte[]> listIterator( final int start ) { ensureIndex( start ); return new AbstractObjectListIterator<byte[]>() { byte a[] = ByteArrays.EMPTY_ARRAY; int i = 0, pos = 0; boolean inSync; // Whether the current value in a is the string just before the next to be produced. { if ( start != 0 ) { if ( start == n ) i = start; // If we start at the end, we do nothing. else { pos = p[ start / ratio ]; int j = start % ratio; i = start - j; while( j-- != 0 ) next(); } } } public boolean hasNext() { return i < n; } public boolean hasPrevious() { return i > 0; } public int previousIndex() { return i - 1; } public int nextIndex() { return i; } public byte[] next() { int length, common; if ( ! hasNext() ) throw new NoSuchElementException(); if ( i % ratio == 0 ) { pos = p[ i / ratio ]; length = readInt( array, pos ); a = ByteArrays.ensureCapacity( a, length, 0 ); System.arraycopy( array, pos + count( length ), a, 0, length ); pos += length + count( length ); inSync = true; } else { if ( inSync ) { length = readInt( array, pos ); common = readInt( array, pos + count( length ) ); a = ByteArrays.ensureCapacity( a, length + common, common ); System.arraycopy( array, pos + count( length ) + count ( common ), a, common, length ); pos += count( length ) + count( common ) + length; length += common; } else { a = ByteArrays.ensureCapacity( a, length = length( i ), 0 ); extract( i, a, 0, length ); } } i++; return ByteArrays.copy( a, 0, length ); } public byte[] previous() { if ( ! hasPrevious() ) throw new NoSuchElementException(); inSync = false; return getArray( --i ); } }; } /** Returns a copy of this list. * * @return a copy of this list. */ public Object clone() { CustomByteArrayFrontCodedList c; try { c = (CustomByteArrayFrontCodedList)super.clone(); } catch( CloneNotSupportedException cantHappen ) { throw new InternalError(); } c.array = array.clone(); c.p = p.clone(); return c; } public String toString() { final StringBuffer s = new StringBuffer(); s.append( "[ " ); for( int i = 0; i < n; i++ ) { if ( i != 0 ) s.append( ", " ); s.append( ByteArrayList.wrap( getArray( i ) ).toString() ); } s.append( " ]" ); return s.toString(); } private void readObject(java.io.ObjectInputStream s) throws java.io.IOException, ClassNotFoundException { s.defaultReadObject(); rebuildPointerArray(); } /** * Reconsitute an instance from just the coded byte[], the #of elements in * the array, and the ratio. * * @param n * The #of elements in the array. * @param ratio * The ratio of this front-coded list. * @param array * The array containing the compressed arrays. */ public CustomByteArrayFrontCodedList(int n, int ratio, byte[] array) { if ( ratio < 1 ) throw new IllegalArgumentException( "Illegal ratio (" + ratio + ")" ); this.n = n; this.ratio = ratio; this.array = array; rebuildPointerArray(); } /** * Return the backing byte[]. */ public byte[] getArray() { return array; } /** * Rebuild pointer array from the packed byte {@link #array}, the #of * elements in that array {@link #n}, and the {@link #ratio()}. */ private void rebuildPointerArray() { final int[] p = new int[(n + ratio - 1) / ratio]; final byte a[] = array; int i = 0, pos = 0, length, common; for( i = 0; i < n; i++ ) { length = readInt( a, pos ); if ( i % ratio == 0 ) { p[ i / ratio ] = pos; pos += count( length ) + length; } else { common = readInt( a, pos + count( length ) ); pos += count( length ) + count ( common ) + length; } } this.p = p; } } |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:43
|
Update of /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/fastutil In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1111/src/test/it/unimi/dsi/fastutil Log Message: Directory /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/fastutil added to the repository |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:40
|
Update of /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/fastutil/bytes In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1111/src/test/it/unimi/dsi/fastutil/bytes Log Message: Directory /cvsroot/cweb/lgpl-utils/src/test/it/unimi/dsi/fastutil/bytes added to the repository |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:40
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/fastutil/bytes In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1111/src/java/it/unimi/dsi/fastutil/bytes Log Message: Directory /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/fastutil/bytes added to the repository |
From: Bryan T. <tho...@us...> - 2008-10-02 16:51:39
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/fastutil In directory ddv4jf1.ch3.sourceforge.com:/tmp/cvs-serv1111/src/java/it/unimi/dsi/fastutil Log Message: Directory /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/fastutil added to the repository |
From: Bryan T. <tho...@us...> - 2008-05-01 13:45:11
|
Update of /cvsroot/cweb/generic-native/src/java/org/CognitiveWeb/generic/core In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv28261/src/java/org/CognitiveWeb/generic/core Modified Files: AbstractBTree.java Log Message: added INFO and DEBUG for the log. Index: AbstractBTree.java =================================================================== RCS file: /cvsroot/cweb/generic-native/src/java/org/CognitiveWeb/generic/core/AbstractBTree.java,v retrieving revision 1.8 retrieving revision 1.9 diff -C2 -d -r1.8 -r1.9 *** AbstractBTree.java 5 Dec 2007 15:11:16 -0000 1.8 --- AbstractBTree.java 1 May 2008 13:45:06 -0000 1.9 *************** *** 63,66 **** --- 63,67 ---- import org.CognitiveWeb.generic.core.ndx.Successor; import org.CognitiveWeb.generic.core.om.BaseObject; + import org.apache.log4j.Level; import org.apache.log4j.Logger; *************** *** 82,85 **** --- 83,98 ---- protected static final Logger log = Logger.getLogger(AbstractBTree.class); + /** + * True iff the {@link #log} level is INFO or less. + */ + final static protected boolean INFO = log.getEffectiveLevel().toInt() <= Level.INFO + .toInt(); + + /** + * True iff the {@link #log} level is DEBUG or less. + */ + final static protected boolean DEBUG = log.getEffectiveLevel().toInt() <= Level.DEBUG + .toInt(); + // // Persistent data. |
From: Bryan T. <tho...@us...> - 2008-01-14 19:19:37
|
Update of /cvsroot/cweb/lgpl-utils In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv21347 Modified Files: project.xml Log Message: modified POM so that required binary files (*.8, *.16) would be loaded with InputBitStream Index: project.xml =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/project.xml,v retrieving revision 1.3 retrieving revision 1.4 diff -C2 -d -r1.3 -r1.4 *** project.xml 14 Jan 2008 17:25:28 -0000 1.3 --- project.xml 14 Jan 2008 19:19:34 -0000 1.4 *************** *** 202,212 **** <!-- <resource> ! <directory>${basedir}/src/resources/logging</directory> <includes> ! <include>log4j.properties</include> </includes> </resource> - --> </resources> --- 202,219 ---- <!-- + --> <resource> ! <!-- Note: These binary files are required for InputBitStream.class --> ! <directory>${basedir}/src/java/it/unimi/dsi/mg4j/io</directory> ! <targetPath>it/unimi/dsi/mg4j/io</targetPath> <includes> ! <include>delta.8</include> ! <include>delta.16</include> ! <include>gamma.8</include> ! <include>gamma.16</include> ! <include>zeta3.8</include> ! <include>zeta3.16</include> </includes> </resource> </resources> |
From: Bryan T. <tho...@us...> - 2008-01-14 17:26:02
|
Update of /cvsroot/cweb/lgpl-utils In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv11400 Modified Files: project.properties project.xml Log Message: updating dependencies for maven builds Index: project.properties =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/project.properties,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** project.properties 20 Mar 2007 14:39:37 -0000 1.1 --- project.properties 14 Jan 2008 17:25:28 -0000 1.2 *************** *** 7,10 **** --- 7,11 ---- # authoritative source. + #todo maven2 is ftp://ibiblio.org/pub/packages/maven2/ maven.repo.remote=http://www.ibiblio.org/maven/,http://proto.cognitiveweb.org/maven-repository Index: project.xml =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/project.xml,v retrieving revision 1.2 retrieving revision 1.3 diff -C2 -d -r1.2 -r1.3 *** project.xml 14 Jan 2008 16:45:47 -0000 1.2 --- project.xml 14 Jan 2008 17:25:28 -0000 1.3 *************** *** 149,153 **** <!-- FIXME Use autojar to select only the required classes for redistribution. --> <!-- FIXME Publish resulting JAR on maven repository. --> ! <dependency> <id>fastutil</id> <version>5.0.9</version> --- 149,153 ---- <!-- FIXME Use autojar to select only the required classes for redistribution. --> <!-- FIXME Publish resulting JAR on maven repository. --> ! <dependency> <id>fastutil</id> <version>5.0.9</version> |
From: Bryan T. <tho...@us...> - 2008-01-14 17:25:41
|
Update of /cvsroot/cweb/lgpl-utils/lib In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv11400/lib Removed Files: fastutil-5.0.5.jar Log Message: updating dependencies for maven builds --- fastutil-5.0.5.jar DELETED --- |
From: Bryan T. <tho...@us...> - 2008-01-14 16:51:15
|
Update of /cvsroot/cweb/lgpl-utils/lib In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv28228/lib Added Files: fastutil-5.0.9.jar Log Message: updated dependency version --- NEW FILE: fastutil-5.0.9.jar --- (This appears to be a binary file; contents omitted.) |
From: Bryan T. <tho...@us...> - 2008-01-14 16:45:55
|
Update of /cvsroot/cweb/lgpl-utils In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv28115 Modified Files: project.xml Log Message: updated dependencies. Index: project.xml =================================================================== RCS file: /cvsroot/cweb/lgpl-utils/project.xml,v retrieving revision 1.1 retrieving revision 1.2 diff -C2 -d -r1.1 -r1.2 *** project.xml 20 Mar 2007 14:39:37 -0000 1.1 --- project.xml 14 Jan 2008 16:45:47 -0000 1.2 *************** *** 147,150 **** --- 147,166 ---- </dependency> + <!-- FIXME Use autojar to select only the required classes for redistribution. --> + <!-- FIXME Publish resulting JAR on maven repository. --> + <dependency> + <id>fastutil</id> + <version>5.0.9</version> + <url>http://fastutil.dsi.unimi.it/</url> + </dependency> + + <!-- + <dependency> + <id>mg4j</id> + <version>2.0.1</version> + <url>http://mg4j.dsi.unimi.it/</url> + </dependency> + --> + </dependencies> |
From: Bryan T. <tho...@us...> - 2008-01-10 15:40:08
|
Update of /cvsroot/cweb/lgpl-utils/src/java/it/unimi/dsi/mg4j/io In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv19198/src/java/it/unimi/dsi/mg4j/io Added Files: zeta3.8 delta.16 zeta3.16 delta.8 gamma.16 gamma.8 Log Message: refactored to the sesame 2.x api - the current implementation does NOT support context, but high level query appears to work. i have not integrated with the "TCK" (technology compatibility kit) so there has not been much verification of this integration so far. the scale-out triple store is running again. working on concurrency control for the local triple store next, which will remove the memory cap during inference. working on concurrent load performance - modestly good so far for the scale-out triple store (10k+ tps) --- NEW FILE: zeta3.8 --- (This appears to be a binary file; contents omitted.) --- NEW FILE: gamma.8 --- (This appears to be a binary file; contents omitted.) --- NEW FILE: zeta3.16 --- (This appears to be a binary file; contents omitted.) --- NEW FILE: delta.16 --- (This appears to be a binary file; contents omitted.) --- NEW FILE: delta.8 --- (This appears to be a binary file; contents omitted.) --- NEW FILE: gamma.16 --- (This appears to be a binary file; contents omitted.) |
From: Bryan T. <tho...@us...> - 2007-12-11 14:35:08
|
Update of /cvsroot/cweb/gom-for-jdbm/src/java/org/CognitiveWeb/generic/store/jdbm In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv13225/src/java/org/CognitiveWeb/generic/store/jdbm Modified Files: MyBTree.java Log Message: modified method signature for 1.4 compliance. Index: MyBTree.java =================================================================== RCS file: /cvsroot/cweb/gom-for-jdbm/src/java/org/CognitiveWeb/generic/store/jdbm/MyBTree.java,v retrieving revision 1.9 retrieving revision 1.10 diff -C2 -d -r1.9 -r1.10 *** MyBTree.java 17 Jun 2006 09:10:58 -0000 1.9 --- MyBTree.java 11 Dec 2007 14:35:01 -0000 1.10 *************** *** 349,353 **** } ! public ICompositeKey newCompositeKey(Object coercedKey, long oid) { return new CompositeKey(coercedKey,oid); --- 349,353 ---- } ! public Object newCompositeKey(Object coercedKey, long oid) { return new CompositeKey(coercedKey,oid); |
From: Bryan T. <tho...@us...> - 2007-12-07 13:05:31
|
Update of /cvsroot/cweb/gom-for-jdbm In directory sc8-pr-cvs4.sourceforge.net:/tmp/cvs-serv26680 Modified Files: project.xml Log Message: trying to converge dependencies. Index: project.xml =================================================================== RCS file: /cvsroot/cweb/gom-for-jdbm/project.xml,v retrieving revision 1.12 retrieving revision 1.13 diff -C2 -d -r1.12 -r1.13 *** project.xml 6 Dec 2007 12:09:53 -0000 1.12 --- project.xml 7 Dec 2007 13:05:24 -0000 1.13 *************** *** 241,245 **** <groupId>org.CognitiveWeb</groupId> <artifactId>cweb-generic-test</artifactId> ! <version>1.1-b3-dev</version> <url>http://www.cognitiveweb.org/</url> </dependency> --- 241,245 ---- <groupId>org.CognitiveWeb</groupId> <artifactId>cweb-generic-test</artifactId> ! <version>1.1-b2-dev</version> <url>http://www.cognitiveweb.org/</url> </dependency> |