[Summa-devel] ANNOUNCE: Summa 1.5.0 and license change to Apache 2

On behalf of the Summa team I am proud to announce the immediate
availability of Summa 1.5.0. It's been almost two months since our last
release, which means that this has been a longer release cycle than
usual. Indeed this announcement mail is also longer than usual so go
grab a cup'a before you read on.

As always you can get the release from:

https://sourceforge.net/projects/summa/files/releases/1.5.0

ABI BREAK
First of all you might have noticed that the minor version has been
bumped from 4 to 5. This is because we had to break the binary interface
(ABI) in order to roll in some new features. API and source
compatibility has been retained, so fear not.

What an ABI break means is that RMI services running different versions
cannot talk to each other, because the wire format of the serialized
classes has changed. This means that you have to upgrade your entire
Summa stack to the 1.5.0 series in one go if you want to make the change.

NOTABLE NEW FEATURES
We changed the ABI in order to add some new methods to the Storage
interface. The biggest visible change is the new Storage.batchJob()
method. It allows you to run a script that resides in the classpath of
the Storage process on a specified subset of the storage. The script can
be in any scripting language supported by the JVM via the ScriptEngine
SPI. This has been tested with the native JavaScript support of Java 6
as well as Python/Jython (requires Jython >= 2.5.1 in the classpath).
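
For readers unfamiliar with the ScriptEngine SPI, here is a minimal sketch of the underlying JVM mechanism. It is not the Storage.batchJob() implementation and does not use the Summa API: the class BatchJobSketch, the method runJob() and the script variable "recordId" are all names made up for this illustration. It only uses the standard javax.script package.

```java
import java.util.List;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical illustration only; not part of Summa.
class BatchJobSketch {

    /**
     * Evaluates the script once per record id, exposing the id to the
     * script as the variable "recordId" (a name invented for this sketch).
     * Returns the result of the last evaluation, or null if no engine with
     * the given name is installed or if ids is empty.
     */
    static Object runJob(String engineName, String script, List<String> ids) {
        // The manager discovers engines through the ScriptEngine SPI
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null; // e.g. "python" without Jython in the classpath
        }
        Object last = null;
        try {
            for (String id : ids) {
                Bindings bindings = engine.createBindings();
                bindings.put("recordId", id);
                last = engine.eval(script, bindings);
            }
        } catch (ScriptException e) {
            throw new IllegalStateException("Batch script failed", e);
        }
        return last;
    }

    public static void main(String[] args) {
        // Prints "last result: null" on JVMs without a bundled JavaScript engine
        Object result = runJob("JavaScript", "recordId + '!'", List.of("doc1", "doc22"));
        System.out.println("last result: " + result);
    }
}
```

Which engine names are available depends on the JVM and the classpath, which is why the sketch treats a missing engine as a normal case rather than an error.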

We have also added variants of Storage.flush() and Storage.flushAll()
that take a set of meta flags affecting how the flushed record(s) should
be updated. One new thing enabled this way is the flag that says not to
update a record if it is already stored in the same state (e.g. when all
fields are identical).
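
The intended semantics of such a flag can be sketched with a toy in-memory store. This is an assumption-laden illustration, not Summa code: the class TryUpdateSketch, its String-based records and the boolean tryUpdate parameter are invented here, whereas the real methods take Record objects and QueryOptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Summa.
class TryUpdateSketch {
    private final Map<String, String> records = new HashMap<>();
    private int writes = 0;

    /**
     * Stores content under id. With tryUpdate set, an identical record that
     * is already stored is left untouched. Returns true if a write happened.
     */
    boolean flush(String id, String content, boolean tryUpdate) {
        if (tryUpdate && content.equals(records.get(id))) {
            return false; // already up to date, skip the write
        }
        records.put(id, content);
        writes++;
        return true;
    }

    int writes() {
        return writes;
    }

    public static void main(String[] args) {
        TryUpdateSketch storage = new TryUpdateSketch();
        System.out.println(storage.flush("rec1", "<xml/>", true));  // true
        System.out.println(storage.flush("rec1", "<xml/>", true));  // false
        System.out.println(storage.flush("rec1", "<xml/>", false)); // true
        System.out.println(storage.writes());                       // 2
    }
}
```

Skipping the write also means the record does not look freshly modified, which matters when something downstream reacts to record changes.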

LICENSE CHANGE
We have changed the license to Apache 2 (it was LGPL 2 before). There
are many reasons for this, but a few of the main ones are:

 - Ease code flow between Summa and Lucene, and Summa and Solr. This is
   especially important now that we are engaging more in upstream
   development (more on this in the blog later[1]).

 - Ease vendor adoption. It is a known fact that some vendors stay away
   from copyleft licenses on principle. Whether or not this is a good
   thing is beside the point; we just want people to deploy Summa
   anywhere it's applicable.

RELEASE SUMMARY
The release summary is longer than usual and includes not only new
features but also some important bug fixes:

=======================
2010-01-27: Summa 1.5.0
=======================

 * NOTICE: Convert entire Summa source tree to Apache 2 license

 * Storage ABI break (source compatibility preserved): Add variants
	  of flush() and flushAll() that take an extra QueryOptions
	  parameter.

 * ABI break: Add setter methods for QueryOptions.meta(). We already
	  broke the Storage binary protocol so it's no biggie.

 * Update H2 dep. to 1.2.126, which among other fixes includes: "The
	  database is now closed after an out of memory exception, because
	  the database could get corrupt otherwise." Also disable the H2
	  level 2 cache by default.

 * New API Storage.batchJob for running predefined batch jobs in any
	  scripting language across subsets of the storage.
	  Sample batch jobs bundled with Summix:
	  collect_ids.job.js, delete.job.js, and count.job.js

 * Add a 'holdings' command to storage-tool

 * Add a __holdings__ record to storage containing detailed storage
	  statistics. Show storage __holdings__ in status page and feed.
	  The holdings can also be looked up with the "holdings" action in
	  storage-tool.sh

 * Define a meta flag "TRY_UPDATE", which alters the default behaviour
	  of Storage.flush() and Storage.flushAll() to not update records if
	  they are already up to date.

 * New boolean property on RecordWriter:
	  "summa.storage.recordwriter.tryupdate" to pass TRY_UPDATE=true in
	  the QueryOptions to flushAll()

 * Bugfix: Allow both old and new H2 file formats in H2Storage and
	  SuggestStorageH2 (ie. allow both .data.db and .h2.db)

 * Bugfix: There was an elusive deadlock in QueuedInputStream used
	  by the ZipParser class. It involved both a missing 'synchronized'
	  statement and a bad cast of byte to int. We have replaced the
	  entire implementation with Java's native Piped{In,Out}putStream.

 * Bugfix: The result of an IndexLookup was scrambled when a
	  sortLocale was defined for the facet

 * Bugfix: The FacetResults did not perform entity escaping
	  correctly

 * Bugfix: UniqueTimestampGenerator.MAX_TIME was not constructed
	  correctly and was in fact negative!

 * Bugfix: LuceneManipulator kept running in the event of out of
	  disk space

 * BugFix: Tag-representations were double entity-encoded

 * Bugfix: The SortedPoolImpl was not thread-safe for reads

 * BugFix: Sort failed with an ArrayIndexOutOfBoundsException if
	  certain documents did not contain a term for the sort field

 * Bugfix: The sample website did not support sorted search or
	  paging

 * Bugfix: The MultipassSortcomparator works now, but uses the
	  memory-inefficient BitsArrayInts.
	  Testing: A bug has been discovered in BitsArrayPacked, where
	  assigning to a value nullifies the subsequent value

 * Feature: Made the LuceneManipulator optionally multi-threaded
	  (default is 1 thread, which mimics the old behaviour). As part of
	  this, the IndexManipulator interface was extended with support
	  for signalling if the order of documents gets unreliable

 * Feature: Added filter for post-processing in ingest workflows
	  involving full dumps. The primary usage is clearing of old data
	  from Storage

 * Feature: Provided an option for enabling lenient marc-parsing in
	  ISO2709ToMARCXMLFilter

 * Feature: Made it possible to specify the sort comparators used by
	  a LuceneSearchNode, making it possible to choose between memory
	  usage and speed when sorting fields with many terms. Still in the
	  experimental phase

 * Make StorageRunner.main(), SummaSearcherRunner.main(), and
	  FilterControl.main() do a System.exit(1) if they encounter a fatal 
	  top level exception

 * Use 'set -e' in all Summix .sh scripts to have them bail out on
	  any uncaught errors

 * Make all Summix .sh shell scripts use 'exec' when launching their
	  primary child process. This way we don't litter 3 processes for
	  each service - we have one process/service and that is that of
	  the JVM.

 * Logging for Summix: Services now log to log/<service-type>.log
	  and log/<service-type>.fatal.log. All *-tool.sh log to
	  log/tools.log. Here <service-type> is derived from the basename of
	  the config file
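
A side note on the QueuedInputStream entry above, for anyone writing their own InputStream: the snippet below is not the removed Summa code, just a minimal illustration of how a byte-to-int cast typically goes wrong in a read() implementation. The class and method names are made up for the example.

```java
// Hypothetical illustration only; not the removed QueuedInputStream code.
class ByteCastSketch {

    // Buggy: the cast sign-extends, so the byte 0xFF becomes -1, which
    // callers of InputStream.read() interpret as end of stream.
    static int buggyRead(byte b) {
        return (int) b;
    }

    // Correct: mask to the 0..255 range required by InputStream.read().
    static int correctRead(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(buggyRead(b));   // prints -1
        System.out.println(correctRead(b)); // prints 255
    }
}
```

With the buggy variant, a reader sees a premature end of stream as soon as the data contains a 0xFF byte, which is easy to hit in binary data such as zip files.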

Cheers,
Mikkel

[1]: http://sbdevel.wordpress.com/
