2.9.1.1.0 Maintenance Release based on Lucene 2.9 core base
* New Lucene Core base libraries
* Full Lucene Test Suites certified
* Fixed a bug that enqueued more rowids than required when using OnLine mode with the
ExtraTabs and WhereCondition parameters
* Fixed operator priority when WhereCondition has an OR operator
* DefaultUserDataStore now uses an array of cached fields to improve performance
* Spanish Analyzer now uses the latest ASCIIFoldingFilter
* high_freq_terms(idx_name,term,max_num_term) pipeline table function was added to return
high-frequency terms and their associated docFreq value.
* index_terms(idx_name,term) pipeline table function was added to return a list
of terms and their associated frequency.
* DefaultUserDataStore now supports the ANALYZED, ANALYZED_WITH_VECTORS,
ANALYZED_WITH_OFFSETS, ANALYZED_WITH_POSITIONS and ANALYZED_WITH_POSITIONS_OFFSETS
Lucene Field option values
* OJVMLock was replaced by SingleInstanceLockFactory for per-instance locking; cross-session
locking is implemented with select for update functionality
* An automatic upgrade from 2.9.0 is possible without deleting indexes,
you only have to execute:
ant upgrade-domain-index
ant ncomp-lucene-ojvm (10g only)
ant jit-lucene-classes (11g only)
2.9.0.1.0 Initial release based on Lucene 2.9.0 core base
* Tested with Oracle 11gR2, 11gR1 and 10.2 databases
* DefaultUserDataStore uses SAX parsing to get text nodes and attributes from an XMLType value.
* A SimpleLRUCache is used to load rowids and their associated Lucene doc ids; this reduces
memory consumption when querying very big tables. A new parameter has been added,
CachedRowIdSize (10000 by default), to control the size of the LRU cache.
* Lucene Domain Index core was updated to use TopFieldCollector and to avoid
computation time when lscore() is not used.
* Two new parameters have been added: NormalizeScore, which controls when to track the
Max Score, and PreserveDocIdOrder, used when querying. Both parameters are a consequence
of the new Lucene Collector API and boost performance when querying.
* A table alias L$MT is defined for the master table associated with the index, to be
used in complex queries to associate columns from master tables and columns from
dependent tables
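The rowid cache mentioned above can be pictured with a minimal sketch of an LRU map. This is
not the project's SimpleLRUCache; the class name RowIdLruCache is hypothetical, and the
constructor argument only plays the role that CachedRowIdSize has in the notes above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an LRU cache of rowid -> Lucene doc id.
// It is an illustration only, not the class shipped with Lucene Domain Index.
class RowIdLruCache extends LinkedHashMap<String, Integer> {
    private final int maxEntries; // assumed to correspond to CachedRowIdSize

    RowIdLruCache(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        // Called after each put; evicts the least recently used entry once the cap is exceeded
        return size() > maxEntries;
    }
}
```

Bounding the map this way keeps memory use proportional to the cache size rather than to
the number of rows in the table.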
2.4.1.1.0 (maintenance release based on Lucene 2.4.1, 27/Mar/09)
* Do not store internal parameters in the system's views and force PopulateIndex:false
* Files marked as deleted are now purged after every sync to free BLOB storage
* Added lfacets aggregate function for computing facets
* CountHits function no longer requires sort argument
* Filters are stored/retrieved using only the QueryParser.toString() key
* UN_TOKENIZED format string in the DefaultUserDataStore class was replaced by
NOT_ANALYZED or NOT_ANALYZED_STORED according to the new Lucene definitions.
* Fixed a bug when sync tries to process more than 32767 enqueued rowids.
* Added parameters for highlighting functions Formatter, MaxNumFragmentsRequired,
FragmentSeparator and FragmentSize.
* Added PerFieldAnalyzer parameter to use an independent Analyzer for each column.
* Added sample of a custom Formatter org.apache.lucene.search.highlight.MyHTMLFormatter
2.4.1.0.0 (first release based on Lucene 2.4.1, 9/Mar/09)
* Fixed a compatibility problem between the 10g/11g SQL Date representation in the pipeline
table function.
2.4.0.1.0 (maintenance release based on Lucene 2.4.0, 10/Jan/09)
* Added Rhighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, rType IN VARCHAR2, rws IN SYS_REFCURSOR) RETURN ANYDATASET
pipeline table function
* Added Phighlight(index_name VARCHAR2, qry VARCHAR2, cols VARCHAR2, stmt IN VARCHAR2) RETURN ANYDATASET
pipeline table function
* Added lhighlight(NUMBER):VARCHAR2 ancillary operator
* Removed usage of Lucene deprecated API (Hits and IndexWriter for example)
* Use of the FIRST_ROWS optimizer hint to decide how many rows to load the first time
* sync, optimize and rebuild interfaces now use index_name or [owner,index_name] arguments
* A better build system to build Lucene Domain Index from sources
* More tests
* Tested against 11.1.0.7 and 10.2.0.3
* See the online docs for usage of FIRST_ROWS and the lhighlight() operator
2.4.0.0.0 (production release based on Lucene 2.4.0, 10/10/08)
* Added parameter for CLOB encoding
* More Like this function
* NGram analyzer
* EnglishWikipediaAnalyzer
* DataStore interface includes an API for setting the current connection
* Now analyzers, queries, snowball and wikipedia contrib packages are required
2.3.2.0.0 (binary release based on Lucene 2.3.2, 1/Jun/08)
* Compiled against Lucene 2.3.2 production release
* Used latest API for merging based on RAM usage
* Use Writer for deleting during Sync
* Confirmed the 4x indexing improvement reported by the Lucene dev group
* Fix workaround which changes order of the rowids in ODCRIDList
* Added a Spanish Wikipedia Analyzer for testing
* Reports IOException instead of RuntimeException to signal EOF or File Not Found
* Decouple Flush functionality from TableIndexer
2.2.0.2.2 (fixpack for 2.2.0.2.0 release, 5/Apr/08)
* Added Rowid to lucene doc id caching.
* Usage of LoadFirstFieldSelector during Document loading to only load rowid field.
* Added a test suite which indexes a Wikipedia dump inside the OJVM.
2.2.0.2.1 (fixpack for 2.2.0.2.0 release, 12/Dec/07)
* DefaultUserDataStore requires usage of the XPath text() expression for getting only
the textual value
* Added logging of the SQL being executed by the table indexer
* Change document logging to FINER level
* More pre-defined mapping at DefaultUserDataStore for NUMBER, BINARY_FLOAT,
BINARY_DOUBLE, TIMESTAMP, TIMESTAMPTZ and TIMESTAMPLTZ Oracle types.
* New parameter PopulateIndex:[true|false] for populating or not Lucene Index
at creation time.
* New parameter IncludeMasterColumn:[true|false], to choose whether or not to
index the master column, useful with Virtual Columns and XMLType.
* New parameter BatchCount:integer, to choose how many rows are enqueued for
indexing using create ... index ... parameters('SyncMode:OnLine');
* Creating an index with SyncMode:OnLine causes Lucene Domain Index to enqueue
batches of "BatchCount" rows to be indexed by the AQ PLSQL callback in the background.
Lucene Domain Index is immediately ready for querying after create.
* Batch rowid indexing is done using a pipeline function.
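The BatchCount behaviour described above amounts to splitting the pending rowids into
fixed-size groups. The following is a minimal sketch of that idea only; the class and
method names are hypothetical, and the real enqueueing goes through Oracle AQ, which is
not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits pending rowids into batches of at most batchCount entries.
// It illustrates the batching idea only and does not talk to Oracle AQ.
class RowIdBatcher {
    static List<List<String>> split(List<String> rowIds, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < rowIds.size(); start += batchCount) {
            int end = Math.min(start + batchCount, rowIds.size());
            // Copy the slice so each batch is independent of the source list
            batches.add(new ArrayList<>(rowIds.subList(start, end)));
        }
        return batches;
    }
}
```

Each resulting batch could then be enqueued as one message, so a smaller BatchCount means
more, smaller units of background work.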
2.2.0.2.0 (third major release synchronized with lucene 2.2.0, 12/Dec/07)
Binary download (see package ojvm):
http://sourceforge.net/project/showfiles.php?group_id=56183
CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
* sort by column passed at lcontains(col,query_parser_str,sort_str,corr_id) syntax
* Logging support using Java Util Logging package
* JUnit test suites emulating middle tier environment
* Support for rebuild and optimize online for SyncMode:OnLine index
* XMLDB Export
* AutoTuneMemory parameter for replacing MaxBufferedDocs parameter
* Functional column support
2.2.0.1.1 (second release, 27/Sep/07 05:39 AM)
Binary download:
https://issues.apache.org/jira/secure/attachment/12366661/ojvm-09-27-07.tar.gz
CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
* LuceneDomainIndex.countHits() function to replace select count from .. where lcontains(..)>0 syntax.
* support for inline pagination at lcontains(col,'rownum:[n TO m] AND ...') function
* rounding and padding support for columns date, timestamp, number, float, varchar2 and char
* ODCI API array DML support
* BLOB parameter support
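The inline pagination syntax 'rownum:[n TO m] AND ...' listed above can be illustrated with
a small parser sketch. How lcontains() actually handles this prefix is not stated in these
notes; the RownumRange class and its regular expression are assumptions made only for
illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a leading "rownum:[n TO m]" clause in a query string.
// This is an assumed approach, not the parsing code used by lcontains().
class RownumRange {
    private static final Pattern RANGE =
        Pattern.compile("^\\s*rownum:\\[(\\d{1,9}) TO (\\d{1,9})\\]\\s*(?:AND\\s+)?");

    final int from;
    final int to;
    final String remainingQuery; // the query text left after the pagination clause

    private RownumRange(int from, int to, String remainingQuery) {
        this.from = from;
        this.to = to;
        this.remainingQuery = remainingQuery;
    }

    // Returns null when the query does not start with a rownum range
    static RownumRange parse(String query) {
        Matcher m = RANGE.matcher(query);
        if (!m.find()) {
            return null;
        }
        return new RownumRange(Integer.parseInt(m.group(1)),
                               Integer.parseInt(m.group(2)),
                               query.substring(m.end()));
    }
}
```

With this sketch, a query such as rownum:[1 TO 10] AND title:lucene would yield the window
1 to 10 and leave title:lucene as the text to hand to the query parser.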
2.2.0.1.0 (first release synchronized with lucene 2.2.0, 14/Sep/07 06:44 AM)
CVS access:
cvs -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism login
cvs -z3 -d:pserver:anonymous@dbprism.cvs.sourceforge.net:/cvsroot/dbprism co -P ojvm
* Synchronized with latest Lucene 2.2.0 production
* Replaced the in-memory storage using a Vector-based implementation by direct BLOB IO,
reducing memory usage for large indexes.
* Support for user data stores: you are no longer limited to indexing one column at a time
(a limit of the Data Cartridge API on 10g); you can now index multiple columns of the base
table and columns of related tables joined together.
* User Data Stores can be customized by the user: by writing a simple Java class,
users can control which columns are indexed, the padding used or any other functionality
prior to the document adding step.
* There is a DefaultUserDataStore which gets all columns of the query and builds a
Lucene Document with Fields representing each database column; these fields are
automatically padded if they hold NUMBER data or rounded if they hold DATE data, for example.
* lcontains() SQL operator supports the full Lucene QueryParser syntax to provide access
to all columns indexed, see examples below.
* Support for DOMAIN_INDEX_SORT and FIRST_ROWS hints: if you want to
get rows ordered by the lscore() operator (ascending, descending), the optimizer hint will
assume that Lucene Domain Index returns rowids in the proper order, avoiding an
inline view to sort them.
* Automatic index synchronization by using AQ's Call Back.
* Lucene Domain Index creates extra tables named IndexName$T and an Oracle AQ named
IndexName$Q with its storage table IndexName$QT in the user's schema,
so you can alter the storage preferences if you want.
* ojvm project is at SourceForge.net CVS, so anybody can get it and collaborate
* Tested against 10gR2 and 11g database.
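The padding and rounding applied to NUMBER and DATE fields, mentioned above, exist because
Lucene compares terms as strings. A minimal sketch of the idea follows; the FieldEncoding
class, the widths and the yyyyMMdd format are assumptions for illustration and are not
taken from DefaultUserDataStore.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helpers showing why numbers are padded and dates rounded before indexing.
// Widths and formats here are illustrative assumptions, not the project's defaults.
class FieldEncoding {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Left-pads a non-negative number with zeros so string order matches numeric order
    static String padNumber(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }

    // Rounds a timestamp down to day precision by dropping the time of day
    static String roundToDay(LocalDateTime timestamp) {
        return timestamp.format(DAY);
    }
}
```

Without padding, the term "10" sorts before "9" as text; with a fixed width, "009" sorts
before "010", so range queries over numeric columns behave as expected.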
2.0.0.1.3 (third release, 09/Jan/07 11:40 AM)
https://issues.apache.org/jira/secure/attachment/12348574/ojvm-01-09-07.tar.gz
* The Data Cartridge API is used without column data to reduce the data stored on
the queue of changes and speed up the operation of the synchronize method.
* Query Hits are cached associated to the index search and the string returned by
the QueryParser.toString() method.
* If no ancillary operator is used in the select, do not store the score list.
* The "Stemmer" parameter is recognized and its value is passed as the argument to the
Snowball analyzer, for example:
create index it1 on t1(f2) indextype is lucene.LuceneIndex parameters('Stemmer:English');
* Before installing the ojvm extension it is necessary to execute "ant jar-core" on
the snowball directory.
* The IndexWriter.setUseCompoundFile(false) is called to use multi file storage
(faster than the compound file) because there is no file descriptor limitation
inside the OJVM, BLOBs are used instead of File.
* Files are marked for deletion and they are purged when calling to Sync
or Optimize methods.
* BLOBs are created and populated in one call using Oracle SQL RETURNING information.
* A testing script for using OE sample schema, with query comparisons against
Oracle Text ctxsys.context index.
2.0.0.1.2 (second release, 20/Dec/06 02:03 PM)
https://issues.apache.org/jira/secure/attachment/12347614/ojvm-12-20-06.tar.gz
This new release of the OJVMDirectory Lucene Store includes a fully functional
Oracle Domain Index with a queue for massive update/insert operations and
many performance improvements.
2.0.0.1.1 (first release, 28/Nov/06 01:04 PM)
https://issues.apache.org/jira/secure/attachment/12345967/ojvm-11-28-06.tar.gz
* The complete API for the Oracle Domain index was implemented, but the solution for
the contains operator outside the where clause is not good.
* I will implement a singleton solution for the OJVMDirectory object when it is used
in read-only mode, typically when a user performs select operations against tables
which have columns indexed with Lucene. This implementation will increase a lot
the final performance because the index reader will be ready for each select operation.
Obviously I will check if another user or thread makes a write operation on the
index to reload the read-only singleton.
* The queue for storing the changes on the index is not implemented yet,
I'll add it in a short time.
2.0.0.1.0 (initial implementation, 22/Nov/06 03:45 PM)
https://issues.apache.org/jira/secure/attachment/12345516/ojvm.tar.gz