From: Wolfgang M. <wol...@ex...> - 2004-12-13 19:31:22
I just uploaded a new development snapshot, featuring two important changes:

1) eXist now supports the manual creation of range indexes to be used with standard XPath operators
2) improved index configuration by collection and at runtime

1) In addition to the existing structure and fulltext indexes, range indexes provide a shortcut for the database to directly select nodes by their typed values. Range indexes are used by all comparison operators (<, >, !=, lt, gt, ...), as well as some standard XPath functions like fn:contains() or fn:matches().

A typical example would be a document with many "price" elements containing a floating point number. So far, to evaluate an expression like "price < 100.0", the query engine had to do a full scan over all price elements, converting each node value to the desired type. Using the new index facility, you can now define a range index on price:

    <create path="//price" type="xs:double"/>

During indexing, eXist will try to convert all <price> values to an xs:double instance and put these values into the range index. Values that cannot be cast to xs:double are ignored. The index is automatically maintained during subsequent XUpdate operations.

With the index defined, eXist can evaluate expressions like "price >= 55.0" with a single index lookup. Moreover, the query engine does not need access to the actual node values.

These benefits also apply to string comparisons: without a range index, eXist will try to use the fulltext index to restrict the number of nodes to look at, but it still has to scan the resulting node set to filter out wrong matches. So, though the difference will not be as dramatic as with numeric data, defining a range index on string values makes sense.

Currently, the following types are supported: xs:string, xs:double, xs:integer, xs:boolean (xs:float is supported, but doesn't work for some reason...). xs:integer is internally limited to a signed long.
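To make the difference between a full scan and a range index lookup concrete, here is a minimal Python sketch. It is not how eXist implements its range index; the sorted-list layout, the function names and the sample prices are all assumptions made purely for illustration. It only mirrors the behaviour described above: values are cast once at indexing time, uncastable values are skipped, and a comparison is answered with a single ordered lookup.

```python
from bisect import bisect_left

def build_range_index(nodes):
    """Cast every node value once and keep (value, node id) sorted by value.

    Values that cannot be cast to a float are left out of the index,
    analogous to the "values that cannot be cast are ignored" rule above.
    """
    pairs = []
    for node_id, text in nodes:
        try:
            pairs.append((float(text), node_id))
        except ValueError:
            continue
    pairs.sort()
    keys = [value for value, _ in pairs]
    ids = [node_id for _, node_id in pairs]
    return keys, ids

def scan_less_than(nodes, limit):
    """Full scan: touch every node and cast its value at query time."""
    hits = []
    for node_id, text in nodes:
        try:
            if float(text) < limit:
                hits.append(node_id)
        except ValueError:
            pass
    return hits

def index_less_than(index, limit):
    """price < limit: one binary search, no access to node values."""
    keys, ids = index
    return ids[:bisect_left(keys, limit)]

def index_greater_equal(index, limit):
    """price >= limit: the remaining tail of the same sorted list."""
    keys, ids = index
    return ids[bisect_left(keys, limit):]

# Hypothetical <price> nodes as (node id, text content) pairs.
prices = [(1, "120.50"), (2, "55.0"), (3, "n/a"), (4, "99.99"), (5, "12")]
price_index = build_range_index(prices)
```

Both query paths select the same nodes; the indexed one returns them in value order rather than document order, so a real engine would still have to restore document order where the query requires it.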
2) So far, the only way to configure the fulltext index was to edit conf.xml. However, with the new range index facility, we needed a way to configure indexes at runtime and with a finer granularity. I have thus decided to extend the scheme already used for trigger configuration: every collection may have a collection configuration document, which is itself stored within the database. For example, to create some numeric indexes on the mondial data, one can create a file mondial.xconf:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <!-- Defines a bunch of numeric indexes on the mondial collection. -->
        <index>
            <fulltext default="all" attributes="yes"/>
            <create path="/mondial//population" type="xs:integer"/>
            <create path="/mondial//population_growth" type="xs:double"/>
            <create path="/mondial//infant_mortality" type="xs:double"/>
            <create path="/mondial//inflation" type="xs:double"/>
        </index>
        <triggers>
            <!-- Optional trigger configuration -->
        </triggers>
    </collection>

Contrary to the trigger configuration in previous versions, this collection configuration should _not_ be stored directly in the target collection. Instead, it should go into a collection below /db/system/config. So if our collection is /db/mondial, the collection config should be stored as /db/system/config/db/mondial/mondial.xconf. This way, we have only one central location for all configurations and avoid deadlocks between concurrent threads when scanning the configuration hierarchy.

Also, configurations are now "inherited", i.e. the CollectionConfigurationManager always scans top-down through the collections to find configurations to apply. Settings in child collections override those found in parent collections. So if you are lazy, you can just configure all indexes in /db/system/config/db/collection.xconf.
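The path mapping and the top-down inheritance can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the CollectionConfigurationManager itself: the function names are hypothetical, each configuration is reduced to a plain mapping from index path to type, and the per-path merge is an assumption, since the exact merge behaviour is not spelled out here.

```python
CONFIG_ROOT = "/db/system/config"

def config_collection_for(collection):
    """Map a collection such as /db/mondial to /db/system/config/db/mondial."""
    return CONFIG_ROOT + collection

def ancestors_top_down(collection):
    """Yield /db, /db/mondial, ... ending with the collection itself."""
    parts = [part for part in collection.split("/") if part]
    for depth in range(1, len(parts) + 1):
        yield "/" + "/".join(parts[:depth])

def effective_indexes(collection, stored_configs):
    """Merge index definitions from the top-level collection downwards.

    Later (deeper) collections replace entries from their parents.
    Merging per index path is an assumption made for this sketch.
    """
    merged = {}
    for ancestor in ancestors_top_down(collection):
        merged.update(stored_configs.get(config_collection_for(ancestor), {}))
    return merged

# Hypothetical stored configurations, keyed by their config collection.
stored = {
    "/db/system/config/db": {"//price": "xs:double"},
    "/db/system/config/db/mondial": {
        "/mondial//population": "xs:integer",
        "//price": "xs:string",
    },
}
```

With this data, /db/mondial ends up with its own population index plus its overriding definition for //price, while a sibling such as /db/shop only sees what is configured for /db.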
Some things to note:

* the doctype="" attribute - previously used to define indexes for a given root node - has been dropped.
* namespaces are now fully supported
* a range index cannot be created on nested or mixed content elements yet, but I plan to add that.
* the index-path to be specified in the path="" attribute looks like an XPath expression, but it is not. I have actually been thinking about using full XPath here, but then the index creation would have to occur after the document has been stored and index maintenance during XUpdates would become rather difficult.

More information is available at /exist/indexing.xml if you installed the snapshot. The snapshot can be downloaded here:

http://prdownloads.sourceforge.net/exist/eXist-snapshot-20041213.jar

The new features definitely need more testing. Any feedback will be welcome. Please report your problems or success stories.

Wolfgang