Hi Jim,

Here are a couple of answers regarding the indexing...I'll leave the PHP questions to Andrew.

All of the indexing rules live in the compiled Java files, and the XSLT file is not used in the 0.8 release. So the only way to tweak the rules (right now) is to go into the source, edit, and recompile. This is really quite limiting, so we're working on a way to map MARC fields to the Solr fields in a more logical way (e.g. in XML or .properties).
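
To give a rough idea of what such a mapping could look like, here is a minimal sketch in Python. It is not the planned format and not the actual indexer code. The .properties-style syntax, the field and tag choices, and the helpers parse_mapping and apply_mapping are all assumptions made up for illustration.

```python
# Hypothetical mapping format: solr_field = TAG followed by subfield codes,
# with several specs separated by commas. Nothing here reflects the real
# VuFind indexer; it only illustrates the idea of an external mapping.
MAPPING_TEXT = """
# solr_field = marc specs
title = 245ab
author = 100a
author2 = 110a, 111a, 700a
"""

def parse_mapping(text):
    """Return {solr_field: [(tag, subfield_codes), ...]} from the text above."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        field, _, specs = line.partition("=")
        mapping[field.strip()] = [
            (spec.strip()[:3], spec.strip()[3:]) for spec in specs.split(",")
        ]
    return mapping

def apply_mapping(record, mapping):
    """Build a Solr-style document from a record given as
    [(tag, [(subfield_code, value), ...]), ...]."""
    doc = {}
    for field, specs in mapping.items():
        values = []
        for tag, codes in specs:
            for rec_tag, subfields in record:
                if rec_tag != tag:
                    continue
                # keep only the subfields listed in the mapping spec
                parts = [value for code, value in subfields if code in codes]
                if parts:
                    values.append(" ".join(parts))
        if values:
            doc[field] = values
    return doc

# A toy record standing in for a parsed MARC record (assumed structure).
record = [
    ("100", [("a", "Twain, Mark")]),
    ("245", [("a", "Adventures of Huckleberry Finn"), ("c", "Mark Twain")]),
    ("111", [("a", "Example Conference")]),
]
doc = apply_mapping(record, parse_mapping(MAPPING_TEXT))
```

With a file like this, keeping the 111 out of 'author2' would mean deleting "111a" from one line rather than editing and recompiling Java.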

If you need any help tweaking the indexing code, let me know...


On Sat, Mar 29, 2008 at 3:38 PM, James Farrugia <jfarrugi@drew.edu> wrote:
Hi Andrew, Wayne, ...

Could you please confirm/clarify some questions I have about how the query string is related to
the MARC record, and how the query strings are built, in versions 0.7, 0.8, and 0.8.x (the next subrelease)?

1. In version 0.7 and greater, using the PHP loader, the mapping from MARC fields and subfields to words like
'title,' 'author2,' and 'genre' occurs via the file /vufind/import/marcxml2solr.xsl.

2. Are exactly the same mappings used by the Java loader in version 0.8 and 0.8.x? If so, then if we modify
marcxml2solr.xsl, will the Java loader handle our changes? For instance, maybe for whatever reason we don't want the 111 to be part of 'author2.'

3. The default search query that we can see when debug is on (e.g., Query: (titleStr:"Twain"^15 OR (title:(twain)^5 OR title2:(twain)^2)^10 OR author:(twain)^5  OR format:(twain) ...) is built explicitly
in /vufind/web/sys/SOLR.php.

4. Is function buildQueryString from SOLR.php what Andrew is modifying for the next subrelease? If so, will that code be extracted out into a config file?

5. In general, should the following 3 files be "in sync," in the sense that the values specified in each
should play well with each other?

* marcxml2solr.xsl (specifying mappings between MARC tags and everyday field names);
* SOLR.php (with its specification in buildQueryString of the query string in terms of everyday field names and weights); and
* vufind/solr/conf/schema.xml, which specifies (in terms of everyday field names) parameters for Solr like type, indexed, and stored.

I'm asking all this because I want to get an idea of which pieces of the code need to be touched in order to modify how the field names are associated with MARC values, how these field names (and hence MARC
values) get indexed in Solr, and then how they are weighted in the query.
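
To make the weighting part concrete, here is a minimal sketch in Python of how a boosted query string could be assembled from a list of fields and weights kept outside the code. It is not the actual buildQueryString from SOLR.php. The field names and boosts are copied from the debug query in question 3, while the FIELD_WEIGHTS structure and the build_query helper are assumptions for illustration only.

```python
# Illustrative only: not the actual buildQueryString from SOLR.php.
# Field names and boosts come from the debug query quoted in question 3;
# the shape of FIELD_WEIGHTS is an assumption.
FIELD_WEIGHTS = [
    ("title", 5),
    ("title2", 2),
    ("author", 5),
    ("format", None),  # None means no explicit boost
]

def build_query(term, field_weights):
    """Join one clause per field with OR, appending ^boost where one is given.

    The term is lowercased but not escaped, so this sketch only suits
    simple single-word input.
    """
    clauses = []
    for field, boost in field_weights:
        clause = f"{field}:({term.lower()})"
        if boost is not None:
            clause += f"^{boost}"
        clauses.append(clause)
    return "(" + " OR ".join(clauses) + ")"

query = build_query("Twain", FIELD_WEIGHTS)
```

If the weights lived in a structure like FIELD_WEIGHTS, changing how strongly a field counts would not require touching the query-building code, though each field name would still have to match one defined in schema.xml.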

Also, I'm trying to find out how these pieces may vary in the future and hoping that we will always be able
to tweak these kinds of values.

Thanks very much,

