#66 Sub-document querying

open
mhaye
5
2009-02-07
2009-02-07
mhaye
No

This has come up several times with transcription projects, first at Stanford and most recently in meetings at the University of Sydney.

Here's the basic setup: several volumes are being indexed, and each volume contains a number of items (e.g. individual poems). We would like a crossQuery search to return individual items, but when an item is viewed in dynaXML we want to see the whole volume, so the item appears in its context.

Discussion

  • mhaye
    2009-02-07

    I found a fairly easy way to do this and have checked in an experimental implementation. To use it, add xtf:subDocument="mySubDocName" attributes to nodes in the index prefilter.
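    As a rough sketch, an XSLT 2.0 index prefilter could mark each poem in a volume as its own sub-document. This example makes several assumptions. The source elements are called <poem>, which is hypothetical. The xtf prefix is bound to http://cdlib.org/xtf, the namespace XTF prefilters normally declare. Names are generated as "poem1", "poem2", ... so that each one is unique. Everything else passes through an ordinary identity transform.

    ```xml
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xtf="http://cdlib.org/xtf">

      <!-- Identity transform: copy all nodes and attributes through unchanged -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <!-- Hypothetical: each <poem> becomes its own sub-document, named
           "poem1", "poem2", ... by counting poems across the whole volume
           (level="any" keeps names unique even if poems are nested) -->
      <xsl:template match="poem">
        <xsl:copy>
          <xsl:attribute name="xtf:subDocument">
            <xsl:text>poem</xsl:text>
            <xsl:number level="any" count="poem"/>
          </xsl:attribute>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Text outside any <poem>, such as a volume's front matter, is left untagged here. Given the caveat about text outside sub-documents, it may be worth tagging that text as well.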

    The resulting <docHit> elements output by crossQuery will have an additional subDocument attribute, and there will be a separate <docHit> for each matched sub-document.

    In dynaXML, top-level <xtf:snippet> elements will contain a subDocument attribute as well.

    To restrict full-text search to a particular sub-document, there is a new query element, <subDocument>, with the same syntax and usage as the existing <sectionType> element. This should be particularly useful in dynaXML when jumping from a crossQuery hit by rank number. Without the restriction, hits would be counted across the whole volume and the numbers wouldn't match up.
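    A text query limited to one sub-document might look like the following. This assumes <subDocument> wraps a term the same way <sectionType> does. The sub-document name "poem3" and the search term are hypothetical, and the exact nesting should be checked against the XTF query documentation for <sectionType>.

    ```xml
    <and field="text">
      <!-- Hypothetical: only match text inside the sub-document named "poem3" -->
      <subDocument>
        <term>poem3</term>
      </subDocument>
      <term>nightingale</term>
    </and>
    ```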

    Each sub-document inherits metadata from its parent document. It may also specify its own additional metadata, which will appear only in the docHits for that sub-document.

    Caveat: It is possible to create multiple sub-documents with the same name. It is also possible to have text outside of any sub-document. In these cases, each contiguous section becomes a separate search unit, which might not be what the end user expects. Unless you really want this behavior, take care to put all text within sub-documents and to give each sub-document a unique name.
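    To illustrate the caveat, here is some hypothetical prefilter output for one volume. The element names are made up for the example. In the first version, an editor's note sits outside any sub-document, and two non-adjacent sub-documents share the name "poem". The result is three separate search units, and two of them carry the same name.

    ```xml
    <volume xmlns:xtf="http://cdlib.org/xtf">
      <poem xtf:subDocument="poem">...</poem>
      <!-- Not inside any sub-document: becomes a search unit of its own -->
      <editorNote>...</editorNote>
      <!-- Same name as above, but not contiguous with it: a separate unit -->
      <poem xtf:subDocument="poem">...</poem>
    </volume>
    ```

    Following the advice above, every piece of text is inside a sub-document and each name is unique:

    ```xml
    <volume xmlns:xtf="http://cdlib.org/xtf">
      <poem xtf:subDocument="poem1">...</poem>
      <editorNote xtf:subDocument="note1">...</editorNote>
      <poem xtf:subDocument="poem2">...</poem>
    </volume>
    ```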

  • mhaye
    2009-02-07

    • status: open --> open-accepted