#12 CQPweb: XML support


This is the big enhancement for version 3.0: many, MANY users have asked for it.

Just as the "text-based restrictions" parallel the "written text restrictions" in BNCweb, so the "XML-based restrictions" will need to parallel the "utterance-by-speaker-type" system in BNCweb.

Each XML span (i.e. s-attribute) which is to be covered in this way (and note, not all of the XML in a given corpus needs to be) will need to be identified by the combination of (a) an element name and (b) some given attribute. Its "id" in the database will then look a bit like this:

xml_metadata_for_CORPUSNAME [parallel to text_metadata_for_CORPUSNAME]
id          gender  class  ...  CQPbegin  CQPend
u|who|S933  m       AB     ...  \d\d\d\d  \d\d\d\d
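
As a rough illustration of the row layout above, here is a minimal Python sketch of how such an id might be composed from an element name, an attribute and its value. The function name `make_xml_id` and the corpus positions are hypothetical and are not taken from the CQPweb code base; only the `u|who|S933` format and the column names come from the example above.

```python
def make_xml_id(element, attribute, value):
    """Join element name, attribute name and attribute value with '|'."""
    return "|".join((element, attribute, value))


# One row of a hypothetical xml_metadata_for_CORPUSNAME table.
row = {
    "id": make_xml_id("u", "who", "S933"),
    "gender": "m",
    "class": "AB",
    "CQPbegin": 1200,  # made-up corpus position, for illustration only
    "CQPend": 1257,    # made-up corpus position, for illustration only
}
```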

Note, however, that this kind of "natural" system for XML identifiers won't work, because the XML segment is not *uniquely* identified. Two solutions:
(1) allow CQPbegin and CQPend to contain *multiple* cwb-indexes
(2) enforce uniqueness of XML elements - so "who" could not be used for u, but "id" could be.

Neither of these is entirely satisfactory and this needs careful thinking about.
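
To make the trade-off concrete, here is a small Python sketch of the uniqueness problem and of what each option would involve. It assumes spans arrive as (element, attribute, value, begin, end) tuples; the data and both function names are invented for illustration and do not reflect how CQPweb actually stores or processes s-attributes.

```python
from collections import defaultdict

# Hypothetical spans: the same speaker (S933) produces two separate utterances.
SPANS = [
    ("u", "who", "S933", 100, 120),
    ("u", "who", "S934", 121, 150),
    ("u", "who", "S933", 151, 180),
]


def group_ranges(spans):
    """Option (1): keep a list of (begin, end) ranges for each id."""
    ranges = defaultdict(list)
    for element, attribute, value, begin, end in spans:
        ranges[f"{element}|{attribute}|{value}"].append((begin, end))
    return dict(ranges)


def non_unique_ids(spans):
    """Option (2): report the ids a uniqueness rule would have to reject."""
    grouped = group_ranges(spans)
    return sorted(key for key, found in grouped.items() if len(found) > 1)
```

Under option (1), `group_ranges(SPANS)` maps `u|who|S933` to two ranges, so CQPbegin and CQPend would each need to hold more than one position. Under option (2), `non_unique_ids(SPANS)` flags `u|who|S933`, which is why "who" could not serve as the identifying attribute for u.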

Also note that every different s-attribute will require (a) a different set of CWB-frequency indexes and (b) a separate set of frequency tables. This function will be **VERY** hungry for disk space.


  • Andrew Hardie

    Andrew Hardie - 2016-07-31
    • status: open --> closed
    • Group: --> TODO-3.5
  • Andrew Hardie

    Andrew Hardie - 2016-07-31

    done in Q1 2016

