
#46 textcrit module RNG validation hangs libxml2

closed-rejected
5
2008-09-27
2008-09-27
Tara L Andrews
No

Attempts to parse a TEI RNG schema that includes the textcrit module (including the tei_all.rng that is available on the website) cause libxml2 to hang, as demonstrated here (note where I terminated the validation against tei_all.rng).

tla@minuscule:~/excerpt> for i in ms_analysis all fixed; do echo Using tei_$i.rng schema for validation; date; xmllint --noout --relaxng tools/tei_$i.rng venice_901.xml; done; date
Using tei_ms_analysis.rng schema for validation
Sat Sep 27 19:52:49 BST 2008
venice_901.xml validates
Using tei_all.rng schema for validation
Sat Sep 27 19:52:50 BST 2008
Terminated
Using tei_fixed.rng schema for validation
Sat Sep 27 19:57:26 BST 2008
venice_901.xml validates
Sat Sep 27 19:57:36 BST 2008
tla@minuscule:~/excerpt> xmllint --version
xmllint: using libxml version 20631
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

The tei_ms_analysis.rng schema was generated by Roma, using tei_ms as a base and adding the "analysis" module. The tei_fixed.rng schema is tei_all.rng with one <group>...</group> element and its contents removed; that is, the element that appears right after these lines:

<define xmlns="http://relaxng.org/ns/structure/1.0" name="app">
<element name="app">
<a:documentation xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">(apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading.</a:documentation>

So there is something about this:
18262,18306d18261
< <group>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< <optional>
< <ref name="lem"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< <optional>
< <ref name="wit"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< </optional>
< </optional>
< <zeroOrMore>
< <choice>
< <group>
< <ref name="model.rdgLike"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< <optional>
< <ref name="wit"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< </optional>
< </group>
< <group>
< <ref name="rdgGrp"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< <optional>
< <ref name="wit"/>
< <zeroOrMore>
< <ref name="model.global"/>
< </zeroOrMore>
< </optional>
< </group>
< </choice>
< </zeroOrMore>
< </group>

that is sending libxml2 into a spin.

I'd be happy to send the file I was using for validation, but any stub file will display the same problem, so it isn't really necessary.
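The hang shown above had to be interrupted by hand. A minimal sketch of the same loop with a time limit follows, assuming GNU coreutils `timeout` is installed (it exits with status 124 when the limit is reached). The helper name `run_limited` and the 60-second limit are illustrative assumptions, not part of the original report.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils `timeout`, which exits with status 124
# when the time limit is reached. `run_limited` is a hypothetical helper name.
run_limited() {
    limit=$1
    shift
    # Run the remaining arguments as a command, killing it after $limit seconds.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${limit}s: $*"
    fi
    return "$status"
}

# Usage, with the schema and document names from the report:
#   for i in ms_analysis all fixed; do
#       echo "Using tei_$i.rng schema for validation"
#       run_limited 60 xmllint --noout --relaxng "tools/tei_$i.rng" venice_901.xml
#   done
```

With this wrapper a hanging schema shows up as a "gave up" line instead of blocking the loop, which makes it easier to compare several candidate schemas in one run.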

Discussion

  • James Cummings
    2008-09-27

    • assigned_to: nobody --> rahtz
     
  • I don't like to have to say it, but this looks
    like a problem with xmllint, not the TEI. The other
    RELAX NG processors I have tried (jing and rnv)
    do not complain, and the schema appears to be valid
    RELAX NG syntax. I think you need to submit a bug
    report to Daniel Veillard.

     
    • status: open --> closed-rejected
     
  • Tara L Andrews
    2008-09-27

    Fair enough that it is probably a libxml2 bug, and I will be happy to send the report there, but there are tons of open bugs on that Bugzilla instance, and in the meantime the schema validation I need doesn't work. Is there no possibility of a workaround? (Unfortunately I don't know RELAX NG syntax, so I can't suggest one.)

     
  • The workaround surely is to use a different validator?

    I can't change the RELAXNG content model unless I get some
    feedback from xmllint about which feature it cannot cope with.
    That construct expresses the constraint the TEI desires
    (we obviously can't change that without lots of heartache),
    so I can't just simplify it.

    How did you isolate this particular bit?

     
  • Tara L Andrews
    2008-09-27

    Sadly, libxml2 is the only means of validating RELAX NG in Perl of which I'm aware, and that's what I need. (I could call out to another command-line validator via a system call, but that is beyond ugly.)

    A friend who knows more about XML than I do isolated this bit for me. I'll ask him on Monday how he did it.

     
  • Sounds familiar. The overhead of system(" java ..... ") would be high, I know, but system("rnv ....") may be acceptable? If your friend can tell me what makes that particular bit of RELAX NG less acceptable than many similar content models, I'd be happy to reexamine it.
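
    The "call out to another validator" idea discussed above could be sketched roughly as follows: try the primary validator under a time limit and, only if it hangs, hand the job to a second one. This assumes GNU coreutils `timeout` (exit status 124 on timeout). `validate_with_fallback` is a hypothetical helper name, and both commands are passed in as strings, so nothing is assumed here about the options of any particular validator.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils `timeout` (exit status 124 on timeout).
# `validate_with_fallback` is a hypothetical helper; the validator command
# lines are supplied by the caller as strings.
validate_with_fallback() {
    limit=$1
    primary=$2
    fallback=$3
    # Run the primary validator, giving up after $limit seconds.
    timeout "$limit" sh -c "$primary"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "primary validator exceeded ${limit}s, running fallback" >&2
        sh -c "$fallback"
        status=$?
    fi
    # Any other status (valid, invalid, error) is passed through unchanged.
    return "$status"
}

# Usage (the fallback command is a placeholder, not a real tool name):
#   validate_with_fallback 60 \
#       'xmllint --noout --relaxng tools/tei_all.rng venice_901.xml' \
#       'other_validator tools/tei_all.rng venice_901.xml'
```

    The fallback only runs on a timeout, so an ordinary "does not validate" result from the primary validator is reported as-is rather than being retried.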

     
  • I looked at this again today, and did at least find a fix which may
    get you working for a bit. See attached app.patch. I applied
    this to tei_all.rng, and xmllint then worked. It removes the use
    of rdgGrp - if you don't plan to use that, then you'll be fine.

    I looked at the relevant part for a while, and I really can't
    see anything we are doing which is strange or illegal or
    nonsensical in the content model of <app>. So it's not at all
    clear what bug in xmllint it is exercising.
    File Added: app.patch

     
  • Tara L Andrews
    2008-09-28

    Thanks for that! (Gnome bugzilla is being uncooperative at the moment, so I am about to resort to sending the report to them via mailing list. Yet another reason I'm not optimistic about getting the bug fixed that way.)

    I can work around the use of rdgGrp in the meantime though.

     