#80 SGML invalid structure

TODO-3.5
closed-wont-fix
nobody
5
2023-10-14
2023-10-14
ram
No

Hi! It seems that the SGML print mode produces an invalid structure, e.g.:

<CONCORDANCE>
<attribute type=positional name="word" anr=0>
<attribute type=positional name="FORM" anr=1>
<attribute type=positional name="LEMMA" anr=2>
<attribute type=positional name="TAG" anr=3>
<attribute type=positional name="SHORT_TAG" anr=4>
<attribute type=positional name="MSD" anr=5>
<attribute type=positional name="NEC" anr=6>
<attribute type=positional name="SENSE" anr=7>
<attribute type=positional name="SYNTAX" anr=8>
<attribute type=positional name="DEPHEAD" anr=9>
<attribute type=positional name="DEPREL" anr=10>
<attribute type=positional name="COREF" anr=11>
<attribute type=positional name="TOKENID" anr=12>
<LINE><MATCHNUM>2</MATCHNUM><STRUCS>&lt;text_id MX_ETL2019_0000002&gt;</STRUCS><CONTENT> <MATCH><TOKEN>Texto    texto   NCMS000 NC  pos=noun|type=common|gen=masculine|num=singular -   06387980-n  (sn:1(grup-nom-ms:1(n-ms:1))    0   sentence    -   -   1</TOKEN></MATCH> <TOKEN>.  .   Fp  Fp  pos=punctuation|type=period -   -   (F-term:2)) 1   f   -   -   2</TOKEN></CONTENT></LINE>
</CONCORDANCE>

The attribute tags don't have a closing tag. I would expect something like:

<attribute type=positional name="word" anr=0></attribute>
<attribute type=positional name="word" anr=0 />
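
Until CQP offers something like this itself, the header could be closed by post-processing. The following is only a rough sketch, not part of CWB: the function name `close_attribute_tags` and both regular expressions are hypothetical, and it assumes attribute values never contain `=` or quote characters. It also quotes bare values such as `anr=0`, since XML requires quoted attribute values.

```python
import re

# Hypothetical helper, not part of CWB/CQP: rewrites the <attribute ...>
# header declarations of the SGML print mode as XML-style empty elements.
ATTR_DECL = re.compile(r"<attribute\b([^<>]*?)\s*/?>")
# Matches name=value pairs whose value is not already quoted.
UNQUOTED = re.compile(r'(\w+)=([^\s"\'<>/]+)')


def close_attribute_tags(sgml: str) -> str:
    """Quote bare attribute values and self-close each <attribute> tag."""

    def fix(match: re.Match) -> str:
        attrs = UNQUOTED.sub(r'\1="\2"', match.group(1))
        return f"<attribute{attrs}/>"

    return ATTR_DECL.sub(fix, sgml)


header = '<attribute type=positional name="word" anr=0>'
print(close_attribute_tags(header))
# prints: <attribute type="positional" name="word" anr="0"/>
```

This only touches the `<attribute>` declarations; the `<LINE>` content is passed through unchanged and is not checked for well-formedness.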

Discussion

  • Stephanie Evert - 2023-10-14

    Not a bug: SGML allows omission of closing tags – you just have to assume a suitable DTD for the output produced by CQP. Note that your second suggestion isn't valid SGML and would have to be written <attribute// instead.

    If your SGML output also included the kwic line with some s-attributes shown, you'd get many more validation errors (because nothing guarantees that open/close tags match up within a kwic line, and they can also overlap between context and match).
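
In practice, a reader that tolerates missing closing tags may be enough for simple cases. As a rough sketch (an assumption about how one might consume the output, not something CQP ships), Python's standard `html.parser.HTMLParser` accepts unclosed tags and unquoted attribute values. The class name `ConcordanceParser` and the shortened sample are made up for illustration, and the token fields are assumed to be tab-separated.

```python
from html.parser import HTMLParser


class ConcordanceParser(HTMLParser):
    """Collect attribute names and token strings from SGML-style output."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.attributes = []   # values of name= in <attribute ...> declarations
        self.tokens = []       # raw text content of each <TOKEN> element
        self._in_token = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; no closing tag is required.
        if tag == "attribute":
            self.attributes.append(dict(attrs).get("name"))
        elif tag == "token":
            self._in_token = True
            self.tokens.append("")

    def handle_endtag(self, tag):
        if tag == "token":
            self._in_token = False

    def handle_data(self, data):
        if self._in_token:
            self.tokens[-1] += data


# Shortened, made-up sample in the shape of the output quoted in this ticket.
sample = (
    "<CONCORDANCE>\n"
    '<attribute type=positional name="word" anr=0>\n'
    '<attribute type=positional name="LEMMA" anr=2>\n'
    "<LINE><MATCHNUM>2</MATCHNUM><CONTENT> "
    "<MATCH><TOKEN>Texto\ttexto</TOKEN></MATCH> "
    "<TOKEN>.\t.</TOKEN></CONTENT></LINE>\n"
    "</CONCORDANCE>\n"
)

parser = ConcordanceParser()
parser.feed(sample)
parser.close()

# One list of fields per token, in the order of the declared attributes.
rows = [token.split("\t") for token in parser.tokens]
```

This does not validate anything; it just ignores tag nesting, so it also sidesteps the overlapping open/close tags mentioned above rather than resolving them.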

    If we ever find the nerve to implement an XML output mode, we'll make sure it's valid XML and will also provide a DTD for it. But it's entirely unclear so far what the format should look like.

     
  • Stephanie Evert - 2023-10-14

    PS: If you want to pursue this, please add a feature request “XML output mode for CQP” for CWB v3.6.

     
  • Stephanie Evert - 2023-10-14
    • status: open --> closed-wont-fix