#72 XML output mode for CQP

TODO-3.6
open
xml (1)
6
2023-10-21
2023-10-14
ram
No

This request originates from https://sourceforge.net/p/cwb/bugs/80/: the aim is to add an XML print mode to CQP that always generates valid XML.

The DTD for this print mode is entirely unclear, so suggestions (in comments below) are very welcome.

Discussion

  • Stephanie Evert

    Stephanie Evert - 2023-10-14
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1 +1,3 @@
    -Hi, I am opening this request as indicated in https://sourceforge.net/p/cwb/bugs/80/
    +This request originates from https://sourceforge.net/p/cwb/bugs/80/: The desire is to have add an XML print mode to CQP that always generates valid XML.
    +
    +The DTD for this print mode is entirely unclear, so suggestions (in comments below) are very welcome.
    
    • assigned_to: Stephanie Evert
    • Group: TODO-3.5 --> TODO-3.6
    • Priority: 5 --> 6
     
  • Stephanie Evert

    Stephanie Evert - 2023-10-14

    Note to those not familiar with CQP print modes: Their implementation is a horrible mess, so we are reluctant to add extensions and very limited in what can be achieved. Moreover, the print modes affect only some CQP output (kwic concordances, frequency tables from group), by no means all of it.

     
  • ram

    ram - 2023-10-14

    As for the DTD, I will use the same structure as the SGML output, but XML-compliant. For more ideas about the schema, the FreeLing output formats could be a useful resource.

     
  • Stephanie Evert

    Stephanie Evert - 2023-10-15

    A kwic concordance (where left and right context might not even contain complete tokens!) is very different from a list of sentences with pre-determined annotation as in the FreeLing output. I don't think we can learn much from it to help us address the challenges of kwic XML output.

    SGML print mode is really badly broken if you display s-attributes in the concordance. It also includes them (and any p-attributes) as plain text in the tokens rather than in a way that allows them to be processed e.g. with XSLT.
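
To make the contrast above concrete, here is a minimal sketch of a concordance line in which each token is an element and its p-attributes are XML attributes, so that tools such as XSLT can address them. The element names (`line`, `left`, `match`, `right`, `token`) and the helper `kwic_line_to_xml` are hypothetical and not part of CQP or any agreed DTD. The sketch also assumes complete tokens on both sides, so it does not address truncated context.

```python
import xml.etree.ElementTree as ET


def kwic_line_to_xml(left, match, right, s_attrs=None):
    """Build one <line> element (hypothetical schema, not a CQP format).

    Each token is a dict mapping p-attribute names to values; the "word"
    entry becomes the element text, all other entries become XML attributes.
    """
    line = ET.Element("line", s_attrs or {})
    for name, tokens in (("left", left), ("match", match), ("right", right)):
        part = ET.SubElement(line, name)
        for tok in tokens:
            attrs = {k: v for k, v in tok.items() if k != "word"}
            el = ET.SubElement(part, "token", attrs)
            el.text = tok["word"]  # escaped automatically on serialization
    return line


line = kwic_line_to_xml(
    left=[{"word": "R&D", "pos": "NN"}],
    match=[{"word": "<b>", "pos": "SYM"}],
    right=[{"word": "costs", "pos": "NNS", "lemma": "cost"}],
    s_attrs={"text_id": "t1"},
)
xml_text = ET.tostring(line, encoding="unicode")
print(xml_text)
```

Because the serializer escapes `&` and `<` in token text, the output stays well-formed even when corpus tokens contain markup characters, which is the property requested in the ticket description.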

     
    • ram

      ram - 2023-10-21

      Yes, I had to find a workaround for the attribute separator. My suggestion about FreeLing was mainly about the possibility of displaying the token and its attributes as nodes instead of plain text. But it seems this implies a lot of fixes, and that it is something that is going to be fixed in version 4. Is that the case? Is there any draft for the XML output?
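
One possible reading of the separator problem mentioned above: when attributes are printed as plain text joined to the token by a separator character, a token or value that itself contains that character cannot be split back reliably. The sketch below assumes `/` as the separator and a known number of attributes per token. Both are assumptions for illustration, and `split_token` is a hypothetical helper, not the workaround actually used.

```python
def split_token(text, n_attrs, sep="/"):
    """Split 'word/attr1/.../attrN' from the right into (word, [attrs]).

    Splitting from the right keeps separators inside the word itself intact,
    but it still fails if an attribute value contains the separator.
    """
    parts = text.rsplit(sep, n_attrs)
    if len(parts) != n_attrs + 1:
        raise ValueError(f"expected {n_attrs} attributes in {text!r}")
    return parts[0], parts[1:]


# Separator inside the word only: recovered correctly.
ok = split_token("km/h/NN", 1)

# Separator inside an attribute value (lemma 'km/h'): silently mis-split.
bad = split_token("km/h/NN/km/h", 2)
```

Here `ok` is `("km/h", ["NN"])`, while `bad` comes out as `("km/h/NN", ["km", "h"])` rather than the intended word `km/h` with attributes `NN` and `km/h`. Emitting attributes as separate nodes or XML attributes avoids this ambiguity altogether.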

       
      • Andrew Hardie

        Andrew Hardie - 2023-10-21

        No, there's not, because it's (a) really far in the future and (b) not going to be remotely difficult when we actually get there.

         