
#418 XML PARSE

GC 3.x
accepted
3
2024-07-23
2022-08-12
No

@edward-h added XML GENERATE completely and the bigger parts of XML PARSE in the compiler.
GnuCOBOL uses libxml2 for XML handling, which seems to provide an API similar to XML PARSE (see control flow). In general, the XML document is parsed in stream mode: whenever an "event" happens (start/end/fragment found/error occurred), the special registers, mostly XML-EVENT and XML-TEXT, are set and then a procedure is executed.
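To make the event flow concrete, here is a minimal sketch in plain C. It does not use libxml2 and is not the GnuCOBOL runtime: `toy_stream_parse`, `struct xml_registers`, `log_event` and the other names are hypothetical, and the scanner only understands `<name>`, `</name>` and character data. The event names are the ones IBM documents for XML-EVENT. The only point shown is the ordering: the registers are filled first, then the procedure runs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the XML-EVENT and XML-TEXT special registers. */
struct xml_registers {
    char event[32];
    char text[64];
};

/* Hypothetical stand-in for the COBOL PROCESSING PROCEDURE. */
typedef void (*processing_procedure)(const struct xml_registers *regs, void *user);

/* Set the registers first, then hand control to the procedure. */
static void raise_event(struct xml_registers *regs, const char *event,
                        const char *text, size_t len,
                        processing_procedure proc, void *user)
{
    assert(proc != NULL);
    if (len >= sizeof regs->text)
        len = sizeof regs->text - 1;          /* truncate oversized text */
    snprintf(regs->event, sizeof regs->event, "%s", event);
    memcpy(regs->text, text, len);
    regs->text[len] = '\0';
    proc(regs, user);
}

/* Toy scanner (assumption: input contains only <name>, </name> and text).
   This toy leaves the text register empty for the document-level events.
   Returns the number of events raised. */
static int toy_stream_parse(const char *doc, processing_procedure proc, void *user)
{
    struct xml_registers regs;
    const char *p = doc;
    int events = 0;

    raise_event(&regs, "START-OF-DOCUMENT", doc, 0, proc, user);
    events++;
    while (*p != '\0') {
        if (*p == '<') {
            const char *end = strchr(p, '>');
            if (end == NULL) {                /* unterminated tag */
                raise_event(&regs, "EXCEPTION", p, strlen(p), proc, user);
                return events + 1;
            }
            if (p[1] == '/')
                raise_event(&regs, "END-OF-ELEMENT", p + 2,
                            (size_t)(end - p - 2), proc, user);
            else
                raise_event(&regs, "START-OF-ELEMENT", p + 1,
                            (size_t)(end - p - 1), proc, user);
            p = end + 1;
        } else {
            size_t len = strcspn(p, "<");     /* text up to the next tag */
            raise_event(&regs, "CONTENT-CHARACTERS", p, len, proc, user);
            p += len;
        }
        events++;
    }
    raise_event(&regs, "END-OF-DOCUMENT", p, 0, proc, user);
    return events + 1;
}

/* Example procedure: appends "EVENT=text;" to a caller-supplied log. */
struct event_log { char buf[256]; };

static void log_event(const struct xml_registers *regs, void *user)
{
    struct event_log *log = user;
    size_t used = strlen(log->buf);
    snprintf(log->buf + used, sizeof log->buf - used, "%s=%s;",
             regs->event, regs->text);
}
```

Calling `toy_stream_parse("<a>hi</a>", log_event, &log)` with a zero-initialised `struct event_log` raises five events, one procedure call each. In the real runtime the event source would be libxml2 and the registers would be `cob_field`s rather than fixed-size buffers.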

Other docs with nearly the same content: the registers and the contents of XML-EVENT are covered at https://www.microfocus.com/documentation/visual-cobol/vc80/DevHub/HRLHLHCLANU020.html and the contents of XML-CODE at https://www.microfocus.com/documentation/visual-cobol/vc80/DevHub/HRLHLHAXME01.html

After first discussions with @articuno, this FR ticket will be the place to discuss further details and track progress.

Current state:

  • libcob/mlio.c already uses libxml2 - the build system is set up to handle libxml2 fully
  • libcob/common.h (cob_module) already has the necessary cob_fields for the registers and there are a multitude of functions to set those from C strings and integers
  • parsing nearly completely done
  • codegen done, likely quite finished
  • runtime design done:
    • int cob_xml_parse (cob_field *data, cob_field *encoding, cob_field *validation, int flags, void **state) (encoding and validation are optional, the only flag used so far is 1 = "set the national registers")
    • operate on data, set XML-CODE, XML-EVENT, XML-TEXT, ... via the provided functions set_xml_code() and friends
    • return nonzero when the parsing ends, otherwise execute the specified PROCESSING PROCEDURE
    • after end: execute [NOT] ON EXCEPTION statements as appropriate
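The call contract in the runtime design can be sketched as follows. This is a simplified, self-contained analogue and not the libcob code: `demo_xml_parse`, `demo_run_statement`, `struct demo_registers` and `struct demo_state` are hypothetical names, the "document" is just a pre-made list of event names, and the value put into the XML-CODE stand-in is an arbitrary demo value. What it illustrates is the shape described above: one call delivers one event, a nonzero return ends the loop, the `void **state` keeps the per-statement context, and the exception check happens after the loop.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the real runtime works on cob_field values. */
struct demo_registers {
    int  xml_code;           /* stand-in for XML-CODE, 0 = no exception */
    char xml_event[32];      /* stand-in for XML-EVENT */
};

struct demo_state {          /* opaque to the caller, one per active statement */
    size_t next;             /* index of the next event to deliver */
};

/* Simplified analogue of the contract: deliver one event per call and
   return nonzero once parsing has ended. */
static int demo_xml_parse(const char *const *events, size_t count,
                          struct demo_registers *regs, void **state)
{
    struct demo_state *st;

    assert(regs != NULL && state != NULL);
    st = *state;
    if (st == NULL) {                        /* first call: set up the state */
        st = calloc(1, sizeof *st);
        if (st == NULL) {
            regs->xml_code = -1;             /* arbitrary demo error value */
            return 1;
        }
        *state = st;
        regs->xml_code = 0;
    }
    if (regs->xml_code != 0 || st->next == count) {
        free(st);                            /* parsing ended: release state */
        *state = NULL;
        return 1;
    }
    snprintf(regs->xml_event, sizeof regs->xml_event, "%s", events[st->next]);
    if (strcmp(regs->xml_event, "EXCEPTION") == 0)
        regs->xml_code = 1;                  /* arbitrary nonzero demo value */
    st->next++;
    return 0;
}

/* Shape of the loop that could be emitted for one XML PARSE statement.
   Returns 1 if the ON EXCEPTION branch would run, 0 for NOT ON EXCEPTION;
   *calls counts how often the processing procedure would have run. */
static int demo_run_statement(const char *const *events, size_t count, int *calls)
{
    struct demo_registers regs = { 0, "" };
    void *state = NULL;      /* per statement, so nested parses stay separate */

    *calls = 0;
    while (demo_xml_parse(events, count, &regs, &state) == 0) {
        (*calls)++;          /* here: perform the PROCESSING PROCEDURE */
    }
    return regs.xml_code != 0;
}
```

Because the state lives in a variable owned by the calling statement rather than in a global, a processing procedure that itself starts another parse would get its own state pointer, which is the reason the ticket gives for adding `void **state` to the signature.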

Things to do:

  • cobc: finish register handling, especially the ANY LENGTH parts needed for XML registers
  • cobc: finish parsing for XML PARSE, including the XML-SCHEMA definition in SPECIAL NAMES
  • cobc: add special codegen for XML PARSE, as its specific loop, state variable and exception check after the loop are special (similar to SEARCH)
  • runtime: read and write the registers (first: only the exception part)
  • runtime: code actual parsing into internal xml_parse using libxml2 (planned to be done by @articuno)
  • add COBOL test cases to the testsuite

Things that can be postponed and may either be done last - or "in a later iteration":

  • handle validation
  • handle encoding
  • handle national
  • cobc: more checks for the parameters

Related

Discussion: IBM COBOL XML Parsing
Discussion: GSOC24 - Implement XML PARSE in GnuCOBOL using libxml2
Discussion: IBM COBOL for Linux

Discussion

  • Simon Sobisch

    Simon Sobisch - 2022-08-16

    There is sample code (both snippets and a complete program, which contains EXEC SQL and so needs an esql preprocessor) at [6f24010e48] - and likely someone who can test with CI builds on a bigger code base, too.

     

    Related

    Discussion: 6f24010e48

  • Simon Sobisch

    Simon Sobisch - 2022-08-18
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,5 +1,7 @@
     @edward-h added `XML GENERATE` completely and the bigger parts of `XML PARSE` in the compiler.
    -GnuCOBOL uses libxml2 for XML handling which seems to provide an API similar to [`XML PARSE`](https://www.ibm.com/docs/en/cobol-zos/6.4?topic=statements-xml-parse-statement), see [control flow](https://www.ibm.com/docs/en/cobol-zos/6.4?topic=statement-control-flow); in general the xml document is parsed in stream mode and when "events" are happening (start/end/fragment found/error occurred) a procedure is executed after special registers, mostly `XML-EVENT` and `XML-TEXT`, are set. All 
    +GnuCOBOL uses libxml2 for XML handling which seems to provide an API similar to [`XML PARSE`](https://www.ibm.com/docs/en/cobol-zos/6.4?topic=statements-xml-parse-statement), see [control flow](https://www.ibm.com/docs/en/cobol-zos/6.4?topic=statement-control-flow); in general the xml document is parsed in stream mode and when "events" are happening (start/end/fragment found/error occurred) a procedure is executed after special registers, mostly `XML-EVENT` and `XML-TEXT`, are set.
    +
    +Different docs with nearly same content: registers and content of `XML-EVENT`: https://www.microfocus.com/documentation/visual-cobol/vc80/DevHub/HRLHLHCLANU020.html and, with content for `XML-CODE` https://www.microfocus.com/documentation/visual-cobol/vc80/DevHub/HRLHLHAXME01.html
    
     After first discussions with @articuno this FR ticket will be the place to discuss more details and track the process.
    
    @@ -7,18 +9,25 @@
    
     * libcob/mlio.c uses libxml2 already - the build system is setup to handle libxml2 fully
     * libcob/common.h (cob_module) already has the necessary `cob_field`s for the registers and there are a multitude of functions to set those from C strings and integers
    -* parsing mostly done
    -* only minimal codegen
    -* full codegen design done:
    -   * will be similar to an out-of-line `PERFORM ... UNTIL` of the specified `PROCESSING PROCEDURE` with the exit being based on the return value from the parsing function
    -   * will execute the `[NOT] [ON] EXCEPTION` statements depending on the values of `XML-CODE` and `XML-EXCEPTION` values 
    +* parsing nearly completely done
    +* codegen done, likely quite finished
     * runtime design done:
    -   *  `int cob_xml_parse (cob_field *data, cob_field *encoding, cob_field *validation, int flags)` (encoding and validation are optional, the only flag used is 1 = "set the national registers")
    -   *  operate on `data`, set `XML-CODE` by `set_xml_code()`, other registers via `COB_MODULE_PTR` and return nonzero when the parsing ends
    +    *  `int cob_xml_parse (cob_field *data, cob_field *encoding, cob_field *validation, int flags, void **state)` (encoding and validation are optional, the only flag used so far is 1 = "set the national registers")
    +    *  operate on `data`, set `XML-CODE`,  `XML-EVENT`, `XML-TEXT`, ... by provided functions`set_xml_code()` and friends
    +    *  return nonzero when the parsing ends, otherwise execute the specified `PROCESSING PROCEDURE`
    +    *  after end: execute `[NOT] ON EXCEPTION` statements as appropriate
    
     Things to do:
    
    -* code `cob_xml_parse` (planned to be done by @articuno) using libxml2
    -* as nested `XML PARSE` is possible: check if the we need some `void *state` in the function signature
    -* add COBOL test cases to the testsuite (until codegen is done use either local or checked in C stubs for testing)
    -* add missing codegen (more compiler work needed, I'd offer bigger assistance or do it completely)
    +* cobc: finish register handling, especially the `ANY LENGTH` parts needed for XML registers
    +* cobc: finish parsing for `XML PARSE`, including the `XML-SCHEMA` definition in `SPECIAL NAMES`
    +* cobc: add special codegen for `XML PARSE` as the specific loop, state variable and exception check after loop.. is special (similar to `SEARCH`)
    +* runtime: read and write the registers (first: only the exception part)
    +* runtime: code actual parsing into internal `xml_parse` using libxml2 (planned to be done by @articuno)
    +* add COBOL test cases to the testsuite
    +
    +Things that can be postponed and may either be done last - or "in a later iteration":
    +* handle validation
    +* handle encoding
    +* handle national
    +* cobc: more checks for the parameters
    
     
  • Simon Sobisch

    Simon Sobisch - 2022-08-18

    I've found an IBM sample that cannot be used in the testsuite, but can serve as a local test during development: https://www.ibm.com/docs/en/cobol-zos/6.3?topic=examples-example-program-processing-xml (which also showed an issue with the PROCESS/CBL statement not being handled in columns 1-7).

    After two days of work, the parsing, the codegen and the runtime part outside of using libxml2 in xml_parse are quite complete. I hope to finish that tomorrow, allowing @articuno to do the implementation.

     
  • Simon Sobisch

    Simon Sobisch - 2022-08-20

    Checked in everything but the actual XML parsing with [r4686]. @articuno: have fun!

     

    Related

    Commit: [r4686]

  • Simon Sobisch

    Simon Sobisch - 2024-01-04
    • assigned_to: Philipp Böhme --> Brian Tiffin
     
  • Simon Sobisch

    Simon Sobisch - 2024-01-04

    Moved over to the next person to become famous :-)

     
  • Simon Sobisch

    Simon Sobisch - 2024-07-23
    • assigned_to: Brian Tiffin --> Ammar Almorsi
     
  • Simon Sobisch

    Simon Sobisch - 2024-07-23

    Ticket moved on: Ammar started this as a GSOC project and, after having to withdraw from the project, wants to continue with it, likely in September 2024.

     
