
#9 Non-blocking parsing interface for processing streaming data

alpha
open
9
2014-11-18
2014-06-11
No

I need to parse an EXI stream as the data comes in, i.e. without having
the complete stream pre-buffered. Unfortunately, I have not found an EXIP
parsing mode that supports this. All I found was the IOStream interface,
which implies that more bytes must be supplied immediately whenever the
parser needs them. That is pretty much impossible when the stream comes
from the network, unless you use a thread-per-connection type of setup.

Are there any plans to support a push-parser type interface?

Discussion

  • Rumen Kyusakov

    Rumen Kyusakov - 2014-06-11
    • labels: --> API, parsing
    • assigned_to: Rumen Kyusakov
    • Group: bugfix_pre-alpha --> alpha
     
  • Rumen Kyusakov

    Rumen Kyusakov - 2014-06-11

    EXIP assumes a blocking IOStream interface, which is not an option in many cases.

    The simplest and probably the best solution is to reuse the EXIP_BUFFER_END_REACHED error code.
    You do the usual iterative parsing of events, but when you get EXIP_BUFFER_END_REACHED you stop processing and resume once more data is available and has been added to the BinaryBuffer:

    while(tmp_err_code == EXIP_OK)
    {
        tmp_err_code = parseNext(&testParser);
    }

    if(tmp_err_code == EXIP_BUFFER_END_REACHED)
    {
        /* Stop processing here; when data is available, add
           it to BinaryBuffer, reset tmp_err_code to EXIP_OK
           and resume the while loop over parseNext() */
    }


    It is fairly simple to implement, but there are quite a few caveats:
    -> the new data needs to be properly added to the BinaryBuffer (taking care of an already full buffer etc.)
    -> the parseNext() call that causes EXIP_BUFFER_END_REACHED might have updated the EXIP context (bufferIndx, bitPointer, currAttr, currNonTermID, currQNameID etc.) before reaching the end of the buffer, so the context should be stored before each parseNext() (or better, inside parseNext()) and restored in case of an EXIP_BUFFER_END_REACHED error.
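
The two caveats above can be illustrated with a small self-contained sketch. It does not use EXIP: `demo_parser`, `demo_feed`, `demo_parse_next` and the toy item format (one length byte followed by that many payload bytes) are all hypothetical names invented for this example. It only shows the general idea: compact the buffer before appending new data, and roll the parser context back when the end of the buffer is hit in the middle of an item.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are NOT EXIP types. */
typedef enum { DEMO_OK, DEMO_BUFFER_END_REACHED } demo_err;

typedef struct {
    unsigned char buf[16]; /* deliberately small buffer */
    size_t len;            /* number of valid bytes in buf */
    size_t pos;            /* read index, playing the role of bufferIndx */
    unsigned items;        /* extra context that a parse step updates */
} demo_parser;

/* Caveat 1: add new data to the buffer. Already consumed bytes are
   discarded first so that unread bytes keep their order and space is
   reclaimed. Returns how many of the n bytes were accepted. */
static size_t demo_feed(demo_parser *p, const unsigned char *data, size_t n)
{
    size_t room;

    assert(p->pos <= p->len);
    memmove(p->buf, p->buf + p->pos, p->len - p->pos);
    p->len -= p->pos;
    p->pos = 0;

    room = sizeof p->buf - p->len;
    if (n > room)
        n = room; /* buffer full: caller must retry with the rest later */
    memcpy(p->buf + p->len, data, n);
    p->len += n;
    return n;
}

/* Caveat 2: parse one item (length byte + payload). The context is
   saved on entry and restored if the data runs out mid-item, so the
   same call can simply be repeated after more data has been fed. */
static demo_err demo_parse_next(demo_parser *p, unsigned *payload_sum)
{
    size_t saved_pos = p->pos;
    unsigned saved_items = p->items;
    unsigned sum = 0;
    size_t n, i;

    if (p->pos >= p->len)
        return DEMO_BUFFER_END_REACHED; /* nothing was modified yet */

    n = p->buf[p->pos++];
    p->items++; /* context changes before the item is known to be complete */

    for (i = 0; i < n; i++) {
        if (p->pos >= p->len) {
            p->pos = saved_pos;     /* roll back the partial step */
            p->items = saved_items;
            return DEMO_BUFFER_END_REACHED;
        }
        sum += p->buf[p->pos++];
    }

    *payload_sum = sum;
    return DEMO_OK;
}
```

A caller would loop over `demo_parse_next()` while it returns `DEMO_OK`, and on `DEMO_BUFFER_END_REACHED` return to its event loop until the socket is readable, call `demo_feed()` and then resume. Note that in this sketch an item larger than the buffer can never complete; a real implementation would have to grow the buffer or handle that case separately.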

     
  • Rumen Kyusakov

    Rumen Kyusakov - 2014-11-18

    The non-blocking parsing and serializing interface for processing streaming data is partially implemented. For now it only works for schema-informed EXI streams that do not have any deviations. A full implementation of this interface would require that the context of both the dynamic grammars and the string tables is stored before each processing step and restored in case of EXIP_BUFFER_END_REACHED. This, however, is very complicated, and it is doubtful what benefits it would bring. So for now it is not on the TODO list of features.

     
  • Rumen Kyusakov

    Rumen Kyusakov - 2014-11-18
    • Priority: 1 --> 9
     
