#367 Fault CharacterDataHandler if LF starts data

Milestone: Not a Bug
Status: closed-rejected
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-28
Created: 2005-03-13
Creator: Anonymous
Private: No

When an element's character data starts with LF (0x0A),
XML_CharacterDataHandler is called with len=1 and s=0x0A
instead of the actual data that comes after the LF.

With the attached example I get the following calls to
XML_CharacterDataHandler:
- len=1, s=0x0A
- len=6, s="closed"
- len=1, s=0x0A
- len=1, s=0x0A
The last two calls are the problem: I
don't get the actual data that comes after the LF.

The problem occurs when the element's data starts with LF
and has more characters after the LF.

Seen with Expat 1.95.8, from
expat_win32bin_1_95_8.exe

Discussion

  • Karl Waclawek

    Karl Waclawek - 2005-03-13

    Logged In: YES
    user_id=290026

    Your attached example does not have any LF directly
    before or after the string "closed". Please clarify your problem.

     
  • Nobody/Anonymous

    Logged In: NO

    The attached example has an LF (0x0A) in the following places:
    - before the text "sip:pep@example.com"
    - before the text "Full state presence document"
    In both cases XML_CharacterDataHandler is called with
    len=1 and s=0x0A only. The function is not called for the
    data after the LF (in the example, "Full state presence
    document").
    Is there a way to overcome this problem?

     
  • Mike Rosky

    Mike Rosky - 2005-03-14

    Logged In: YES
    user_id=1238831

    I think that you (the original poster) should check in your code
    when, and for which cases, the character data handler is called.
    Put simply, you have to reset the character data buffer
    in the start element handler and append to it
    every time the character data handler is invoked,
    until the parser calls the end element handler for that element.

    You can't assume that the character data handler gets the whole value
    in one call. In your case it's called twice because of the
    CRLF: the first call returns the CRLF (and possibly preceding
    chars), the second the rest of the character data. Character data can even
    span two reads of the source, which has the same effect.
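
    The reset/append/read-at-end pattern described above could be
    sketched roughly as follows. This is only an illustration: TextBuf
    and the on_* names are hypothetical, the fixed 256-byte buffer is an
    assumption, and the handlers merely mirror the shape of Expat's
    start element, character data and end element callbacks (with
    XML_Char taken to be char) so the sketch compiles without Expat.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-parse state; with Expat this would be the pointer
   given to XML_SetUserData(). The fixed size is an assumption. */
typedef struct {
    char   text[256];
    size_t used;
} TextBuf;

/* Same shape as a start element handler: reset the buffer. */
static void on_start(void *userData, const char *name, const char **atts)
{
    TextBuf *tb = (TextBuf *)userData;
    (void)name;
    (void)atts;
    tb->used = 0;
    tb->text[0] = '\0';
}

/* Same shape as a character data handler: s is not NUL-terminated and
   may be only a fragment of the element's text, so append it. */
static void on_chardata(void *userData, const char *s, int len)
{
    TextBuf *tb = (TextBuf *)userData;
    size_t room, n;
    if (len <= 0)
        return;
    room = sizeof tb->text - 1 - tb->used;
    n = (size_t)len < room ? (size_t)len : room;  /* truncate when full */
    memcpy(tb->text + tb->used, s, n);
    tb->used += n;
    tb->text[tb->used] = '\0';
}

/* Same shape as an end element handler: only here is the text complete. */
static void on_end(void *userData, const char *name)
{
    TextBuf *tb = (TextBuf *)userData;
    printf("%s: \"%s\"\n", name, tb->text);
}
```

    With a real parser these would be registered through
    XML_SetElementHandler(), XML_SetCharacterDataHandler() and
    XML_SetUserData(). Resetting on every start tag is the simplified
    form; an element with both text and child elements would need one
    buffer per nesting level.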

    You can see this by adding a character data handler to
    the outline.c example that just prints
    each chunk it receives enclosed in brackets or something similar.
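
    A handler of that kind might look like the sketch below. The names
    format_chunk and show_chunk are hypothetical, the 128-byte line
    buffer is an assumption, and the handler only mirrors the shape of
    Expat's character data callback (with XML_Char taken to be char).
    Line breaks are printed as \n and \r so that a chunk consisting of
    a lone LF is visible.

```c
#include <stdio.h>

/* Hypothetical helper: render one chunk as "[...]" with line breaks
   made visible. Output is truncated to fit within cap bytes. */
static void format_chunk(const char *s, int len, char *out, size_t cap)
{
    size_t o = 0;
    int i;
    if (cap < 3) {              /* no room for "[]" plus NUL */
        if (cap)
            out[0] = '\0';
        return;
    }
    out[o++] = '[';
    /* keep room for up to two chars, the closing ']' and the NUL */
    for (i = 0; i < len && o + 3 < cap; i++) {
        if (s[i] == '\n') {
            out[o++] = '\\';
            out[o++] = 'n';
        } else if (s[i] == '\r') {
            out[o++] = '\\';
            out[o++] = 'r';
        } else {
            out[o++] = s[i];
        }
    }
    out[o++] = ']';
    out[o] = '\0';
}

/* Same shape as a character data handler: print each chunk on its own
   line so the boundaries between callbacks are easy to see. */
static void show_chunk(void *userData, const char *s, int len)
{
    char line[128];
    (void)userData;
    format_chunk(s, len, line, sizeof line);
    printf("len=%d %s\n", len, line);
}
```

    For the first two calls listed in the report, this would print
    len=1 [\n] followed by len=6 [closed].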

    Mike

     
  • Karl Waclawek

    Karl Waclawek - 2005-03-14

    Logged In: YES
    user_id=290026

    Just to clarify - in Expat, a contiguous string of characters
    does not necessarily have to be reported through exactly one
    characterData() call-back. Often, line-breaks determine the
    boundary between call-backs. In the attached example, the
    character data for the <note> element will likely be reported
    through three call-backs, as there are two line-breaks.

     
  • Karl Waclawek

    Karl Waclawek - 2005-04-19
    • status: open --> closed-rejected
     
  • Karl Waclawek

    Karl Waclawek - 2005-04-19

    Logged In: YES
    user_id=290026

    Closing this issue - no follow-up from poster.

     
