#12 Coherency Test doesn't parse certain ascii from source_data

open
Herbert Law
None
5
2007-09-19
2007-09-12
Brendan Rehon
No

The Coherency Test does not correctly resolve a file path in the <contributor><source_data> element when the path contains an escaped ampersand (%26).

For example, if you export from the following file:

C:\spaces & ampersand test\box.dae

which is written as the following <source_data> tag

<source_data>
file:///C:/spaces%20%26%20ampersand%20test/box.dae
</source_data>

then the Coherency Test returns the following error:

ERROR: Failed to parse file:///C:/spaces %26 ampersand test/box.dae
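
As an illustration of the symptom only (not code taken from the DOM or the Coherency Test), the short Python sketch below shows that the path in the error message is exactly what results when %20 is unescaped but %26 is left alone. That only %20 is being handled is an assumption drawn from the error text.

```python
from urllib.parse import unquote

uri = "file:///C:/spaces%20%26%20ampersand%20test/box.dae"

# Unescaping only %20 leaves %26 in place, matching the path in the error message.
only_spaces = uri.replace("%20", " ")

# Unescaping every %HH sequence recovers the original directory name.
fully_unescaped = unquote(uri)

print(only_spaces)      # file:///C:/spaces %26 ampersand test/box.dae
print(fully_unescaped)  # file:///C:/spaces & ampersand test/box.dae
```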

To reproduce this issue, open test.zip and copy "spaces & ampersand test" into C:\.

Included is a Maya 8.5 scene (C:\spaces & ampersand test\box.mb) which, when exported with ColladaMaya 3.04D, generates C:\spaces & ampersand test\box.dae.

This box.dae is opened (via File->Open not drag and drop or import) in Maya and re-exported as "C:\spaces & ampersand test\box_reexport.dae".

Steps to Reproduce:
1) Drag and drop box_reexport.dae (in C:\spaces & ampersand test\ along with box.dae) into the Coherency Test.

Discussion

  • Brendan Rehon
    2007-09-12

    <source_data> with ampersand test

     
  • Herbert Law
    2007-09-19

    Logged In: YES
    user_id=1603796
    Originator: NO

    Steve,
    The DOM currently converts only ' ' to %20.
    Other characters that should be escaped generate a DOM error, which is passed on to the Coherency Test.
    The DOM might need to convert all of the missing escaped characters.

    Brendan,
    Can you investigate what other characters should be escaped and compile a list for Steve?

    Thanks,
    Herbert

     
  • Herbert Law
    2007-09-19

    • assigned_to: nobody --> sceahklaw
     
  • Brendan Rehon
    2007-09-19

    Logged In: YES
    user_id=1686792
    Originator: YES

    To clarify:

    Do we want

    (a) a list of characters that the DOM needs to escape when written to a URI?
    or
    (b) a list of escaped characters that the DOM needs to convert from the URI?

    If you're asking for (b): according to RFC 2396, "a URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics." So any time the DOM encounters a % character, it should always be treated as the start of an escape sequence.
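
To make the "might change its semantics" point concrete, here is a minimal Python sketch. The helper name path_segments and the sample path are hypothetical; the idea is that a URI path should be split on "/" while still escaped, and only then should each segment be unescaped. Unescaping the whole path first can turn an escaped character into a delimiter.

```python
from urllib.parse import unquote

def path_segments(escaped_path):
    """Split an escaped URI path on "/" first, then unescape each segment."""
    return [unquote(segment) for segment in escaped_path.split("/")]

# Hypothetical path whose middle segment is literally named "a/b" (written a%2Fb).
escaped = "/C:/a%2Fb/box.dae"

# Split first, unescape second: the segment "a/b" survives intact.
print(path_segments(escaped))       # ['', 'C:', 'a/b', 'box.dae']

# Unescape first, split second: the escaped slash becomes a path delimiter.
print(unquote(escaped).split("/"))  # ['', 'C:', 'a', 'b', 'box.dae']
```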

    As for (a), here's my list from my understanding of RFC 2396 and RFC 2732 (for the square brackets):

    [Reserved characters; escape these if these appear in a directory name of the URI]
    reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," | "[" | "]"

    [Excluded US ASCII characters; always escape]
    control = <US-ASCII coded characters 00-1F and 7F hexadecimal>
    space = <US-ASCII coded character 20 hexadecimal>
    delims = "<" | ">" | "#" | "%" | <">
    unwise = "{" | "}" | "|" | "\" | "^" | "`"

    Also, all non-printable characters in a US-ASCII coded character set must be escaped.

    You can also escape unreserved characters, but this should only be done when the context does not allow the unescaped character to appear.
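
The list above could be turned into a lookup roughly as follows. This is a sketch in Python, not DOM code: the set names and the helper escape_directory_name are made up for illustration, and it handles only US-ASCII input (characters outside US-ASCII pass through unchanged here).

```python
# Character sets transcribed from the list above (RFC 2396, plus RFC 2732 for brackets).
RESERVED = set(';/?:@&=+$,[]')
CONTROL = {chr(c) for c in range(0x00, 0x20)} | {chr(0x7F)}
SPACE = {' '}
DELIMS = set('<>#%"')
UNWISE = set('{}|\\^`')

# Inside a single directory name, reserved characters are escaped as well.
MUST_ESCAPE = RESERVED | CONTROL | SPACE | DELIMS | UNWISE

def escape_directory_name(name):
    """Percent-escape one US-ASCII directory name (hypothetical helper)."""
    return ''.join(
        '%{:02X}'.format(ord(ch)) if ch in MUST_ESCAPE else ch
        for ch in name
    )

print(escape_directory_name('spaces & ampersand test'))
# spaces%20%26%20ampersand%20test
```

Note that the function is meant for one directory or file name at a time; passing a full path would also escape the "/" separators.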

    According to http://www.w3.org/TR/2001/WD-charmod-20010126/#sec-URIs, the exact method of escaping should work as follows:

    1. Each disallowed character is converted to UTF-8, resulting in one or more bytes.
    2. The resulting bytes are escaped using the URI escaping mechanism (that is, each byte is converted to %HH, where HH is the byte value expressed using hexadecimal notation).
    3. The original character is replaced by the resulting character sequence.
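
A minimal Python sketch of those three steps, for reference. The function names and the demo predicate are hypothetical; which characters count as disallowed is left to the caller, since that depends on the list agreed on above.

```python
def escape_char(ch):
    """Steps 1 and 2: encode one character as UTF-8, then write each byte as %HH."""
    utf8_bytes = ch.encode('utf-8')
    return ''.join('%{:02X}'.format(b) for b in utf8_bytes)

def escape_disallowed(text, is_disallowed):
    """Step 3: replace each disallowed character with its escaped sequence."""
    return ''.join(escape_char(ch) if is_disallowed(ch) else ch for ch in text)

# Assumption for the demo only: space and anything above 0x7E is disallowed.
def demo_disallowed(ch):
    return ch == ' ' or ord(ch) > 0x7E

# U+00E9 is two bytes in UTF-8 (C3 A9), so it becomes two escape sequences.
print(escape_disallowed('caf\u00e9 box', demo_disallowed))  # caf%C3%A9%20box
```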