
#17 Extension is not compatible to MIME type because of charset

v1.0 (example)
closed-works-for-me
None
5
2022-10-13
2022-05-27
Matthieu S.
No

Dear Markus

It appears that scans fail with the rule violation "Extension (.xml) is not compatible to MIME type (text/xml)" on some XML uploads and/or downloads.
After some searching and comparing, it seems that the issue occurs when the XML file is UTF-8 encoded.

For example, I have an XML test file that passes the scan without problems:

file -i my_little_test.xml
my_little_test: text/xml; charset=us-ascii

If I add some non-ASCII characters to the file, such as accented letters, it starts getting rejected by Clam:

file -i my_little_test.xml
my_little_test: text/xml; charset=utf-8

My guess is that the extension is matched against a MIME type definition that includes the charset.

Could you share any advice on this? Is there a workaround available, or do we need to wait for a fix?

Thanks in advance,
best regards,
matthieu
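The charset flip described above can be reproduced with a simplified classifier. This is a minimal Python sketch assuming a file-like rule (all bytes below 0x80 means us-ascii; otherwise, valid UTF-8 means utf-8). The real file utility runs a much more elaborate set of tests, and guess_charset is a hypothetical name used only for illustration.

```python
def guess_charset(data: bytes) -> str:
    """Simplified, hypothetical version of the charset label `file -i` prints."""
    if all(b < 0x80 for b in data):
        return "us-ascii"      # only 7-bit bytes
    try:
        data.decode("utf-8")
        return "utf-8"         # non-ASCII bytes that form valid UTF-8
    except UnicodeDecodeError:
        return "unknown"       # the real `file` would try other encodings


ascii_doc = b"<?xml version='1.0'?><name>A</name>"
# Same document with "A" replaced by an accented A (U+00C0).
utf8_doc = "<?xml version='1.0'?><name>\u00c0</name>".encode("utf-8")

print(guess_charset(ascii_doc))  # us-ascii
print(guess_charset(utf8_doc))   # utf-8
```

A single accented character is enough to move the file from one label to the other, which matches the two `file -i` outputs above.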

Discussion

  • Markus Strehle

    Markus Strehle - 2022-05-29

    Hello Matthieu,
    as your output and tests show, the difference comes from the file utility, which is used to determine the MIME type.
    For a few weeks now, clamsap has supported simple wildcards, so I recommend setting text/xml* as the allowed or blocked MIME type; this covers both charsets.

    If you want to distinguish between the two, define the types in full; if you don't, use either text/* to allow all text types or text/xml* for XML only.

    Hope this helps
    best regards,
    -markus
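To illustrate how a trailing wildcard covers both charsets, here is a Python sketch using shell-style matching from the standard fnmatch module. It assumes clamsap treats "*" as "any sequence of characters" and matches it against the full detected string, including the "; charset=..." part. clamsap's actual matcher is not shown in this thread, and is_allowed is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_allowed(detected: str, patterns: list) -> bool:
    """Hypothetical allow-list check: True if any wildcard pattern matches."""
    # fnmatchcase: "*" matches any run of characters (including "/", ";" and
    # spaces), and the comparison is case-sensitive.
    return any(fnmatchcase(detected, p) for p in patterns)


print(is_allowed("text/xml; charset=us-ascii", ["text/xml*"]))  # True
print(is_allowed("text/xml; charset=utf-8", ["text/xml*"]))     # True
print(is_allowed("text/html; charset=utf-8", ["text/xml*"]))    # False
print(is_allowed("text/html; charset=utf-8", ["text/*"]))       # True
```

Under this assumption, text/xml* would also match longer subtypes that merely start with "xml". That is usually harmless for an allow list but worth keeping in mind for a block list.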

     
  • Matthieu S.

    Matthieu S. - 2022-05-29

    Hello Markus,
    Thanks for the Sunday recommendation, it's very helpful.
    text/xml* will be my way to go, but I'll first need to update from 0.103.3 to at least 0.104.1.

    However, your suggestion to "distinguish between both" (not my main goal, but potentially quite useful) raises another question: can I define the two types in full, without the wildcard, when transaction vscanprofile doesn't seem to accept charset parameters as-is (message no. VSCAN082)? The input rules stated in that message are:

    Input characters for subtype: the lowercase letters a to z, the digits 0 to 9, the period ("."), the plus sign ("+"), and the hyphen ("-") are permitted. One placeholder ("*") can be defined in any place.

    So the semicolon (";") and the equals sign ("=") won't pass the input checks. The period (".") and the plus sign ("+") mentioned in the documentation are the characters allowed by the MIME type naming convention, not ABAP regex metacharacters. That leaves me no choice but to use text/xml*utf-8 and text/xml*us-ascii, and again, I guess I'll need my clamsap update first?

    Anyway, thanks already for the advice.
    Best regards/MfG
    matthieu
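The VSCAN082 input rule quoted above can be expressed as a regular expression. The Python sketch below is a hypothetical re-implementation of that rule, not the ABAP check itself. It accepts only a to z, 0 to 9, ".", "+", "-" and at most one "*", which shows why xml*utf-8 is accepted as a subtype while xml; charset=utf-8 is rejected. SUBTYPE_RE and subtype_is_valid are illustrative names.

```python
import re

# Hypothetical encoding of the VSCAN082 subtype rule: allowed characters
# a-z, 0-9, ".", "+", "-", plus at most one "*" anywhere in the subtype.
SUBTYPE_RE = re.compile(r"[a-z0-9.+\-]*\*?[a-z0-9.+\-]*")


def subtype_is_valid(subtype: str) -> bool:
    return SUBTYPE_RE.fullmatch(subtype) is not None


for s in ["xml", "xml*", "xml*utf-8", "xml; charset=utf-8", "x*m*l"]:
    print(f"{s!r}: {subtype_is_valid(s)}")
```

Under this reading, the wildcard is the only way to refer to a charset-specific variant: ";", " " and "=" fall outside the allowed set, and a second "*" is also rejected.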

     
  • Markus Strehle

    Markus Strehle - 2022-05-31

    Hello Matthieu,
    OK, I know that in ABAP there are restrictions on the input field, both in length and in allowed characters, but I always recommend reporting this as a bug to support component BC-SEC-VIR.
    The check was defined according to RFC 2045 (https://datatracker.ietf.org/doc/html/rfc2045), which defines charset as a parameter of text types, e.g. text/plain; charset="us-ascii".
    I have also found the VSA specification (https://ftp.gwdg.de/pub/misc/sapdb/icc/nw-vsi/VSA-Specification.pdf): in Java a MIME type with charset is possible, and in ABAP it would be too if the input field allowed you to define it.

    With the wildcard it should work, and I always recommend using the latest clamsap library.

    best regards,
    -markus
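RFC 2045 defines a content type as type/subtype followed by optional "; parameter=value" pairs, with charset being the usual parameter for text types. For illustration only, Python's standard email.message.Message parses this syntax and can split the base type from the charset. This is not how clamsap or the VSA interface does it, and split_content_type is a hypothetical helper.

```python
from email.message import Message
from typing import Optional, Tuple


def split_content_type(value: str) -> Tuple[str, Optional[str]]:
    """Split an RFC 2045 content type into (type/subtype, charset)."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_type() drops parameters and lowercases the type;
    # get_content_charset() returns the unquoted, lowercased charset or None.
    return msg.get_content_type(), msg.get_content_charset()


print(split_content_type('text/plain; charset="us-ascii"'))  # ('text/plain', 'us-ascii')
print(split_content_type("text/xml; charset=utf-8"))         # ('text/xml', 'utf-8')
print(split_content_type("application/xml"))                 # ('application/xml', None)
```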

     
  • Matthieu S.

    Matthieu S. - 2022-06-10

    Dear Markus,
    Thank you very much for your time and effort, and sorry for the delay. Unfortunately, I cannot confirm that 0.104.2 with a trailing wildcard (neither text/xml*
    nor text/xml*utf-8) solves the issue.

    Interestingly, when testing both of the attached files in VSCANTEST, the us-ascii one gets accepted and recognized as application/xml, so there's still some part of the mechanism that escapes me.

    (Both files come from the same source; I've just replaced one character (A) with an accented A.)

    best regards,
    matthieu

     

    Last edit: Matthieu S. 2022-06-10
  • Markus Strehle

    Markus Strehle - 2022-07-29

    Hi, I will reuse code from file to detect UTF-8, so that both files end up as application/xml.
    The internal function currently only handles ASCII.
    I will do a fix in the coming weeks during summer vacation.
    regards
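Markus's diagnosis is that the internal detection only handles ASCII, so the UTF-8 file never reaches the application/xml result. The Python sketch below illustrates that idea under stated assumptions: sniff_xml, is_ascii and is_utf8 are hypothetical names, the XML test is reduced to an "<?xml" prefix check, and this is not the code committed to clamsap.

```python
from typing import Callable, Optional


def is_ascii(data: bytes) -> bool:
    # Assumed old behavior: only 7-bit bytes count as text.
    return all(b < 0x80 for b in data)


def is_utf8(data: bytes) -> bool:
    # Assumed fixed behavior: any valid UTF-8 (which includes ASCII) counts as text.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def sniff_xml(data: bytes, is_text: Callable[[bytes], bool]) -> Optional[str]:
    """Return "application/xml" for text starting with an XML declaration, else None."""
    if is_text(data) and data.lstrip().startswith(b"<?xml"):
        return "application/xml"
    return None  # hypothetical caller would fall back to a generic detector


ascii_doc = b"<?xml version='1.0'?><name>A</name>"
utf8_doc = "<?xml version='1.0'?><name>\u00c0</name>".encode("utf-8")

print(sniff_xml(ascii_doc, is_ascii))  # application/xml
print(sniff_xml(utf8_doc, is_ascii))   # None
print(sniff_xml(utf8_doc, is_utf8))    # application/xml
```

In this sketch, the ASCII-only check lets the UTF-8 document fall through to the generic path, which in the thread produced text/xml; charset=utf-8. With the UTF-8-aware check, both documents end up as application/xml.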

     
  • Markus Strehle

    Markus Strehle - 2022-08-07
    • status: open --> pending
     
  • Markus Strehle

    Markus Strehle - 2022-08-07

    Committed a fix. Are you able to verify it from source, or do you need a library? If you need a library, for which OS?

     
  • Matthieu S.

    Matthieu S. - 2022-08-08

    Dear Markus
    Have a nice vacation ;)
    A lib for SUSE would be great, I would say.
    Thanks!
    best summer regards,
    matthieu

     
  • Matthieu S.

    Matthieu S. - 2022-08-08

    Thanks, Markus. I'll let you know, but I fear I won't be able to provide feedback before September.
    best regards,
    matthieu

     
  • Matthieu S.

    Matthieu S. - 2022-08-31

    Dear Markus
    Thank you very much for the fix. I can now confirm that this issue is solved with release 0.104.3.

    Thanks again,
    best regards,
    matthieu

     
  • Markus Strehle

    Markus Strehle - 2022-10-13
    • status: pending --> closed-works-for-me
     
  • Markus Strehle

    Markus Strehle - 2022-10-13

    Closing based on the last feedback, thanks.

     

