Can I write binary data to an XML file?

eninyo
2007-10-18
2013-05-20
  • eninyo
    2007-10-18

    Hello,
    I looked inside the TinyXml code and saw that the file opened by the LoadFile() function is opened in binary mode.
    Accordingly, I want to add a child with binary data to my XML file. Can I do it?
    As far as I know, an XML file is textual.

    Thanks in advance,
    Eli

     
    • Ellers
      2007-10-19

      The question is more "if I was using ANY XML API, what ways can I store binary data in an XML file?".

      The file open mode of binary/text has nothing to do with this question.

      Google gives you some directions: http://www.google.com/search?hl=en&q=binary+data+xml&btnG=Google+Search

      IMO one option is to encode the data using MIME64 (Base64) and store it in a text or CDATA node.
      Another is to reference an external (true binary) source.

      HTH
      Ellers

       
      • eninyo
        2007-10-19

        Hey,
        thanks very much for your answer.

        But what is "IMO"?
        Also, can you please give an example of using MIME64 or CDATA? How do I use it?

        I'm not interested in using an external source.

        Again, thanks very much,
        hope to hear from you soon,
        Eli

         
    • Zmey
      2007-10-31

      > But what is "IMO"?
      "In my opinion".

      > Also, can you please give an example of using MIME64 or CDATA? How do I use it?
      To be more precise, it is called Base64 encoding. You can use it to encode arbitrary binary data into text. This encoding is used in MIME-encoded emails (thus "Mime64").

      TinyXml has no facilities for encoding/decoding Base64 data, but you can use some third-party libraries for that. The idea is like this:

      1. Before putting the data into an XML node, you Base64-encode it. Encoded data looks like "9MRU9GDQaf=="; it contains no characters that are special in XML.
      2. When you read the data from the node, you Base64-decode it, which converts the text back into binary data.
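      TinyXml has no Base64 functions, so as a rough stand-in for such a third-party library, here is a minimal self-contained sketch of the two steps. The names base64Encode, base64Decode and b64Value are hypothetical, not part of TinyXml; the code assumes an ASCII character set and the standard alphabet ('+' and '/'), and the decoder does not skip whitespace or line breaks inside the encoded text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
static const char kB64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Step 1: turn arbitrary bytes (zero bytes included) into XML-safe text.
std::string base64Encode(const std::string& in) {
    std::string out;
    out.reserve(4 * ((in.size() + 2) / 3));
    std::size_t i = 0;
    for (; i + 2 < in.size(); i += 3) {          // full 3-byte groups
        unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                     (static_cast<unsigned char>(in[i + 1]) << 8) |
                      static_cast<unsigned char>(in[i + 2]);
        out += kB64[(v >> 18) & 0x3F];
        out += kB64[(v >> 12) & 0x3F];
        out += kB64[(v >> 6) & 0x3F];
        out += kB64[v & 0x3F];
    }
    std::size_t rest = in.size() - i;            // 0, 1 or 2 leftover bytes
    if (rest > 0) {
        unsigned v = static_cast<unsigned char>(in[i]) << 16;
        if (rest == 2) v |= static_cast<unsigned char>(in[i + 1]) << 8;
        out += kB64[(v >> 18) & 0x3F];
        out += kB64[(v >> 12) & 0x3F];
        out += (rest == 2) ? kB64[(v >> 6) & 0x3F] : '=';
        out += '=';                              // pad to a multiple of 4
    }
    return out;
}

// Value of one Base64 character, or -1 if it is not in the alphabet.
static int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Step 2: turn the text back into the original bytes.
// Returns false if a character outside the alphabet is found.
bool base64Decode(const std::string& in, std::string& out) {
    out.clear();
    unsigned buf = 0;   // bit accumulator (only the low bits matter)
    int bits = 0;       // number of not-yet-emitted bits in buf
    for (char c : in) {
        if (c == '=') break;                     // padding: no more data
        int v = b64Value(c);
        if (v < 0) return false;
        buf = (buf << 6) | static_cast<unsigned>(v);
        bits += 6;
        if (bits >= 8) {                         // a whole byte is available
            bits -= 8;
            out += static_cast<char>((buf >> bits) & 0xFF);
        }
    }
    return true;
}
```

      The encoded string contains only letters, digits, '+', '/' and '=', so it can go into an attribute or a text node unchanged; after reading it back, base64Decode() restores the original bytes, zeros included.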

      Note that Base64-encoded text is completely illegible ("Hello!" becomes "9MRU9GDQ"). Besides, Base64-encoded data is about 33% longer than the original (1 KB of data becomes ~1.3 KB of encoded text).
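      The "about 33%" follows from the encoding rule: every started group of 3 input bytes produces 4 output characters, padding included. A one-line sketch of that arithmetic (hypothetical helper name; line breaks, which some encoders insert, are not counted):

```cpp
#include <cassert>
#include <cstddef>

// Number of Base64 characters for n input bytes: 4 per started 3-byte group.
std::size_t base64Length(std::size_t n) { return 4 * ((n + 2) / 3); }
```

      For 1 KB (1024 bytes) this gives 1368 characters, i.e. about 1.34 times the original size.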

      As for CDATA, I doubt that TinyXml can handle truly binary data properly. In my experiments, TinyXml chokes on the first zero byte (0x00) of the binary data (I haven't tested it with CDATA, though).
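      Independently of TinyXml, the XML 1.0 specification itself limits what CDATA can carry: a CDATA section cannot contain the sequence "]]>", and bytes 0x00-0x1F other than tab, line feed and carriage return are not legal XML characters anywhere in a document, CDATA included. A conservative check along those lines, as a sketch (function name hypothetical; bytes 0x80 and above are simply rejected rather than validated as UTF-8):

```cpp
#include <cassert>
#include <string>

// Conservative check: can these raw bytes be placed in a CDATA section of an
// XML 1.0 document without making it ill-formed?
bool fitsInCdata(const std::string& data) {
    if (data.find("]]>") != std::string::npos)
        return false;                  // would end the CDATA section early
    for (char ch : data) {
        unsigned char b = static_cast<unsigned char>(ch);
        bool allowedControl = (b == 0x09 || b == 0x0A || b == 0x0D);
        if (b < 0x20 && !allowedControl)
            return false;              // not a legal XML 1.0 character
        if (b >= 0x80)
            return false;              // might not be valid UTF-8; reject to stay safe
    }
    return true;
}
```

      Typical binary data such as an image fails this check almost immediately, which is why it is normally encoded (Base64, hex) rather than placed in CDATA as-is.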

      I have solved this problem by reworking the interfaces of TinyXml so that all functions accept std::string& instead of const char*. A std::string stores its length explicitly, so unlike a zero-terminated const char* it can hold zero bytes and other binary data.
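      The difference is easy to see without TinyXml at all. strlen(), like any interface that takes a plain const char*, stops at the first zero byte, while std::string's (pointer, length) constructor keeps every byte. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// A const char* "string" ends at the first zero byte...
std::size_t cStringLength(const char* p) { return std::strlen(p); }

// ...while std::string keeps an explicit length, so zero bytes are ordinary data.
std::string fromBytes(const char* p, std::size_t n) { return std::string(p, n); }
```

      With raw = {'a', 0, 'b', 'c'}, cStringLength(raw) is 1, but fromBytes(raw, 4) holds all four bytes, including the zero.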

       
      • eninyo
        2007-10-31

        Hi Zmey,
        thanks very much for your reply.

        Sorry for my ignorance, but can you explain how "hello!" becomes "9MRU9GDQ"?
        In the meantime I encode every char as its ASCII code, so if "a" is 65h and "b" is 66h and so on, I write the string "65666700" (8 chars) into my XML for the string "abc".

        I didn't understand your solution at the end of the reply. What exactly did you change to string? And how does an STL string work with binary data? (An example, if you can.)

        You're right that the problem is that XML doesn't know how to deal with the null string terminator (00 = '\0'). It also doesn't work well with other "special" chars.

        I hope to hear from you or from someone else about this issue.

        Thanks again,

        Eli

         
    • Zmey
      2007-11-01

      Hello eninyo,

      > Sorry for my ignorance, but can you explain how "hello!" becomes "9MRU9GDQ"?

      Base64 encoding takes the data 3 bytes at a time (a "triplet"). 3 bytes make 24 bits, which are divided into four groups of 6 bits each:
      10010101 10011010 00101010 -> 100101 011001 101000 101010
      6 bits give 64 different values (hence Base64). Each value is assigned a character (000000 -> 'A', 000001 -> 'B', etc.). The "alphabet" consists of uppercase and lowercase Latin letters (26 * 2 = 52), decimal digits (0 .. 9), and two "more or less safe" characters ('+' and '/'). 52 + 10 + 2 = 64.
      If the length of the binary data is not a multiple of 3, padding characters '=' are added to the end of the encoded text: "As2W4w==". So in fact, Base64 encoding uses 65 characters. :)
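      The triplet step can be written out directly. This sketch (hypothetical function name) encodes one 3-byte group; for the bit pattern above (bytes 0x95 0x9A 0x2A) the four 6-bit groups are 37, 25, 40 and 42, which map to "lZoq". Applied to "Hello!" (two triplets, "Hel" and "lo!"), the same procedure gives "SGVsbG8h", so "9MRU9GDQ" above is not the actual encoding of that string.

```cpp
#include <cassert>
#include <string>

// Encode exactly one 3-byte triplet as four Base64 characters.
std::string encodeTriplet(unsigned char b0, unsigned char b1, unsigned char b2) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    // Glue the three bytes into one 24-bit number.
    unsigned long bits = (static_cast<unsigned long>(b0) << 16) |
                         (static_cast<unsigned long>(b1) << 8) | b2;
    std::string out;
    // Cut it into four 6-bit groups, most significant first, and look each one up.
    for (int shift = 18; shift >= 0; shift -= 6)
        out += alphabet[(bits >> shift) & 0x3F];
    return out;
}
```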

      > In the meantime I encode every char as its ASCII code, so if "a" is 65h and "b" is 66h and so on, I write the string "65666700" (8 chars) into my XML for the string "abc".

      So you are using hex encoding. It is very similar to Base64 in the sense that you "encode" groups of bits, here 4 bits per group. It is also much simpler to implement.
      The drawback of hex encoding is obvious: it doubles the amount of data, so 1 KB becomes 2 KB of text. Base64 is more difficult to implement, but it has less overhead: 4 characters per 3 bytes instead of 2 characters per byte.
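      A hex encoder/decoder pair, as a minimal sketch (function names hypothetical). Note that in ASCII, 'a' is 61h (97 decimal) and 'b' is 62h; 65 decimal (41h) is 'A'. So "abc" hex-encodes to "616263", and because the decoder works from the length of the text, no terminating "00" is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hex encoding: every byte becomes two characters from "0123456789ABCDEF".
std::string hexEncode(const std::string& in) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        out += digits[b >> 4];     // high 4 bits
        out += digits[b & 0x0F];   // low 4 bits
    }
    return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decoding: every pair of characters becomes one byte again.
bool hexDecode(const std::string& in, std::string& out) {
    out.clear();
    if (in.size() % 2 != 0) return false;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        int hi = hexValue(in[i]);
        int lo = hexValue(in[i + 1]);
        if (hi < 0 || lo < 0) return false;
        out += static_cast<char>(hi * 16 + lo);
    }
    return true;
}
```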

      > I didn't understand your solution at the end of the reply. What exactly did you change to string?

      I have heavily modified TinyXml for my needs. The modifications include interface changes (const char* was replaced by std::string&), changes to the input/output of XML text (one routine for all kinds of output), etc. Some parts of the library have been deleted (like the TiXmlPrinter class, which is no longer needed).
      You can get this modified version here: http://sourceforge.net/tracker/index.php?func=detail&aid=1747028&group_id=13559&atid=313559
      Note that the documentation has not been updated, so it does not reflect these changes.

      > And how does an STL string work with binary data? (An example, if you can.)

      std::string data;
      char binary_data[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
      // move binary data to std::string
      data.resize(10);
      memcpy(&data[0], binary_data, sizeof(binary_data));
      TiXmlDocument *doc = new TiXmlDocument;
      TiXmlElement  *root = new TiXmlElement("root");
      // Add data to the root element
      root->SetAttribute("data", data);
      // Output the XML text to std::cout
      doc->SetIndent( " " );
      doc->SetNewlines( true );
      TiXmlPrint( doc, std::cout );
      delete doc;

      This code outputs:

      <root data="&#x00;&#x01;&#x02;&#x03;&#x04;&#x05;&#x06;&#x07;&#x08;&#x09;" />

      The symbols are escaped. Loading this XML results in a "root" element which has "data" attribute with the right binary value. Note that the original version of TinyXml will load this XML file, but you will not be able to read the binary data beyond the first zero byte.
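      The escaping itself can be sketched as follows (a hypothetical helper, not the code of the modified TinyXml): '&', '<', '>' and '"' become the usual entities, and bytes below 0x20 become &#xNN; references, which reproduces the output shown above for bytes 0x00-0x09. Bytes 0x80 and above are copied unchanged, so the result is only valid text if they already form valid UTF-8 (or the document's declared encoding).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escape a byte string for a double-quoted XML attribute, writing control
// bytes as hexadecimal character references like "&#x00;".
std::string escapeAttribute(const std::string& in) {
    std::string out;
    for (char ch : in) {
        unsigned char b = static_cast<unsigned char>(ch);
        switch (b) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:
                if (b < 0x20) {
                    char ref[8];   // "&#xNN;" plus terminator
                    std::snprintf(ref, sizeof ref, "&#x%02X;", b);
                    out += ref;
                } else {
                    out += ch;     // everything else is copied as-is
                }
        }
    }
    return out;
}
```

      One caveat: XML 1.0 allows character references only to legal XML characters, so &#x00; and references to most other bytes below 0x20 make the document ill-formed, and conforming XML 1.0 parsers must report an error for it. A file like this can be read back by a TinyXml build that accepts such references, but other XML tools will reject it.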

       
    • Zmey
      2007-11-01

      Oops, I forgot one line in my example code:
      doc->LinkEndChild(root);

      So, the code must be like this:

      #include <cstring>    // memcpy
      #include <iostream>   // std::cout
      #include <string>
      #include "tinyxml.h"  // assumes the modified TinyXml (std::string interfaces)

      std::string data;
      char binary_data[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
      // Move the binary data into a std::string (zero bytes included).
      data.resize(10);
      memcpy(&data[0], binary_data, sizeof(binary_data));
      TiXmlDocument *doc = new TiXmlDocument;
      TiXmlElement *root = new TiXmlElement("root");
      // Add data to the root element.
      root->SetAttribute("data", data);
      // Add the node to the document (the document now owns it).
      doc->LinkEndChild(root);
      // Output the XML text to std::cout.
      doc->SetIndent( " " );
      doc->SetNewlines( true );
      TiXmlPrint( doc, std::cout );
      delete doc;   // also deletes root

       
    • eninyo
      2007-11-05

      Hi Zmey,
      thanks very much for your answer.
      I downloaded your TinyXml from the link in your reply. I'll try it soon.

      I didn't know that when you memcpy chars into a std::string, they come out as in the output you showed. How does 0 become &#x00;?
      This makes the file much larger, because every char becomes 5 chars (the &#x prefix).

      Can you also give a simple example of reading back the same data you wrote? How do you put it into a char array again with the correct binary values?

      What do
      doc->SetIndent( " " ); 
      doc->SetNewlines( true ); 
      do? I'm not familiar with those commands...

      Thanks again!
      Hope to hear from you soon,
      Eli

       
    • Zmey
      2007-11-12

      Hello Eli,

      > I didn't know that when you memcpy chars into a std::string, they come out as in the output you showed. How does 0 become &#x00;?

      The memcpy() itself does not alter the string; I used memcpy() only to initialize the std::string with binary data. The byte \000 becomes &#x00; when the TinyXml engine escapes it on output.

      > This makes the file much larger, because every char becomes 5 chars (the &#x prefix).

      That's right, the output is much larger if your binary data contains lots of nonprintable characters. On the other hand, if your data is mostly text with rare inclusions of nonprintable characters, the output will actually be SHORTER than Base64-encoded data. Plus, the text will remain readable.
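      The trade-off can be checked with a little arithmetic. The sketch below (hypothetical helper names) counts output characters for both approaches, assuming each control byte costs six characters (e.g. "&#x00;") and every other byte one; entities such as &amp; are ignored for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Characters needed if bytes below 0x20 become 6-character "&#xNN;" references.
std::size_t escapedLength(const std::string& data) {
    std::size_t n = 0;
    for (char ch : data)
        n += (static_cast<unsigned char>(ch) < 0x20) ? 6 : 1;
    return n;
}

// Characters needed for the same data in Base64 (4 per started 3-byte group).
std::size_t base64LengthOf(const std::string& data) {
    return 4 * ((data.size() + 2) / 3);
}
```

      For 100 bytes of text containing two zero bytes, escaping needs 110 characters and Base64 needs 136; for 10 zero bytes, escaping needs 60 and Base64 only 16.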

      > Can you also give a simple example of reading back the same data you wrote? How do you put it into a char array again with the correct binary values?

      Note that this example code does only minimal error checking.

      // Assumes the modified tinyxml.h (std::string-based interfaces).
      #include <cstdio>
      #include <fstream>
      #include <string>
      #include "tinyxml.h"

      // Create an empty document.
      TiXmlDocument doc;
      // You can read the XML text from any std::istream. Let us read from a file.
      std::ifstream is;
      is.open("data.xml", std::ios_base::in | std::ios_base::binary);
      // Simply feed the data into the document.
      is >> doc;
      // Now get the root element (NULL if the document is empty or failed to parse).
      TiXmlElement *root = doc.RootElement();
      // And get the attribute. Note that we use a pointer to std::string, not to const char.
      // Also note that you do not call delete on the returned value (the element owns it).
      const std::string *v = root ? root->Attribute("data") : NULL;
      // Now v is a pointer to std::string (or NULL if there is no attribute called 'data').
      if (v)
      {
          printf("Contents of the 'data' attribute: ");
          for (size_t i = 0; i < v->length(); i++)
          {
              // Cast to unsigned char so bytes >= 0x80 are not sign-extended.
              printf("0x%02X ", (unsigned char)(*v)[i]);
          }
          printf("\n");
      }
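      To put the bytes back into an array, copy them out of the std::string by its length; strcpy() or anything else that treats the data as a zero-terminated string would stop at the first zero byte. A sketch with hypothetical helper names, where the string argument plays the role of *v in the code above:

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Copy the bytes held in a std::string into a growable byte buffer.
std::vector<unsigned char> toBuffer(const std::string& s) {
    return std::vector<unsigned char>(s.begin(), s.end());
}

// Copy into a fixed-size char array; returns the number of bytes copied.
std::size_t toArray(const std::string& s, char* dest, std::size_t capacity) {
    std::size_t n = s.size() < capacity ? s.size() : capacity;
    std::memcpy(dest, s.data(), n);   // memcpy does not stop at zero bytes
    return n;
}
```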

       
    • Zmey
      2007-11-12

      oops... Forgot to answer the last question.

      > What do
      > doc->SetIndent( " " ); 
      > doc->SetNewlines( true ); 
      > do? I'm not familiar with those commands...

      These commands control pretty-printing.

      In the original TinyXml code you use the TiXmlPrinter class to print XML data with nice formatting; in my version, you set the pretty-printing options directly on the document.

      TiXmlDocument::SetIndent(const std::string &indent): sets the string used for indentation. E.g., after doc->SetIndent("\t"); the output is indented with tab characters; after doc->SetIndent("  "); it is indented with two spaces, etc.

      TiXmlDocument::SetNewlines( true ): if true, newlines will be added to the XML document to delimit nested elements; if false, no newlines will be added at all (so the document will be printed as one long line of text).

      You can achieve different styles by combining SetIndent() and SetNewlines().

      As for my short example above, these commands have no effect because there is no nesting in the XML text. I just added them mechanically...

      Cheers.

       
    • Ignasi Mateos
      2008-09-23

      Hi,

      I'm also interested in embedding some binary data (an image, in fact), in an XML of my own.

      I would like to know how to reference an external binary source, or how to embed the data using a CDATA section.

      At the moment I'm trying to do it this way:

      <code snippet>
      ...
          FILE *fImage =_tfopen(imageFilename, _T("rb"));
          if (fImage)
          {
              if (fImage->_ptr)
              {
                  TiXmlElement *imageElem = new TiXmlElement("IMG");
                  imageElem->SetAttribute("Data", fImage->_ptr);
                  FPNetElem->LinkEndChild(imageElem); 
              }
              fclose(fImage);
          }
      ...
      </code snippet>

      But it doesn't work properly. Does anyone know what I'm missing or doing wrong?

      Thanks in advance.

      Ignasi Mateos
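      What the snippet above is missing is the step that actually reads the image. fImage->_ptr is an internal field of the C runtime's FILE structure (in Microsoft's runtimes, a pointer into the stream's I/O buffer), not the file contents, and right after fopen() nothing has been read into that buffer yet. A sketch of reading the whole file into a std::string instead (helper name hypothetical):

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file in binary mode. Returns false if the file cannot be opened.
bool readFileBytes(const std::string& path, std::string& out) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;
    // Copy every byte, zero bytes included, into the string.
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}
```

      The resulting bytes still contain zeros and other characters XML cannot carry, and a const char* overload of SetAttribute() would stop at the first zero byte, so the data would need to be Base64- or hex-encoded before being stored in the attribute.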