
Convert binary data file into text file

tunagendut
2007-01-10
2012-09-26
  • tunagendut

    tunagendut - 2007-01-10

    Hi,

    C/C++ newbie poster here... I'm trying to convert a binary data file into a readable text file - I suppose an ASCII format. Would anyone have a suggestion for how I can do this?

    Part of the C program I'm currently writing looks like this:

    void filecopy(FILE *filein, FILE *fileout)
    {
        int c;

        while ((c = getc(filein)) != EOF)
        {
            putc(c, fileout);
        }
    }

    which in its current form does a straight copy - it takes an input file and copies the data from it into a new output file. However, the input file contains binary data, and what goes into the output file needs to be converted into readable text. What might be a good way to do this?

    btw, I'm writing this for a windows environment.

    Thanks.

     
    • Anonymous

      Anonymous - 2007-01-14

      >> Clifford, putc and fputc look very similar ...
      My mistake, I was thinking of putch() or putchar().

      >> What I do not understand is this: they say both putc and fputc write characters to the stream, but when examples are given they use an integer - as the definition of the functions use integers as the first input parameter.

      It is a quirk of the C language that a character constant, such as 'a', has type int (try printing sizeof('a') to demonstrate this). So I guess that the int type is used for consistency with this. In C++, incidentally, character constants have type char. Either way, only the least significant 8 bits of the value passed will be stored.
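A minimal sketch of both points, for anyone who wants to try it. The function names are made up for illustration, and the second one assumes the usual 8-bit byte. It has to be compiled as C, not C++, for the first result to match what is described above.

```c
#include <stdio.h>

/* Size of a character constant as this compiler sees it.
   Compiled as C this equals sizeof(int); compiled as C++ it would be 1. */
unsigned sizeOfCharConstant(void)
{
    return (unsigned)sizeof('a');
}

/* The byte that fputc()/putc() actually store: the int argument is
   converted to unsigned char, so the higher-order bits are discarded. */
unsigned char byteStored(int c)
{
    return (unsigned char)c;
}

/* Prints both facts to the given stream. */
void showCharFacts(FILE *out)
{
    fprintf(out, "sizeof('a') = %u, sizeof(char) = %u\n",
            sizeOfCharConstant(), (unsigned)sizeof(char));
    fprintf(out, "0x141 is stored as 0x%02x\n",
            (unsigned)byteStored(0x141));
}
```

Calling showCharFacts(stdout) from main() prints the sizes for your compiler, which also shows that one call to fputc() never writes more than one byte.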

      ISO coded? ISO is the International Organization for Standardization, which publishes thousands of standards, so I am not sure what you mean by ISO coded.

      To interpret the data in the file, you will also need to know the byte order - i.e. for an integer, which byte is the MSB, since this may not be the same as the natural byte order of the machine generating or decoding the data.
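As a rough illustration of why the byte order matters, here is a sketch that decodes the same bytes both ways. The helper names are hypothetical, and it assumes the field has already been read into a buffer (with fread(), say) and is no wider than an unsigned long.

```c
#include <stddef.h>

/* First byte in the buffer is the most significant (big-endian). */
unsigned long decodeBigEndian(const unsigned char *bytes, size_t n)
{
    unsigned long value = 0;
    size_t i;

    for (i = 0; i < n; i++)
        value = (value << 8) | bytes[i];
    return value;
}

/* First byte in the buffer is the least significant (little-endian). */
unsigned long decodeLittleEndian(const unsigned char *bytes, size_t n)
{
    unsigned long value = 0;
    size_t i;

    for (i = n; i > 0; i--)
        value = (value << 8) | bytes[i - 1];
    return value;
}
```

For the three bytes 0x01 0x02 0x03, the first function gives 0x010203 and the second gives 0x030201, so the file format documentation has to say which one applies.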

      You will need to create functions to deserialise the various data types in the file. For example, taking the second field in your example (because it is easiest!), and assuming the first byte is the MSB:

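      int read24BitBinary( FILE* fp )
      {
          int value = 0 ;
          int byte ;

          /* fgetc() returns 0..255 (or EOF); keeping it in an int avoids the
             sign extension a plain char would cause for bytes >= 0x80.
             EOF checks are omitted for brevity. */
          byte = fgetc( fp ) ;
          value |= byte << 16 ;

          byte = fgetc( fp ) ;
          value |= byte << 8 ;

          byte = fgetc( fp ) ;
          value |= byte ;

          return value ;
      }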

      If you then use this function to read the value represented by pos 1-3, then you can output it as ASCII text as follows:

      fprintf( outfile, "%d", read24BitBinary( infile ) ) ;

      Create similar functions for the other data field types and simply call them in the appropriate order determined by the file format.
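For the binary coded decimal field, something along these lines might do. This is only a sketch under assumptions the format document would have to confirm: two digits packed per byte, high nibble first, and any nibble above 9 treated as filler and skipped. The function name is hypothetical, and the bytes are assumed to have been read into a buffer already.

```c
#include <stddef.h>

/* Converts n packed-BCD bytes into a NUL-terminated digit string.
   'out' must have room for 2 * n + 1 characters.
   Returns the number of digits written. */
size_t bcdToDigits(const unsigned char *bytes, size_t n, char *out)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < n; i++) {
        unsigned hi = (bytes[i] >> 4) & 0x0Fu;  /* first digit in the byte  */
        unsigned lo = bytes[i] & 0x0Fu;         /* second digit in the byte */

        if (hi <= 9)
            out[count++] = (char)('0' + hi);
        if (lo <= 9)
            out[count++] = (char)('0' + lo);
    }
    out[count] = '\0';
    return count;
}
```

For the 8-byte field at position 4, you would fread() 8 bytes into an unsigned char array, call bcdToDigits() on it with a 17-character output buffer, and fprintf() the result with "%s".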

      Clifford

       
    • Anonymous

      Anonymous - 2007-01-10

      That is not real code is it? putc() takes only one parameter and outputs to stdout - fputc() surely?

      All data in a file is 'binary', even if it is ASCII! All calling it ASCII does is imply some meaning for the data, e.g. 00100000B (0x20) is interpreted as a <space> character in ASCII.

      So the question is what does your binary data represent and how do you want it converted? Do you simply mean convert to hexadecimal ASCII pairs?:

      fprintf( fileout, "%2.2x", c ) ;

      or possibly a series of '1' and '0' characters? Or are you expecting something more?
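Both options can be sketched roughly as follows. The function names and the 16-bytes-per-line layout are arbitrary choices for illustration, and the bit helper assumes 8-bit bytes.

```c
#include <stdio.h>

/* Writes the 8 bits of 'b' into 'out' as '0'/'1' characters,
   most significant bit first, followed by a terminating NUL. */
void byteToBits(unsigned char b, char out[9])
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (b & (0x80u >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Copies filein to fileout as two-digit hex pairs, 16 bytes per line. */
void dumpHex(FILE *filein, FILE *fileout)
{
    int c;
    long count = 0;

    while ((c = getc(filein)) != EOF) {
        fprintf(fileout, "%02x", c);
        count++;
        fputc(count % 16 == 0 ? '\n' : ' ', fileout);
    }
    if (count % 16 != 0)
        fputc('\n', fileout);   /* finish the last, partial line */
}
```

Either one only makes the raw bytes visible; neither knows what the bytes mean.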

      The difference between 'binary' and 'text' is that one is merely information, the other is data. Data is information with some meaning or interpretation applied. Information is merely a signal containing variation, in this case two states 1 or 0.

      Clifford

       
    • Nobody/Anonymous

      http://www.cplusplus.com/reference/clibrary/cstdio/fopen.html

      more specifically ...

      But won't the content/format, that the original file you are reading was made with, affect what you want to do with each char byte that you read?

      Why not first do a little dumping of the .exe file that you want to read ... to the screen ... so that you might then see what it 'looks like.'

       
      • Anonymous

        Anonymous - 2007-01-10

        You are perhaps reading too much into what has been asked (you may also be right). But you have assumed that this 'binary' file is a .exe file. That may not be true, it is one possible interpretation. If the OP intends to convert executables back to source code, that is not possible. It is a one way street.

        Moreover, if you were intending that he use fopen() to open a file in binary or text mode, then that too is probably an over-interpretation of what has been asked. On Windows those modes simply determine whether a CR+LF pair ('\r' + '\n') in the file is translated to a single '\n' on input, and '\n' back to CR+LF on output (text mode), or whether the bytes are passed through untouched (binary mode).
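That said, for a file that really does hold arbitrary bytes it is safer to ask for binary mode explicitly. A minimal sketch, with a made-up function name: the "rb" and "wb" mode strings are standard C, and on Windows they switch off the newline translation (the Microsoft runtime also stops reading at a Ctrl-Z byte, 0x1A, in text mode).

```c
#include <stdio.h>

/* Copies srcPath to dstPath byte for byte, with no newline translation.
   Returns the number of bytes copied, or -1 on any error. */
long copyBinary(const char *srcPath, const char *dstPath)
{
    FILE *in;
    FILE *out;
    int c;
    long n = 0;

    in = fopen(srcPath, "rb");          /* "rb": read, binary mode */
    if (in == NULL)
        return -1;

    out = fopen(dstPath, "wb");         /* "wb": write, binary mode */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {      /* write failed */
            fclose(in);
            fclose(out);
            return -1;
        }
        n++;
    }

    fclose(in);
    if (fclose(out) != 0)
        return -1;
    return n;
}
```

On systems that do not distinguish text from binary files, the "b" is accepted and has no effect, so the same code works everywhere.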

        Really the OP needs to clarify what he intends.

        Clifford

         
    • tunagendut

      tunagendut - 2007-01-14

      Thanks for the replies...

      Clifford, putc and fputc look very similar and are more or less doing the same thing according to:
      http://www.cplusplus.com/reference/clibrary/cstdio/putc.html
      http://www.cplusplus.com/reference/clibrary/cstdio/fputc.html

      Both write a character to the stream and advance the position. The example they give in the above reference for putc doesn't compile ('n' is not declared). They also say that "putc may be implemented as a macro" - would the advantage of that be performance, perhaps?

      What I do not understand is this: they say both putc and fputc write characters to the stream, but when examples are given they use an integer - as the definition of the functions use integers as the first input parameter.

      The funny thing is when i take my file, the one i would like to decode, and run the above copy code with some debug in it:

      void filecopy(FILE *filein, FILE *fileout)
      {
          int c, i;

          i = 0;

          while ((c = getc(filein)) != EOF)
          {
              fputc(c, fileout);
              i++;
          }

          printf("How many times written to the stream? - %d\n", i);
      }

      .. I can see that it loops exactly as many times as the number of bytes it writes to the output file. As the character datatype holds exactly 1 byte, it looks like, even though it takes an integer as input, which is stored in 2 bytes (..right?), it writes characters to the stream - 1 byte at a time, not two?!

      ..

      With regard to the binary data file that i try to convert - a mate of mine sent me a doco that describes the binary data records in the file, ie. position and number of the bytes and type of decoding - a mix of ISO, binary and binary coded decimals. Eg.:
      position 0 - 1 byte, Numeral - ISO coded,
      position 1 - 3 bytes, Numeral - Binary coded,
      position 4 - 8 bytes, Digit string - Binary coded Decimals,
      position 12 etc.

      Would anyone have perhaps a hint for me on how I can get hold of the 0's and 1's that are held in each of the bytes and/or a way so that they can be converted into meaningful data for the human eye?

      Cheers!

       
