Hi,
C/C++ newbie poster here... I'm trying to convert a binary data file into a readable text file, I suppose in ASCII format. Would anyone have a suggestion for how I can do this?
Part of the C program I'm currently writing looks like this:
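#include <stdio.h>

/* Copy every byte of filein to fileout unchanged. */
void filecopy(FILE *filein, FILE *fileout)
{
    int c;   /* int rather than char, so EOF can be told apart from a real byte */
    while ((c = getc(filein)) != EOF)
    {
        putc(c, fileout);
    }
}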
In its current form it does a straight copy: it takes an input file and copies its data into a new output file. However, the input file contains binary data, and what goes into the output file needs to be converted into readable text. What might be a good way to do this?
BTW, I'm writing this for a Windows environment.
Thanks.
>> Clifford, putc and fputc look very similar ...
My mistake, I was thinking of putch() or putchar().
>> What I do not understand is this: they say both putc and fputc write characters to the stream, but when examples are given they use an integer - as the definition of the functions use integers as the first input parameter.
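It is a quirk of the C language that a character constant such as 'a' has type int (try printing sizeof('a') to demonstrate this), so I guess that the int type is used for consistency with this. It also matches getc(), which returns an int so that EOF can be distinguished from every possible byte value. In C++, incidentally, character constants have type char. Either way, putc() and fputc() convert the value passed to unsigned char, so only the least significant 8 bits are stored and exactly one byte is written per call.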
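A small sketch to check both points on your own compiler. The helper names writtenByte() and charConstantSize() are made up for illustration; writtenByte() writes an int with fputc() to a scratch file from tmpfile() and reads back what actually landed there.

```c
#include <stdio.h>

/* Hypothetical helper: write `value` with fputc() to a scratch file and
   return the single byte that actually ended up in the file (-1 on error). */
int writtenByte(int value)
{
    FILE *tmp = tmpfile();
    int result = -1;

    if (tmp == NULL)
        return -1;
    if (fputc(value, tmp) != EOF) {
        rewind(tmp);            /* reposition before switching to reading */
        result = fgetc(tmp);    /* only one byte was written */
    }
    fclose(tmp);
    return result;
}

/* In C, a character constant has type int, so this equals sizeof(int).
   Compiled as C++ it would be 1. */
size_t charConstantSize(void)
{
    return sizeof('a');
}
```

On a platform with 8-bit chars (Windows included), writtenByte(0x141) should return 0x41 ('A'): only the low byte of the int is written, which is why the loop count matches the number of bytes in the output file. charConstantSize() equals sizeof(int) (usually 4) when compiled as C.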
ISO coded? ISO is the International Organization for Standardization, which publishes thousands of standards, so I am not sure what you mean by ISO coded.
To interpret the data in the file, you will also need to know the byte order, i.e. for an integer, which byte is the most significant byte (MSB), since this may not be the same as the natural byte order of the machine generating or decoding the data.
You will need to create functions to de-serialise the various data types in the file. For example, taking the second field in your example (because it is easiest!), and assuming the first byte is the MSB:
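To make the byte-order point concrete, here is a sketch that decodes the same bytes both ways. The function names are hypothetical, and n should be at most 4 because unsigned long is only guaranteed to hold 32 bits.

```c
#include <stddef.h>

/* Combine n bytes into an unsigned value, treating buf[0] as the most
   significant byte (big-endian: "first byte is the MSB"). */
unsigned long decodeBigEndian(const unsigned char *buf, size_t n)
{
    unsigned long value = 0;
    size_t i;
    for (i = 0; i < n; i++)
        value = (value << 8) | buf[i];
    return value;
}

/* Same bytes, but buf[0] is the least significant byte (little-endian,
   the native order of x86 Windows machines). */
unsigned long decodeLittleEndian(const unsigned char *buf, size_t n)
{
    unsigned long value = 0;
    size_t i;
    for (i = n; i > 0; i--)
        value = (value << 8) | buf[i - 1];
    return value;
}
```

For the bytes 0x01 0x02 0x03, decodeBigEndian() gives 66051 (0x010203) and decodeLittleEndian() gives 197121 (0x030201), so guessing the wrong order silently produces a plausible-looking but wrong number.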
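#include <stdio.h>
int read24BitBinary( FILE* fp )  /* first byte = MSB; returns -1 (assumed convention) if the file ends early */
{
    int value = 0, byte, i ;  /* byte is an int, not a char, so EOF can be detected */
    for( i = 0; i < 3; i++ ) {
        if( (byte = fgetc( fp )) == EOF ) return -1 ;
        value = (value << 8) | byte ;  /* shift earlier bytes up, append this one */
    }
    return value ;
}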
If you use this function to read the value at positions 1-3, you can then output it as ASCII text like this:
fprintf( outfile, "%d", read24BitBinary( infile ) ) ;
Create similar functions for the other data field types and simply call them in the appropriate order determined by the file format.
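A rough sketch of that approach for the first three fields of the layout the OP posted (position 0: 1 byte ISO coded; positions 1-3: 3 bytes binary; positions 4-11: 8 bytes BCD). The encodings are assumptions, since the spec wording is ambiguous: the ISO byte is treated as a single ASCII character, the binary field as unsigned with the first byte as the MSB, and the BCD field as packed, two digits per byte, high nibble first. decodeRecord() and readBcd() are hypothetical names, and fields from position 12 onwards are left out.

```c
#include <stdio.h>

/* ASSUMED layout: pos 0 = one ASCII character, pos 1-3 = 24-bit unsigned
   (first byte is the MSB), pos 4-11 = packed BCD, high nibble first. */

/* Reads a 24-bit unsigned value, first byte = MSB; -1 if the file ends early. */
static long read24BitBinary(FILE *fp)
{
    long value = 0;
    int i, byte;
    for (i = 0; i < 3; i++) {
        if ((byte = fgetc(fp)) == EOF)
            return -1;
        value = (value << 8) | byte;
    }
    return value;
}

/* Reads nbytes packed-BCD bytes into digits (needs 2*nbytes+1 chars).
   Returns 0 on success, -1 on early EOF or a nibble that is not 0-9. */
static int readBcd(FILE *fp, int nbytes, char *digits)
{
    int i, byte, hi, lo;
    for (i = 0; i < nbytes; i++) {
        if ((byte = fgetc(fp)) == EOF)
            return -1;
        hi = (byte >> 4) & 0x0F;   /* first digit: high nibble */
        lo = byte & 0x0F;          /* second digit: low nibble */
        if (hi > 9 || lo > 9)
            return -1;
        digits[2 * i] = (char)('0' + hi);
        digits[2 * i + 1] = (char)('0' + lo);
    }
    digits[2 * nbytes] = '\0';
    return 0;
}

/* Decode one record from in and write it to out as one text line.
   Returns 0 on success, -1 if the record is short or malformed. */
int decodeRecord(FILE *in, FILE *out)
{
    int iso;
    long number;
    char bcd[17];                  /* 8 BCD bytes -> 16 digits + '\0' */

    if ((iso = fgetc(in)) == EOF)
        return -1;
    if ((number = read24BitBinary(in)) < 0)
        return -1;
    if (readBcd(in, 8, bcd) != 0)
        return -1;
    fprintf(out, "%c %ld %s\n", iso, number, bcd);
    return 0;
}
```

With the record bytes '7', 01 02 03, 12 34 56 78 90 12 34 56, decodeRecord() writes the line "7 66051 1234567890123456". Some formats pad BCD fields with 0xF nibbles; this sketch rejects those, so adjust readBcd() if the spec says otherwise.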
Clifford (2007-01-14)
That is not real code, is it? putc() takes only one parameter and outputs to stdout. fputc(), surely?
All data in a file is 'binary', even if it is ASCII! Calling it ASCII merely assigns some meaning to the data, e.g. 00100000B (0x20) is interpreted as a <space> character in ASCII.
So the question is: what does your binary data represent, and how do you want it converted? Do you simply mean converting each byte to a pair of hexadecimal ASCII digits, e.g.:
fprintf( fileout, "%2.2x", c ) ;
or possibly a series of '1' and '0' characters? Or are you expecting something more?
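If hexadecimal pairs are what you want, a minimal sketch is to keep the OP's filecopy() loop and replace putc() with the fprintf() call above. hexdump() is a hypothetical name, and the 16-bytes-per-line layout is just a readability choice.

```c
#include <stdio.h>

/* Variant of filecopy() that writes each input byte as two hex digits
   instead of copying it, 16 bytes per line. Returns the byte count. */
long hexdump(FILE *filein, FILE *fileout)
{
    int c;
    long count = 0;

    while ((c = getc(filein)) != EOF) {
        if (count > 0)   /* separator before every byte except the first */
            putc(count % 16 == 0 ? '\n' : ' ', fileout);
        fprintf(fileout, "%2.2x", c);   /* e.g. byte 0x20 -> "20" */
        count++;
    }
    if (count > 0)
        putc('\n', fileout);            /* terminate the last line */
    return count;
}
```

For the input bytes 0x20 0x41 0xFF it writes "20 41 ff" followed by a newline.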
The difference between 'binary' and 'text' is that one is merely information, the other is data. Data is information with some meaning or interpretation applied. Information is merely a signal containing variation, in this case two states 1 or 0.
Clifford (2007-01-10)
http://www.cplusplus.com/reference/clibrary/cstdio/fopen.html
more specifically ...
But won't the format the original file was written in affect what you want to do with each byte that you read?
Why not first dump some of the .exe file you want to read to the screen, so that you can see what it 'looks like'?
You are perhaps reading too much into what has been asked (you may also be right). You have assumed that this 'binary' file is a .exe file. That may not be true; it is only one possible interpretation. If the OP intends to convert executables back to source code, that is not possible. It is a one-way street.
Moreover, if you were intending that he use fopen() to open the file in binary or text mode, then that too is probably an over-interpretation of what has been asked. On Windows, those modes simply determine whether a CR+LF pair in the file is read as '\r' followed by '\n' (binary mode) or simply as '\n' (text mode), and vice versa on output.
Really, the OP needs to clarify what he intends.
Clifford (2007-01-10)
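To see what the mode changes in practice, this sketch counts how many bytes getc() delivers for a given fopen() mode. countBytes() is a hypothetical helper. Opening the binary input with "rb" is the safe choice on Windows: besides the CR+LF translation, Microsoft's C runtime also treats a Ctrl-Z (0x1A) byte as end of file when reading in text mode, which can cut a binary file short.

```c
#include <stdio.h>

/* Count the bytes getc() delivers when path is opened with mode.
   "rb": every byte comes through unchanged.
   "r" (text): the Microsoft runtime turns each CR+LF into a single '\n'
   and stops at a Ctrl-Z byte; on POSIX systems both modes behave alike.
   Returns -1 if the file cannot be opened. */
long countBytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    long count = 0;

    if (fp == NULL)
        return -1;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

For a file containing the four bytes a, CR, LF, b, countBytes(path, "rb") returns 4 everywhere, while countBytes(path, "r") returns 3 with the Microsoft runtime.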
Thanks for the replies...
Clifford, putc and fputc look very similar and are more or less doing the same thing according to:
http://www.cplusplus.com/reference/clibrary/cstdio/putc.html
http://www.cplusplus.com/reference/clibrary/cstdio/fputc.html
Both write a character to the stream and advance the position indicator. In the example given in the putc reference above (which doesn't compile: 'n' is not declared), they say that "putc may be implemented as a macro". Would the advantage of that be performance, perhaps?
What I do not understand is this: they say both putc and fputc write characters to the stream, but the examples pass an integer, since the functions are declared with an int as the first parameter.
The funny thing is, when I take my file (the one I would like to decode) and run the copy code above with some debug output added:
#include <stdio.h>

void filecopy(FILE *filein, FILE *fileout)
{
    int c, i;
    i = 0;
    while ((c = getc(filein)) != EOF)
    {
        fputc(c, fileout);
        i++;   /* count how many bytes were copied */
    }
    printf("How many times written to the stream? - %d\n", i);
}
I can see that it loops exactly as many times as the number of bytes it writes to the output file. As the char datatype holds exactly 1 byte, it looks like, even though fputc takes an int as input (which is stored in 2 bytes, right?), it writes characters to the stream 1 byte at a time, not two bytes at a time?!
..
With regard to the binary data file that I am trying to convert: a mate of mine sent me a doc that describes the binary data records in the file, i.e. the position and number of bytes of each field and how it is encoded, which is a mix of ISO, binary and binary-coded decimal (BCD). E.g.:
position 0 - 1 byte, Numeral - ISO coded,
position 1 - 3 bytes, Numeral - Binary coded,
position 4 - 8 bytes, Digit string - Binary coded Decimals,
position 12 etc.
Would anyone have perhaps a hint for me on how I can get hold of the 0's and 1's that are held in each of the bytes and/or a way so that they can be converted into meaningful data for the human eye?
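To get at the individual 0s and 1s, shift the byte right and mask off the lowest bit. A minimal sketch (byteToBits() is a hypothetical name):

```c
/* Write the 8 bits of byte into out as '0'/'1' characters, most
   significant bit first, and NUL-terminate it (out needs 9 chars). */
void byteToBits(unsigned char byte, char out[9])
{
    int bit;
    for (bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((byte >> bit) & 1) ? '1' : '0';  /* test one bit */
    out[8] = '\0';
}
```

For example, 0x20 (the ASCII space) gives "00100000" and 0xA5 gives "10100101".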
Cheers!