I have found a bug in Jacksum 1.7.0 when running it with combined checksums.
I tried many combinations, including the option "-a all", but the tree:tiger(2) checksum always comes out distorted whenever it is not the only algorithm.
Example console output:
D:\>java -version
java version "1.5.0_11"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode)
D:\>java -jar jacksum.jar -a tree:tiger -F "#ALGONAME = #CHECKSUM | #FILESIZE" MyFile.avi
tree:tiger = SQCJEJBDCT4AKPAR5C3EUBHKZD24OLW6SE42E3A | 4975347
D:\>java -jar jacksum.jar -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" MyFile.avi
md5 = d355fa7a1edbc4db3e9e1a46b0a1f8be
ed2k = e2879467258cb2ad2193e5c8a64d18d8
tree:tiger = 940492242314f8053c11e8b64a04eac8f5c72ede9139a26c
Thanks!
Hi Oleg,
Thank you for the report. However, it is not a bug, it is the expected behaviour.
The default encoding of Tiger/Tiger2 is BASE32. If you use "all" or a plus character (more than one algorithm), the default encoding of each single algorithm is ignored and a hexadecimal encoding is used instead for all algorithms (normalized output).
It is documented:
> java -jar jacksum.jar -h -a
-a algo [...] As soon as "all" or a plus character is used,
the output is normalized with a hex checksum and a decimal
filesize. Examples: "sha+", "md5+"
[...]
You can change the default encoding with option -E.
Examples:
Hex encoding:
-------------
> java -jar jacksum.jar -a tree:tiger -E hex -F "#ALGONAME = #CHECKSUM | #FILESIZE" jacksum.jar
tree:tiger = 221f6651e24bbbcc777b8c16f25362117a37a9b62213930d | 199398
> java -jar jacksum.jar -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" jacksum.jar
md5 = 9666f5e2632d05b806e782d7d50855e8
ed2k = d647ffde863e43e00601a62cbb3133fc
tree:tiger = 221f6651e24bbbcc777b8c16f25362117a37a9b62213930d
Base32 encoding:
----------------
> java -jar jacksum.jar -a tree:tiger -F "#ALGONAME = #CHECKSUM | #FILESIZE" jacksum.jar
tree:tiger = EIPWMUPCJO54Y533RQLPEU3CCF5DPKNWEIJZGDI | 199398
> java -jar jacksum.jar -E base32 -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" jacksum.jar
md5 = SZTPLYTDFUC3QBXHQLL5KCCV5A
ed2k = 2ZD77XUGHZB6ABQBUYWLWMJT7Q
tree:tiger = EIPWMUPCJO54Y533RQLPEU3CCF5DPKNWEIJZGDI
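To check that the hex and Base32 outputs really are the same digest in two encodings, the strings can be re-encoded outside of Jacksum. The following is a minimal Python sketch, not part of Jacksum. It assumes the Base32 output uses the standard RFC 4648 alphabet with the "=" padding stripped, which is consistent with the sample outputs in this thread. The function names are made up for illustration.

```python
import base64


def hex_to_base32(hex_digest):
    """Re-encode a hex digest string as unpadded Base32 (RFC 4648 alphabet)."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def base32_to_hex(b32_digest):
    """Re-encode an unpadded Base32 digest string as lowercase hex."""
    # b32decode expects the input length to be a multiple of 8, so restore padding.
    padding = "=" * (-len(b32_digest) % 8)
    return base64.b32decode(b32_digest + padding).hex()


# tree:tiger of MyFile.avi from the original report: combined run (hex)
print(hex_to_base32("940492242314f8053c11e8b64a04eac8f5c72ede9139a26c"))
# SQCJEJBDCT4AKPAR5C3EUBHKZD24OLW6SE42E3A

# md5 of jacksum.jar from the Base32 example
print(base32_to_hex("SZTPLYTDFUC3QBXHQLL5KCCV5A"))
# 9666f5e2632d05b806e782d7d50855e8
```

The first result is exactly the Base32 value that the single-algorithm run printed for MyFile.avi, so the combined run did not distort the checksum, it only printed it in hex.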
Regards
-jonelo
Thank you for the support!
My fault, sorry for the carelessness. Still, this behaviour of the program bothers me somewhat. I think that the lack of a way to specify the checksum output for each algorithm in mixed mode (or to change the default output) greatly narrows its applicability. After all, mixed mode reduces the cost of computation when many hashes are needed for the same file at once.
Today I found a sadder detail: I cannot compute a tree:tiger hash for a big file, larger than 1 gigabyte. The only result is this message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I tested this on several platforms (Windows, Linux). I see the problem every time, and only with the tree:tiger algorithm.
Perhaps I have misunderstood something again?
Thank you for the feature requests.
With respect to the 1st issue, I could imagine something like this:
-a <algo1>+<algo2>+<algo3>+...
-E <encoding1>,<encoding2>,<encoding3>,...
to be able to set the encoding (including default) for each algorithm in mixed mode.
This feature would have an impact on other features. Consider a checksum of 9999: nobody can tell whether it is hex or decimal, for example. Therefore new tokens (#ENCODING and #ENCODING{i}) must be added for the option -F. Also, if option -m or -c has been chosen for verifying the integrity of a combined digest, Jacksum must take the possibility of multiple encodings into account. Did I forget something else?
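The ambiguity can be made concrete with a few lines of Python. This is only an illustration, not Jacksum code. The encoding labels and the function name are assumptions chosen for the example, and only the character set of each encoding is checked.

```python
import re


def plausible_encodings(checksum):
    """Return the encodings whose alphabet could have produced this string."""
    patterns = {
        "dec": r"[0-9]+",
        "hex": r"[0-9a-fA-F]+",
        "base32": r"[A-Z2-7]+",
    }
    return [name for name, pat in patterns.items() if re.fullmatch(pat, checksum)]


print(plausible_encodings("9999"))        # ['dec', 'hex']
print(int("9999", 10), int("9999", 16))   # 9999 39321
```

The same four characters stand for two different values, so a verifier needs the encoding to be stated somewhere, which is what the extra tokens would provide.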
Would it be helpful for you to have the feature above? Can you please share your business justification?
With respect to the OutOfMemoryError, ...
While many algorithms in Jacksum have a predictable, fairly small memory requirement (even terabyte-sized files are not a problem), the tree:tiger/tree:tiger2 algorithm must keep a structure in memory whose size depends somewhat on the file size. Fortunately, the current implementation (which is derived from the public domain TigerTree reference implementation) has room for improvement. For now, please use the following workaround: set a higher Java heap size for the JVM with the Java option -Xmx. Example:
java -Xmx256m -jar jacksum.jar -a tree:tiger 1GBfile.dat
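To illustrate why a tree hash can need memory that grows with the file size, and how that might be avoided, here is a rough Python sketch. It is not Jacksum's TigerTree code. It assumes a THEX-style tree (1024-byte leaves, a 0x00 prefix for leaf hashes, a 0x01 prefix for inner nodes, an odd node promoted unchanged), and it uses SHA-1 as a stand-in because Python's hashlib does not guarantee a Tiger implementation. All function names are invented for the example.

```python
import hashlib

LEAF_SIZE = 1024  # assumed THEX-style leaf size in bytes


def _leaf(block):
    # SHA-1 is a stand-in for Tiger here
    return hashlib.sha1(b"\x00" + block).digest()


def _node(left, right):
    return hashlib.sha1(b"\x01" + left + right).digest()


def tree_hash_naive(data):
    """Keeps every leaf hash in memory: grows linearly with the input size."""
    level = [_leaf(data[i:i + LEAF_SIZE]) for i in range(0, len(data), LEAF_SIZE)]
    if not level:
        level = [_leaf(b"")]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is promoted unchanged
        level = nxt
    return level[0]


def tree_hash_streaming(blocks):
    """Keeps at most one pending hash per tree level: grows with log(leaf count)."""
    stack = []  # entries are (level, digest)
    for block in blocks:
        stack.append((0, _leaf(block)))
        # merge completed pairs as soon as two subtrees of equal height exist
        while len(stack) >= 2 and stack[-1][0] == stack[-2][0]:
            lvl, right = stack.pop()
            _, left = stack.pop()
            stack.append((lvl + 1, _node(left, right)))
    if not stack:
        return _leaf(b"")
    digest = stack.pop()[1]
    while stack:  # fold the leftover subtrees from right to left
        digest = _node(stack.pop()[1], digest)
    return digest


def read_blocks(stream):
    """Yield LEAF_SIZE-byte blocks from a binary stream."""
    while True:
        block = stream.read(LEAF_SIZE)
        if not block:
            break
        yield block
```

With a file opened in binary mode, `tree_hash_streaming(read_blocks(f))` produces the same root as the naive version while holding only a handful of digests at a time, so the heap requirement no longer depends noticeably on the file size.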
To summarize, there are two new feature requests:
1) Allow selecting the encoding for each algorithm in mixed mode
2) Decrease the memory requirement of the TigerTree class
What do you think?
Thanks,
-jonelo
I was just about to start a new thread regarding the first issue; multiple encodings in mixed mode. I'm playing around with disk cataloging software packages that use an SQL database for their back end. Unfortunately, they only support one or two hash algorithms internally (if at all) and often not the ones I want. However, some support custom user fields. I was planning on using Jacksum to output the desired hashes to a file. Then write a script that would use that file to plug the results into those fields.
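Until per-algorithm encodings exist, such a script could work from the normalized hex output and re-encode each digest itself. The following Python sketch assumes the output was produced with a format like -F "#ALGONAME{i} = #CHECKSUM{i}" (one "algo = hexdigest" line per algorithm, as in the examples earlier in this thread). The function names and the mapping of algorithms to encodings are hypothetical.

```python
import base64


def encode_digest(raw, encoding):
    """Encode raw digest bytes as 'hex' or unpadded 'base32'."""
    if encoding == "hex":
        return raw.hex()
    if encoding == "base32":
        return base64.b32encode(raw).decode("ascii").rstrip("=")
    raise ValueError("unsupported encoding: " + encoding)


def reencode(lines, wanted):
    """Map 'algo = hexdigest' lines to {algo: digest in the wanted encoding}.

    Algorithms missing from `wanted` keep the hex encoding.
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        algo, _, hex_digest = line.partition(" = ")
        raw = bytes.fromhex(hex_digest)
        result[algo] = encode_digest(raw, wanted.get(algo, "hex"))
    return result


sample = [
    "md5 = d355fa7a1edbc4db3e9e1a46b0a1f8be",
    "ed2k = e2879467258cb2ad2193e5c8a64d18d8",
    "tree:tiger = 940492242314f8053c11e8b64a04eac8f5c72ede9139a26c",
]
fields = reencode(sample, {"tree:tiger": "base32"})
print(fields["tree:tiger"])  # SQCJEJBDCT4AKPAR5C3EUBHKZD24OLW6SE42E3A
```

The resulting dictionary can then be written to whatever custom fields the cataloging software offers. The file still only has to be read once, which keeps the main benefit of mixed mode.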
Using your example above, the command would look something like this:
jacksum -O <output file> -a md5+ed2k+sha1+sha1 -E hex,hex,hex,base32 -r -F "#FILENAME{PATH} #FILENAME{NAME} #FILESIZE #ALGONAME{i}:#CHECKSUM{i}" -w <directory>
I also thought of a couple of other suggestions pertaining to your concerns about the impact of allowing multiple encodings. What if the algorithm "-a" switch also took parameters for the encoding instead of a separate switch? For example, a command that looked like this:
jacksum -a sha1,hex,base32+md5 -r -F "#ALGONAME{i}#CHECKSUM{i} #FILENAME{NAME}" test.txt
The output of the above might look something like:
hex sha1:0334c7bddaf8dc704542b86e8e2e6a7d97d7cd1b test.txt
base32 sha1:AM2MPPO27DOHARKCXBXI4LTKPWL5PTI3 test.txt
md5:fbaae6f0d5476d816b95b9870da72086 test.txt
In this way you don't need separate #ENCODING tokens. The only time the encoding is displayed is when the 'default' is overridden. Or you could have the #ALGONAME token behave similarly to the way the #FILENAME token operates with regard to {NAME} and {PATH}. However, in the case of #ALGONAME you might use {i}, and {ENCODING} or just {e}. Maybe even a combination of {e} and {i}, depending on the desired formatting: {i}, {e}, {i,e}, {e,i}. This also solves the problem of differentiating between the different encoding methods.
As for how to get around the issue of using the "-m" and "-c" switches... Perhaps the output could resemble this:
Jacksum: Meta-Info: version=1.7.0;algorithm=multi;filesep=\;encoding=multi;
Jacksum: Comment: created with Jacksum 1.7.0, http://jacksum.sourceforge.net
Jacksum: Comment: created on Wed Jul 25 21:43:56 EDT 2007
Jacksum: Comment: os name=Windows XP;os version=5.1;os arch=x86
Jacksum: Comment: jvm vendor=Sun Microsystems Inc.;jvm version=1.6.0_01-b06
Jacksum: Comment: user dir=I:\
sha1,hex,base32 0334c7bddaf8dc704542b86e8e2e6a7d97d7cd1b,AM2MPPO27DOHARKCXBXI4LTKPWL5PTI3 35249 TEST.TXT
If I were a bit more experienced with programming, and in particular Java, I'd offer to lend a hand... Sadly, I can only make suggestions at this time.
Issue 2) has become feature request # 1693872 which I'm going to fix for the next release.
I'm waiting for your input on 1)
Thanks,
-jonelo
Many thanks for taking part, and sorry for the delay in answering.
1) Allow selecting the encoding for each algorithm in mixed mode
I fully support your proposals, including the resulting need to introduce additional tokens for the option -F.
The only thing I would like to see changed in the program's behaviour is this: in mixed mode, the output format for each algorithm should stay the same as if the results had been obtained in separate runs, unless formats have been requested explicitly with the option -E <encoding1>,<encoding2>,<encoding3>,...
Likewise, it would probably be good to apply one uniform format to all algorithms in mixed mode if a single encoding has been specified with the option -E <encoding for all>.
The practical benefit of all of the above, as far as mixed mode is concerned, is obvious to me: the program code that interacts with Jacksum becomes simpler. As I have said, using mixed mode noticeably saves machine resources and time in most of its use cases (I hope this will not get worse in new versions).
2) Decrease the memory requirement of the TigerTree class
I welcome this, thank you! My main use of Jacksum is in a server application that runs at low priority (in the background), where high memory requirements are extremely undesirable.
On the whole, I wish to express my profound gratitude for your program Jacksum! And TigerTree support appeared just when it became a necessity for me too. :)
Best regards
- Oleg Dyakun
Hi Oleg, it has been a while, ...
I have moved to github and I have just released Jacksum 3.
See also https://github.com/jonelo/jacksum
Release notes and download: https://github.com/jonelo/jacksum/releases/tag/v3.0.0
With respect to your request 1) the solution with Jacksum 3 is this:
Issue 2) has been fixed as well.
Thanks again for the feature request ... good things take time, you know ;-)
Have fun & kind regards,
Johann