tree:tiger(2) as a combined checksum bug

Oleg
2007-03-30
2021-09-04
  • Oleg

    Oleg - 2007-03-30

    I have found a bug in Jacksum 1.7.0 when computing combined checksums.
    I tried many combinations, including the option "-a all", but the tree:tiger(2) checksum always comes out different whenever more than one algorithm is selected.

    Example console session:

    D:\>java -version
    java version "1.5.0_11"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_11-b03)
    Java HotSpot(TM) Client VM (build 1.5.0_11-b03, mixed mode)

    D:\>java -jar jacksum.jar -a tree:tiger -F "#ALGONAME = #CHECKSUM | #FILESIZE" MyFile.avi
    tree:tiger = SQCJEJBDCT4AKPAR5C3EUBHKZD24OLW6SE42E3A | 4975347

    D:\>java -jar jacksum.jar -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" MyFile.avi
    md5 = d355fa7a1edbc4db3e9e1a46b0a1f8be
    ed2k = e2879467258cb2ad2193e5c8a64d18d8
    tree:tiger = 940492242314f8053c11e8b64a04eac8f5c72ede9139a26c

    Thanks!

     
    • Johann N. Löfflmann

      Hi Oleg,

      Thank you for the report. However, this is not a bug; it is expected behaviour.

      The default encoding of Tiger/Tiger2 is BASE32. If you use "all" or a plus character (more than one algorithm), the default encoding of each single algorithm is ignored and a hexadecimal encoding is used instead for all algorithms (normalized output).

      It is documented:

      > java -jar jacksum.jar -h -a
          -a algo       [...] As soon as "all" or a plus character is used,
                        the output is normalized with a hex checksum and a decimal
                        filesize. Examples: "sha+", "md5+"
                        [...]

      You can change the default encoding with option -E.

      Examples:

      Hex encoding:
      -------------
      > java -jar jacksum.jar -a tree:tiger -E hex -F "#ALGONAME = #CHECKSUM | #FILESIZE" jacksum.jar
      tree:tiger = 221f6651e24bbbcc777b8c16f25362117a37a9b62213930d | 199398

      > java -jar jacksum.jar -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" jacksum.jar
      md5 = 9666f5e2632d05b806e782d7d50855e8
      ed2k = d647ffde863e43e00601a62cbb3133fc
      tree:tiger = 221f6651e24bbbcc777b8c16f25362117a37a9b62213930d

      Base32 encoding
      ---------------
      > java -jar jacksum.jar -a tree:tiger -F "#ALGONAME = #CHECKSUM | #FILESIZE" jacksum.jar
      tree:tiger = EIPWMUPCJO54Y533RQLPEU3CCF5DPKNWEIJZGDI | 199398

      > java -jar jacksum.jar -E base32 -a md5+ed2k+tree:tiger -F "#ALGONAME{i} = #CHECKSUM{i}" jacksum.jar
      md5 = SZTPLYTDFUC3QBXHQLL5KCCV5A
      ed2k = 2ZD77XUGHZB6ABQBUYWLWMJT7Q
      tree:tiger = EIPWMUPCJO54Y533RQLPEU3CCF5DPKNWEIJZGDI
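
The hex and Base32 outputs above are two renderings of the same digest bytes. The following is a minimal Python sketch, not part of Jacksum. It assumes that Jacksum's Base32 output uses the standard RFC 4648 alphabet with the trailing `=` padding stripped, which is consistent with the values shown in this thread. The function names are made up for illustration.

```python
import base64


def hex_to_base32(hex_digest: str) -> str:
    """Re-encode a hex digest as unpadded, upper-case Base32 (RFC 4648 alphabet)."""
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def base32_to_hex(b32_digest: str) -> str:
    """Re-encode an unpadded Base32 digest as lower-case hex."""
    # b32decode expects the input length to be a multiple of 8 characters.
    padding = "=" * (-len(b32_digest) % 8)
    return base64.b32decode(b32_digest + padding).hex()


# tree:tiger value of jacksum.jar, copied from the hex example in this thread
print(hex_to_base32("221f6651e24bbbcc777b8c16f25362117a37a9b62213930d"))
# prints EIPWMUPCJO54Y533RQLPEU3CCF5DPKNWEIJZGDI
```

A script that compares a single-algorithm result with a combined (normalized) result could convert one of the two values this way before comparing.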

      Regards
      -jonelo

       
    • Oleg

      Oleg - 2007-03-31

      Thank you for the support!

      My fault, please forgive my carelessness. Still, this behaviour of the program upset me somewhat. Consider that the lack of any way to specify the checksum output format for each algorithm in mixed mode (or at least to keep each algorithm's default output) greatly narrows its applicability. After all, mixed mode reduces the cost of computation when many hashes of a file are needed at once.

      Today I found a sadder detail: I cannot get a tree:tiger hash for a big file, larger than 1 gigabyte. The only result is this message:

      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

      I tested this on various platforms (Windows, Linux); I have only ever encountered the problem with the tree:tiger algorithm.

      Perhaps I have misunderstood something again?

       
      • Johann N. Löfflmann

        Thank you for the feature requests.

        With respect to the 1st issue, I could imagine something like this:

        -a <algo1>+<algo2>+<algo3>+...
        -E <encoding1>,<encoding2>,<encoding3>,...

        to be able to set the encoding (including default) for each algorithm in mixed mode.

        This feature would have an impact on other features. Consider a checksum of 9999: nobody can tell whether it is hex or decimal, for example. Therefore new tokens (#ENCODING and #ENCODING{i}) must be added for the option -F. Also, if option -m or -c has been chosen for verifying the integrity of a combined digest, Jacksum must take the possibility of multiple encodings into account. Did I forget something else?
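
The idea of pairing each algorithm with its own encoding can be sketched in Python as follows. This is not Jacksum code and does not reproduce any existing Jacksum option; `combined_digests` and `ENCODERS` are hypothetical names, and only the "a+b+c" and "x,y,z" syntax is borrowed from the proposal above. The input is read once and every chunk feeds all hashers, and each result carries its encoding so that a value such as 9999 is not ambiguous.

```python
import base64
import hashlib
import io

# Encoders keyed by the encoding names used in this thread.
ENCODERS = {
    "hex": lambda raw: raw.hex(),
    "base32": lambda raw: base64.b32encode(raw).decode("ascii").rstrip("="),
}


def combined_digests(stream, algos: str, encodings: str, chunk_size: int = 65536):
    """Hash `stream` once with every algorithm in `algos` ("a+b+c") and
    render digest i with encoding i from `encodings` ("x,y,z")."""
    names = algos.split("+")
    encs = encodings.split(",")
    if len(names) != len(encs):
        raise ValueError("need exactly one encoding per algorithm")
    hashers = [hashlib.new(name) for name in names]
    # Single pass: every chunk read from the stream is fed to all hashers.
    while chunk := stream.read(chunk_size):
        for h in hashers:
            h.update(chunk)
    # Return (algorithm, encoding, value) so the encoding is always explicit.
    return [(n, e, ENCODERS[e](h.digest())) for n, e, h in zip(names, encs, hashers)]


for name, enc, value in combined_digests(io.BytesIO(b"abc"), "md5+sha1+sha1", "hex,hex,base32"):
    print(f"{name}/{enc}: {value}")
```

Returning the encoding next to each value plays the role that the proposed #ENCODING{i} token would play in the -F format string.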

        Would it be helpful for you to have the feature above? Can you please share your business justification?

        With respect to the OutOfMemoryError, ...
        While many algorithms in Jacksum have a predictable, fairly small memory requirement (even terabytes are not a problem), the tree:tiger/tree:tiger2 algorithm must keep a structure in memory whose size depends somewhat on the file size. Fortunately, the current implementation (it is actually derived from the public domain TigerTree reference implementation) has room for improvement. For now, please use the following workaround: set a higher Java heap for the JVM by specifying the Java option -Xmx. Example:

        java -Xmx256m -jar jacksum.jar -a tree:tiger 1GBfile.dat

        I summarize, there are two new feature requests:

        1) Allow selecting the encoding for each algorithm in mixed mode
        2) Decrease the memory requirement of the TigerTree class

        What do you think?

        Thanks,
        -jonelo

         
        • wanpakukozou

          wanpakukozou - 2007-07-26

          I was just about to start a new thread regarding the first issue: multiple encodings in mixed mode. I'm playing around with disk cataloging software packages that use an SQL database for their back end. Unfortunately, they only support one or two hash algorithms internally (if at all), and often not the ones I want. However, some support custom user fields. I was planning on using Jacksum to output the desired hashes to a file and then write a script that would use that file to plug the results into those fields.

          Using your example above, the command would look something like this:

              jacksum -O <output file> -a md5+ed2k+sha1+sha1 -E hex,hex,hex,base32 -r -F
              "#FILENAME{PATH} #FILENAME{NAME} #FILESIZE #ALGONAME{i}:#CHECKSUM{i}" -w <directory>

          I also thought of a couple of other suggestions pertaining to your concerns about the impact of allowing multiple encodings. What if the algorithm switch "-a" also took parameters for the encoding instead of a separate switch? For example, a command that looked like this:

              jacksum -a sha1,hex,base32+md5 -r -F
              "#ALGONAME{i}#CHECKSUM{i} #FILENAME{NAME}" test.txt

          The output of the above might look something like:

              hex sha1:0334c7bddaf8dc704542b86e8e2e6a7d97d7cd1b test.txt

              base32 sha1:AM2MPPO27DOHARKCXBXI4LTKPWL5PTI3 test.txt

              md5:fbaae6f0d5476d816b95b9870da72086 test.txt

          In this way you don't need separate #ENCODING tokens.  The only time the encoding is displayed is when the 'default' is overridden.  Or you could have the #ALGONAME token behave similarly to the way the #FILENAME token operates with regards to {NAME} and {PATH}.  However in the case of #ALGONAME you might use {i}, and {ENCODING} or just {e}.  Maybe even a combination of {e} and {i} depending on the desired formatting {i}, {e}, {i,e}, {e,i}.  This also solves the problem of differentiating between the different encoding methods. 

          As for how to get around the issue of using the "-m" and "-c" switches... Perhaps the output could resemble this:

              Jacksum: Meta-Info: version=1.7.0;algorithm=multi;filesep=\;encoding=multi;
              Jacksum: Comment: created with Jacksum 1.7.0, http://jacksum.sourceforge.net
              Jacksum: Comment: created on Wed Jul 25 21:43:56 EDT 2007
              Jacksum: Comment: os name=Windows XP;os version=5.1;os arch=x86
              Jacksum: Comment: jvm vendor=Sun Microsystems Inc.;jvm version=1.6.0_01-b06
              Jacksum: Comment: user dir=I:\
              sha1,hex,base32 0334c7bddaf8dc704542b86e8e2e6a7d97d7cd1b,AM2MPPO27DOHARKCXBXI4LTKPWL5PTI3 35249 TEST.TXT

          If I were a bit more experienced with programming, and in particular Java, I'd offer to lend a hand... Sadly, I can only make suggestions at this time.

           
    • Johann N. Löfflmann

      Issue 2) has become feature request # 1693872 which I'm going to fix for the next release.
      I'm waiting for your input on 1)

      Thanks,
      -jonelo

       
    • Oleg

      Oleg - 2007-04-04

      Many thanks for your involvement, and sorry for the delay in answering.

      1) Allow selecting the encoding for each algorithm in mixed mode

      I fully support your proposals, including the resulting need to introduce additional tokens for the option -F.

      There is one thing I would like to see changed in the program's behaviour: in mixed mode, the output format of each algorithm should stay the same as if the results had been obtained in separate runs, unless specific formats have been requested with the option -E <encoding1>,<encoding2>,<encoding3>,...

      It would probably also be good to apply a uniform format to all algorithms in mixed mode if a single format has been specified with the option -E <encoding for all>.

      The practical benefit of all of the above, as far as mixed mode is concerned, is obvious to me: programs that interact with Jacksum become simpler. As I have said, using mixed mode noticeably saves machine resources and time in most of its use cases (I hope this will not get worse in new versions).

      2) Decrease the memory requirement of the TigerTree class

      Greetings and thanks! My main use of Jacksum is in a server application running at low priority (in the background), where high memory requirements are highly undesirable.

      On the whole, I wish to express my profound gratitude for your program Jacksum! And TigerTree support appeared just when I came to need it. :)

      Best regards
      - Oleg Dyakun

       
  • Johann N. Löfflmann

    Hi Oleg, it has been a while, ...

    I have moved to github and I have just released Jacksum 3.

    See also https://github.com/jonelo/jacksum
    Release notes and download: https://github.com/jonelo/jacksum/releases/tag/v3.0.0

    With respect to your request 1) the solution with Jacksum 3 is this:

    jacksum -s \n -a sha1+sha1+md5 -F "#ALGONAME{0}/hex: #CHECKSUM{0,hex} #FILENAME{NAME}#SEPARATOR#ALGONAME{1}/base32: #CHECKSUM{1,base32} #FILENAME{NAME} #SEPARATOR#ALGONAME{2}/base64: #CHECKSUM{2,base64} #FILENAME{NAME}#SEPARATOR" *.txt
    

    Issue 2) has been fixed as well.

    Thanks again for the feature request ... good things take time, you know ;-)

    Have fun & kind regards,
    Johann

     
