
Creating PAR2 from AVI/MPG instead of RAR

Peter C
2003-10-06
2004-01-05
  • Peter C

    Peter C - 2003-10-06

    Some posters have told me about an interesting new way of using PAR2 files when posting on UseNet that removes the need for downloaders to use a file joiner to recover the original AVI/MPG etc.

    The technique described below does not work if RAR is used as the method of splitting files.

    The normal method of using both PAR1 and PAR2 is to take the original video and split it (using RAR/ACE/HJSplit etc), then create PAR1 or PAR2 files from the split files. When the split files are downloaded (and repaired if necessary using the PAR1 or PAR2 files), they must be recombined to obtain the original video.

    The method I have been told of is as follows:

    1) Create PAR2 files from the original video (using a block size that is either equal to or an exact multiple of the article/segment size).

    2) Create split files from the original video (using a pure splitter such as HJSplit or MasterSplitter and NOT WinRAR). It is vital in this instance to ensure that the split size used is an exact multiple of the par2 block size.

    3) Post the par2 files and the split files.
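
    Step (2) above can be sketched as follows. This is a minimal illustration, not how HJSplit or MasterSplitter work internally: the function name `split_on_block_boundaries`, its parameters, and the `.001`-style piece names are assumptions chosen for the example. The piece size is built as `block_size * blocks_per_piece`, so it is an exact multiple of the par2 block size by construction.

```python
import os

def split_on_block_boundaries(path, block_size, blocks_per_piece):
    """Split `path` into pieces whose size is an exact multiple of block_size.

    Hypothetical helper for illustration; a whole piece is held in memory,
    which a real splitter would avoid for large files.
    """
    if block_size <= 0 or blocks_per_piece <= 0:
        raise ValueError("block_size and blocks_per_piece must be positive")
    piece_size = block_size * blocks_per_piece  # exact multiple of the block size
    pieces = []
    index = 1
    with open(path, "rb") as src:
        while True:
            chunk = src.read(piece_size)
            if not chunk:
                break
            piece_path = f"{path}.{index:03d}"  # e.g. video.avi.001
            with open(piece_path, "wb") as dst:
                dst.write(chunk)
            pieces.append(piece_path)
            index += 1
    return pieces
```

    Only the last piece may be shorter than the others; every other piece starts and ends on a block boundary, which is what lets a par2 client recognise whole blocks inside the pieces.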

    When someone downloads the small par2 file plus the split files and uses QuickPar, they will be told that the video file is missing but that sufficient useable data has been found in the split files to be able to carry out a repair without the need for any more par2 files.

    Obviously if some of the split files are incomplete then QuickPar will say that some recovery blocks are required, and once they have been downloaded, repair will be possible.

    The important thing here from the downloader's point of view is that QuickPar both repairs and combines all of the split parts in a single operation.

     
    • Nobody/Anonymous

      This is stupid

       
      • Peter C

        Peter C - 2003-10-24

        Why is this idea stupid?

        This idea is in use in at least one newsgroup and is working very nicely.

        One of the problems with creating PAR/PAR2 files from RAR files is that whilst it allows you to "repair" or "reconstruct" damaged or missing RAR files, it does not guarantee that you will then be able to successfully extract the original AVI/MPG from the RAR files.

        I have seen a number of plaintive messages from people saying that they have used the MD5/SFV/PAR/PAR2 file to verify that all of the RAR files are ok, but that WinRAR says they are not. Clearly when the original poster created the RAR files some form of corruption occurred and that was never detected before the files were posted.

        If you use a simple file splitter (such as HJSplit, MasterSplitter etc) and create the PAR2 files from the original AVI/MPG then even if some form of corruption occurs during the splitting operation, QuickPar will be able to repair the effects and give you a good AVI/MPG file.

         
    • Lord

      Lord - 2003-11-01

      Hi Peter.

      I quote you:

      "some form of corruption occurred and that was never detected before the files were posted"

      As far as I know, this problem only occurs when the person who made the RAR files didn't test them.

      When you create a RAR archive, there is an option you can tick so that WinRAR automatically tests all the files at the end of the process. Of course you can test manually afterwards, but you may forget to, and this is what happens to people who never test their archives (RAR, ACE, ZIP or whatever). Before posting 700 MB, the least you can do is test your work. There is no hurry to post, so take your time.

      Even if you implement the solution you are talking about, posters will still have to test their work, or at least QuickPar should offer an option to test the files, because the splitter (HJSplit or similar) can make an error.

      It is the same when you create PAR2 files. At the end I always click on "Verify" because an error may have occurred and I don't want people complaining about my PAR2 files being corrupted. ;)

      By the way, you should think about adding an option to QuickPar to let the user automatically verify a set of PAR2 files. We may forget to verify them ourselves, the eternal problem. ;)

       
      • Peter C

        Peter C - 2003-11-01

        > Also it is same when you create PAR2 files. At the end I always click on
        > "Verify" because an error may have occur and I don't want people complaining
        > about my PAR2 files being corrupted. ;)

        > By the way you should think to add an option within QuickPAR to let the user
        > to automatically verify set of PAR2 files. We may forget to verify them
        > ourselves, the eternal problem. ;)

        I definitely have automatic verification after par2 creation on my list of things to do, but there is a problem.

        The problem is that QuickPar's verification procedure is designed to verify the source files, and not to verify the par2 files.

        As you may know, a par2 file is a collection of packets of data, and every packet has its own checksum so that they can be individually checked for corruption.

        The problem is that whilst the checksum can be used to determine if a packet has been corrupted, it cannot determine if the data in the packet was good in the first place (i.e. when it was first created).

        When you use QuickPar to verify, the packets containing verification data get used, but those containing repair data do not.

        If the verification of the original source files succeeds, then that implies that the verification packets in the par2 files must also contain good data. But since the repair packets never get used, this does not prove that they contain good data as well.

        So, whilst getting QuickPar to verify the source files guarantees that downloaders will be able to verify as well, they might not be able to repair.

        The only way to prove that the recovery data is actually good recovery data is to have QuickPar carry out a repair that uses all of the data. At present, to do this you must remove an appropriate quantity of source files so that QuickPar needs all of the recovery blocks to do a repair and let it do it. I plan to add a function to allow QuickPar to simulate such a repair.

        The problem with this is of course that the repair will take as long to carry out as it took to create the par2 files in the first place. I am therefore looking into some of the mathematics involved to see if there is a way to verify that the recovery data is good without the need to actually do a repair.

        PS I don't want anyone getting the mistaken impression that this problem is new. The problem also exists with PAR1. When FSRaid verifies freshly created PAR files, it confirms that they can be used to verify source files, but not that they can be used to repair.
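
        The limitation described above (a packet checksum detects later damage, but cannot tell whether the data was good when it was created) can be illustrated with a simplified model. This is not the real par2 packet layout: `make_packet` and `packet_is_intact` are hypothetical names, and the "packet" is just a body plus its MD5.

```python
import hashlib

def make_packet(body):
    """Wrap `body` with a checksum taken at creation time (simplified model)."""
    return {"body": body, "md5": hashlib.md5(body).digest()}

def packet_is_intact(packet):
    """True if the body still matches the checksum stored with it."""
    return hashlib.md5(packet["body"]).digest() == packet["md5"]

# Simulate a memory fault that alters the recovery data BEFORE it is checksummed.
intended_body = b"correct recovery data"
faulty_body = b"corrXct recovery data"
packet = make_packet(faulty_body)

# The checksum was computed over the already-bad data, so the packet looks fine.
print(packet_is_intact(packet), packet["body"] == intended_body)

# Damage introduced AFTER creation (e.g. in transit) is detected.
damaged = dict(packet, body=b"corrXct recovery dat!")
print(packet_is_intact(damaged))
```

        The first packet passes its own check even though its contents are wrong, which is why a successful verify says nothing about whether a repair would succeed.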

         
        • Lord

          Lord - 2003-11-02

          "But since the repair packets never get used, this does not prove that they contain good data as well."

          Is the checksum built from the repair data? You say that QuickPar uses this checksum to quickly see whether everything is good. I would therefore conclude that if the checksum is good, so is the repair data.

          On the other hand, you say that while QuickPar was making the PAR2 files, the repair data may have been corrupted, and the checksum built from that damaged repair data then misleads the user. The user verifies his PAR2 files, QuickPar answers that everything is good, but in fact the repair data is bad.

          Have I understood that correctly, or am I wrong? Because I'm a bit at a loss when you say:

          "The problem is that QuickPar's verification procedure is designed to verify the source files, and not to verify the par2 files."

          QuickPar doesn't verify the PAR2 files at all? I'm lost. ;)

          "The problem also exists with PAR1. When FSRaid verifies freshly created PAR files, it confirms that they can be used to verify source files, but not that they can be used to repair."

          Like thousands of other people, I imagine, I thought that FSRaid and SmartPAR verified the whole of the freshly created PAR files. I see.

          Of course the risk of getting corrupted repair data is very small and generally arises on unstable machines (bad RAM, etc.). But since the risk exists, you want to eliminate it if possible. However, as you say, checking the whole set takes ages except on high-end systems. I can't imagine anyone creating PAR2 files and then running a verification task for another 30 minutes when creation already took so long. ;)

          However, take WinRAR. When you add a 700 MB movie and cut it into several pieces, WinRAR doesn't take ages to test the full set of archives. Why should it be different with PAR2 files? Unless WinRAR only verifies checksums too?

          An idea, though maybe an impossible one: you take the original file(s) and create a CRC checksum (something similar to QuickSFV). Then you create the PAR2 files as we do now.

          Is it possible to quickly find the original checksum by adding all the small checksums together? Maybe it's mathematically impossible, so show me the corner. ;)

           
          • Peter C

            Peter C - 2003-11-02

            At present, the only confirmed cause of failed repairs is when either the person doing the repair has bad RAM, or the person who created the par2 files has bad RAM.

            When QuickPar creates a recovery packet and writes it to a par2 file, the following steps are involved:

            1) A block of data is read from one of the source files into a memory buffer.

            2) The md5 hash and crc32 checksum of the block of data are calculated and stored in the verification packet for that file.

            3) For each recovery block being created, the data in the input buffer is processed through a 16 bit lookup table and the result is xor'ed with the data in the recovery block.

            4) Once all blocks of source data have been processed, the md5 hash of every block of recovery data is calculated and written to the header of each recovery packet.

            5) Each recovery packet is written out to the par2 file.

            6) Finally the checksums of the verification packets are calculated, and they are written to the par2 files as well.

            At each of these six steps, a memory fault could corrupt the data.

            A memory fault at either step (1) or step (2) will result in par2 files which report that the original source files are corrupt.

            A memory fault at step (3) will result in par2 files which will report that the source files are good, but they will be unable to repair damaged files.

            A memory fault at steps (4), (5), or (6) will result in par2 files which QuickPar reports as damaged.

            The only way to be 100% certain that there has been no corruption introduced at step (3) is to actually use the par2 files to do a repair where every recovery block is used.

            I am currently looking into mathematics to see if there is some shortcut way of checking that the data in the recovery blocks is good or not.
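
            Step (3) above can be sketched as a multiply-and-XOR over 16-bit words. This is only an illustration under stated assumptions: it uses a slow shift-and-XOR multiply in GF(2^16) rather than the lookup table mentioned in step (3), it assumes the field polynomial 0x1100B given in the PAR2 specification and little-endian words, and the names `gf_mul`, `words`, `accumulate` and the coefficient values are hypothetical.

```python
FIELD_POLY = 0x1100B  # GF(2^16) polynomial from the PAR2 specification

def gf_mul(a, b):
    """Multiply two 16-bit values in GF(2^16) by shift-and-XOR (no lookup table)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10000:      # reduce as soon as the value overflows 16 bits
            a ^= FIELD_POLY
        b >>= 1
    return result

def words(block):
    """Interpret a block of bytes as little-endian 16-bit words (assumed layout)."""
    return [int.from_bytes(block[i:i + 2], "little") for i in range(0, len(block), 2)]

def accumulate(recovery, source_block, coefficient):
    """XOR (source word * coefficient) into the recovery buffer, word by word."""
    for i, word in enumerate(words(source_block)):
        recovery[i] ^= gf_mul(word, coefficient)

# Two tiny source blocks folded into one recovery block.
recovery = [0, 0]
accumulate(recovery, b"\x01\x00\x02\x00", 1)  # hypothetical coefficient
accumulate(recovery, b"\x03\x00\x04\x00", 1)  # hypothetical coefficient
print(recovery)
```

            A single flipped bit in the input buffer or the recovery buffer during this loop changes the recovery block without affecting any of the source checksums, which is the step (3) failure described above.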

             
            • Peter C

              Peter C - 2003-11-02

              OK, I've looked at the maths and have the following result:

              source blocks * reed solomon = recovery blocks
              recovery blocks * crc checksum = recovery checksums

              and also

              source blocks * crc checksum = source checksums
              source checksums * reed solomon = recovery checksums

              i.e. the two sets of operations give the same values.

              So all I need to do is use both of the two independent methods of calculating the checksums of the recovery blocks and confirm that they give the same result. If the results are different, then there must have been an in memory corruption during one of the two calculations.
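
              The idea of computing the recovery checksums by two independent routes can be demonstrated with a toy model. The checksum here is a plain XOR of all 16-bit words, used as a stand-in because it is linear over the same field; it is not the checksum QuickPar uses, and `combine`, `checksum`, `recovery_checksum_matches` and the coefficient values are hypothetical. The field polynomial 0x1100B is the one given in the PAR2 specification.

```python
from functools import reduce

FIELD_POLY = 0x1100B  # GF(2^16) polynomial from the PAR2 specification

def gf_mul(a, b):
    """Multiply two 16-bit values in GF(2^16) by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x10000:
            a ^= FIELD_POLY
        b >>= 1
    return result

def combine(blocks, coeffs):
    """Word-wise sum of coeff * block: one Reed-Solomon-style recovery row."""
    out = [0] * len(blocks[0])
    for block, coeff in zip(blocks, coeffs):
        for i, word in enumerate(block):
            out[i] ^= gf_mul(word, coeff)
    return out

def checksum(block):
    """Toy linear checksum: XOR of every word in the block."""
    return reduce(lambda x, y: x ^ y, block, 0)

def recovery_checksum_matches(sources, coeffs, recovery):
    """Compare checksum(recovery) with the same combination of source checksums."""
    expected = combine([[checksum(block)] for block in sources], coeffs)[0]
    return checksum(recovery) == expected

sources = [[0x0001, 0x1234, 0xBEEF], [0x00FF, 0x8000, 0x4242]]
coeffs = [0x0002, 0x1D0F]  # hypothetical coefficients
recovery = combine(sources, coeffs)
print(recovery_checksum_matches(sources, coeffs, recovery))  # routes agree

recovery[1] ^= 0x0040  # simulate a single-bit memory fault in the recovery block
print(recovery_checksum_matches(sources, coeffs, recovery))  # routes disagree
```

              The two routes agree because multiplication distributes over XOR in the field, so any checksum that is linear in that sense commutes with the Reed-Solomon combination. An XOR fold is weak (two faults in the same bit position cancel), so a real implementation would want a stronger linear checksum.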

               
              • Lord

                Lord - 2003-11-02

                Thanks for the great explanations. :)

                 
                • Peter C

                  Peter C - 2003-11-02

                  I've now written code for this (in a test program), and it seems to work very nicely.

                  I will incorporate it into the main program so it will be in 0.8. The code will operate both when creating par2 files and also when repairing using either par1 or par2 files. You'll get a message such as "Create Failed: Memory fault detected".

                  I also plan to write a basic memory tester which will do a quick cursory check of the main output buffer before it starts reading data from the source files.

                   
    • Bill Davidson

      Bill Davidson - 2003-11-08

      Hmmm.  I was surprised by this.  I tried it out though (QuickPar 0.5) and as you say, it works perfectly.

      The part that's throwing me is how does QuickPar know to look in the split files for data blocks?  Does it look for split files specifically or does it scan all files in the directory for missing blocks?

      I tried it with par2cmdline (0.2) and it does not do this.  A separate file joiner is still needed for par2cmdline users.

       
      • Peter C

        Peter C - 2003-11-08

        The reason QuickPar finds split files when trying to verify the original is simply because I wrote it to automatically scan all files with names of the form filename.ext.NNN, filename.ext.NNN-MMM, and filename-NNN.ext to see if they contain data for the file.

        The reason I wrote it to do this is so that it would automatically detect the parts of an incomplete file downloaded using BNR (which uses that filename convention).
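
        The three filename forms mentioned above can be matched with regular expressions along these lines. This is a sketch of the naming rule only, not QuickPar's code: `split_part_patterns` and `find_candidate_parts` are hypothetical names, and accepting any number of digits for NNN and MMM is an assumption.

```python
import os
import re

def split_part_patterns(target_name):
    """Build patterns for files that may hold pieces of `target_name`."""
    stem, ext = os.path.splitext(target_name)
    full = re.escape(target_name)
    return [
        re.compile(rf"^{full}\.\d+$"),                             # filename.ext.NNN
        re.compile(rf"^{full}\.\d+-\d+$"),                         # filename.ext.NNN-MMM
        re.compile(rf"^{re.escape(stem)}-\d+{re.escape(ext)}$"),   # filename-NNN.ext
    ]

def find_candidate_parts(target_name, directory_listing):
    """Return the names in `directory_listing` worth scanning for data blocks."""
    patterns = split_part_patterns(target_name)
    return [name for name in directory_listing
            if any(p.match(name) for p in patterns)]

listing = ["movie.avi.001", "movie.avi.002-005", "movie-003.avi",
           "movie.avi.par2", "other.avi.001"]
print(find_candidate_parts("movie.avi", listing))
```

        Matching by name only selects candidates; whether a candidate really contains data for the file is still decided by checking its contents against the verification data.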

         
    • Mark C

      Mark C - 2003-11-10

      No fork intended, but it seemed relevant.

      I have noticed a new trend of posting .torrent files alongside some .avi files on usenet.

      To my eye this is an excellent idea, especially since BitTorrent can recover from incomplete downloads.

      However (and this is the point of tagging onto this thread), this only works if the torrent lists the same files as the usenet post. So far I have never seen this,

      i.e. the USENET post is a typical RAR and PAR set but the torrent is for the source AVI.

      So the more PAR2 can do to standardise this divergence before it gets properly started, the better.

      I leave it to the more knowing to suggest if this is sensible and any potential solutions.

       
      • Peter C

        Peter C - 2003-11-10

        This is an interesting concept, but it is completely pointless if the .torrent file is for the AVI file rather than the RAR set.

        If you have a RAR set for a single file and you are missing one of the RAR files from the middle of the set, then you can only extract the beginning of the file. Even with the option to retain damaged files, WinRAR won't permit you to extract beyond the missing RAR file (even if the RAR file is created with no compression).

        On the other hand, if the .torrent is created for the RAR files, it should work quite nicely. You download all the complete and incomplete RAR files, use the .torrent to download everything you are missing (giving you a good RAR set), and then extract the AVI.

        Of course the poster does have to make the RAR set available via BT in addition to posting it on UseNet. They also have to be very careful not to post the .torrent file too early or else they may find people simply using that to download the whole thing rather than using UseNet to get the bulk of the set and BT only for fills.

         
    • Mark C

      Mark C - 2003-11-10

      I see your point.

      So far I am seeing only torrents of the AVI, although there is no doubt they could be for a set of RARs just as easily. If we stick to movie posts for a minute... perhaps I am wrong, but does RAR actually gain you anything here? You get very little compression as MPG is compressed already, and for recovery PAR2 is way superior to RAR anyway. Could the file set not be cut by the same tool that creates the PARs (in this case QuickPar)?

      I don't want to be suggesting things just for the sake of it, but to my eye there is a convergence of thoughts here that could be interesting.

      Anything that minimises the need for reposts and gives another mechanism for sharing and bandwidth distribution can't be a bad thing.

       
      • Peter C

        Peter C - 2003-11-10

        The aim of both PAR1 and PAR2 has been to minimise the need for reposts.

        Their great advantage is that you can see before you start to download that there will be sufficient recovery data available to effect a repair (providing that too many of the files do not expire before you can download them).

        If no PAR1 or PAR2 files were posted at all and you were expected to use BT to complete a file, then there would be no way to tell in advance whether or not you would ever get to complete a file. Looking at the uploader/downloader graphs for BT clearly shows that early downloaders are ok, but late downloaders are much less likely to find others online that can peer the file. This would ultimately lead to them posting fill requests.

         
      • Bill Davidson

        Bill Davidson - 2003-11-12

        Yeah, rar gets very little compression (maybe 5%?) on most binary files posted because those files are already compressed.  yEnc (vs. uuencode/base64) provides more bandwidth savings on USENET postings than rar.  Some of these formats also have built-in error tolerance which means they could still mostly work with some missing pieces in the middle but with rar, once you get one bad piece, you're done.  Split files really make more sense for most binary USENET postings.  A video file does not have to be perfect and even if it does, PAR2 can make sure it is.

         
    • Mark C

      Mark C - 2003-11-12

      So there is some validity in thinking that the par tool could package a whole post into split files and par2 files, then.

      Peter, can you give us your learned opinion? Is this worth considering, or is it fixing a problem that is already fixed?

       
      • Peter C

        Peter C - 2003-11-12

        I will be implementing a split function.

        When used, the par2 files will be created from the unsplit version of the files, splitting will take place on block boundaries, and when verifying: if no errors are detected, the "Repair" button will be re-labelled as "Join".

        This will remove the need for the poster to use a splitter and the downloader to use an unsplitter.
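
        Why splitting on block boundaries makes a "Join" possible can be sketched as follows. This is a simplified model, not QuickPar's algorithm: it only looks at block-aligned offsets inside each piece, it assumes the verification data is a list of per-block MD5 hashes with the final short block zero-padded before hashing, and `block_hashes` and `join_from_pieces` are hypothetical names.

```python
import hashlib

def block_hashes(data, block_size):
    """MD5 of each block of `data`; the final short block is zero-padded."""
    return [hashlib.md5(data[i:i + block_size].ljust(block_size, b"\0")).digest()
            for i in range(0, len(data), block_size)]

def join_from_pieces(expected_hashes, file_length, pieces, block_size):
    """Rebuild the original from pieces that were split on block boundaries.

    Returns (joined_bytes, []) on success or (None, missing_block_numbers)
    when some blocks were not found and recovery data would be needed.
    """
    found = {}
    for piece in pieces:
        for i in range(0, len(piece), block_size):
            block = piece[i:i + block_size].ljust(block_size, b"\0")
            found.setdefault(hashlib.md5(block).digest(), block)
    missing = [n for n, h in enumerate(expected_hashes) if h not in found]
    if missing:
        return None, missing
    joined = b"".join(found[h] for h in expected_hashes)
    return joined[:file_length], []  # drop the zero padding of the last block
```

        Because every piece starts on a block boundary, each block in a piece hashes to exactly one of the expected values, so the pieces can be supplied in any order. If a piece is absent, the function reports which block numbers would have to be reconstructed from recovery blocks.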

        If the poster still wants to use RAR for compression, they would be advised not to use it to split.

        I may also consider adding an UnRAR feature to QuickPar, but if I do it will use the UnRAR.DLL from RarSoft rather than writing my own code.

         
    • Mark C

      Mark C - 2003-11-13

      good news.

      When looking at join, can you consider a scenario I see quite a lot...

      Real example: this morning I had a set of files that was incomplete. On manual examination I found that I had enough parts to complete it, but some files had not been joined by my downloading software. In this particular case it was two par1 files.

      Could QuickPar look for and deal with joining parts in this way as well?

       
      • Peter C

        Peter C - 2003-11-13

        fat,

        Could you give me an example of how the par1 files ended up being named?

        If possible I want to enable QuickPar to recognise the way newsreaders name the files in these circumstances.

         
    • Mark C

      Mark C - 2003-11-16

      Peter, it's BNR, so

      sample.par.01-02
      sample.par.03
      sample.par.04-23

      etc

       
    • Nobody/Anonymous

      Hi Peter, just a quick point (which you may already have considered) :o)

      when you said:

      "...and when verifying: if no errors are detected, the "Repair" button will be re-labelled as "Join"."

      ... could this "repair" button/feature also offer an ability to choose the destination folder for where the full "joined" file would end up?

      e.g. just in case the current drive location is too full; it sure would save people having to move files around just to make enough space for the final joined version :o)

       
      • Peter C

        Peter C - 2004-01-04

        The "repair" button now has three possible labels:

        1) "Repair" if any data blocks are missing and need to be reconstructed using the recovery data in the par2 files.

        2) "Rejoin" if all data blocks have been found but they are not arranged correctly.

        3) "Renamed" if all data blocks have been found and are arranged correctly but some of the files have the wrong name.
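
        The three cases above amount to a small decision rule, sketched here. The function name `action_label` and its parameters are hypothetical, and returning `None` when nothing needs doing is an assumption, since the post does not say what the button shows in that case.

```python
def action_label(missing_blocks, misarranged, misnamed):
    """Pick the button label from the verification result (illustrative only)."""
    if missing_blocks > 0:
        return "Repair"    # blocks must be reconstructed from recovery data
    if misarranged:
        return "Rejoin"    # all blocks found, but not in the right places
    if misnamed:
        return "Renamed"   # data is complete and in order, only names differ
    return None            # assumption: nothing to do

print(action_label(missing_blocks=0, misarranged=True, misnamed=False))
```

        The checks are ordered from most to least work required, so a set that is both incomplete and misnamed is treated as needing a repair.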

        I will have to think about the problem of running out of disk space.

         
    • Nobody/Anonymous

      ahh ok, sure no problem.
      & thanks for reading though in the meantime. :o)

       
