
partial PAR files

2001-10-16
2001-11-01
  • Olaf Seibert

    Olaf Seibert - 2001-10-16

    Suppose I have 10 data files, and from one of them I am missing 10% (and I know which part). Then in theory I only need the corresponding 10% of one PAR file. So in the case of uuencoded split .rar files I would only need to get one segment to recreate my missing data segment (or maybe also the first segment of the PAR file, to get its header).
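
    A minimal sketch of why only the matching slice is needed, using plain XOR parity as a stand-in for the Reed-Solomon coding PAR actually uses (the property it shows - parity is computed position by position, so rebuilding a byte range touches only that range of the other files - holds for both); the file contents and helper names are made up:

        def make_parity(files):
            """XOR all data files together, position by position."""
            size = max(len(f) for f in files)
            out = bytearray(size)
            for f in files:
                for i, byte in enumerate(f):
                    out[i] ^= byte
            return bytes(out)

        def recover_slice(surviving, parity_slice, lo, hi):
            """Rebuild bytes [lo:hi) of the single missing file.

            Only bytes [lo:hi) of each surviving file and of the parity
            file are read - nothing outside the missing range is needed.
            """
            out = bytearray(parity_slice)
            for f in surviving:
                for i, byte in enumerate(f[lo:hi]):
                    out[i] ^= byte
            return bytes(out)

        # Three equal-sized files; the last 4 bytes of c went missing.
        a, b, c = b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"
        p = make_parity([a, b, c])
        assert recover_slice([a, b], p[4:8], 4, 8) == c[4:8]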

    So it could be nice to include some operations in the par command that take into account that the files themselves are incomplete and that some of the data in them is really intended to be somewhere else... this might get a bit complicated to explain to users, of course. Thoughts?

     
    • Willem Monsuwe

      Willem Monsuwe - 2001-10-16

      It's not only complicated to explain to the users, it's also going to be hell to implement.
      With a lot of user intervention this might be doable.  At the least, you need to tell the par decoder where, and how large, the 'gaps' are in the data and .par files.
      Also, please note that the segments of a .par file will be offset by the header info, so you'd need two segments to completely cover the missing one segment. (And you need the first segment for the header info, of course)
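
      A back-of-the-envelope sketch of that offset, assuming for simplicity that the .par file can be treated as decoded bytes cut into fixed-size article segments (uuencode overhead and article headers are ignored); all numbers and names are made up:

          def par_segments_needed(lo, hi, header_len, seg_size):
              """Which article segments of the .par file (0-based) cover the
              recovery data for bytes [lo:hi) of a data file?

              The recovery data in the .par file is offset by the header,
              so [lo:hi) of the data maps to [header_len+lo : header_len+hi)
              of the .par file - usually straddling one extra segment.
              """
              first = (header_len + lo) // seg_size
              last = (header_len + hi - 1) // seg_size
              return list(range(first, last + 1))

          # Made-up numbers: 10 KB header, 250 KB per article segment,
          # bytes 500,000..750,000 of one data file are missing.
          print(par_segments_needed(500_000, 750_000, 10_000, 250_000))  # [2, 3]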

       
    • Ryan Gallagher

      Ryan Gallagher - 2001-10-16

      Saving partial files using the various newsreaders can be tricky... seg fills are a great idea in theory except that most users don't bother to preserve the uuencoded data from the segments they DO have.

      Creating PAR data on the segments themselves is an interesting idea.  Personally, I think that since both would require an investment on the poster's part, doing FULL PARs is the easier way to go where the users are concerned.

      But if someone did want to implement this it would be helpful for tools such as "powerpost" to save a local copy of the uuencoded data.  This would amount to a HUGE number of files.  A mirror-like client could then be written to create the parity data from only the uuencoded portion of these posted files.  The client (on the user end) would have to be able to "ignore" header content etc. and only use the uuencoded data of the downloaded articles when regenerating missing parts.

      All in all it's a lot of effort on both sides.  To be effectively implemented it would likely require active incorporation into tools like PowerPost and the popular download clients like Agent etc.

      If powerpost could seamlessly generate and post this parity data along with the uuencoded segments... and the readers could seamlessly recognize what these subject lines represented and use that data to recreate missing segments of any attempted download if needed, then sure, it would be cool.

      Even if you undertook the task of developing your own posting software and a comprehensive newsreader with this concept in mind, you still would never be able to get all users to adopt it due to platform, language, etc etc...

      But at this point I think we'd do well just to sell the concept of full PARs to most of the posters.  *smile*

      That or you could re-write 'uuencode' and get it to understand the concept of RS parity data, hehehe... the trick would be getting the tools to adopt it.

      Binerman

       
    • Stefan Wehlus

      Stefan Wehlus - 2001-10-17

      Ugh, this really would be hard to implement.
      If you really want to work at the segment level, you should spend this effort somewhere else:

      When Tobias came to me and asked "Hey, can we use this RAID thingy for posting?", we saw two ways.

      The first one was to work file-based - Mirror and the parity volume sets were born out of this idea.

      The second way is (like Binerman's thoughts) to create segment-based parity information inside the newsposter/-reader.

      The posting program computes parity data out of the segments and posts additional parity segments:
      Foobar.r00 (01/25)
      Foobar.r00 (02/25)
      Foobar.r00 (03/25)
      .
      .
      .
      Foobar.r00 (24/25)
      Foobar.r00 (25/25)
      Foobar.r00 (P1/25)
      Foobar.r00 (P2/25)
      Foobar.r00 (P3/25)

      The newsreader can take the parity segments to restore missing segments. So, in this case, up to 3 missing segments can be restored with the parity segments.
      Because of the smaller "units", this is more bandwidth efficient. You don't have to download a whole file because of one lousy missing segment. The problem is that in the case of a "server hiccup" there are many segments missing, and the parity segments won't work then. Only a field test can show which system is better.
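
      A toy sketch of the poster/reader side of this, using a single XOR parity segment instead of real Reed-Solomon coding (so only one missing segment per file can be rebuilt, not three); the segment size and data are invented:

          SEG_SIZE = 10_000  # decoded bytes per article; arbitrary here

          def split(data):
              return [data[i:i + SEG_SIZE] for i in range(0, len(data), SEG_SIZE)]

          def parity(segments):
              """XOR the segments together (real RS would allow several
              parity segments and thus several missing segments)."""
              out = bytearray(SEG_SIZE)
              for seg in segments:
                  for i, byte in enumerate(seg):
                      out[i] ^= byte
              return bytes(out)

          # Poster: pretend this is Foobar.r00; post the data segments
          # (01/25 ... 25/25) plus one parity segment (P1/25).
          data = bytes(range(250)) * 1000
          segs = split(data)
          p1 = parity(segs)

          # Reader: segment 3 never arrived; XOR parity with what did arrive.
          missing = 3
          rebuilt = parity([s for i, s in enumerate(segs) if i != missing] + [p1])
          assert rebuilt[:len(segs[missing])] == segs[missing]
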
      But there is the big problem: for the parity volume sets we were able to create external clients which do the trick. For the parity segment system, we would have to write new newsposters and -readers. This is way out of my league.

      So, like Binerman said, it's too much work to get this done. (Although I like the idea of a new redundancy-capable uuencode standard...)

      Anyway, if you want to try the parity segment system, go for it. But without me...

      Stefan

       
    • Nobody/Anonymous

      I think you could get part of this gain without too much effort
      by making the PAR files smaller ...

      Suppose each F column is a file; file 5 has sections missing and
      cannot be rebuilt. Each P column is a parity file but is only a
      third the size of a data file, covering a third of the message
      segments. You will notice that all three of the full-size P files
      (each formed by three of the P columns taken together) are broken,
      but among the one-third-size files we actually have enough intact
      pieces to reconstruct the data.

         F1 F2 F3 F4 F5 F6 F7 F8 F9 PA PB PC PD PE PF PG PH PI
      M1 X  X  X  X  .  X  X  X  X  X        X        X
      M2 X  X  X  X  .  X  X  X  X  X        X        X
      M3 X  X  X  X  .  X  X  X  X  X        X        .
      M4 X  X  X  X  .  X  X  X  X     X        X        X
      M5 X  X  X  X  .  X  X  X  X     .        .        X
      M6 X  X  X  X  .  X  X  X  X     X        X        X
      M7 X  X  X  X  .  X  X  X  X        X        X        X
      M8 X  X  X  X  .  X  X  X  X        X        X        X
      M9 X  X  X  X  .  X  X  X  X        X        X        X
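
      A rough sketch of how the sub-parity files could be produced, again using XOR as a stand-in for Reed-Solomon and generating only one parity file per group of message segments (the picture above has three independent parity files per group); the names and sizes are invented:

          SEG_SIZE = 10_000   # made-up bytes per message segment
          GROUPS = 3          # each sub-parity file covers 1/GROUPS of the segments

          def split(data, nsegs):
              """Pad to nsegs * SEG_SIZE and cut into equal segments M1..Mn."""
              data = data.ljust(nsegs * SEG_SIZE, b"\0")
              return [data[i * SEG_SIZE:(i + 1) * SEG_SIZE] for i in range(nsegs)]

          def xor(blocks):
              out = bytearray(len(blocks[0]))
              for blk in blocks:
                  for i, byte in enumerate(blk):
                      out[i] ^= byte
              return bytes(out)

          def sub_parity_files(files, nsegs):
              """One sub-parity file per group of segment rows, i.e. the
              PA/PB/PC split above (each a third of a full parity file)."""
              segs = [split(f, nsegs) for f in files]
              per_group = nsegs // GROUPS
              out = []
              for g in range(GROUPS):
                  rows = range(g * per_group, (g + 1) * per_group)
                  out.append(b"".join(xor([s[m] for s in segs]) for m in rows))
              return out

          files = [bytes([65 + k]) * 85_000 for k in range(9)]   # F1..F9 dummies
          pa, pb, pc = sub_parity_files(files, nsegs=9)
          assert len(pa) == 3 * SEG_SIZE      # a third of a full-size parity file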

      For encoding, the only extra task is to decide how many 'sub-parity'
      files we need to create and add them to the list to post.
      For decoding I would expect people to:
      1) Fetch the r[a0-9][r0-9] files.
      2) Come back later ... Ooops bits missing.
      3) Fetch the p[a0-9][r0-9] files.
      4) Come back later ... Fix it please.
      5) Happiness.

      If the additional size of downloading all the pxx files is a problem,
      then the user just has to understand a two-level numbering scheme.

      Robert de Bath -- robert$ @mayday.cix.co.uk

       
      • Willem Monsuwe

        Willem Monsuwe - 2001-10-23

        This is needlessly complicated.  It could also be achieved by making the original .rar files smaller. (As long as the total stays below 255 files.)
        As a matter of fact, at least one poster who is using par says he's making smaller rars exactly because of this.

        Also, I think it's theoretically possible to make a client that can handle partial downloads (as long as there are no corrupted parts, there's some overlap, and maybe some other restrictions), although it would probably be hellishly slow.

         
    • Nobody/Anonymous

      To coin a phrase SNAFU!

      Okay, let's see how the picture looks now:

      ___F1_F2_F3_F4_F5_F6_F7_F8_F9_PA_PB_PC_PD_PE_PF_PG_PH_PI_
      M1_X__X__X__X__.__X__X__X__X__X________X________X________
      M2_X__X__X__X__.__X__X__X__X__X________X________X________
      M3_X__X__X__X__.__X__X__X__X__X________X________.________
      M4_X__X__X__X__.__X__X__X__X_____X________X________X_____
      M5_X__X__X__X__.__X__X__X__X_____.________.________X_____
      M6_X__X__X__X__.__X__X__X__X_____X________X________X_____
      M7_X__X__X__X__.__X__X__X__X________X________X________X__
      M8_X__X__X__X__.__X__X__X__X________X________X________X__
      M9_X__X__X__X__.__X__X__X__X________X________X________X__

       
    • Nobody/Anonymous

      SNAFU! SNAFU! Okay, try a third time for an unbroken picture!

      ___F1_F2_F3_F4_F5_F6_F7_F8_F9_PA_PB_PC_PD_PE_PF_PG_PH_PI_
      M1_X__X__X__X__.__X__X__X__X__X________X________X________
      M2_X__X__X__X__.__X__X__X__X__X________X________X________
      M3_X__X__X__X__.__X__X__X__X__X________X________.________
      M4_X__X__X__X__.__X__X__X__X_____X________X________X_____
      M5_X__X__X__X__.__X__X__X__X_____.________.________X_____
      M6_X__X__X__X__.__X__X__X__X_____X________X________X_____
      M7_X__X__X__X__.__X__X__X__X________X________X________X__
      M8_X__X__X__X__.__X__X__X__X________X________X________X__
      M9_X__X__X__X__.__X__X__X__X________X________X________X__

       
    • Nobody/Anonymous

      Well, I was thinking of CD images, which for a full CD start at 1500 segments (big messages, data CD) and go to over 3500 messages (smaller messages, Video CD), and the fact that some posters are taking the (probably reasonable) position of 'no fills' because of the FEC.

      NB: What are the issues with a 16-bit version of parchive? (As opposed to the current 8-bit.)

      -RDB

       
      • Willem Monsuwe

        Willem Monsuwe - 2001-10-30

        Most CD images I've seen were posted as a set of rars, ranging from 35*20megs to 140*5megs (roughly)

        I assume you're talking about the number of multipart segments.  Having parity data for those would require tighter integration into the news reading and posting software, I would say.

        The first issue with 16-bit Reed-Solomon coding is that you need 16-bit lookup tables, which is going to be a performance hit.
        The second issue with 16-bit Reed-Solomon coding is that you might run into endianness problems, which is also going to be a performance hit.
        In other words: 16-bit RS is slower than 8-bit RS (and a bit hairier to program).
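
        A rough sketch of the first point, assuming the usual log/exp-table approach to Galois-field multiplication; the field polynomials below (0x11D for GF(2^8), 0x1100B for GF(2^16)) are common choices and not necessarily what parchive itself uses. The 16-bit tables have 65,536 entries instead of 256, which no longer fits comfortably in the CPU cache; the endianness issue (pulling 16-bit symbols out of a byte stream) isn't shown here.

            def build_tables(bits, poly):
                """Exp/log tables for GF(2^bits) with the given polynomial."""
                size = 1 << bits
                exp = [0] * (2 * size)   # doubled so log[a] + log[b] never overruns
                log = [0] * size
                x = 1
                for i in range(size - 1):
                    exp[i] = x
                    log[x] = i
                    x <<= 1
                    if x & size:
                        x ^= poly
                for i in range(size - 1, 2 * size):
                    exp[i] = exp[i - (size - 1)]
                return exp, log

            def gf_mul(a, b, exp, log):
                if a == 0 or b == 0:
                    return 0
                return exp[log[a] + log[b]]

            exp8,  log8  = build_tables(8,  0x11D)      # 256-entry tables
            exp16, log16 = build_tables(16, 0x1100B)    # 65,536-entry tables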

         
    • Russ Walker

      Russ Walker - 2001-10-31

      Why not a totally new format?

      A rar that has par built in, and call it sar (super archiver).

      Then when you make the original sar files you decide
      the redundancy level.  A lower recovery level of course means
      smaller sizes and faster UL/DL.

      The problem I see in alt.bin.multimedia is NOT missing
      files but missing segments.  That is the problem 95%
      of the time.  PAR files are HUGE since they recover the whole
      rar, but I find only a segment or two is ever needed.
      In fact most of the time only 1 segment is missing
      and the original poster never reposts because it is a huge rar,
      and most posters do not know how to save and post just the
      missing segment and do not want to be bothered with
      all the extra steps of doing that anyway.

      Segments are a pain, so why not base this on each post
      having just 1 but much larger segment?  Most posters
      post 10,000 lines a segment.  Why not 100,000 lines
      or higher?  (I wonder what the limit is.)

      Then instead of 1 post with ten 10,000-line segments
      you get one post of 100,000 lines with just 1 segment.

      Heck, this would help out right now if posters just increased
      the line counts of current rar posts.

      I DL Charmed episodes and each is 45 segments;
      if the poster just increased from 7,000 lines to
      14,000 lines it would knock 45 down to 23 segments.
      Less chance of one segment getting lost.
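
      The arithmetic checks out; a throwaway check, assuming the post really is 45 * 7,000 encoded lines:

          import math

          def segment_count(total_lines, lines_per_segment):
              return math.ceil(total_lines / lines_per_segment)

          print(segment_count(45 * 7_000, 14_000))   # -> 23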

      Which is better: 45 segments of a 14 meg rar file,

      or

      1 segment of a 14 meg rar file?  (Keeping in mind that almost no one
      reposts segments, but the whole thing.)

      ok enough rambling : )

      later

      Zep

       
      • Stefan Wehlus

        Stefan Wehlus - 2001-10-31

        SAR? Oh no! Not another archiver...

        BTW, this may not be necessary. This weekend, Eugene Roshal (the coder of Rar) contacted me. He is very interested in the parity concept and is considering implementing it. He has other priorities now (improving the compression algorithm), but if we're lucky, we might get it with Rar 3.0...

        About that "larger segments" thing:

        Usenet was not built for big messages. A rule of thumb is: if you exceed 5000-7500 lines, the chances that all messages will propagate correctly go way down...

        10,000 lines? Maybe that's why you need that many segfills...

        Stefan

         
        • Ryan Gallagher

          Ryan Gallagher - 2001-10-31

          Cool... caught someone's attention. ;-)

          If winrar incorporates it, that would be cool I suppose... Just don't want to see the stand-alone client vanish... because then posters that use other formats besides RAR wouldn't have the parity volume option.

          Try to make the winrar guys collab on the filespec so that it's still open source and others are still able to write a standalone client to handle just the parity volumes.

          Not many people can say they effected some major change in a software app used as widely as winrar.  Way to go guys.

          Are there free registered copies in it for us? *laugh*  Oh well... I'm sure I'll manage somehow *grin*.

          --Binerman

           
          • Willem Monsuwe

            Willem Monsuwe - 2001-10-31

            The filespec needn't change.
            The basic idea is to make a client that handles both the rar and the par format.  Because the rar filespec isn't open (at least I think it isn't), that's prolly gonna be written by the rar guys.  I.e. rar 3.0
            If we're talking to them anyhow, it would be nice if rar recovery could handle missing bits as well as corrupted bits.
            The recovery algo might be a tad more difficult, but it would make life a helluva lot easier.

            P.S.: Note to posters:  Calculate how many segments your rars will be posted as.  Then add enough recovery to the rar to make it possible to recover with a missing segment.
            For example: if your post will be 25 or more segments, you need 4% recovery (which turns out to be one extra segment) to be able to recover from one missing segment.
            Of course, I'm assuming rar uses RS-coding, with checksums to find corrupted data.
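
            A small helper illustrating that rule of thumb; it assumes the recovery record is specified as a whole-number percentage and that one segment's worth of recovery data can stand in for one missing segment, which is exactly the RS-with-checksums assumption above:

                import math

                def recovery_percent(total_segments, missing_segments=1):
                    """Smallest whole-percent recovery setting that still covers
                    the given number of missing segments."""
                    return math.ceil(100 * missing_segments / total_segments)

                print(recovery_percent(25))      # -> 4 (one extra segment on a 25-segment post)
                print(recovery_percent(50))      # -> 2
                print(recovery_percent(50, 2))   # -> 4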

             
            • Stefan Wehlus

              Stefan Wehlus - 2001-11-01

              From my recent e-mail conversation with Eugene Roshal:

              me:

              A suggestion:
              A flaw of my format spec is that the parity volumes are a few bytes bigger than the input files. So if you create a rar set for floppy disks and make a parity volume for this set, you can't store this volume on a disk. If rar comes with such a feature, the parity volume must fit onto a disk too. This would be the killer feature for non-usenet people. Many of them use Rar only when they want to transport something big on floppies. And you know, one disk is always defective...
              So, a "parity disk" would be really cool.
              Hehe, RAID with floppies... ;-)
              <snip>
              Oh, almost forgot this:
              Your recovery record is great, but it won't work if there are missing bytes in the file. You can fix this manually:

              http://riepersnest.tripod.com/rar/index.htm

              It would be nice if you implemented at least an advanced recovery for "case 2". This would fix 50% of the "unrecoverable" files on usenet.

              His reply:

              > A suggestion:
              > A flaw of my format spec is that the parity volumes are a few bytes
              > bigger than the input files. So if you create a rar set for floppy disks
              > and make a parity volume for this set, you can't store this volume on a
              > disk. If rar comes with such a feature, the parity volume must fit onto
              > a disk too.

              Yes, you are right, it is important to have all volumes of the same size.

              <snip>

              > Oh, almost forgot this:
              > Your recovery record is great, but it won't work if there are missing
              > bytes in the file. You can fix this manually:

              Yes, it is possible to fix missing or inserted bytes errors and it is already present in my "to do" list. I am not sure if I'll be able to implement it in RAR 3.0, because I plan other serious changes in compression, encryption, interface, switches...

              Stefan

               
        • Nobody/Anonymous

          I almost never lose segments when I post. In fact
          I almost never see anyone ask for reposts,
          because I have a GOOD news server that does not barf.

          The person who posts Charmed at 7,000 lines loses
          segments all the time.  I see others post 20,000
          and I see them all, and so does everyone else.

          I think crappy news servers, when given more chances
          (i.e. more segments) to choke, will choke.

          What is interesting is that when I see a missing segment, EVERYONE sees the same missing segment,
          so I take this to mean the segment was lost very
          close to the poster, if not on that poster's very own news server, and NOT on my news server.

          Have you tested the line length limit?
          I always post at 10,000 and I get no complaints.
          (That is also the default of my newsreader.)

          But I see others post at 5,000 and half of their
          segments are lost, and everyone sees the same missing segments.

          I think I will do some testing and see what the limits really are.   : )

          later

          Zep

           
          • Stefan Wehlus

            Stefan Wehlus - 2001-11-01

            It's not a question of crappy or good, it's about server policy. Back in the day, some servers and peers were configured to ignore all posts with more than 7,500 lines. Today it may be different, but old habits die hard...

            But 100,000 lines is definitely too much.

            Stefan

             
