Re: [parchive-devel] Par2 Par3 comments.

Hello, parchive developers.
 I am Yutaka Sawada.

 from Michael Nahas 2010-06-15
> Also, the Par2 spec included optional packets for containing input file slices.
> Do we want to push to make those packets mandatory?

 This is my question, too.
Should I put this packet in the PAR3 spec or not?
If you look at the HTML source of my proposal file "par3_spec_prop.htm",
you will find that the "Input File Slice packet" is commented out.
While storing file slices in a PAR file is an interesting idea,
there is no PAR2 client which supports this packet.
Might this be proof that it is useless?

 While some people say "use TAR or ZIP to archive many files",
the same people say "do not use RAR to split large files".
I feel this is odd.
From a programmer's point of view,
I like this slice packet idea and want to implement it.
From a user's point of view,
I would not use this packet myself...

 Because PAR2 has a strong file slice search method,
saving the split files and the PAR2 files together may be enough.
A PAR2 client can find a slice in the same way,
whether it is in simple split files, in split RAR files,
or in PAR files as an "Input File Slice packet".
The difference is the total size.
An "Input File Slice packet" requires additional space
for the packet header and body header (64 + 24 + 0-3 bytes for alignment).

 "Input File Slice packet" may be useful,
when a user wants to set more than 100% redundancy.
For example, when there are 2000 input file slices
and one wants to save them as PAR files with 105% redundancy:
A) create 2100 "Recovery Slice packets".
B) create 2000 "Input File Slice packets" and 100 "Recovery Slice packets".
Both will create PAR files of similar size,
but the creation speed is different,
because B only has to compute 100 recovery slices.
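
To make the size comparison concrete, here is a minimal Python sketch.
The 64 + 24 byte overhead of an Input File Slice packet is taken from above
(the 0-3 alignment bytes are ignored). The 68-byte overhead for a
Recovery Slice packet (64-byte header plus a 4-byte exponent) is my reading
of the PAR2 spec, and the slice size is a hypothetical value.

```python
# Assumed per-packet overheads in bytes.
INPUT_SLICE_OVERHEAD = 64 + 24     # packet header + body header (alignment padding ignored)
RECOVERY_SLICE_OVERHEAD = 64 + 4   # assumption: packet header + 4-byte exponent


def total_size(input_packets, recovery_packets, slice_size):
    """Total bytes used by the slice-carrying packets."""
    return (input_packets * (INPUT_SLICE_OVERHEAD + slice_size)
            + recovery_packets * (RECOVERY_SLICE_OVERHEAD + slice_size))


slice_size = 100_000  # hypothetical slice size in bytes

size_a = total_size(0, 2100, slice_size)    # A) 2100 Recovery Slice packets
size_b = total_size(2000, 100, slice_size)  # B) 2000 Input File Slice + 100 Recovery Slice packets

print(size_a, size_b, size_b - size_a)
```

Under these assumptions the two options differ by only 20 bytes per input slice.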

 from Michael Nahas 2010-06-16
> The order should go: Use cases -> Goals -> Spec -> Code
> The order should not go: Code -> anything else.
> Please refer to your design as "my proposal for Par3".
> We should focus on problems that cannot be fixed using the current spec.

 I agree with you. It should be so.
The problem is that nobody is advancing the PAR3 design.
At first I thought that someone would write the PAR3 spec and
I would help him by creating a sample implementation.
For 2 years no one has written it... everybody is busy.
Should I (and users) wait for more years, or forever?

 Now I am showing, from a programmer's side, what PAR3 could do.
It is only a proposal. (Thanks for teaching me the proper English word.)
Then, who can be an editor?

> Can you state the size of overhead for your proposal for Par3?

Par3Main is 22 from header + 4~18
Par3FileDescription is 22~23 from header + 2~32 + length of filename (say 20 bytes?)
Par3InputChecksum is 22~25 from header + 1~3 + 12*InputSize/SliceSize
Par3RecoveryPacket is 22~30 from header + 2~3 + SliceSize

Because it is a variable-length format, the size is not fixed.
Taking the largest value of each range above (and a 20-byte filename),
possible Par3Size (without PAR2 packets for compatibility) =
  N* (40
       + 75*#ofInputFiles
       + 28*#ofInputFiles + 12*InputSize/SliceSize)
       + 33*RecoverySize/SliceSize + RecoverySize
=
  N*(40 + 103*#ofInputFiles + 12*InputSize/SliceSize)
     + 33*RecoverySize/SliceSize + RecoverySize
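
The simplified formula can be written as a small Python function.
This is only a sketch of the estimate above. The function and parameter
names are mine, the example numbers are hypothetical, and N is used exactly
as in the formula (reading it as the number of times the non-recovery
packets are repeated is my assumption).

```python
def par3_size(n, num_files, input_size, slice_size, recovery_size):
    """Estimated PAR3 size in bytes, without PAR2 compatibility packets."""
    input_slices = input_size / slice_size
    recovery_slices = recovery_size / slice_size
    # Part multiplied by N: 40 + 103*#ofInputFiles + 12*InputSize/SliceSize
    per_copy = 40 + 103 * num_files + 12 * input_slices
    # Recovery part: 33 bytes of overhead per recovery slice plus the data itself
    return n * per_copy + 33 * recovery_slices + recovery_size


# Hypothetical example: 10 files, 2000 input slices, 200 recovery slices.
print(par3_size(n=1, num_files=10, input_size=200_000_000,
                slice_size=100_000, recovery_size=20_000_000))
```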

Par3's per-file overhead is around 60% of PAR2's.
When a user creates a PAR2 file with 90% efficiency,
the efficiency will become 90/((100-90)*0.6+90) = about 93.8% for PAR3.
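
The same arithmetic as a Python sketch, so other starting efficiencies
can be tried. The function name is mine; the 0.6 ratio is the rough
"around 60%" figure from above, not a measured value.

```python
def par3_efficiency(par2_efficiency_pct, overhead_ratio=0.6):
    """Efficiency in percent when the overhead shrinks to overhead_ratio of PAR2's."""
    data = par2_efficiency_pct
    overhead = (100 - par2_efficiency_pct) * overhead_ratio
    return 100 * data / (data + overhead)


print(par3_efficiency(90))  # 90 / (10 * 0.6 + 90) = 90 / 96
```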

 If PAR2 packets are added to a PAR3 file,
possible Par3Size (with PAR2 packets for compatibility) =  
  N*(40 + 103*#ofInputFiles + 12*InputSize/SliceSize)
     + 33*RecoverySize/SliceSize + RecoverySize
     + 30 + (76 + 228*#ofInputFiles + 20*InputSize/SliceSize)

Although the efficiency becomes worse for N=1 or 2,
it will be better for N=4 or more.
In this case, the PAR2 packets are used only by PAR2 clients,
and PAR3 clients will ignore them.
I feel the idea of a smaller packet header is not so bad.
Anyway, I can implement any packet format, and I will follow the official release.

> What are the use cases and problems you see in the Par2 client forums?

 I will search the forums, but I cannot use the internet very much.
I think we need an easy web forum for users.
This mailing list is hard for the general public to post to.
(At first I could not post for a long time.)
Or can someone access/admin/edit SourceForge's Parchive forum?

 from Michael Nahas 2010-06-17
> A good start would be to look at what languages QuickPar has been translated into.

 QuickPar does not translate (encode/decode) filenames.
I think it gets the filename in the system's local code page
and writes it directly into the PAR file.
So QuickPar does not strictly write (7-bit) ASCII filenames.
This may not be a problem for single-byte characters (8-bit ASCII extensions).

 This causes a serious problem for users of multi-byte characters (16-bit),
because sometimes the filename is not parsed correctly.
Some characters like \ are not usable in filenames,
but multi-byte characters may have them as the second byte.
QuickPar rejects those filenames as invalid,
so Japanese users sometimes cannot use Japanese filenames...
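
The second-byte problem can be shown with a short Python sketch.
This message does not name the code page; Shift_JIS (Python's "cp932" codec,
the usual Japanese Windows code page) is my assumption, and the character 表
is just one known example whose second byte is 0x5C, the same byte as "\".

```python
name = "表.txt"              # example filename; 表 is U+8868
raw = name.encode("cp932")  # bytes as a code-page-based client would see them

# The second byte of 表 in cp932 is 0x5C, the ASCII code of the backslash,
# so a byte-wise scan finds a "\" that is not really there.
has_backslash_byte = b"\\" in raw
print(raw, has_backslash_byte)

# A character-wise check on the decoded name finds no backslash.
has_backslash_char = "\\" in raw.decode("cp932")
print(has_backslash_char)
```

A client that validates filenames byte by byte would reject this name, while one that decodes the code page first would accept it.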

> Any objections to the proposed changes?
 from PAR2 spec
> File names are case sensitive and can be of any length.
> If a client is doing recovery on an operating system that has
> case-insensitive filenames or limited-length filenames,
> it is up to the client to rename files and directories.

 I will mention one more compatibility issue: normalization.
This may be hard to understand for users of single-byte characters.
Normally, normalization is used for search words.
You can find "ABC", "AbC", or "abc" by using the keyword "abc".
In a PAR2 file, a filename should be written as it is:
"AbC.txt" is written as "AbC.txt", not as "ABC.TXT".
This is a good solution.

 For multi-byte characters, normalization is more complex.
There are several ways to represent one character.
As a simple example, imagine [W] and [VV] (these are not real characters).
Graphically, [W] and [VV] look similar, but their character codes are different.
Unicode has a normalization method so that both can be found with one keyword.
This causes a problem between OSs.
Windows, Linux, Java, etc. distinguish them.
Mac OS X does not. (I don't know why.)
I think PAR3 should not apply Unicode normalization
(it should write a filename in the style of its OS),
just as PAR2 does not change the case of the original filename.
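
A real counterpart of the [W] / [VV] example, as a minimal Python sketch.
The character が is my own choice for illustration; NFC and NFD are the
Unicode normalization forms, accessed here through Python's standard
unicodedata module.

```python
import unicodedata

composed = "\u304c"          # が as one code point
decomposed = "\u304b\u3099"  # か followed by the combining voiced sound mark

# The two strings look the same on screen but are different code point sequences.
same_without_normalization = composed == decomposed
print(same_without_normalization)

# After normalization to the same form they compare equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)

# The bytes written into a file differ between the two forms.
print(composed.encode("utf-8"), decomposed.encode("utf-8"))
```

A client that compares or hashes raw filename bytes would treat these as two different names.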

Best regards,
Yutaka Sawada