I started using QuickPar a few days ago, and since I am a mathematician I tried to understand how QuickPar works.
I have read the "Parity Volume Set Specification 2.0" and both of J. Plank's articles cited in the specification.
My first impression was that I could not see why your fix for Plank's mistake should work.
The idea of your fix is to make every constant in the Vandermonde matrix used by the algorithm a generator of the multiplicative group of GF(2^16).
Unfortunately, that does not seem to be enough:
if you divide two such constants, the quotient need not be a generator of GF(2^16)^*.
This leads to the following examples.
Let us assume that the data blocks are numbered from 1 to D, where D is the number of data blocks.
The recovery blocks are numbered from 0 to R-1, where R is the number of recovery blocks.
The PAR 2.0 algorithm will fail in the following situations:
D>=10925, there are two missing data blocks: 2 and 10925
and we have just two recovery blocks: 0 and 3
D>=6555, there are two missing data blocks: 1 and 6555
and we have just two recovery blocks: 0 and 5
D>=2186, there are two missing data blocks: 3 and 2186
and we have just two recovery blocks: 0 and 15
D>=1928, there are two missing data blocks: 1 and 1928
and we have just two recovery blocks: 0 and 17
D>=643, there are two missing data blocks: 1 and 643
and we have just two recovery blocks: 0 and 51
D>=386, there are two missing data blocks: 1 and 386
and we have just two recovery blocks: 0 and 85
D>=130, there are two missing data blocks: 2 and 130
and we have just two recovery blocks: 0 and 255
D>=129, there are two missing data blocks: 1 and 129
and we have just two recovery blocks: 0 and 257
It is easy to generate more similar examples.
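To show concretely what goes wrong, here is a minimal Python sketch of the check behind the last example (the helper names are mine; I assume, as in the PAR2 spec, the field polynomial 0x1100B and that the i-th data block uses base 2^c_i, where c_i is the i-th positive integer not divisible by 3, 5, 17 or 257):

```python
# Powers of 2 in GF(2^16), reduced by the PAR2 polynomial x^16+x^12+x^3+x+1.
def pow2(n):
    v = 1
    for _ in range(n % 65535):   # the order of 2 divides 65535
        v <<= 1
        if v & 0x10000:
            v ^= 0x1100B
    return v

# Exponent used by the i-th data block (1-indexed): the i-th positive
# integer not divisible by 3, 5, 17 or 257.
def block_exponent(i):
    c = 0
    while i:
        c += 1
        if c % 3 and c % 5 and c % 17 and c % 257:
            i -= 1
    return c

# Missing data blocks 1 and 129, surviving recovery blocks 0 and 257:
# the 2x2 system is singular iff base1^257 == base2^257.
b1, b2 = block_exponent(1), block_exponent(129)   # exponents 1 and 256
print(pow2(257 * b1) == pow2(257 * b2))           # True -> not invertible
```

Since 257*(256-1) = 65535 is a multiple of the group order, the matrix [[1, 1], [base1^257, base2^257]] has two equal columns, so recovery blocks 0 and 257 cannot recover blocks 1 and 129.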
I have tested the last two examples using QuickPar.
There were two possible outcomes:
either QuickPar reported "Reed Solomon computation error",
or QuickPar simply crashed.
These situations do not seem to happen frequently, because in all of these examples either the number of data blocks or the number of recovery blocks is large.
Note that in all of them the number of missing data blocks equals 2. (It is easy to characterize every such example.)
I have not tried to look for faulty situations in which three or more data blocks are missing.
I think there are plenty of them, and that it is quite probable they can occur in real life.
I am not sure whether all of this is already known to you, so I decided to write...
One more question: what is wrong with the second of Plank's articles?
I found (http://sourceforge.net/mailarchive/forum.php?forum_id=11471) that you already know about the problem. I have read all your comments and noticed that you believe that Plank's original algorithm would work if 2^n-1 was a prime number. (If 2^n-1 is a prime then Plank's original algorithm coincides with your solution.)
I do not follow your reasoning: the primality of 2^n-1 is irrelevant here. Consider the following example:
Let n=3. Then 2^3=8 and 2^3-1=7 is a prime.
GF(8) consists of elements 0,1,2,3,4,5,6,7.
The addition in GF(8) is bitwise XOR.
The multiplication in GF(8) is determined by
2^0=1
2^1=2
2^2=4
2^3=3
2^4=6
2^5=7
2^6=5
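This table can be reproduced mechanically. A small Python sketch, using the primitive polynomial x^3 + x + 1 (so that 2^3 = 3, as above):

```python
# Multiply two elements of GF(8), reducing by x^3 = x + 1 (0b1011).
def gf8_mul(a, b):
    r = 0
    for _ in range(3):           # b has at most 3 bits
        if b & 1:
            r ^= a               # addition in GF(8) is XOR
        b >>= 1
        a <<= 1
        if a & 0b1000:           # degree reached 3: reduce
            a ^= 0b1011
    return r

powers = [1]
for _ in range(6):
    powers.append(gf8_mul(powers[-1], 2))
print(powers)   # [1, 2, 4, 3, 6, 7, 5], matching the table above
```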
Assume that our data is 6 blocks long and that we wish to generate four recovery blocks.
The matrix which will do that is the following:
100000
010000
001000
000100
000010
000001
111111
123456
145672
134567
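The four bottom rows are just the Vandermonde rows [b^e : b = 1, ..., 6] for e = 0, 1, 2, 3; a sketch that rebuilds them with the GF(8) arithmetic described above:

```python
# GF(8) multiplication, reducing by x^3 = x + 1.
def gf8_mul(a, b):
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def gf8_pow(b, e):
    r = 1
    for _ in range(e):
        r = gf8_mul(r, b)
    return r

# Rows for recovery blocks e = 0, 1, 2, 3 over the bases 1..6.
rows = [[gf8_pow(b, e) for b in range(1, 7)] for e in range(4)]
for row in rows:
    print("".join(map(str, row)))   # 111111, 123456, 145672, 134567
```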
Now, assume that we receive just the first, third and fifth data blocks, together with recovery blocks 0, 1 and 3. We have to invert the matrix
100000
001000
000010
111111
123456
134567
But it is not invertible: over GF(8) its rows are linearly dependent, namely row1 + 6*row2 + 2*row3 + row4 + row5 + row6 = 000000 (with the coefficients 6 and 2 taken in GF(8)).
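The singularity can be confirmed by Gaussian elimination over GF(8); a small sketch (the helpers are mine, not QuickPar code):

```python
# Rank of the received 6x6 matrix over GF(8) (x^3 = x + 1).
def mul(a, b):
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def inv(a):                      # a^7 = 1 for a != 0, so a^-1 = a^6
    r = 1
    for _ in range(6):
        r = mul(r, a)
    return r

def rank(m):
    m = [row[:] for row in m]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        s = inv(m[r][c])
        m[r] = [mul(s, x) for x in m[r]]         # scale pivot row to 1
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c]
                m[i] = [x ^ mul(f, y) for x, y in zip(m[i], m[r])]
        r += 1
    return r

M = [[1, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [1, 1, 1, 1, 1, 1],
     [1, 2, 3, 4, 5, 6],
     [1, 3, 4, 5, 6, 7]]
print(rank(M))   # 5, not 6: the matrix is singular
```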
Where does this come from? It comes from the fact that there exists a polynomial P(x) (for example x^3+x+1) which satisfies:
P(x) is a product of polynomials (over GF(8)) of degree 1,
the number of its monomials is less than its degree,
P(0)=1.
The same remains true for bigger values of n.
The primality of 2^n-1 guarantees only that the faulty situation cannot be produced using up to two recovery blocks. When more than two recovery blocks are involved, primality does not help.
I am sure that the idea presented in the second of Plank's articles works.
> I found (http://sourceforge.net/mailarchive/forum.php?forum_id=11471)
That link only points at the forum, not at a specific message.
> that you already know about the problem. I have read all your comments
> and noticed that you believe that Plank's original algorithm would
> work if 2^n-1 was a prime number. (If 2^n-1 is a prime then Plank's
> original algorithm coincides with your solution.)
> I don't get your reasoning. Primality of 2^n-1 has nothing to do here.
If I did say that Plank's algorithm would work under all circumstances if 2^n-1 was prime, then I was just reporting what someone else had told me.
The only thing I myself have proven is for the case of 2 data blocks and 2 recovery blocks.
> Consider the following example:
> [deleted example with 3 blocks that fails]
> The same remains true for bigger values of n.
> The primality of 2^n-1 guarantees that it is not possible to produce
> the faulty situation using up to two recovery blocks. When we may have
> more than two recovery blocks then the primality will not help.
> I am sure that the idea presented in the second of Plank's articles works.
I believe it does.
PAR3 will use a similar general method of operation to Plank but will use GF(2^2^2^2^2^2...) and a different method to compute the matrix.
We have been aware that our solution to this bug in Plank's algorithm fails in exactly the same way for a very long time.
Fortunately, as you note, for comparatively low block counts the required distance between the two available recovery blocks has to be quite large.
Even for the very largest block counts the distance only drops to 3, and in practice it would be extremely rare for someone to encounter exactly the right combination of missing data and available recovery blocks.
The same bug obviously applies with higher numbers of missing blocks as well, but the mathematical relationship between the data and recovery block numbers gets a little more complicated.
>Fortunately as you note, for comparatively low block counts, the required distance
>between the two available recovery blocks has to be quite large.
Not necessarily. Consider this example (worked out on paper, then tested with QuickPar):
Let the data be at least 13 blocks long.
The following data blocks are damaged:
4, 5, 6, 7, 10, 11, 12, 13
(the data blocks are numbered starting from 1).
We have the following recovery blocks
(the recovery blocks are numbered starting from 0):
0, 1, 2, 3, 4, 5, 6, 8
or
1, 2, 3, 4, 5, 6, 7, 9
or
2, 3, 4, 5, 6, 7, 8, 10
or ...
... and recovery is not possible.
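Assuming the usual PAR2 constants (field polynomial 0x1100B; the i-th data block uses base 2^c_i with c_i the i-th positive integer not divisible by 3, 5, 17 or 257), this example can be verified with a short sketch. The recovery set {0,...,6,8} fails exactly when the eight missing bases sum to zero in the field, because then the product (x - b_1)...(x - b_8) has no x^7 term; the shifted sets {1,...,7,9}, {2,...,8,10}, ... fail for the same reason after multiplying that polynomial by powers of x.

```python
# Table of powers of 2 in GF(2^16), PAR2 polynomial x^16+x^12+x^3+x+1.
def pow2_table():
    t = [1]
    for _ in range(65534):
        v = t[-1] << 1
        if v & 0x10000:
            v ^= 0x1100B
        t.append(v)
    return t

# Exponents of the first n data-block bases: positive integers not
# divisible by 3, 5, 17 or 257, in increasing order.
def exponents(n):
    out, c = [], 0
    while len(out) < n:
        c += 1
        if c % 3 and c % 5 and c % 17 and c % 257:
            out.append(c)
    return out

t = pow2_table()
exps = exponents(13)
missing = [4, 5, 6, 7, 10, 11, 12, 13]
s = 0
for b in missing:
    s ^= t[exps[b - 1]]          # field sum of the missing bases
print(hex(s))   # 0x0 -> recovery blocks {0..6, 8} cannot repair
```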
More examples of situations causing Reed Solomon computation error:
The numbers of the missing data blocks (data blocks are numbered starting from 1) form one of the following sets:
2, 4, 9, 10, 18 and 23
9, 12, 14, 17, 20 and 25
2, 15, 20, 21, 24 and 26
6, 8, 14, 17, 18 and 28
2, 14, 16, 22, 24 and 28
1, 10, 13, 18, 27 and 28
8, 10, 14, 16, 25 and 29
3, 5, 8, 9, 20 and 30
The only recovery blocks we have (recovery blocks are numbered starting from 0) are:
0, 1, 2, 3, 4 and 6
or
1, 2, 3, 4, 5 and 7
or
2, 3, 4, 5, 6 and 8
or
...
The numbers of the missing data blocks (data blocks are numbered starting from 1) form one of the following sets:
2, 3, 11, 24 and 30
23, 24, 25, 29 and 32
4, 6, 18, 22 and 37
1, 11, 15, 21 and 38
31, 32, 33, 36 and 39
24, 25, 27, 35 and 40
The only recovery blocks we have (recovery blocks are numbered starting from 0) are:
0, 1, 2, 3 and 5
or
1, 2, 3, 4 and 6
or
2, 3, 4, 5 and 7
or
...
The numbers of the missing data blocks (data blocks are numbered starting from 1) form one of the following sets:
2, 13, 20 and 35
3, 14, 21 and 36
4, 16, 23 and 38
5, 17, 24 and 39
7, 18, 26 and 41
8, 19, 27 and 42
10, 21, 29 and 44
11, 23, 31 and 46
12, 24, 32 and 47
14, 26, 34 and 49
The only recovery blocks we have (recovery blocks are numbered starting from 0) are:
0, 1, 2 and 4
or
1, 2, 3 and 5
or
2, 3, 4 and 6
or
...
The numbers of the missing data blocks (data blocks are numbered starting from 1) form one of the following sets:
3, 49 and 238
7, 54 and 242
10, 57 and 245
14, 61 and 249
The only recovery blocks we have (recovery blocks are numbered starting from 0) are:
0,1 and 3
or
1, 2 and 4
or
2, 3 and 5
or
...
> More examples of situations causing Reed Solomon computation error:
>
> [examples deleted]
Presumably you have devised some formula that lets you enumerate these examples (or did you find them by brute force techniques)?
Can you compute an overall repair failure probability for the case where you have exactly the right number of recovery blocks for the number of missing data blocks? The probability would of course depend on the total number of data blocks and on the highest recovery block number generated (which I suggest you assume is 10% of the number of data blocks).
Of course, the loss of blocks on UseNet is not completely random, so the effective failure probability is going to be much lower.
>Presumably you have devised some formula
>that lets you enumerate these examples
>(or did you find them by brute force techniques)?
Yes and no. Let me explain.
There is a characterization:
Let 2^n_1, 2^n_2, ..., 2^n_k be some of the bases of the Vandermonde matrix you are using in your algorithm (each n_i is coprime to 3*5*17*257).
Assume that the blocks of data corresponding to these bases are missing.
We have recovery blocks corresponding to exponents e_1, e_2,...,e_i.
Recovery is not possible if and only if i<k, or 2^n_1, 2^n_2, ..., 2^n_k are common roots of some polynomial of the form
a_1 x^e_1 + a_2 x^e_2 + ... + a_i x^e_i
(where a_1, a_2, ..., a_i are elements of GF(2^16), not all zero).
The brute-force part of what I did was looking for small n_1, n_2, ..., n_k (each coprime to 3*5*17*257) such that
2^n_1 + 2^n_2 + ... + 2^n_k = 0.
Then 2^n_1, 2^n_2, ..., 2^n_k are the roots of the polynomial
(x - 2^n_1)(x - 2^n_2)...(x - 2^n_k) = x^k + a_{k-2} x^{k-2} + ... + a_0
(Note that the coefficient a_{k-1} of x^{k-1} vanishes by Vieta's formulas, since it is the sum of the roots.)
By the characterization given above, the recovery blocks 0, 1, 2, ..., k-2, k will not help to recover the missing data blocks corresponding to the bases 2^n_1, 2^n_2, ..., 2^n_k.
The first of the examples (the one with 8 recovery blocks) was found with pen and paper; the others with a computer.
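For what it is worth, here is a sketch of that brute-force search for k=3 (my own code; it assumes the PAR2 field polynomial 0x1100B, whose element 2 generates the multiplicative group):

```python
# Search for three valid base exponents n1 < n2 < n3 whose bases
# 2^n1, 2^n2, 2^n3 sum to zero in GF(2^16); recovery blocks 0, 1, 3
# then fail for the three corresponding data blocks.
POLY, ORDER = 0x1100B, 65535

power = [1] * ORDER
for i in range(1, ORDER):
    v = power[i - 1] << 1
    if v & 0x10000:
        v ^= POLY
    power[i] = v
log = {v: i for i, v in enumerate(power)}   # discrete-log table

def valid(n):   # usable as a PAR2 base exponent
    return bool(n % 3 and n % 5 and n % 17 and n % 257)

hits = []
cands = [n for n in range(1, 500) if valid(n)]
for i, n1 in enumerate(cands):
    for n2 in cands[i + 1:]:
        n3 = log[power[n1] ^ power[n2]]     # forces the sum to zero
        if n3 > n2 and valid(n3):
            hits.append((n1, n2, n3))
print(len(hits))
```

Each hit converts back to data block numbers by counting how many valid exponents lie below each n_i.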
>Can you compute an overall repair failure probability for the case where you
>have exactly the right number of recovery blocks for the number of missing data
>blocks? The probability would of course depend on the total number of data blocks
>and the highest recovery block number generated (which I suggest you assume
>is 10% of the number of data blocks).
It could be difficult. I think some estimates could be worked out, but the problem does not seem easy.
>>Presumably you have devised some formula
>>that lets you enumerate these examples
>>(or did you find them by brute force techniques)?
>Yes and no. Let me explain.
>There is a characterization:
> [Details deleted]
Very interesting. I think I understand.
>>Can you compute an overall repair failure probability for the case where you
>>have exactly the right number of recovery blocks for the number of missing data
>>blocks? The probability would of course depend on the total number of data blocks
>>and the highest recovery block number generated (which I suggest you assume
>>is 10% of the number of data blocks).
>It could be difficult. I think that some estimations could be written but the problem does not seem to be easy.
Not to worry, I don't think it can be too high or else there would be constant reports from people who were unable to repair.
>What was the source of this? The source was that there exists a polynomial P(x)
>(for example x^3+x+1) which satisfies:
>P(x) is a product of polynomials (over GF(8)) of degree 1,
>the number of its monomials is less than its degree,
>P(0)=1.
Mistake. I meant "the number of its monomials is less than *or equal to* its degree".