Thread: [Postgres-xc-bugs] backup/restore problem using BARRIER

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

postgres-xc-bugs

[Postgres-xc-bugs] backup/restore problem using BARRIER

From: Hitoshi H. <hem...@la...> - 2012-05-29 02:04:18

Hi folks,

This is a backup/restore problem assuming we understand
the BARRIER feature properly.


============================================================================
POSTGRES-XC BUG REPORT TEMPLATE
============================================================================

Your name : Hitoshi Hemmi
Your email address : hem...@la...


System Configuration:
---------------------
Architecture : x86_64

Operating Systems :
A. CentOS release 6.2 x86_64
B. RHEL5.7 x86_64
We tried in two environments. See also "Compilers"

Postgres-XC version : Postgres-XC 1.0beta2

Compilers used :
A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)

Description of problems:
==============================================
We tested online backup/restore feature using BARRIER,
and found anomalies in restored serial values.

(Outline of procedure)
0. Setup all coordinators and datanodes for online backups.

1. Create a table via coordinator:
# CREATE TABLE tbl(I serial, j int);

2. First dataload via coordinator:
# INSERT INTO tbl(j) VALUES (generate_series(1,10));

3. Base backup of all coordinators and datanodes:
$ pg_basebackup -h localhost -p xxxx -D /xxxx

4. Second dataload via coordinator:
　# INSERT INTO tbl(j) VALUES (generate_series(11,20));

5. Install BARRIER via coordinator:
　# CREATE BARRIER 'test';

6. Archive WALs for all coordinators and datanodes:
　# SELECT pg_start_backup('xxx');
　# SELECT pg_stop_backup();

7. Third dataload via coordinator:
　# INSERT INTO tbl(j) VALUES (generate_series(21, 30));

8. Stop all coordinators and datanodes.

9. Discard existing database cluster, and resore base backups.

10. create recovery.conf for all coordinators and datanodes.

11. Start datanodes and coordinators.

12. Issue SELECT * ... and observe the result:
testdb=# SELECT * FROM tbl ORDER BY i;
i | j
----+----
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
8 | 8
9 | 9
10 | 10
12 | 11
13 | 12
14 | 13
15 | 14
16 | 15
17 | 16
18 | 17
19 | 18
20 | 19
21 | 20
(20 rows)

*** Please note the displacement.

==============================================

Thanks

-- 
Hitoshi HEMMI
NTT Open Source Software Center
hem...@la...
(Please note that my address has changed.)
Tel:(03)5860-5115
Fax:(03)5463-5490

Re: [Postgres-xc-bugs] backup/restore problem using BARRIER

From: Michael P. <mic...@gm...> - 2012-05-29 02:13:12

I have a couple of questions.
This report lacks of details.

System Configuration:
> ---------------------
> Architecture : x86_64
>
> Operating Systems :
> A. CentOS release 6.2 x86_64
> B. RHEL5.7 x86_64
> We tried in two environments. See also "Compilers"
>
> Postgres-XC version : Postgres-XC 1.0beta2
>
> Compilers used :
> A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
> B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)
>
> Description of problems:
> ==============================================
> We tested online backup/restore feature using BARRIER,
> and found anomalies in restored serial values.
>
> (Outline of procedure)
> 0. Setup all coordinators and datanodes for online backups.
>
> 1. Create a table via coordinator:
> # CREATE TABLE tbl(I serial, j int);
>
> 2. First dataload via coordinator:
> # INSERT INTO tbl(j) VALUES (generate_series(1,10));
>
> 3. Base backup of all coordinators and datanodes:
> $ pg_basebackup -h localhost -p xxxx -D /xxxx
>
> 4. Second dataload via coordinator:
> # INSERT INTO tbl(j) VALUES (generate_series(11,20));
>
> 5. Install BARRIER via coordinator:
> # CREATE BARRIER 'test';
>
> 6. Archive WALs for all coordinators and datanodes:
> # SELECT pg_start_backup('xxx');
> # SELECT pg_stop_backup();
>
Was this launched directly at each node?


>
> 7. Third dataload via coordinator:
> # INSERT INTO tbl(j) VALUES (generate_series(21, 30));
>
> 8. Stop all coordinators and datanodes.
>
> 9. Discard existing database cluster, and resore base backups.
>
> 10. create recovery.conf for all coordinators and datanodes.
>
Just a confirmation:
Did you set up recovery_target_barrier to "test" in each recovery.conf?
I think you have done so but please confirm.

11. Start datanodes and coordinators.
>
> 12. Issue SELECT * ... and observe the result:
> testdb=# SELECT * FROM tbl ORDER BY i;
> i | j
> ----+----
> 1 | 1
> 2 | 2
> 3 | 3
> 4 | 4
> 5 | 5
> 6 | 6
> 7 | 7
> 8 | 8
> 9 | 9
> 10 | 10
> 12 | 11
> 13 | 12
> 14 | 13
> 15 | 14
> 16 | 15
> 17 | 16
> 18 | 17
> 19 | 18
> 20 | 19
> 21 | 20
> (20 rows)

Just by seeing what is written in this report, the data is recovered up to
the  second insert, to the point where the barrier has been created.
So it looks correct.
Am I missing something?


-- 
Michael Paquier
http://michael.otacoo.com

Re: [Postgres-xc-bugs] backup/restore problem using BARRIER

From: Hitoshi H. <hem...@la...> - 2012-05-29 03:38:37

Value "10" is missing, in "i" column; therefore, i  != j in
each record in second half of the query result list.
That is the problem we think. Are we wrong?

If it indeed is a problem, I will ask the details to engineers
who operated this.

-hemmi


Michael Paquier さんは書きました:
> I have a couple of questions.
> This report lacks of details.
>
>     System Configuration:
>     ---------------------
>     Architecture : x86_64
>
>     Operating Systems :
>     A. CentOS release 6.2 x86_64
>     B. RHEL5.7 x86_64
>     We tried in two environments. See also "Compilers"
>
>     Postgres-XC version : Postgres-XC 1.0beta2
>
>     Compilers used :
>     A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
>     B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)
>
>     Description of problems:
>     ==============================================
>     We tested online backup/restore feature using BARRIER,
>     and found anomalies in restored serial values.
>
>     (Outline of procedure)
>     0. Setup all coordinators and datanodes for online backups.
>
>     1. Create a table via coordinator:
>     # CREATE TABLE tbl(I serial, j int);
>
>     2. First dataload via coordinator:
>     # INSERT INTO tbl(j) VALUES (generate_series(1,10));
>
>     3. Base backup of all coordinators and datanodes:
>     $ pg_basebackup -h localhost -p xxxx -D /xxxx
>
>     4. Second dataload via coordinator:
>     　# INSERT INTO tbl(j) VALUES (generate_series(11,20));
>
>     5. Install BARRIER via coordinator:
>     　# CREATE BARRIER 'test';
>
>     6. Archive WALs for all coordinators and datanodes:
>     　# SELECT pg_start_backup('xxx');
>     　# SELECT pg_stop_backup();
>
> Was this launched directly at each node?
>  
>
>
>     7. Third dataload via coordinator:
>     　# INSERT INTO tbl(j) VALUES (generate_series(21, 30));
>
>     8. Stop all coordinators and datanodes.
>
>     9. Discard existing database cluster, and resore base backups.
>
>     10. create recovery.conf for all coordinators and datanodes.
>
> Just a confirmation:
> Did you set up recovery_target_barrier to "test" in each recovery.conf?
> I think you have done so but please confirm.
>
>     11. Start datanodes and coordinators.
>
>     12. Issue SELECT * ... and observe the result:
>     testdb=# SELECT * FROM tbl ORDER BY i;
>     i | j
>     ----+----
>     1 | 1
>     2 | 2
>     3 | 3
>     4 | 4
>     5 | 5
>     6 | 6
>     7 | 7
>     8 | 8
>     9 | 9
>     10 | 10
>     12 | 11
>     13 | 12
>     14 | 13
>     15 | 14
>     16 | 15
>     17 | 16
>     18 | 17
>     19 | 18
>     20 | 19
>     21 | 20
>     (20 rows)
>
> Just by seeing what is written in this report, the data is recovered 
> up to the  second insert, to the point where the barrier has been created.
> So it looks correct.
> Am I missing something?
>
>
> -- 
> Michael Paquier
> http://michael.otacoo.com


-- 
Hitoshi HEMMI
NTT Open Source Software Center
hem...@la...
(Please note that my address has changed.)
Tel:(03)5860-5115
Fax:(03)5463-5490

Re: [Postgres-xc-bugs] backup/restore problem using BARRIER

From: Hitoshi H. <hem...@la...> - 2012-05-29 03:43:08

I made mistake. Value "11" is missing indeed.

-hemmi


Hitoshi HEMMI さんは書きました:
> Value "10" is missing, in "i" column; therefore, i  != j in
> each record in second half of the query result list.
> That is the problem we think. Are we wrong?
>
> If it indeed is a problem, I will ask the details to engineers
> who operated this.
>
> -hemmi
>
>
> Michael Paquier さんは書きました:
>   
>> I have a couple of questions.
>> This report lacks of details.
>>
>>     System Configuration:
>>     ---------------------
>>     Architecture : x86_64
>>
>>     Operating Systems :
>>     A. CentOS release 6.2 x86_64
>>     B. RHEL5.7 x86_64
>>     We tried in two environments. See also "Compilers"
>>
>>     Postgres-XC version : Postgres-XC 1.0beta2
>>
>>     Compilers used :
>>     A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
>>     B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)
>>
>>     Description of problems:
>>     ==============================================
>>     We tested online backup/restore feature using BARRIER,
>>     and found anomalies in restored serial values.
>>
>>     (Outline of procedure)
>>     0. Setup all coordinators and datanodes for online backups.
>>
>>     1. Create a table via coordinator:
>>     # CREATE TABLE tbl(I serial, j int);
>>
>>     2. First dataload via coordinator:
>>     # INSERT INTO tbl(j) VALUES (generate_series(1,10));
>>
>>     3. Base backup of all coordinators and datanodes:
>>     $ pg_basebackup -h localhost -p xxxx -D /xxxx
>>
>>     4. Second dataload via coordinator:
>>     　# INSERT INTO tbl(j) VALUES (generate_series(11,20));
>>
>>     5. Install BARRIER via coordinator:
>>     　# CREATE BARRIER 'test';
>>
>>     6. Archive WALs for all coordinators and datanodes:
>>     　# SELECT pg_start_backup('xxx');
>>     　# SELECT pg_stop_backup();
>>
>> Was this launched directly at each node?
>>  
>>
>>
>>     7. Third dataload via coordinator:
>>     　# INSERT INTO tbl(j) VALUES (generate_series(21, 30));
>>
>>     8. Stop all coordinators and datanodes.
>>
>>     9. Discard existing database cluster, and resore base backups.
>>
>>     10. create recovery.conf for all coordinators and datanodes.
>>
>> Just a confirmation:
>> Did you set up recovery_target_barrier to "test" in each recovery.conf?
>> I think you have done so but please confirm.
>>
>>     11. Start datanodes and coordinators.
>>
>>     12. Issue SELECT * ... and observe the result:
>>     testdb=# SELECT * FROM tbl ORDER BY i;
>>     i | j
>>     ----+----
>>     1 | 1
>>     2 | 2
>>     3 | 3
>>     4 | 4
>>     5 | 5
>>     6 | 6
>>     7 | 7
>>     8 | 8
>>     9 | 9
>>     10 | 10
>>     12 | 11
>>     13 | 12
>>     14 | 13
>>     15 | 14
>>     16 | 15
>>     17 | 16
>>     18 | 17
>>     19 | 18
>>     20 | 19
>>     21 | 20
>>     (20 rows)
>>
>> Just by seeing what is written in this report, the data is recovered 
>> up to the  second insert, to the point where the barrier has been created.
>> So it looks correct.
>> Am I missing something?
>>
>>
>> -- 
>> Michael Paquier
>> http://michael.otacoo.com
>>     
>
>
>   


-- 
Hitoshi HEMMI
NTT Open Source Software Center
hem...@la...
(Please note that my address has changed.)
Tel:(03)5860-5115
Fax:(03)5463-5490

Re: [Postgres-xc-bugs] backup/restore problem using BARRIER

From: Michael P. <mic...@gm...> - 2012-05-29 07:24:35

I just checked this test case with vanilla PostgreSQL.
postgres=# CREATE TABLE tbl(I serial, j int);
NOTICE:  CREATE TABLE will create implicit sequence "tbl_i_seq" for serial
column "tbl.i"
CREATE TABLE
postgres=# INSERT INTO tbl(j) VALUES (generate_series(1,10));
INSERT 0 10
postgres=# INSERT INTO tbl(j) VALUES (generate_series(11,20));
INSERT 0 10
postgres=# select * from tbl;
 i  | j
----+----
  1 |  1
  2 |  2
  3 |  3
  4 |  4
  5 |  5
  6 |  6
  7 |  7
  8 |  8
  9 |  9
 10 | 10
 12 | 11
 13 | 12
 14 | 13
 15 | 14
 16 | 15
 17 | 16
 18 | 17
 19 | 18
 20 | 19
 21 | 20
(20 rows)
Conclusion, even in  the case of Postgres, the value 11 is missing.
So this is not a bug and process works OK.
-- 
Michael Paquier
http://michael.otacoo.com