From: Hitoshi H. <hem...@la...> - 2012-05-29 02:04:18
|
Hi folks, This is a backup/restore problem assuming we understand the BARRIER feature properly. ============================================================================ POSTGRES-XC BUG REPORT TEMPLATE ============================================================================ Your name : Hitoshi Hemmi Your email address : hem...@la... System Configuration: --------------------- Architecture : x86_64 Operating Systems : A. CentOS release 6.2 x86_64 B. RHEL5.7 x86_64 We tried in two environments. See also "Compilers" Postgres-XC version : Postgres-XC 1.0beta2 Compilers used : A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51) Description of problems: ============================================== We tested online backup/restore feature using BARRIER, and found anomalies in restored serial values. (Outline of procedure) 0. Setup all coordinators and datanodes for online backups. 1. Create a table via coordinator: # CREATE TABLE tbl(I serial, j int); 2. First dataload via coordinator: # INSERT INTO tbl(j) VALUES (generate_series(1,10)); 3. Base backup of all coordinators and datanodes: $ pg_basebackup -h localhost -p xxxx -D /xxxx 4. Second dataload via coordinator: # INSERT INTO tbl(j) VALUES (generate_series(11,20)); 5. Install BARRIER via coordinator: # CREATE BARRIER 'test'; 6. Archive WALs for all coordinators and datanodes: # SELECT pg_start_backup('xxx'); # SELECT pg_stop_backup(); 7. Third dataload via coordinator: # INSERT INTO tbl(j) VALUES (generate_series(21, 30)); 8. Stop all coordinators and datanodes. 9. Discard existing database cluster, and resore base backups. 10. create recovery.conf for all coordinators and datanodes. 11. Start datanodes and coordinators. 12. Issue SELECT * ... and observe the result: testdb=# SELECT * FROM tbl ORDER BY i; i | j ----+---- 1 | 1 2 | 2 3 | 3 4 | 4 5 | 5 6 | 6 7 | 7 8 | 8 9 | 9 10 | 10 12 | 11 13 | 12 14 | 13 15 | 14 16 | 15 17 | 16 18 | 17 19 | 18 20 | 19 21 | 20 (20 rows) *** Please note the displacement. ============================================== Thanks -- Hitoshi HEMMI NTT Open Source Software Center hem...@la... (Please note that my address has changed.) Tel:(03)5860-5115 Fax:(03)5463-5490 |
From: Michael P. <mic...@gm...> - 2012-05-29 02:13:12
|
I have a couple of questions. This report lacks of details. System Configuration: > --------------------- > Architecture : x86_64 > > Operating Systems : > A. CentOS release 6.2 x86_64 > B. RHEL5.7 x86_64 > We tried in two environments. See also "Compilers" > > Postgres-XC version : Postgres-XC 1.0beta2 > > Compilers used : > A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) > B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51) > > Description of problems: > ============================================== > We tested online backup/restore feature using BARRIER, > and found anomalies in restored serial values. > > (Outline of procedure) > 0. Setup all coordinators and datanodes for online backups. > > 1. Create a table via coordinator: > # CREATE TABLE tbl(I serial, j int); > > 2. First dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(1,10)); > > 3. Base backup of all coordinators and datanodes: > $ pg_basebackup -h localhost -p xxxx -D /xxxx > > 4. Second dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(11,20)); > > 5. Install BARRIER via coordinator: > # CREATE BARRIER 'test'; > > 6. Archive WALs for all coordinators and datanodes: > # SELECT pg_start_backup('xxx'); > # SELECT pg_stop_backup(); > Was this launched directly at each node? > > 7. Third dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(21, 30)); > > 8. Stop all coordinators and datanodes. > > 9. Discard existing database cluster, and resore base backups. > > 10. create recovery.conf for all coordinators and datanodes. > Just a confirmation: Did you set up recovery_target_barrier to "test" in each recovery.conf? I think you have done so but please confirm. 11. Start datanodes and coordinators. > > 12. Issue SELECT * ... and observe the result: > testdb=# SELECT * FROM tbl ORDER BY i; > i | j > ----+---- > 1 | 1 > 2 | 2 > 3 | 3 > 4 | 4 > 5 | 5 > 6 | 6 > 7 | 7 > 8 | 8 > 9 | 9 > 10 | 10 > 12 | 11 > 13 | 12 > 14 | 13 > 15 | 14 > 16 | 15 > 17 | 16 > 18 | 17 > 19 | 18 > 20 | 19 > 21 | 20 > (20 rows) Just by seeing what is written in this report, the data is recovered up to the second insert, to the point where the barrier has been created. So it looks correct. Am I missing something? -- Michael Paquier http://michael.otacoo.com |
From: Hitoshi H. <hem...@la...> - 2012-05-29 03:38:37
|
Value "10" is missing, in "i" column; therefore, i != j in each record in second half of the query result list. That is the problem we think. Are we wrong? If it indeed is a problem, I will ask the details to engineers who operated this. -hemmi Michael Paquier さんは書きました: > I have a couple of questions. > This report lacks of details. > > System Configuration: > --------------------- > Architecture : x86_64 > > Operating Systems : > A. CentOS release 6.2 x86_64 > B. RHEL5.7 x86_64 > We tried in two environments. See also "Compilers" > > Postgres-XC version : Postgres-XC 1.0beta2 > > Compilers used : > A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) > B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51) > > Description of problems: > ============================================== > We tested online backup/restore feature using BARRIER, > and found anomalies in restored serial values. > > (Outline of procedure) > 0. Setup all coordinators and datanodes for online backups. > > 1. Create a table via coordinator: > # CREATE TABLE tbl(I serial, j int); > > 2. First dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(1,10)); > > 3. Base backup of all coordinators and datanodes: > $ pg_basebackup -h localhost -p xxxx -D /xxxx > > 4. Second dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(11,20)); > > 5. Install BARRIER via coordinator: > # CREATE BARRIER 'test'; > > 6. Archive WALs for all coordinators and datanodes: > # SELECT pg_start_backup('xxx'); > # SELECT pg_stop_backup(); > > Was this launched directly at each node? > > > > 7. Third dataload via coordinator: > # INSERT INTO tbl(j) VALUES (generate_series(21, 30)); > > 8. Stop all coordinators and datanodes. > > 9. Discard existing database cluster, and resore base backups. > > 10. create recovery.conf for all coordinators and datanodes. > > Just a confirmation: > Did you set up recovery_target_barrier to "test" in each recovery.conf? > I think you have done so but please confirm. > > 11. Start datanodes and coordinators. > > 12. Issue SELECT * ... and observe the result: > testdb=# SELECT * FROM tbl ORDER BY i; > i | j > ----+---- > 1 | 1 > 2 | 2 > 3 | 3 > 4 | 4 > 5 | 5 > 6 | 6 > 7 | 7 > 8 | 8 > 9 | 9 > 10 | 10 > 12 | 11 > 13 | 12 > 14 | 13 > 15 | 14 > 16 | 15 > 17 | 16 > 18 | 17 > 19 | 18 > 20 | 19 > 21 | 20 > (20 rows) > > Just by seeing what is written in this report, the data is recovered > up to the second insert, to the point where the barrier has been created. > So it looks correct. > Am I missing something? > > > -- > Michael Paquier > http://michael.otacoo.com -- Hitoshi HEMMI NTT Open Source Software Center hem...@la... (Please note that my address has changed.) Tel:(03)5860-5115 Fax:(03)5463-5490 |
From: Hitoshi H. <hem...@la...> - 2012-05-29 03:43:08
|
I made mistake. Value "11" is missing indeed. -hemmi Hitoshi HEMMI さんは書きました: > Value "10" is missing, in "i" column; therefore, i != j in > each record in second half of the query result list. > That is the problem we think. Are we wrong? > > If it indeed is a problem, I will ask the details to engineers > who operated this. > > -hemmi > > > Michael Paquier さんは書きました: > >> I have a couple of questions. >> This report lacks of details. >> >> System Configuration: >> --------------------- >> Architecture : x86_64 >> >> Operating Systems : >> A. CentOS release 6.2 x86_64 >> B. RHEL5.7 x86_64 >> We tried in two environments. See also "Compilers" >> >> Postgres-XC version : Postgres-XC 1.0beta2 >> >> Compilers used : >> A. gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3) >> B. gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51) >> >> Description of problems: >> ============================================== >> We tested online backup/restore feature using BARRIER, >> and found anomalies in restored serial values. >> >> (Outline of procedure) >> 0. Setup all coordinators and datanodes for online backups. >> >> 1. Create a table via coordinator: >> # CREATE TABLE tbl(I serial, j int); >> >> 2. First dataload via coordinator: >> # INSERT INTO tbl(j) VALUES (generate_series(1,10)); >> >> 3. Base backup of all coordinators and datanodes: >> $ pg_basebackup -h localhost -p xxxx -D /xxxx >> >> 4. Second dataload via coordinator: >> # INSERT INTO tbl(j) VALUES (generate_series(11,20)); >> >> 5. Install BARRIER via coordinator: >> # CREATE BARRIER 'test'; >> >> 6. Archive WALs for all coordinators and datanodes: >> # SELECT pg_start_backup('xxx'); >> # SELECT pg_stop_backup(); >> >> Was this launched directly at each node? >> >> >> >> 7. Third dataload via coordinator: >> # INSERT INTO tbl(j) VALUES (generate_series(21, 30)); >> >> 8. Stop all coordinators and datanodes. >> >> 9. Discard existing database cluster, and resore base backups. >> >> 10. create recovery.conf for all coordinators and datanodes. >> >> Just a confirmation: >> Did you set up recovery_target_barrier to "test" in each recovery.conf? >> I think you have done so but please confirm. >> >> 11. Start datanodes and coordinators. >> >> 12. Issue SELECT * ... and observe the result: >> testdb=# SELECT * FROM tbl ORDER BY i; >> i | j >> ----+---- >> 1 | 1 >> 2 | 2 >> 3 | 3 >> 4 | 4 >> 5 | 5 >> 6 | 6 >> 7 | 7 >> 8 | 8 >> 9 | 9 >> 10 | 10 >> 12 | 11 >> 13 | 12 >> 14 | 13 >> 15 | 14 >> 16 | 15 >> 17 | 16 >> 18 | 17 >> 19 | 18 >> 20 | 19 >> 21 | 20 >> (20 rows) >> >> Just by seeing what is written in this report, the data is recovered >> up to the second insert, to the point where the barrier has been created. >> So it looks correct. >> Am I missing something? >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com >> > > > -- Hitoshi HEMMI NTT Open Source Software Center hem...@la... (Please note that my address has changed.) Tel:(03)5860-5115 Fax:(03)5463-5490 |
From: Michael P. <mic...@gm...> - 2012-05-29 07:24:35
|
I just checked this test case with vanilla PostgreSQL. postgres=# CREATE TABLE tbl(I serial, j int); NOTICE: CREATE TABLE will create implicit sequence "tbl_i_seq" for serial column "tbl.i" CREATE TABLE postgres=# INSERT INTO tbl(j) VALUES (generate_series(1,10)); INSERT 0 10 postgres=# INSERT INTO tbl(j) VALUES (generate_series(11,20)); INSERT 0 10 postgres=# select * from tbl; i | j ----+---- 1 | 1 2 | 2 3 | 3 4 | 4 5 | 5 6 | 6 7 | 7 8 | 8 9 | 9 10 | 10 12 | 11 13 | 12 14 | 13 15 | 14 16 | 15 17 | 16 18 | 17 19 | 18 20 | 19 21 | 20 (20 rows) Conclusion, even in the case of Postgres, the value 11 is missing. So this is not a bug and process works OK. -- Michael Paquier http://michael.otacoo.com |