As you can see was executed in 153.243 ms. When I'm trying doing the same in psgl, it's more quicker:
explain analyse select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list, e.router_id
from sym_data d
inner join sym_data_event e on d.data_id = e.data_id
inner join sym_outgoing_batch o on o.batch_id=e.batch_id
where o.batch_id = '14994635' and o.node_id = '000' order by d.data_id asc;
RESULT:
"Sort (cost=309.86..309.92 rows=24 width=1011) (actual time=0.183..0.183 rows=2 loops=1)"
" Sort Key: d.data_id"
" Sort Method: quicksort Memory: 26kB"
" -> Nested Loop (cost=1.27..309.31 rows=24 width=1011) (actual time=0.146..0.163 rows=2 loops=1)"
" -> Index Only Scan using sym_outgoing_batch_pkey on sym_outgoing_batch o (cost=0.42..8.44 rows=1 width=8) (actual time=0.026..0.027 rows=1 loops=1)"
" Index Cond: ((batch_id = 14994635::bigint) AND (node_id = '000'::text))"
" Heap Fetches: 0"
" -> Nested Loop (cost=0.85..300.63 rows=24 width=1019) (actual time=0.116..0.128 rows=2 loops=1)"
" -> Index Scan using sym_idx_de_batchid on sym_data_event e (cost=0.42..97.71 rows=24 width=37) (actual time=0.080..0.082 rows=2 loops=1)"
" Index Cond: (batch_id = 14994635::bigint)"
" -> Index Scan using sym_idx_d_channel_id on sym_data d (cost=0.42..8.45 rows=1 width=990) (actual time=0.018..0.018 rows=1 loops=2)"
" Index Cond: (data_id = e.data_id)"
"Total runtime: 0.251 ms"
Even using prepare statement it's still faster:
prepare sds as select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list, e.router_id
from sym_data d
inner join sym_data_event e on d.data_id = e.data_id
inner join sym_outgoing_batch o on o.batch_id=e.batch_id
where o.batch_id = $1 and o.node_id = $2 order by d.data_id asc;
explain analyze execute sds('14994635', '000');
RESULT:
"Sort (cost=309.86..309.92 rows=24 width=1011) (actual time=0.072..0.072 rows=2 loops=1)"
" Sort Key: d.data_id"
" Sort Method: quicksort Memory: 26kB"
" -> Nested Loop (cost=1.27..309.31 rows=24 width=1011) (actual time=0.039..0.055 rows=2 loops=1)"
" -> Index Only Scan using sym_outgoing_batch_pkey on sym_outgoing_batch o (cost=0.42..8.44 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1)"
" Index Cond: ((batch_id = 14994635::bigint) AND (node_id = '000'::text))"
" Heap Fetches: 0"
" -> Nested Loop (cost=0.85..300.63 rows=24 width=1019) (actual time=0.021..0.033 rows=2 loops=1)"
" -> Index Scan using sym_idx_de_batchid on sym_data_event e (cost=0.42..97.71 rows=24 width=37) (actual time=0.010..0.011 rows=2 loops=1)"
" Index Cond: (batch_id = 14994635::bigint)"
" -> Index Scan using sym_idx_d_channel_id on sym_data d (cost=0.42..8.45 rows=1 width=990) (actual time=0.005..0.007 rows=1 loops=2)"
" Index Cond: (data_id = e.data_id)"
"Total runtime: 0.156 ms"
In class JdbcSqlReadCursor, after prepareStatement, you are setting values using sqlTemplate.setValues(..). In this method org.springframework.jdbc.core.StatementCreatorUtils.setParameterValue(..) to determine sqlType of parameters. I guess that this is Types.NUMERIC for batch_id and Types.VARCHAR for node_id.
Doing the same in psql, query execution is slower:
prepare sds_de (decimal, varchar) as select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list, e.router_id
from sym_data d
inner join sym_data_event e on d.data_id = e.data_id
inner join sym_outgoing_batch o on o.batch_id=e.batch_id
where o.batch_id = $1 and o.node_id = $2 order by d.data_id asc;
explain analyze execute sds_de('14994635', '000');
RESULT:
"Sort (cost=38935.59..38948.15 rows=5026 width=1007) (actual time=1293.282..1293.284 rows=2 loops=1)"
" Sort Key: d.data_id"
" Sort Method: quicksort Memory: 26kB"
" -> Nested Loop (cost=5142.56..38626.61 rows=5026 width=1007) (actual time=306.025..1293.268 rows=2 loops=1)"
" -> Hash Join (cost=5142.13..34751.16 rows=5078 width=29) (actual time=306.004..1293.234 rows=2 loops=1)"
" Hash Cond: (e.batch_id = o.batch_id)"
" -> Seq Scan on sym_data_event e (cost=0.00..25741.91 rows=1017691 width=37) (actual time=0.005..615.269 rows=1019325 loops=1)"
" -> Hash (cost=5136.37..5136.37 rows=461 width=8) (actual time=31.920..31.920 rows=1 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 1kB"
" -> Index Only Scan using sym_outgoing_batch_pkey on sym_outgoing_batch o (cost=0.42..5136.37 rows=461 width=8) (actual time=29.833..31.915 rows=1 loops=1)"
" Index Cond: (node_id = '000'::text)"
" Filter: ((batch_id)::numeric = 14994635::numeric)"
" Rows Removed by Filter: 92302"
" Heap Fetches: 1415"
" -> Index Scan using sym_data_pkey on sym_data d (cost=0.42..0.75 rows=1 width=986) (actual time=0.009..0.010 rows=1 loops=2)"
" Index Cond: (data_id = e.data_id)"
"Total runtime: 1293.347 ms"
Is it possible to change batch_id type to Types.BIGINT, to optimize this query execution from even 0,5 sec to less then 1 ms?
Thanks in advance for any answer!
Last edit: Pawel Omelko 2014-04-25
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Does it help if you add the "protocolVersion=2" parameter to your database connection URL? For example, in your SymmetricDS properties file, it might look like this:
I did some more investigation and I found that in class DataService, method selectDataFor(Batch batch) is execution of query 'selectEventDataToExtractSql' with parameters BatchId and TargetNodeId. For first param there is defined sqlType as Types.NUMERIC. Query plan for this parameters is same as for decimal.
prepare sds_nu(numeric, varchar) as select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list, e.router_id
from sym_data d
inner join sym_data_event e on d.data_id = e.data_id
inner join sym_outgoing_batch o on o.batch_id=e.batch_id
where o.batch_id = $1 and o.node_id = $2 order by d.data_id asc;
explain analyze execute sds_nu('14994635', '000');
On the other hand SDS generate sym_outgoing_batch table with batch_id column type set to bigint. So why there is no Types.BIGINT set for query 'selectEventDataToExtractSql'?
Is it possible to change this settings in next SDS version, customize this type in some way? As you can see execution without types can be thousands times faster.
Last edit: Pawel Omelko 2014-04-11
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Waiting for yours answer, I modified method selectDataFor(Batch batch) from class DataService to change parameter type from Types.NUMERIC to Types.BIGINT. After that I made symmetric-core-3.5.3.jar and replaced original jar by it in our SDS instalations.
The result was incredible, execution time of 'selectEventDataToExtractSql' query was less then 1 ms - before change about 30-160 ms (depends on current query plany). Without that change execution time was even 0,5 sec when sym_data has about 3 millions data!.
Average extract time (from sym_outgoing_batch.extract_millis) decreased from about 200 to 10 ms.
select channel_id, round(avg(extract_millis), 2) as extract,
count(*),
round(avg(byte_count), 0) as bytes
from sym_outgoing_batch
where last_update_time >='2014-04-15 12:00'
and last_update_time <='2014-04-15 13:00'
group by channel_id order by channel_id;
RESULT
channel
extract
count
bytes
ch_1
11.81
820
5969
ch_2
9.31
52
382
ch_3
7.67
3
691
ch_4
8.47
664
660
ch_5
10.00
7
850
ch_6
12.03
302
2418
ch_7
8.00
754
1383
ch_8
8.55
434
1053
ch_9
11.12
254
697
ch_10
15.48
278
13045
ch_11
7.09
32
1121
heartbeat
3.13
8
333
Is it possible to change type Types.BIGINT for described parameter in next SDS version? As you can see performance impact is huge.
Thanks in advance.
Last edit: Pawel Omelko 2014-04-16
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
We are using SymmetricDS 3.5.3 and PostgreSQL 9.3.
In db logs we can find execution details for SDS query 'selectEventDataToExtractSql' (defined in class DataServiceSqlMap)
As you can see was executed in 153.243 ms. When I'm trying doing the same in psgl, it's more quicker:
Even using prepare statement it's still faster:
In class JdbcSqlReadCursor, after prepareStatement, you are setting values using sqlTemplate.setValues(..). In this method org.springframework.jdbc.core.StatementCreatorUtils.setParameterValue(..) to determine sqlType of parameters. I guess that this is Types.NUMERIC for batch_id and Types.VARCHAR for node_id.
Doing the same in psql, query execution is slower:
Is it possible to change batch_id type to Types.BIGINT, to optimize this query execution from even 0,5 sec to less then 1 ms?
Thanks in advance for any answer!
Last edit: Pawel Omelko 2014-04-25
Does it help if you add the "protocolVersion=2" parameter to your database connection URL? For example, in your SymmetricDS properties file, it might look like this:
db.url=jdbc:postgresql://localhost/corp?protocolVersion=2&stringtype=unspecified
We already tried this with no effect.
I did some more investigation and I found that in class DataService, method selectDataFor(Batch batch) is execution of query 'selectEventDataToExtractSql' with parameters BatchId and TargetNodeId. For first param there is defined sqlType as Types.NUMERIC. Query plan for this parameters is same as for decimal.
On the other hand SDS generate sym_outgoing_batch table with batch_id column type set to bigint. So why there is no Types.BIGINT set for query 'selectEventDataToExtractSql'?
Is it possible to change this settings in next SDS version, customize this type in some way? As you can see execution without types can be thousands times faster.
Last edit: Pawel Omelko 2014-04-11
Hi again,
Waiting for yours answer, I modified method selectDataFor(Batch batch) from class DataService to change parameter type from Types.NUMERIC to Types.BIGINT. After that I made symmetric-core-3.5.3.jar and replaced original jar by it in our SDS instalations.
The result was incredible, execution time of 'selectEventDataToExtractSql' query was less then 1 ms - before change about 30-160 ms (depends on current query plany). Without that change execution time was even 0,5 sec when sym_data has about 3 millions data!.
Average extract time (from sym_outgoing_batch.extract_millis) decreased from about 200 to 10 ms.
RESULT
Is it possible to change type Types.BIGINT for described parameter in next SDS version? As you can see performance impact is huge.
Thanks in advance.
Last edit: Pawel Omelko 2014-04-16
Thanks for the suggestion. We'll incorporate the improvement in the next version of SymmetricDS.
http://www.symmetricds.org/issues/view.php?id=1684
Thanks Chris for your reply!