From: Fred T. <fr...@us...> - 2011-11-25 12:27:57
The new script is doing a partial table scan for each chunk of the update
procedure, but the scan size goes up in each iteration. Scanning 500,000 rows
takes about a minute, and the time scales almost linearly with the row count.
My suggestion was to find the chunks in several partial scans, but with fixed
scan sizes. This may not be useful with your data change pattern if one of the
PK columns is changing.

I have not found any issue with cache management. Your selects are simply
reading more rows in each iteration.

In any case, the kind of condition that does not cause big scans in SELECT and
UPDATE is:

    KEY1 = 'val' AND KEY2 > 'val2' AND ROWNUM() < chunksize

Anything using an IN subquery that scans a lot of rows slows down the operation.

Fred

On Friday, November 25, 2011 11:20 AM, "Renaud Pelissier" <ren...@gm...> wrote:

I forgot to attach the new version of the script:

*****************************************************************
\p Dropping previous data and conf
DROP INDEX IDX_test_bool IF EXISTS;
DROP PROCEDURE populate_proc IF EXISTS;
DROP TABLE test IF EXISTS;

\p Creating table
CREATE CACHED TABLE test(key1 VARCHAR(64), key2 VARCHAR(1024),
    bool BOOLEAN DEFAULT false, PRIMARY KEY (key1, key2));

\p Creating index
--CREATE INDEX IDX_test_bool ON test(bool);

\p Creating procedure
CREATE PROCEDURE populate_proc(IN key1_var VARCHAR(64), IN max_var INT, IN iter_var INT)
MODIFIES SQL DATA
BEGIN ATOMIC
    DECLARE count_var INT;
    DECLARE batch_size_var INT;
    SET count_var = 0;
    loop_label: WHILE count_var < max_var DO
        INSERT INTO test VALUES(key1_var,
            CONCAT('key2_', CAST(iter_var AS VARCHAR(32)), '_', CAST(count_var AS VARCHAR(32))),
            CAST(MOD(iter_var, 2) AS BOOLEAN));
        SET count_var = count_var + 1;
    END WHILE loop_label;
END;
.;

SET FILES LOG FALSE;
SET DATABASE REFERENTIAL INTEGRITY FALSE;
* timestamp = 0
* delta_time = 0
* count = 0
* nchunk = 100
* chunksize = 1000
* i = 1
\p Starting *{nchunk} insert iterations with key1-A by chunks of *{chunksize} rows
* while (*i <= *nchunk)
CALL populate_proc('key1-A', *{chunksize}, *{i});
CALL UNIX_TIMESTAMP();
* ((delta_time = ? - timestamp))
* timestamp = *{?}
COMMIT;
* ((i += 1))
* ((count += chunksize))
\p >*{timestamp};*{delta_time};*{count}
* end while
SET FILES LOG TRUE;
SET DATABASE REFERENTIAL INTEGRITY TRUE;

SET FILES LOG FALSE;
SET DATABASE REFERENTIAL INTEGRITY FALSE;
* count = 0
* nchunk = 400
* chunksize = 1000
* i = 1
\p Starting *{nchunk} insert iterations with key1-B by chunks of *{chunksize} rows
* while (*i <= *nchunk)
CALL populate_proc('key1-B', *{chunksize}, *{i});
CALL UNIX_TIMESTAMP();
* ((delta_time = ? - timestamp))
* timestamp = *{?}
COMMIT;
* ((i += 1))
* ((count += chunksize))
\p >*{timestamp};*{delta_time};*{count}
* end while
SET FILES LOG TRUE;
SET DATABASE REFERENTIAL INTEGRITY TRUE;

SET FILES LOG FALSE;
SET DATABASE REFERENTIAL INTEGRITY FALSE;
\p Starting update
* timestamp = 0
* delta_time = 0
* chunksize = 100000
* nupdated = 1
* count = 0
CALL UNIX_TIMESTAMP();
* timestamp = *{?}
* while (*nupdated > 0)
--UPDATE test SET bool=true WHERE rownum()<=*{chunksize} AND bool=false AND key1='key1-B';
--UPDATE test SET bool=true WHERE rownum()<=*{chunksize} AND key1='key1-B' AND key2>=(SELECT key2 FROM test WHERE key1='key1-B' LIMIT 1 OFFSET *{count});
--UPDATE test SET bool=true WHERE key1='key1-B' AND key2 IN (SELECT key2 FROM test WHERE key1='key1-B' LIMIT *{chunksize} OFFSET *{count});
--SELECT count(*) FROM test WHERE key1='key1-B' AND key2 IN (SELECT key2 FROM test WHERE key1='key1-B' LIMIT *{chunksize} OFFSET *{count});
SELECT count(*) FROM test WHERE rownum()<=*{chunksize} AND key1='key1-B' AND key2>=(SELECT key2 FROM test WHERE key1='key1-B' LIMIT 1 OFFSET *{count});
* ((nupdated = ?))
CALL UNIX_TIMESTAMP();
* ((delta_time = ? - timestamp))
* timestamp = *{?}
COMMIT;
* ((count += nupdated))
\p >*{timestamp};*{delta_time};*{count}
* end while
\p Done updating
SET FILES LOG TRUE;
SET DATABASE REFERENTIAL INTEGRITY TRUE;
*****************************************************************
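Fred's recommended condition amounts to a keyset-style chunked update: the application remembers the highest key2 value it has processed and uses it as the lower bound of the next pass, so each pass scans a bounded number of rows instead of walking past an OFFSET. A minimal sketch of one such pass against the test table above (the literal bound 'key2_200_0' stands in for the value carried over from the previous chunk; it is illustrative, not taken from the original script):

    -- One chunk of a keyset-style update: the lower bound comes from the
    -- previous pass, so the scan never exceeds the chunk size.
    UPDATE test SET bool = true
     WHERE key1 = 'key1-B'
       AND key2 > 'key2_200_0'      -- last key2 processed in the previous chunk
       AND ROWNUM() <= 100000;
    COMMIT;

Because each pass scans at most the chunk size, total work stays linear in the table size, whereas the OFFSET-based variants re-scan all previously processed rows on every pass.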
Good morning Fred,

Thank you again for your time and consideration! I did some additional testing
this morning following your advice. I reduced the number of rows by a factor
of 10 to make the script faster and raised the chunk size to 100,000. Now it
takes less than a minute to run the test.

Your suggestions didn't lead to better results, so I changed the UPDATE
statement to a SELECT COUNT(*) statement, and I was very surprised to discover
that the performance problem was still there. So the problem has nothing to do
with either the index on the "bool" column or the UPDATE statement! I think it
has something to do with the cache management, which is getting slower and
slower over time.

*************** FIRST SELECT REQUEST USING 'key2 IN (SELECT key2 LIMIT batch_size OFFSET count)' ***************
100000
1322218278
>1322218278;5;100000
100000
1322218292
>1322218292;14;200000
100000
1322218325
>1322218325;33;300000
100000
1322218381
>1322218381;56;400000
0
1322218385
>1322218385;4;400000

*************** SECOND SELECT REQUEST USING 'rownum()<=batch_size AND key2>(SELECT key2 LIMIT 1 OFFSET count)' ***************
100000
1322218646
>1322218646;1;100000
100000
1322218650
>1322218650;4;200000
100000
1322218666
>1322218666;16;300000
100000
1322218699
>1322218699;33;400000
0
1322218704
>1322218704;5;400000

_______________________________________________
Hsqldb-user mailing list
Hsq...@li...
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
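The growth pattern in these timings is consistent with the cost of the OFFSET subquery rather than with cache degradation: `LIMIT 1 OFFSET count` has to walk past `count` rows before returning, so each pass scans roughly one chunk's worth of rows more than the previous one. A back-of-the-envelope estimate, using the script's chunk size of 100,000:

    rows walked in pass k  ≈ OFFSET + chunksize = k * 100,000
    passes 1..4            ≈ 100,000 + 200,000 + 300,000 + 400,000
                           = 1,000,000 rows walked for a 400,000-row table

This matches the per-pass times in the logs above (5, 14, 33, 56 s in the first run; 1, 4, 16, 33 s in the second) growing roughly linearly, which makes the total time quadratic in the number of chunks.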