Slow import with REPLACE statement
Brought to you by:
ansgarbecker
Originally created by: klchi... (code.google.com)@gmail.com
Hi,
I do not know whether this issue has been reported before.
I have a file of about 23 MB, or 215,000 lines. The queries use the
REPLACE INTO statement. Even though I had dropped the table first, the
import was very slow, taking about 1.5 hours to complete. Is there any
way to improve this?
I tested that the speed improves a lot with the INSERT INTO statement.
Regards,
KL Chin
Originally posted by: rosenfie... (code.google.com)@gmail.com
> anyway to improve this
Definitely. The first step would be to find a way to profile the code so that we
actually identify the bottlenecks. I'm not, off the top of my head, sure how to
efficiently profile Delphi Pascal code.
> Tested the speed will improve alot with "Insert Into" statement.
Any improvements you see from doing that means it's the server that performs replace
poorly/slowly. For example, I could imagine that it does a SELECT for every
REPLACE, instead of keeping, say, hash tables of unique keys in memory to compare
against on multiple REPLACEs.
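As a minimal sketch of the semantics involved, here is an illustration using Python's sqlite3 module (SQLite also accepts REPLACE INTO, with delete-then-insert semantics on a key conflict; MySQL's internals differ, but the extra per-row key lookup is the same idea):

```python
import sqlite3

# In-memory database with a primary key, so REPLACE has something to check.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# Plain INSERT: the server only has to append the row.
con.execute("INSERT INTO t (id, val) VALUES (1, 'first')")

# REPLACE: the server must first look up id=1, delete the old row,
# then insert the new one - an extra key lookup for every statement.
con.execute("REPLACE INTO t (id, val) VALUES (1, 'second')")

rows = con.execute("SELECT id, val FROM t").fetchall()
print(rows)  # [(1, 'second')] - the old row was replaced, not duplicated
```

That per-row conflict check is one plausible reason a pure-INSERT import into an empty table can be much faster.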
Labels: Severity-Problematic Performance
Status: Accepted
Originally posted by: rosenfie... (code.google.com)@gmail.com
(Note: http://stackoverflow.com/questions/368938/delphi-profiling-tools)
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Rosenfield Albert,
IMHO, as a quick workaround: if the database has been dropped, or the script
deletes the database first, why not change the REPLACE to INSERT?
From what I noticed, the difference was about 10 minutes compared
to 1.5 hours.
If you need the SQL scripts, I can send them to you.
Regards,
KL Chin
Originally posted by: rosenfie... (code.google.com)@gmail.com
Uhm, that doesn't make sense at all. If you, in the export dialog, choose:
- "replace data", then TRUNCATE + INSERT statements are generated.
- "update existing data", then REPLACE statements are generated.
Similarly if you choose to drop and recreate the entire table, then INSERT is always
used.
There's simply no possible way of doing something stupid such as what you suggest in
comment #3, creating a combination of a drop or truncate and REPLACE, in the current
export GUI.
And even if there was, that wouldn't be the subject of this issue - this issue as
originally submitted is about REPLACE INTO being slow.
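For illustration, the mode-to-statement mapping described above could be summarized like this (a hypothetical sketch with made-up mode names, not the actual HeidiSQL code or its literal option labels):

```python
def statements_for(mode):
    """Map an export-dialog data option to the kinds of SQL it generates.

    The mode names here are illustrative, not HeidiSQL's real labels.
    """
    return {
        "replace_data":    ["TRUNCATE", "INSERT"],          # wipe, then plain inserts
        "update_existing": ["REPLACE"],                     # REPLACE INTO per row
        "drop_recreate":   ["DROP", "CREATE", "INSERT"],    # full rebuild
    }[mode]

print(statements_for("replace_data"))  # ['TRUNCATE', 'INSERT']
```

Note that no mode combines a drop/truncate with REPLACE, which is the point being made above.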
Originally posted by: klchi... (code.google.com)@gmail.com
I think the problem is how often people check the queries before
starting to import them into the db. Besides, sometimes the export file is
produced by a novice (a client); if they can export it at all, we should be
very happy about it.
BTW, wouldn't it be nice to have a smarter import that
auto-scans the script during import and indicates that the
import file is huge and will take some time, or suggests generating
a new import script via an option in the export dialog?
Regarding the subject line, I'm not sure of the best wording, but it helps
me describe my problem/issue in general.
Originally posted by: rosenfie... (code.google.com)@gmail.com
> sometime the export file was done by novice (client),
> if they can export it we should be very happy it about it
Guess so.
The function could be renamed so that 'export as sql' is called 'script to sql'.
That way it doesn't sound nearly as much like a pure model/data export.
A larger improvement would be to add a pure model/data export. The question is
which data format to use. CSV doesn't include the schema, plus the de facto
specification (Microsoft Office) is quite ambiguous, e.g. NULL and empty string are
equivalent, the header line is optional, etc. frm/MYD is very convenient, at least
for MySQL databases. XML is quite wasteful, but could be used in gzipped format.
For any of these, an import function would be needed too, of course.
Anyway.
> would it be nice to have a smarter import, where it will
> auto scan the script during import,
For sure. Other intelligent add-ons could be:
- switch method on import (REPLACE INTO --> INSERT INTO or the other way around)
- map to another target (eg a different database name) on import
- skip some objects when more than one table is stuffed into the same SQL file
- rewrite from one-row-per-INSERT-statement (ANSI SQL) to multiple-rows-per-INSERT-statement (MySQL) syntax
- rewrite to use MySQL protocol-level prepared statements (skips sending query
logic for each INSERT, only sends data. also skips quoting and unquoting values,
and sends values in binary format instead of string format over the network.)
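The multi-row rewrite in the list above could be sketched roughly like this (a simplified illustration assuming well-formed single-row INSERTs targeting one table; a real implementation would need a proper SQL parser to handle quoting and escaping):

```python
import re

def batch_inserts(statements, batch_size=1000):
    """Collapse single-row INSERT statements into multi-row ones.

    Assumes every statement looks like:
        INSERT INTO tbl (cols) VALUES (...);
    and that all statements share the same table and column list.
    """
    # Capture the common statement prefix and the per-row value tuple.
    pattern = re.compile(r"^(INSERT INTO .+ VALUES )(\(.+\));$", re.IGNORECASE)
    prefix, rows = None, []
    for stmt in statements:
        m = pattern.match(stmt.strip())
        prefix = m.group(1)
        rows.append(m.group(2))
    # Emit one combined statement per batch_size rows.
    return [prefix + ", ".join(rows[i:i + batch_size]) + ";"
            for i in range(0, len(rows), batch_size)]

stmts = ["INSERT INTO t (a, b) VALUES (1, 'x');",
         "INSERT INTO t (a, b) VALUES (2, 'y');"]
print(batch_inserts(stmts))
# ["INSERT INTO t (a, b) VALUES (1, 'x'), (2, 'y');"]
```

Sending one statement per batch amortizes the per-query overhead across many rows, which is where most of the speedup would come from.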
Lots of good ideas, mostly a question of finding someone who has time to flesh out
the algorithms and codify them.
Patches are accepted graciously, by the way ;-).
Originally posted by: klchi... (code.google.com)@gmail.com
Hi,
With the existing import file, I replaced the keyword "REPLACE"
with "INSERT", but the speed was slow too. The Queries and Affected Rows
counters incremented one by one, and in 2 minutes only about 1,500 records
were inserted.
BTW, can the source code be compiled with Delphi 5?
Originally posted by: rosenfie... (code.google.com)@gmail.com
> replacing the keyword "REPLACE"
> with "INSERT", the speed was slow too
Okay, in that case it sounds like the main problem is probably a large per-query
overhead in HeidiSQL.
> BTW, cannot the source code compile with Delphi 5?
Hmm, no, looking at the source code repository, it seems that at the moment packages
are only maintained for Delphi 11.
Originally posted by: klchi... (code.google.com)@gmail.com
Hmm, are there any plans to make the code compile with Delphi 5?
Originally posted by: rosenfie... (code.google.com)@gmail.com
Delphi 5 plans: Not at all. The project used to maintain packages for older
compilers, but it needed constant decision making about what to do each time
something was implemented that required a feature which an old version didn't have.
Plus it was also a bit of a hassle to synchronize changes into the different project
files manually.
Ansgar managed to maintain the Subversion commit history when the project was moved
to Google Code, so you can still find these old packages by doing "svn log" on each
of the "packages" directories. For example see [r1056].
There used to be various versions of Delphi 11 available for download from the
manufacturer on the Internet, as trial versions and various offers. The product has
changed owner a couple of times now, so I'm not sure what the current situation is,
but you can probably still find a Torrent of some of the previously published
downloads.
Related
Commit: [r1056]
Originally posted by: a...@anse.de
klchin66, I guess the new database layer tuned performance here, so it might be worth
trying the latest build to see if such an import is still as slow as you mentioned.
On my machine I just started to import a 50 MB / 1 million row file, which was at
17,000 rows after 1 minute. I don't think there is much room for further speed
optimization, but we'll see.
Status: NeedInfo
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Ansgar Becker,
I have verified the new release 2717; it looks like the issue is solved.
* New SQL export with replace option: fast - OK (17 MB file in about 30 seconds)
* Old script with an "insert" for every row: still very slow (22 MB file)
BTW, can the export performance be improved (17+ MB files take about 3-4 minutes)?
Thank you.
Originally posted by: a...@anse.de
Thanks for the update!
One INSERT per row is always slow, no wonder. There won't be much room to optimize that.
The export dialog is currently being rewritten and tuned.
Status: Accepted
Originally posted by: a...@anse.de
Export performance should be slightly faster in [r2735]. But there's still much room
for optimization. I just tested the half-finished new export dialog with different logic
and simpler code, and it's ~4 times faster than the old one.
Related
Commit: [r2735]
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Ansgar Becker,
I have downloaded [r2739]; I can't see an improvement - it still takes about
3-4 minutes for 17+ MB files with 13 tables.
Regards,
KL Chin
Related
Commit: [r2739]
Originally posted by: a...@anse.de
Testing with my half-finished export code in the table tools dialog:
Exporting a 20 MB table, 7 columns, 100,000 rows:
* New dialog: 4 seconds
* Current dialog: 1:45 minutes
Exporting a tall 40 MB table, 3 columns, 1 million rows:
* New dialog: 13 seconds
* Current dialog: 2:10 minutes
I hope adding various features won't break this good performance again, as has
happened in the past.
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Ansgar Becker,
Tested with [r2744]; it took about 3 minutes, which I think is a slight
improvement.
BTW, are there any logs that display the performance for all the tables?
Is there any performance difference between storage engines when exporting,
i.e. MyISAM vs. InnoDB? I am using MyISAM.
Related
Commit: [r2744]
Originally posted by: a...@anse.de
No - InnoDB and MyISAM behave the same when doing a "SELECT * FROM tbl LIMIT x, y".
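The paging pattern behind that query can be sketched generically (using sqlite3 here only to keep the example self-contained; SQLite happens to accept the same MySQL-style "LIMIT offset, count" syntax):

```python
import sqlite3

def export_in_chunks(con, table, chunk_size=2):
    """Yield all rows of a table, reading chunk_size rows per query.

    Mirrors "SELECT * FROM tbl LIMIT offset, count" paging; the storage
    engine (MyISAM or InnoDB) doesn't change how this loop works.
    """
    offset = 0
    while True:
        rows = con.execute(
            f"SELECT * FROM {table} LIMIT {offset}, {chunk_size}").fetchall()
        if not rows:
            break  # past the last row - export finished
        yield from rows
        offset += chunk_size

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])
print(list(export_in_chunks(con, "t")))  # [(0,), (1,), (2,), (3,), (4,)]
```

A production exporter would add an ORDER BY to guarantee stable row order between chunks; it is omitted here for brevity.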
Originally posted by: a...@anse.de
The new dialog has been live since [r2748].
Status: Fixed
Related
Commit: [r2748]
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Ansgar Becker,
Great, it's really fast now.
The only problem is that the GUI changed too much.
Regards,
KL Chin
Originally posted by: a...@anse.de
Thanks! There are at least two or three things which have to be fixed again, e.g. a
memory leak and the like.
> the GUI changes were too much
Could you be more specific?
Originally posted by: klchi... (code.google.com)@gmail.com
Hi Ansgar Becker,
What I mean is that the GUI is so different now
that it is hard for some non-IT people to use.