I have 1000s of blobs that I want to INSERT into a table. Currently I do one INSERT for each blob which takes forever. Is there a way I can insert all the blobs with one INSERT? Thanks again.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've not used the sqlite package in ooRexx so I can't give you a definitive
answer, but I wonder if you have sufficient memory whether building a
single mega-sql statement in rexx and forgoing the binds might improve your
performance. You might try something like this:
values = .array~new
do tile over tiles
values~append("("tile~zm,tile~column,tile~y,"'"tile~blob_data"')")
end
sql = "INSERT INTO tiles (zoom_level,tile_column,tile_row,tile_data)
VALUES" values~makestring('l',',')
stmt = .ooSQLiteStmt~new(dest, sql)
if stmt~initCode <> 0 then do
msg = 'Error: ooSQLiteStmt INSERT initialization error:'stmt~initCode
stmt~lastErrMsg
self~logmsg(msg)
stmt~finalize
dest~close
signal returnit
end
stmt~step --< not sure if you need this now
stmt~finalize --< not sure if you need this now
Of course the cost of this is if you have bad data everything fails and it
might perform worse than binding the rows one at a time.
Thanks for the suggestion. I already tried something similiar where I loaded all the blobs into a directory but as you warned this blew all the memory and caused the program to crash. There were about 50,000 blobs each reperesenting a small jpg.
I was hoping there might be an INSERT that contained all the filespecs of the jpgs. But I guess this doesn't exist and it probably wouln't be much faster than getting each blob and doing an INSERT.
What I am going to try next it do the SQL calls in C++ in a routine in my dll.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Just to let you know that doing the calls using C++ was a bit faster but still way to slow. I was still reading in the blobs one by one and inserting them.
But this gives me an idea: I could beak up the list of blob filesepes into into smaller lists and the dispatch separate threads for each list.
This brings up a question: can separate threads insert rows into the same table at the same time? Anyone know?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks for the info Rony.
My approach is now to break up the blob filesecs into into smaller lists and dispatch a thread for each list that creates a temporary sql db with the same table definition.
Once all the threads have finished. I attach each of the temporary databases and then use the:
INSERT INTO tiles SELECT * FROM source.tiles;
to merge them all into the main database table. I found that insert to be very fast.
This reduced the time to process 57,000 blobs from 13 minutes using one thread down to 4.5 minutes using 23 threads.
Increasing the number of threads to 115 didn't improve the time.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Um, is your Rexx script and your database server close to each other? if not, you are probably waiting for the JPG upload to finish..
Maybe you could have Rexx create the INSERT statments and the JPG's, and upload the whole thing to the server and run the script there? Might be faster.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have 1000s of blobs that I want to INSERT into a table. Currently I do one INSERT for each blob which takes forever. Is there a way I can insert all the blobs with one INSERT? Thanks again.
I should add that each blob is a jpg in a file.
Hi Paul,
I think we need some context to answer that question. What database? Why
the ooRexx discussion list?
In Mysql for instance you can use the INSERT instruction thus
INSERT INTO table (fieldlist) VALUES (valuelist 1),(valuelist
2)...,(valuelist n)
which may be what you want
Jon
On Fri, 27 Mar 2020 at 20:14, Paul Higgins phiggins-99@users.sourceforge.net wrote:
Jon,
I am using the oorexx squlite interface that is why I posted here, Heere is my present INSERT:
Thanks Paul,
I've not used the sqlite package in ooRexx so I can't give you a definitive
answer, but I wonder if you have sufficient memory whether building a
single mega-sql statement in rexx and forgoing the binds might improve your
performance. You might try something like this:
values = .array~new
do tile over tiles
values~append("("tile~zm,tile~column,tile~y,"'"tile~blob_data"')")
end
sql = "INSERT INTO tiles (zoom_level,tile_column,tile_row,tile_data)
VALUES" values~makestring('l',',')
stmt = .ooSQLiteStmt~new(dest, sql)
if stmt~initCode <> 0 then do
msg = 'Error: ooSQLiteStmt INSERT initialization error:'stmt~initCode
stmt~lastErrMsg
self~logmsg(msg)
stmt~finalize
dest~close
signal returnit
end
stmt~step --< not sure if you need this now
stmt~finalize --< not sure if you need this now
Of course the cost of this is if you have bad data everything fails and it
might perform worse than binding the rows one at a time.
Please let us know how you get on
Jon
On Fri, 27 Mar 2020 at 21:29, Paul Higgins phiggins-99@users.sourceforge.net wrote:
Jon,
Thanks for the suggestion. I already tried something similiar where I loaded all the blobs into a directory but as you warned this blew all the memory and caused the program to crash. There were about 50,000 blobs each reperesenting a small jpg.
I was hoping there might be an INSERT that contained all the filespecs of the jpgs. But I guess this doesn't exist and it probably wouln't be much faster than getting each blob and doing an INSERT.
What I am going to try next it do the SQL calls in C++ in a routine in my dll.
Just to let you know that doing the calls using C++ was a bit faster but still way to slow. I was still reading in the blobs one by one and inserting them.
But this gives me an idea: I could beak up the list of blob filesepes into into smaller lists and the dispatch separate threads for each list.
This brings up a question: can separate threads insert rows into the same table at the same time? Anyone know?
Paul,
maybe there is a variant of import that you could use for blobls? Looking around sqlite there are a few places where import gets documented, e.g.
Maybe there is a means to use import/load for blobs directly.
---rony
P.S.: Ad multithreaded access: https://stackoverflow.com/questions/35717263/how-threadsafe-is-sqlite3.
Thanks for the info Rony.
My approach is now to break up the blob filesecs into into smaller lists and dispatch a thread for each list that creates a temporary sql db with the same table definition.
Once all the threads have finished. I attach each of the temporary databases and then use the:
INSERT INTO tiles SELECT * FROM source.tiles;
to merge them all into the main database table. I found that insert to be very fast.
This reduced the time to process 57,000 blobs from 13 minutes using one thread down to 4.5 minutes using 23 threads.
Increasing the number of threads to 115 didn't improve the time.
Um, is your Rexx script and your database server close to each other? if not, you are probably waiting for the JPG upload to finish..
Maybe you could have Rexx create the INSERT statments and the JPG's, and upload the whole thing to the server and run the script there? Might be faster.
I don't use servers. It just runs on a Windows PC.