Menu

#3 Batch size limit

open
None
5
2004-09-09
2004-08-27
No

It would be great if we could be able to specify a
limit on the number of rows that openDbCopy will
attempt to copy in a single batch, using the copy plugin.

Larger batches require more memory. I recently
attempted to migrate a 1.5 GB table and that was a bit
too much for my server to handle, as openDbCopy appears
to load the entire table in memory before moving the
data to the destination.

I needed to split my table in 30 batches in order to
make openDbCopy work. Defining those 30 batches in
openDbCopy was very time consuming of course.

Discussion

  • Anthony Smith

    Anthony Smith - 2004-09-09

    Logged In: YES
    user_id=877380

    I'll give it a shot next week, when I'll be working on a
    server with 4 GB RAM in it.

    I haven't analysed your code in detail (and haven't worked
    with java for 6-7 years, so I doubt I'd be able to anyway..)
    but I would suggest an alternative technique where you get
    the data row-by-row, collect NN rows and then batch them
    into the destination db. This is obviously slower, but this
    alternative access method would only be used if required, if
    the user indeed has specified a limit (per-table option) and
    that limit is below the number of rows in the table.

    Another option perhaps would be to check if the source RDBMS
    does support the LIMIT function, and use that if it does.

    A third way is to work with filters, you could allow for
    several filters per table, where each filter would select
    disjunct sets of data (it would be the users responsibility
    to check this) and then run one batch per table. Currently,
    the filter functionality is not usable in my case since it
    involves too much work.

    I had the idea to set up a single filter and then copy/paste
    the relevant sections of the job XML file, only changing the
    filter spec. Would this work? The XML file is huge, as it
    contains all the mappings etc per plugin instance, so it was
    a bit difficult to get the overview.

    I don't think you can let the application decide the max
    number of rows to be fetched, I guess different drivers will
    allocate memory differently so it would be impossible to
    know in advance how much memory a row in a given table would
    use (even more difficult when you have varchar/binary columns).

    I would need this for MSSQL and Oracle, but as you say, this
    functionality should be RDBMS independant if possible.

     
  • Anthony Smith

    Anthony Smith - 2004-09-09
    • assigned_to: nobody --> iloveopensource
     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.