#14 Export data into RDBMS table using JDBC Batch update

open
5
2009-09-17
2009-09-17
No

JDBC batch updates are faster than issuing a single update per row. I saw up to a 5x performance improvement in some cases on our cluster. I simply replaced 'executeUpdate()' with 'addBatch()' and 'executeBatch()', and I call 'executeBatch()' once per file. This is a straightforward implementation: it does not consider batch size, commit size, or anything like that, but it has worked fine so far.
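
For illustration, the per-file change described above could look roughly like the sketch below. This is not the actual Cloudbase code: the class name `PerFileBatchExport`, the method `exportFile`, and the assumption that every column is bound as a string are all hypothetical. Only the `java.sql` calls (`setString`, `addBatch`, `executeBatch`) are real JDBC API.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not taken from Cloudbase.
public class PerFileBatchExport {

    /**
     * Queues one INSERT per row with addBatch() and sends them all
     * with a single executeBatch() call, i.e. one batch per file.
     * Returns the per-row update counts reported by the driver.
     */
    public static int[] exportFile(PreparedStatement stmt, List<String[]> rows)
            throws SQLException {
        for (String[] row : rows) {
            // Assumption: every column is bound as a string.
            for (int i = 0; i < row.length; i++) {
                stmt.setString(i + 1, row[i]); // JDBC parameters are 1-based
            }
            stmt.addBatch(); // previously: stmt.executeUpdate()
        }
        return stmt.executeBatch();
    }
}
```

Because the whole file is queued before anything is sent, a very large file means a very large batch held by the driver, which is where a batch size setting would come in.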

I think we could make the batch size for export configurable in Cloudbase's configuration.
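
A configurable batch size might be sketched as follows. Everything here is an assumption for discussion: the class name `SizedBatchExport`, the `batchSize` parameter (which would be read from whatever Cloudbase configuration property gets chosen), and binding all columns as strings. The idea is simply to flush with `executeBatch()` every `batchSize` rows and once more for any remainder.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper, not taken from Cloudbase.
public class SizedBatchExport {

    /**
     * Queues rows with addBatch() and flushes with executeBatch()
     * every batchSize rows, plus one final flush for leftover rows.
     * Returns the number of executeBatch() calls that were made.
     */
    public static int exportRows(PreparedStatement stmt, List<String[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0; // rows queued since the last flush
        int flushes = 0;
        for (String[] row : rows) {
            // Assumption: every column is bound as a string.
            for (int i = 0; i < row.length; i++) {
                stmt.setString(i + 1, row[i]);
            }
            stmt.addBatch();
            pending++;
            if (pending == batchSize) {
                stmt.executeBatch();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the partial last batch
            stmt.executeBatch();
            flushes++;
        }
        return flushes;
    }
}
```

With 5 rows and a batch size of 2, this makes three `executeBatch()` calls (2 + 2 + 1 rows). Commit handling is left out on purpose; whether to commit per batch or per file is a separate decision.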

Discussion

  • Tarandeep Singh

    Tarandeep Singh - 2009-09-17

    That's a good point. Further, I was thinking of doing the inserts into the RDBMS in parallel: once the mappers or reducers finish execution, they push the data directly into the RDBMS.

    I guess a new output format, DBOutputFormat, can be created for this. Hadoop already has something like that, but I have not looked at the code yet. If it is suitable, we can use it; otherwise we can create a new class. In that class, we can make use of your suggestion and use executeBatch() instead of executeUpdate().

    Any thoughts?

     
  • Tarandeep Singh

    Tarandeep Singh - 2009-09-17
    • assigned_to: nobody --> tsingh
     
  • YoungWoo Kim

    YoungWoo Kim - 2009-09-17

    You're right. DBOutputFormat would be better because it is a straightforward interface using Hadoop's MapReduce API. But I think it is not urgent; CB's current implementation works fine.

     
