CloudBase / Feature Requests / #14 Export data into RDBMS table using JDBC Batch update

#14 Export data into RDBMS table using JDBC Batch update

Status: open

Owner: Tarandeep Singh

Labels: Server improvements

Priority: 5

Updated: 2009-09-17

Created: 2009-09-17

Creator: YoungWoo Kim

Private: No

JDBC batch update is faster than issuing a single update per row; in some cases I saw a 5x performance improvement on our cluster. I simply replaced 'executeUpdate()' with 'addBatch()' and 'executeBatch()', and I call 'executeBatch()' once per file. This is a straightforward implementation: I do not consider batch size, commit size, or anything like that, but it works fine so far.
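A minimal sketch of the change described above, assuming the rows of one exported file are already available as a `List<Object[]>` whose values map, in order, to the `?` placeholders of an `INSERT` statement. `BatchExport` and `exportFile` are hypothetical names, not CloudBase's actual export code; only the `java.sql` calls (`prepareStatement`, `setObject`, `addBatch`, `executeBatch`) are standard JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper illustrating "one executeBatch() per file";
// not CloudBase's real export code.
final class BatchExport {
    private BatchExport() {}

    /** Queues every row of one file and submits them with a single executeBatch(). */
    static int[] exportFile(Connection conn, String insertSql, List<Object[]> rows)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch(); // queue the row instead of calling executeUpdate()
            }
            return ps.executeBatch(); // submit all queued rows for this file at once
        }
    }
}
```

How much `executeBatch()` actually saves depends on the JDBC driver. MySQL Connector/J, for example, only rewrites batched inserts into multi-row statements when `rewriteBatchedStatements=true` is set, so a speedup measured on one database will not necessarily carry over to another.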

I think the export batch size could be made configurable in Cloudbase's configuration.
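If the batch size were configurable, the export loop could flush every N rows instead of once per file, which keeps the number of rows the driver buffers at any moment bounded for large files. A sketch, assuming the setting arrives as a `java.util.Properties` entry; the key `cloudbase.export.batch.size` and the default of 1000 are hypothetical, not existing CloudBase options.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch: the property key and default are assumptions,
// not real CloudBase settings.
final class ConfigurableBatchExport {
    static final String BATCH_SIZE_KEY = "cloudbase.export.batch.size"; // hypothetical key
    static final int DEFAULT_BATCH_SIZE = 1000; // arbitrary default for illustration

    private ConfigurableBatchExport() {}

    /** Reads the batch size, falling back to the default when the key is absent. */
    static int batchSize(Properties conf) {
        String raw = conf.getProperty(BATCH_SIZE_KEY, String.valueOf(DEFAULT_BATCH_SIZE));
        int size = Integer.parseInt(raw.trim());
        if (size <= 0) {
            throw new IllegalArgumentException(BATCH_SIZE_KEY + " must be positive");
        }
        return size;
    }

    /** Exports one file, flushing every batchSize rows; returns the number of flushes. */
    static int exportFile(Connection conn, String insertSql, List<Object[]> rows, int batchSize)
            throws SQLException {
        int flushes = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // flush a full batch
                    flushes++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final, partial batch
                flushes++;
            }
        }
        return flushes;
    }
}
```

With a batch size of 2, a five-row file is sent as three batches of 2, 2 and 1 rows. Commits are left to the connection's auto-commit setting in this sketch; a separate commit size would be a further, independent setting.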

Discussion

Tarandeep Singh - 2009-09-17

That's a good point. Further, I was thinking of doing the inserts into the RDBMS in parallel: once the mappers or reducers finish execution, push the data directly into the RDBMS.

I guess a new output format, DBOutputFormat, could be created for this. Hadoop already has something like that, but I have not looked at the code yet. If it is suitable, we can use it; otherwise we can create a new class. In that class we can apply your suggestion and use 'executeBatch()' instead of 'executeUpdate()'.

Any thoughts?
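The parallel idea can be illustrated without Hadoop: each mapper or reducer output partition is written by its own worker, with its own connection and its own batch, so partitions are inserted concurrently. This is a plain-JDBC sketch in which an `ExecutorService` stands in for the Hadoop tasks; it is not Hadoop's `DBOutputFormat`, and `ParallelExport`, `ConnectionFactory` and `exportPartitions` are hypothetical names.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a thread pool stands in for Hadoop tasks.
// This is not Hadoop's DBOutputFormat.
final class ParallelExport {
    // Hypothetical factory so that each worker opens its own Connection.
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    private ParallelExport() {}

    /** Writes each partition concurrently as one JDBC batch; returns the total rows queued. */
    static int exportPartitions(ConnectionFactory factory, String insertSql,
                                List<List<Object[]>> partitions, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<Object[]> part : partitions) {
                results.add(pool.submit(() -> {
                    // One connection and statement per partition; sharing a single
                    // JDBC Connection between concurrent writers is generally avoided.
                    try (Connection conn = factory.open();
                         PreparedStatement ps = conn.prepareStatement(insertSql)) {
                        for (Object[] row : part) {
                            for (int i = 0; i < row.length; i++) {
                                ps.setObject(i + 1, row[i]);
                            }
                            ps.addBatch();
                        }
                        ps.executeBatch();
                        return part.size();
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // a worker's SQLException surfaces as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each partition is written independently here, so a failed worker can leave the target table partially loaded; a real output format would have to decide how to handle that, for example by loading into a staging table first.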


Tarandeep Singh - 2009-09-17

assigned_to: nobody --> tsingh

YoungWoo Kim - 2009-09-17

You're right. DBOutputFormat would be better because it is a straightforward interface built on Hadoop's MapReduce API. But I don't think it is urgent; CB's current implementation works fine.

