Oracle-to-S3 data uploader
Lets you stream your Oracle table/query data to Amazon-S3 from the Windows command line.
Features:
- Streams Oracle table data to Amazon-S3.
- No need to create CSV extracts before uploading to S3.
- The data stream is compressed during the upload to S3.
- No need for the Amazon AWS CLI.
- Works from your Windows desktop (command line).
- It's an executable (Oracle_To_S3_Uploader.exe) - no need to install Python.
- It's 64-bit - it runs from a vanilla command prompt on 64-bit Windows.
- AWS access keys are read from environment variables, never passed as command-line arguments.
- Written in Python with boto; the executable is built with [PyInstaller](http://www.pyinstaller.org/).
Version
OS | Platform | Version
---|----------|--------
Windows | 64-bit | 1.2 beta
Purpose
Stream Oracle query results straight to Amazon-S3 as a compressed CSV file, without creating intermediate extract files and without the AWS CLI.
How it works
SQL*Plus spools the query output to a pipe, the stream is gzip-compressed on the fly, and boto uploads the compressed stream to S3 using multipart upload.
Audience
Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.
Designated Environment
Pre-Prod (UAT/QA/DEV)
Usage
Running the executable with no arguments prints the usage summary:

    c:\Python35-32\PROJECTS\Ora2S3>dist\oracle_to_s3_uploader.exe
    ##############################################################################
    # Oracle to S3 Data Uploader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
    # Copyright (c): 2016 Alex Buzunov, All rights reserved.
    # Agreement: Use this tool at your own risk. Author is not liable for any damages
    #   or losses related to the use of this software.
    ##############################################################################
    Usage:
      set AWS_ACCESS_KEY_ID=<your access key>
      set AWS_SECRET_ACCESS_KEY=<your secret key>
      set ORACLE_LOGIN=tiger/scott@orcl
      set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1
      oracle_to_s3_uploader.exe [<ora_query_file>] [<ora_col_delim>] [<ora_add_header>] [<s3_bucket_name>] [<s3_key_name>] [<s3_use_rr>] [<s3_public>]
        --ora_query_file -- SQL query to execute in the source Oracle db.
        --ora_col_delim -- CSV column delimiter (|).
        --ora_add_header -- Add a header line to the CSV file (False).
        --ora_lame_duck -- Limit rows for a trial upload (1000).
        --create_data_dump -- Use it if you want to persist the streamed data on your file system.
        --s3_bucket_name -- S3 bucket name (always set it).
        --s3_location -- New bucket location name (us-west-2). Set it if you are creating a new bucket.
        --s3_key_name -- CSV file name (to store query results on S3). If <s3_key_name> is not specified, the Oracle query file name (ora_query_file) is used.
        --s3_use_rr -- Use reduced redundancy storage (False).
        --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10<<21, ~20MB).
        --s3_public -- Make the uploaded file public (False).
    Oracle data uploaded to S3 is always compressed (gzip).
Example
Environment variables
Set the following environment variables (for all tests) in set_env.bat:

    set AWS_ACCESS_KEY_ID=<your access key>
    set AWS_SECRET_ACCESS_KEY=<your secret key>
    set ORACLE_LOGIN=tiger/scott@orcl
    set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1
Test upload with data dump
In this example the complete table test2 gets uploaded to Amazon-S3 as a compressed CSV file; the query is read from the file table_query.sql.
A temporary dump file is also created for analysis (by default no files are created): use -s, --create_data_dump to dump the streamed data.
If the target bucket does not exist, it is created in a user-controlled region: use the -t, --s3_location argument to set the target region name.
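For illustration only, the existence check and creation could be done with boto roughly as in the sketch below; this is my approximation, not the tool's actual code, and it assumes credentials are picked up from the AWS_* environment variables:

```python
import boto
from boto.s3.connection import Location

def get_or_create_bucket(conn, bucket_name, region=Location.USWest2):
    """Return the bucket, creating it in the given region if it does not exist."""
    bucket = conn.lookup(bucket_name)   # returns None when the bucket is missing
    if bucket is None:
        bucket = conn.create_bucket(bucket_name, location=region)
    return bucket

# Example (bucket name is a placeholder):
# get_or_create_bucket(boto.connect_s3(), "<s3_bucket_name>")
```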
The upload is started by executing test.bat.
Download
git clone https://github.com/alexbuz/Oracle_To_S3_Data_Uploader
oracle_to_s3_uploader 1.2
FAQ
Can it load Oracle data to an Amazon S3 file?
Yes, that is the main purpose of this tool.
Can developers integrate Oracle_To_S3_Data_Uploader into their ETL pipelines?
Yes, assuming the pipeline runs on Windows.
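For example, a Python-based ETL step on Windows could invoke the packaged executable via subprocess; the paths, login, and query file below are hypothetical placeholders, not values required by the tool:

```python
import os
import subprocess

# Hypothetical ETL step: stream the result of table_query.sql to S3 by calling
# the packaged executable. All values below are placeholders.
env = dict(os.environ,
           AWS_ACCESS_KEY_ID="<your access key>",
           AWS_SECRET_ACCESS_KEY="<your secret key>",
           ORACLE_LOGIN="tiger/scott@orcl",
           ORACLE_CLIENT_HOME=r"C:\app\oracle12\product\12.1.0\dbhome_1")

subprocess.check_call([r"dist\oracle_to_s3_uploader.exe", "table_query.sql"],
                      env=env)
```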
How fast is data upload using Oracle_To_S3_Data_Uploader?
As fast as any implementation of multipart upload written with Python and boto.
How can I increase upload speed?
The input data stream is compressed before upload to S3, so there is not much to tune there. You may want to run the tool closer to the source or the target for better performance.
What are the other ways to move large amounts of data from Oracle to S3?
You can write a Sqoop script and schedule it as an 'EMR Activity' under AWS Data Pipeline.
Does it create a temporary data file to facilitate the load to S3?
No. A local dump file is written only if you explicitly request one with -s, --create_data_dump.
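Conceptually, the optional dump is just a pass-through tee on the compressed stream; a minimal Python sketch of the idea (not the tool's actual code):

```python
def tee_to_dump(parts, dump_path=None):
    """Yield compressed parts unchanged, optionally keeping a local copy on disk."""
    if dump_path is None:
        for part in parts:
            yield part              # default: nothing is written to the file system
        return
    with open(dump_path, "wb") as dump:
        for part in parts:
            dump.write(part)        # persist the streamed data for later analysis
            yield part
```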
Can I log the transferred data for analysis?
Yes, use -s, --create_data_dump to dump the streamed data.
What is the first step of the data transfer?
The query file you provide is used to select data from the source Oracle server, and the resulting stream is compressed before it is loaded to S3.
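As a rough illustration of this first step, the sketch below (my approximation, not the tool's source) spools a query file through SQL*Plus and gzip-compresses the output into parts sized like the documented s3_write_chunk_size default:

```python
import gzip
import io
import subprocess

def spool_and_compress(query_file, ora_login, part_size=10 << 21):
    """Spool a query via SQL*Plus and yield gzip-compressed parts of its output.

    part_size mirrors the documented --s3_write_chunk_size default (~20 MB).
    """
    # -S runs SQL*Plus silently; the query file spools its rows to stdout.
    proc = subprocess.Popen(["sqlplus", "-S", ora_login, "@" + query_file],
                            stdout=subprocess.PIPE)
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for line in proc.stdout:
            gz.write(line)              # compress rows as they are spooled
            if buf.tell() >= part_size:
                yield buf.getvalue()    # hand one compressed part to the uploader
                buf.seek(0)
                buf.truncate()
    proc.wait()
    yield buf.getvalue()                # final part, including the gzip trailer
```

ORACLE_LOGIN here is the same tiger/scott@orcl style connect string shown in Usage.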
What is the second step of the data transfer?
The compressed data is uploaded to S3 using the multipart upload protocol.
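A minimal boto sketch of such a multipart upload follows; it is an approximation rather than the tool's source, the bucket and key names are placeholders, and credentials are read from the AWS_* environment variables shown in Usage:

```python
import io
import boto  # classic boto, which the tool is built on

def multipart_upload(parts, bucket_name, key_name, use_rr=False, make_public=False):
    """Upload an iterable of compressed byte parts to S3 as a single object."""
    conn = boto.connect_s3()   # reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(key_name, reduced_redundancy=use_rr)
    try:
        for part_num, part in enumerate(parts, start=1):
            # every part except the last must be at least 5 MB
            mp.upload_part_from_file(io.BytesIO(part), part_num)
        mp.complete_upload()
    except Exception:
        mp.cancel_upload()     # avoid leaving orphaned parts behind
        raise
    if make_public:
        bucket.get_key(key_name).make_public()
```

Chained with the `spool_and_compress` generator from the previous answer, the whole transfer is roughly `multipart_upload(spool_and_compress("table_query.sql", os.environ["ORACLE_LOGIN"]), "<s3_bucket_name>", "table_query.csv.gz")`.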
What technology was used to create this tool?
I used SQL*Plus, Python, and boto to write it. boto is used to upload the file to S3, and SQL*Plus is used to spool data to the compressor pipe.
Where are the sources?
Please contact me for the sources.
Can you modify functionality and add features?
Yes, please ask me about new features.
Do you have any AWS Certifications?
Yes, AWS Certified Developer (Associate).
Can you create a similar/custom data tool for our business?
Yes, you can PM me here or email me at alex_buz@yahoo.com. I'll get back to you within hours.
Links
Last edit: Alex Buzunov 2016-04-07