Lets you stream your Oracle table/query data to Amazon S3 from the Windows command line (CLI).
Features:
- Streams Oracle table data to Amazon S3.
- No need to create CSV extracts before uploading to S3.
- The data stream is compressed during upload to S3.
- No need for the AWS CLI.
- Works from your Windows desktop (command line).
- Ships as an executable (Oracle_To_S3_Uploader.exe), so no Python installation is needed.
- 64-bit build; it runs from a plain command prompt on any 64-bit Windows.
- AWS access keys are not passed as command-line arguments.
- Written using Python, boto, and PyInstaller.
OS      | Platform | Version
--------|----------|-----------
Windows | 64bit    | [1.2 beta]
Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps,
Pre-Prod (UAT/QA/DEV)
```
c:\Python35-32\PROJECTS\Ora2S3>dist\oracle_to_s3_uploader.exe
#############################################################################
#Oracle to S3 Data Uploader (v1.2, beta, 04/05/2016 15:11:53) [64bit]
#Copyright (c): 2016 Alex Buzunov, All rights reserved.
#Agreement: Use this tool at your own risk. Author is not liable for any damages
#           or losses related to the use of this software.
################################################################################
Usage:
  set AWS_ACCESS_KEY_ID=<your access key>
  set AWS_SECRET_ACCESS_KEY=<your secret key>
  set ORACLE_LOGIN=tiger/scott@orcl
  set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1

  oracle_to_s3_uploader.exe [<ora_query_file>] [<ora_col_delim>] [<ora_add_header>] [<s3_bucket_name>] [<s3_key_name>] [<s3_use_rr>] [<s3_public>]

    --ora_query_file      -- SQL query to execute in the source Oracle db.
    --ora_col_delim       -- CSV column delimiter (|).
    --ora_add_header      -- Add header line to CSV file (False).
    --ora_lame_duck       -- Limit rows for trial upload (1000).
    --create_data_dump    -- Use it if you want to persist streamed data on your filesystem.
    --s3_bucket_name      -- S3 bucket name (always set it).
    --s3_location         -- New bucket location name (us-west-2). Set it if you are creating a new bucket.
    --s3_key_name         -- CSV file name (to store query results on S3).
                             If <s3_key_name> is not specified, the Oracle query file name (ora_query_file) will be used.
    --s3_use_rr           -- Use reduced redundancy storage (False).
    --s3_write_chunk_size -- Chunk size for multipart upload to S3 (10<<21, ~20MB).
    --s3_public           -- Make uploaded file public (False).

Oracle data uploaded to S3 is always compressed (gzip).
```
```
set AWS_ACCESS_KEY_ID=<your access key>
set AWS_SECRET_ACCESS_KEY=<your secret key>
set ORACLE_LOGIN=tiger/scott@orcl
set ORACLE_CLIENT_HOME=C:\app\oracle12\product\12.1.0\dbhome_1
```
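Since the AWS keys are read from the environment and never passed as arguments, it can be handy to confirm boto can actually see them before kicking off a large upload. The snippet below is my own minimal sketch (assuming boto 2 is installed); it is not part of the tool:

```python
# minimal credential sanity check (illustrative, not part of the uploader),
# assuming boto 2: boto reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY itself
import os
import boto

assert os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY")
conn = boto.connect_s3()                            # no keys passed explicitly
print([b.name for b in conn.get_all_buckets()])     # requires ListAllMyBuckets permission
```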
In this example, the complete table test2 gets uploaded to Amazon S3 as a compressed CSV file.
Contents of the file table_query.sql:
```sql
SELECT * FROM test2;
```
In this example a temporary dump file is also created for analysis (by default, no files are created). Use `-s, --create_data_dump` to dump the streamed data; a quick way to inspect the dump is sketched below.
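A minimal sketch, assuming Python 3 on the same machine, for peeking at the first few rows of that dump. The path is taken from the sample run further below and will differ on your machine:

```python
# peek at the locally dumped, gzipped CSV (illustrative; path taken from the
# sample run below and will differ on your machine)
import gzip

dump = r"data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz"
with gzip.open(dump, "rt") as f:
    for i, line in enumerate(f):
        print(line.rstrip())
        if i == 4:          # first five rows are enough for a sanity check
            break
```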
If the target bucket does not exist, it will be created in a user-controlled region. Use the `-t, --s3_location` argument to set the target region name; the rough boto equivalent is sketched below.
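For reference, the behavior is roughly equivalent to the following boto 2 call. This is a hedged sketch, not the tool's actual source; `test_bucket` and `us-west-2` are just the values used in this walkthrough:

```python
# rough equivalent of "create the bucket in the region given by -t/--s3_location"
# (a sketch assuming boto 2, not the tool's actual code)
import boto

conn = boto.connect_s3()
if conn.lookup("test_bucket") is None:                      # bucket does not exist yet
    conn.create_bucket("test_bucket", location="us-west-2")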
Contents of the file test.bat:
```
dist\oracle_to_s3_uploader.exe ^
    -q table_query.sql ^
    -d "|" ^
    -e ^
    -b test_bucket ^
    -k oracle_table_export ^
    -r ^
    -p ^
    -s
```
Executing test.bat:
```
c:\Python35-32\PROJECTS\Ora2S3>dist\oracle_to_s3_uploader.exe -q table_query.sql -d "|" -e -b test_bucket -k oracle_table_export -r -p -s
Uploading results of "table_query.sql" to existing bucket "test_bucket"
Dumping data to: c:\Python35-32\PROJECTS\Ora2S3\data_dump\table_query\test_bucket\oracle_table_export.20160405_235310.gz
1 chunk 10.0 GB [8.95 sec]
2 chunk 5.94 GB [5.37 sec]
Uncompressed data size: 15.94 GB
Compressed data size: 63.39 MB
Upload complete (17.58 sec).
Your PUBLIC upload is at:
https://s3-us-west-2.amazonaws.com/test_bucket/oracle_table_export.gz
```
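To double-check the result, you can pull the object back and gunzip it yourself. A minimal sketch assuming boto 2 and Python 3; the bucket and key names are the ones from the run above:

```python
# fetch the uploaded object and peek at the decompressed CSV
# (illustrative sketch, assuming boto 2; not part of the tool)
import gzip
import boto

conn = boto.connect_s3()
key = conn.get_bucket("test_bucket").get_key("oracle_table_export.gz")
raw = key.get_contents_as_string()          # bytes of the gzipped upload
print(gzip.decompress(raw)[:200])           # first bytes of the pipe-delimited CSV
```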
git clone https://github.com/alexbuz/Oracle_To_S3_Data_Uploader
oracle_to_s3_uploader 1.2
Yes, it is the main purpose of this tool.
Can developers integrate Oracle_To_S3_Data_Uploader into their ETL pipelines? Yes, assuming they are doing it on Windows.
Is it as fast as CSV Loader for Redshift? It is as fast as any implementation of multipart upload using Python and boto.
The input data stream is compressed before upload to S3, so there is not much room for tuning there. You may want to run the tool closer to the source or target for better performance.
You can write a Sqoop script that can be scheduled as an 'EMR Activity' under AWS Data Pipeline.
No
Yes. Use `-s, --create_data_dump` to dump the streamed data.
The query file you provide is used to select data from the source Oracle server.
The stream is compressed before it is loaded to S3. The compressed data is then uploaded to S3 using the multipart upload protocol (see the sketch below).
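For illustration, the upload side can be pictured as the boto 2 sketch below. This is my own hedged approximation, not the tool's source; `multipart_upload` is a hypothetical helper, and the ~20 MB part size simply mirrors the `--s3_write_chunk_size` default:

```python
# illustrative multipart upload of an already-compressed stream
# (assumes boto 2; not the tool's actual source)
import io
import boto

CHUNK_SIZE = 10 << 21   # ~20 MB; S3 multipart parts must be >= 5 MB (except the last)

def multipart_upload(stream, bucket_name, key_name, make_public=False):
    conn = boto.connect_s3()                       # AWS keys come from the environment
    bucket = conn.get_bucket(bucket_name)
    mp = bucket.initiate_multipart_upload(key_name)
    part_num = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        part_num += 1
        mp.upload_part_from_file(io.BytesIO(chunk), part_num=part_num)
    mp.complete_upload()
    if make_public:
        bucket.get_key(key_name).make_public()     # roughly what the -p flag does
```

You would feed such a helper the gzipped byte stream produced by the compressor, so no uncompressed CSV ever hits disk.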
I used SQL*Plus, Python, and boto to write it. boto is used to upload the file to S3. SQL*Plus is used to spool data to the compressor pipe. Please contact me for the sources.
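The SQL*Plus side can be pictured as spawning `sqlplus -S` and piping its spooled output straight into a compressor, roughly as sketched below. This is only an assumption-laden illustration: it expects `sqlplus` on `PATH` and the `ORACLE_LOGIN` variable from the setup above, and it omits the formatting/delimiter directives the real tool presumably adds:

```python
# rough sketch of "spool SQL*Plus output into a compressor pipe"
# (assumes sqlplus is on PATH and ORACLE_LOGIN is set; not the tool's actual code)
import gzip
import os
import subprocess

proc = subprocess.Popen(
    ["sqlplus", "-S", os.environ["ORACLE_LOGIN"], "@table_query.sql"],
    stdout=subprocess.PIPE,
    stdin=subprocess.DEVNULL)            # EOF on stdin makes sqlplus exit when done
with gzip.open("oracle_table_export.gz", "wb") as gz:
    for line in proc.stdout:             # stream rows without a full CSV extract
        gz.write(line)
proc.wait()
```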
Yes, please ask me for new features.
You can ping the Amazon S3 bucket to see if it's publicly readable (see also the S3_Sanity_Check wiki linked below; a minimal check is sketched below as well).
Yes, AWS Certified Developer (Associate).
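A tool-independent way to check public readability is an anonymous HTTP request against the URL the uploader prints. This sketch assumes Python 3 and reuses the URL from the sample run above:

```python
# anonymous check that the uploaded object is publicly readable
# (illustrative; URL taken from the sample run above)
import urllib.error
import urllib.request

url = "https://s3-us-west-2.amazonaws.com/test_bucket/oracle_table_export.gz"
req = urllib.request.Request(url, method="HEAD")
try:
    with urllib.request.urlopen(req) as resp:
        print("Publicly readable, HTTP", resp.status)
except urllib.error.HTTPError as e:
    print("Not publicly readable, HTTP", e.code)
```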
Yes, you can PM me here or email me at alex_buz@yahoo.com. I'll get back to you within hours.
Wiki: CSV_Loader_For_Redshift
Wiki: Oracle_To_Redshift_Data_Loader
Wiki: S3_Sanity_Check