Loads a CSV file into an Amazon Redshift table from the Windows command line.

Features:
- Loads a local CSV file (on your Windows desktop) to Amazon Redshift.
- No need to preload your data to S3 before inserting it into Redshift.
- No need for the Amazon AWS CLI.
- Works from your Windows desktop (command line).
- It's an executable (csv_loader_for_redshift.exe) - no Python installation required.
- It's 32-bit - it will work on any vanilla Windows.
- AWS access keys are not passed as command-line arguments.
- Written in Python, using boto and PyInstaller.

## Supported platforms ##

OS | Platform | Version
---|----------|--------
Windows | 32-bit | 0.1.0 beta

## Audience ##

Database/ETL developers, Data Integrators, Data Engineers, Business Analysts, AWS Developers, DevOps.

## Environment ##

Pre-Prod (UAT/QA/DEV)

## Load CSV file to Amazon Redshift table ##

Load progress (%) is output to the screen.

Usage:

    set AWS_ACCESS_KEY_ID=<your access key>
    set AWS_SECRET_ACCESS_KEY=<your secret key>
    set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"

    csv_loader_for_redshift.py <file_to_transfer> <bucket_name> [<use_rr>] [<public>] [<delim>] [<quote>] [<to_table>] [<gzip_source_file>]

    --use_rr            -- Use reduced redundancy storage (default: False).
    --public            -- Make uploaded files public (default: False).
    --delim             -- CSV file delimiter (default: ',').
    --quote             -- CSV quote character (default: '"').
    --to_table          -- Target Amazon Redshift table name.
    --gzip_source_file  -- gzip the input CSV file before upload to Amazon S3 (default: False).

The input filename is used as the S3 key name.

- Boto S3 docs: http://boto.cloudhackers.com/en/latest/ref/s3.html
- psycopg2 docs: http://initd.org/psycopg/docs/

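Because the keys and the connect string come from the shell environment rather than from command-line arguments, a loader script can pick them up roughly as follows. This is a minimal illustrative sketch, not the tool's actual code; only the three variable names come from the Usage section above, and `read_env_config` is a hypothetical helper:

```python
import os

def read_env_config():
    """Read the three variables documented above from the environment (illustrative sketch)."""
    try:
        return {
            "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
            "redshift_connect_string": os.environ["REDSHIFT_CONNECT_STRING"],
        }
    except KeyError as missing:
        # Fail fast with a readable message instead of passing bad credentials around.
        raise SystemExit("Missing environment variable: %s" % missing)
```
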
## Example ##

Set your environment variables:

    set AWS_ACCESS_KEY_ID=<your access key>
    set AWS_SECRET_ACCESS_KEY=<your secret key>
    set REDSHIFT_CONNECT_STRING="dbname='***' port='5439' user='***' password='***' host='mycluster.***.redshift.amazonaws.com'"

Create the target table in Redshift, e.g. `test2`:

    CREATE TABLE test2 (
        id integer,
        num integer,
        data varchar,
        num2 integer,
        data2 varchar,
        num3 integer,
        data3 varchar,
        num4 integer,
        data4 varchar
    );

Run the loader:

    cd c:\tmp\CSV_Loader
    csv_loader_for_redshift.exe c:\tmp\data.csv test123 -r -d "," -t test2 -z

Output:

    S3       | data.csv.gz | 100%
    Redshift | test2       | DONE
    Time elapsed: 5.7 seconds

## Download ##

    git clone https://github.com/alexbuz/CSV_Loader_For_Redshift

Current release: csv_loader_for_redshift 0.1.0

## FAQ ##

Yes, that is the main purpose of this tool.

Yes, assuming they are doing it on Windows.

CSV Loader for Redshift is as fast as any AWS API provided by Amazon.

Compress the input file yourself, or pass the -z / --gzip_source_file argument on the command line and the tool will compress it for you before uploading to S3.

You can use 'aws s3api' and psql COPY command to do pretty much the same.

No, Redshift will not recognize the *.zip file format. You have to gzip it; you can use 7-Zip for that.

No.

No.

Yes. Use the -z or --gzip_source_file argument and the tool does the compression for you, roughly as sketched below.

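For reference, the gzip step itself is small. Here is a minimal sketch of compressing a CSV before upload, in the spirit of what -z / --gzip_source_file does for you; `gzip_csv` is a hypothetical helper, not the tool's actual code:

```python
import gzip
import shutil

def gzip_csv(src_path):
    """Compress src_path to src_path + '.gz' and return the new path (illustrative sketch)."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream copy, so large files are not read into memory
    return dst_path

# Example: gzip_csv(r"c:\tmp\data.csv") produces c:\tmp\data.csv.gz
```
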
The CSV file you provide is preloaded to Amazon S3 for you. It doesn't have to be made public for the load into Redshift, and it can be compressed or uncompressed.

Your input file is compressed (optionally) and uploaded to S3 using the credentials you set in your shell. Your Redshift cluster has to be open to the world (accessible from the internet on port 5439). The tool then uses the PostgreSQL COPY command to load the file from S3 into the Redshift table.

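A hedged sketch of that COPY step with psycopg2, reusing the connect string and credentials from the environment. The bucket, key, and table names below are just the ones from the example; the exact COPY options the tool passes may differ:

```python
import os
import psycopg2

def copy_from_s3(to_table, bucket, key, delim=",", gzipped=True):
    """Issue a Redshift COPY from s3://bucket/key into to_table (illustrative sketch)."""
    creds = "aws_access_key_id=%s;aws_secret_access_key=%s" % (
        os.environ["AWS_ACCESS_KEY_ID"], os.environ["AWS_SECRET_ACCESS_KEY"])
    sql = "COPY %s FROM 's3://%s/%s' CREDENTIALS '%s' DELIMITER '%s'%s" % (
        to_table, bucket, key, creds, delim, " GZIP" if gzipped else "")
    conn = psycopg2.connect(os.environ["REDSHIFT_CONNECT_STRING"])
    try:
        with conn.cursor() as cur:
            cur.execute(sql)  # Redshift pulls the file straight from S3
        conn.commit()
    finally:
        conn.close()

# copy_from_s3("test2", "test123", "data.csv.gz")
```
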
Yes, but you have to use 'gzip' compression type.

I used Python, boto, and psycopg2 to write it. Boto is used to upload the file to S3, and psycopg2 is used to connect to the Redshift cluster and execute the COPY command.

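A rough sketch of that upload step with the boto 2 S3 API, under the same assumptions (the bucket and key names are just the ones from the example, the progress callback mimics the percent output shown on screen, and the real tool's code may differ):

```python
import boto
from boto.s3.key import Key

def _progress(sent, total):
    # Crude percent-progress output, similar in spirit to the "100%" shown on screen.
    if total:
        print("%d%%" % (100 * sent // total))

def upload_to_s3(local_path, bucket_name, key_name, use_rr=False, make_public=False):
    """Upload local_path to s3://bucket_name/key_name with boto 2 (illustrative sketch)."""
    conn = boto.connect_s3()  # picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
    bucket = conn.get_bucket(bucket_name)
    key = Key(bucket)
    key.key = key_name
    key.set_contents_from_filename(local_path, reduced_redundancy=use_rr,
                                   cb=_progress, num_cb=20)
    if make_public:
        key.make_public()

# upload_to_s3(r"c:\tmp\data.csv.gz", "test123", "data.csv.gz", use_rr=True)
```
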
Please contact me for the sources.

Yes, please ask me for new features.

It pings the Amazon S3 bucket to see if it's publicly readable.

Yes, AWS Certified Developer (Associate).

Yes, you can PM me here or email me at alex_buz@yahoo.com. I'll get back to you within hours.

## Links ##

- Wiki: Oracle_To_Redshift_Data_Loader
- Wiki: Oracle_To_S3_Data_Uploader
- Wiki: S3_Sanity_Check