aws-sdk-pandas (formerly AWS Data Wrangler) connects pandas to the AWS analytics stack so that DataFrames can move to and from cloud services. With a few lines of code, you can read and write Amazon S3 data in Parquet, CSV, JSON, or ORC, register tables in the AWS Glue Data Catalog, and load Amazon Athena query results directly into pandas. The library wraps common patterns such as partitioning, compression, and vectorized I/O, so data lake operations perform well without hand-written boilerplate. It also supports Redshift, OpenSearch, and other services, enabling ETL tasks that combine SQL engines with Python transformations. Operational helpers handle IAM, sessions, and concurrency while exposing options for encryption, versioning, and catalog consistency. The result is a workflow that keeps your analytics in Python while using AWS-native storage and query engines at scale.
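The S3, Glue, and Athena flow described above might look roughly like the following. This is a minimal sketch, not the project's canonical usage: the helper names `parquet_dataset_args` and `write_then_query`, the bucket, database, and table names, and the choice of `append` mode are all assumptions made for illustration. It assumes the library is installed as `awswrangler`, that AWS credentials are configured, and that the Glue database already exists.

```python
def parquet_dataset_args(bucket, database, table, partition_cols):
    """Build keyword arguments for awswrangler.s3.to_parquet in dataset mode.

    Hypothetical helper: it only assembles a dict, so it can be inspected
    without touching AWS.
    """
    return {
        "path": f"s3://{bucket}/{table}/",
        "dataset": True,  # dataset mode enables partitioning and catalog registration
        "mode": "append",  # assumption: add new files instead of overwriting
        "partition_cols": list(partition_cols),
        "compression": "snappy",
        "database": database,  # Glue database, assumed to exist already
        "table": table,
    }


def write_then_query(df, bucket, database, table, partition_cols):
    """Write df to S3 as partitioned Parquet, then read a sample back via Athena."""
    # Imported lazily so the helper above stays usable without AWS dependencies.
    import awswrangler as wr

    wr.s3.to_parquet(
        df=df, **parquet_dataset_args(bucket, database, table, partition_cols)
    )
    # Athena returns the query result as a pandas DataFrame.
    return wr.athena.read_sql_query(
        f'SELECT * FROM "{table}" LIMIT 10', database=database
    )


# Inspect the arguments locally; the names here are placeholders.
args = parquet_dataset_args("example-bucket", "example_db", "events", ["day"])
print(args["path"])
```

Calling `write_then_query(df, "example-bucket", "example_db", "events", ["day"])` with a DataFrame that has a `day` column would write one S3 prefix per distinct `day` value and register the table in Glue before querying it. Separating argument construction from the I/O call is only a convenience for this sketch; passing the keyword arguments directly to `wr.s3.to_parquet` works equally well.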
Features
- High-level read/write of DataFrames to S3 with Parquet, CSV, JSON, and ORC
- Integration with the AWS Glue Data Catalog and Athena for schema management and SQL queries
- Convenience methods for Redshift COPY/UNLOAD and data migration patterns
- Automatic handling of partitions, compression, and columnar formats
- Session and IAM helpers with options for encryption and versioning
- Scalable I/O paths optimized for large data lake workloads
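
As an illustration of the session and encryption options listed above, a write can be given an explicit boto3 session and server-side encryption settings. The sketch below is hedged: `sse_kms_kwargs` and `write_encrypted` are hypothetical helper names, the KMS key and region are placeholders, and it assumes the `boto3_session` and `s3_additional_kwargs` parameters of `awswrangler.s3.to_parquet` behave as documented for your installed version.

```python
def sse_kms_kwargs(kms_key_id):
    """Extra S3 arguments requesting SSE-KMS encryption for written objects."""
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


def write_encrypted(df, path, kms_key_id, region_name):
    """Write df as Parquet using an explicit session and KMS encryption."""
    # Imported lazily: both packages and valid AWS credentials are required here.
    import boto3
    import awswrangler as wr

    # An explicit session pins the region (and could also pin a named profile).
    session = boto3.Session(region_name=region_name)
    return wr.s3.to_parquet(
        df=df,
        path=path,
        boto3_session=session,
        s3_additional_kwargs=sse_kms_kwargs(kms_key_id),
    )
```

Passing the session explicitly, rather than relying on the default credential chain, makes it easier to see which account and region a given write targets.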