Originally created by: cola3k
Originally owned by: SouravRoy-ETL
Currently, when exporting data to SQL Server, Duckle appears to use row-by-row INSERT operations via the SQL Server connector. For large datasets (e.g., 1M+ rows), this approach is extremely slow and becomes a major bottleneck in data pipeline workflows.
I have verified that DuckDB itself supports a community mssql extension that connects natively via TDS protocol and supports high-performance bulk writes. Switching to this method would solve the performance issue entirely.
Replace the current row-by-row insert logic with DuckDB's community mssql extension, leveraging the COPY TO statement with BCP protocol for batch/bulk writes.
According to the extension's documentation, this approach can achieve ~1.2M rows/sec load performance, which would be a massive improvement over the current incremental insert method.
DuckDB mssql extension docs: https://duckdb.org/community_extensions/extensions/mssql
The extension requires no ODBC/JDBC drivers — it uses pure TDS protocol over TCP, which is lightweight and easy to bundle.
Originally posted by: SouravRoy-ETL
Let me try that
Originally posted by: SouravRoy-ETL
Done in v0.5.0.
snk.sqlserver(andsnk.synapse) now bulk-load through the DuckDBmssqlcommunity extension over TDS - it ATTACHes the target and writes via COPY/INSERT instead of row-by-row, which is far faster on big loads. It's on by default (the Bulk write toggle); flip it off to fall back to the row-by-row driver for fully offline runs. Verified live against SQL Server 2022.Please download - https://github.com/slothflowlabs/duckle/releases/tag/v0.5.0 and let me know!
Originally posted by: cola3k
First of all, thank you for implementing the SQL Server bulk write via the mssql extension in v0.5.0. I've tested it and it works great.
However, I noticed two configuration gaps in the GUI regarding the new bulk write mode:
Batch size is not configurable
The bulk insert currently uses a hardcoded batch size of 1,000 rows per batch. For large datasets, this is too small and impacts performance. It would be helpful to expose this as a user-configurable setting in the Sink node properties.
TLS configuration
The current connection string for the mssql extension defaults to TrustServerCertificate=true. When connecting to an older SQL Server instance that does not support TLS, it fails with:
SQL Server: Invalid Input Error: MSSQL connection validation failed: TLS handshake failed to <ip>:1433 - check TLS configuration</ip>
The GUI already has two options for the legacy driver mode:
However, these settings currently only apply to the legacy row-by-row driver. It would be great if they could also apply to the new DuckDB mssql extension bulk write mode, so users can configure TLS behavior and batch size regardless of which write mode is used.
Thank you for the great work on this release!