duckle / Tickets / #10 Feature: Support Per-Column Date/Timestamp Format Transformations via String-Cast Ingestion

#10 Feature: Support Per-Column Date/Timestamp Format Transformations via String-Cast Ingestion

Status: open

Owner: nobody

Labels: enhancement (21)

Updated: 2026-06-06

Created: 2026-05-30

Creator: Anonymous

Private: No

Originally created by: DarekDan
Originally owned by: SouravRoy-ETL

This feature request stems from observations in [#3].

Currently, DuckDB imposes a strict limitation where it only supports one global date and timestamp format configuration per file during data ingestion. If a dataset contains multiple date or timestamp columns that utilize completely different string formats (for example, one column using dd/mm/yy and another using yyyy-mm-dd), DuckDB cannot natively auto-detect or parse both correctly at the same time.

To bypass this natively in DuckDB, users have to read columns as strings and parse them manually post-ingestion.

Proposed Solution
Introduce a new, explicit transformation layer within duckle that implements a two-step parsing strategy for date/timestamp columns:

Staged Ingestion: Read the specified date or timestamp columns from the source file explicitly as standard text (VARCHAR/STRING), preventing DuckDB's automatic global parsing from failing or misinterpreting the values.

Post-Read Conversion: After or during the initial ingestion phase, apply an explicit type-casting transformation using DuckDB's strptime() function (or an equivalent mapping method), passing a user-specified format string uniquely configured for each individual column.

This approach will decouple format specifications from the file-level constraints and permit true per-column format specification.

Proposed UI / UX Adjustments
Schema Mapping View: When a user updates a column type to Date or Timestamp in the UI schema editor, provide an optional text field to supply a custom format string (e.g., %d/%m/%Y).

Under the Hood: If a custom format is supplied, duckle should construct the internal DuckDB query to ingest that specific column as a VARCHAR initially, appending a SQL transformation rule: strptime(column_name, 'user_format_string')::DATE.

Additional Context
This addresses a common real-world limitation of DuckDB auto-detection handling complex schemas, as mentioned in the conversation on [#3].

Anonymous - 2026-06-01

assigned_to: SouravRoy-ETL
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Anonymous - 2026-06-04

Originally posted by: SouravRoy-ETL

Implemented the solution.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Originally posted by: mitslabo

I was able to get it working by specifying a FORMAT in the Schema tab of sync.csv.

However, after the pipeline is executed once, the setting is removed from the plan.

Before execution (with FORMAT specified)

CREATE OR REPLACE VIEW "n_mq1lgav8_nfgv3" AS
SELECT * REPLACE (
    try_strptime("date", '%Y-%m-%d')::DATE AS "date"
)
FROM read_csv_auto(
    '${DUCKLE_PATH}',
    header=true,
    delim=' ',
    quote='"',
    encoding='utf-8',
    dateformat='%Y-%m-%d',
    timestampformat='%Y-%m-%dT%H:%M:%S+%Z',
    types = {'date': 'VARCHAR', 'remote_addr': 'VARCHAR'}
);

After execution

The generated plan changes to:

-- VIEW · TSV (n_mq1lgav8_nfgv3)
CREATE OR REPLACE VIEW "n_mq1lgav8_nfgv3" AS
SELECT * FROM read_csv_auto(
    '${DUCKLE_PATH}',
    header=true,
    delim=' ',
    quote='"',
    encoding='utf-8',
    dateformat='%Y-%m-%d',
    timestampformat='%Y-%m-%dT%H:%M:%S+%Z',
    types = {'date': 'DATE', 'remote_addr': 'VARCHAR'}
);

The try_strptime() conversion is removed, and the column type is changed from VARCHAR to DATE.
As a result, the original FORMAT setting appears to be lost after execution. Is this expected behavior, or is it a bug?

Feature: Support Per-Column Date/Timestamp Format Transformations via...

Local-first ETL/ELT studio: a drag-and-drop visual pipeline designer

Milestone

Searches

Help

#10 Feature: Support Per-Column Date/Timestamp Format Transformations via String-Cast Ingestion

Related

Discussion

Before execution (with FORMAT specified)

After execution