Name                                                    Modified    Size
datacontract_cli-1.0.0-py3-none-any.whl                 2026-06-04  360.7 kB
datacontract_cli-1.0.0-py3-none-any.whl.sigstore.json   2026-06-04  9.5 kB
datacontract_cli-1.0.0.tar.gz                           2026-06-04  447.1 kB
datacontract_cli-1.0.0.tar.gz.sigstore.json             2026-06-04  9.5 kB
README.md                                               2026-06-04  6.5 kB
v1.0.0 source code.tar.gz                               2026-06-04  2.9 MB
v1.0.0 source code.zip                                  2026-06-04  3.2 MB
Totals: 7 items, 6.9 MB

[1.0.0] - 2026-06-04

Breaking Changes

  • Replaced the Soda Core quality/test engine with ibis. datacontract test now compiles schema and quality checks into ibis expressions (dialect-correct SQL per backend via sqlglot, local/remote files via DuckDB) instead of generating SodaCL. Install extras now pull ibis-framework[<backend>] instead of soda-core-*. Check semantics and pass/fail results are preserved for the supported sources (postgres, redshift, mysql, snowflake, bigquery, databricks, sqlserver, oracle, trino, athena, impala, kafka/dataframe via the ibis Spark backend, and local/S3/GCS/Azure files).
  • Raw SodaCL custom quality checks (quality.type: custom with engine: soda) are no longer executed and now report a warning. Migrate them to quality.type: sql or a library metric (e.g. metric: rowCount).
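
As a rough aid for the migration described above, the following sketch scans a list of quality entries for raw SodaCL custom checks that are no longer executed. It assumes the contract's quality section has already been parsed into Python dicts; the helper name `needs_migration`, the sample entries, the `implementation` and `query` values, and the `mustBeGreaterThan` thresholds are illustrative assumptions, not taken from the CLI.

```python
# Sketch only: flags quality entries that the 1.0.0 release no longer executes.
# The helper name and all sample data are hypothetical.

def needs_migration(check: dict) -> bool:
    """True for raw SodaCL custom checks (type: custom with engine: soda)."""
    return check.get("type") == "custom" and check.get("engine") == "soda"


# Hypothetical quality entries as they might look after loading the contract YAML.
quality = [
    {"type": "custom", "engine": "soda", "implementation": "row_count > 0"},
    {"type": "library", "metric": "rowCount", "mustBeGreaterThan": 0},
    {"type": "sql", "query": "SELECT COUNT(*) FROM orders", "mustBeGreaterThan": 0},
]

to_migrate = [check for check in quality if needs_migration(check)]
for check in to_migrate:
    print("migrate to quality.type: sql or a library metric:", check)
```

The second and third entries show the two migration targets named in the changelog: a library metric and a SQL check.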

Added

  • Python 3.13 and 3.14 support (requires-python now allows 3.10–3.14). On 3.13/3.14 the Spark-backed extras resolve to PySpark 4.0 (Spark 3.5 has no 3.13+ build); the Kafka/Avro connector jars already adapt to the runtime Spark/Scala version. create_spark_session now pins PYSPARK_PYTHON/PYSPARK_DRIVER_PYTHON to the running interpreter so Spark's Python workers match the driver.
  • datacontract test against Databricks now supports authentication methods beyond the personal access token (DATACONTRACT_DATABRICKS_TOKEN): an OAuth service principal for machine-to-machine auth (DATACONTRACT_DATABRICKS_CLIENT_ID + DATACONTRACT_DATABRICKS_CLIENT_SECRET), a local config profile via the Databricks SDK unified auth (DATACONTRACT_DATABRICKS_PROFILE, also covers Azure CLI/MSI), and an explicit connector auth type (DATACONTRACT_DATABRICKS_AUTH_TYPE, e.g. databricks-oauth for the interactive browser flow).
  • datacontract test now records structured diagnostics on each check explaining why it passed or failed: the metric, measured value, threshold, and (for "bad row" metrics) the total row count and failed fraction. invalid_count checks also report the validity rule they enforced (e.g. {"max_length": 20}, {"pattern": "^.+@.+$"}). The diagnostics surface in the JSON output and the JUnit failure text. This replaces the Soda-specific diagnostics payload that the ibis migration had left unpopulated.
  • datacontract test now honors ODCS quality.unit: percent on the count-of-bad-rows library metrics (nullValues, missingValues, invalidValues). The threshold is then compared against the failed fraction (0–100) of the model row count instead of an absolute count, so e.g. metric: nullValues, unit: percent, mustBeLessThan: 5 passes when fewer than 5% of rows are null. Percent on metrics where a row fraction has no meaning (rowCount, duplicateValues) logs a warning and falls back to the absolute count. The measured percent is added to the check diagnostics.
  • datacontract test now honors ODCS quality.severity: a non-blocking severity (info, warning, low, minor, trivial) downgrades a failing quality check from failed to warning, so it no longer fails the run. Any other severity (or none) still fails the run. The severity is recorded in the check diagnostics.
  • datacontract test --include-failed-samples collects a small sample of the rows that failed each missing/invalid/duplicate check (off by default). Each sample is restricted to the contract's identifier columns (unique / primaryKey fields) plus the offending column; duplicate checks report the duplicated key values and their counts. Columns whose ODCS classification marks them sensitive (pii, personal, confidential, restricted, sensitive, secret) are omitted. Samples are capped at 5 rows per check and surface on Check.failed_samples in the JSON output and in the JUnit failure text. This is local-only and needs no Soda Cloud (soda-core itself collects failed-row samples only via Soda Cloud).
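
The interaction between quality.unit: percent and quality.severity described above can be sketched as a small pure function. This is an illustration of the documented semantics, not the CLI's implementation: the function name `evaluate`, its parameters, the shape of the returned dict, and the handling of an empty table are all assumptions.

```python
# Sketch only: models the documented percent and severity rules for a
# mustBeLessThan threshold. Names and return shape are hypothetical.

PERCENT_METRICS = {"nullValues", "missingValues", "invalidValues"}
NON_BLOCKING_SEVERITIES = {"info", "warning", "low", "minor", "trivial"}


def evaluate(metric, count, total_rows, must_be_less_than, unit=None, severity=None):
    # Percent only applies to count-of-bad-rows metrics; for other metrics
    # (rowCount, duplicateValues) fall back to the absolute count.
    use_percent = unit == "percent" and metric in PERCENT_METRICS
    if use_percent:
        # Assumption: an empty table counts as 0% bad rows.
        measured = 100.0 * count / total_rows if total_rows else 0.0
    else:
        measured = count

    if measured < must_be_less_than:
        result = "passed"
    elif severity in NON_BLOCKING_SEVERITIES:
        result = "warning"  # failing check downgraded, does not fail the run
    else:
        result = "failed"

    return {
        "result": result,
        "metric": metric,
        "measured": measured,
        "threshold": must_be_less_than,
        "unit": "percent" if use_percent else "rows",
        "severity": severity,
    }


# 30 null rows out of 1000 is 3%, which is below the 5% threshold.
print(evaluate("nullValues", 30, 1000, 5, unit="percent"))
# 8% exceeds the threshold, but severity "warning" is non-blocking.
print(evaluate("nullValues", 80, 1000, 5, unit="percent", severity="warning"))
```

Keeping the rule in one function makes the fallback visible: with metric duplicateValues and unit percent, the threshold is compared against the absolute count.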

Changed

  • MySQL is tested through DuckDB's mysql extension instead of ibis's native MySQL backend, so the mysql extra stays pure-pip (no mysqlclient C build / system MySQL client libraries required).
  • Bumped DuckDB to the 1.5.x line (from 1.0.x) with the bundled duckdb-extension-* wheels (httpfs/aws/azure) pinned in lockstep. The 1.5.x extension wheels publish arm64 Linux builds, so air-gapped installs on arm64 Linux are now supported (the previous platform skip markers were removed).
  • The Kafka/Avro Spark connector jars are now derived from the installed PySpark at runtime (Spark version + Scala binary version), so both PySpark 3.5.x (Scala 2.12) and 4.x (Scala 2.13) work. The kafka and databricks extras allow pyspark<5.0.
  • import protobuf now uses the pure-Python proto-schema-parser instead of the protoc system binary. The protobuf extra no longer requires protoc (or the protobuf runtime), so .proto import works out of the box, including transitive imports across subdirectories.
  • Container image is now based on Docker Hardened Images: signed, ships SBOM/VEX, and has tighter CVE patch SLAs than upstream Debian. Runs as nonroot (uid 65532) instead of root. pip / uv installs at build time are routed through Socket Firewall Free, which blocks malicious dependencies. (#1275, #1277)
  • Container image now ships Eclipse Temurin JRE 17, so PySpark-backed engines (Kafka, Spark) now run inside the image; previously they failed at SparkSession startup because the base image had no JVM. End users pulling datacontract/cli are unaffected by the build-side changes. (#1277)
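
To illustrate how connector jars can be derived from a Spark version, here is a minimal sketch that builds Maven coordinates for the Kafka and Avro connectors. It assumes the Scala binary version follows the mapping stated above (3.5.x uses 2.12, 4.x uses 2.13) and takes the version as a plain string instead of reading it from an installed PySpark; the function names are hypothetical and the CLI's actual detection logic may differ.

```python
# Sketch only: derives Spark connector Maven coordinates from a version string.
# Function names are hypothetical; the Scala mapping follows the changelog.

def scala_binary_version(spark_version: str) -> str:
    """Map a Spark version to the Scala binary version its PySpark build uses."""
    major = int(spark_version.split(".")[0])
    return "2.13" if major >= 4 else "2.12"


def connector_packages(spark_version: str) -> list[str]:
    """Return Maven coordinates for the Kafka and Avro connectors."""
    scala = scala_binary_version(spark_version)
    return [
        f"org.apache.spark:spark-sql-kafka-0-10_{scala}:{spark_version}",
        f"org.apache.spark:spark-avro_{scala}:{spark_version}",
    ]


print(connector_packages("3.5.1"))
print(connector_packages("4.0.0"))
```

With PySpark installed, the version string could come from `pyspark.__version__`, and the resulting list could be joined with commas for Spark's `spark.jars.packages` setting.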

Fixed

  • DuckDB S3 secret creation no longer fails (Secret Validation Failure) on DuckDB ≥1.5: explicit KEY_ID/SECRET now use the default config provider instead of CREDENTIAL_CHAIN.
  • import csv no longer fails with a DuckDB binder error on DuckDB ≥1.5 (the uniqueness probe now uses count(DISTINCT ...) via SQL).
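
The idea behind a count(DISTINCT ...) uniqueness probe can be shown with a short, self-contained example. This sketch uses Python's built-in sqlite3 module as a stand-in for DuckDB so it runs without extra dependencies; the table, its data, and the helper `is_unique` are made up for illustration and do not reflect the importer's actual query.

```python
# Sketch only: a column is treated as unique when its non-null count equals
# its distinct count. sqlite3 stands in for DuckDB; all names are hypothetical.
import sqlite3


def is_unique(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # Identifiers are interpolated for brevity; only use trusted names here.
    total, distinct = conn.execute(
        f'SELECT count("{column}"), count(DISTINCT "{column}") FROM "{table}"'
    ).fetchone()
    return total == distinct


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

print(is_unique(conn, "orders", "id"))     # id values are all different
print(is_unique(conn, "orders", "email"))  # one email appears twice
```

Both aggregates ignore NULLs, so this probe compares only non-null values.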

Removed

  • The soda-core runtime dependency and all soda-core-* install extras, plus the setuptools runtime shim they required. The sodacl export format (datacontract export sodacl) is unchanged and is now generated independently of any Soda runtime.
  • The unused details field on test-result checks (Run.checks[].details). It was a Soda-era placeholder that was never populated; the new structured diagnostics field replaces it. The JUnit failure text no longer prints a Details: line.
Source: README.md, updated 2026-06-04