Spice v1.5.0-rc.2 (July 14, 2025)
This is the second release candidate for v1.5.0, which introduces SQL-integrated vector and full-text search, partitioning for DuckDB acceleration, and automated refreshes for search indexes and views. It adds a new AWS Bedrock Embeddings Model Provider, a new Oracle Database connector, and multi-column vector search, and promotes the Spice.ai Cloud Data Connector to stable. This release also upgrades DuckDB from v1.1.3 to v1.3.2, which brings improved indexes, query performance, and internal storage optimizations to DuckDB-accelerated datasets.
What's New in v1.5.0-rc.2
SQL-integrated Search: Vector and full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.
Example Vector-Similarity-Search (VSS) using the new vector_search UDTF on the table reviews for the search term "Cricket bats".
:::sql
SELECT review_id, review_text, review_date, score
FROM vector_search(reviews, 'Cricket bats')
WHERE country_code = 'AUS'
LIMIT 3
Example Full-Text-Search (FTS) using the new text_search UDTF on the table reviews for the search term "Cricket bats".
:::sql
SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3
DuckDB v1.3.2 Upgrade: Upgraded DuckDB engine from v1.1.3 to v1.3.2. Key improvements include support for adding primary keys to existing tables, resolution of over-eager unique constraint checking for smoother inserts, and 13% reduced runtime on TPC-H SF100 queries through extensive optimizer refinements. The v1.2.x release of DuckDB was skipped due to a regression in indexes.
- Read the DuckDB v1.2.0 announcement.
- Read the DuckDB v1.3.0 announcement.
Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.
New UDFs useful for partition_by expressions:
- bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
- truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).
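To make the two transforms concrete, the following Python sketch mimics the behavior described above. It is illustrative only: the function bodies are assumptions, not Spice's implementation, and the CRC32 hash in particular is a stand-in, since these notes do not say which hash function bucket uses.

```python
import zlib


def truncate(width: int, value: int) -> int:
    """Align value to the nearest lower multiple of width."""
    if width <= 0:
        raise ValueError("width must be positive")
    # For a positive width, Python's % yields a non-negative remainder,
    # so subtracting it always rounds down to a multiple of width.
    return value - (value % width)


def bucket(num_buckets: int, value) -> int:
    """Map value to a bucket index in [0, num_buckets) via a stable hash.

    Assumption: CRC32 over the string form of the value stands in for
    whatever hash Spice uses internally.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_buckets


print(truncate(10, 101))  # 100, matching the example in the notes
```

Because the bucket index depends only on the column value, a query that filters on a single value only needs the one partition that value hashes to, which is what makes predicate pruning possible.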
Example Spicepod.yml configuration:
:::yaml
datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets
Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.
Example refreshing search indexes on body every 10 seconds (based on acceleration.refresh_check_interval).
:::yaml
datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id
Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.
Example Spicepod.yml configuration:
:::yaml
views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour
For more details, refer to Scheduled Refreshes.
Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search will perform parallel vector search on each column, and aggregate results using a reciprocal rank fusion scoring method.
Example Spicepod.yml where search results will consider both the GitHub issue's title and the content of its body.
:::yaml
datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
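For readers unfamiliar with reciprocal rank fusion, here is a minimal Python sketch of the general technique applied to two per-column result lists. It is not Spice's code: the function name, the issue IDs, and the smoothing constant k = 60 (a commonly used default for this method) are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list of (id, score).

    Each list contributes 1 / (k + rank) for every ID it contains,
    with rank starting at 1. k = 60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Hypothetical results from searching the title and body embeddings.
title_hits = ["issue-7", "issue-2", "issue-9"]
body_hits = ["issue-2", "issue-5", "issue-7"]

fused = reciprocal_rank_fusion([title_hits, body_hits])
print([doc_id for doc_id, _ in fused])
```

IDs that rank well in both columns (here issue-2 and issue-7) rise above IDs that appear in only one list, without requiring the raw similarity scores from different embedding models to be comparable.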
AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.
Example Spicepod.yaml:
:::yaml
embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'
For more details, refer to the AWS Bedrock Embedding Models Documentation.
Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.
Example Spicepod.yml:
:::yaml
datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger
See the Oracle Data Connector documentation for details.
Spice.ai Cloud Data Connector: Graduated to Stable.
Contributors
Breaking Changes
- Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
- Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature instead of openai_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is now deprecated and will be removed in a future release.
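As a rough aid for the prefix migration, the Python sketch below renames openai_-prefixed keys in a params mapping to the provider-specific prefix. The helper, its provider labels, and the sample parameter values are hypothetical. It also assumes every openai_-prefixed key has a direct provider-prefixed equivalent, which these notes confirm only for the examples listed.

```python
# Prefixes taken from the examples in these notes; the dictionary keys
# are hypothetical labels chosen for this sketch.
PROVIDER_PREFIXES = {
    "huggingface": "hf_",
    "anthropic": "anthropic_",
    "perplexity": "perplexity_",
}

DEPRECATED_PREFIX = "openai_"


def migrate_params(provider: str, params: dict) -> dict:
    """Return a copy of params with openai_ keys renamed for provider."""
    new_prefix = PROVIDER_PREFIXES[provider]
    migrated = {}
    for key, value in params.items():
        if key.startswith(DEPRECATED_PREFIX):
            # Swap only the prefix; the rest of the key is unchanged.
            key = new_prefix + key[len(DEPRECATED_PREFIX):]
        migrated[key] = value
    return migrated


print(migrate_params("huggingface", {"openai_temperature": 0.1}))
# {'hf_temperature': 0.1}
```

Review the output against the provider's documented parameters before committing it, since a mechanical rename cannot tell whether a given parameter actually exists for that provider.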
Cookbook Updates
- Added Oracle Data Connector cookbook: Connect to tables in Oracle databases.
- Added Hashed Partitioning with DuckDB cookbook: Accelerate data on large datasets by partitioning data into a fixed number of buckets.
The Spice Cookbook now includes 72 recipes to help you get started with Spice quickly and easily.
Upgrading
To upgrade to v1.5.0-rc.2, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0-rc.2 or pull the v1.5.0-rc.2 Docker image (spiceai/spiceai:1.5.0-rc.2).
What's Changed
Dependencies
- delta_kernel: Upgraded to v0.12.1
- DuckDB: Upgraded from v1.1.3 to v1.3.2
- iceberg: Upgraded from v0.4.0 to v0.5.1
Changelog
- fix llm integraion test (#6398) by @Sevenannn in #6398
- Promote spice cloud connector to stable quality (#6221) by @Sevenannn in #6221
- v1.5.0-rc.1 release notes (#6397) by @lukekim in #6397
- Fix model nsql integration tests (#6365) by @Sevenannn in #6365
- Fix incorrect UDTF name and SQL query (#6404) by @lukekim in #6404
- Update v1.5.0-rc.1.md (#6407) by @sgrebnov in #6407
- Improve error messages (#6405) by @lukekim in #6405
- build(deps): bump Jimver/cuda-toolkit from 0.2.25 to 0.2.26 (#6388) by @app/dependabot in #6388
- Upgrade dependabot dependencies (#6411) by @phillipleblanc in #6411
- Fix projection pushdown issues for document based file connector (#6362) by @Advayp in #6362
- Create a new crate for UDFs (#6416) by @kczimm in #6416
- Add a PartitionedDuckDB Accelerator (#6338) by @kczimm in #6338
- Use vector_search() UDTF in HTTP APIs (#6417) by @Jeadie in #6417
- add supported types (#6409) by @kczimm in #6409
- Enable session time zone override for MySQL (#6426) by @sgrebnov in #6426
- Acceleration-like indexing for full text search indexes. (#6382) by @Jeadie in #6382
- Provide error message when partition by expression changes (#6415) by @kczimm in #6415
- Add support for Oracle Autonomous Database connections (Oracle Cloud) (#6421) by @sgrebnov in #6421
- prune partitions for exact and in list with and without UDFs (#6423) by @kczimm in #6423
- Fixes and reenable FTS tests (#6431) by @Jeadie in #6431
- Updating text-embedding-inference & mistralrs dependency (#6366) by @Jeadie in #6366
- Upgrade DuckDB to 1.3.2 (#6434) by @phillipleblanc in #6434
- Fix issue in limit clause for the Github Data connector (#6443) by @Advayp in #6443
- Upgrade iceberg-rust to 0.5.1 (#6446) by @phillipleblanc in #6446