Spice v1.5.0-rc.1 (July 7, 2025)
This is the first release candidate for v1.5.0, which introduces partitioning for DuckDB acceleration, SQL-integrated vector and full-text search, and automated refreshes for search indexes and views. It also adds multi-column vector search, a new AWS Bedrock Embeddings Model Provider, and a new Oracle Database connector, and promotes the Spice.ai Cloud Data Connector to stable.
What's New in v1.5.0-rc.1
Partitioned Acceleration: DuckDB file-based accelerations now support partition_by expressions, enabling queries to scale to large datasets through automatic data partitioning and query predicate pruning. New UDFs, bucket and truncate, simplify partition logic.
New UDFs useful for partition_by expressions:
- bucket(num_buckets, col): Partitions a column into a specified number of buckets based on a hash of the column value.
- truncate(width, col): Truncates a column to a specified width, aligning values to the nearest lower multiple (e.g., truncate(10, 101) = 100).
Example Spicepod.yml configuration:
:::yaml
datasets:
  - from: s3://my_bucket/some_large_table/
    name: my_table
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      partition_by: bucket(100, account_id) # Partition account_id into 100 buckets
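To make the partitioning semantics concrete, the following Python sketch mimics what bucket and truncate compute and how an equality predicate can be pruned to a single partition. It is illustrative only: the hash used by Spice's bucket UDF is not specified here, so zlib.crc32 stands in for it, and the helper names partition_rows and the sample rows are hypothetical.

```python
import zlib
from collections import defaultdict


def bucket(num_buckets: int, value) -> int:
    """Map a value to one of num_buckets buckets via a hash.

    Assumption: zlib.crc32 is a stand-in hash; Spice's actual hash may differ.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def truncate(width: int, value: int) -> int:
    """Align an integer to the nearest lower multiple of width."""
    return value - (value % width)


def partition_rows(rows, key, num_buckets):
    """Group rows into partitions keyed by bucket(num_buckets, row[key])."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[bucket(num_buckets, row[key])].append(row)
    return partitions


# Hypothetical sample data: 1000 rows partitioned into 100 buckets.
rows = [{"account_id": i, "amount": i * 10} for i in range(1000)]
partitions = partition_rows(rows, "account_id", 100)

# A predicate such as `account_id = 42` only needs the partition 42 hashes to.
target = bucket(100, 42)
matches = [r for r in partitions[target] if r["account_id"] == 42]

print(truncate(10, 101))  # 100
print(len(partitions[target]), "of", len(rows), "rows scanned")
```

The point of the sketch is that the partition for a filter value can be computed from the predicate alone, so the other partitions never have to be read.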
SQL-integrated Search: Vector and full-text search capabilities are now natively available in SQL queries, extending the power of the POST v1/search endpoint to all SQL workflows.
Example Vector-Similarity-Search (VSS) using the new similarity_search UDTF on the table reviews for the search term "Cricket bats".
:::sql
SELECT review_id, review_text, review_date, score
FROM similarity_search(reviews, 'Cricket bats')
WHERE country_code = 'AUS'
LIMIT 3
Example Full-Text-Search (FTS) using the new text_search UDTF on the table reviews for the search term "Cricket bats".
:::sql
SELECT review_id, review_text, review_date, score
FROM text_search(reviews, 'Cricket bats')
LIMIT 3
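Applications that build these queries from user input need to quote the search term safely. The following is a minimal Python sketch, assuming the term is passed as a standard single-quoted SQL string literal. The helpers sql_string_literal and build_search_sql are hypothetical and not part of any Spice SDK.

```python
def sql_string_literal(text: str) -> str:
    """Quote text as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_search_sql(udtf: str, table: str, term: str, columns, limit: int) -> str:
    """Build a SELECT over a search UDTF such as similarity_search or text_search.

    Hypothetical helper: udtf, table and columns are trusted identifiers
    chosen by the developer; only the search term is treated as user input.
    """
    if udtf not in ("similarity_search", "text_search"):
        raise ValueError(f"unsupported UDTF: {udtf}")
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {udtf}({table}, {sql_string_literal(term)})\n"
        f"LIMIT {int(limit)}"
    )


print(build_search_sql("text_search", "reviews", "Cricket bats", ["review_id", "score"], 3))
```

The resulting string can be submitted through any SQL interface the runtime exposes. Only the search term is escaped here, so table and column names should never come from untrusted input.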
Full-Text-Search (FTS) Index Refresh: Accelerated datasets with search indexes maintain up-to-date results with configurable refresh intervals.
Example refreshing search indexes on body every 10 seconds (based on acceleration.refresh_check_interval).
:::yaml
datasets:
  - from: github:github.com/spiceai/docs/pulls
    name: spiceai.doc.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true
      refresh_mode: full
      refresh_check_interval: 10s
    columns:
      - name: body
        full_text_search:
          enabled: true
          row_id:
            - id
Scheduled View Refresh: Accelerated Views now support cron-based refresh schedules using refresh_cron, automating updates for accelerated data.
Example Spicepod.yml configuration:
:::yaml
views:
  - name: my_view
    sql: SELECT 1
    acceleration:
      enabled: true
      refresh_cron: '0 * * * *' # Every hour
For more details, refer to Scheduled Refreshes.
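For readers unfamiliar with cron syntax, the five fields are minute, hour, day of month, month, and day of week. The Python sketch below checks whether a timestamp satisfies such an expression. It is a deliberately simplified illustration, not the scheduler Spice uses: it handles only `*`, plain integers, and comma-separated lists (no ranges or steps), and it requires every field to match. The function names are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', plain integers and comma lists."""
    for part in field.split(","):
        if part == "*" or int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) satisfies a five-field cron expression."""
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7
        and field_matches(weekday, when.isoweekday() % 7)
    )


print(cron_matches("0 * * * *", datetime(2025, 7, 7, 13, 0)))   # True: top of the hour
print(cron_matches("0 * * * *", datetime(2025, 7, 7, 13, 30)))  # False
```

With '0 * * * *' only the minute field is constrained, which is why the refresh fires once per hour.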
Multi-column Vector Search: For datasets configured with embeddings on more than one column, POST v1/search and similarity_search will perform parallel vector search on each column, and aggregate results using a reciprocal rank fusion scoring method.
Example Spicepod.yml where search results will consider both the GitHub issue's title and the content of its body.
:::yaml
datasets:
  - from: github:github.com/apache/datafusion/issues
    name: datafusion.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    columns:
      - name: title
        embeddings:
          - from: hf_minilm
      - name: body
        embeddings:
          - from: openai_embeddings
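Reciprocal rank fusion combines per-column rankings by summing 1/(k + rank) for each result across the lists it appears in. The Python sketch below illustrates the general technique only. It is not Spice's implementation, and the constant k=60 and the sample issue ids are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant and an
    assumption here, not a documented Spice default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-column results, best match first.
title_hits = ["issue-7", "issue-2", "issue-9"]  # ranked by title embedding
body_hits = ["issue-2", "issue-5", "issue-7"]   # ranked by body embedding

for doc_id, score in reciprocal_rank_fusion([title_hits, body_hits]):
    print(doc_id, round(score, 5))
```

Because only ranks are used, the fusion does not need the similarity scores of the two embedding models to be on a comparable scale. A result that ranks well in both columns (issue-2 here) rises above one that tops a single column.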
AWS Bedrock Embeddings Model Provider: Added support for AWS Bedrock embedding models, including Amazon Titan Text Embeddings and Cohere Text Embeddings.
Example Spicepod.yaml:
:::yaml
embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      aws_region: us-east-1
      input_type: search_document
      truncate: END
  - from: bedrock:amazon.titan-embed-text-v2:0
    name: titan-embeddings
    params:
      aws_region: us-east-1
      dimensions: '256'
For more details, refer to the AWS Bedrock Embedding Models Documentation.
Oracle Data Connector: Use from: oracle: to access and accelerate data stored in Oracle databases, deployed on-premises or in the cloud.
Example Spicepod.yml:
:::yaml
datasets:
  - from: oracle:"SH"."PRODUCTS"
    name: products
    params:
      oracle_host: 127.0.0.1
      oracle_username: scott
      oracle_password: tiger
See the Oracle Data Connector documentation for details.
Spice.ai Cloud Data Connector: Graduated to Stable.
Contributors
Breaking Changes
- Search HTTP API Response: POST v1/search response payload has changed. See the new API documentation for details.
- Model Provider Parameter Prefixes: Model Provider parameters use provider-specific prefixes instead of openai_ prefixes (e.g., hf_temperature instead of openai_temperature for HuggingFace, anthropic_max_completion_tokens for Anthropic, perplexity_tool_choice for Perplexity). The openai_ prefix remains supported for backward compatibility but is now deprecated and will be removed in a future release.
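When updating existing Spicepods for the new parameter prefixes, the rename is mechanical. The Python sketch below shows one way to rewrite a model's params mapping. migrate_param_prefix is a hypothetical helper, not a Spice tool, and it assumes that a parameter already written with the provider-specific prefix should win over its deprecated openai_ equivalent.

```python
def migrate_param_prefix(params: dict, provider_prefix: str) -> dict:
    """Rename deprecated `openai_`-prefixed keys to a provider-specific prefix.

    Hypothetical helper for updating a model's `params` mapping; keys that
    already use the provider prefix win over their deprecated equivalents.
    """
    old = "openai_"
    migrated = {}
    for key, value in params.items():
        new_key = key
        if key.startswith(old):
            new_key = f"{provider_prefix}_{key[len(old):]}"
        if new_key != key and new_key in params:
            continue  # explicit provider-specific value takes precedence
        migrated[new_key] = value
    return migrated


print(migrate_param_prefix({"openai_temperature": 0.2}, "hf"))
# {'hf_temperature': 0.2}
```

Parameters without the openai_ prefix pass through unchanged, so the helper can be applied to a whole params block.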
Cookbook Updates
- Added Oracle Data Connector cookbook: Connect to tables in Oracle databases.
The Spice Cookbook now includes 71 recipes to help you get started with Spice quickly and easily.
Upgrading
To upgrade to v1.5.0-rc.1, download and install the specific binary from github.com/spiceai/spiceai/releases/tag/v1.5.0-rc.1 or pull the v1.5.0-rc.1 Docker image (spiceai/spiceai:1.5.0-rc.1).
What's Changed
Dependencies
- delta_kernel: Upgraded to v0.12.1
Changelog
- Jeadie/25 06 10/finance (#6182) by @Jeadie in #6182
- chore: Update dependencies (#6196) by @peasee in #6196
- Fix FlightSQL GetDbSchemas and GetTables schemas to fully match the protocol (#6197) by @sgrebnov in #6197
- Use spice-rs in test operator and retry on connection reset error (#6136) by @Sevenannn in #6136
- Move model-grading evals to testoperator (#6195) by @Jeadie in #6195
- Don't use base table for full text search post apply vector search (#6215) by @Jeadie in #6215
- Fix content-type header in v1/sql response (#6217) by @Jeadie in #6217
- Add v1.4.0-rc.1 release into qa_analytics.csv (#6209) by @sgrebnov in #6209
- fix: Reschedule AI benchmarks, set max parallel to 1 (#6224) by @peasee in #6224
- task: Add MySQL indexes (#6227) by @peasee in #6227
- fix pagination (#6222) by @Jeadie in #6222
- Add build links to release notes (#6220) by @kczimm in #6220
- feat: Enable additional testoperator tests (#6218) by @peasee in #6218
- chore: Update testoperator release target to 1.4 (#6235) by @peasee in #6235
- fix: Update benchmark snapshots (#6234) by @app/github-actions in #6234
- fix: Lower SF100 memory limit (#6236) by @peasee in #6236
- Add glue integration test using hive and iceberg tables (#6248) by @kczimm in #6248
- allow database for empty patterns (#6258) by @kczimm in #6258
- add Glue catalog to README.md (#6179) by @kczimm in #6179
- Add bucket UDF for partitioning (#6200) by @kczimm in #6200
- New tool parsley (#6232) by @Jeadie in #6232
- Upgrade dependabot dependencies (#6261) by @phillipleblanc in #6261
- Upgrade delta_kernel to 0.12.1 (#6263) by @phillipleblanc in #6263
- fix: Throughput test dispatching (#6265) by @peasee in #6265
- fix: badges on README.md show correct status (#6268) by @phillipleblanc in #6268
- Extend Flight CommandGetTables with source native data type info (#6259) by @sgrebnov in #6259
- fix: Docker image build with profile (#6270) by @peasee in #6270
- docs: Post-release update (#6275) by @peasee in #6275
- Improve error message for incorrect/missing Glue table or database (#6257) by @kczimm in #6257
- Update spicepod.schema.json (#6274) by @app/github-actions in #6274
- Update openapi.json (#6279) by @app/github-actions in #6279
- Add Remote Spicepod support (#6233) by @phillipleblanc in #6233
- Update QA analytics for v1.4.0 (#6277) by @ewgenius in #6277
- Add truncate UDF (#6278) by @kczimm in #6278
- Update qa_analytics.csv for 1.4.0 (#6284) by @sgrebnov in #6284
- Default grok to 'grok-3' (#6285) by @Jeadie in #6285
- For Spice.ai connectors, do not default to dev SCP for dev builds (#6254) by @Jeadie in #6254
- fix: Deny extra caching parameters (#6288) by @peasee in #6288
- Make DynamoDB connectivity errors more specific and actionable (#6294) by @sgrebnov in #6294
- Create a table provider from full text search index + query (#6286) by @Jeadie in #6286
- Update Flight CommandGetTables to Return Native DataFusion SQL Data Types (#6297) by @sgrebnov in #6297
- Adds a synchronous get_table function on the DataFusion context (#6300) by @phillipleblanc in #6300
- Better Glue connector error messages (#6289) by @kczimm in #6289
- fix: consume response stream before reading authorization metadata (#6292) by @Sevenannn in #6292
- feat: Use retryable stream in test operator (#6231) by @Sevenannn in #6231
- Support reserved word column names in DynamoDB (#6308) by @sgrebnov in #6308
- fix: Implement Default manually for SQLResultsCacheConfig (#6310) by @peasee in #6310
- Add integration test for DynamoDB Data Connector (#6311) by @sgrebnov in #6311
- fix: Warn about no configured datasets if no datasets and catalogs are present (#6296) by @Advayp in #6296
- Add better error messages for cases when a port is already in use (#6313) by @Advayp in #6313
- Disallow datasets with protected names (#6309) by @Advayp in #6309
- Roadmap updates June 2025 (#6319) by @lukekim in #6319
- Add partitioning models (#6298) by @kczimm in #6298
- Encode ScalarValues for use in filenames (#6318) by @kczimm in #6318
- Standardize model parameter handling & prioritize <model-prefix>_<param> for model default overrides (#6199) by @Sevenannn in #6199
- Add initial support for Oracle Data Connector (#6321) by @sgrebnov in #6321
- Oracle connector: Support all major Oracle data types (#6323) by @sgrebnov in #6323
- Oracle connector: support filter predicate pushdown (#6326) by @sgrebnov in #6326
- text_search UDTF and required AnalyzerRule. (#6280) by @Jeadie in #6280
- Build indexes as part of accelerations (#6324) by @phillipleblanc in #6324
- feat: Add support for cron-based view refresh (#6341) by @peasee in #6341
- Surface table not found errors immediately (#6317) by @Advayp in #6317
- runtime-datafusion-index: Stop infinite recursion for IndexTableScanOptimizerRule (#6353) by @phillipleblanc in #6353
- Add optional behaviors to DataAccelerator tables + add WantsUnderlyingTableBehavior to VoidTable (#6354) by @phillipleblanc in #6354
- AWS Bedrock models. (#6358) by @Jeadie in #6358
- Ensure views load even if they're the only components defined (#6359) by @Advayp in #6359
- Improve type conversion and add integration tests for the Oracle connector (#6327) by @sgrebnov in #6327
- Upgrade dependabot dependencies (#6375) by @phillipleblanc in #6375
- Don't run tests that require a Databricks cluster on every PR (#6379) by @phillipleblanc in #6379
- Properly handle duplicate flags to spice run (#6364) by @Advayp in #6364
- Fix the case sensitivity of the key in env secrets store (#6371) by @ewgenius in #6371
- vector_search UDTF and related changes (#6381) by @Jeadie in #6381
- Update end_game.md (#6380) by @sgrebnov in #6380
- fix: openai model endpoint (#6394) by @Sevenannn in #6394
- Enable Oracle connector in default build configuration by @sgrebnov in #6395
- Enable configuring otel endpoint from spice run by @Advayp in #6360