This release contains 164 contributions from 29 contributors. We also have 12 new contributors. Thank you all for the contributions!
Some notable changes and improvements in this release are:
- New Parquet mode for Store Gateway
- Configurable OTLP metric suffixes via `-distributor.otlp.add-metric-suffixes`
- Multiple PRW2 bug fixes for data corruption and panics
- Graduate Ruler API, Alertmanager API/sharding, tenant federation, FIFO/Redis cache, instance limits, and memcached DNS-based service discovery from experimental support
- New Overrides API module to control tenant limits via API
- HATracker memberlist experimental support
- Tenant federation partial response experimental support
- Alertmanager upgraded to v0.31.1 with IncidentIO and Mattermost integrations
- Bucket index enabled by default
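
One of the PRW2 data-corruption fixes listed under What's Changed concerns shallow copies of `Samples` and `Histograms` during the V2-to-V1 conversion. The following is a minimal Go sketch of that class of bug, not the Cortex code: `Sample`, `Series`, `shallowConvert`, and `deepConvert` are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// Sample and Series are hypothetical stand-ins, not Cortex types.
type Sample struct {
	Value     float64
	Timestamp int64
}

type Series struct {
	Samples []Sample
}

// shallowConvert reuses the source slice, so the result shares the
// source's backing array.
func shallowConvert(src Series) Series {
	return Series{Samples: src.Samples}
}

// deepConvert copies the samples, so the result owns its own memory.
func deepConvert(src Series) Series {
	out := make([]Sample, len(src.Samples))
	copy(out, src.Samples)
	return Series{Samples: out}
}

func main() {
	pooled := Series{Samples: []Sample{{Value: 1, Timestamp: 100}}}
	shallow := shallowConvert(pooled)
	deep := deepConvert(pooled)

	// Simulate the pooled request being reused for a different write.
	pooled.Samples[0] = Sample{Value: 99, Timestamp: 200}

	fmt.Println(shallow.Samples[0].Value) // 99: overwritten through the shared array
	fmt.Println(deep.Samples[0].Value)    // 1: unaffected by the reuse
}
```

When the source request comes from a pool and is reused while the converted request is still in flight, only the deep copy keeps its original values.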
What's Changed
- [CHANGE] Ruler: Graduate Ruler API from experimental. [#7312]
  - Flag: Renamed `-experimental.ruler.enable-api` to `-ruler.enable-api`. The old flag is kept as deprecated.
  - Ruler API is no longer marked as experimental.
- [CHANGE] Alertmanager: Graduate Alertmanager API and sharding from experimental. [#7315]
  - Flag: Renamed `-experimental.alertmanager.enable-api` to `-alertmanager.enable-api`. The old flag is kept as deprecated.
  - Alertmanager sharding is no longer marked as experimental.
- [CHANGE] Blocks storage: Bucket index is now enabled by default. Disabling the bucket index (`-blocks-storage.bucket-store.bucket-index.enabled=false`) is not recommended for production. [#7259]
- [CHANGE] Users Scanner: Rename user index update configuration. [#7180]
  - Flag: Renamed `-*.users-scanner.user-index.cleanup-interval` to `-*.users-scanner.user-index.update-interval`.
  - Config: Renamed `clean_up_interval` to `update_interval` within the `users_scanner` configuration block.
- [CHANGE] Querier: Refactored parquet cache configuration naming. [#7146]
  - Metrics: Renamed `cortex_parquet_queryable_cache_*` to `cortex_parquet_cache_*`.
  - Flags: Renamed `-querier.parquet-queryable-shard-cache-size` to `-querier.parquet-shard-cache-size` and `-querier.parquet-queryable-shard-cache-ttl` to `-querier.parquet-shard-cache-ttl`.
  - Config: Renamed `parquet_queryable_shard_cache_size` to `parquet_shard_cache_size` and `parquet_queryable_shard_cache_ttl` to `parquet_shard_cache_ttl`.
- [FEATURE] Overrides: Add new Overrides API component and rename old overrides module to `overrides-configs`. [#6975]
- [FEATURE] HATracker: Add experimental support for `memberlist` and `multi` as a KV store backend. [#7284]
- [FEATURE] Distributor: Add `-distributor.otlp.add-metric-suffixes` flag. If true, suffixes will be added to the metrics for name normalization. [#7286]
- [FEATURE] StoreGateway: Introduces a new parquet mode. [#7046]
- [FEATURE] StoreGateway: Add a parquet shard cache to parquet mode. [#7166]
- [FEATURE] Distributor: Add a per-tenant flag `-distributor.enable-type-and-unit-labels` that enables adding `__unit__` and `__type__` labels for remote write v2 and OTLP requests. This is a breaking change; the `-distributor.otlp.enable-type-and-unit-labels` flag is now deprecated, operates as a no-op, and has been consolidated into this new flag. [#7077]
- [FEATURE] Querier: Add experimental projection pushdown support in Parquet Queryable. [#7152]
- [FEATURE] Ingester: Add experimental active series queried metric. [#7173]
- [FEATURE] Update Prometheus Alertmanager version to v0.31.1 and add new integrations for IncidentIO and Mattermost. [#7092] [#7267]
- [FEATURE] Tenant Federation: Add experimental support for partial responses using the `-tenant-federation.allow-partial-data` flag. When enabled, failures from individual tenants during a federated query are treated as warnings, allowing results from successful tenants to be returned. [#7232]
- [FEATURE] Alertmanager: Add `-alertmanager.disable-replica-set-extension` flag to limit blast radius during config corruption incidents. [#7153]
- [ENHANCEMENT] Tenant Federation: Add a local cache to regex resolver. [#7363]
- [ENHANCEMENT] Distributor: Add `cortex_distributor_push_requests_total` metric to track the number of push requests by type. [#7239]
- [ENHANCEMENT] Querier: Add `-querier.store-gateway-series-batch-size` flag to configure the maximum number of series to be batched in a single gRPC response message from Store Gateways. [#7203]
- [ENHANCEMENT] HATracker: Add `-distributor.ha-tracker.enable-startup-sync` flag. If enabled, the ha-tracker fetches all tracked keys on startup to populate the local cache. [#7213]
- [ENHANCEMENT] Distributor: Add validation to ensure remote write v2 requests contain at least one sample or histogram. [#7201]
- [ENHANCEMENT] Ingester: Add support for ingesting Native Histogram with Custom Buckets. [#7191]
- [ENHANCEMENT] Ingester: Optimize labels out-of-order (ooo) check by allowing the iteration to terminate immediately upon finding the first unsorted label. [#7186]
- [ENHANCEMENT] Distributor: Skip attaching `__unit__` and `__type__` labels when `-distributor.enable-type-and-unit-labels` is enabled, as these are appended from metadata. [#7145]
- [ENHANCEMENT] Distributor: Add `cortex_distributor_ingester_push_timeouts_total` metric to track the number of push requests to ingesters that were canceled due to timeout. [#7155] [#7229]
- [ENHANCEMENT] StoreGateway: Add tracing to parquet mode. [#7125]
- [ENHANCEMENT] Querier: Add a `-querier.parquet-queryable-shard-cache-ttl` flag to add TTL to parquet shard cache. [#7098]
- [ENHANCEMENT] Ingester: Add `enable_matcher_optimization` config to apply low selectivity matchers lazily. [#7063]
- [ENHANCEMENT] Distributor: Add a label references validation for remote write v2 request. [#7074]
- [ENHANCEMENT] Distributor: Add count, spans, and buckets validations for native histogram. [#7072]
- [ENHANCEMENT] Alertmanager/Ruler: Introduce a user scanner to reduce the number of list calls to object storage. [#6999]
- [ENHANCEMENT] Ruler: Add DecodingConcurrency config flag for Thanos Engine. [#7118]
- [ENHANCEMENT] Query Frontend: Add query priority based on operation. [#7128]
- [ENHANCEMENT] Compactor: Avoid double compaction by cleaning partition files in 2 cycles. [#7130] [#7209] [#7257]
- [ENHANCEMENT] Distributor: Optimize memory usage by recycling v2 requests. [#7131]
- [ENHANCEMENT] Compactor: Avoid double compaction by not filtering delete blocks on real time when using bucketIndex lister. [#7156]
- [ENHANCEMENT] Upgrade to Go 1.25.8. [#7164] [#7340]
- [ENHANCEMENT] Upgraded container base images to `alpine:3.23`. [#7163]
- [ENHANCEMENT] Ingester: Instrument Ingester CPU profile with userID for read APIs. [#7184]
- [ENHANCEMENT] Ingester: Add fetch timeout for Ingester expanded postings cache. [#7185]
- [ENHANCEMENT] Ingester: Add feature flag to collect metrics of how expensive an unoptimized regex matcher is and new limits to protect Ingester query path against expensive unoptimized regex matchers. [#7194] [#7210]
- [ENHANCEMENT] Querier: Add active API requests tracker logging to help with OOMKill troubleshooting. [#7216]
- [ENHANCEMENT] Compactor: Add partition group creation time to visit marker. [#7217]
- [ENHANCEMENT] Compactor: Add concurrency for partition cleanup and mark block for deletion [#7246]
- [ENHANCEMENT] Distributor: Validate metric name before removing empty labels. [#7253]
- [ENHANCEMENT] Ruler/Ingester: Propagate append hints to discard out of order samples on Ingester [#7226]
- [ENHANCEMENT] Make cortex_ingester_tsdb_sample_ooo_delta metric per-tenant [#7278]
- [ENHANCEMENT] Distributor: Add dimension `nhcb` to keep track of nhcb samples in `cortex_distributor_received_samples_total` and `cortex_distributor_samples_in_total` metrics.
- [ENHANCEMENT] Distributor: Add `-distributor.accept-unknown-remote-write-content-type` flag. When enabled, requests with unknown or invalid Content-Type header are treated as remote write v1 instead of returning 415 Unsupported Media Type. Default is false. [#7293]
- [ENHANCEMENT] Ingester: Added `cortex_ingester_ingested_histogram_buckets` metric to track number of histogram buckets ingested per user. [#7297]
- [ENHANCEMENT] Ring: Reuse timers in lifecycler and backoff loops to reduce allocations. [#7270]
- [ENHANCEMENT] Ring/KV: Reuse timers in DynamoDB watch loops to avoid per-poll allocations. [#7266]
- [ENHANCEMENT] Ring/KV: Reuse timers in memberlist client to reduce allocations. [#7285]
- [ENHANCEMENT] PromQL: Add `holt_winters` backwards compatibility as alias for `double_exponential_smoothing`. [#7223]
- [ENHANCEMENT] Query Frontend: Add logical plan fragmentation for distributed query execution. [#7018]
- [ENHANCEMENT] Parquet: Support sharded parquet files in parquet converter and queryable. [#7189]
- [ENHANCEMENT] Compactor: Add graceful period for compaction groups to prevent compacting recently written blocks. [#7182]
- [ENHANCEMENT] Query Engine: Add projection pushdown optimizer for improved query performance. [#7141]
- [ENHANCEMENT] Distributor: Optimize memory allocations by pooling PreallocWriteRequestV2 and preserving the capacity of the Symbols slice during resets. [#7404]
- [ENHANCEMENT] Ruler: Allow ExternalPusher and ExternalQueryable to be specified separately. [#7224]
- [BUGFIX] Distributor: Add bounds checking for symbol references in Remote Write V2 requests to prevent panics when UnitRef or HelpRef exceed the symbols array length. [#7290]
- [BUGFIX] Distributor: If remote write v2 is disabled, explicitly return HTTP 415 (Unsupported Media Type) for Remote Write V2 requests instead of attempting to parse them as V1. [#7238]
- [BUGFIX] Ring: Change DynamoDB KV to retry indefinitely for WatchKey. [#7088]
- [BUGFIX] Ruler: Add XFunctions validation support. [#7111]
- [BUGFIX] Querier: propagate Prometheus info annotations in protobuf responses. [#7132]
- [BUGFIX] Scheduler: Fix memory leak by properly cleaning up query fragment registry. [#7148]
- [BUGFIX] Compactor: Add back deletion of partition group info file even if not complete [#7157]
- [BUGFIX] Query Frontend: Add Native Histogram extraction logic in results cache [#7167]
- [BUGFIX] Alertmanager: Fix alertmanager reloading bug that removes user template files [#7196]
- [BUGFIX] Query Scheduler: If `max_outstanding_requests_per_tenant` is updated to a value lower than the current number of requests in the queue, the excess requests (newest ones) are dropped to prevent deadlocks. [#7188]
- [BUGFIX] Distributor: Return remote write V2 stats headers properly when the request is HA deduplicated. [#7240]
- [BUGFIX] Cache: Fix Redis Cluster EXECABORT error in MSet by using individual SET commands instead of transactions for cluster mode. [#7262]
- [BUGFIX] Distributor: Fix an `index out of range` panic in PRW2.0 handler caused by dirty metadata when reusing requests from `sync.Pool`. [#7299]
- [BUGFIX] Distributor: Fix data corruption in the push handler caused by shallow copying `Samples` and `Histograms` when converting Remote Write V2 requests to V1. [#7337]
- [BUGFIX] Ingester: Fix panic due to concurrent access to rand in active queried series. [#7329]
- [BUGFIX] Distributor: Fix request slice not being properly reused in push error paths. [#7123]
- [BUGFIX] Memberlist: Skip nil values delivered by `WatchPrefix` when a key is deleted, preventing a panic in the HA tracker caused by a failed type assertion on a nil interface value. [#7429]
- [BUGFIX] Tenant Federation: Fix `unsupported character` error when `tenant-federation.regex-matcher-enabled` is enabled and the input regex matches 0 or 1 existing tenant. [#7424]
- [BUGFIX] KV store: Fix false-positive `status_code="500"` metrics for HA tracker CAS operations when using memberlist. [#7408]
- [BUGFIX] Fix nil when ingester_query_max_attempts > 1. [#7369]
- [BUGFIX] Alertmanager: Fix disappearing user config and state when ring is temporarily unreachable. [#7372]
- [BUGFIX] Fix memory leak in `ReuseWriteRequestV2` by explicitly clearing the `Symbols` backing array string pointers before returning the object to `sync.Pool`. [#7373]
- [BUGFIX] Querier: Fix queryWithRetry and labelsWithRetry returning (nil, nil) on cancelled context by propagating ctx.Err(). [#7375]
New Contributors
- @thc1006 made their first contribution in https://github.com/cortexproject/cortex/pull/7068
- @zanderfriz made their first contribution in https://github.com/cortexproject/cortex/pull/7085
- @kishorekg1999 made their first contribution in https://github.com/cortexproject/cortex/pull/7179
- @b-wu26 made their first contribution in https://github.com/cortexproject/cortex/pull/7153
- @sh4shv4t made their first contribution in https://github.com/cortexproject/cortex/pull/7215
- @rice-junhaoyu made their first contribution in https://github.com/cortexproject/cortex/pull/7224
- @Shvejan made their first contribution in https://github.com/cortexproject/cortex/pull/7241
- @venkatchinmay made their first contribution in https://github.com/cortexproject/cortex/pull/7262
- @sandy2008 made their first contribution in https://github.com/cortexproject/cortex/pull/7266
- @archy-rock3t-cloud made their first contribution in https://github.com/cortexproject/cortex/pull/7320
- @siddharthahuja1 made their first contribution in https://github.com/cortexproject/cortex/pull/7286
Full Changelog: https://github.com/cortexproject/cortex/compare/v1.20.0...v1.21.0