
What Changed

This release delivers significant improvements to the Multistage Engine, Pauseless Consumption, Time Series Engine, Logical Table support, Upsert and Deduplication Enhancement, Minion Jobs (including smallSegmentMerger), and Rebalancing capabilities. It also includes numerous smaller features and general bug fixes.

Multistage Engine Lite Mode (Beta) | Runbook

This release adds an all-new query mode for running Multistage Engine queries against Pinot, heavily inspired by Uber's Presto-over-Pinot query architecture.

MSE Lite Mode runs queries following a Scatter-Gather paradigm, the same as Pinot's V1 Query Engine. It also adds a configurable limit on the number of records returned by each instance of the leaf stage. This limit is set to 100k records by default.

With MSE Lite Mode, you can enable MSE access for all users without worrying about them disrupting production workloads. MSE Lite Mode can also scale to thousands of QPS with minimal hardware, so users can run complicated multi-stage queries that use features such as sub-queries and window functions at high QPS and low latency, with minimal reliability risk.

You can enable this by setting all of the following query options:

SET useMultistageEngine=true; 
SET usePhysicalOptimizer=true; 
SET useLiteMode=true;
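
As a minimal sketch, a client could prepend these options to each statement before submitting it. The helper name `with_lite_mode` is hypothetical and only the three option names come from the notes above; how the resulting string is sent to a broker is left out.

```python
# Query options taken from the release notes above.
LITE_MODE_OPTIONS = {
    "useMultistageEngine": "true",
    "usePhysicalOptimizer": "true",
    "useLiteMode": "true",
}


def with_lite_mode(sql: str) -> str:
    """Prefix a SQL statement with the SET options that enable MSE Lite Mode."""
    prefix = "\n".join(f"SET {name}={value};" for name, value in LITE_MODE_OPTIONS.items())
    return f"{prefix}\n{sql.strip()}"


# Hypothetical table name, used only to show the shape of the output.
example = with_lite_mode("SELECT COUNT(*) FROM myTable")
```

Keeping the options in one place makes it easy to drop `useLiteMode` later and fall back to the Physical Optimizer alone.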

Multistage Engine Physical Optimizer (Beta) | Runbook

We have added a new query optimizer for the Multistage Engine that can automatically eliminate or simplify redundant Exchanges. We aim to make this query optimizer the default in future versions.

Uber adopted this optimizer for one of their workloads that needs Colocated Join support, and it proved to be 5-7x faster while consuming 50% less CPU (issue #15871).

To enable this, set the query options:

SET useMultistageEngine=true;
SET usePhysicalOptimizer=true;

Features

  • Capable of simplifying Exchanges for arbitrarily complicated queries. No query hints required.
  • Supports group-by, joins, union-all, etc.
  • Can solve constant queries within the Broker itself.
  • Can simplify Exchanges even if the two join inputs have different numbers of partitions. For example, if the table on the left is partitioned into 8 partitions over 4 servers and the table on the right into 16 partitions over 4 servers, the Physical Optimizer will automatically switch to "Identity Exchanges".
  • Can simplify Exchange even if the servers selected for the two sides of a join are different.
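
One way to see why differing partition counts can still line up: if both tables use the same hash function with modulo partitioning (an assumption of this sketch, not a statement about every Pinot partition function), every key in partition p of a 16-way table lands in partition p % 8 of an 8-way table. The sketch below checks that arithmetic with a stand-in hash; it is not the optimizer's code.

```python
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Stand-in partition function: CRC32 of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def aligned(keys, left_partitions: int, right_partitions: int) -> bool:
    """True if each key's right-side partition folds onto its left-side partition."""
    return all(
        partition_of(k, right_partitions) % left_partitions == partition_of(k, left_partitions)
        for k in keys
    )


keys = [f"user-{i}" for i in range(1000)]
nested = aligned(keys, 8, 16)      # 16 is a multiple of 8, so partitions nest
not_nested = aligned(keys, 8, 12)  # 12 is not a multiple of 8
```

When the counts nest like this, rows that need to meet in a join are already grouped consistently, which is the kind of situation where a shuffle can be avoided.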

Unsupported but Coming Soon

  • Support for customizing query parallelism via SET stageParallelism=x.
  • Support for dynamic filters for Semi-Join queries.
  • Support for SubPlan based execution for eligible queries.
  • Support for "Lookup Join" optimization.
  • Support for Spools.

Here are some of the key PRs that have been merged as part of this feature

  • [multistage] Replace LogicalTableScan with PinotLogicalTableScan #15225
  • [multistage] Adding Basic Constructs for Physical Optimization #15371
  • [multistage] Add Physical Plan Nodes / Trait Assignment / Logical Agg Rule #15439
  • [multistage] Add Leaf Stage Worker Assignment / Boundary / Agg Rules #15481
  • [multistage] Add Pushdown and Worker Rules #15658
  • [multistage] Support Physical Optimizer E2E #15698
  • [multistage] Multistage Engine Lite Mode (prototype) #15743
  • [multistage] Add Support for Inferring Invalid Segment Partition Id #15760
  • [multistage] Handle Excluded New Segments in MSE Physical Optimizer #15780
  • [multistage] Add userFactEvents Table to Colocated Join Quickstart #15800
  • [multistage] Add Support for Broker Server/Segment Pruning #15959
  • [multistage] Lite Mode with Scatter Gather Execution #16000
  • [multistage] Add Support for inferRealtimeSegmentPartition #16023
  • [multistage] Enable runInBroker / useBrokerPruning by Default #16204
  • [multistage] Add Support for Values in Physical Optimizer #16221
  • [multistage] Add broker config defaults for physical optimizer and lite mode #16240
  • [multistage] Support Hash Functions Gracefully in V2 Optimizer #16296
  • [multistage] Fix Bugs in SetOp Handling and Multi-Column Join #16330

Multistage Engine Enhancements

Multiple Window Functions in MSE #16109

The multi-stage engine now supports multiple WINDOW functions in a single query plan, enabling more expressive and efficient analytical queries with improved stage fusion and execution planning.

ASOF JOIN Support #15630

Introduced support for ASOF JOIN, allowing time-aligned joins commonly used in time-series analytics. This unlocks use cases where approximate matches based on time proximity are required.
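
To illustrate the idea of a time-aligned match, here is a minimal Python sketch of one common ASOF semantics: each left row is paired with the latest right row whose timestamp is at or before its own. The function and data are hypothetical, and Pinot's actual syntax and supported match conditions are described in its documentation rather than here.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Pair each (ts, payload) in left with the latest right payload where right ts <= left ts.

    right must be sorted by timestamp. Left rows with no earlier right row get None.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, payload in left:
        idx = bisect_right(right_ts, ts) - 1  # index of last right row at or before ts
        match = right[idx][1] if idx >= 0 else None
        out.append((ts, payload, match))
    return out


trades = [(5, "t1"), (12, "t2"), (1, "t0")]
quotes = [(2, "q@2"), (10, "q@10")]
result = asof_join(trades, quotes)
```

Unlike an equality join, no trade here has a quote with exactly the same timestamp, yet two of the three trades still find a match.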

Colocated Join with Different Partitions #15764

The MSE engine now supports colocated joins between tables with different partitioning schemes, improving join flexibility and compatibility with real-world data layouts.

Local Replicated Join & Local Exchange Parallelism #14893

Optimized join strategies by enabling local replicated joins and local exchanges. This reduces cross-node shuffles and improves performance for high-selectivity joins and co-partitioned data.

Distribution Type Hint for Broadcast Join #14797

Introduced a planner hint for specifying distribution type (e.g., BROADCAST) to force broadcast joins when appropriate. This gives users more control over join strategy and execution plans.

Dynamic Rule Toggling in Optimizer #15999

Users can now dynamically enable or disable optimization rules in the query planner (optProgram), allowing fine-grained control and easier tuning for query behavior and debugging.

Parser Enhancements for Type Aliases #15615

Added support for SQL type aliases like LONG being interpreted as BIGINT, improving compatibility and developer ergonomics.

Task Throttling Based on Heap Usage #16271

Throttling logic has been introduced for Single-Stage Engine (SSE) and Multi-Stage Engine (MSE) tasks. Tasks are throttled when server heap usage exceeds a configurable threshold, to safeguard system stability under load.

Query Cancellation for MSQE with Client-Provided ID #14823

Extended support for query cancellation in the Multi-Stage Query Engine (MSQE), including cancellation via client-specified query identifiers. This enables better integration with external systems and more robust control over long-running queries.

Pauseless Consumption (Design)

Pauseless consumption, introduced in Pinot 1.4.0, enhances real-time analytics by minimizing ingestion delays and improving data freshness in Apache Pinot.

In the current architecture of Apache Pinot, real-time data ingestion pauses during the build and upload phases of the previous segment. These phases can sometimes take a few minutes to complete, causing delays in data availability. As a result, users face a gap in accessing the most recent data, impacting real-time analytics capabilities.

Pauseless consumption resolves this issue by allowing Pinot to continue ingesting data while completing the build and upload phases of the previous segment. This enhancement ensures more up-to-date data availability, significantly reducing latency between ingestion and query.

Here are some of the key PRs that have been merged as part of this feature

  • Pauseless ingestion without failure scenarios #14741
  • Pauseless Ingestion [#2]: Handle Failure scenarios without DR #14798
  • Pauseless Consumption [#3]: Disaster Recovery with Reingestion #14920
  • Add validations for Pauseless Tables #15567, #15953
  • Adds Disaster Recovery modes for Pauseless #16071
  • Adding metrics for pauseless observability #15384
  • Compatibility for Pauseless Dedup and Upsert table #15383
  • Allows segments deletion in build for pauseless tables #15299
  • Support size based threshold for pauseless consumption #15347
  • Fix pauseless consumption segment download #15316
  • Registering pauseless FSM by default and picking this FSM for pauseless enabled tables #15241

Logical Table Support (Design)

A logical table is a collection of physical tables (REALTIME and OFFLINE tables). A SQL query that uses a logical table will internally scan ALL the physical tables. Conceptually, a logical table is similar to a specific definition of a VIEW in relational databases.

Logical table 'l' of physical tables t1_REALTIME, t2_REALTIME, t1_OFFLINE is similar to

CREATE VIEW l AS
SELECT <columns> FROM t1_REALTIME
UNION
SELECT <columns> FROM t2_REALTIME
UNION
SELECT <columns> FROM t1_OFFLINE

Logical tables are designed to simplify and unify a wide range of use cases by abstracting the complexity of managing multiple physical tables. They enable ZK node scalability by allowing large tables to be split into smaller OFFLINE tables and a REALTIME table while presenting a single logical table to users, which keeps operations on IdealState, ExternalView, and segments transparent to them. Logical tables also support ALTER TABLE workflows, such as Kafka topic reconfiguration, schema changes, and table renames, by allowing replacement of the underlying physical table list. For data layout management, like re-streaming and time-based partitioning, logical tables help ensure that ingestion changes remain invisible to users.

Here are some of the key PRs that have been merged as part of this feature

  • Logical table: quick refactoring and java docs #15770
  • Handle remove build routing for logical tables #15862
  • Execute Queries on Logical Tables in SSE #15634
  • Logical table time boundary #15776
  • Add configs to logical tables #15720
  • Logical table CRUD operations #15515
  • Database name validation for logical table #15994
  • Cache configs for logical table context in server #15881
  • Schema and table config deletion to validate with logical table ref table names #15900
  • Logical table schema enforcement #15733
  • Broker selection for logical tables #15726
  • GET /logicalTables API to return database specific tables #15944
  • Logical table query quota enforcement - SSE #15839
  • Support logical tables in MSE #15773

Time Series Engine is Now in Beta

Pinot 1.3.0 introduced a Generic Time Series Query Engine, enabling native support for various time-series query languages (e.g., PromQL, M3QL) through a pluggable framework. Release 1.4.0 adds multiple enhancements and bug fixes.

Timeseries Query Execution UI in Pinot Controller #16305

Added a new UI in the Pinot Controller for visualizing timeseries query execution plans. This feature helps developers and operators better understand query breakdowns, execution stages, and time-series–specific optimizations, making troubleshooting and tuning more intuitive.

Adding controller endpoint to access timeseries API #16286

Introduces a Prometheus-compatible /query_range endpoint to support time series queries in Pinot. Refactors broker request handling to generalize support for both GET and POST methods, simplifies header extraction, and improves error handling and logging. Includes minor code cleanups and enhancements to maintainability.
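
As a rough sketch of what calling a Prometheus-style range endpoint looks like, the helper below builds a request URL. The parameter names (`query`, `start`, `end`, `step`) follow the Prometheus HTTP API; the base URL and path prefix are hypothetical, and these notes do not say whether Pinot expects additional parameters, so treat this as an illustration rather than a reference.

```python
from urllib.parse import urlencode


def build_query_range_url(base_url: str, query: str, start: int, end: int, step: int) -> str:
    """Build a Prometheus-style range query URL.

    base_url is a placeholder prefix supplied by the caller; check the Pinot
    documentation for the actual host, port, and path in your deployment.
    """
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/query_range?{urlencode(params)}"


# Hypothetical prefix and query, shown only to make the URL shape concrete.
url = build_query_range_url(
    "http://localhost:9000/example/prefix/", "up", 1700000000, 1700003600, 60
)
```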

Enhancements

  • [timeseries] Add Support for Passing Raw Time Values to Leaf Stage #15000
  • [timeseries] Fix Num Groups Limit Default Value #15026
  • [timeseries] Add Support for limit and numGroupsLimit #14945
  • [timeseries] Add Metadata Provider to Time Series Query Planner #15604
  • [timeseries] Adding working E2E quickstart for TimeSeriesEngineAuth #16169

Upsert and Dedup

Ensure consistent creation time across replicas to prevent upsert data inconsistency #16034

This enhancement addresses inconsistent segment creation times across replicas, which caused non-deterministic upsert behavior when uploading an UploadedRealtimeSegment and led to data inconsistency. The solution adds zkCreationTime to SegmentMetadataImpl and uses it in the comparison tie-breaking logic. ZK time is set in all segment loading flows, and the upsert logic introduces getAuthoritativeCreationTime() to prefer ZK time. This ensures consistent upsert decisions across all replicas while maintaining backward compatibility by falling back to local time if ZK time is unavailable.
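
The fallback rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the Java implementation; representing "unavailable" as `None` or a non-positive value is an assumption of the sketch.

```python
from typing import Optional


def authoritative_creation_time(zk_creation_time_ms: Optional[int], local_creation_time_ms: int) -> int:
    """Prefer the ZK creation time; fall back to the local one if ZK time is unavailable."""
    if zk_creation_time_ms is not None and zk_creation_time_ms > 0:
        return zk_creation_time_ms
    return local_creation_time_ms


# Two replicas built the same segment at slightly different local times,
# but both read the same ZK value, so they agree on the tie-breaker.
replica_a = authoritative_creation_time(1_700_000_000_000, 1_700_000_000_123)
replica_b = authoritative_creation_time(1_700_000_000_000, 1_700_000_004_567)
legacy = authoritative_creation_time(None, 1_700_000_000_123)
```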

Bug fixes

  • Introduce an Enablement enum with values ENABLE, DISABLE, and DEFAULT to control whether a feature is enabled. With DEFAULT, the config from the upper level (e.g. instance level) is used.
  • Introduce snapshot and preload fields of type Enablement in UpsertConfig and DedupConfig so that the value can be properly overridden. Previously there was no way to disable a feature at table level when it was enabled at instance level.
  • Always read properties from UpsertContext and DedupContext to avoid the inconsistency of server level override and config change
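
The three-state override can be sketched as follows. The enum values come from the notes above; the resolver function `is_enabled` is a hypothetical name, and the real logic lives in Pinot's Java code.

```python
from enum import Enum


class Enablement(Enum):
    ENABLE = "ENABLE"
    DISABLE = "DISABLE"
    DEFAULT = "DEFAULT"


def is_enabled(table_level: Enablement, instance_default: bool) -> bool:
    """Resolve a table-level setting against the instance-level default."""
    if table_level is Enablement.DEFAULT:
        return instance_default  # defer to the upper (instance) level
    return table_level is Enablement.ENABLE


# A table can now opt out even when the instance enables the feature.
snapshot_on = is_enabled(Enablement.DISABLE, instance_default=True)
```

A plain boolean cannot express "no opinion", which is why a table-level `false` was previously indistinguishable from "not set".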

Cleanups

  • Simplify the constructor for upsert/dedup related configs
  • Re-order some fields/methods for readability
  • Unify the metadata manager creation logic for upsert/dedup
  • Move some constants to CommonConstants

Incompatibility

  • enableSnapshot and enablePreload are deprecated and replaced with snapshot and preload

Here are some of the key PRs

  • Allows consumption during build for dedup/partial-upsert #15296
  • Remove snapshotRWlock in upsert table partition mgr #15420
  • Validate and reject MV primary-keys for upsert/dedup, BIG_DECIMAL, and JSON #16079
  • Add segmentCreationTimeMillis in validDocIdsMetadata #15938
  • Check for timeColumn data type in Upsert & Dedup Tables #15761
  • Add server status in the validDocIds info API - Upsert Tables #16165
  • Make SegmentOperationsThrottler more extensible and modify interfaces for upsert and Dedup to take this as an argument #15973
  • Bug Fix Segments going into BAD state for Dedup Tables using TTL and RF > 1 #15178
  • Do not allow consumption for Dedup Tables enabled during segment download and replacement #15268
  • Add a config to skip updating dedup metadata for non-default tier segments #15576
  • Fix and clean-up upsert and dedup config #15528
  • Allow RealtimeSegmentValidationManager to fix error segments for partial upsert and dedup tables. #15987

Minion Improvements

Small Segment Merger Task Enhancement #16086

This enhancement addresses data inconsistency issues in UpsertCompactMerge tasks caused by segment replica creation time mismatches. Instead of using the creation time from the server, the system now uses the creation time from ZK metadata, which aligns with the upsert tie-breaking logic. The task generator passes the maximum creation time of the merging segments as task input, ensuring deterministic segment metadata across replicas without additional server calls.

Added config to skip dedup metadata updates for non-default tiers #15576

For dedup-enabled tables, segments moved to the cold tier are usually past the metadata TTL, so updating the dedup metadata for them can be skipped to reduce the overhead of metadata construction.

Newly added configs:

  • Table level, under dedupConfig: ignoreNonDefaultTiers, set to ENABLE, DISABLE, or DEFAULT (the default, which uses the instance level config)
  • Instance level: pinot.server.instance.dedup.default.ignore.non.default.tiers

Notable Improvements and Bug Fixes

  • Fix segment completion FSM on uploaded segment #15062
  • Track the actor that triggers the minion task #14829
  • Improve validations for minion instance tag checks #15239
  • Minion tasks should not pick up problematic consuming segments #15173
  • Add Obfuscator for task config logging in minion builtin tasks #16192

Ingestion and Indexing

Add Multi-column Text index #16103

Introduced the ability to create a single text index across multiple columns. This reduces indexing overhead for multi-field text search and enables faster search queries where text relevance spans multiple fields.

Apart from saving space on shared intra-column tokens within Lucene, the new index uses a single document id mapping. Example configuration (within table config):

"tableIndexConfig": {
   "multiColumnTextIndexConfig": {
      "columns": ["hobbies", "skills", "titles"],
      "properties": {
         "caseSensitive": "false"
      },
      "perColumnProperties": {
         "titles": {
            "caseSensitive": "true"
         }
      }
   }
}

As shown in the example above, the index configuration allows for both:

  • Setting shared index properties that apply to all columns with properties. Allowed keys are: enableQueryCacheForTextIndex, luceneUseCompoundFile, luceneMaxBufferSizeMB, reuseMutableIndex, and all keys allowed in perColumnProperties.
  • Setting column-specific properties (overriding shared ones) with perColumnProperties. Allowed keys: useANDForMultiTermTextIndexQueries, enablePrefixSuffixMatchingInPhraseQueries, stopWordInclude, stopWordExclude, caseSensitive, luceneAnalyzerClass, luceneAnalyzerClassArgs, luceneAnalyzerClassArgTypes, luceneQueryParserClass.
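
The override relationship between properties and perColumnProperties can be pictured as a simple dictionary merge. The sketch below is a Python illustration using the example configuration; the function name `effective_properties` is hypothetical and this is not how Pinot resolves the settings internally.

```python
def effective_properties(index_config: dict, column: str) -> dict:
    """Merge shared properties with a column's overrides; column values win."""
    shared = index_config.get("properties", {})
    per_column = index_config.get("perColumnProperties", {}).get(column, {})
    return {**shared, **per_column}


multi_column_text_index_config = {
    "columns": ["hobbies", "skills", "titles"],
    "properties": {"caseSensitive": "false"},
    "perColumnProperties": {"titles": {"caseSensitive": "true"}},
}

titles_props = effective_properties(multi_column_text_index_config, "titles")
hobbies_props = effective_properties(multi_column_text_index_config, "hobbies")
```

Here titles ends up case-sensitive while hobbies and skills keep the shared case-insensitive setting.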

Max JSON Index Heap Usage Configuration #15685

Introduced a maxBytesSize configuration for mutable JSON indexes to cap memory usage during ingestion. This prevents excessive heap consumption when processing large JSON documents.

Logical Type Support in Avro Enabled by Default #15654

The pinot-avro ingestion plugin now automatically enables support for Avro logical types such as timestamps and decimals. This improves schema accuracy and reduces the need for manual configuration.

Fix for Real-Time Segment Download #15316

Resolved an issue that caused failures when downloading real-time table segments during ingestion. This fix improves data availability and reduces ingestion errors.

JSON Confluent Schema Registry Decoder #15273

Added the KafkaConfluentSchemaRegistryJsonMessageDecoder, enabling seamless ingestion of JSON messages registered in Confluent Schema Registry. This broadens compatibility with Kafka-based pipelines.

Canonicalize BigDecimal Values During Ingestion #14958

Standardized BigDecimal ingestion by converting values into a canonical form. This ensures consistent deduplication, accurate comparisons, and stable upsert behavior.
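
To see why a canonical form matters, consider that 1.5 and 1.50 are numerically equal but have different textual and byte representations. The Python sketch below uses `Decimal.normalize()` as an analogy; the exact canonical form Pinot uses is not specified in these notes.

```python
from decimal import Decimal


def canonical(value: str) -> str:
    """Return a canonical string for a decimal literal by stripping trailing zeros."""
    return str(Decimal(value).normalize())


raw_a, raw_b = "1.50", "1.5"
raw_equal = raw_a == raw_b                               # the raw strings differ
canonical_equal = canonical(raw_a) == canonical(raw_b)   # canonical forms agree
```

If such values are used as primary keys or comparison columns, comparing the raw representations could treat the same number as two different keys.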

New Scalar Functions Support

JSON_MATCH Function Extension Points #15508

Added extension points for the JSON_MATCH function, allowing developers to plug in custom matching logic during JSON query evaluation.

JsonKeyValueArrayToMap Function #15352

Introduced a function that converts a JSON key-value array into a map, simplifying certain ETL and query transformations.
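
The transformation can be sketched in Python as follows. The input shape (an array of objects with "key" and "value" fields) is an assumption made for illustration; see the PR for the exact input format the Pinot function expects.

```python
import json


def key_value_array_to_map(json_array: str, key_field: str = "key", value_field: str = "value") -> dict:
    """Convert a JSON array of {key, value} objects into a dict."""
    entries = json.loads(json_array)
    return {entry[key_field]: entry[value_field] for entry in entries}


tags = '[{"key": "env", "value": "prod"}, {"key": "region", "value": "us-east"}]'
tag_map = key_value_array_to_map(tags)
```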

H3 Geospatial Functions: gridDisk and gridDistance #15349, #15259

Added new geospatial functions for H3 indexing:

  • gridDisk: returns all H3 cells within a given radius.
  • gridDistance: computes the distance between two H3 cells.

Plugin & API Enhancements

ArrowResponseEncoder Implementation #15410

Added a new ArrowResponseEncoder to support Apache Arrow format responses, enabling faster and more efficient data transfer to compatible clients.

S3 Plugin Checksum Support #15304

The S3 plugin now supports request and response checksum validation via configuration. This improves data integrity verification when reading from or writing to S3.

Security

Row-Level Security (RLS) Support #16043

Implemented row-level security policies, allowing fine-grained data access control where different users or groups see only rows they are authorized to view. This is particularly useful for multi-tenant environments.

Groovy Script Static Analysis #14844

Added static analysis checks for Groovy scripts to detect unsafe patterns before execution, improving the security posture of custom UDFs and transforms.

Notable Features and Updates

Support orderedPreferredReplicas query option for customizable routing strategy #15203

Introduced the orderedPreferredPools query option, allowing users to provide a prioritized list of server pools as a routing hint. The broker attempts to route queries to these pools in order, falling back gracefully, which enables precise traffic control for canary deployments.
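
The ordered fallback can be sketched as a simple selection loop. This is a hypothetical illustration of "try pools in order, then fall back", with integer pool ids and a deterministic fallback chosen for the example; it does not reflect the broker's actual routing code.

```python
from typing import Iterable, Optional, Set


def pick_pool(ordered_preferred_pools: Iterable[int], available_pools: Set[int]) -> Optional[int]:
    """Return the first preferred pool that is available, else any available pool."""
    for pool in ordered_preferred_pools:
        if pool in available_pools:
            return pool
    # Fallback: no preferred pool is available, so pick deterministically among the rest.
    return min(available_pools) if available_pools else None


canary_first = pick_pool([2, 0], available_pools={0, 1, 2})  # pool 2 is up, so it is used
canary_down = pick_pool([2, 0], available_pools={0, 1})      # falls through to pool 0
no_preference_hit = pick_pool([5], available_pools={1, 3})   # falls back to an available pool
```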

Enforce Schema for All Tables #15333

Pinot now enforces that all tables have an associated schema, ensuring data integrity and consistency across ingestion and query execution.

Default Load Mode Changed to MMAP #15089

Updated the default segment load mode to MMAP for better memory efficiency, especially for large datasets.

Workload Configurations for Query Resource Isolation #15109

Introduced workload-based query resource isolation. Administrators can now define workload profiles with specific resource allocations, improving multi-tenant fairness.

Server-Level Segment Batching for Rebalance #15617

Added the ability to batch segment assignments at the server level during rebalance operations. This reduces the number of rebalance steps and minimizes disruption.

ClusterConfigChangeHandler and Segment Reindex Throttle #14894

Introduced a ClusterConfigChangeHandler on servers and added throttling for segment reindexing operations. This prevents excessive load during cluster configuration changes.

Misc. Improvements

  • Add a dry-run summary mode for TableRebalance which only returns a summary of the dry-run results #15050
  • Add mergedTextIndexPrefixToExclude config to SchemaConformingTransformer #15542
  • Adding broker grpc query endpoint and BrokerRequest/BrokerResponse protobuf #15081
  • Add null checks when sampling thread resource usage #15069
  • Added support to pause and resume ingestion based on resource utilization #15008
  • Add support for performing pre-checks for TableRebalance #15029
  • Add server level dynamically configurable segment download throttler #15001
  • [Build] Add Maven Wrapper + increase Maven version in Docker images #15035
  • Adding a testcase for testing a table where kafka partition has been reduced after creating the table #15028
  • Adding jute.maxBuffer in java opts as an example #15047
  • Adding a new list in the propertystore to capture the committing segments #15016
  • Add broker config with default value for is_enable_group_trim hint #14990
  • Add segment StarTree index rebuild throttler configurable via ZK cluster configs #14943
  • Adding a new config skipSegmentPreprocess in table IndexingConfig #14982
  • Add configs to customize http client/request timeouts #15010
  • Add option to delete table and schema immediately #14736
  • Enhance query visualization by preventing duplicate node additions and improving edge connections in flowchart generation #15025
  • Add numGroupsWarningLimitReached stat #15279
  • Enhance index and field config validation to block adding bloomfilter on boolean column #15283
  • Add MetricsExecutor to track number of tasks started and finished in MSE #15357
  • Add getter for precheckcontext #15339
  • [HELM]: add initContainers support #15275
  • Add metric for multistage num groups limit reached #15221
  • Add recommended defaults for table rebalance config, add preChecks to RebalanceTableCommand #15232
  • Ui: add instruction to use required nodejs version #15181
  • Add metrics for tracking zookeeper's max buffer size in services #15380
  • Add metrics to track segment operation throttle thresholds and count of number of segments undergoing operation #15392
  • Adding encoding type into the BrokerGrpcServer #15395
  • Adds logs for consumption in order #15408
  • Add an interface for SecretStore #15226
  • Add Tenants Info to Rebalance API summary #15284
  • Add default impl to read string values in ForwardIndexReader #15391
  • Add off-heap set implementations #15205
  • Add BrokerGrpcServer for a streaming response fetching endpoint #15088
  • HELM: Add TLS-only support and publish 0.3.3 Chart version #15356
  • Adds a new config to create a configurable JFR recording in continuous mode #15364
  • Add some comments about why using Semaphore as the consumer lock #15361
  • Add metrics to track external view's node size #15338
  • Add segmentNames as an optional parameter in the get IS and EV APIs for a table #15332
  • Add Disk Utilization Info into Rebalance API #15175
  • Add DATE_TRUNC Optimizer #14385
  • Add support for orphan segment cleanup #15142
  • Add integration test for lookup join in the multi-stage engine #15244
  • Add Murmur2 as an alias for Murmur Partition function #15298
  • Add sample and check to FunnelBaseAggregationFunction #15251
  • Add minimizeDataMovement to Rebalance API #15110
  • Add tests for DUAL as well as non-compliant DUAL with unknown table #15260
  • Add polymorphic scalar function implementation for ARRAY_LENGTH #15243
  • SAR-635 - Add YouTube contributions to readme #15235
  • Add predownload functionality to Pinot #14686
  • Add broker gauge metric for estimated MSE query server threads #15189
  • Adding a plan listener to report each rule evaluation time in the optimizer #15192
  • Add fetchPartitionCount and fetchStreamPartitionOffset implementation api for pinot-kinesis #14713
  • Add direct state transition from CONSUMING/ONLINE to DROPPED, increase state transition priority for DROPPED and OFFLINE #15190
  • Add server channel shutdown assertions to QueryRoutingTest #15165
  • Add type coercion (implicit cast) rules for TIMESTAMP / LONG in the multi-stage engine #15679
  • Add TableSegmentsInfo to Thrift and Proto broker/server messages #15573
  • Adding CityHash support to scalar functions #15655
  • Added timeouts to the resource deletion calls from SegmentDeletionManager #15638
  • Add button to Pause/Resume consumption in Table UI #15657
  • Allow raw JSON as input for Add table button in Pinot UI #15647
  • Add netty memory related metrics for grpc query/broker servers #15625
  • Add netty memory related metrics for GrpcMailboxServer #15651
  • Add a button to view consuming segments info directly in table UI #15623
  • Add a New Pre-check Item for Replica Groups Info #15575
  • Add button to repair realtime table by triggering RealtimeSegmentValidationManager #15584
  • Add hooks to service starters to apply custom cluster and instance configs #15595
  • Add progress stats rollback support for rebalance progress stats on IdealState update failure #15510
  • Add a parameter for thrift binary path #15546
  • Add null handling support for MV aggregation functions - COUNT, MIN, MAX, SUM, AVG, MINMAXRANGE #15524
  • Add tests to clarify the limits of base64 encoded string detector #15497
  • Adding codecov token #15493
  • Add auth info while querying auth enabled broker endpoints from the controller #15486
  • Add DISTINCT_COUNT_OFF_HEAP aggregate function #15469
  • Add per query stats to QuerySummary in pinot-tools QueryRunner #15376
  • Added support to apply record enrichers before complex transformations #15359
  • Add with childOption #15628
  • [Minor] Make all constants in public access #15605
  • add_init #15471
  • Add the ability to override plugins using an integer priority #15766
  • Add a test to create star-tree on new added columns #15745
  • Add memory allocation stats per query on server #15828
  • Disable japicmp by default. Add a CI execution that just runs that #15888
  • Add APIs to IndexSegment as a preparation to support virtual DataSource #15869
  • Add ISO 8601 date conversion functions and tests #15793
  • Add CaseSensitiveAnalyzer and support for case-sensitive text indexing #15803
  • Add Maven Enforcer Rule to automatically enforce Dependency Management Guidelines during PR check-in (Part 2) #15795
  • Add Maven Enforcer Rule to automatically enforce Dependency Management Guidelines during PR check-in #15739
  • Add skipSegmentPreprocess flag to TableConfigBuilder #15758
  • Add metrics for monitoring server's message queue size #15722
  • Add Table config decorator support #15714
  • Add segment end criteria check for SVForwardIndex and Dictionary #15120
  • Adds Integration tests for Dedup and Pauseless #15398
  • Add updateTargetTier check to the rebalance configs pre-check for table rebalancer #15689
  • Make the ResourceUtilizationChecker more easily extensible and add a PK count endpoint to the server side #16146
  • Updated README.md - added deepwiki, so it enables auto refresh of the wiki #16151
  • Add guava as dep in pinot-s3 #16052
  • Add test scope for assertj-core #16019
  • Add new parameter in Table Rebalance API: Disk Utilization Pre-check Threshold Override #16144
  • Add server startup check meter #16128
  • [index] Add json creator initializer for VectorIndexConfig #16191
  • Add log while skipping creation of new consuming segment #16134
  • Add multistage thread limiting configs at the broker and server level #16080
  • Add Filter for segment state in the Table UI #16085
  • Add reset segment operation to UI #16078
  • Adds API to return badSegments per partitionId #16067
  • Adds remove ingestion-metrics API #16045
  • Add exception for resource limit exceeded in MultiStageOperator #16077
  • Add config for logging queries before processing on broker instances #16056
  • Add config to use LegacyMd5Plugin in S3 client, which restores the pre-2.30.0 MD5 checksum behavior #16065
  • Add new ImplicitRealtimeTablePartitionSelector strategy for instance assignment #15930
  • Add sanity checks to ensure write permissions for data dir #15876
  • Add bad data handling for some IndexCreator::add() functions to skip record or add dummy record #16094
  • Add the continueOnError flag to the IndexCreationContext #16331
  • Add withTableProperties to Get Tenant Table API #16202
  • Adds metric to emit consumer semaphore acquire time #16278
  • Add JsonProperty to correctly serialize and deserialize 'rlsFiltersApplied' in broker response #16250
  • docs: add macOS gRPC Java plugin build workaround and instructions #16329

Bug Fixes

  • Spool intermediate stage fix #15024
  • [Bugfix] Adds Check to Ignore Committing Segments as Completed #15065
  • fix MSE incorrect stats visualization #15188
  • fix numGroups metric and add metric for warnings #15280
  • rebalance api url builder fix #15389
  • Minor Refactoring and fixes #15419
  • fixing NPE when ArrowResponseEncoder reading the vector data as null #15457
  • Ensure that minAvailableReplicas has an upper bound of existing numReplicas to fix infinite loop in rebalance for StrictReplicaGroup assignment #15468
  • [bugfix][ui] Fixes for table rebalance UI #15511
  • Fixes realtimeBytesConsumed metric and jmx export rule #16158
  • Fix RefreshSegmentMinionClusterIntegrationTest.checkColumnAddition() #16125
  • [ui] Fix: Prevent Runtime Errors by Validating Add Schema JSON Fields #15680
  • Kinesis partition split fixes #15563
  • Bug fix: table names misaligned in pinot UI #15670
  • [bugfix] Remove deleting segments from table status info #15725
  • Enhance OfflineClusterIntegrationTest to check index size and fix index removal for default column #15740
  • update the Baseline jars for japicmp plugin to fix the build failure. #15879
  • Reduce ZK access for tier metadata update, and fix parallel push #15933
  • bug fix: close admin client on consumer close #16227
  • Change wait condition to fix flaky test #16141
  • Misc fixes for virtual column support #16121
  • try fixing CI error by running unit tests single-threadedly #16098
  • fix name of multipleselect component #16009
  • fix: Add macOS ARM64 profiles for protobuf-maven-plugin with Homebrew support #16335
  • switch log.debug to log.info on query request #15264
  • Return JSON instead of str in the broker debug api /debug/serverRoutingStats #15018
  • Fixed couple of issues in CLP V2 implementation and added integration tests #16298
Source: README.md, updated 2025-09-25