Major Features
1. Active-Active Domains
Cadence domains have run in active-passive mode for years, which limited use cases that require processing in all clusters (regions). Since late 2025, Cadence can process domains in both regions and distribute traffic based on users' preferences within domains. This change makes your domains more flexible and more efficient by utilizing resources in all regions.
Key Capabilities
- Introduced ClusterAttribute as a flexible abstraction for defining cluster groupings beyond traditional region-based configurations
- Active Cluster Selection Policy allows workflows to specify which cluster attributes they should run on
Migration Path
Active-active support is designed for backward compatibility:
- All domains support active-active by default, without breaking existing behavior; workflows started with "cluster attributes" set will benefit from active-active processing
- Existing active-passive domains continue to work without changes as long as "cluster attributes" are left empty
Current Limitations
This feature is currently implemented for Cassandra; support for other databases will come in Q1. We will also release a blog post explaining how this improves related use cases, a wiki page explaining how to use it, and a code lab to help you try it out. Note that there is a failover risk: workflows can get stuck if schedule-to-start latency plus replication lag exceeds 25 minutes. We are working on a project to resolve this risk.
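To make the selection model concrete, here is a minimal, hypothetical sketch of attribute-based cluster selection. The `Domain` struct and `activeClusterFor` function are invented for illustration and do not match Cadence's actual types; only the fallback behavior (empty attributes preserve active-passive semantics) comes from the notes above.

```go
package main

import "fmt"

// Domain is a hypothetical illustration type, not Cadence's real domain model.
type Domain struct {
	ActiveCluster      string            // active-passive default
	AttributeToCluster map[string]string // e.g. "region:us-west" -> "cluster-a"
}

// activeClusterFor returns the cluster that should process a workflow.
// An empty cluster attribute preserves the old active-passive behavior.
func activeClusterFor(d Domain, clusterAttribute string) string {
	if clusterAttribute == "" {
		return d.ActiveCluster
	}
	if c, ok := d.AttributeToCluster[clusterAttribute]; ok {
		return c
	}
	return d.ActiveCluster // unknown attribute: fall back to the domain default
}

func main() {
	d := Domain{
		ActiveCluster: "cluster-a",
		AttributeToCluster: map[string]string{
			"region:us-west": "cluster-a",
			"region:us-east": "cluster-b",
		},
	}
	fmt.Println(activeClusterFor(d, ""))               // cluster-a (active-passive)
	fmt.Println(activeClusterFor(d, "region:us-east")) // cluster-b (active-active)
}
```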
2. Replication Improvements
Cadence orchestrates its own replication, which allows us to seamlessly migrate from one DB technology to another, one cloud provider to another, and so on. Previously, replication messages were generated by reading workflow tasks from the database.
Since replication is a continuous process between Cadence regions, we implemented a cache that keeps replication messages in memory until a replication poll message arrives, eliminating the DB calls caused by replication. The cache achieves a 99%+ hit rate, almost entirely removing replication-driven DB calls, which used to account for more than 20% of all DB calls. Replication latency also benefited: because messages are served directly from memory, replication latencies dropped from 13s to 2s.
Key Capabilities
- Replication Budget Manager: New cache capacity control mechanism to prevent memory exhaustion during replication bursts
- Improved task fetcher concurrency: Better handling of concurrent replication task fetching with enhanced metrics
- BoundedAckCache optimizations: Generic cache implementation with improved ack handling
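The caching scheme above can be sketched roughly as follows. This is an illustrative toy, not the actual BoundedAckCache or budget manager implementation: messages stay in memory until the remote cluster acknowledges them, a capacity check stands in for the budget manager, and a put failure signals the caller to fall back to the DB read path.

```go
package main

import "fmt"

// replicationCache is an illustrative in-memory buffer of replication messages.
type replicationCache struct {
	capacity int
	messages map[int64]string // taskID -> message payload
}

func newReplicationCache(capacity int) *replicationCache {
	return &replicationCache{capacity: capacity, messages: map[int64]string{}}
}

// put stores a message unless the cache is full (a real budget manager
// would apply finer-grained back-pressure here).
func (c *replicationCache) put(taskID int64, msg string) bool {
	if len(c.messages) >= c.capacity {
		return false // caller falls back to reading from the DB
	}
	c.messages[taskID] = msg
	return true
}

// poll returns cached messages above lastAckedID and evicts acknowledged ones,
// so a poll is served from memory with no DB call on a cache hit.
func (c *replicationCache) poll(lastAckedID int64) []string {
	var out []string
	for id, msg := range c.messages {
		if id <= lastAckedID {
			delete(c.messages, id) // the remote side already has these
		} else {
			out = append(out, msg)
		}
	}
	return out
}

func main() {
	c := newReplicationCache(1024)
	c.put(101, "event-batch-101")
	c.put(102, "event-batch-102")
	// Remote cluster acked up to 101: serve 102 from memory, evict 101.
	fmt.Println(c.poll(101)) // [event-batch-102]
}
```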
3. Shard Distributor Service (in progress)
The Shard Distributor is a new service component that provides dynamic shard assignment and load balancing across matching service instances. This enables better resource utilization, improved scalability, and operational flexibility for large-scale deployments.
Service Components
- Leader Processor: Centralized controller for shard assignment decisions
- Executor Client: Integration point for matching service instances to receive shard assignments
- Spectator Client: Read-only monitoring interface for shard state
- Canary: Health verification and ping protocol for shard ownership validation
Key Capabilities
- Dynamic Shard Assignment
- Dynamic spectator client control with enable/disable support
- Load Balancing
- Drain watching support for graceful shard handoff
- Automatic retry on rebalancing loop failures
- Migration Support: Migration mode for gradual rollout alongside existing static shard assignment
- Integration with Matching Service: The matching service has been refactored to support shard distributor integration
Configuration
The shard distributor can be enabled via dynamic config:
```yaml
shard-distributor:
  enabled: true
  loadBalancingMode: "shadow-mode" # Options: naive, shard-stats, shadow-mode
  migrationMode: true              # Enable gradual migration
```
Monitoring
- shard_handover_latency: Time taken for shard ownership transfer
- active_shards_count: Number of active shards per executor
- shard_assignment_conflicts: Concurrent assignment conflicts detected
- executor_heartbeat_status: Executor health and liveness
- ETCD watch event metrics for observability
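As a rough illustration of what dynamic shard assignment involves, here is a rendezvous-hashing sketch: each (shard, executor) pair gets a score, and the shard goes to the highest-scoring live executor, so removing one executor only moves that executor's shards. The `score` and `assign` functions are invented for illustration; the actual leader processor's algorithm is not shown here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// score deterministically hashes a (shard, executor) pair.
func score(shard int, executor string) uint64 {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d|%s", shard, executor)
	return h.Sum64()
}

// assign picks the executor with the highest score for this shard.
// Because scores for surviving executors never change, removing an
// executor only reassigns the shards it owned.
func assign(shard int, executors []string) string {
	best, bestScore := "", uint64(0)
	for _, e := range executors {
		if s := score(shard, e); best == "" || s > bestScore {
			best, bestScore = e, s
		}
	}
	return best
}

func main() {
	executors := []string{"exec-1", "exec-2", "exec-3"}
	for shard := 0; shard < 4; shard++ {
		fmt.Printf("shard %d -> %s\n", shard, assign(shard, executors))
	}
}
```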
4. Caller Type-Based Rate Limiting
A new caller type tracking and bypass mechanism allows granular rate limiting control for debugging and mitigation purposes.
Key Capabilities
- Caller Type Header Propagation (#7644, #7653, #7638): Introduced cadence-caller-type header that propagates through service boundaries
- Extracted at service inbound boundaries using middleware
- Available in CLI via header support
- Minimal performance impact (~150-300ns per request)
- Persistence Rate Limiter Bypass (#7656): Dynamic config persistence.rateLimiterBypassCallerTypes allows specific caller types to bypass persistence rate limits
- Frontend Regional Rate Limiter Bypass (#7662): Caller type-based bypass for regional frontend rate limiter to allow priority requests during high load
5. Visibility Enhancements - Cron and Execution Status (#7527)
Added comprehensive cron workflow visibility with new fields in visibility records:
New Fields:
- CronSchedule: Display the cron schedule for workflows
- ExecutionStatus: Show actual execution status (PENDING, STARTED, or close statuses) instead of just CONTINUED_AS_NEW for cron workflows
- ScheduledExecutionTime: Track the actual scheduled execution time for cron workflows
Schema Changes Required: This feature requires database schema upgrades for all persistence stores (Cassandra, MySQL, PostgreSQL, SQLite, Elasticsearch)
CLI Updates:
- New --print_cron flag to display cron-related fields in cadence workflow list
- Shows execution status by default
Performance & Scalability Improvements
Database & Persistence
- PostgreSQL timer task pagination (#7621): Improved pagination logic to handle large timer task queries efficiently
- History node deletion (#7484): Configurable page size for history deletion via dynamic config
- Snappy compression for history blobs (#7269): Reduced storage footprint and network transfer for history events
- SQLite fixes (#7469): Resolved database locking issues for local/test deployments
Memory & Resource Usage
- ETCD watch optimization (#7578): Removed WithPrevKV() to reduce memory overhead in shard manager
- Reduced allocations in metrics (#7456): Optimized insertReportIntoSizes to minimize GC pressure
- History deletion improvements (#7472): Fixed infinite loop in RangeCompleteHistoryTask when an invalid page size is provided
Notable Bug Fixes
History & Workflow Execution
- Child workflow duplicate events (#7400): Proper handling of duplicate child workflow started events
- Activity scheduled time on reset (#7597): Correctly update not-started activities' scheduled time when resetting workflows
- Restart workflow cron scheduling (#7247): Fixed bug where each restart skipped an additional cron scheduled run
- History cleanup timeout handling (#7617): Avoid dangerous timeout conditions in history cleanup process
- Workflow creation leak (#7523): Fixed resource leak during workflow creation in history service
- Signal-with-start cleanup (#7540): Proper handling of signal-with-start in cleanup logic
- Signal handling with DelayStart (#7702): Prevent signals from bypassing DelayStart configuration
Cross-Datacenter Replication
- Domain ID usage in replication (#7550): Use domain ID instead of domain name for more reliable replication
- Replication panic logging (#7396): Improved error handling and logging for replication stack panics
- Database consistency error detection (#7573): More accurate detection of DB consistency errors
Active-Active Operations
- Race condition in failover (#7587): Fixed race condition during active-active failover
- Query workflow support (#7339): Proper query handling for active-active domains
- StartWorkflow with terminate-if-running (#7361): Correct policy enforcement for active-active workflows
- Auto-forwarding (#7356): Fixed cluster forwarding logic for active-active domains
- Standby task handling (#7423): Prevent premature dropping of standby tasks in active-active scenarios
Matching Service
- TaskList stop on shard stop (#7581): Properly stop task lists when stopping shard processor
- TaskListActivitiesPerSecond enforcement (#7575): Correct rate limiting enforcement
- Nil load hints handling (#7551): Added nil pointer checks for load hints
- TaskList partition config invalidation (#7618): Properly invalidate TaskListPartitionConfig on attempted writes to read-only partitions
- Domain not active error handling (#7676): Made domain-not-active errors non-retryable for the matching service in active-active scenarios
- TaskList management with shard distributor (#7682): Properly handle shard processor lifecycle when onboarded to the shard distributor
- Task list registry pattern (#7720): Introduced registry for better task list management
Persistence & Database
- Host tag reversion (#7675): Reverted addition of host tag to persistence calls due to issues
- History cleanup defaults (#7661): Changed defaults for history cleanup configuration
- History cleanup error classification (#7627): Tightened error classifications for history cleanup operations
- Visibility upsert optimization (#7693): Only upsert search attributes when advanced visibility is enabled
CLI
- Rate limiter fix (#7585): Replaced token bucket with standard limiter in CLI for more reliable rate limiting
- Admin CLI config parsing (#7726): Fixed parsing of multiple config values
Observability Enhancements
Metrics
- Host tagging for persistence metrics (#7530): Better attribution of persistence operations to specific hosts
- Shard handover latency (#7442, #7614): Track time taken for shard ownership transfers
- Replication task fetcher metrics (#7462): Enhanced visibility into task fetcher performance
- Workflow access tracking (#7331): New metrics to track workflow access patterns
Logging
- CallerType and CallerInfo propagation (#7564, #7574, #7588): Context propagation for request tracing
- Improved replication logging (#7584): Better logs for matching and active-active operations
- Cluster redirection logs (#7333): Enhanced logging for cluster redirection handler
- Error classification (#7466): Better categorization of errors in shard distributor
Monitoring Tools
- Canary Grafana dashboard (#7464): Documentation for workflow success counter panels
- Dynamic config observability (#7605): Standardized comments for all dynamic configs
CLI Enhancements
- Failover domain command (#7295): New cadence domain failover command for active-active domains
- Cluster attributes in start workflow (#7494): Support --cluster-attributes flag when starting workflows
- Describe workflow improvements (#7461): Show ActiveClusterSelectionPolicy in describe output
- Cluster attributes in domain describe (#7539): Display cluster attributes when describing domains
- Failover history rendering (#7444, #7407): Improved rendering of domain failover history
- Workflow refresh tasks command (#7657): New cadence workflow refresh-tasks command allows non-admin users to refresh workflow tasks
Code Quality & Maintenance
Refactoring
- Domain handler refactoring (#7403, #7401, #7395): Multi-phase cleanup and modernization
- History engine test refactoring (#7343, #7342): Improved test structure and maintainability
- Matching engine refactoring (#7593, #7592, #7591, #7547): Replaced callbacks with explicit Registry pattern
- Mapper nil handling (#7434): Comprehensive nil checking improvements across all mappers
- Remove panics in type conversions (#7258): Safer error handling in proto/thrift mappers
- QueueManager/Queue interfaces (#7652): Refactored queue manager and queue interfaces for better maintainability
- Matching load calculation (#7647): Simplified load calculation for shards
- Domain audit filter cleanup (#7725): Removed MaxCreatedTime from DomainAuditLogFilter
Testing
- Simulation test improvements (#7303, #7283): Better active-active simulation testing
- Integration tests (#7344): New DescribeCluster integration test
- Conventional commit enforcement (#7278): CI validation of commit message structure
Deprecation & Cleanup
- Removed deprecated dynamic configs (#7613): Cleanup of unused configuration properties
- Removed deprecated fields (#7388, #7365, #7357, #7275): Cleanup of legacy active-active fields
- Commented code removal (#7592): Removed obsolete code in matching engine
- ExternalEntityProvider removal (#7292): Removed deprecated active-active provider
Infrastructure
-
CI improvements (#7649): Added pull request reviewer automation via gitar
- PR template updates (#7596): Added comprehensive reviewers checklist to pull request description
- Issue automation (#7615): GitHub action to standardize issue descriptions and labeling
-
Code ownership (#7659): Added natemort and c-warren to CODEOWNERS
-
Code Generation
-
Missing codegen (#7663): Fixed missing code generation that had snuck into master
Security & Authorization
- ResetStickyTaskList auth (#7340): Added to non-domain auth API list
- Auth README documentation (#7368): Comprehensive authentication documentation
Infrastructure
- Docker compose for OpenSearch (#7510): Local development support for OpenSearch
- Go toolchain upgrade (#7414): Updated Go toolchain and mockery
- ETCD integration test fixes (#7502): Stabilized ETCD-based tests
- Multi-cluster test scripts (#7327): Moved scripts to proper location
Migration Notes
Schema Upgrades Required
- Visibility Schema v0.10 (Cassandra): Adds ExecutionStatus, CronSchedule, and ScheduledExecutionTime fields
- Visibility Schema v0.8 (MySQL): Adds execution-related fields
- Visibility Schema v0.9 (PostgreSQL): Adds execution-related fields
- Visibility Schema v0.2 (SQLite): Adds execution-related fields
- Elasticsearch Templates: Updated for v6, v7, and OpenSearch v2
- PostgreSQL support for DomainAudit (#7665)
- Cassandra schema updates synced with IDL changes (#7727, #7723)
Dynamic Config Changes
- New config: persistence.rateLimiterBypassCallerTypes - List of caller types to bypass persistence rate limiting
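For reference, the new property might look like this in a file-based dynamic config. This is a sketch: the caller-type value below is a placeholder, and the exact entry shape depends on your deployment's dynamic config provider.

```yaml
persistence.rateLimiterBypassCallerTypes:
- value:
    - "debug-cli"   # placeholder caller type, not a real default
  constraints: {}
```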
Documentation
- Auth system documentation (#7368)
- Canary monitoring dashboard guide (#7464)
- Domain update and failover documentation (#7382)
- Multi-cluster setup improvements (#7282)
- Contributor guide updates (#7337)
### New Contributors
- @eleonoradgr made their first contribution in https://github.com/cadence-workflow/cadence/pull/7312
- @ramazan made their first contribution in https://github.com/cadence-workflow/cadence/pull/7349
- @instamitch made their first contribution in https://github.com/cadence-workflow/cadence/pull/7353
- @AndreasHolt made their first contribution in https://github.com/cadence-workflow/cadence/pull/7354
- @zawadzkidiana made their first contribution in https://github.com/cadence-workflow/cadence/pull/7427
- @joannalauu made their first contribution in https://github.com/cadence-workflow/cadence/pull/7557
- @shuprime made their first contribution in https://github.com/cadence-workflow/cadence/pull/7580
- @Scanf-s made their first contribution in https://github.com/cadence-workflow/cadence/pull/7585