Dapr 1.16.1
This update includes bug fixes:
- Actor Initialization Timing Fix
- Sidecar Injector Crash with Disabled Scheduler
- Workflow actor reminders stopped after Application Health check transition
- Fix Scheduler Etcd client port networking in standalone mode
- Component initialization timeout check before using reporter
- Fix Regression in pubsub.kafka Avro Message Publication
- Ensure Files are Closed Before Reading in SFTP Component
- Fix AWS Secrets Manager YAML Metadata Parsing
- Reuse Kafka Clients in AWS v2 Migration
- Fix Kafka AWS Authentication Configuration Bug
- Enhanced debug logs for placement server
- Workflow actors never registered again after a failed actor registration in the GetWorkItems connection callback
Actor Initialization Timing Fix
Problem
When running Dapr with an `--app-port` specified but no application listening on that port (either because there is no server or because the server starts late), the actor runtime would initialize immediately, before the app channel was ready. This created a race condition in which actors tried to communicate with an application that was not yet available, resulting in repeated error logs:
WARN[0064] Error processing operation DaprBuiltInActorNotFoundRetries. Retrying in 1s…
DEBU[0064] Error for operation DaprBuiltInActorNotFoundRetries was: failed to lookup actor: api error: code = FailedPrecondition desc = did not find address for actor
Impact
This created a poor user experience with confusing error messages when users specified an `--app-port` but had no application listening on that port.
Root cause
The actor runtime initialization was occurring before the application channel was ready, creating a race condition where actors attempted to communicate with an unavailable application.
Solution
Defer actor runtime initialization until the application channel is ready. The runtime now:
- Defers actor runtime initialization until the application is listening on the specified port
- Provides informative `waiting for application to listen on port XXXX` messages instead of confusing error logs
- Prevents actor lookup errors during startup
Sidecar Injector Crash with Disabled Scheduler
Problem
The sidecar injector crashes with the error `dapr-scheduler-server StatefulSet not found` when the scheduler is disabled via the Helm chart (`global.scheduler.enabled: false`).
Impact
The crash prevents the sidecar injector from functioning correctly when the scheduler is disabled, disrupting deployments.
Root cause
A previous change caused the `dapr-scheduler-server` StatefulSet to be removed when the scheduler was disabled, instead of scaling it to 0 as originally intended. The injector, which is hardcoded in `injector.go` to check for the StatefulSet, fails when it is not found.
Solution
Revert the Helm chart to scaling the `dapr-scheduler-server` StatefulSet to 0 when the scheduler is disabled, instead of removing it.
Workflow actor reminders stopped after Application Health check transition
Problem
An Application Health check transition from unhealthy to healthy incorrectly reconfigured the scheduler clients to stop watching for actor reminder jobs.
Impact
The misconfigured scheduler clients caused workflows to stop executing, because their reminders no longer fired.
Root cause
On an Application Health change, daprd could trigger an actor-types update with an empty slice, which caused a scheduler client reconfiguration. When daprd sends an actor-types update to the placement server, it wipes the known actor types from the scheduler client and restores them only after placement acknowledges with a new placement table version. Because the empty update contained no actual changes to the actor types, placement never sent a new table version, so the scheduler client was never repopulated with the actor types and remained misconfigured.
Solution
Prevent any changes to the hosted actor types when the input slice is empty.
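A minimal Go sketch of the guard; the type and field names are hypothetical, not the actual daprd code:

```go
package main

import "fmt"

// schedulerClient is a stand-in for the scheduler client's view of which
// actor types the sidecar hosts.
type schedulerClient struct {
	actorTypes []string
}

// updateHostedActorTypes applies the guard introduced by this fix: an empty
// actor-type slice is ignored rather than wiping the currently hosted types.
func (c *schedulerClient) updateHostedActorTypes(types []string) {
	if len(types) == 0 {
		// An app-health transition can produce an empty update; applying it
		// would clear the actor types without a matching placement-table
		// refresh, so it is dropped.
		return
	}
	c.actorTypes = types
}

func main() {
	c := &schedulerClient{actorTypes: []string{"workflowActor"}}
	c.updateHostedActorTypes(nil) // ignored: empty update
	fmt.Println(c.actorTypes)     // still hosts "workflowActor"
}
```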
Fix Scheduler Etcd client port networking in standalone mode
Problem
The Scheduler Etcd client port is not available when running in Dapr CLI standalone mode.
Impact
Cannot perform Scheduler Etcd admin operations in Dapr CLI standalone mode.
Root cause
The Scheduler Etcd client port was bound only to localhost.
Solution
The Scheduler Etcd client listen address is now configurable via the `--scheduler-etcd-client-listen-address` CLI flag, so the port can be exposed when running in standalone mode.
Fix Helm chart not honoring --etcd-embed argument
Problem
The Scheduler would always treat `--etcd-embed` as true, even when it was set to false via the Helm chart.
Impact
Cannot use external etcd addresses since Scheduler would always assume embedded etcd is used.
Root cause
The Helm template rendered the boolean as a separate argument rather than inline, so the flag's value was never parsed.
Solution
The template format string was fixed to render the flag inline, allowing `.etcdEmbed` to be set to `false`.
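This failure mode is easy to reproduce with Go's standard `flag` package, which does not consume a separate value argument for boolean flags; only the inline `--flag=value` form carries the value. The flag set below is a sketch assuming the Scheduler uses standard boolean-flag semantics, not its actual parsing code:

```go
package main

import (
	"flag"
	"fmt"
)

// parseEtcdEmbed parses an illustrative flag set containing a boolean
// --etcd-embed flag (default true here, mirroring the always-embedded
// behavior described in the release notes).
func parseEtcdEmbed(args []string) bool {
	fs := flag.NewFlagSet("scheduler", flag.ContinueOnError)
	embed := fs.Bool("etcd-embed", true, "run embedded etcd")
	fs.Parse(args)
	return *embed
}

func main() {
	// Separate value: "false" is treated as a positional argument, and the
	// bare --etcd-embed sets the flag to true.
	fmt.Println(parseEtcdEmbed([]string{"--etcd-embed", "false"}))
	// Inline value: the boolean is actually parsed.
	fmt.Println(parseEtcdEmbed([]string{"--etcd-embed=false"}))
}
```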
Component initialization timeout check before using reporter
Problem
The component init timeout was checked only after the component reporter had already been used.
Impact
This misalignment could lead to false positives: Dapr could report a successful initialization through the reporter and then return an error from the timeout check.
Solution
Move the timeout check to immediately after the actual component initialization and before the component reporter is used.
Fix Regression in pubsub.kafka Avro Message Publication
Problem
The pubsub.kafka component failed to publish Avro messages in Dapr 1.16, breaking existing workflows.
Impact
Avro messages could not be published correctly, causing failures in Kafka message pipelines and potential data loss or dead-lettering issues.
Root cause
The Kafka pubsub component did not correctly create codecs in the SchemaRegistryClient. Additionally, the goavro library had a bug converting default null values that broke legitimate schemas.
Solution
Enabled codec creation in the Kafka SchemaRegistryClient and upgraded `github.com/linkedin/goavro/v2` from v2.13.1 to v2.14.0 to fix null value handling. The metadata options `useAvroJson` and `excludeHeaderMetaRegex` were validated to ensure correct message encoding and dead-letter handling. Manual tests confirmed that Avro and JSON message publication works as expected.
Ensure Files are Closed Before Reading in SFTP Component
Problem
Some SFTP servers require files to be closed before they become available for reading. Without closing, read operations could fail or return incomplete data.
Impact
SFTP file reads could fail or return incomplete data on certain servers, causing downstream processing issues.
Root cause
The SFTP component did not explicitly close files after writing, which some servers require to make files readable.
Solution
Updated the SFTP component to close files after writing, ensuring they are available for reading on all supported servers.
Fix AWS Secrets Manager YAML Metadata Parsing
Problem
The AWS Secrets Manager component failed to correctly parse YAML metadata, causing boolean fields like `multipleKeyValuesPerSecret` to be misinterpreted.
Impact
Incorrect metadata parsing could lead to misconfiguration, preventing secrets from being retrieved or handled properly.
Root cause
The component used a JSON marshal/unmarshal round trip in `getSecretManagerMetadata`, which did not handle string-to-boolean conversion correctly for YAML metadata.
Solution
Replaced the JSON marshal/unmarshal with `kitmd.DecodeMetadata`, which correctly parses YAML metadata and converts string fields to their proper types, ensuring `multipleKeyValuesPerSecret` works as expected.
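The failure mode is easy to demonstrate: component metadata arrives as `map[string]string`, and a JSON round trip cannot coerce the string `"true"` into a bool field. The sketch below uses `strconv` in place of `kitmd.DecodeMetadata` to show the weakly-typed conversion the fix relies on; the struct is hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// secretsMeta is an illustrative metadata struct with a boolean field that
// arrives as the string "true" or "false" in component YAML.
type secretsMeta struct {
	MultipleKeyValuesPerSecret bool `json:"multipleKeyValuesPerSecret"`
}

// decodeViaJSON is the old, broken approach: a JSON string value cannot be
// unmarshaled into a bool field, so decoding fails.
func decodeViaJSON(raw map[string]string) (secretsMeta, error) {
	var m secretsMeta
	b, _ := json.Marshal(raw)
	err := json.Unmarshal(b, &m)
	return m, err
}

// decodeWeakly performs the string-to-bool conversion explicitly, standing in
// for the weakly-typed decoding done by kitmd.DecodeMetadata.
func decodeWeakly(raw map[string]string) (secretsMeta, error) {
	var m secretsMeta
	if v, ok := raw["multipleKeyValuesPerSecret"]; ok {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return m, err
		}
		m.MultipleKeyValuesPerSecret = b
	}
	return m, nil
}

func main() {
	raw := map[string]string{"multipleKeyValuesPerSecret": "true"}
	_, err := decodeViaJSON(raw)
	fmt.Println("JSON round trip failed:", err != nil)
	m, _ := decodeWeakly(raw)
	fmt.Println("weak decode value:", m.MultipleKeyValuesPerSecret)
}
```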
Reuse Kafka Clients in AWS v2 Migration
Problem
After migrating to the AWS v2 Kafka client, a new client was created for every message published, causing inefficiency and unnecessary resource usage.
Impact
Frequent client creation led to performance degradation, increased connection overhead, and potential resource exhaustion during high-throughput message publishing.
Root cause
The AWS v2 client integration did not implement client reuse, resulting in a new client being instantiated for each publish operation.
Solution
Updated the Kafka component to reuse clients instead of creating a new one for each message, improving performance and resource efficiency.
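The reuse pattern can be sketched with `sync.Once`; the types here are stand-ins, not the component's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// fakeClient stands in for an expensive-to-construct Kafka client.
type fakeClient struct{}

// publisher caches its client behind sync.Once so every publish reuses one
// client instead of constructing a new one per message.
type publisher struct {
	once    sync.Once
	client  *fakeClient
	created int // counts constructions, to show reuse
}

func (p *publisher) getClient() *fakeClient {
	p.once.Do(func() {
		p.created++ // expensive client construction happens exactly once
		p.client = &fakeClient{}
	})
	return p.client
}

func (p *publisher) publish(msg string) {
	_ = p.getClient()
	// ... produce msg with the shared client ...
}

func main() {
	p := &publisher{}
	for i := 0; i < 1000; i++ {
		p.publish("event")
	}
	fmt.Println("clients created:", p.created)
}
```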
Fix Kafka AWS Authentication Configuration Bug
Problem
The Kafka AWS authentication configuration was not initialized correctly, causing authentication failures.
Impact
Kafka components using AWS authentication could fail to connect, preventing message publishing and consumption.
Root cause
A bug in the Kafka AWS auth config initialization prevented proper setup of authentication parameters.
Solution
Fixed the initialization logic in the Kafka AWS auth configuration to ensure proper authentication and connectivity.
Enhanced debug logs for placement server
Problem
Users experiencing issues with the Placement server do not get enough information from the debug logs to troubleshoot or to understand what state the Placement server is in.
Impact
Inability to troubleshoot the Placement server.
Solution
Add more debug logs that give detailed information about the Placement server's dissemination logic.
Workflow actors never registered again after a failed actor registration in the GetWorkItems connection callback
Problem
Workflow workers connect to daprd, but the workflow actors are never registered; as a result, workflows do not execute and new workflows cannot be scheduled.
Impact
The Workflows API becomes unavailable.
Root cause
When the durabletask-go library executed the "on GetWorkItems connection" callback and that callback failed to actually register the actors and returned an error, the "on GetWorkItems disconnect" callback was not invoked. As a result, the sidecar never tried to register the actors again, because the workflow engine kept a connection counter that was incremented by 1 but never decremented.
Solution
Refactor durabletask-go to guarantee that the "on disconnect" callback is always invoked whenever the "on connection" callback has been invoked.