Anonymous - 4 days ago

Originally posted by: kumaakh

Analysis: Non-blocking execute_prompt with SSE

TL;DR

The proposal shifts the execute_prompt MCP tool from a synchronous, blocking model to an asynchronous, fire-and-forget pattern. Instead of waiting for the LLM execution to complete, the server would immediately return a unique stream identifier. The caller can then subscribe to the stream to receive real-time updates via Server-Sent Events (SSE). This unlocks parallel prompt dispatching for the orchestrator (PM) and provides a versatile streaming primitive for all long-running fleet tasks.

Current State

Currently, execute_prompt is fully synchronous from the MCP caller's perspective:

  • Registration: Registered as an MCP tool in src/index.ts (L224).
  • Execution: The execution logic in src/tools/execute-prompt.ts blocks on await strategy.execCommand(...) (L190).
  • Concurrency: The server enforces a strict per-member lock using inFlightAgents.add(agent.id) (L120 in execute-prompt.ts). Concurrent dispatches to the same member are instantly rejected.
  • Monitoring: The stallDetector (src/services/stall/index.ts) polls the LLM's conversation log (e.g. .jsonl files) to ensure the process hasn't hung.
  • Response: The tool returns a single, concatenated string payload containing the complete LLM response and token usage (L226-L232 in execute-prompt.ts). No partial results are streamed to the caller.
  • Async Analogue: The execute_command tool (src/tools/execute-command.ts) already implements a long_running fire-and-forget flag that returns a task_id, which the caller polls via monitor_task (src/index.ts L248).
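The two dispatch models described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in src/tools/execute-prompt.ts or src/tools/execute-command.ts; the helper names (executePromptSync, executeCommandAsync, monitorTask) and the in-memory task registry are assumptions for the sketch.

```typescript
// Simplified sketch of the current blocking path vs. the fire-and-forget
// analogue. Only the Set name inFlightAgents mirrors the real code.

const inFlightAgents = new Set<string>();          // per-member lock
const tasks = new Map<string, Promise<string>>();  // long_running task registry

// Synchronous path: holds the lock and the caller's MCP request open
// until the LLM run resolves. Concurrent dispatches are rejected.
async function executePromptSync(agentId: string, run: () => Promise<string>): Promise<string> {
  if (inFlightAgents.has(agentId)) throw new Error(`agent ${agentId} is busy`);
  inFlightAgents.add(agentId);
  try {
    return await run();
  } finally {
    inFlightAgents.delete(agentId);
  }
}

// Fire-and-forget path (execute_command's long_running flag): returns a
// task_id immediately; the caller later polls monitor_task-style.
function executeCommandAsync(run: () => Promise<string>): string {
  const taskId = `task-${tasks.size + 1}`;
  tasks.set(taskId, run());
  return taskId;
}

async function monitorTask(taskId: string): Promise<string> {
  const task = tasks.get(taskId);
  if (!task) throw new Error(`unknown task ${taskId}`);
  return task;
}
```

The asymmetry is the whole motivation for this proposal: execute_command already has a non-blocking mode, while execute_prompt does not.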

Why the current model strains at scale

  1. Orchestrator Connection Limits: Holding open an MCP request over stdio for 5-15 minutes (or longer) ties up resources and increases the risk of timeout-induced drops on the PM side, even when the remote member is healthy and still computing.
  2. Parallel Dispatch Bottleneck: The PM cannot efficiently dispatch parallel prompts to different members if the orchestrator thread is blocked waiting on the first prompt's resolution.
  3. Lack of Live Feedback: The orchestrator is blind to the prompt's progress until the final payload arrives, degrading UX and making it harder to catch early failures.

SSE in MCP — what's actually possible

The @modelcontextprotocol/sdk (v1.27.0 in package.json) supports two primary transports: StdioServerTransport and SSEServerTransport.

  • Current Transport: apra-fleet uses stdio (JSON-RPC over standard input/output) for its MCP server because it operates primarily via CLI.
  • SSE Constraints: SSE requires an HTTP layer. To natively support SSE, the server would need to run SSEServerTransport over an HTTP server (e.g., Express).
  • Alternatives: If switching to an HTTP transport is undesirable, we can approximate SSE semantics over stdio using MCP notifications/message, or by implementing a long-polling tool (similar to how monitor_task currently polls execute_command).
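To make the stdio alternative concrete, here is a minimal sketch of what a progress notification frame could look like: a JSON-RPC 2.0 notification (no "id" field, so no response is expected) keyed by stream_id. The method name prompt_progress and the payload shape are assumptions, not anything the SDK or apra-fleet defines today.

```typescript
// Sketch of an Option-B style progress frame. Over stdio the server would
// emit one such line on stdout per update; the PM-side client matches frames
// to its dispatches by stream_id. All field names here are hypothetical.
interface PromptProgressParams {
  stream_id: string;
  status: "running" | "completed" | "failed";
  partial?: string;      // newly produced output since the last frame
  tokens_used?: number;
}

function buildPromptProgressNotification(params: PromptProgressParams): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    method: "notifications/prompt_progress", // custom, not a spec-defined method
    params,
  });
}

// e.g. process.stdout.write(buildPromptProgressNotification({...}) + "\n");
```

Because notifications are one-way in JSON-RPC, this needs no change to the request/response lifecycle of the existing tools.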

Implementation plan in phases

  1. Phase 1: Async Dispatch Foundation
     • Modify executePromptSchema to accept an async: z.boolean().default(false) parameter.
     • If async=true, bypass the await strategy.execCommand block. Instead, spawn the process, capture the inv (invocation ID) or sessionId, and return immediately: { stream_id: "<id>", status: "started" }.
  2. Phase 2: Streaming Endpoint / Transport Migration
     • Option A (Native SSE): Expose an HTTP port and configure SSEServerTransport. Implement the GET /sse endpoint where clients subscribe using stream_id.
     • Option B (Stdio Notifications): Use the existing MCP connection to send custom JSON-RPC notifications (e.g., prompt_progress) containing partial tokens or status updates keyed by stream_id.
  3. Phase 3: PM Skill Evolution
     • Update skills/pm/SKILL.md to instruct the PM to use async=true for multi-member dispatches.
     • The PM will subscribe to the stream or poll monitor_task to gather results across parallel tasks.
  4. Phase 4: Generalized Event Stream
     • Expand the SSE/Notification channel to broadcast fleet_status changes, file transfer progress, and execute_command outputs, deprecating the need for manual polling via monitor_task.
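The Phase 1 dispatch split could look roughly like the sketch below. It deliberately omits the zod schema and the real strategy.execCommand call; the stream registry, the executePrompt signature, and the StreamState shape are illustrative assumptions, not the proposed final API.

```typescript
// Phase 1 sketch: one execute path serving both modes. With async=false
// (the default) the current blocking behavior is preserved; with async=true
// the promise is parked in a stream registry and a stream_id is returned
// immediately. randomUUID is from Node's stdlib; other names are invented.
import { randomUUID } from "node:crypto";

type StreamState =
  | { status: "started" }
  | { status: "completed"; result: string }
  | { status: "failed"; error: string };

const streams = new Map<string, StreamState>();

async function executePrompt(
  run: () => Promise<string>,
  opts: { async?: boolean } = {},
): Promise<string | { stream_id: string; status: "started" }> {
  if (!opts.async) return run(); // unchanged synchronous path

  const streamId = randomUUID();
  streams.set(streamId, { status: "started" });
  run()
    .then((result) => streams.set(streamId, { status: "completed", result }))
    .catch((err) => streams.set(streamId, { status: "failed", error: String(err) }));
  return { stream_id: streamId, status: "started" };
}
```

Phases 2-4 would then consume the streams map: the SSE endpoint or notification emitter publishes transitions out of the "started" state instead of the caller polling it.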

Risks

  • Transport Switch Overhead: Moving from stdio to HTTP/SSE introduces network configuration, port binding, and potential firewall/SSH-tunneling complexity.
  • Backward Compatibility: Existing sequential workflows in the PM assume execute_prompt blocks until completion. We must ensure async=false preserves the exact current behavior.
  • Resource Leaks: If a client dispatches asynchronously but never subscribes or disconnects, the server must still reap the process and clean up the stallDetector state.
  • Error Semantics: In async mode, an authentication failure or a dropped SSH connection cannot be reported in the initial tool response; the error must instead be propagated asynchronously, via the stream or the final poll result.
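For the resource-leak risk specifically, a TTL-based sweep is one plausible mitigation: every stream records when it was last dispatched or polled, and a periodic reaper cleans up anything idle past a deadline. This is a sketch under assumed names (TrackedStream, reapIdleStreams); the real cleanup would also have to release the associated stallDetector state.

```typescript
// Reap streams whose results were never collected, so orphaned processes
// and stallDetector entries do not accumulate. All names are illustrative.
interface TrackedStream {
  lastTouchedMs: number;   // updated on dispatch and on every subscribe/poll
  cleanup: () => void;     // e.g. kill the process, clear stallDetector state
}

function reapIdleStreams(
  streams: Map<string, TrackedStream>,
  nowMs: number,
  ttlMs: number,
): string[] {
  const reaped: string[] = [];
  for (const [id, s] of streams) {
    if (nowMs - s.lastTouchedMs > ttlMs) {
      s.cleanup();
      streams.delete(id);
      reaped.push(id);
    }
  }
  return reaped;
}
```

Passing nowMs explicitly (rather than calling Date.now() inside) keeps the sweep deterministic and easy to unit-test.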

Open Questions

  • Is the team willing to expose an HTTP server for the SSE transport, or should we strictly implement this using JSON-RPC notifications over the existing stdio transport?
  • If we use HTTP SSE, how do we handle authentication across the HTTP boundary, given that MCP over stdio is implicitly authorized by the local process?

Out-of-scope notes

  • Changing the underlying CLI commands or the mechanism by which Claude/Gemini logs are parsed.
  • Removing the stallDetector (it is still required to detect frozen background tasks).