Anonymous - 4 days ago

Originally posted by: kumaakh

Analysis: Non-blocking execute_prompt with SSE

TL;DR

The proposal shifts the execute_prompt MCP tool from a synchronous, blocking model to an asynchronous, fire-and-forget pattern. Instead of waiting for the LLM execution to complete, the server would immediately return a unique stream identifier. The caller can then subscribe to the stream to receive real-time updates via Server-Sent Events (SSE). This unlocks parallel prompt dispatching for the orchestrator (PM) and provides a versatile streaming primitive for all long-running fleet tasks.

Current State

Currently, execute_prompt is fully synchronous from the MCP caller's perspective:

  • Registration: Registered as an MCP tool in src/index.ts (L224).
  • Execution: The execution logic in src/tools/execute-prompt.ts blocks on await strategy.execCommand(...) (L190).
  • Concurrency: The server enforces a strict per-member lock using inFlightAgents.add(agent.id) (L120 in execute-prompt.ts). Concurrent dispatches to the same member are instantly rejected.
  • Monitoring: The stallDetector (src/services/stall/index.ts) polls the LLM's conversation log (e.g. .jsonl files) to ensure the process hasn't hung.
  • Response: The tool returns a single, concatenated string payload containing the complete LLM response and token usage (L226-L232 in execute-prompt.ts). No partial results are streamed to the caller.
  • Async Analogue: The execute_command tool (src/tools/execute-command.ts) already implements a long_running fire-and-forget flag that returns a task_id, which the caller polls via monitor_task (src/index.ts L248).
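The two dispatch models described above can be sketched roughly as follows. This is a simplified illustration, not the actual code in src/tools/execute-prompt.ts or src/tools/execute-command.ts; the helper names (executePromptSync, executeCommandAsync, monitorTask) and the in-memory task registry are assumptions for the sketch.

```typescript
// Simplified sketch of the current blocking path vs. the fire-and-forget
// analogue. Only the Set name inFlightAgents mirrors the real code.

const inFlightAgents = new Set<string>();          // per-member lock
const tasks = new Map<string, Promise<string>>();  // long_running task registry

// Synchronous path: holds the lock and the caller's MCP request open
// until the LLM run resolves. Concurrent dispatches are rejected.
async function executePromptSync(agentId: string, run: () => Promise<string>): Promise<string> {
  if (inFlightAgents.has(agentId)) throw new Error(`agent ${agentId} is busy`);
  inFlightAgents.add(agentId);
  try {
    return await run();
  } finally {
    inFlightAgents.delete(agentId);
  }
}

// Fire-and-forget path (execute_command's long_running flag): returns a
// task_id immediately; the caller later polls monitor_task-style.
function executeCommandAsync(run: () => Promise<string>): string {
  const taskId = `task-${tasks.size + 1}`;
  tasks.set(taskId, run());
  return taskId;
}

async function monitorTask(taskId: string): Promise<string> {
  const task = tasks.get(taskId);
  if (!task) throw new Error(`unknown task ${taskId}`);
  return task;
}
```

The asymmetry is the whole motivation for this proposal: execute_command already has a non-blocking mode, while execute_prompt does not.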

Why the current model strains at scale

  1. Orchestrator Connection Limits: Holding open an MCP request over stdio for 5-15 minutes (or longer) ties up resources and increases the risk of timeout-induced drops on the PM side, even when the remote member is healthy and still computing.
  2. Parallel Dispatch Bottleneck: The PM cannot efficiently dispatch parallel prompts to different members if the orchestrator thread is blocked waiting on the first prompt's resolution.
  3. Lack of Live Feedback: The orchestrator is blind to the prompt's progress until the final payload arrives, degrading UX and making it harder to catch early failures.

SSE in MCP — what's actually possible

The @modelcontextprotocol/sdk (v1.27.0 in package.json) supports two primary transports: StdioServerTransport and SSEServerTransport.

  • Current Transport: apra-fleet uses stdio (JSON-RPC over standard input/output) for its MCP server because it operates primarily via CLI.
  • SSE Constraints: SSE requires an HTTP layer. To natively support SSE, the server would need to run SSEServerTransport over an HTTP server (e.g., Express).
  • Alternatives: If switching to an HTTP transport is undesirable, we can approximate SSE semantics over stdio using MCP notifications/message, or by implementing a long-polling tool (similar to how monitor_task currently polls execute_command).
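To make the stdio alternative concrete, here is a minimal sketch of what a progress notification frame could look like: a JSON-RPC 2.0 notification (no "id" field, so no response is expected) keyed by stream_id. The method name prompt_progress and the payload shape are assumptions, not anything the SDK or apra-fleet defines today.

```typescript
// Sketch of an Option-B style progress frame. Over stdio the server would
// emit one such line on stdout per update; the PM-side client matches frames
// to its dispatches by stream_id. All field names here are hypothetical.
interface PromptProgressParams {
  stream_id: string;
  status: "running" | "completed" | "failed";
  partial?: string;      // newly produced output since the last frame
  tokens_used?: number;
}

function buildPromptProgressNotification(params: PromptProgressParams): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    method: "notifications/prompt_progress", // custom, not a spec-defined method
    params,
  });
}

// e.g. process.stdout.write(buildPromptProgressNotification({...}) + "\n");
```

Because notifications are one-way in JSON-RPC, this needs no change to the request/response lifecycle of the existing tools.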

Implementation plan in phases

  1. Phase 1: Async Dispatch Foundation
     • Modify executePromptSchema to accept an async: z.boolean().default(false) parameter.
     • If async=true, bypass the await strategy.execCommand block. Instead, spawn the process, capture the inv (invocation ID) or sessionId, and return immediately: { stream_id: "<id>", status: "started" }.
  2. Phase 2: Streaming Endpoint / Transport Migration
     • Option A (Native SSE): Expose an HTTP port and configure SSEServerTransport. Implement the GET /sse endpoint where clients subscribe using stream_id.
     • Option B (Stdio Notifications): Use the existing MCP connection to send custom JSON-RPC notifications (e.g., prompt_progress) containing partial tokens or status updates keyed by stream_id.
  3. Phase 3: PM Skill Evolution
     • Update skills/pm/SKILL.md to instruct the PM to use async=true for multi-member dispatches.
     • The PM will subscribe to the stream or poll monitor_task to gather results across parallel tasks.
  4. Phase 4: Generalized Event Stream
     • Expand the SSE/Notification channel to broadcast fleet_status changes, file transfer progress, and execute_command outputs, deprecating the need for manual polling via monitor_task.
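The Phase 1 dispatch split could look roughly like the sketch below. It deliberately omits the zod schema and the real strategy.execCommand call; the stream registry, the executePrompt signature, and the StreamState shape are illustrative assumptions, not the proposed final API.

```typescript
// Phase 1 sketch: one execute path serving both modes. With async=false
// (the default) the current blocking behavior is preserved; with async=true
// the promise is parked in a stream registry and a stream_id is returned
// immediately. randomUUID is from Node's stdlib; other names are invented.
import { randomUUID } from "node:crypto";

type StreamState =
  | { status: "started" }
  | { status: "completed"; result: string }
  | { status: "failed"; error: string };

const streams = new Map<string, StreamState>();

async function executePrompt(
  run: () => Promise<string>,
  opts: { async?: boolean } = {},
): Promise<string | { stream_id: string; status: "started" }> {
  if (!opts.async) return run(); // unchanged synchronous path

  const streamId = randomUUID();
  streams.set(streamId, { status: "started" });
  run()
    .then((result) => streams.set(streamId, { status: "completed", result }))
    .catch((err) => streams.set(streamId, { status: "failed", error: String(err) }));
  return { stream_id: streamId, status: "started" };
}
```

Phases 2-4 would then consume the streams map: the SSE endpoint or notification emitter publishes transitions out of the "started" state instead of the caller polling it.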

Risks

  • Transport Switch Overhead: Moving from stdio to HTTP/SSE introduces network configuration, port binding, and potential firewall/SSH-tunneling complexity.
  • Backward Compatibility: Existing sequential workflows in the PM assume execute_prompt blocks until completion. We must ensure async=false preserves the exact current behavior.
  • Resource Leaks: If a client dispatches asynchronously but never subscribes or disconnects, the server must still reap the process and clean up the stallDetector state.
  • Error Semantics: In async mode, an authentication failure or a dropped SSH connection cannot be reported in the initial tool response; the error must instead be propagated asynchronously, via the stream or the final poll result.
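For the resource-leak risk specifically, a TTL-based sweep is one plausible mitigation: every stream records when it was last dispatched or polled, and a periodic reaper cleans up anything idle past a deadline. This is a sketch under assumed names (TrackedStream, reapIdleStreams); the real cleanup would also have to release the associated stallDetector state.

```typescript
// Reap streams whose results were never collected, so orphaned processes
// and stallDetector entries do not accumulate. All names are illustrative.
interface TrackedStream {
  lastTouchedMs: number;   // updated on dispatch and on every subscribe/poll
  cleanup: () => void;     // e.g. kill the process, clear stallDetector state
}

function reapIdleStreams(
  streams: Map<string, TrackedStream>,
  nowMs: number,
  ttlMs: number,
): string[] {
  const reaped: string[] = [];
  for (const [id, s] of streams) {
    if (nowMs - s.lastTouchedMs > ttlMs) {
      s.cleanup();
      streams.delete(id);
      reaped.push(id);
    }
  }
  return reaped;
}
```

Passing nowMs explicitly (rather than calling Date.now() inside) keeps the sweep deterministic and easy to unit-test.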

Open Questions

  • Is the team willing to expose an HTTP server for the SSE transport, or should we strictly implement this using JSON-RPC notifications over the existing stdio transport?
  • If we use HTTP SSE, how do we handle authentication across the HTTP boundary, given that MCP over stdio is implicitly authorized by the local process?

Out-of-scope notes

  • Changing the underlying CLI commands or the mechanism by which Claude/Gemini logs are parsed.
  • Removing the stallDetector (it is still required to detect frozen background tasks).