
#152 arch: inter-fleet messaging via per-member named pipes / UDS

open
nobody
None
2026-04-26
2026-04-18
Anonymous
No

Originally created by: kumaakh

Idea

Explore an architecture where fleet-mcp instances can inject messages into each other's running conversations, enabling real-time PM↔member communication without polling.

Current limitation

Today, the PM communicates with members only at dispatch time (execute_prompt). Once a session is running, the PM has no way to send new information into it, and members have no way to surface mid-session questions or status updates back to the PM without stopping (STOP + report).

Proposed architecture

Each fleet member, on startup, creates a named channel keyed to its member GUID:

  • Unix: a Unix Domain Socket at e.g. /tmp/fleet/<member-guid>.sock
  • Windows: a Named Pipe at \\.\pipe\fleet-<member-guid>

The channel stays open for the lifetime of the Claude Code session.
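A minimal sketch of how a member's fleet-mcp might derive its channel address from the two conventions listed above. The function name channelPath and the GUID character check are assumptions made for illustration; they are not existing fleet-mcp code.

```typescript
// Hypothetical helper: maps a member GUID to the channel address proposed above.
// The platform parameter defaults to the current OS and exists mainly for testing.
function channelPath(memberGuid: string, platform: string = process.platform): string {
  // Assumed safety check: reject anything that could escape the directory or pipe namespace.
  if (!/^[A-Za-z0-9-]+$/.test(memberGuid)) {
    throw new Error(`unexpected characters in member GUID: ${memberGuid}`);
  }
  return platform === "win32"
    ? `\\\\.\\pipe\\fleet-${memberGuid}` // the string value is \\.\pipe\fleet-<guid>
    : `/tmp/fleet/${memberGuid}.sock`;
}
```

Keeping the platform switch in one function means the rest of the code can pass the result straight to net.createServer().listen() or net.connect() without caring which transport is underneath.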

PM → Member (inject):
The PM uses its fleet-mcp instance to write a message to the fleet/<member-guid> channel. The running member's fleet-mcp instance reads from that channel and injects the message into the active conversation turn (e.g. appended as a system-level prompt injection or a new user turn).

Member → PM (reply):
Members write back on a well-known PM channel (e.g. fleet/pm.sock) or a response channel. PM's fleet-mcp surfaces the message.

Use cases

  • PM sends a mid-sprint clarification without killing and re-dispatching the session
  • Member asks a blocking question without STOP — PM sees it and responds in-band
  • PM can signal STOP or model escalation to a running member
  • Members can report incremental progress that PM can surface to the user in real-time

Open questions

  • How does the fleet-mcp instance inject a message mid-turn in Claude Code? Does the Claude Code API support this? (Context append? Tool result injection?)
  • Security: channel should be locked to the local user (socket permissions / pipe ACLs)
  • What happens if PM is not running when a member writes to the PM channel? Buffer or drop?
  • Does this require changes to the Claude Code CLI itself, or can it be done purely via MCP tool results?
  • Named pipes on Windows vs UDS on Unix — need an abstraction layer in fleet-mcp

Status

Very preliminary: needs architecture exploration before any implementation. Logged here for future sprint consideration.

  • Closes no issues — this is net-new capability
  • Potentially supersedes the polling pattern in fleet_status + monitor_task for local members

Related

Tickets: #152

Discussion

  • Anonymous

    Anonymous - 2026-04-23

    Originally posted by: kumaakh

    Technical direction: This is a research/architecture spike — no immediate implementation. Before committing, verify two prerequisites:

    1. Claude Code injection feasibility: Does the Claude Code CLI support injecting a message into an active running session from outside the process? The --output-format stream-json interface is output-only. The /inject or stdin-append path needs investigation — if the CLI doesn't support mid-session injection, the architecture is blocked at the member side regardless of the pipe mechanism.

    2. MCP tool result injection: The more feasible near-term approach is for the member's fleet-mcp to expose a receive_message tool that the member's Claude session can call as a poll ('any new messages for me?'). This avoids the injection problem entirely and works today without CLI changes.
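The polling idea in point 2 could be backed by something as small as an in-memory mailbox. The sketch below shows only the queue a hypothetical receive_message tool handler would read from; the FleetMessage shape, the Mailbox class, and the 100-message cap are assumptions, and the MCP tool registration itself is deliberately left out.

```typescript
// Hypothetical message shape; the issue does not define one.
interface FleetMessage {
  from: string;
  body: string;
  receivedAt: number;
}

// In-memory mailbox that a member-side fleet-mcp could keep per session.
class Mailbox {
  private queue: FleetMessage[] = [];
  private readonly maxQueued: number;

  constructor(maxQueued: number = 100) {
    this.maxQueued = maxQueued;
  }

  // Called by whatever transport delivers PM messages to this member.
  deliver(from: string, body: string, now: number = Date.now()): void {
    if (this.queue.length >= this.maxQueued) {
      this.queue.shift(); // drop the oldest message rather than grow without bound
    }
    this.queue.push({ from, body, receivedAt: now });
  }

  // Body of the hypothetical receive_message tool: return and clear pending messages.
  receiveMessages(): FleetMessage[] {
    const pending = this.queue;
    this.queue = [];
    return pending;
  }
}
```

Because the session pulls messages itself, delivery latency is bounded by how often the member's prompt tells it to call the tool, which is the trade-off against true injection.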

    If proceeding with pipes:

    • Abstract the channel in a src/services/ipc-channel.ts module built on net.createServer: listen on a UDS path on macOS/Linux and on \\.\pipe\fleet-<guid> on Windows (the Node.js net module supports both, and also accepts the \\?\pipe\ prefix).
    • Scope: PM-side write, member-side read only. PM channel is well-known (fleet/pm); member channels keyed to agent.id.
    • Security: fs.chmodSync(sockPath, 0o700) on Unix; on Windows use default pipe ACLs (local user only).

    Recommend starting with the polling pattern as a lower-risk alternative — defer the pipe architecture until Claude Code CLI supports injection natively.

     
  • Anonymous

    Anonymous - 2026-04-26

    Originally posted by: kumaakh

    Research: Inter-fleet messaging via named pipes / UDS — Feasibility findings

    1. Node.js IPC capabilities for UDS and Named Pipes

    Node.js net module has solid cross-platform support:

    • Unix (Linux/macOS): net.createServer() can listen on a filesystem path (Unix Domain Socket). Standard POSIX behavior — the socket file appears in the filesystem with permission bits.
    • Windows: net.createServer() can listen on \\.\pipe\<name>. Named Pipes are a first-class Windows IPC mechanism. Node.js handles the \\.\pipe\ prefix natively in the net module — no special libraries needed.
    • Cross-platform abstraction: Node.js abstracts both behind the same net.Server / net.Socket API. You pass a filesystem path on Unix or a \\.\pipe\... path on Windows. The fleet codebase already does this successfully in src/services/auth-socket.ts.
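To illustrate the shared net.Server / net.Socket API described above, here is a minimal sketch of a line-oriented listener and sender that take either a UDS path or a \\.\pipe\ name. The function names startChannel and sendLine are hypothetical, and this is not the contents of auth-socket.ts.

```typescript
import * as net from "node:net";

// Starts a listener on a UDS path or a \\.\pipe\ name.
// onLine is invoked once per newline-terminated line received.
function startChannel(address: string, onLine: (line: string) => void): Promise<net.Server> {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      let buffered = "";
      socket.setEncoding("utf8");
      socket.on("data", (chunk: Buffer | string) => {
        buffered += chunk.toString();
        let newline = buffered.indexOf("\n");
        while (newline >= 0) {
          onLine(buffered.slice(0, newline));
          buffered = buffered.slice(newline + 1);
          newline = buffered.indexOf("\n");
        }
      });
      socket.on("error", () => socket.destroy());
    });
    server.once("error", reject);
    server.listen(address, () => resolve(server));
  });
}

// Connects to a channel, writes one line, and closes the connection.
function sendLine(address: string, line: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const socket = net.connect(address, () => {
      socket.end(line + "\n", () => resolve());
    });
    socket.once("error", reject);
  });
}
```

Neither function branches on the platform; the only platform-specific part is how the address string is chosen.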

    Gotchas:

    • Windows pipe naming: Must use \\.\pipe\<name> format. Backslash escaping in JS strings requires \\\\.\\pipe\\.... Fleet's existing code handles this correctly (auth-socket.ts line 33).
    • Stale socket cleanup: On Unix, if the process crashes, the .sock file persists and blocks re-listen. Must fs.unlinkSync() before binding. On Windows, named pipes are kernel objects and auto-cleanup on process exit. Fleet already handles both cases (auth-socket.ts lines 49-52, 99-104).
    • EADDRINUSE on Windows: Named pipes may not release immediately after close. Fleet already implements retry logic for this (auth-socket.ts lines 103-104, up to 5 retries with 100ms delay).
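The two recovery gotchas above can be sketched as follows. This is an illustration of the technique, not a copy of auth-socket.ts; the function names are hypothetical, and the defaults of 5 retries and 100 ms simply echo the figures quoted above.

```typescript
import * as fs from "node:fs";
import * as net from "node:net";

// Removes a leftover socket file from a crashed run. Named pipes on Windows
// are kernel objects, so there is nothing to unlink there.
function removeStaleSocket(sockPath: string, platform: string = process.platform): void {
  if (platform === "win32") return;
  try {
    fs.unlinkSync(sockPath);
  } catch (err) {
    // A missing file is the normal case; anything else is a real error.
    if ((err as { code?: string }).code !== "ENOENT") throw err;
  }
}

// Calls listen() and retries on EADDRINUSE up to `retries` times.
function listenWithRetry(
  server: net.Server,
  address: string,
  retries: number = 5,
  delayMs: number = 100,
): Promise<void> {
  return new Promise((resolve, reject) => {
    let attempt = 0;
    const tryListen = (): void => {
      const onError = (err: Error & { code?: string }): void => {
        if (err.code === "EADDRINUSE" && attempt < retries) {
          attempt += 1;
          setTimeout(tryListen, delayMs);
        } else {
          reject(err);
        }
      };
      server.once("error", onError);
      server.listen(address, () => {
        server.removeListener("error", onError);
        resolve();
      });
    };
    tryListen();
  });
}
```

One caveat with unconditional unlinking: it also removes a socket that a live process is still serving. A more careful version might first try to connect and only unlink when the connection is refused.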

    Existing precedent in apra-fleet: The auth-socket.ts module is a production-quality UDS/Named Pipe implementation using newline-delimited JSON protocol. It demonstrates the pattern works end-to-end across platforms.
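For readers unfamiliar with newline-delimited JSON, the framing can be sketched as below. The class and function names are hypothetical and the 64 KiB default limit is an assumption for the example; the code only illustrates the protocol style and is not taken from auth-socket.ts.

```typescript
// Serialises one message as a single JSON line.
function encodeMessage(message: Record<string, unknown>): string {
  return JSON.stringify(message) + "\n";
}

// Reassembles JSON lines from arbitrarily split chunks.
class NdjsonDecoder {
  private buffered = "";
  private readonly maxLineChars: number;

  constructor(maxLineChars: number = 64 * 1024) {
    this.maxLineChars = maxLineChars;
  }

  // Feed a decoded chunk; returns every complete JSON value it finished.
  // Throws on malformed JSON or when a line exceeds the size limit.
  push(chunk: string): unknown[] {
    this.buffered += chunk;
    const messages: unknown[] = [];
    let newline = this.buffered.indexOf("\n");
    while (newline >= 0) {
      const line = this.buffered.slice(0, newline).trim();
      this.buffered = this.buffered.slice(newline + 1);
      if (line.length > this.maxLineChars) throw new Error("line exceeds size limit");
      if (line.length > 0) messages.push(JSON.parse(line));
      newline = this.buffered.indexOf("\n");
    }
    // Guard against a sender that never terminates its line.
    if (this.buffered.length > this.maxLineChars) {
      this.buffered = "";
      throw new Error("line exceeds size limit");
    }
    return messages;
  }
}
```

The decoder is needed because a stream socket gives no guarantee that one write on the sender arrives as one data event on the receiver.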

    2. Claude Code injection point: Can a UDS message reach a running Claude Code process?

    This is the critical blocker. Even if fleet can deliver a message to the member's channel, the question is how Claude Code receives and acts on it.

    Claude Code has NO mechanism for mid-turn message injection:

    • claude -p reads prompt → runs agentic loop → exits. No external input channel during execution.
    • No signal handler, no watchdog file, no tool-result injection API, no IPC listener.
    • --input-format stream-json is for providing the initial input as a stream, not for injecting messages during execution.
    • The MCP (Model Context Protocol) server interface is for tools the model can call, not for external messages pushed to the model.

    What a UDS channel COULD do today (without Claude Code changes):

    1. Signal the fleet orchestrator (not Claude Code directly). The PM sends a message to the member's fleet-side channel, which then:
       • Kills the running claude -p process
       • Resumes with -c and the new directive
       This is the "cancel and redirect" pattern from issue [#75].
    2. Write to a watched file that the member's prompt instructs it to check. Same as the file-polling approach from [#75], but with UDS as the notification trigger instead of the PM writing the file directly.
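The "cancel and redirect" flow can be sketched as a small wrapper around whatever launches the member process. The launcher is injected so the sketch does not assume how fleet actually spawns members; the class name, the Launch type, and the exact argument order ("-c", "-p", directive) are assumptions based on the flags named above.

```typescript
// Minimal view of a running process; child_process.ChildProcess satisfies it.
interface RunningProcess {
  kill(): boolean;
}

// Launches the agent CLI with the given arguments (hypothetical injection point).
type Launch = (args: string[]) => RunningProcess;

class RedirectableSession {
  private current: RunningProcess;
  private readonly launch: Launch;

  constructor(launch: Launch, initialPrompt: string) {
    this.launch = launch;
    this.current = launch(["-p", initialPrompt]);
  }

  // Cancel the in-flight run, then continue the same conversation
  // with the new directive.
  redirect(directive: string): void {
    this.current.kill();
    this.current = this.launch(["-c", "-p", directive]);
  }
}
```

In real use the launcher could be something like (args) => spawn("claude", args) from node:child_process. A production version would likely also wait for the killed process to exit before resuming, since kill() only sends the signal.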

    What would require upstream Claude Code changes:

    • A --listen <socket-path> flag that makes claude -p poll a UDS for injected user turns during its agentic loop
    • Or a /inject API endpoint if Claude Code exposed a local HTTP server during sessions
    • Or support for a custom MCP tool that blocks until a message arrives (effectively turning a tool call into a message receive)
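The last option, a tool call that blocks until a message arrives, reduces to a promise-based waiter. The sketch below shows only that waiting logic under assumed names (MessageWaiter, waitForMessage, deliver); how it would be exposed as an MCP tool, and whether the client tolerates long-running tool calls, is not established here.

```typescript
class MessageWaiter {
  private queued: string[] = [];
  private waiting: Array<(message: string | null) => void> = [];

  // Transport side: hand the message to a blocked caller, or queue it.
  deliver(message: string): void {
    const resolveNext = this.waiting.shift();
    if (resolveNext) resolveNext(message);
    else this.queued.push(message);
  }

  // Tool side: resolves with the next message, or null after timeoutMs.
  waitForMessage(timeoutMs: number): Promise<string | null> {
    const ready = this.queued.shift();
    if (ready !== undefined) return Promise.resolve(ready);
    return new Promise((resolve) => {
      const settle = (message: string | null): void => {
        clearTimeout(timer);
        const index = this.waiting.indexOf(settle);
        if (index >= 0) this.waiting.splice(index, 1); // drop the waiter on timeout
        resolve(message);
      };
      const timer = setTimeout(() => settle(null), timeoutMs);
      this.waiting.push(settle);
    });
  }
}
```

The timeout matters: returning null lets the tool call finish cleanly so the session can decide whether to wait again, instead of hanging indefinitely.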

    3. Security implications of UDS / Named Pipes

    Unix Domain Sockets:

    • Governed by filesystem permissions. Set socket file to 0600 (owner read/write only) — other users cannot connect.
    • Fleet already does this: fs.chmodSync(sockPath, 0o600) in auth-socket.ts line 112.
    • Any process running as the same user CAN connect. This is acceptable for fleet's threat model (same-user, same-machine).
    • For stronger isolation: place the socket inside a directory with 0700 permissions. Fleet uses ~/.apra-fleet/data/ with 0o700 mode.
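A sketch of the directory-plus-socket permission pattern described above, for Unix only. The function names are hypothetical; the modes 0o700 and 0o600 are the ones quoted in this comment.

```typescript
import * as fs from "node:fs";
import * as net from "node:net";
import * as path from "node:path";

// Creates the directory if needed and forces owner-only access.
function ensurePrivateDir(dir: string): void {
  fs.mkdirSync(dir, { recursive: true, mode: 0o700 });
  // mkdirSync's mode is not applied when the directory already exists,
  // so set it explicitly as well.
  fs.chmodSync(dir, 0o700);
}

// Binds the server to <dir>/<name> and restricts the socket file to the owner.
function listenPrivately(server: net.Server, dir: string, name: string): Promise<string> {
  ensurePrivateDir(dir);
  const sockPath = path.join(dir, name);
  return new Promise((resolve, reject) => {
    server.once("error", reject);
    server.listen(sockPath, () => {
      fs.chmodSync(sockPath, 0o600);
      resolve(sockPath);
    });
  });
}
```

The private directory is what closes the short window between listen() creating the socket file and chmodSync() tightening it, since other users cannot traverse into the directory at all.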

    Windows Named Pipes:

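    • By default, Named Pipes created by Node.js net.createServer() receive the Windows default security descriptor for named pipes. Per Microsoft's CreateNamedPipe documentation, that default grants full control to the creator owner, administrators, and LocalSystem, plus read access to Everyone, so write access is limited to the creating user and privileged accounts rather than strictly to the same user session.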
    • For explicit control, you'd need native ACL manipulation (not exposed by Node.js net module). However, the default same-user-session scoping is sufficient.
    • Fleet's existing auth-socket.ts relies on this default behavior successfully.

    Risk assessment: Low. The same-user scoping is adequate. An attacker with same-user access already has full access to the member's files and processes anyway.

    4. Verdict: Feasibility and effort assessment

    Is this feasible without changes to Claude Code itself?

    Partially. The UDS/Named Pipe infrastructure is fully feasible — fleet already has a working implementation (auth-socket.ts). The gap is on the receiving end: Claude Code cannot accept mid-turn injected messages.

    What's achievable today (no upstream changes):

    • Each member runs a UDS listener (fleet-side, not Claude Code-side)
    • PM sends messages to this channel
    • The member's fleet handler receives the message and acts on it:
        • Option A: Kill + resume with -c ("cancel and redirect")
        • Option B: Write to a file in the work folder for the agent to discover on its next tool call
        • Option C: Queue the message and deliver it as part of the next execute_prompt resume

    What requires upstream Claude Code changes:

    • True mid-turn injection where the agent sees a new user message during its agentic loop
    • This would need something like claude -p --listen /path/to/socket or an HTTP control plane

    Effort comparison:

    Approach                                         | Effort     | Reliability            | Requires upstream changes?
    File polling (#75 Approach A)                    | Low        | Low (timing-dependent) | No
    Cancel + resume (#75 Approach B)                 | Low-Medium | High (deterministic)   | No
    UDS channel → cancel + resume (#152)             | Medium     | High                   | No
    UDS channel → mid-turn inject (#152 full vision) | High       | Highest                | Yes (Claude Code changes)

    Recommendation:

    1. Phase 1: Implement "cancel and redirect" in execute_prompt (issue [#75]). This is the highest-value, lowest-effort change.
    2. Phase 2: Add per-member UDS channels (#152) as the transport layer. Reuse the auth-socket.ts pattern. The channel delivers messages to the fleet handler, which uses cancel+resume to act on them.
    3. Phase 3 (future/upstream): If Claude Code adds a mid-turn injection mechanism, wire the UDS channel directly to it.

    The UDS infrastructure from [#152] is the right long-term architecture, but its value is limited until Claude Code supports mid-turn injection. The pragmatic path is cancel+resume first, UDS transport second.

     

    Related

    Tickets: #152
    Tickets: #75

  • Anonymous

    Anonymous - 2026-04-26

    Originally posted by: kumaakh

    Research: Inter-fleet messaging via UDS — supplemental findings

    Building on the prior research, I explored the codebase more deeply and identified both a strong existing precedent and an alternative to the "Claude Code injection blocker."

    1. Node.js IPC: auth-socket.ts is a production-ready template

    The existing src/services/auth-socket.ts is effectively a complete UDS/Named Pipe implementation that can be adapted:

    Capability                        | auth-socket.ts    | Needed for [#152]
    Cross-platform (UDS + Named Pipe) | ✅ lines 30-35    |
    Newline-delimited JSON protocol   | ✅ lines 57-95    |
    Stale socket cleanup (Unix)       | ✅ lines 49-51    |
    EADDRINUSE retry (Windows)        | ✅ lines 99-104   |
    Socket permissions (0o600)        | ✅ line 112       |
    Message size limits               | ✅ line 12 (64KB) |
    Async waiter pattern              | waitForPassword() | Adapt for message queue

    The path convention would be:

    • Unix: ~/.apra-fleet/data/channels/<member-guid>.sock
    • Windows: \\.\pipe\apra-fleet-channel-<member-guid>

    Effort estimate for the channel infra itself: ~2-3 days. Most of the code can be factored out of auth-socket.ts into a generic ipc-channel.ts.

    2. Claude Code injection: --input-format stream-json as a bypass

    The prior research correctly identified that Claude Code has no mid-turn injection API. However, I found evidence that --input-format stream-json mode may accept ongoing user messages via stdin (see my comment on [#75] for full details).

    If confirmed, the architecture becomes:

    PM → UDS channel → member's fleet handler → writes to claude -p stdin (stream-json)
    

    This eliminates the "Claude Code injection blocker" without requiring any upstream changes. The UDS channel serves as the transport; stream-json stdin serves as the delivery mechanism.

    3. Security assessment (expanded)

    Same-user scoping is the right model. Both UDS and Named Pipes naturally scope to the user:

    • Unix: Socket at 0600 in a 0700 directory. Only the owning user can connect. Fleet already uses this pattern (FLEET_DIR is created with 0o700).
    • Windows: Named pipes inherit the creator's security descriptor. Same-session scoping by default.
    • Threat model: An attacker with same-user access already owns the member's files, SSH keys, and API keys. The UDS channel doesn't expand the attack surface.

    One additional concern: If the PM channel (fleet/pm.sock) accepts messages from any member, a compromised member could impersonate another member's responses. Mitigation: include member_id in the message and validate it against the socket's expected sender (though this is weak since a same-user attacker can spoof). For fleet's trust model (all members are the same user), this is acceptable.

    4. Revised architecture with stream-json

    Phase 1 — No UDS needed (issue [#75] scope):

    • Implement stream-json stdin injection in execute_prompt for local members
    • Keep running process handles in memory: Map<agentId, { stdin, sessionId }>
    • Add update: true parameter to inject into running sessions
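The Phase 1 bullets above could look roughly like the sketch below: a registry of running sessions keyed by agent id, plus an inject path used when update: true is passed. All names are hypothetical. In particular, the JSON envelope written to stdin is an assumption about what --input-format stream-json expects and is exactly the part that still needs to be verified.

```typescript
// Anything with a write() method, e.g. the stdin of a spawned child process.
interface LineSink {
  write(chunk: string): unknown;
}

interface RunningSession {
  stdin: LineSink;
  sessionId: string;
}

// Process handles kept in memory, keyed by agent id.
const runningSessions = new Map<string, RunningSession>();

// ASSUMPTION: this user-turn envelope is what stream-json input accepts.
// It is unverified and shown only to make the flow concrete.
function toStreamJsonUserTurn(text: string): string {
  return JSON.stringify({ type: "user", message: { role: "user", content: text } }) + "\n";
}

// Returns true when the text was written to a live session, false when
// no session is running and the caller should fall back to a normal dispatch.
function injectUpdate(agentId: string, text: string): boolean {
  const session = runningSessions.get(agentId);
  if (!session) return false;
  session.stdin.write(toStreamJsonUserTurn(text));
  return true;
}
```

Entries would need to be removed from the map when the process exits; otherwise a later update would be written to a closed stdin.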

    Phase 2 — UDS channels (this issue):

    • Factor auth-socket.ts into generic ipc-channel.ts
    • Each member listens on ~/.apra-fleet/data/channels/<id>.sock
    • PM writes messages to member channels
    • Member fleet handler forwards to stream-json stdin (local) or queues for next resume (remote)
    • PM channel for member→PM messages (status updates, blocking questions)

    Phase 3 — Remote member support:

    • For SSH members, the UDS channel runs on the PM's machine
    • PM writes to local UDS → fleet handler sends message over SSH to the running process
    • Requires keeping SSH channel stdin open (not calling stream.end())
    • Fallback: queue messages and deliver on next execute_prompt resume

    5. Verdict update

    The UDS architecture is feasible today with the stream-json approach, though the full vision requires two prerequisites to be confirmed:

    1. --input-format stream-json multi-turn injection works (needs testing — high confidence based on CLI help text but unverified)
    2. SSH channels can keep stdin open for long-running sessions (engineering challenge, not a blocker)

    If stream-json injection is confirmed, the UDS architecture from this issue becomes the right long-term transport layer — it decouples message delivery (UDS) from message consumption (stream-json stdin), and both are implementable without upstream Claude Code changes.

    Recommended priority: Test stream-json injection first (1-2 hours). If confirmed, implement Phase 1 (no UDS, just stdin injection) under [#75], then build UDS channels as Phase 2 under this issue.

     

    Related

    Tickets: #152
    Tickets: #75

