cogtrix v0.3.0

Cogtrix WebSocket Protocol

Version: v1
Endpoint: ws://host/ws/v1/sessions/{session_id}
Log stream: ws://host/ws/v1/logs

Related documents:

  • docs/API/OVERVIEW.md — API orientation, quick start, authentication model
  • docs/API/CLIENT_CONTRACT.md — TypeScript types and full WebSocket usage example (SessionSocket class)
  • docs/API/WEBUI_DEVELOPMENT_GUIDE.md — React integration patterns for streaming

1. Overview

WebSockets are used exclusively for real-time streaming surfaces:

  • Token-by-token agent output streaming (session WebSocket)
  • Tool execution progress (session WebSocket)
  • Tool confirmation dialogs (session WebSocket)
  • Live log streaming (log WebSocket, admin only)

All other operations use the REST API.


2. Authentication

The JWT bearer token (or cgx_live_ API key) may be provided via either of two paths. Both reach the same downstream validation pipeline; pick whichever matches your client environment.

2.1 Authorization header — CLI / SDK clients

Authorization: Bearer <jwt>

Case-insensitive on the scheme (bearer/Bearer/BEARER all accepted). This is the canonical path for CLI tools, server-side SDKs, and any HTTP client that can set custom request headers on the WebSocket upgrade.

2.2 Sec-WebSocket-Protocol — browser clients (#1887)

Sec-WebSocket-Protocol: bearer, <jwt>

The browser WebSocket constructor does not allow setting custom headers on the upgrade request; the only browser-portable way to attach auth to a WebSocket connection is the protocols argument:

const ws = new WebSocket(url, ["bearer", token]);

The server extracts the second list element as the token when the first is bearer (case-insensitive). Per RFC 6455 the server echoes the selected subprotocol back on accept (Sec-WebSocket-Protocol: bearer) — without that echo Chromium / Firefox close the connection client-side with 1002 (Protocol error).

When both paths are present, the Authorization header wins; no subprotocol is echoed in the response.
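The precedence rule can be sketched as a pure function over the two handshake headers. This is an illustration of the behaviour described in this section, not the server's actual code: the names HandshakeHeaders, ExtractedAuth, and extractToken are hypothetical, and falling through to the subprotocol when the Authorization header is present but malformed is an assumption.

```typescript
// Hypothetical view of the two handshake headers that can carry the token.
interface HandshakeHeaders {
  authorization?: string;
  secWebSocketProtocol?: string;
}

interface ExtractedAuth {
  token: string;
  // Subprotocol to echo on accept, or null when nothing should be echoed.
  echoSubprotocol: string | null;
}

function extractToken(headers: HandshakeHeaders): ExtractedAuth | null {
  // Path 1: Authorization header; the scheme is matched case-insensitively.
  const auth = headers.authorization?.trim();
  if (auth) {
    const match = /^bearer\s+(\S+)$/i.exec(auth);
    // The header wins when both paths are present; no subprotocol is echoed.
    if (match) return { token: match[1], echoSubprotocol: null };
  }

  // Path 2: Sec-WebSocket-Protocol: bearer, <jwt>
  const parts = (headers.secWebSocketProtocol ?? "").split(",").map((p) => p.trim());
  if (parts.length >= 2 && parts[0].toLowerCase() === "bearer" && parts[1] !== "") {
    return { token: parts[1], echoSubprotocol: "bearer" };
  }

  // No usable token: the server would close with 4001 (see Section 2.3).
  return null;
}
```

The returned token still has to pass the shared validation pipeline; this sketch only covers where the token is read from.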

Operator note — handshake-header logging

The Sec-WebSocket-Protocol header is logged by some reverse proxies that do not redact it the way Authorization is conventionally redacted (nginx $http_sec_websocket_protocol, for example, is captured by default in many configurations). TLS protects the value in transit; this concern is server-side logging only.

If your ingress logs handshake headers, add a redaction rule for Sec-WebSocket-Protocol values that begin with "bearer,", or strip the header from access logs entirely. Clients that use the Authorization header path are unaffected.

2.3 Close codes

If authentication or authorization fails during the handshake, the server closes the connection with one of the following codes:

  • Close code 4001 — unauthorized (missing token, invalid token, revoked API key, inactive user)
  • Close code 4003 — forbidden (valid token but wrong role / ownership)
  • Close code 4004 — session not found (auth succeeded, no such session id)

3. Message Envelope

All messages in both directions use this JSON envelope:

Server → Client

{
  "type": "<message_type>",
  "session_id": "<uuid>",
  "payload": { ... },
  "seq": 42,
  "ts": "2026-03-04T12:34:56.789Z"
}

| Field | Type | Description |
|---|---|---|
| type | string | Message type discriminator (see Section 4) |
| session_id | string | UUID v4 of the session this message belongs to |
| payload | object | Type-specific payload (see Section 4) |
| seq | int | Monotonically increasing per-connection sequence number |
| ts | string | ISO 8601 UTC server timestamp |

Client → Server

{
  "type": "<message_type>",
  "payload": { ... }
}
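
A minimal sketch of how a client might type and validate the two envelopes. The interface and function names here are hypothetical and are not taken from docs/API/CLIENT_CONTRACT.md; the field names and types follow the envelope described above.

```typescript
// Hypothetical types mirroring the documented envelope fields.
interface ServerMessage {
  type: string;
  session_id: string;
  payload: Record<string, unknown>;
  seq: number;
  ts: string;
}

interface ClientMessage {
  type: string;
  payload: Record<string, unknown>;
}

// Parse one text frame; returns null when the frame is not a valid envelope.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const { type, session_id, payload, seq, ts } = data as Record<string, unknown>;
  if (
    typeof type !== "string" ||
    typeof session_id !== "string" ||
    typeof ts !== "string" ||
    typeof seq !== "number" ||
    !Number.isInteger(seq) ||
    typeof payload !== "object" ||
    payload === null ||
    Array.isArray(payload)
  ) {
    return null;
  }
  return { type, session_id, payload: payload as Record<string, unknown>, seq, ts };
}

// Serialize a client message; the payload defaults to an empty object.
function encodeClientMessage(type: string, payload: Record<string, unknown> = {}): string {
  const message: ClientMessage = { type, payload };
  return JSON.stringify(message);
}
```

Frames that lack session_id, seq, or ts are rejected by this parser, so it is suitable for the session WebSocket only.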

4. Message Types

4.1 Server → Client Messages

token — Incremental LLM Output Token

Emitted once per output token during agent generation. The frontend appends text to the response buffer.

{
  "type": "token",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "text": " Paris",
    "final": true
  },
  "seq": 42,
  "ts": "2026-03-04T12:34:56.789Z"
}

| Payload field | Type | Description |
|---|---|---|
| text | string | Incremental token text |
| final | bool | true when this token is part of the final response (after all tool calls complete). false during preamble text before tool calls. Use this to distinguish intermediate reasoning from the actual answer. Only meaningful when tool_call_count > 0; false until the first tool call is seen. |
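
One way a frontend could use the final flag is to keep two buffers, one for preamble text and one for the answer. The sketch below assumes that split is what the UI wants; TurnBuffers and appendToken are hypothetical names.

```typescript
interface TokenPayload {
  text: string;
  final: boolean;
}

// Hypothetical UI state: text streamed before tool calls vs. the final answer.
interface TurnBuffers {
  preamble: string;
  answer: string;
}

// Append one token to the matching buffer without mutating the previous state.
function appendToken(buffers: TurnBuffers, token: TokenPayload): TurnBuffers {
  return token.final
    ? { preamble: buffers.preamble, answer: buffers.answer + token.text }
    : { preamble: buffers.preamble + token.text, answer: buffers.answer };
}
```

Returning a new object on every token fits reducer-style state management; a mutable buffer would work equally well where that is not a concern.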

tool_start — Tool Execution Began

Emitted when the agent invokes a tool.

{
  "type": "tool_start",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "tool_name": "web_search",
    "tool_call_id": "call_abc123",
    "input": {
      "query": "climate policy 2025"
    }
  },
  "seq": 43,
  "ts": "2026-03-04T12:34:57.001Z"
}

| Payload field | Type | Description |
|---|---|---|
| tool_name | string | Tool name |
| tool_call_id | string | Unique invocation ID (links to tool_end) |
| input | object | Arguments passed to the tool |

tool_end — Tool Execution Completed

Emitted when a tool returns (success or error).

{
  "type": "tool_end",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "tool_name": "web_search",
    "tool_call_id": "call_abc123",
    "duration_ms": 340,
    "error": null
  },
  "seq": 44,
  "ts": "2026-03-04T12:34:57.341Z"
}

| Payload field | Type | Description |
|---|---|---|
| tool_name | string | Tool name |
| tool_call_id | string | Unique invocation ID (matches tool_start) |
| duration_ms | int | Execution time in milliseconds |
| error | string/null | Error description on failure; null on success |
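
Because tool_start and tool_end share a tool_call_id, a client can pair them to show per-tool progress. The following is a sketch under that assumption; ToolTracker and ToolRun are hypothetical names, and accepting a tool_end with no recorded tool_start (for example after a reconnect) is a design choice of this example.

```typescript
interface ToolStartPayload {
  tool_name: string;
  tool_call_id: string;
  input: Record<string, unknown>;
}

interface ToolEndPayload {
  tool_name: string;
  tool_call_id: string;
  duration_ms: number;
  error: string | null;
}

interface ToolRun {
  toolName: string;
  status: "running" | "succeeded" | "failed";
  durationMs: number | null;
  error: string | null;
}

class ToolTracker {
  // Keyed by tool_call_id, which links tool_start to tool_end.
  private readonly runs = new Map<string, ToolRun>();

  onToolStart(p: ToolStartPayload): void {
    this.runs.set(p.tool_call_id, {
      toolName: p.tool_name,
      status: "running",
      durationMs: null,
      error: null,
    });
  }

  onToolEnd(p: ToolEndPayload): ToolRun {
    // error === null signals success; any string signals failure.
    const run: ToolRun = {
      toolName: p.tool_name,
      status: p.error === null ? "succeeded" : "failed",
      durationMs: p.duration_ms,
      error: p.error,
    };
    this.runs.set(p.tool_call_id, run);
    return run;
  }

  // IDs of tools that have started but not yet finished.
  runningIds(): string[] {
    const ids: string[] = [];
    this.runs.forEach((run, id) => {
      if (run.status === "running") ids.push(id);
    });
    return ids;
  }
}
```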

tool_confirm_request — Tool Awaiting User Confirmation

Emitted when a safety-wrapped tool requires human approval before execution. The agent is blocked until the client sends a tool_confirm response.

The frontend must display a confirmation dialog immediately.

{
  "type": "tool_confirm_request",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "confirmation_id": "conf_3f2504e0",
    "tool": "write_file",
    "parameters": {
      "path": "/home/user/report.md",
      "content": "# Climate Report\n..."
    },
    "message": "Write 2 KB to /home/user/report.md"
  },
  "seq": 45,
  "ts": "2026-03-04T12:34:58.001Z"
}

| Payload field | Type | Description |
|---|---|---|
| confirmation_id | string | Opaque ID to echo in the tool_confirm response |
| tool | string | Tool requiring confirmation |
| parameters | object | Tool call parameters (large values sorted last) |
| message | string | Human-readable description of the action |

agent_state — Agent State Machine Transition

Emitted when the agent transitions between execution phases.

{
  "type": "agent_state",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "state": "thinking"
  },
  "seq": 46,
  "ts": "2026-03-04T12:34:58.050Z"
}

| Payload field | Type | Description |
|---|---|---|
| state | string | One of: idle, thinking, analyzing, researching, deep_thinking, writing, delegating, done, error |

State transitions by mode:

stateDiagram-v2
    direction LR
    [*] --> idle
    idle --> thinking
    thinking --> done : normal
    thinking --> analyzing : think mode
    analyzing --> researching : if web tools
    analyzing --> deep_thinking
    researching --> deep_thinking
    deep_thinking --> done
    thinking --> delegating : delegate mode
    delegating --> done
    done --> [*]
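
The diagram can be encoded as a lookup table, for example to log unexpected transitions during development. This is a sketch only: the table below contains exactly the edges drawn in the diagram, which does not show the writing and error states, so a client should treat a miss as advisory rather than as a protocol violation. The names DIAGRAM_TRANSITIONS and isDiagramTransition are hypothetical.

```typescript
type AgentState =
  | "idle"
  | "thinking"
  | "analyzing"
  | "researching"
  | "deep_thinking"
  | "writing"
  | "delegating"
  | "done"
  | "error";

// Only the edges drawn in the state diagram; "writing" and "error" have none there.
const DIAGRAM_TRANSITIONS: Partial<Record<AgentState, readonly AgentState[]>> = {
  idle: ["thinking"],
  thinking: ["done", "analyzing", "delegating"],
  analyzing: ["researching", "deep_thinking"],
  researching: ["deep_thinking"],
  deep_thinking: ["done"],
  delegating: ["done"],
};

// True when the diagram contains a direct edge from `from` to `to`.
function isDiagramTransition(from: AgentState, to: AgentState): boolean {
  const targets = DIAGRAM_TRANSITIONS[from];
  return targets !== undefined && targets.indexOf(to) !== -1;
}
```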

memory_update — Memory Compaction Occurred

Emitted when the background memory subsystem runs a summarization or compression pass.

{
  "type": "memory_update",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "mode": "conversation",
    "tokens_used": 1200,
    "summarized": true
  },
  "seq": 47,
  "ts": "2026-03-04T12:34:58.200Z"
}

| Payload field | Type | Description |
|---|---|---|
| mode | string | Active memory mode |
| tokens_used | int | Estimated context token count after update |
| summarized | bool | true when an LLM summarization pass ran |

error — Agent-Level Error

Emitted when the agent encounters an error during the turn (not a WebSocket protocol error). The connection stays open; the frontend should display the error in the chat UI.

{
  "type": "error",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "code": "TOOL_EXPANSION_FAILED",
    "message": "web_search could not be loaded: API key not configured."
  },
  "seq": 48,
  "ts": "2026-03-04T12:34:58.300Z"
}

| Payload field | Type | Description |
|---|---|---|
| code | string | Machine-readable error code |
| message | string | Human-readable description safe to display |

done — Agent Turn Complete

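Emitted when the agent turn finishes, either successfully or after an error recovery. It is always the last message of a turn that runs to completion. A turn aborted with a cancel message ends with an error message (code CANCELLED) instead, and no done message is sent.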

{
  "type": "done",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {
    "message_id": "7a3c1b2e-5d4f-11ee-be56-0242ac120002",
    "total_tokens": 1800,
    "input_tokens": 1420,
    "output_tokens": 380,
    "duration_ms": 4200,
    "tool_calls": 3,
    "text": "The capital of France is Paris."
  },
  "seq": 49,
  "ts": "2026-03-04T12:34:59.200Z"
}

| Payload field | Type | Description |
|---|---|---|
| message_id | string | UUID of the AI message created |
| total_tokens | int | Total tokens for this turn |
| input_tokens | int | Input tokens |
| output_tokens | int | Output tokens |
| duration_ms | int | Wall-clock turn duration in milliseconds |
| tool_calls | int | Number of tool invocations |
| text | string | Full assembled agent response text for this turn |

pong — Keepalive Response

Response to a client ping message.

{
  "type": "pong",
  "session_id": "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
  "payload": {},
  "seq": 50,
  "ts": "2026-03-04T12:35:00.001Z"
}

log_line — Live Log Record (log stream only)

Emitted on the /ws/v1/logs endpoint only. Log stream messages are plain JSON objects; they do not use the common ServerMessage envelope (no session_id, seq, or ts fields).

{
  "type": "log_line",
  "level": "INFO",
  "logger": "cogtrix.orchestration.runner",
  "message": "Agent turn completed in 4.2s",
  "timestamp": "2026-03-04T12:34:59.200Z"
}

4.2 Client → Server Messages

user_message — Send a Message Over WebSocket

Alternative to the REST POST /api/v1/sessions/{id}/messages. Useful for low-latency chat UIs that want to avoid an extra HTTP round-trip.

{
  "type": "user_message",
  "payload": {
    "text": "What is the capital of France?",
    "mode": "normal"
  }
}

| Payload field | Type | Description |
|---|---|---|
| text | string | User message text (1–65536 chars) |
| mode | string | normal, think, or delegate |

tool_confirm — User Decision on Tool Confirmation

Must be sent in response to a tool_confirm_request message.

{
  "type": "tool_confirm",
  "payload": {
    "confirmation_id": "conf_3f2504e0",
    "action": "allow"
  }
}

| Payload field | Type | Description |
|---|---|---|
| confirmation_id | string | The confirmation_id from the tool_confirm_request |
| action | string | allow, deny, allow_all, disable, forbid_all, or cancel |

Action semantics (mirrors CLI options):

| Action | CLI key | Description |
|---|---|---|
| allow | y | Allow this invocation once |
| deny | n | Deny this invocation; agent may retry |
| allow_all | a | Auto-approve this tool for the entire session |
| disable | d | Disable this tool for the entire session |
| forbid_all | f | Block all further tool requests this turn |
| cancel | c | Cancel the current agent workflow entirely |
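
A client that mirrors the CLI keys could map a key press to a tool_confirm message as sketched below. The function name buildToolConfirm and the idea of accepting CLI keys in a non-CLI client are assumptions made for illustration; the action values and key letters come from the table above.

```typescript
type ConfirmAction = "allow" | "deny" | "allow_all" | "disable" | "forbid_all" | "cancel";

// CLI key -> action, as listed in the action semantics table.
const CLI_KEY_TO_ACTION = new Map<string, ConfirmAction>([
  ["y", "allow"],
  ["n", "deny"],
  ["a", "allow_all"],
  ["d", "disable"],
  ["f", "forbid_all"],
  ["c", "cancel"],
]);

// Build the JSON text frame that answers a tool_confirm_request.
function buildToolConfirm(confirmationId: string, cliKey: string): string {
  const action = CLI_KEY_TO_ACTION.get(cliKey.toLowerCase());
  if (action === undefined) {
    throw new Error(`unknown confirmation key: ${cliKey}`);
  }
  return JSON.stringify({
    type: "tool_confirm",
    // Echo the opaque confirmation_id from the request unchanged.
    payload: { confirmation_id: confirmationId, action },
  });
}
```

For example, buildToolConfirm("conf_3f2504e0", "y") yields the tool_confirm message shown earlier in this section, serialized without whitespace.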

ping — Keepalive

The client must send a ping every 30 seconds. The server drops connections that stay silent for 90 seconds.

{
  "type": "ping",
  "payload": {}
}
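
The keepalive timing can be expressed as a small decision function that a client timer calls periodically. This is a sketch: it assumes the 90-second window is measured from the last ping the client sent, which this document does not spell out, and keepaliveAction is a hypothetical name.

```typescript
const PING_INTERVAL_MS = 30_000; // send a ping every 30 seconds
const SERVER_IDLE_TIMEOUT_MS = 90_000; // server drops silent connections after 90 seconds

type KeepaliveAction = "none" | "send_ping" | "presumed_dropped";

// Decide what to do given when the last ping was sent and the current time (both in ms).
function keepaliveAction(lastPingSentAtMs: number, nowMs: number): KeepaliveAction {
  const silentForMs = nowMs - lastPingSentAtMs;
  if (silentForMs >= SERVER_IDLE_TIMEOUT_MS) return "presumed_dropped";
  if (silentForMs >= PING_INTERVAL_MS) return "send_ping";
  return "none";
}
```

Passing the clock in as an argument keeps the function deterministic, which makes the thresholds easy to unit-test.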

cancel — Cancel Current Agent Turn

Signals the server to abort the in-progress agent turn. The server transitions to agent_state: idle first, then sends an error message with code CANCELLED. A done message is not sent. The connection remains open for the next turn.

{
  "type": "cancel",
  "payload": {}
}
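
Since a cancelled turn ends without a done message, a client that waits only for done would stall after a cancel. A minimal sketch of a turn-end check follows; turnOutcome is a hypothetical helper, and it deliberately returns null for other error codes because this document does not state whether done follows them.

```typescript
interface IncomingMessage {
  type: string;
  payload: Record<string, unknown>;
}

type TurnOutcome = "completed" | "cancelled" | null;

// Returns how the turn ended, or null when the message does not end the turn.
function turnOutcome(message: IncomingMessage): TurnOutcome {
  if (message.type === "done") return "completed";
  // After a cancel, the error with code CANCELLED takes the place of done.
  if (message.type === "error" && message.payload["code"] === "CANCELLED") {
    return "cancelled";
  }
  return null;
}
```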

5. Connection Lifecycle

sequenceDiagram participant C as Client participant S as Server C->>S: WS connect + JWT Note right of S: validate token & session ownership S->>C: agent_state (idle) C->>S: user_message / POST REST S->>C: agent_state (thinking) S->>C: tool_start (web_search) S->>C: tool_confirm_request Note right of S: if tool needs confirmation C->>S: tool_confirm (allow) S->>C: tool_end (web_search) S->>C: agent_state (analyzing) Note right of S: think mode: classifying task S->>C: agent_state (researching) Note right of S: think mode: research delegate S->>C: agent_state (deep_thinking) Note right of S: think mode: deep reasoning S->>C: agent_state (delegating) Note right of S: delegate mode: parallel delegation S->>C: agent_state (writing) S->>C: token ("The capital") S->>C: token (" is Paris") S->>C: memory_update Note right of S: if summarization ran S->>C: agent_state (done) S->>C: done C->>S: ping Note right of S: every 30s S->>C: pong

6. Error Handling

Sending a message while a turn is in progress

If a user_message arrives while an agent turn is already running, the server sends an error message with code TURN_IN_PROGRESS. The connection remains open and the in-progress turn is unaffected. Wait for the done message before sending the next user_message.

Agent crashes mid-stream

If the agent raises an unrecoverable exception during a turn:

  1. Server sends type: error with a descriptive error code and message.
  2. Server transitions to agent_state: error, then agent_state: idle.
  3. The connection remains open for the next turn.

WebSocket protocol errors

| Close code | Meaning |
|---|---|
| 4000 | Session registry unavailable — server is still starting up |
| 4001 | Unauthorized — no token, invalid signature |
| 4003 | Forbidden — valid token, wrong role or session ownership |
| 4004 | Session not found — session does not exist or was archived |
| 1000 | Normal closure |
| 1001 | Server going away (shutdown); also used when a second connection replaces the first |
| 1011 | Internal server error |
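
A client might translate these close codes into a follow-up action, as in the sketch below. The policy itself is an assumption, not something this protocol prescribes: the CloseAction values and actionForCloseCode are hypothetical. Note also that 1001 is used when a second connection replaces the first, so blindly retrying on 1001 could make two tabs evict each other.

```typescript
type CloseAction = "finished" | "retry" | "reauthenticate" | "give_up";

// Map a WebSocket close code to a suggested client reaction (illustrative policy).
function actionForCloseCode(code: number): CloseAction {
  switch (code) {
    case 1000: // normal closure
      return "finished";
    case 4001: // unauthorized: obtain a fresh token before reconnecting
      return "reauthenticate";
    case 4003: // forbidden: retrying with the same identity will not help
    case 4004: // session not found or archived
      return "give_up";
    case 4000: // server still starting up
    case 1001: // server going away, or replaced by a second connection
    case 1011: // internal server error
      return "retry";
    default: // unknown codes, e.g. abnormal network closure
      return "retry";
  }
}
```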

7. Reconnection Strategy

The seq field enables the frontend to detect dropped messages and recover:

  1. Store last_seen_seq in memory (reset to -1 on new page load).
  2. On reconnect, send ?last_seq=<last_seen_seq> as a query parameter.
  3. The server replays buffered messages with seq > last_seq (buffer kept 30 s post-disconnect).
  4. If last_seq is too old (buffer expired), the server sends the current state only.
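
The steps above can be sketched as a small tracker that records the last seen seq, reports gaps, and builds the reconnect URL. SeqTracker is a hypothetical name. The sketch assumes seq values continue across a replay so that comparisons stay meaningful, and it omits the last_seq parameter when nothing has been seen yet, which is a choice of this example.

```typescript
class SeqTracker {
  // -1 means no message has been seen since page load.
  private lastSeenSeq = -1;

  // Record a message's seq; returns how many messages were skipped before it.
  observe(seq: number): number {
    if (seq <= this.lastSeenSeq) return 0; // duplicate or already-replayed message
    // The first message after page load gives no baseline, so no gap is reported.
    const missed = this.lastSeenSeq < 0 ? 0 : seq - this.lastSeenSeq - 1;
    this.lastSeenSeq = seq;
    return missed;
  }

  // Append ?last_seq=<n> so the server can replay buffered messages with seq > n.
  reconnectUrl(baseUrl: string): string {
    if (this.lastSeenSeq < 0) return baseUrl;
    const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
    return `${baseUrl}${separator}last_seq=${this.lastSeenSeq}`;
  }
}
```

A non-zero return value from observe is a signal to fetch message history via REST, since the replay buffer may no longer cover the gap.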

Recommended reconnect strategy:

  • Immediate reconnect on first disconnect.
  • Exponential backoff: 1s → 2s → 4s → 8s → 16s → cap at 30s.
  • Stop retrying after 10 consecutive failures; show the user an error.
  • On successful reconnect, fetch message history via REST to fill any gap.
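
The recommended schedule can be written as a pure function of the number of consecutive failures. This is a sketch; reconnectDelayMs and the convention of returning null to mean "stop retrying" are choices made for this example.

```typescript
const MAX_CONSECUTIVE_FAILURES = 10;
const MAX_DELAY_MS = 30_000;

// failures = consecutive failed reconnect attempts so far (0 right after the first disconnect).
// Returns the delay before the next attempt in ms, or null when the client should stop.
function reconnectDelayMs(failures: number): number | null {
  if (failures >= MAX_CONSECUTIVE_FAILURES) return null; // show the user an error
  if (failures === 0) return 0; // immediate reconnect on first disconnect
  // 1 s, 2 s, 4 s, 8 s, 16 s, then capped at 30 s
  return Math.min(1000 * Math.pow(2, failures - 1), MAX_DELAY_MS);
}
```

The failure counter should be reset to 0 after any successful reconnect, since the limit applies to consecutive failures.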

8. Log Stream WebSocket (/ws/v1/logs)

Admin-only endpoint for live log streaming.

ws://host/ws/v1/logs?token=<jwt>&level=INFO

Query parameters:

  • token — JWT bearer token (admin required).
  • level — minimum log level to stream: DEBUG, INFO, WARNING, ERROR (default INFO).

Streams log_line messages as they are emitted. The same keepalive timing applies as for session WebSockets (connections are dropped after 90 s of silence), but the keepalive frame is the plain text string "ping", not the ClientMessage JSON envelope. The server responds with {"type": "pong"}.
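
A minimal sketch of the client side of the log stream: building the URL from the query parameters above and telling log_line frames apart from the pong reply. The names logStreamUrl, parseLogFrame, and LOG_PING_FRAME are hypothetical; the URL shape and message fields follow this document.

```typescript
type LogLevel = "DEBUG" | "INFO" | "WARNING" | "ERROR";

interface LogLine {
  type: "log_line";
  level: string;
  logger: string;
  message: string;
  timestamp: string;
}

// Keepalive on the log stream is a plain text frame, not a JSON envelope.
const LOG_PING_FRAME = "ping";

// Build the log stream URL; the token is percent-encoded for use in a query string.
function logStreamUrl(host: string, token: string, level: LogLevel = "INFO"): string {
  return `ws://${host}/ws/v1/logs?token=${encodeURIComponent(token)}&level=${level}`;
}

// Returns a LogLine, the string "pong" for a keepalive reply, or null for anything else.
function parseLogFrame(raw: string): LogLine | "pong" | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const { type, level, logger, message, timestamp } = data as Record<string, unknown>;
  if (type === "pong") return "pong";
  if (
    type !== "log_line" ||
    typeof level !== "string" ||
    typeof logger !== "string" ||
    typeof message !== "string" ||
    typeof timestamp !== "string"
  ) {
    return null;
  }
  return { type: "log_line", level, logger, message, timestamp };
}
```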