Cogtrix Configuration Reference
This page covers every way to configure Cogtrix — from the simplest environment variable to a full multi-provider config file. If you just want to get running, the Quick Start in the README is all you need; come back here when you want to customize.
Table of Contents
- Configuration Priority
- Configuration File
- Environment Variables
- Command Line Arguments
- Complete Configuration Example
- Migration Guide
- Debugging & Logging
Configuration Priority
Configuration is loaded from multiple sources with the following priority (highest to lowest):
- Command line arguments: override everything
- Environment variables: override the config file
- Configuration file (.cogtrix.json/.cogtrix.yml/.cogtrix.yaml): base settings
- Built-in defaults: fallback values
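The layering above can be pictured as a merge from lowest to highest priority. The sketch below is illustrative only: `resolve_config` is a hypothetical helper, not a Cogtrix function, and it assumes a shallow merge in which unset (`None`) values never override a lower layer.

```python
from typing import Any, Mapping


def resolve_config(
    defaults: Mapping[str, Any],
    config_file: Mapping[str, Any],
    env: Mapping[str, Any],
    cli: Mapping[str, Any],
) -> dict:
    """Merge layers from lowest to highest priority; later layers win."""
    merged: dict = {}
    for layer in (defaults, config_file, env, cli):
        # Skip None so a layer only overrides the keys it actually sets.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


settings = resolve_config(
    defaults={"session": "default", "prompt_optimizer": True},
    config_file={"session": "work"},
    env={"prompt_optimizer": False},
    cli={"session": "scratch"},
)
print(settings)
```

Here the CLI value for `session` wins over the config file, and the environment value for `prompt_optimizer` wins over the built-in default.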
Configuration File
Both JSON and YAML formats are supported. Create a config file in one of these locations (first found wins):
- ./.cogtrix.json
- ./.cogtrix.yml or ./.cogtrix.yaml
- ~/.cogtrix.json
- ~/.cogtrix.yml or ~/.cogtrix.yaml
- ~/.config/cogtrix/cogtrix.json
- ~/.config/cogtrix/cogtrix.yml or ~/.config/cogtrix/cogtrix.yaml
Within each directory, JSON is checked first, then .yml, then .yaml.
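As a rough sketch of that search order, the snippet below builds the candidate list and returns the first file that exists. The function names are hypothetical and the real loader may differ in detail; only the paths and their order come from the list above.

```python
from pathlib import Path
from typing import List, Optional

# Within each directory: JSON first, then .yml, then .yaml.
EXTENSIONS = ("json", "yml", "yaml")


def candidate_paths(cwd: Path, home: Path) -> List[Path]:
    """Return every config path Cogtrix might check, in search order."""
    locations = [
        (cwd, ".cogtrix"),
        (home, ".cogtrix"),
        (home / ".config" / "cogtrix", "cogtrix"),
    ]
    return [d / f"{stem}.{ext}" for d, stem in locations for ext in EXTENSIONS]


def find_config(cwd: Path, home: Path) -> Optional[Path]:
    """First existing candidate wins; None means built-in defaults apply."""
    return next((p for p in candidate_paths(cwd, home) if p.is_file()), None)


for path in candidate_paths(Path.cwd(), Path.home()):
    print(path)
```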
General Settings
session: default
| Option | Type | Default | Description |
|---|---|---|---|
session | string | "default" | Session ID for memory persistence |
The active model is selected via models.default (see Models Section). The legacy top-level provider and model keys still work but are deprecated — they are auto-migrated at load time.
Cron Jobs
Define recurring jobs in the config file so they are loaded at startup. Each job can run in a fresh isolated context or inherit the current session state when the host process provides an inherited-context runner.
cron:
  - name: nightly status
    schedule: "0 2 * * *"
    prompt: "Summarize the latest team status."
    context: inherit
| Option | Type | Default | Description |
|---|---|---|---|
name | string | "" | Human-readable label used in cron_list |
schedule | string | required | 5- or 6-field cron expression |
prompt | string | required | Prompt to send when the job fires |
context | string | "fresh" | fresh uses an isolated invocation; inherit reuses the current session history and tools when available |
Providers Section
Providers hold connection info only — the type, endpoint, and credentials needed to reach an LLM API. Model settings (model name, temperature, context window, max tokens) live in the Models Section instead.
providers:
  spark-cluster:
    type: openai
    base_url: "http://192.168.70.254:8080/v1"
    api_key: "sk-..."
  openai:
    type: openai
    api_key: "sk-..."
  local:
    type: ollama
    base_url: "http://localhost:11434"
  groq:
    type: openai
    base_url: "https://api.groq.com/openai/v1"
    api_key: "gsk-..."
Note: The key "providers" is preferred. The legacy key "inference" still works as an alias for backward compatibility.
Provider Options
| Option | Type | Required | Description |
|---|---|---|---|
type | string | Yes | Provider type: "openai", "ollama", "anthropic", or "google" (case-insensitive) |
base_url | string | No | API endpoint URL |
api_key | string | No | API key. Omit or leave empty for unauthenticated OpenAI-compatible endpoints (vLLM, LM Studio). Required for OpenAI, Groq, Together, Anthropic, Google, xAI, and DeepSeek. Not used by Ollama. |
tool_instructions | string | No | Custom tool-call formatting instructions appended to the system prompt. Not injected by default — bind_tools() handles formatting at the API level. Set a non-empty string only for providers that need explicit guidance. |
Provider Types
| Type | Use For | Default Model | Default Base URL |
|---|---|---|---|
openai | OpenAI, Groq, Together, vLLM, LocalAI | gpt-4.1-mini | https://api.openai.com/v1 |
ollama | Ollama servers | qwen3:8b | http://localhost:11434 |
anthropic | Anthropic Claude | claude-sonnet-4-5 | SDK default |
google | Google Gemini | gemini-2.5-flash | SDK default |
xAI (Grok) and DeepSeek use type: openai with a custom base_url (https://api.x.ai/v1 and https://api.deepseek.com/v1 respectively). The setup wizard offers both as named choices and auto-detects XAI_API_KEY / DEEPSEEK_API_KEY from the environment.
Optional dependencies: langchain-anthropic (uv pip install "cogtrix[anthropic]"), langchain-google-genai (uv pip install "cogtrix[google]").
Memory Section
Configure memory management:
memory:
  mode: conversation
  modes:
    conversation:
      working_memory_size: 25
      summarization: true
      vector_recall_k: 3
    code:
      working_memory_size: 30
      max_files: 20
      summarization: true
      vector_recall_k: 3
    reasoning:
      working_memory_size: 30
      max_decisions: 20
      summarization: true
      vector_recall_k: 3
| Option | Type | Default | Description |
|---|---|---|---|
mode | string | "conversation" | Active memory mode |
modes | object | {} | Mode-specific configurations |
Hybrid Memory Options (per mode)
All modes support hybrid memory — a combination of a sliding window, incremental summarization, and optional vector recall that keeps the agent aware of older conversation context.
| Option | Type | Default | Description |
|---|---|---|---|
summarization | bool | true | Enable LLM-based rolling summary of older messages. Set to false to save LLM calls on metered APIs. |
vector_recall_k | int | 3 | Number of semantically similar past exchanges to retrieve per turn. Set to 0 to disable vector recall. |
Hybrid memory is automatically enabled when an LLM is available. The vector recall component additionally requires an embedding provider — Cogtrix attempts to auto-detect one at startup (tries Ollama’s nomic-embed-text first, then falls back to OpenAI if OPENAI_API_KEY is set). If no embedding provider is available, vector recall is silently skipped while summarization still functions normally.
See Memory modes for detailed mode options and a full explanation of the hybrid memory system.
RAG Section
Configure document ingestion for knowledge base:
rag:
  docs_dir: docs
  vectordb_dir: vectordb
  chunk_size: 2000
  chunk_overlap: 200
  model: embed-local
| Option | Type | Default | Description |
|---|---|---|---|
docs_dir | string | "docs" | Source documents directory |
vectordb_dir | string | "vectordb" | Vector database output directory |
chunk_size | int | 2000 | Text chunk size in characters |
chunk_overlap | int | 200 | Overlap between chunks |
model | string | null | Model name from the models registry to use for embeddings. Falls back to the active provider when not set. |
Note: The model field references a named entry in the top-level models registry. Define an embedding model there and point rag.model at it. The provider connection details (type, base_url, api_key) are resolved automatically from the matching provider config.
See RAG / knowledge base for detailed setup instructions.
Models Section
The models registry assigns short names to specific provider/model combinations. All model settings (model name, temperature, context window, max output tokens) live here. Providers hold only connection info.
The models.default key selects which model alias is active when Cogtrix starts. It is the primary way to choose which model to use.
models:
  default: oss # active model alias at startup
  oss:
    provider: spark-cluster
    model: gpt-oss
    temperature: 0.5
  gpt-4o:
    provider: openai
    model: gpt-4o
    temperature: 0.7
  local-qwen:
    provider: local
    model: qwen3:8b
    context_window: 131072
  embed:
    provider: spark-cluster
    model: qwen3-embedding
    temperature: 0.0
  regular: spark-cluster/gpt-oss # string shorthand
The models registry is used by:
- models.default, which selects the active model alias at startup
- The -m CLI flag, to start Cogtrix with any model alias: python cogtrix.py -m gpt-4o
- The /model command, to switch at runtime: /model local-qwen
- The delegation tools, where the agent uses model aliases to pick the best model for a subtask
- The rag.model field, to reference an embedding model by alias
- The context_compression.model field, to reference a compression model by alias

Backward compatibility: The key model_aliases still works in config files as an alias for models. New configs should use models.
Model Entry Formats
String shorthand — "provider/model" creates a minimal model entry with no overrides:
models:
  regular: spark-cluster/gpt-oss
  fast: local/qwen3:8b
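To make the shorthand concrete, here is a small sketch of how a "provider/model" string could be expanded into the object form. `parse_model_entry` is a hypothetical name, and splitting on the first slash is an assumption (it keeps model names such as qwen3:8b, or names containing further slashes, intact); the actual Cogtrix parser is not shown in this reference.

```python
from typing import Any, Dict, Mapping, Union


def parse_model_entry(entry: Union[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Normalize a models registry entry to the object form."""
    if isinstance(entry, str):
        # Assumption: everything before the first "/" is the provider key.
        provider, sep, model = entry.partition("/")
        if not sep or not provider or not model:
            raise ValueError(f"expected 'provider/model', got {entry!r}")
        return {"provider": provider, "model": model}
    # Object form: keep every field (temperature, context_window, ...) as given.
    return dict(entry)


print(parse_model_entry("local/qwen3:8b"))
print(parse_model_entry({"provider": "local", "model": "qwen3-coder", "temperature": 0.3}))
```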
Object format — full control over all model-level settings:
models:
  coder:
    provider: local
    model: qwen3-coder
    temperature: 0.3
    context_window: 32768
    max_tokens: 8192
Model Object Fields
| Field | Type | Required | Description |
|---|---|---|---|
provider | string | Yes | References a key in the providers section |
model | string | Yes | Model name as the provider expects it |
temperature | float | No | Sampling temperature, 0.0–2.0 |
context_window | int | No | Context window size in tokens (>= 256). Forwarded to Ollama as num_ctx; silently ignored for OpenAI, Anthropic, and Google. Accepted aliases: context_length, num_ctx. |
max_tokens | int | No | Maximum output tokens per LLM call (>= 1) |
Using Models
python cogtrix.py -m oss # Use the "oss" model alias
python cogtrix.py -m local-qwen # Use local-qwen with its configured context_window
At runtime:
You: /model gpt-4o
Switched to model gpt-4o (openai)
The /model command lists all aliases with an active marker (*) next to the current selection. The /provider command is read-only — use /model to switch models.
Delegate Section
Configure task delegation to other models:
delegate:
  enabled: true
  default_timeout: 60
  allowed_models:
    - coder
    - smart
    - fast
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable/disable delegation |
default_timeout | int | 60 | Default timeout in seconds |
allowed_models | array | All models | Model names from the models registry the agent may delegate to |
allowed_providers | array | All providers | Provider names allowed for delegation |
allowed_models restricts which model names the agent may use when delegating. If omitted, all entries in the models registry are available. This is the recommended way to control delegation scope — configure a broad set of models in models, then whitelist a subset in allowed_models:
models:
  fast: my-server/qwen3:8b
  smart: openai/gpt-4.1
  coder: my-server/qwen3-coder
delegate:
  enabled: true
  allowed_models: [coder, smart] # agent can only delegate to these two
allowed_providers restricts by provider name and is an additional guard. Both checks must pass for delegation to proceed.
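The two guards can be read as a logical AND. The following sketch uses a hypothetical `delegation_allowed` helper and assumes that an omitted list (`None`) means "no restriction", as described for allowed_models above.

```python
from typing import Dict, Iterable, Optional


def delegation_allowed(
    alias: str,
    models: Dict[str, Dict[str, str]],
    allowed_models: Optional[Iterable[str]] = None,
    allowed_providers: Optional[Iterable[str]] = None,
) -> bool:
    """Return True only if both the model check and the provider check pass."""
    entry = models.get(alias)
    if entry is None:
        return False  # unknown alias: nothing to delegate to
    if allowed_models is not None and alias not in set(allowed_models):
        return False
    if allowed_providers is not None and entry["provider"] not in set(allowed_providers):
        return False
    return True


registry = {
    "fast": {"provider": "my-server", "model": "qwen3:8b"},
    "smart": {"provider": "openai", "model": "gpt-4.1"},
    "coder": {"provider": "my-server", "model": "qwen3-coder"},
}
print(delegation_allowed("coder", registry, allowed_models=["coder", "smart"]))
print(delegation_allowed("fast", registry, allowed_models=["coder", "smart"]))
```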
Backward compatibility: delegate.models still works for defining models scoped to the delegate section. If both top-level models and delegate.models are present, the top-level definition takes priority. The older delegate.model_aliases key is also still recognized.
Research Delegate Section
When the user requests deep reasoning (via /think or “think deeply” in a prompt) and the agent has
used web tools during its initial research, Cogtrix can spawn a research delegate — a sub-agent
that re-fetches the same URLs with a much larger context budget and extracts structured, verbatim
specifications instead of lossy summaries. The extracted content is then fed into the deep_think
engine as high-fidelity context.
research_delegate:
  enabled: true
  cap_ratio: 0.85
  timeout: 300
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable/disable the research delegate pipeline |
cap_ratio | float | 0.85 | Proportion of max_context_tokens allocated to the delegate’s tool output cap. Higher values let the delegate load more page content. Clamped to 0.50–0.95. |
timeout | int | 300 | Maximum seconds for the delegate agent to run. Clamped to 60–600. |
auto | bool | false | When true, automatically trigger the research delegate whenever the agent’s tool output exceeds auto_threshold of the context window. |
auto_threshold | float | 0.50 | Fraction of context used by tool output that triggers automatic delegation when auto: true. |
How it works:
- The main agent runs its initial research (web searches, content fetching) with the normal output cap.
- Cogtrix extracts the URLs the agent visited from its tool call history.
- A research delegate agent is spawned with the same provider/model configuration. Its web tools are temporarily patched to allow output up to cap_ratio × max_context_tokens × 4 characters.
- The delegate is instructed to fetch each URL and extract exact specifications (schemas, field names, code examples, file paths) without summarizing or paraphrasing.
- The delegate’s structured output replaces the raw tool dumps as primary context for deep_think.
- After the delegate finishes (or times out), the original tool output caps are restored.
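As a worked example of the cap formula, the sketch below computes the character budget for a given context size. The helper name is hypothetical, the clamp bounds come from the options table, and truncating to an integer is an assumption.

```python
def delegate_output_cap_chars(max_context_tokens: int, cap_ratio: float = 0.85) -> int:
    """Character cap for the delegate's web tools: cap_ratio x tokens x 4."""
    # cap_ratio is documented as clamped to the range 0.50 to 0.95.
    ratio = min(max(cap_ratio, 0.50), 0.95)
    return int(ratio * max_context_tokens * 4)


# With a 131072-token context window and the default ratio:
print(delegate_output_cap_chars(131072))        # 445644
# Out-of-range ratios are clamped before use:
print(delegate_output_cap_chars(131072, 1.5))   # 498073
```

With the default ratio, a 131072-token window therefore allows roughly 445 000 characters of tool output for the delegate.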
When to tune:
- Set enabled: false if you don’t use web research with deep thinking, or if you want to save LLM calls on a metered API.
- Increase cap_ratio toward 0.95 if the delegate’s output is being truncated and you have a large context window.
- Increase timeout if the delegate is timing out on slow models or large pages.
Decision Accountability
Decision accountability adds an explicit self-debate layer to the agent’s reasoning. When enabled, the agent is instructed to produce a structured plan with assumptions and evidence, then generate a counter-plan (“why this might be wrong”) before acting. Responses where the adjusted confidence falls below the threshold receive an uncertainty note so you can review before proceeding.
Off by default — this feature is opt-in. Enable it for high-stakes autonomous work (code changes, shell commands, API calls) where traceable reasoning matters.
decision_accountability:
  enabled: true
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | false | Enable the self-debate prompt and response parsing. Off by default — opt in when you need traceable reasoning. |
min_confidence_threshold | float | 7.0 | Adjusted confidence (0–10) below which the agent appends an uncertainty note. Adjusted confidence = base confidence − 1.0 per identified critical flaw. |
require_counter_plan | bool | true | When true, the accountability prompt instructs the agent to always produce a counter-plan before acting. |
report_uncertainty | bool | true | When true, responses where adjusted confidence falls below the threshold receive a visible ⚠️ Decision accountability: note. |
How it works:
- When enabled: true, Cogtrix appends the accountability block to the system prompt at session start. The block instructs the agent to structure each action plan using named delimiters (---PLAN---, ---ASSUMPTIONS---, ---EVIDENCE---, ---CONFIDENCE---, ---COUNTER-PLAN---, ---FLAWS---).
- After every model response, Cogtrix parses this structure from the response text.
- The confidence score is adjusted: each identified critical flaw reduces it by 1.0.
- When the adjusted confidence falls below min_confidence_threshold and report_uncertainty: true, a note is appended to the response:
⚠️ Decision accountability: confidence 5.0/10 with 2 critical flaw(s): Missing validation;
No rollback path. Adjusted confidence 3.0/10 is below threshold 7.0. Proceeding with caution.
- The full structured output (plan, assumptions, evidence, counter-plan, flaws, confidence) is logged at INFO level under decision_accountability: for auditing.
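The confidence adjustment can be sketched as follows. `accountability_note` is a hypothetical helper that mirrors the documented arithmetic (base confidence minus 1.0 per critical flaw) and the example note format; flooring the result at 0 is an assumption, and parsing of the delimited response is left out.

```python
from typing import List, Optional


def accountability_note(
    base_confidence: float,
    critical_flaws: List[str],
    threshold: float = 7.0,
    report_uncertainty: bool = True,
) -> Optional[str]:
    """Return the uncertainty note, or None when no note should be appended."""
    # Each critical flaw costs 1.0; the floor at 0 is an assumption.
    adjusted = max(0.0, base_confidence - 1.0 * len(critical_flaws))
    if not report_uncertainty or adjusted >= threshold:
        return None
    return (
        f"⚠️ Decision accountability: confidence {base_confidence:.1f}/10 with "
        f"{len(critical_flaws)} critical flaw(s): {'; '.join(critical_flaws)}. "
        f"Adjusted confidence {adjusted:.1f}/10 is below threshold {threshold:.1f}. "
        "Proceeding with caution."
    )


print(accountability_note(5.0, ["Missing validation", "No rollback path"]))
print(accountability_note(9.0, ["Minor naming issue"]))
```

The first call reproduces the example note (5.0 minus two flaws gives 3.0, below 7.0); the second returns None because 8.0 is still above the threshold.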
Interaction with /think:
Decision accountability and Deep Think (/think) are independent features. Deep Think explores multiple solution branches in parallel. Decision accountability adds a plan/counter-plan layer to every agent action turn. Both can be active at the same time.
When to use:
- Enable for agents running autonomously on sensitive tasks (file edits, git operations, deployments).
- Keep disabled for conversational sessions, simple lookups, or when using fast/small models where the extra prompt tokens would hurt performance.
- The additional ~600 tokens in the system prompt add roughly 0.2–0.5s to TTFT depending on provider; no extra LLM calls are made (the agent reasons within its own response).
Task Ownership Classifier
The task ownership classifier analyses each user prompt before the agent starts to determine whether the request asks the agent to execute an action or explain how to do it. This prevents the agent from acting when the user only wants information (e.g. “check how to install gh” should explain, not install).
On by default. Disable only if you want the agent to always proceed without ownership analysis.
task_ownership_classifier:
  enabled: true # set to false to disable entirely
  llm_fallback: false # when true, calls the LLM for ambiguous prompts (adds latency)
  ambiguous_action: ask # what to do when ownership is ambiguous: ask | inform | execute
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable pre-prompt ownership classification. |
llm_fallback | bool | false | Use an LLM micro-call for borderline cases. Improves accuracy at the cost of added latency. Off by default. |
ambiguous_action | string | "ask" | How to handle a prompt whose ownership cannot be determined. ask — inject a clarification constraint into the system prompt and run the agent normally; the agent asks one focused question and does not execute until intent is confirmed; inform — treat as informational; execute — treat as execution request. |
Pre-Action Confirmation
When enabled, the agent is instructed to request explicit confirmation before any irreversible operation (delete, uninstall, deploy to production, drop table/database, overwrite data, format/wipe storage). Before executing such a tool, the agent states exactly what it is about to do and asks “Shall I proceed?” — it waits for explicit consent before continuing.
Consent is recognized when the user’s reply contains execution keywords: go ahead, yes do it, proceed, confirmed, yes install it, etc. Without such confirmation, the agent does not call the tool.
For the pre-execution safety gate available independently of this setting, see Task Ownership Classifier above — it constrains the agent to explain rather than act when it detects informational or advisory intent, and prompts for clarification on ambiguous requests.
pre_action_confirmation:
  enabled: false # set to true to require confirmation before irreversible operations
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | false | When true, the agent requires explicit confirmation before irreversible operations. The confirmation prompt is injected into the system prompt at session start. |
Prompt Optimizer
The prompt optimizer preprocesses complex user prompts before the agent executes them. It uses a one-shot LLM call to evaluate whether the prompt needs restructuring and rewrites it with a high-level approach and practical guardrails if needed.
prompt_optimizer: true
| Option | Type | Default | Description |
|---|---|---|---|
prompt_optimizer | bool | true | Enable/disable prompt optimization before agent execution |
How it works:
- Prompts shorter than 400 characters skip optimization entirely (no LLM call).
- The LLM evaluates the prompt — if already clear and actionable, it returns it unchanged.
- If the prompt is complex or vague, it rewrites it to preserve the goal, add a high-level approach (phases/steps), and include practical guardrails.
- The optimizer’s system instructions are ephemeral — they do not persist in conversation history or affect subsequent prompts.
Important: The original prompt is always used for deep-think detection (_user_wants_deep_think) and memory context preparation. Only run_agent() receives the optimized version.
Set prompt_optimizer: false to disable this feature (e.g., when running automated pipelines where prompts are already structured).
Context Compression
During long agent runs, tool outputs (file contents, shell output, search results) accumulate in the message history and are re-sent to the LLM on every cycle. Context compression summarizes old, large ToolMessages before each LLM call to reduce per-cycle token usage while preserving important context.
# Simple toggle
context_compression: true
# Or with custom thresholds
context_compression:
  model: fast
  min_age: 8 # call_model cycles before eligible (default: 6)
  min_chars: 6000 # minimum content length to qualify (default: 2000)
# Hard cap on retained history length
context_max_messages: 200
| Option | Type | Default | Description |
|---|---|---|---|
context_compression | bool or object | true | Enable/disable context compression, or configure thresholds |
enabled | bool | true | Enable/disable compression when using the object form |
model | string | null | Model alias or provider/model string for a dedicated compression LLM. Uses the main agent LLM when not set. |
min_age | int | 6 | Number of call_model cycles a ToolMessage must survive before it becomes eligible for compression |
min_chars | int | 2000 | Minimum character length of a ToolMessage’s content to qualify for compression |
context_max_messages | int | 200 | Maximum message count retained before the oldest messages are dropped with pair-safe truncation |
How it works:
- On each call_model cycle, the compression pass checks whether total message size exceeds 72% of the context window.
- ToolMessages that are both old enough (age >= min_age) and large enough (length >= min_chars) are compressed. Multiple eligible messages are compressed in parallel (up to 4 concurrent LLM calls).
- The LLM preserves file paths, error messages, stack traces, line numbers, schemas, exact values, and code snippets while removing verbose prose and boilerplate.
- When model is set, a dedicated LLM is used for compression instead of the main agent model; a smaller/faster model reduces latency.
- Compressed messages are cached by tool_call_id to avoid re-summarizing.
- Compression operates on a copy of the message list; graph state is never mutated.
- On LLM failure, the compressor falls back to middle-truncation (_truncate_tool_output).
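The eligibility rule can be illustrated with a short sketch. Everything here is a simplification under stated assumptions: `ToolMessage` is a stand-in dataclass rather than the real message type, sizes are measured in characters (the unit of the 72% check is not specified above), and the summarization call itself is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolMessage:
    """Hypothetical stand-in for a tool result in the message history."""
    tool_call_id: str
    content: str
    age: int  # number of call_model cycles this message has survived


def select_for_compression(
    messages: List[ToolMessage],
    context_window_chars: int,
    min_age: int = 6,
    min_chars: int = 2000,
    trigger_ratio: float = 0.72,
) -> List[ToolMessage]:
    """Return the messages that would be summarized on this cycle."""
    total = sum(len(m.content) for m in messages)
    if total <= trigger_ratio * context_window_chars:
        return []  # history still fits comfortably; nothing to do
    # A message must be both old enough and large enough.
    return [m for m in messages if m.age >= min_age and len(m.content) >= min_chars]


history = [
    ToolMessage("a", "x" * 5000, age=7),   # old and large: eligible
    ToolMessage("b", "y" * 5000, age=2),   # large but too recent
    ToolMessage("c", "z" * 100, age=9),    # old but too small
]
print([m.tool_call_id for m in select_for_compression(history, 12000)])
```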
When to tune:
- Set context_compression: false if you have a very large context window and want to avoid the extra LLM calls.
- Set model to a fast/cheap model alias to avoid using the main agent model for compression. Without this, each compression call uses the same (potentially slow) model.
- Increase min_age if you find recent tool outputs are being compressed too early.
- Increase min_chars to only compress very large outputs (e.g., full file contents).
- Lower context_max_messages if long-running sessions accumulate too much history; Cogtrix trims from the oldest end without splitting AI/tool pairs.
Parallel Tool Execution
When the LLM emits multiple tool calls in a single response, Cogtrix can execute them concurrently using a thread pool instead of processing them sequentially.
parallel_tool_execution: true
| Option | Type | Default | Description |
|---|---|---|---|
parallel_tool_execution | bool | true | Enable/disable concurrent execution of independent tool calls |
How it works:
- When the LLM returns multiple tool calls, a classification pass splits them into two groups:
  - Serial-first: request_tools calls and calls to tools not yet loaded (require auto-expansion). These run sequentially first.
  - Parallel: all other calls to already-active tools. These run concurrently via a ThreadPoolExecutor (up to 8 workers).
- A single tool call in a batch skips pool overhead and runs inline.
- UserCancelledRun from any tool stops all remaining execution immediately.
- The system prompt instructs models to batch independent operations when possible.
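The classification and dispatch steps above might look roughly like this. It is a sketch under assumptions: tool calls are plain dicts with hypothetical "id" and "name" keys, `run_one` is whatever callable executes a single call, and cancellation handling (UserCancelledRun) is not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Set, Tuple

Call = Dict[str, Any]


def classify_calls(calls: List[Call], active_tools: Set[str]) -> Tuple[List[Call], List[Call]]:
    """Split a batch into serial-first calls and parallel-safe calls."""
    serial, parallel = [], []
    for call in calls:
        needs_serial = call["name"] == "request_tools" or call["name"] not in active_tools
        (serial if needs_serial else parallel).append(call)
    return serial, parallel


def run_tool_calls(
    calls: List[Call],
    active_tools: Set[str],
    run_one: Callable[[Call], Any],
    max_workers: int = 8,
) -> Dict[str, Any]:
    """Run serial-first calls in order, then the rest concurrently."""
    serial, parallel = classify_calls(calls, active_tools)
    results: Dict[str, Any] = {}
    for call in serial:
        results[call["id"]] = run_one(call)
    if len(parallel) == 1:
        # A lone call runs inline and skips the pool overhead.
        results[parallel[0]["id"]] = run_one(parallel[0])
    elif parallel:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for call, result in zip(parallel, pool.map(run_one, parallel)):
                results[call["id"]] = result
    return results


batch = [
    {"id": "1", "name": "request_tools"},
    {"id": "2", "name": "read_file"},
    {"id": "3", "name": "web_search"},
    {"id": "4", "name": "unloaded_tool"},
]
print(run_tool_calls(batch, {"read_file", "web_search"}, lambda c: c["name"].upper()))
```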
When to tune:
- Set parallel_tool_execution: false if you experience issues with tools that have hidden shared state or if you need deterministic tool execution order.
- Models that support parallel tool calls (GPT-4o, Claude, Gemini) benefit most from this feature. Models that emit one call per response (some open-source/vLLM models) are unaffected.
Allowed Write Paths
By default, file write operations (write_file, append_file, patch_file) are restricted to the current working directory. You can extend this with additional directories:
allowed_write_paths:
- /data/output
- /shared/workspace
This is especially useful in Docker deployments where the working directory differs from the application install path:
# Via environment variable (colon-separated)
docker run -it -e COGTRIX_ALLOWED_WRITE_PATHS="/tmp:/data/output:/shared" ghcr.io/northlandpositronics/cogtrix:latest
# Via CLI flag (repeatable)
cogtrix.py --allow-write-path /data/output --allow-write-path /shared/workspace
Read operations default to the working directory and application install directory. To allow reads from additional directories, use allowed_read_paths.
Priority: CLI (--allow-write-path) > env var (COGTRIX_ALLOWED_WRITE_PATHS) > config file.
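Conceptually, a write is permitted when the resolved target sits inside the working directory or one of the extra directories. The check below is an illustrative sketch with a hypothetical function name; how Cogtrix actually resolves symlinks or relative paths is not described here.

```python
from pathlib import Path
from typing import Iterable


def is_write_allowed(target: str, cwd: str, allowed_write_paths: Iterable[str] = ()) -> bool:
    """True if target resolves to a location under cwd or an allowed directory."""
    resolved = Path(target).resolve()
    roots = [Path(cwd).resolve(), *(Path(p).resolve() for p in allowed_write_paths)]
    # Resolving first means "../" segments cannot escape an allowed root.
    return any(resolved == root or root in resolved.parents for root in roots)


print(is_write_allowed("/data/output/report.txt", "/app", ["/data/output"]))
print(is_write_allowed("/app/../etc/hosts", "/app", ["/data/output"]))
```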
Allowed Read Paths
By default, file read operations (read_file, list_directory) are restricted to the current working directory and the application install directory. You can extend this with additional directories for read access:
allowed_read_paths:
- /workspace
- /data/external
This is especially useful in Docker deployments where the project is mounted at a different location than the working directory. For example, if you mount your project at /workspace but the container’s working directory is /app:
# Via environment variable (colon-separated)
docker run -v /home/user/project:/workspace:ro \
-e COGTRIX_ALLOWED_READ_PATHS=/workspace \
cogtrix --prompt "Analyze /workspace/docs"
# Via CLI flag (repeatable)
cogtrix.py --allow-read-path /workspace
Priority: CLI (--allow-read-path) > env var (COGTRIX_ALLOWED_READ_PATHS) > config file.
MCP Servers
Cogtrix can connect to external tool servers via the Model Context Protocol (MCP). Configure servers in the mcp_servers section — each key is a server name:
mcp_servers:
  filesystem:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    env:
      HOME: /home/user
    requires_confirmation: true
    timeout: 30
  remote-api:
    url: http://localhost:8000/sse
    headers:
      Authorization: "Bearer your-token"
    requires_confirmation: false
Transport is auto-detected from the config keys:
- Stdio (local process): set command and optionally args, env
- SSE (remote HTTP): set url and optionally headers
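Key-based detection can be sketched in a few lines. `detect_transport` is a hypothetical name, and raising an error when both or neither key is present is an assumption; this reference does not say how Cogtrix treats those cases.

```python
from typing import Any, Mapping


def detect_transport(server_cfg: Mapping[str, Any]) -> str:
    """Infer the MCP transport from which keys a server entry defines."""
    has_command = "command" in server_cfg
    has_url = "url" in server_cfg
    if has_command and not has_url:
        return "stdio"
    if has_url and not has_command:
        return "sse"
    # Assumption: ambiguous or empty entries are rejected.
    raise ValueError("set exactly one of 'command' (stdio) or 'url' (SSE)")


print(detect_transport({"command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]}))
print(detect_transport({"url": "http://localhost:8000/sse"}))
```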
| Option | Type | Default | Description |
|---|---|---|---|
command | string | — | Executable to launch for stdio transport |
args | array | [] | Command-line arguments for the executable |
env | object | null | Environment variables (all values must be strings) |
url | string | — | Full URL for SSE transport |
headers | object | null | HTTP headers for SSE transport (e.g., auth tokens) |
requires_confirmation | bool | true | Whether tools from this server need user confirmation |
timeout | int | 30 | Per-call timeout in seconds |
pin | bool | true | Pin all tools from this server into the active set at startup so the LLM sees them directly. Set to false for very large servers (hundreds of tools) to keep them in the on-demand pool instead. |
MCP tools with pin: true (the default) are pinned into the active tool set at startup — the LLM sees them in its bound function list from the very first turn and will prefer the most specific tool for a task. Tools from servers with pin: false remain in the on-demand pool and must be loaded via request_tools.
They appear in /tools with an [mcp] tag.
Prerequisite: Install the MCP SDK: uv pip install "cogtrix[mcp]" (or pip install mcp). If the package is not installed and mcp_servers is configured, a warning is logged and servers are skipped.
Use /mcp to list connected servers and their tools. Use /mcp restart [name] to reconnect.
Docker (SSE via supergateway)
When running Cogtrix in Docker, stdio MCP servers can’t be spawned directly because each service runs in its own container. Use supergateway to bridge stdio servers to SSE:
# Example: run supergateway as a separate container
# (docker options such as -v must come before the image name)
# docker run -d --name mcp-filesystem \
#   -v mcp-data:/data \
#   -v /home/user/project:/workspace:ro \
#   supercorp/supergateway \
#   --stdio "npx -y @modelcontextprotocol/server-filesystem /data /workspace" --port 8000
Then configure Cogtrix to connect via SSE:
# .cogtrix.yml
mcp_servers:
  filesystem:
    url: http://mcp-filesystem:8000/sse
    requires_confirmation: false
Tool Loading
When Cogtrix starts, you see a line like:
Tools : [██████████░░] 41 on demand (3 unavailable)
This means 41 tools are configured and ready to use, while 3 are hidden because their API keys aren’t set. The progress bar shows the ratio of configured to total registered tools.
How it works
The agent starts with a single meta-tool called request_tools. Its description contains a catalog of every available tool. When the agent needs a tool, it calls request_tools(add=["tool_a", "tool_b"]) and the system activates the requested tools before the agent’s next turn. This keeps the initial prompt lean — only the tools relevant to the current task are loaded.
The agent can also release tools it no longer needs to keep its toolkit small:
request_tools(remove=["tool_a"])
Released tools return to the catalog and can be re-requested later.
Startup banner
| Element | Meaning |
|---|---|
[██████████░░] | Ratio of configured tools to total registered (e.g. 41/44) |
41 on demand | Tools the agent can request |
(3 unavailable) | Tools hidden due to missing API keys |
What the agent sees
The request_tools tool description includes a one-line summary of every available tool, so the agent can choose intelligently. For example, if you ask a date question it will request get_current_datetime; if you ask to search the web it will request web_search.
Fuzzy name matching
If the agent tries to call a tool by an approximate name (e.g. list_dir instead of list_directory), Cogtrix resolves it automatically, activates the correct tool, and retries the request.
Overriding with --tools
Use the --tools CLI flag to bypass the on-demand system and load specific tools directly:
python cogtrix.py --tools none # No tools (pure LLM chat)
python cogtrix.py --tools minimal # Basic set only
python cogtrix.py --tools "web_search,calculate" # Specific tools
When --tools is used, all specified tools are active immediately (no on-demand pool).
Pinning tools with --activate-tools
Use --activate-tools to pin specific tools as active on startup while keeping the on-demand system for everything else:
python cogtrix.py --activate-tools web_search,shell
python cogtrix.py --activate-tools query_knowledge_base -M code
Pinned tools stay active across prompt cycles — they are not auto-unloaded between turns. You can unpin them interactively with /tools unload <name>.
Two-tier tool loading
Tools loaded by the agent via request_tools during a turn are agent-loaded — they are automatically unloaded at the start of the next prompt cycle so the LLM doesn’t carry stale tools between turns. Tools loaded manually (via /tools load, --activate-tools, or the API PATCH /sessions/{id}/tools endpoint) are pinned — they persist across prompt cycles until explicitly unloaded.
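The two tiers can be modelled as two sets with different lifetimes. The class below is a toy illustration with hypothetical names, not the Cogtrix implementation: pinned tools persist, while agent-loaded tools are cleared when a new prompt cycle begins.

```python
from typing import Iterable, Set


class ToolSet:
    """Toy model of pinned versus agent-loaded tools."""

    def __init__(self, pinned: Iterable[str] = ()) -> None:
        self.pinned: Set[str] = set(pinned)      # e.g. from --activate-tools
        self.agent_loaded: Set[str] = set()      # loaded via request_tools

    def request_tools(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> None:
        # Tools that are already pinned do not need a second, temporary entry.
        self.agent_loaded |= set(add) - self.pinned
        self.agent_loaded -= set(remove)

    def start_prompt_cycle(self) -> None:
        # Agent-loaded tools do not carry over between turns.
        self.agent_loaded.clear()

    @property
    def active(self) -> Set[str]:
        return self.pinned | self.agent_loaded


tools = ToolSet(pinned=["web_search"])
tools.request_tools(add=["read_file", "calculate"])
print(sorted(tools.active))
tools.start_prompt_cycle()
print(sorted(tools.active))
```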
Services Section
Configure API keys for external services (search providers, weather, etc.) in a single place:
services:
  tavily:
    api_key: "tvly-..."
  exa:
    api_key: "exa-..."
  brave:
    api_key: "BSA..."
  serpapi:
    api_key: "..."
  google:
    api_key: "AIza..."
    cse_id: "abc123..."
  openweather:
    api_key: "..."
  slack:
    bot_token: "xoxb-..."
Tools that require an API key are automatically hidden from the agent when the key is not configured — no errors, they simply don’t appear in the tool list.
Search Providers
Cogtrix exposes a single research tool to the agent — web_search — that fans
out to up to seven backend providers in parallel, fetches the top-K results,
extracts page content, synthesises a topic-organised answer, and returns a
structured Markdown report. The per-provider modules (tavily_search,
brave_search, etc.) are not agent-facing tools; they are wired into the
web_search pipeline at run time when their credentials are present.
ADR-0056 (private documentation submodule) captures the full pipeline design;
TOOLS_REFERENCE.md documents the web_search tool schema.
DuckDuckGo is always available with no setup. The other six (Tavily, Exa, Brave, Google, SerpAPI, SearXNG) require an API key (or, for SearXNG, an instance URL), and some require an additional Python package.
| Provider | Auto-included by web_search when | Package | API Key | Free Tier |
|---|---|---|---|---|
| DuckDuckGo | Always (no setup) | Included (ddgs) | None | Unlimited |
| Tavily | TAVILY_API_KEY set | tavily-python | TAVILY_API_KEY | 1 000/month |
| Exa | EXA_API_KEY set | exa-py | EXA_API_KEY | 1 000/month |
| Brave | BRAVE_API_KEY set | Included (requests) | BRAVE_API_KEY | 2 000/month |
| Google | GOOGLE_API_KEY + GOOGLE_CSE_ID set | Included (requests) | GOOGLE_API_KEY + GOOGLE_CSE_ID | 100/day |
| SerpAPI | SERPAPI_API_KEY set | google-search-results | SERPAPI_API_KEY | 100/month |
| SearXNG | SEARXNG_URL set | Included (requests) | SEARXNG_URL | Self-hosted |
The agent only ever calls web_search. The pipeline picks the configured backends per-call; missing keys mean the backend is silently skipped (no errors). Tavily also exposes a separate tavily_extract agent tool for one-URL deep extraction outside the search pipeline.
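The selection rule in the table can be sketched as a lookup over environment variables. This is an illustration under assumptions: the dictionary and function names are hypothetical, and the sketch ignores keys supplied through the services section and whether optional packages are installed.

```python
import os

# Environment variables each backend needs, taken from the table above.
BACKEND_REQUIREMENTS = {
    "duckduckgo": [],
    "tavily": ["TAVILY_API_KEY"],
    "exa": ["EXA_API_KEY"],
    "brave": ["BRAVE_API_KEY"],
    "google": ["GOOGLE_API_KEY", "GOOGLE_CSE_ID"],
    "serpapi": ["SERPAPI_API_KEY"],
    "searxng": ["SEARXNG_URL"],
}


def configured_backends(env=None):
    """Return the backends whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, required in BACKEND_REQUIREMENTS.items()
        if all(env.get(var) for var in required)
    ]
```

A backend with a missing variable is simply left out of the result, which mirrors the "silently skipped" behaviour described above.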
Configuring SearXNG:
SearXNG is a self-hosted meta-search engine. To enable it as a web_search backend, set SEARXNG_URL to your instance URL:
export SEARXNG_URL=http://localhost:8888
Or in config YAML:
services:
  searxng:
    url: http://localhost:8888
SearXNG joins the web_search fan-out automatically once the URL is configured.
Installing optional search packages:
Tavily, Exa, and SerpAPI need extra Python packages not included by default:
# All at once (recommended)
uv sync --extra search
# Or individually with pip
pip install tavily-python exa-py google-search-results
Brave, Google, and SearXNG use only requests, which is already a core dependency.
Legacy service format
For backward compatibility, top-level service keys still work:
{
"openweather": { "api_key": "..." },
"tavily": { "api_key": "..." }
}
The "services" section takes priority when both are present.
WhatsApp Messaging
Cogtrix can send and receive WhatsApp messages via a self-hosted Waha Docker container. Run it alongside Cogtrix:
docker run -p 3000:3000 devlikeapro/waha
Then open http://localhost:3000 in your browser, scan the QR code with your phone, and configure Cogtrix:
services:
  whatsapp:
    waha_url: "http://localhost:3000"
    api_key: "yoursecretkey"
    session: default
    allow_send: true
    allow_receive: true
    require_confirmation: true
    filter_mode: allow
    contacts: ["+14155551234", "+442071234567"]
    phonebook:
      alice: "+14155551234"
      bob: "+442071234567"
    contact_prompts:
      alice: |
        You are replying to Alice on behalf of the user.
        Be friendly and casual.
    rate_limit: 30
    max_message_length: 4096
| Option | Type | Default | Description |
|---|---|---|---|
waha_url | string | "http://localhost:3000" | Waha server URL |
api_key | string | — | Waha X-Api-Key header value |
session | string | "default" | Waha session name |
allow_send | bool | true | Enable send tools (whatsapp_send, whatsapp_send_image) |
allow_receive | bool | true | Enable receive tool (whatsapp_check) |
require_confirmation | bool | true | Prompt user before sending messages |
filter_mode | string | "none" | "none", "allow", "ignore", or "blacklist". Legacy "whitelist" maps to "allow". |
contacts | array | [] | E.164 phone numbers for the filter list |
phonebook | object | {} | Nickname → phone number map |
contact_prompts | object | {} | Per-contact system prompts (see Contact Prompts) |
rate_limit | int | 30 | Max outbound messages per hour (0 = unlimited) |
max_message_length | int | 4096 | Truncate outgoing messages to this length |
overview_limit | int | 50 | Maximum number of chats returned per overview poll cycle. A warning is logged when the response reaches this limit, indicating that some chats may have been missed. |
message_fetch_limit | int | 50 | Maximum number of messages fetched per chat per poll cycle. Prevents silent message loss when a chat receives more messages than the limit between poll intervals. |
ignore_archived | bool | true | Skip archived chats during polling. When enabled, chats marked as archived in WhatsApp are not fetched or processed. |
ignore_older_than | string | — | Skip messages older than this duration. Accepts human-readable strings like "24h", "30m", "7d", "1d12h". Disabled by default (all messages processed). |
lid_negative_ttl | float | 300.0 | Cache duration (seconds) for failed LID-to-phone resolutions |
Filter mode behaviour:
- none: respond to all contacts
- allow: only respond to contacts in the contacts list
- ignore: skip listed contacts (no response, message kept)
- blacklist: delete the message and archive the chat for listed contacts
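The four filter modes amount to a small decision function. The sketch below is illustrative only: filter_action and its return labels are hypothetical names, and it assumes that in allow mode an unlisted sender is skipped without a reply.

```python
def filter_action(filter_mode, contacts, sender):
    """Return 'respond', 'skip', or 'delete_and_archive' for an incoming message."""
    # The legacy value "whitelist" maps to "allow".
    mode = "allow" if filter_mode == "whitelist" else filter_mode
    listed = sender in contacts
    if mode == "none":
        return "respond"
    if mode == "allow":
        return "respond" if listed else "skip"
    if mode == "ignore":
        return "skip" if listed else "respond"
    if mode == "blacklist":
        return "delete_and_archive" if listed else "respond"
    raise ValueError(f"unknown filter_mode: {filter_mode!r}")
```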
When both allow_send and allow_receive are false, no WhatsApp tools are loaded.
See Tools Reference — WhatsApp for tool parameters and usage. For a complete step-by-step walkthrough, see the WhatsApp Guide.
Telegram Messaging
Cogtrix can send and receive Telegram messages via a bot. Create a bot with @BotFather and configure the token:
services:
  telegram:
    bot_token: "123456:ABC-DEF..."
    allow_send: true
    allow_receive: true
    require_confirmation: true
    filter_mode: allow
    contacts: ["123456789", "@alice_username"]
    phonebook:
      alice: "123456789"
      team: "-1001234567890"
    contact_prompts:
      alice: |
        You are replying to Alice on behalf of the user.
        Be friendly and casual.
    rate_limit: 30
    max_message_length: 4096
| Option | Type | Default | Description |
|---|---|---|---|
bot_token | string | — | Bot token from @BotFather (required) |
allow_send | bool | true | Enable send tools (telegram_send, telegram_send_photo) |
allow_receive | bool | true | Enable receive tool (telegram_check) |
require_confirmation | bool | true | Prompt user before sending messages |
filter_mode | string | "none" | "none", "allow", "ignore", or "blacklist". Legacy "whitelist" maps to "allow". |
contacts | array | [] | Chat IDs or @usernames for the filter list |
phonebook | object | {} | Nickname → chat ID map |
contact_prompts | object | {} | Per-contact system prompts (see Contact Prompts) |
rate_limit | int | 30 | Max outbound messages per hour (0 = unlimited) |
max_message_length | int | 4096 | Truncate outgoing messages to this length |
ignore_older_than | string | — | Skip messages older than this duration. Accepts human-readable strings like "24h", "30m", "7d", "1d12h". Disabled by default (all messages processed). |
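Duration strings such as those accepted by ignore_older_than can be parsed with a short regular expression. This is a sketch under assumptions, not the parser Cogtrix uses: parse_duration is a hypothetical name, and only the d, h, and m units that appear in the documented examples are handled.

```python
import re
from datetime import timedelta

# Units seen in the documented examples ("24h", "30m", "7d", "1d12h").
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60}
_TOKEN = re.compile(r"(\d+)([dhm])")


def parse_duration(text):
    """Convert a string like '1d12h' into a timedelta."""
    text = text.strip().lower()
    tokens = _TOKEN.findall(text)
    # Reject empty input and anything that is not purely number+unit pairs.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens))
```

A message would then be skipped when its age exceeds the parsed timedelta.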
Filter mode behaviour:
- none: respond to all contacts
- allow: only respond to contacts in the contacts list
- ignore: skip listed contacts (no response, message kept)
- blacklist: delete the message and archive the chat for listed contacts
Quick setup:
- Message @BotFather on Telegram and create a bot (/newbot)
- Copy the bot token
- Set COGTRIX_TELEGRAM_TOKEN="123456:ABC-DEF..." or add it to the config file
- Start a chat with your bot on Telegram (send it /start)
- Run Cogtrix; the Telegram tools appear automatically
Note: Telegram bots can only receive messages from users who have started a conversation with the bot first. The bot cannot initiate contact with unknown users.
When both allow_send and allow_receive are false, no Telegram tools are loaded.
See Tools Reference — Telegram for tool parameters and usage. For a complete walkthrough, see the Telegram Guide.
Assistant Mode
Run Cogtrix as a headless messaging daemon that maintains ongoing conversations over WhatsApp and Telegram. Launch with --assistant:
python cogtrix.py --assistant --log --debug
Configure under services.assistant:
services:
  assistant:
    max_concurrent: 4            # concurrent LLM calls across all chats
    max_sessions: 50             # active chat sessions in memory
    idle_timeout: 3600           # seconds before idle session is evicted
    max_response_length: 4000    # truncate replies for messaging
    system_prompt: null          # null = built-in assistant persona
    excluded_tools: []           # additional tools to exclude (beyond defaults)
    debounce_seconds: 3.0        # quiet window before rapid messages are batched
    channels:
      whatsapp:
        enabled: true
        poll_interval: 5         # seconds between polls
      telegram:
        enabled: true
        poll_interval: 1
        long_poll_timeout: 30    # Telegram long-polling timeout
    knowledge:
      enabled: true
      extraction_model: null     # model alias for fact extraction LLM
      recall_k: 5                # facts retrieved per query
      max_facts: 10000
    guardrails:
      datamarking: true          # Microsoft Spotlighting prompt injection defense
      enabled: true              # master kill switch
      max_input_length: 4000     # chars
      unicode_checks: true       # invisible/RTL character detection
      input_patterns: []         # additional regex patterns to block
      rate_limit:
        per_minute: 10           # per chat
        per_hour: 60             # per chat
      encoding_detection:
        enabled: true            # detect Morse/Base64/hex/leetspeak bypasses
        min_score: 0.6           # 0.0-1.0; lower = more sensitive
      tool_call_guard:
        enabled: true            # inspect tool arguments before execution
        injection_scan: true     # check all string args for injection patterns
        path_blocking: true      # block sensitive paths in file tool args
        exfiltration_detection: true  # detect secrets/PII in web tool URL args
        sensitive_paths: []      # additional path prefixes to block
      auto_blacklist:
        enabled: true            # auto-blacklist repeat offenders
        max_violations: 2        # violations before blacklist triggers
        window_minutes: 30       # sliding window for violation count
      banned_output_strings: []  # system prompt fragments to redact
      block_urls_in_output: true # strip URLs from responses
      pii_detection: true        # regex PII scanning on output
      llm_judge:
        enabled: false           # opt-in (adds ~500ms-2s latency)
        model: null              # model alias or provider/model
| Option | Type | Default | Description |
|---|---|---|---|
max_concurrent | int | 4 | Maximum simultaneous agent runs across all chats |
max_sessions | int | 50 | Maximum active chat sessions in memory |
idle_timeout | float | 3600 | Seconds of inactivity before a session is evicted to disk |
max_response_length | int | 4000 | Truncate agent responses to this length |
system_prompt | string | null | Custom system prompt (null = built-in messaging persona) |
excluded_tools | array | [] | Additional tools to exclude. Messaging tools, shell, write, and read tools are always excluded. Queue management tools (schedule_reply, queue_reply, edit_last_reply, list_scheduled_messages, edit_scheduled_message, cancel_scheduled_message) and deferral tools (defer_processing, suppress_reply) can also be added here. |
debounce_seconds | float | 3.0 | Quiet window in seconds before rapid messages from the same chat are batched into a single agent turn. Increase to tolerate longer bursts; decrease for faster single-message response. |
dispatch_interval | float | 30.0 | Seconds between scheduler checks for due messages |
channels.{name}.enabled | bool | true | Enable/disable a specific channel |
channels.{name}.poll_interval | float | varies | Seconds between poll cycles |
channels.{name}.poll_interval_min | float | base interval | Minimum poll interval (seconds); polling backs off on idle, recovers on activity |
channels.{name}.poll_interval_max | float | 60.0 | Maximum poll interval during idle backoff |
channels.{name}.poll_backoff_factor | float | 1.5 | Multiplier when no messages received (clamped >= 1.0) |
channels.{name}.poll_recovery_factor | float | 2.0 | Divisor when messages received (clamped >= 1.0) |
channels.telegram.long_poll_timeout | int | 30 | Telegram getUpdates timeout |
knowledge.enabled | bool | true | Enable cross-chat fact extraction and recall |
knowledge.extraction_model | string | null | Model alias for fact extraction (null = main LLM) |
knowledge.recall_k | int | 5 | Number of facts recalled per query |
knowledge.max_facts | int | 10000 | Maximum stored facts |
knowledge.data_dir | string | "data" | Base directory for knowledge persistence (facts.json, FAISS index) |
guardrails.datamarking | bool | true | Enable Microsoft Spotlighting (datamarking) — interleaves a random token at word boundaries in user messages so the LLM treats them as data, not instructions |
guardrails.enabled | bool | true | Master kill switch for all guardrails |
guardrails.max_input_length | int | 4000 | Maximum input length in characters |
guardrails.unicode_checks | bool | true | Detect invisible/RTL Unicode steganography |
guardrails.input_patterns | array | [] | Additional regex patterns to block on input |
guardrails.rate_limit.per_minute | int | 10 | Maximum messages per minute per chat |
guardrails.rate_limit.per_hour | int | 60 | Maximum messages per hour per chat |
guardrails.encoding_detection.enabled | bool | true | Detect encoding-based bypass attempts (Morse, Base64, hex, leetspeak) |
guardrails.encoding_detection.min_score | float | 0.6 | Minimum detection score (0.0–1.0) to block a message. Lower values are more sensitive. |
guardrails.tool_call_guard.enabled | bool | true | Inspect tool arguments before execution |
guardrails.tool_call_guard.injection_scan | bool | true | Scan all string tool arguments for injection patterns |
guardrails.tool_call_guard.path_blocking | bool | true | Block sensitive filesystem paths in file tool arguments |
guardrails.tool_call_guard.exfiltration_detection | bool | true | Detect API keys, SSH keys, and SSNs in web tool URL/query arguments |
guardrails.tool_call_guard.sensitive_paths | array | [] | Additional path prefixes to block in file tool arguments |
guardrails.auto_blacklist.enabled | bool | true | Auto-blacklist chats that exceed the violation threshold |
guardrails.auto_blacklist.max_violations | int | 2 | Number of security violations before a chat is blacklisted |
guardrails.auto_blacklist.window_minutes | int | 30 | Sliding window (in minutes) for counting violations |
guardrails.banned_output_strings | array | [] | Strings to redact from agent responses (e.g. system prompt fragments) |
guardrails.block_urls_in_output | bool | true | Strip URLs from agent responses |
guardrails.pii_detection | bool | true | Redact email, credit card, SSN, and private IP addresses from responses |
guardrails.llm_judge.enabled | bool | false | Enable LLM-as-judge classifier (opt-in; adds ~500ms–2s latency) |
guardrails.llm_judge.model | string | null | Model alias or provider/model for the judge LLM (null = main LLM) |
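The four poll_* options describe an adaptive interval: multiply on idle, divide on activity, and clamp to the configured bounds. A minimal sketch of that arithmetic follows; next_poll_interval is a hypothetical helper, and the exact update rule inside Cogtrix may differ.

```python
def next_poll_interval(current, got_messages, *, interval_min,
                       interval_max=60.0, backoff_factor=1.5,
                       recovery_factor=2.0):
    """Return the next poll interval in seconds."""
    # Both factors are clamped to at least 1.0, as the table states.
    backoff_factor = max(backoff_factor, 1.0)
    recovery_factor = max(recovery_factor, 1.0)
    if got_messages:
        candidate = current / recovery_factor   # speed up on activity
    else:
        candidate = current * backoff_factor    # slow down when idle
    # Keep the result between the minimum and maximum interval.
    return min(max(candidate, interval_min), interval_max)
```

With a 5-second base interval and the default factors, two idle polls move the interval from 5 to 7.5 to 11.25 seconds, and a poll that returns messages halves it again.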
How it works:
- One polling thread per channel checks for new messages at the configured interval.
- New messages are passed to MessageBuffer, which resets a per-chat debounce timer. When the timer expires (after debounce_seconds of silence from that chat), all buffered messages are concatenated and dispatched as a single agent turn via handle_batch(). A single message with no follow-ups dispatches immediately after the quiet window.
- Each incoming message (or batch) is checked by the GuardrailPipeline (rate limit, input validation, injection detection). Blocked messages receive a canned reply without reaching the agent.
- Each (channel, chat_id) pair gets an independent ConversationMemoryManager, so there is no context blending between chats.
- The agent runs with the same tool pipeline as interactive mode (minus excluded tools). Up to eight message management tools are injected per turn: schedule_reply, queue_reply, edit_last_reply (only when a prior message ID is available), list_scheduled_messages, edit_scheduled_message, cancel_scheduled_message, defer_processing (only when deferral is enabled and below max depth), and suppress_reply (only during re-processing passes).
- After each turn, durable facts are extracted and stored in a shared knowledge store (data/knowledge/facts.json).
- On each new message, relevant facts are recalled and injected into the agent’s context, enabling cross-chat knowledge without exposing raw conversation history.
- The agent response is routed by _route_response: edit and schedule paths run independently, so both can fire in the same turn. Output is sanitized (PII redaction, URL stripping, banned string removal) in each delivery branch before being sent and before being written to memory.
- SIGINT/SIGTERM triggers graceful shutdown: all sessions saved, knowledge store persisted.
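The debounce step can be illustrated with a buffer that takes the current time as an argument, which keeps the example deterministic. This is a sketch, not the MessageBuffer class itself: DebounceBuffer and flush_due are hypothetical names, and joining messages with a newline is an assumption about how batches are concatenated.

```python
class DebounceBuffer:
    """Hypothetical per-chat debounce buffer driven by explicit timestamps."""

    def __init__(self, debounce_seconds=3.0):
        self.debounce_seconds = debounce_seconds
        self._pending = {}  # chat_id -> (deadline, [messages])

    def add(self, chat_id, text, now):
        _, messages = self._pending.get(chat_id, (None, []))
        messages.append(text)
        # Every new message restarts the quiet window for that chat.
        self._pending[chat_id] = (now + self.debounce_seconds, messages)

    def flush_due(self, now):
        """Return {chat_id: batch_text} for chats whose quiet window has elapsed."""
        due = [c for c, (deadline, _) in self._pending.items() if deadline <= now]
        return {c: "\n".join(self._pending.pop(c)[1]) for c in due}
```

Two messages arriving two seconds apart are delivered as one batch three seconds after the second message, rather than three seconds after the first.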
Prerequisites: WhatsApp requires a running Waha container. Telegram requires a bot token. Both must be configured in their respective services.whatsapp / services.telegram sections.
Contact Prompts
contact_prompts lets operators assign a per-contact system prompt that replaces the default assistant system prompt entirely for that contact. Configure it inside the channel config (services.whatsapp or services.telegram), keyed by the same names used in phonebook.
services:
  whatsapp:
    phonebook:
      alice: "+1234567890"
    contact_prompts:
      alice: |
        You are replying to Alice on behalf of the user.
        Be friendly and casual. Use the schedule_reply tool
        to delay responses by 1-3 hours.
      # Or reference a file:
      # alice: /path/to/alice_prompt.txt
Matching: the handler looks up the incoming message’s phone number / chat ID in phonebook to find a contact name, then checks contact_prompts for that name. If no match is found, or the resolved prompt is empty, the default assistant system prompt is used unchanged.
File paths: a value that starts with /, ~, ./, or ../ is treated as a file path. Relative paths are resolved against the data_dir and must remain inside it (path containment enforced). All other values are used as inline prompt text.
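The inline-versus-file rule and the containment check can be sketched as follows. The function name resolve_contact_prompt is hypothetical, and error handling in Cogtrix may differ; the sketch only applies containment to relative paths, as described above. Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_contact_prompt(value, data_dir):
    """Return prompt text from an inline value or from the file it points to."""
    # Only values with these prefixes are treated as file paths.
    if not value.startswith(("/", "~", "./", "../")):
        return value
    path = Path(value).expanduser()
    if not path.is_absolute():
        base = Path(data_dir).resolve()
        path = (base / path).resolve()
        # Relative paths must stay inside data_dir.
        if not path.is_relative_to(base):
            raise ValueError(f"prompt path escapes data_dir: {value}")
    return path.read_text(encoding="utf-8")
```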
Workflows
Workflows bundle a system prompt, a per-workflow FAISS knowledge base, and a tool policy into a named, reusable unit. A chat can be bound to a workflow manually, via auto-detection, or inherited from a contact_prompts entry. Workflows are stored as YAML files in data/workflows/<id>/workflow.yaml.
Workflow definition (data/workflows/bike-sales/workflow.yaml):
id: "bike-sales"                  # Required. URL-safe slug (must match directory name).
name: "Bike Sales Assistant"      # Required. Human-readable label.
description: "Specialist assistant for bike sales inquiries"
system_prompt: |                  # Optional. Overrides global system_prompt.
  You are a specialist bike sales advisor...
# system_prompt_file: prompts/bike.txt  # Alternative: path to a file (resolved against data_dir).
knowledge_base: true              # If true, per-workflow FAISS index at
                                  # data/workflows/bike-sales/vectordb/faiss_index/
                                  # is searched alongside the global index.
tool_policy:
  excluded_tools: []              # Tools to block for this workflow.
  additional_approved_tools: []   # Tools auto-approved without confirmation.
auto_detect:
  enabled: false
  keywords: ["bike", "bicycle"]   # Case-insensitive substring matches.
  patterns: ["\\bbike\\b"]        # Python regex patterns.
  min_confidence: 1               # Keyword matches needed to trigger.
Chat-to-workflow bindings are stored in data/workflows/bindings.json and managed via the API or auto-detection:
{
"whatsapp::14155551234@c.us": "bike-sales",
"telegram::987654321": "support-desk"
}
Resolution order (first match wins):
- Explicit binding: bindings.json entry for this session_key
- Contact prompt fallback: if a contact_prompts entry exists for the sender, it is used as an ephemeral workflow (not persisted)
- Auto-detect: if any workflow has auto_detect.enabled: true, incoming messages are scored against keywords and regex patterns; the highest-scoring workflow above min_confidence is assigned and persisted as a binding
- No match: global system_prompt and default tool policy apply
API management: 11 CRUD endpoints at /api/v1/assistant/workflows/ — create, list, get, update, delete workflows; upload and manage per-workflow documents; bind and unbind chats. See the API Reference for details.
Per-workflow knowledge base: when knowledge_base: true, upload documents to data/workflows/<id>/docs/ via the API. A FAISS index is built at data/workflows/<id>/vectordb/faiss_index/ and searched alongside the global index when the query_knowledge_base tool runs for a chat bound to that workflow.
Scheduled Reply Delivery
Up to eight message management tools are injected automatically when assistant mode is active — no extra config is required to enable them. They can be blocked via excluded_tools if not needed.
| Tool | Purpose |
|---|---|
schedule_reply | Queue a reply for deferred delivery. Provide the full reply text and a delay in minutes (1–1440). |
queue_reply | Append a message after the queue tail for this chat. Supports multiple calls per turn with optional gap_minutes spacing. |
edit_last_reply | Edit/replace the most recently sent message in this chat. Only available after at least one reply has been sent in the session. |
list_scheduled_messages | List pending queued messages. Filter by recipient (phone/name substring), chat_id (exact), or contact_name (phonebook key). Returns short IDs for use with the edit and cancel tools. |
edit_scheduled_message | Update the text and/or reschedule the delivery time of a pending message (identified by short ID prefix). |
cancel_scheduled_message | Cancel a specific pending message so it will not be delivered. |
defer_processing | Postpone the reasoning pass without sending any reply. Only injected when deferral is enabled and below max depth. |
suppress_reply | Send nothing and skip memory update. Only injected during re-processing passes. |
The agent decides when to use these tools based on instructions in its system prompt (or a contact_prompts entry).
Scheduled message behavior:
- Queued messages are persisted to data/assistant/schedule.json and survive restarts. Each record includes a recipient field (human-readable phone, username, or display name) for filtering via list_scheduled_messages.
- Delivery is retried up to 3 times on failure, with backoffs of 30 s, 2 min, and 10 min.
- Messages that are still pending more than 2 hours past their scheduled time are marked expired.
- Terminal-state messages (sent, cancelled, failed, expired) are cleaned up after 24 hours.
- When a new message arrives from the same chat, any pending scheduled reply for that chat is cancelled automatically.
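The retry and expiry rules reduce to a little date arithmetic. The sketch below uses hypothetical helper names (next_attempt, is_expired) and assumes the three backoffs apply in order after the first, second, and third failed delivery.

```python
from datetime import datetime, timedelta
from typing import Optional

# Backoffs and expiry grace period as documented above.
RETRY_BACKOFFS = [timedelta(seconds=30), timedelta(minutes=2), timedelta(minutes=10)]
EXPIRY_GRACE = timedelta(hours=2)


def next_attempt(failed_at: datetime, failures: int) -> Optional[datetime]:
    """Return when to retry after the Nth failure, or None once retries run out."""
    if failures > len(RETRY_BACKOFFS):
        return None
    return failed_at + RETRY_BACKOFFS[failures - 1]


def is_expired(scheduled_for: datetime, now: datetime) -> bool:
    """A pending message expires once it is more than 2 hours overdue."""
    return now - scheduled_for > EXPIRY_GRACE
```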
Message editing behavior:
- edit_last_reply calls Channel.edit_message() on the channel that originally sent the message. WhatsApp and Telegram both support message editing; channels that do not implement it return a failure result (the tool reports the error to the agent but does not raise an exception).
- Only one edit per agent turn is allowed (idempotency guard). If the agent calls edit_last_reply multiple times in a single turn, only the first call takes effect.
Deferred Message Processing
The deferral system lets the agent postpone its reasoning pass via the defer_processing tool. Messages arriving during a deferral are coalesced with the original batch and re-processed together when the timer fires.
services:
  assistant:
    deferral:
      enabled: true
      max_depth: 3
      check_interval: 10
      stale_threshold: 7200
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable/disable the deferral system |
max_depth | int | 3 | Maximum re-processing depth (prevents infinite deferral loops) |
check_interval | float | 10.0 | Seconds between checks for due deferrals |
stale_threshold | float | 7200.0 | Seconds before a deferred record is considered stale and cancelled |
Deferred records are persisted to data/assistant/deferrals.json and survive restarts. Records in "firing" state are reset to "pending" on reload (at-least-once semantics).
Outbound Campaigns
The campaign system enables multi-contact outbound messaging with automatic follow-ups, escalation, and goal classification. Campaigns are managed via the API (/api/v1/assistant/campaigns/*) and tracked in data/assistant/campaigns.json.
services:
  assistant:
    campaigns:
      enabled: true
      check_interval: 60
| Option | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable/disable the campaign system |
check_interval | float | 60.0 | Seconds between follow-up check passes |
Campaign lifecycle:
- Create: POST /api/v1/assistant/campaigns with name, goal, instructions, and target contacts
- Launch: POST /api/v1/assistant/campaigns/{id}/launch sends initial outbound to all pending targets (or set auto_launch: true on create)
- Track: incoming replies from campaign targets are tracked automatically; the report_campaign_outcome tool is injected so the agent can classify each target as completed, failed, or in_progress
- Follow-up: the background thread sends follow-ups to non-responsive targets after follow_up_interval_hours (default 24h); escalates after max_follow_ups (default 3)
- Complete: campaign auto-completes when all targets reach a terminal state (completed, failed, or escalated)
Per-campaign settings (set at creation time):
| Field | Type | Default | Description |
|---|---|---|---|
max_follow_ups | int | 3 | Maximum follow-ups per target before escalation (0–20) |
follow_up_interval_hours | float | 24.0 | Hours between follow-up attempts (0.5–720) |
Response Timing / Quiet Hours
response_timing under services.assistant defers scheduled replies that would be delivered during a contact’s quiet hours. Entries are keyed by contact name; _default applies to any contact without a specific entry.
services:
  assistant:
    response_timing:
      _default:
        timezone: "UTC"
        quiet_hours: [23, 8]   # 11 pm to 8 am
      alice:
        timezone: "America/New_York"
        quiet_hours: [22, 7]   # 10 pm to 7 am Eastern Time
| Field | Type | Description |
|---|---|---|
timezone | string | IANA timezone name (e.g. "Asia/Dubai", "America/New_York"). Defaults to "UTC" if omitted or invalid. |
quiet_hours | [start, end] | Two-element list of hours (0–23). The quiet window runs from start up to (but not including) end. Wraps midnight when start > end (e.g. [23, 8] covers 11 pm–8 am). start and end must differ. |
When a scheduled reply’s delivery time falls inside the quiet window, the scheduler defers it to the moment the window ends (end_hour:00 in the contact’s timezone).
Quiet hours only affect the MessageScheduler — they do not block immediate (non-scheduled) replies.
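The quiet-window test and the deferral to end_hour:00 can be sketched as below. The function names are hypothetical, and the sketch takes a tzinfo object rather than an IANA name so that it has no dependency on a timezone database; in practice one would pass something like zoneinfo.ZoneInfo("America/New_York").

```python
from datetime import datetime, timedelta, timezone, tzinfo


def in_quiet_hours(hour: int, start: int, end: int) -> bool:
    """True when hour falls in [start, end), wrapping midnight if start > end."""
    if start < end:
        return start <= hour < end
    return hour >= start or hour < end


def defer_past_quiet_hours(deliver_at: datetime, quiet_hours,
                           tz: tzinfo = timezone.utc) -> datetime:
    """Return deliver_at unchanged, or the end of the quiet window it falls in."""
    start, end = quiet_hours
    local = deliver_at.astimezone(tz)
    if not in_quiet_hours(local.hour, start, end):
        return deliver_at
    window_end = local.replace(hour=end, minute=0, second=0, microsecond=0)
    if window_end <= local:
        window_end += timedelta(days=1)  # the window ends on the following day
    return window_end
```

For quiet_hours [23, 8], a reply scheduled for 23:30 moves to 08:00 the next day, one scheduled for 03:00 moves to 08:00 the same day, and one scheduled for noon is left alone.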
Assistant Guardrails
Every message handled by assistant mode passes through a GuardrailPipeline in src/assistant/guardrails.py. Guardrails run before the agent processes input, before each tool call executes, and again before the reply is sent to the channel. Configure under services.assistant.guardrails (shown in the config block above).
Input pipeline order: blacklist → rate_limiter → input_guard → encoding_guard → llm_judge
Rate limit violations are recorded but do not increment the security violation counter (and therefore cannot trigger auto-blacklisting on their own).
Input guard details:
- Length check: messages exceeding max_input_length characters are rejected.
- Unicode check: invisible characters and RTL override codepoints (used in steganographic injection) are detected and rejected. A UTF-8 BOM at position 0 is allowed.
- Injection patterns: 15 pre-compiled regexes cover common prompt injection and jailbreak patterns (DAN mode, persona override, system tag injection, etc.). Add site-specific patterns via input_patterns.
Encoding detection:
EncodingDetectionGuard scores each message with four independent sub-detectors (Morse code,
Base64, hex encoding, leetspeak/ROT13), each returning 0–1. The maximum of the four scores is
compared against min_score (default 0.6). Messages that exceed the threshold are rejected.
Violations are counted toward auto-blacklisting. Tune min_score downward to catch more attempts
(with higher false-positive risk) or upward to reduce false positives on legitimate content.
Tool call guard:
ToolCallGuard inspects tool arguments before each tool executes:
- Injection scan: checks all string arguments of any tool for prompt injection patterns.
- Path blocking: for file tools (read_file, write_file, etc.), rejects arguments that reference sensitive paths such as /etc/, /proc/, .env files, and private key files. Add custom prefixes via sensitive_paths.
- Exfiltration detection: for web tools (web_search, http_request, etc.), detects API keys, SSH keys, and SSNs embedded in URL or query arguments.
Auto-blacklist:
ViolationTracker maintains a per-chat sliding window of security violation timestamps. When a
chat’s violation count within the last window_minutes minutes reaches max_violations, all
subsequent messages from that chat are rejected immediately (before any other check) with a
blacklist reason. The blacklist state is persisted to data/assistant/violations.json and survives
assistant restarts. Expired violations (older than the sliding window) are pruned on load.
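A per-chat sliding window of timestamps is enough to express the counting rule. The sketch below is not the ViolationTracker class: ViolationWindow is a hypothetical name, persistence is omitted, and blacklist status is derived purely from the current window, which is an assumption (the text above does not say whether a chat stays blacklisted after its violations age out).

```python
import time
from collections import defaultdict, deque


class ViolationWindow:
    """Hypothetical sliding-window counter of security violations per chat."""

    def __init__(self, max_violations=2, window_minutes=30):
        self.max_violations = max_violations
        self.window_seconds = window_minutes * 60
        self._events = defaultdict(deque)  # chat_id -> violation timestamps

    def _prune(self, chat_id, now):
        events = self._events[chat_id]
        # Drop violations that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()

    def record(self, chat_id, now=None):
        now = time.time() if now is None else now
        self._prune(chat_id, now)
        self._events[chat_id].append(now)

    def is_blacklisted(self, chat_id, now=None):
        now = time.time() if now is None else now
        self._prune(chat_id, now)
        return len(self._events[chat_id]) >= self.max_violations
```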
Output sanitization:
- Markdown images are stripped (alt text preserved).
- HTML tags are removed.
- Strings listed in banned_output_strings are replaced with [REDACTED] (case-insensitive).
- PII is replaced with typed placeholders: [EMAIL_REDACTED], [CREDIT_CARD_REDACTED], [SSN_REDACTED], [IP_ADDRESS_REDACTED].
- URLs are replaced with [link removed] when block_urls_in_output is true.
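The sanitization steps can be approximated with a chain of regular-expression substitutions. This is a deliberately simplified sketch: the patterns are illustrative rather than the ones Cogtrix ships, only the email and SSN placeholders are covered, and sanitize is a hypothetical function name.

```python
import re

# Simplified, illustrative patterns; real PII detection needs stricter rules.
_IMAGE = re.compile(r"!\[([^\]]*)\]\([^)]*\)")
_HTML_TAG = re.compile(r"<[^>]+>")
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
_URL = re.compile(r"https?://\S+")


def sanitize(text, banned=(), block_urls=True):
    """Apply the output sanitization steps in the order listed above."""
    text = _IMAGE.sub(r"\1", text)      # strip the image, keep its alt text
    text = _HTML_TAG.sub("", text)      # remove HTML tags
    for fragment in banned:
        if fragment:                    # an empty string would match everywhere
            text = re.sub(re.escape(fragment), "[REDACTED]", text,
                          flags=re.IGNORECASE)
    text = _EMAIL.sub("[EMAIL_REDACTED]", text)
    text = _SSN.sub("[SSN_REDACTED]", text)
    if block_urls:
        text = _URL.sub("[link removed]", text)
    return text
```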
LLM judge: When llm_judge.enabled: true, an additional LLM call classifies the input as SAFE
or UNSAFE. The judge is fail-closed — if the LLM call fails or returns an empty response, the
message is blocked. This is intentional secure-by-default behavior: a deliberate crash of the judge
must not bypass the guardrail. Use llm_judge.model to point the judge at a fast/cheap model alias
to keep the added latency (roughly 500ms–2s per request) as low as possible.
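Fail-closed behaviour means that every outcome other than an explicit SAFE verdict blocks the message. A minimal sketch, where judge_allows is a hypothetical name and classify stands in for whatever callable performs the LLM call:

```python
def judge_allows(message, classify):
    """Return True only when the judge explicitly answers SAFE."""
    try:
        verdict = classify(message)
    except Exception:
        return False  # the judge call failed: block the message
    if not verdict or not verdict.strip():
        return False  # an empty response is treated as a failure
    return verdict.strip().upper() == "SAFE"
```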
Disabling: Set guardrails.enabled: false to bypass the entire pipeline. The GuardrailPipeline still exists in the handler but all checks return safe immediately.
Environment Variables
| Variable | Description | Example |
|---|---|---|
COGTRIX_CONFIG_FILE | Path to a specific config file (bypasses automatic search) | /etc/cogtrix/config.yaml |
COGTRIX_MODEL | Active model alias (sets models.default at runtime) | oss |
COGTRIX_SESSION | Session ID | my-project |
COGTRIX_MEMORY_MODE | Memory mode | code |
COGTRIX_DATA_DIR | Root directory for data storage. Docker images default to /data; bare Python defaults to ./data. | /data |
COGTRIX_ALLOWED_READ_PATHS | Colon-separated list of absolute directory paths the agent is allowed to read. When set, restricts file read operations to these directories. | /workspace:/data/external |
COGTRIX_ALLOWED_WRITE_PATHS | Colon-separated extra write-allowed paths. Docker default: /tmp:/data/output. | /tmp:/data/output |
COGTRIX_OLLAMA | Ollama server address (host or host:port) | 192.168.1.100 or 192.168.1.100:8080 |
OPENAI_API_KEY | OpenAI API key | sk-... |
ANTHROPIC_API_KEY | Anthropic API key | sk-ant-... |
GEMINI_API_KEY | Google Gemini API key | AIza... |
GROQ_API_KEY | Groq API key | gsk_... |
XAI_API_KEY | xAI (Grok) API key | xai-... |
DEEPSEEK_API_KEY | DeepSeek API key | sk-... |
OLLAMA_BASE_URL | Ollama server URL (legacy, full URL) | http://192.168.1.100:11434 |
OPENWEATHER_API_KEY | OpenWeather API key | abc123 |
COGTRIX_EMBEDDING_PROVIDER | RAG embedding provider | openai |
OLLAMA_EMBEDDING_MODEL | Ollama embedding model | nomic-embed-text |
TAVILY_API_KEY | Tavily search API key | tvly-... |
EXA_API_KEY | Exa search API key | exa-... |
BRAVE_API_KEY | Brave search API key | BSA... |
GOOGLE_API_KEY | Google Custom Search API key | AIza... |
GOOGLE_CSE_ID | Google Programmable Search Engine ID | abc123... |
SERPAPI_API_KEY | SerpAPI search API key | ... |
SEARXNG_URL | SearXNG instance URL. When set, SearXNG joins the web_search fan-out as a backend. | http://localhost:8888 |
COGTRIX_WHATSAPP_URL | Waha server URL | http://localhost:3000 |
COGTRIX_WHATSAPP_API_KEY | Waha API key | yoursecretkey |
COGTRIX_WHATSAPP_SESSION | Waha session name | default |
COGTRIX_TELEGRAM_TOKEN | Telegram bot token | 123456:ABC-DEF... |
COGTRIX_SLACK_BOT_TOKEN | Slack bot token for cogtrix_slack_post_message tool. Overrides services.slack.bot_token from the config file when set to a non-empty value. | xoxb-... |
COGTRIX_JWT_SECRET | JWT signing secret for API mode (min 32 chars, required) | your-secret-key-at-least-32-chars |
COGTRIX_DB_URL | Database URL for API mode (default: SQLite aiosqlite) | postgresql+asyncpg://user:pass@host/db |
COGTRIX_CORS_ORIGINS | Comma-separated CORS allowed origins for API mode | http://localhost:5173,https://app.example.com |
COGTRIX_API_HOST | API server bind host (default 0.0.0.0) | 127.0.0.1 |
COGTRIX_API_PORT | API server bind port (default 8000) | 3001 |
COGTRIX_API_WORKERS | Number of uvicorn workers (default 1) | 4 |
Docker Healthcheck
The container image includes a built-in healthcheck that probes GET /api/v1/health using Python’s stdlib urllib (no curl or wget required). This enables depends_on: condition: service_healthy in docker-compose:
services:
cogtrix:
image: ghcr.io/northlandpositronics/cogtrix:latest
command: ["api"]
environment:
COGTRIX_JWT_SECRET: "your-secret-key-at-least-32-chars"
ports:
- "8000:8000"
webui:
image: ghcr.io/northlandpositronics/cogtrix-webui:latest
depends_on:
cogtrix:
condition: service_healthy
ports:
- "5173:80"
The healthcheck runs every 30 seconds with a 5-second deadline (4-second socket timeout + 1 second margin), starting 15 seconds after container launch. It only passes in API mode — CLI and assistant modes do not expose the health endpoint.
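A probe of this kind needs only the standard library. The following is a minimal sketch under stated assumptions, not the script shipped in the image: the function name `probe_health` and the default base URL are hypothetical, while the endpoint path and the 4-second socket timeout come from the description above.

```python
import urllib.error
import urllib.request


def probe_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 4.0) -> bool:
    """Return True if GET /api/v1/health answers with a 2xx status within the timeout."""
    url = base_url.rstrip("/") + "/api/v1/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL:
        # all of these count as unhealthy.
        return False


# A healthcheck command would typically map the result to an exit code, e.g.
#   sys.exit(0 if probe_health() else 1)
```

Because the server does not listen on the health endpoint in CLI or assistant mode, a probe like this fails there with a connection error, which matches the behaviour described above.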
Command Line Arguments
General Options
python cogtrix.py [OPTIONS]
| Option | Short | Description |
|---|---|---|
--model NAME | -m | Active model alias from the models registry |
--session ID | -s | Session ID for memory persistence |
--memory-mode MODE | -M | Memory mode: conversation, code, reasoning |
--config-file FILE | -c | Path to a specific config file (JSON or YAML). Bypasses the automatic config file search. |
--data-dir PATH | | Root directory for data storage (history, vectordb, assistant state) |
--no-confirm | -y | Skip all tool safety confirmations (auto-approve file writes, shell commands, etc.) |
--output FILE | -o | Save responses to file. Non-interactive: single write. Interactive: append each exchange as Markdown. |
--debug | | Enable debug mode (auto-enables --log and --verbose) |
--verbose | -v | Log full LLM interactions: tokens, thinking, tool calls |
--verbosity N | | Verbosity level: 0=normal, 1=debug, 2=verbose, 3=trace |
--log [FILE] | | Enable logging to file (default: cogtrix.log) |
--silent | -S | Silent scripting mode: no spinner/ANSI, plain stdout, tool confirmations auto-denied. Use -y to auto-approve instead. |
--quick | -Q | Skip optimizer, memory, and compression (fast one-off queries) |
--auto-route | -R | Route simple queries to a fast model (requires auto_route_fast_model in config) |
--git-native | -G | Auto stage and commit after each file write (requires a git repository) |
--no-banner | | Suppress the startup banner |
--pipe | -I | Read prompt from stdin, run once, exit. Suppresses the banner when stdout is not a tty. |
--profile NAME | -P | Apply a named config profile (defined in the config file) |
--tools LIST | | Comma-separated tools to load (default: all) |
--activate-tools LIST | | Comma-separated tools to pin as active on startup |
--allow-write-path DIR | | Allow file writes to DIR (repeatable; multiple paths allowed) |
--allow-read-path DIR | | Allow file reads from DIR (repeatable; multiple paths allowed) |
--assistant | | Run as a headless WhatsApp/Telegram messaging daemon |
--check-config | | Validate configuration and exit |
--version | | Show version and exit |
--install-completion [SHELL] | | Print shell completion script (bash/zsh). Source it to enable tab-completion. Use auto to auto-detect. |
Run Modes
Control how Cogtrix executes and handles output:
python cogtrix.py --silent "Process this task" # Scripting: no spinner, auto-deny confirmations
echo "Task description" | python cogtrix.py --pipe # Stdin: read prompt, run once, exit
python cogtrix.py --quick "Quick one-off query" # Fast: skip optimizer, memory, compression
python cogtrix.py --auto-route # Route simple queries to fast model
python cogtrix.py --git-native --prompt "..." # Auto-stage and commit after file writes
python cogtrix.py --no-banner --prompt "..." # Suppress startup banner
python cogtrix.py --profile myprofile --prompt "..." # Apply named config profile
| Option | Short | Description |
|---|---|---|
--silent | -S | Silent scripting mode: no spinner/ANSI, tool confirmations auto-denied. Use -y to auto-approve instead. |
--pipe | -I | Read prompt from stdin, run once, exit. Suppresses the startup banner when stdout is not a tty. |
--quick | -Q | Skip optimizer, memory, and compression for fast one-off queries |
--auto-route | -R | Route simple queries to a fast model (requires auto_route_fast_model in config) |
--git-native | -G | Auto stage and commit after each file write (requires a git repository) |
--no-banner | | Suppress the startup banner |
--profile NAME | -P | Apply a named config profile (defined in the config file) |
Non-interactive Mode
Process a single prompt and exit (useful for scripting and automation):
python cogtrix.py --prompt "What is 2+2?"
python cogtrix.py --prompt-file task.txt
python cogtrix.py --prompt "Summarize this" -o summary.md
python cogtrix.py --prompt "Generate JSON" --no-stream -o data.json
| Option | Short | Description |
|---|---|---|
--prompt TEXT | | Send a single prompt and exit |
--prompt-file FILE | | Read prompt from file and exit |
--output FILE | -o | Write response to file |
--no-stream | | Disable streaming output |
Assistant Mode
Run Cogtrix as a headless WhatsApp/Telegram messaging daemon:
python cogtrix.py --assistant --log --debug
python cogtrix.py --assistant --system-prompt "You are a helpdesk bot for Acme Corp."
python cogtrix.py --assistant --system-prompt-file ./prompts/helpdesk.txt
| Option | Description |
|---|---|
--assistant | Run as a headless WhatsApp/Telegram messaging daemon |
--system-prompt TEXT | Override the default system prompt with inline text |
--system-prompt-file FILE | Override the default system prompt by loading text from FILE |
Tool Filtering
Control which tools are loaded at startup:
python cogtrix.py --tools none # No tools (pure LLM chat)
python cogtrix.py --tools minimal # Basic set (file ops + calculate)
python cogtrix.py --tools "web_search,calculate" # Specific tools only
Path allowlisting restricts which directories a tool can read from or write to. Use --allow-write-path DIR and --allow-read-path DIR (both repeatable) to open specific directories. See Allowed Write Paths and Allowed Read Paths for full detail.
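To illustrate what a directory allowlist check involves, here is a minimal sketch. It is not Cogtrix's implementation: the function name `is_path_allowed` is hypothetical, and the sketch assumes that paths are resolved (symlinks and `..` segments normalized) before comparison so that a path cannot escape an allowed directory.

```python
from pathlib import Path
from typing import Iterable


def is_path_allowed(target: str, allowed_dirs: Iterable[str]) -> bool:
    """Return True if target resolves to a location inside one of allowed_dirs."""
    resolved = Path(target).expanduser().resolve()
    for directory in allowed_dirs:
        root = Path(directory).expanduser().resolve()
        # Allowed when the target is the directory itself or lives below it.
        if resolved == root or root in resolved.parents:
            return True
    return False
```

Comparing resolved path components rather than raw string prefixes means that `/tmp/workspace` is not mistaken for a child of `/tmp/work`.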
Pinning Tools at Startup
Pin specific on-demand tools as active without changing the overall tool filter:
python cogtrix.py --activate-tools web_search,shell,write_file
Pinned tools persist across prompt cycles (unlike agent-loaded tools which are cleared between turns). Unpin interactively with /tools unload <name>.
RAG Ingestion Options
python cogtrix.py --ingest [OPTIONS]
| Option | Description |
|---|---|
--ingest | Build vector database and exit |
--docs-dir PATH | Documents directory |
--vectordb-dir PATH | Vector database output directory |
--embedding-provider NAME | Embedding provider: openai or ollama |
--embedding-model NAME | Embedding model name |
Setup Wizard
The setup wizard generates a valid Cogtrix config file through an interactive three-phase process: scripted LLM bootstrap, conversational Q&A, and YAML validation and write. It works for both first-time setup and editing an existing config.
python cogtrix.py --setup
python cogtrix.py --setup --setup-output ~/myproject/.cogtrix.yml
python cogtrix.py --setup --setup-docs https://example.com/cogtrix-config-docs
| Option | Description |
|---|---|
--setup | Launch the interactive setup wizard and exit |
--setup-docs URL | Fetch configuration documentation from URL instead of the bundled docs/CONFIGURATION.md. Useful when running the wizard against a different documentation version. |
--setup-output FILE | Write the generated config to this path (default: ~/.cogtrix.yml) |
How the wizard works:
- Scripted bootstrap: detects `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, and Ollama at `localhost:11434`. Prompts for provider type (`ollama`, `openai`, `anthropic`, `google`, `xai`, or `deepseek`), model name, and API key if needed. Tests LLM connectivity before proceeding.
- LLM conversation: loads the configuration reference (bundled or fetched), loads any existing config from the standard search paths, and runs an interactive Q&A loop. The wizard LLM asks targeted questions and produces a complete YAML config in a code fence when it has enough information. Type `quit` at any prompt to cancel.
- Validation and write: extracts the YAML from the LLM response, injects the real API key collected during bootstrap, validates the result via an internal config round-trip, shows a masked preview for confirmation, and writes the file.
Notes:
- The wizard detects an existing config automatically and asks whether to edit it or start fresh.
- The API key field echoes `*` for each character typed. The masked preview shows the first 3 and last 4 characters (e.g. `sk-***4bcd`) for keys ≥ 10 characters, or `***` for shorter keys.
- Leave the API key blank for endpoints that do not require authentication (vLLM, LM Studio, and other self-hosted OpenAI-compatible servers).
- All values entered during bootstrap (provider type, base URL, model, API key) are preserved as defaults if the connection test fails — retry without re-entering unchanged fields.
- API keys entered during bootstrap are injected into the final YAML, so the LLM never sees the actual key value.
- The output file is shown after writing: `Config written to: ~/.cogtrix.yml`.
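The masking rule for the preview is simple enough to express directly. The sketch below follows the rule stated in the notes (first 3 and last 4 characters for keys of 10 or more characters, `***` otherwise); the function name `mask_api_key` is hypothetical and the wizard's real code may differ.

```python
def mask_api_key(key: str) -> str:
    """Mask an API key for display, keeping only a short prefix and suffix."""
    if len(key) >= 10:
        # Keep the first 3 and last 4 characters; hide everything in between.
        return f"{key[:3]}***{key[-4:]}"
    # Short keys are hidden completely so that almost nothing leaks.
    return "***"
```

For example, `mask_api_key("sk-test1234bcd")` gives `sk-***4bcd`, and any key shorter than 10 characters gives `***`.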
Docker auto-start: When running the official container image, the container automatically
launches the setup wizard if all of the following are true: (1) no command-line arguments were
passed to the container, (2) no config file exists at /app/.cogtrix.yml or /app/.cogtrix.json,
(3) none of OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, XAI_API_KEY,
DEEPSEEK_API_KEY, COGTRIX_OLLAMA, or OLLAMA_BASE_URL is set, and (4) stdin is a TTY. This
simplifies first-run setup:
docker run -it -v ~/.cogtrix.yml:/app/.cogtrix.yml ghcr.io/northlandpositronics/cogtrix:latest
# → wizard starts automatically, writes config to the mounted path
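The four auto-start conditions can be summarized as a single predicate. This is a sketch for illustration, not the container's actual entrypoint logic: the function and constant names are hypothetical, and treating an empty environment variable as unset is an assumption.

```python
from pathlib import Path
from typing import Mapping, Sequence

# Variables that indicate a provider has already been configured.
PROVIDER_ENV_VARS = (
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY", "XAI_API_KEY",
    "DEEPSEEK_API_KEY", "COGTRIX_OLLAMA", "OLLAMA_BASE_URL",
)
CONFIG_PATHS = ("/app/.cogtrix.yml", "/app/.cogtrix.json")


def should_autostart_wizard(
    args: Sequence[str],
    env: Mapping[str, str],
    stdin_is_tty: bool,
    config_paths: Sequence[str] = CONFIG_PATHS,
) -> bool:
    """Return True only when all four auto-start conditions hold."""
    no_args = len(args) == 0                                          # (1)
    no_config = not any(Path(p).exists() for p in config_paths)      # (2)
    no_provider_env = not any(env.get(v) for v in PROVIDER_ENV_VARS)  # (3), empty counts as unset (assumption)
    return bool(no_args and no_config and no_provider_env and stdin_is_tty)  # (4)
```

In a real entrypoint the arguments would come from `sys.argv[1:]`, the environment from `os.environ`, and the TTY flag from `sys.stdin.isatty()`.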
Shell Completion
Enable tab completion for bash or zsh:
# Auto-detect your shell
python cogtrix.py --install-completion
# Explicit bash
python cogtrix.py --install-completion bash
# Explicit zsh
python cogtrix.py --install-completion zsh
The command prints a script to stdout. Source it in your shell profile to activate completion:
# Add to ~/.bashrc or ~/.zshrc
eval "$(python cogtrix.py --install-completion)"
Completion works for options, subcommands, model aliases, and session IDs.
Complete Configuration Example
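Below is a full configuration in both YAML and JSON. The two formats are functionally equivalent, so pick whichever you prefer. Note that the YAML example leaves the assistant guardrails block commented out, while the JSON example includes it under services.assistant.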
YAML (.cogtrix.yml)
session: default
# ─── LLM Providers (connection info only) ───────────────────────
providers:
my-server:
type: ollama
base_url: "http://192.168.1.100:11434"
openai:
type: openai
api_key: "sk-..."
groq:
type: openai
base_url: "https://api.groq.com/openai/v1"
api_key: "gsk-..."
local-gpu:
type: ollama
base_url: "http://192.168.1.101:11434"
# ─── External Services ──────────────────────────────────────────
services:
tavily:
api_key: "tvly-..."
exa:
api_key: "exa-..."
brave:
api_key: "BSA..."
openweather:
api_key: "..."
whatsapp:
waha_url: "http://localhost:3000"
allow_send: true
allow_receive: true
filter_mode: allow
contacts: ["+14155551234"]
phonebook:
alice: "+14155551234"
telegram:
bot_token: "123456:ABC-DEF..."
phonebook:
alice: "123456789"
# ─── Models (chat + embedding) ───────────────────────────────────
models:
default: fast # active model alias at startup
fast: my-server/qwen3:8b
smart:
provider: openai
model: gpt-4.1
temperature: 0.7
coder:
provider: local-gpu
model: qwen3-coder:30b-a3b
temperature: 0.3
embed-local:
provider: local-gpu
model: nomic-embed-text
# ─── Memory ─────────────────────────────────────────────────────
memory:
mode: conversation
modes:
conversation:
working_memory_size: 25
summarization: true
vector_recall_k: 3
code:
working_memory_size: 30
max_files: 20
summarization: true
vector_recall_k: 3
reasoning:
working_memory_size: 30
max_decisions: 20
summarization: true
vector_recall_k: 3
# ─── RAG ────────────────────────────────────────────────────────
rag:
docs_dir: docs
vectordb_dir: vectordb
model: embed-local
# ─── Delegation ─────────────────────────────────────────────────
delegate:
enabled: true
default_timeout: 60
allowed_models: [fast, smart, coder]
# ─── Research Delegate ───────────────────────────────────────────
research_delegate:
enabled: true
cap_ratio: 0.85
timeout: 300
# ─── Decision Accountability ────────────────────────────────────
# Off by default. Enable for high-stakes autonomous work.
decision_accountability:
enabled: false
min_confidence_threshold: 7.0
require_counter_plan: true
report_uncertainty: true
# ─── Prompt Optimizer ────────────────────────────────────────────
prompt_optimizer: true
# ─── Context Compression ────────────────────────────────────────
context_compression:
enabled: true
model: fast
min_age: 6
min_chars: 2000
# ─── MCP Servers (requires: uv pip install "cogtrix[mcp]") ──────
mcp_servers:
filesystem:
command: npx
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
requires_confirmation: true
timeout: 30
# remote-api:
# url: http://localhost:8000/sse
# headers:
# Authorization: "Bearer token"
# requires_confirmation: false
# ─── Assistant Guardrails (under services.assistant) ─────────────
# services:
# assistant:
# guardrails:
# enabled: true
# max_input_length: 4000
# unicode_checks: true
# input_patterns: []
# rate_limit:
# per_minute: 10
# per_hour: 60
# encoding_detection:
# enabled: true
# min_score: 0.6
# tool_call_guard:
# enabled: true
# injection_scan: true
# path_blocking: true
# exfiltration_detection: true
# sensitive_paths: []
# auto_blacklist:
# enabled: true
# max_violations: 2
# window_minutes: 30
# banned_output_strings: []
# block_urls_in_output: true
# pii_detection: true
# llm_judge:
# enabled: false
# model: null
JSON (.cogtrix.json)
{
"session": "default",
"providers": {
"my-server": {
"type": "ollama",
"base_url": "http://192.168.1.100:11434"
},
"openai": {
"type": "openai",
"api_key": "sk-..."
},
"groq": {
"type": "openai",
"base_url": "https://api.groq.com/openai/v1",
"api_key": "gsk-..."
},
"local-gpu": {
"type": "ollama",
"base_url": "http://192.168.1.101:11434"
}
},
"services": {
"tavily": { "api_key": "tvly-..." },
"exa": { "api_key": "exa-..." },
"brave": { "api_key": "BSA..." },
"openweather": { "api_key": "..." },
"whatsapp": {
"waha_url": "http://localhost:3000",
"allow_send": true,
"allow_receive": true,
"filter_mode": "allow",
"contacts": ["+14155551234"],
"phonebook": { "alice": "+14155551234" }
},
"telegram": {
"bot_token": "123456:ABC-DEF...",
"phonebook": { "alice": "123456789" }
},
"assistant": {
"guardrails": {
"enabled": true,
"max_input_length": 4000,
"unicode_checks": true,
"input_patterns": [],
"rate_limit": {
"per_minute": 10,
"per_hour": 60
},
"encoding_detection": {
"enabled": true,
"min_score": 0.6
},
"tool_call_guard": {
"enabled": true,
"injection_scan": true,
"path_blocking": true,
"exfiltration_detection": true,
"sensitive_paths": []
},
"auto_blacklist": {
"enabled": true,
"max_violations": 2,
"window_minutes": 30
},
"banned_output_strings": [],
"block_urls_in_output": true,
"pii_detection": true,
"llm_judge": {
"enabled": false,
"model": null
}
}
}
},
"models": {
"default": "fast",
"fast": "my-server/qwen3:8b",
"smart": {
"provider": "openai",
"model": "gpt-4.1",
"temperature": 0.7
},
"coder": {
"provider": "local-gpu",
"model": "qwen3-coder:30b-a3b",
"temperature": 0.3
},
"embed-local": {
"provider": "local-gpu",
"model": "nomic-embed-text"
}
},
"memory": {
"mode": "conversation",
"modes": {
"conversation": { "working_memory_size": 25, "summarization": true, "vector_recall_k": 3 },
"code": { "working_memory_size": 30, "max_files": 20, "summarization": true, "vector_recall_k": 3 },
"reasoning": { "working_memory_size": 30, "max_decisions": 20, "summarization": true, "vector_recall_k": 3 }
}
},
"rag": {
"docs_dir": "docs",
"vectordb_dir": "vectordb",
"model": "embed-local"
},
"delegate": {
"enabled": true,
"default_timeout": 60,
"allowed_models": ["fast", "smart", "coder"]
},
"research_delegate": {
"enabled": true,
"cap_ratio": 0.85,
"timeout": 300
},
"decision_accountability": {
"enabled": false,
"min_confidence_threshold": 7.0,
"require_counter_plan": true,
"report_uncertainty": true
},
"prompt_optimizer": true,
"context_compression": {
"enabled": true,
"model": "fast",
"min_age": 6,
"min_chars": 2000
},
"mcp_servers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
"requires_confirmation": true,
"timeout": 30
}
}
}
Note: Both examples use `"providers"` (preferred). The legacy key `"inference"` still works as an alias. `models.default` sets the active model alias; the deprecated top-level `provider` and `model` keys are auto-migrated on load.
Interactive Commands
See the Interactive Commands table in the README for the full list of slash commands, or type /help inside a running session.
Tip: Commands like /mode, /model, /provider, and /session work in two ways: run them without arguments to display the current value, or pass a name to switch at runtime (e.g. /mode code).
Line Editing
The interactive prompt supports full line editing via Python’s readline module:
- Left/Right arrows — Move cursor within the line
- Home/End — Jump to beginning/end of line
- Up/Down arrows — Navigate input history
- Ctrl+A / Ctrl+E — Beginning/end of line (Emacs-style)
- Ctrl+W — Delete previous word
This works out of the box on Linux and macOS. On Windows, install pyreadline3 for equivalent functionality.
Migration Guide
Migrating from the old provider/model format
In earlier versions, model settings (model name, temperature, context_window) were placed inside the provider entry, and the active model was selected via top-level provider and model keys:
# Old format — still accepted but deprecated
provider: my-server
model: qwen3:8b
providers:
my-server:
type: ollama
base_url: "http://192.168.1.100:11434"
model: qwen3:8b
temperature: 0.5
What changed: Providers now hold only connection info. All model settings live in the models registry. The active model is selected via models.default.
New format:
# New format
providers:
my-server:
type: ollama
base_url: "http://192.168.1.100:11434"
models:
default: main
main:
provider: my-server
model: qwen3:8b
temperature: 0.5
Auto-migration: Old configs continue to work without changes. When Cogtrix loads a config that
has model fields (model, temperature, context_window) inside a provider entry, they are
automatically migrated to the models registry as a new entry named after that provider. Similarly,
top-level provider and model keys are mapped to models.default by matching against existing
registry entries. A log warning is emitted for each migrated field.
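The migration rules above can be sketched as a small transformation on the parsed config. This is an illustration only, not Cogtrix's loader: the function name `migrate_legacy_config` is hypothetical, and the choice to leave an existing registry entry or an existing `models.default` untouched is an assumption.

```python
import copy
import logging

logger = logging.getLogger("cogtrix.migration.sketch")
MODEL_FIELDS = ("model", "temperature", "context_window")


def migrate_legacy_config(config: dict) -> dict:
    """Return a migrated copy of config; the input is left unmodified."""
    cfg = copy.deepcopy(config)
    models = cfg.setdefault("models", {})

    # Move model fields out of each provider entry into a registry entry
    # named after that provider.
    for name, entry in cfg.get("providers", {}).items():
        moved = {}
        for field in MODEL_FIELDS:
            if field in entry:
                moved[field] = entry.pop(field)
                logger.warning("migrated providers.%s.%s to models.%s", name, field, name)
        if moved and name not in models:  # existing entries win (assumption)
            models[name] = {"provider": name, **moved}

    # Map top-level provider/model keys to models.default by matching
    # against the registry entries.
    legacy_provider = cfg.pop("provider", None)
    legacy_model = cfg.pop("model", None)
    if legacy_provider is not None and "default" not in models:
        match = next(
            (alias for alias, entry in models.items()
             if isinstance(entry, dict)
             and entry.get("provider") == legacy_provider
             and legacy_model in (None, entry.get("model"))),
            None,
        )
        if match is not None:
            models["default"] = match
    return cfg
```

Applied to the old-format example above, this yields a `my-server` provider with only `type` and `base_url`, a `models.my-server` entry holding `qwen3:8b` with temperature 0.5, and `models.default` set to `my-server`.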
Recommended steps to update manually:
- Move `model`, `temperature`, `context_window`, and `max_tokens` out of each provider entry and into a named entry in the `models` section.
- Set `models.default` to the alias you want active at startup.
- Remove the top-level `provider` and `model` keys.
- Update `--provider` CLI usage to `--model <alias>`.
- Replace the `COGTRIX_PROVIDER` environment variable with `COGTRIX_MODEL=<alias>`.
Environment variable changes
| Old | New | Notes |
|---|---|---|
COGTRIX_PROVIDER=ollama | COGTRIX_MODEL=my-alias | Set any model alias defined in models |
COGTRIX_MODEL=qwen3:8b | COGTRIX_MODEL=my-alias | Now expects a registry alias, not a bare model name |
CLI flag changes
| Old | New |
|---|---|
--provider ollama | --model <alias> |
-p ollama | -m <alias> |
New feature: Decision Accountability (ADR-0052)
No migration required. The feature is opt-in (enabled: false by default) and adds no breaking changes to existing configuration.
To enable, add the following to your .cogtrix.yml:
decision_accountability:
enabled: true
If you have an existing config that previously contained a decision_accountability key with a dict value (from a pre-release build that used a different schema), replace it with the scalar fields shown above. The old dict form is no longer read by the parser.
Debugging & Logging
Enable logging to troubleshoot issues:
# Enable logging to default file (cogtrix.log)
python cogtrix.py --log
# Enable logging to specific file
python cogtrix.py --log ~/my-logs/session.log
# Log full LLM interactions (tokens, thinking, tool calls)
python cogtrix.py --log -v
# Enable debug mode (auto-enables --log and --verbose)
python cogtrix.py --debug
python cogtrix.py --debug --log ~/debug.log
API Server Logging
The API server (python -m src.api) supports the same logging flags with one addition: debug log streaming.
python -m src.api --debug # DEBUG/INFO → stdout, WARNING+ → stderr
python -m src.api --debug --log-file /tmp/api.log # all levels → file (overrides streaming)
python -m src.api --log # INFO → cogtrix-api.log
python -m src.api --log-file /var/log/api.log # INFO → specified file
When --debug is used without --log-file, log output is split across standard streams:
- DEBUG and INFO messages go to stdout
- WARNING, ERROR, and CRITICAL messages go to stderr
This is useful for docker logs, live terminals, and log aggregators that distinguish stdout from stderr. When --log-file is provided, it takes priority and all levels are written to the file.
The COGTRIX_LOG_STREAM=1 environment variable is set internally to propagate the stream mode to the application lifespan.
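The stdout/stderr split described above is a standard pattern with Python's `logging` module: one handler capped at INFO by a filter, and a second handler that starts at WARNING. The sketch below shows the general technique; `MaxLevelFilter` and `build_split_logger` are hypothetical names, and this is not the API server's actual setup code.

```python
import logging
import sys
from typing import TextIO


class MaxLevelFilter(logging.Filter):
    """Pass only records at or below a maximum level."""

    def __init__(self, max_level: int) -> None:
        super().__init__()
        self.max_level = max_level

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno <= self.max_level


def build_split_logger(name: str, out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> logging.Logger:
    """Create a logger that sends DEBUG/INFO to out and WARNING+ to err."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    low = logging.StreamHandler(out)       # DEBUG and INFO only
    low.setLevel(logging.DEBUG)
    low.addFilter(MaxLevelFilter(logging.INFO))

    high = logging.StreamHandler(err)      # WARNING, ERROR, CRITICAL
    high.setLevel(logging.WARNING)

    formatter = logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
    for handler in (low, high):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

The filter is what keeps warnings from appearing twice: without it, the stdout handler would also emit every WARNING and above.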
Log Levels
| Mode | Level | What’s Logged |
|---|---|---|
--log | INFO | User messages, agent responses, tool calls, errors |
--log -v | INFO | Above plus: full LLM interactions, tokens, thinking content |
--debug | DEBUG | All of the above plus: message details, context info, tool inputs/outputs |
What Gets Logged
| Event | Level | Example |
|---|---|---|
| User message | INFO | User: What's the weather? |
| Agent response | INFO | Agent response |
| Tool execution | INFO | Tool: get_weather |
| Tool input | DEBUG | Tool input: {'location': 'Auckland'} |
| Tool output | DEBUG | Tool output: Current weather in... |
| Memory context | DEBUG | Context: mode=conversation, 10 messages |
| Errors | ERROR | Tool failed: get_weather - Connection error |
Example Log Output
2025-01-15 10:30:15.123 [INFO] [a1b2c3d4] User: What's the weather in Auckland?
2025-01-15 10:30:15.124 [DEBUG] [a1b2c3d4] Context: mode=conversation, 5 messages, ~1200 tokens
2025-01-15 10:30:16.500 [INFO] [a1b2c3d4] Tool: get_weather
2025-01-15 10:30:16.500 [DEBUG] [a1b2c3d4] Tool input: {'location': 'Auckland, New Zealand', 'units': 'metric'}
2025-01-15 10:30:17.200 [DEBUG] [a1b2c3d4] Tool output: Current weather in Auckland: 18°C, partly cloudy...
2025-01-15 10:30:18.500 [INFO] [a1b2c3d4] Agent response
The [a1b2c3d4] is a request ID that groups all log entries for a single user query.
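Because every entry carries the request ID, a log file can be regrouped per query with a short script. The sketch below assumes the line layout shown in the example output (timestamp, bracketed level, bracketed request ID, message); the names `LOG_LINE` and `group_by_request` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches lines such as:
# 2025-01-15 10:30:15.123 [INFO] [a1b2c3d4] User: What's the weather in Auckland?
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\] \[(?P<request_id>[^\]]+)\] (?P<message>.*)$"
)


def group_by_request(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (level, message) pairs by request ID, skipping lines that do not match."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            groups[match["request_id"]].append((match["level"], match["message"]))
    return dict(groups)
```

Passing an open log file to `group_by_request` returns one list per request ID, in the order the entries were written.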
Debugging Tips
- Tool not being called? Check whether the agent outputs JSON text instead of calling the tool. This may indicate conversation history issues; try a fresh session with `-s new_session`.
- Timeout errors? The model may be slow. Check the provider’s status and consider using a faster model.
- Connection errors? Verify the provider URL and that the service is running.
See Also
- Providers — provider setup, model aliases, switching at runtime
- Memory modes — conversation / code / reasoning, when to use each
- RAG / knowledge base — ingestion, vector DB, embedding providers
- CLI Reference — every flag and slash command
- REST API — every endpoint and WebSocket event