Web & Search · Cogtrix

Search

Cogtrix exposes a single canonical research tool: web_search. It fans a query out to multiple providers (DuckDuckGo always; Tavily, Exa, Brave, Google, SerpAPI, and SearXNG when their API keys are configured), fetches the top results, and extracts page content with trafilatura. It returns structured Markdown: sources with ①②③… citation indices, an optional synthesis section, explicitly flagged disagreements between sources, and a coverage block reporting per-provider and per-fetch outcomes.

Architecture and design rationale: ADR-0056 (held in the private documentation submodule).

web_search

Universal web research tool — multi-provider fan-out, fetch, extract, format.

Parameters:

Parameter	Type	Required	Default	Description
`query`	string	Yes	—	The research query
`depth`	int	No	`3`	Top-K sources to fetch and extract (1–10). Higher values give more breadth at the cost of longer wall time. The default was lowered from `6` to work around an lxml GIL bottleneck; PR #1716 removed that bottleneck by moving extraction into a `ProcessPoolExecutor`, so pages are now parsed in true parallel. The in-process `_LXML_LOCK` is retained as an unused export for backward compatibility with callers that still import it. The default of `3` is therefore a latency choice, not a serialisation workaround. Set `depth` explicitly (5–10) for deep research.
`region`	string	No	`"wt-wt"`	Region hint for providers that accept one (e.g. DDG).
`compact`	bool	No	`false`	When `true`, drop per-source extracts and the Additional Sources tail (~5 KB vs ~18 KB output).
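
As a rough illustration of the parameter table, the sketch below builds a web_search argument payload and applies the documented defaults and the 1–10 `depth` range. The helper name `build_web_search_args` is hypothetical and not part of Cogtrix; how the tool itself validates or dispatches arguments is not shown here.

```python
from typing import Any


def build_web_search_args(
    query: str,
    depth: int = 3,
    region: str = "wt-wt",
    compact: bool = False,
) -> dict[str, Any]:
    """Build an argument dict mirroring the documented web_search parameters.

    Hypothetical helper for illustration only.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= depth <= 10:
        raise ValueError("depth must be between 1 and 10")
    return {"query": query, "depth": depth, "region": region, "compact": compact}


# Quick lookup: documented defaults, compact output.
quick = build_web_search_args("rust async runtimes", compact=True)

# Deep research: raise depth explicitly, as the table suggests.
deep = build_web_search_args("rust async runtimes", depth=8)
```

Leaving `depth` at its default keeps latency low; raising it trades wall time for breadth.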

Returns: Markdown blob with sections (in order):

# Research: <query> header.
## Key findings — synthesised cross-source facts with [①②③…] citations. Stage 5 synthesis runs in-tool by default; the section is omitted only when synthesis is explicitly disabled or its deadline (10 s) expires.
## Disagreements — emitted when sources state directly contradictory facts.
## Gaps — aspects of the query the search couldn’t answer.
## Sources — flat index of cited URLs with domain class + recency tag.
Per-source extract bodies (non-compact mode).
## Additional sources — snippet-only tail of URLs that survived dedup but didn’t make top-K (non-compact mode).
## Coverage — operator-facing summary: providers responded, raw vs distinct count, fetch outcomes, synthesis model + elapsed, total wall time.
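
Because the result is plain Markdown with predictable `## ` headings, a caller can split it into sections before further processing. The following is a minimal sketch under that assumption; `split_sections` is a hypothetical helper, the sample text is made up, and the exact heading level used for per-source extract bodies is not specified above.

```python
import re


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## ' heading to the text beneath it, in document order."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^## (.+)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            # Anything that is not a level-2 heading belongs to the open section.
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


# Made-up sample shaped like the documented layout, not real tool output.
sample = """# Research: example query
## Key findings
Fact one [①②]
## Sources
① https://example.com
## Coverage
providers: 2
"""

sections = split_sections(sample)
# Synthesis can be absent (disabled or deadline expired), so check before use.
has_synthesis = "Key findings" in sections
```

Checking for a section's presence rather than assuming it matters here, since Key findings, Disagreements, and the non-compact sections are all conditional.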

Failure modes: The full reliability table is in ADR-0056. Key categories: validation-failed, blocked-robots, cross-domain-redirect, ssl-error, rate-limited, http-status, timeout. Every failure produces partial-but-useful output; the hard outer deadline is 25 s (raised from 15 s in PR #1687).
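
The "partial-but-useful output under a hard deadline" behaviour can be pictured with a small asyncio sketch: run every provider concurrently, keep what finished in time, and record the rest as failures. This is an illustration of the general pattern, not the code in Cogtrix; `fan_out` and the three demo providers are hypothetical, and the demo uses a 0.2 s deadline in place of the documented 25 s.

```python
import asyncio


async def fan_out(providers, query, deadline_s):
    """Run all providers concurrently; keep whatever finished by the deadline.

    `providers` maps a name to an async callable that takes the query.
    Returns (results, failures), both keyed by provider name.
    """
    tasks = {asyncio.ensure_future(fn(query)): name for name, fn in providers.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=deadline_s)

    # Stragglers are cancelled and reported as timeouts.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    results, failures = {}, {}
    for task in done:
        name = tasks[task]
        if task.exception() is not None:
            failures[name] = type(task.exception()).__name__
        else:
            results[name] = task.result()
    for task in pending:
        failures[tasks[task]] = "timeout"
    return results, failures


async def fast(query):
    return [f"https://example.com/?q={query}"]


async def slow(query):
    await asyncio.sleep(5)  # never finishes within the demo deadline
    return []


async def broken(query):
    raise RuntimeError("rate limited")


results, failures = asyncio.run(
    fan_out({"fast": fast, "slow": slow, "broken": broken}, "demo", deadline_s=0.2)
)
```

One failing or slow provider does not discard the results of the others, which is the property the failure-mode categories above rely on.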

SSRF safety: Every fetch (including the robots.txt probe and every redirect hop) is DNS-pinned to the IP that _validate_url resolved up front — the connect target cannot diverge from the validated address. See src/tools/_http_fetch.py for the mechanism.
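
To make the idea of DNS pinning concrete, here is a minimal sketch: resolve the hostname once, reject non-public addresses, and hand the caller the exact IP to connect to. It is not the mechanism in src/tools/_http_fetch.py, which is not reproduced here; `resolve_and_validate` and `fake_resolver` are hypothetical names, and the resolver is injectable only so the example runs without network access.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_and_validate(url, resolver=socket.getaddrinfo):
    """Resolve the URL's host once and reject non-public addresses.

    Returns (host, pinned_ip, port). The caller should connect to pinned_ip
    directly and use `host` only for the Host header and TLS SNI, so a second
    DNS lookup cannot swap in a different address.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("unsupported URL")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = resolver(parts.hostname, port, type=socket.SOCK_STREAM)
    pinned_ip = infos[0][4][0]  # first resolved address is the one we pin
    if not ipaddress.ip_address(pinned_ip).is_global:
        raise ValueError(f"refusing non-public address {pinned_ip}")
    return parts.hostname, pinned_ip, port


def fake_resolver(host, port, type=0):
    """Stand-in for socket.getaddrinfo with a fixed, made-up lookup table."""
    table = {"public.example": "93.184.216.34", "internal.example": "10.0.0.5"}
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", (table[host], port))]


host, ip, port = resolve_and_validate("https://public.example/page", fake_resolver)
```

The same check would need to run again for each redirect hop and for the robots.txt probe, since each is a separate fetch with its own hostname.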

Retired legacy tools

The following tools were removed from the agent catalogue when web_search shipped. The underlying functions remain importable from their respective modules for power users and internal use; the agent simply no longer sees them as discoverable tools:

search_web (DuckDuckGo, see src/tools/web_search.py; search_news is also importable but is not part of the agent catalogue)
tavily_search (src/tools/tavily_search.py)
brave_search (src/tools/brave_search.py)
google_search (src/tools/google_search.py)
exa_search (src/tools/exa_search.py)
serpapi_search (src/tools/serpapi_search.py)
searxng_search (src/tools/searxng_search.py)

tavily_extract, exa_find_similar, and exa_get_contents remain in the catalogue — they cover use cases (URL-targeted extraction, semantic similarity) that web_search does not subsume.

Web & HTTP

http_get

Make HTTP GET requests.

Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	URL to request
`headers`	string	No	Request headers as JSON string
`timeout`	int	No	Timeout in seconds (default: 30)

Returns: Response body and status code

http_post

Make HTTP POST requests with JSON data.

Requires Confirmation: Yes

Parameters:

Parameter	Type	Required	Description
`url`	string	Yes	URL to request
`data`	string	Yes	Request body as JSON string
`headers`	string	No	Request headers as JSON string
`timeout`	int	No	Timeout in seconds (default: 30)
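
Note that `headers` and `data` are typed as strings, so structured values have to be serialised to JSON before being passed. A short sketch of argument payloads for both tools follows; the URL and field values are placeholders, and how the payload is handed to the agent runtime is outside the scope of this example.

```python
import json

# http_get: headers go in as a JSON string, not a dict.
get_args = {
    "url": "https://api.example.com/items",
    "headers": json.dumps({"Accept": "application/json"}),
    "timeout": 10,
}

# http_post: both the body and the headers are JSON strings.
# timeout is omitted, so the documented default of 30 seconds applies.
post_args = {
    "url": "https://api.example.com/items",
    "data": json.dumps({"name": "widget", "qty": 2}),
    "headers": json.dumps({"Content-Type": "application/json"}),
}
```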

Weather

get_weather

Get current weather for any location.

Requires: OpenWeather API key (set in config or via the OPENWEATHER_API_KEY environment variable)

Parameters:

Parameter	Type	Required	Description
`location`	string	Yes	City name or coordinates
`units`	string	No	Units: `metric` or `imperial` (default: `metric`)

Returns:

{
  "temperature": 22,
  "feels_like": 24,
  "humidity": 65,
  "description": "partly cloudy",
  "wind_speed": 12
}
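
A caller might load this result into a typed record before using it. The sketch below assumes the result arrives as a JSON string with exactly the five fields shown; the `Weather` dataclass and `parse_weather` are hypothetical, and the units of each field depend on the `units` parameter rather than anything encoded in the payload.

```python
import json
from dataclasses import dataclass


@dataclass
class Weather:
    temperature: float
    feels_like: float
    humidity: float
    description: str
    wind_speed: float


def parse_weather(raw: str) -> Weather:
    """Parse the documented get_weather JSON into a typed record.

    Raises TypeError if the payload has missing or unexpected keys.
    """
    return Weather(**json.loads(raw))


raw = (
    '{"temperature": 22, "feels_like": 24, "humidity": 65, '
    '"description": "partly cloudy", "wind_speed": 12}'
)
weather = parse_weather(raw)
summary = f"{weather.description}, {weather.temperature}° (feels like {weather.feels_like}°)"
```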