Cogtrix RAG Guide

Turn your own documents into a searchable knowledge base that the agent can query during conversation. This feature uses Retrieval-Augmented Generation (RAG): your documents are split into chunks, converted into vector embeddings, and stored in a local FAISS index. When you ask a question, the most relevant chunks are retrieved and sent to the LLM alongside your query.

Overview
Quick Start
Document Preparation
Ingestion
Querying
Embedding Providers
Configuration
Troubleshooting
Advanced Usage

Overview

RAG (Retrieval-Augmented Generation) allows the agent to answer questions based on your documents. The process has two phases, ingestion and querying:

graph TD
    Docs([Documents in docs/])
    Chunks(Split into chunks)
    Embed(Create embeddings)
    Store(Store in FAISS index)
    Search(Semantic search)
    Chunks2([Relevant chunks])
    LLM([LLM])
    Answer([Answer])
    Docs -- Ingestion --> Chunks
    Chunks --> Embed
    Embed --> Store
    Store -- Query --> Search
    Search --> Chunks2
    Chunks2 --> LLM
    LLM --> Answer
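The flow in the diagram can be illustrated with a toy pipeline. This is only a sketch of the general technique, not Cogtrix's implementation: the word-count "embedding" and the in-memory list stand in for a real embedding model and a FAISS index, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int) -> list[str]:
    """Split text into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Ingestion: split each document, embed each chunk, keep (chunk, vector) pairs.
documents = [
    "Remote work is allowed two days per week.",
    "Expense reports are due within 30 days.",
]
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc, size=200)]

# Query: embed the question and retrieve the closest chunk for the LLM.
top_chunks = search(index, "When are expense reports due?")
print(top_chunks)
```

In a real setup the retrieved chunks would be placed in the LLM prompt alongside the question.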

Quick Start

1. Add Documents

mkdir -p docs
cp your-documents.pdf docs/
cp your-notes.md docs/

2. Build Vector Database

# Using Ollama embeddings (default — local, free)
python cogtrix.py --ingest

# Using OpenAI embeddings instead (requires API key)
python cogtrix.py --ingest --embedding-provider openai

3. Query

python cogtrix.py
You: What does the policy say about remote work?

Document Preparation

Supported Formats

Format	Extensions	Notes
PDF	`.pdf`	Text-based PDFs (not scanned images)
Markdown	`.md`, `.markdown`	Plain text with formatting
Text	`.txt`	Plain text files
CSV	`.csv`	Tabular data

Best Practices

Use text-based PDFs — Scanned documents won’t work without OCR
Structure documents — Use headings, sections, lists
Include context — Document titles and sources help retrieval
Keep files focused — One topic per document improves relevance

Directory Structure

Place files in the docs directory — subdirectories are traversed recursively, so any folder layout is supported.

docs/
├── remote-work-policy.pdf
├── expense-policy.pdf
├── onboarding-guide.md
├── tech-stack.md
└── employees.csv

To ingest only part of your document tree, point --docs-dir at the specific subdirectory.
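Before ingesting, it can help to preview which files under a directory have a supported extension. The sketch below walks the tree recursively with the extensions from the Supported Formats table; the function name and the case-insensitive match are assumptions, not Cogtrix's actual loader logic.

```python
import tempfile
from pathlib import Path

# Extensions taken from the Supported Formats table.
SUPPORTED = {".pdf", ".md", ".markdown", ".txt", ".csv"}


def find_supported(docs_dir: Path) -> list[Path]:
    """Recursively list files whose extension is in SUPPORTED."""
    return sorted(
        p for p in docs_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED  # case-insensitive match is an assumption
    )


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "policies").mkdir()
    (root / "policies" / "remote-work-policy.pdf").touch()
    (root / "onboarding-guide.md").touch()
    (root / "logo.png").touch()  # unsupported, should be skipped
    found = [p.relative_to(root).as_posix() for p in find_supported(root)]

print(found)
```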

Ingestion

Basic Ingestion

python cogtrix.py --ingest

With Options

# Custom documents directory
python cogtrix.py --ingest --docs-dir ./company-docs

# Custom output location
python cogtrix.py --ingest --vectordb-dir ./my-vectordb

# Use Ollama embeddings explicitly (this is already the default)
python cogtrix.py --ingest --embedding-provider ollama

# Specific embedding model
python cogtrix.py --ingest --embedding-provider ollama --embedding-model mxbai-embed-large

# Full customization
python cogtrix.py --ingest \
  --docs-dir ./legal-docs \
  --vectordb-dir ./legal-vectordb \
  --embedding-provider ollama \
  --embedding-model nomic-embed-text

Ingestion Output

📚 RAG Document Ingestion

  Documents directory: docs
  Vector DB output:    vectordb
  Embedding provider:  ollama

✓ Loaded 15 document(s)
✓ Created 234 chunk(s)
✓ Saved to vectordb/faiss_index

Re-ingestion

To update the knowledge base after adding new documents:

# Re-run ingestion (overwrites existing index)
python cogtrix.py --ingest
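Because re-ingestion is a manual step, a small check can tell you when the index is likely out of date. The following is a hypothetical helper, not part of Cogtrix: it compares file modification times only, so it will not notice deleted documents.

```python
import os
import tempfile
from pathlib import Path


def newest_mtime(directory: Path) -> float:
    """Latest modification time of any file under directory (0.0 if none)."""
    return max((p.stat().st_mtime for p in directory.rglob("*") if p.is_file()), default=0.0)


def needs_reingest(docs_dir: Path, index_dir: Path) -> bool:
    """True when the index is missing or older than the newest document."""
    if not index_dir.is_dir():
        return True
    return newest_mtime(docs_dir) > newest_mtime(index_dir)


# Demo with explicit timestamps on a throwaway tree.
with tempfile.TemporaryDirectory() as tmp:
    docs_dir, index_dir = Path(tmp, "docs"), Path(tmp, "vectordb")
    docs_dir.mkdir()
    note = docs_dir / "notes.md"
    note.write_text("first draft")
    missing_index = needs_reingest(docs_dir, index_dir)  # no index yet

    (index_dir / "faiss_index").mkdir(parents=True)
    index_file = index_dir / "faiss_index" / "placeholder.bin"  # stand-in for real index files
    index_file.write_text("placeholder")
    os.utime(note, (1_000, 1_000))        # document older than the index
    os.utime(index_file, (2_000, 2_000))
    fresh_index = needs_reingest(docs_dir, index_dir)

    os.utime(note, (3_000, 3_000))        # document edited after ingestion
    stale_index = needs_reingest(docs_dir, index_dir)
```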

Querying

Auto-Activation

When a knowledge base exists (either a global CLI index or per-document API indexes), the query_knowledge_base tool is automatically pinned as active at startup. The agent can use it immediately without loading it via request_tools. The tool description dynamically shows the number of indexes and their total size.

The tool searches all available FAISS indexes:

Global CLI index — built via --ingest, stored at data/vectordb/faiss_index/
Per-document API indexes — created when documents are uploaded via the API, stored at data/api/uploads/{doc_id}/vectordb/faiss_index/

Results from all indexes are merged and deduplicated by content (first 200 characters).
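The merge step can be pictured as follows. This is a minimal sketch assuming each index returns a list of chunk texts and that the first occurrence wins; how Cogtrix orders or scores the merged results is not specified here, and the function name is hypothetical.

```python
def merge_results(result_lists: list[list[str]], prefix_len: int = 200) -> list[str]:
    """Merge per-index results, dropping chunks whose first prefix_len characters repeat."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in result_lists:
        for text in results:
            key = text[:prefix_len]
            if key not in seen:  # keep only the first chunk with this prefix
                seen.add(key)
                merged.append(text)
    return merged


# Two chunks that share their first 200 characters count as duplicates.
chunk_a = "x" * 200 + " tail from the global index"
chunk_b = "x" * 200 + " tail from an API index"
merged = merge_results([[chunk_a, "short chunk"], [chunk_b, "short chunk", "other chunk"]])
```

A consequence of keying on a prefix is that two chunks differing only after character 200 are treated as the same result.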

In Conversation

The agent automatically uses the knowledge base when relevant:

You: What are the requirements for expense reports?

Agent: Based on the expense policy document, the requirements are:
1. Submit within 30 days of expense
2. Include receipts for amounts over $25
3. Use the standard expense form
[Source: expense-policy.pdf, page 3]

Direct Tool Usage

The query_knowledge_base tool can be used explicitly:

You: Search the knowledge base for "vacation policy"

Saving Notes

Use save_to_knowledge_base to persist a note, fact, or decision for later retrieval by the agent.

You: Save this: the deployment window is Friday at 18:00 UTC.

The tool accepts:

content - required; the note text to store
source - optional; origin label (default: agent)
tags - optional; list of topic tags

Saved notes go to the dedicated agent-notes sub-index when FAISS is available. If FAISS is unavailable, Cogtrix falls back to a JSONL log so the information is still preserved.
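To make the JSONL fallback concrete, here is a sketch of appending notes as one JSON object per line. The file name, the saved_at field, and the function itself are hypothetical; only the content, source, and tags parameters come from the tool description above.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_note(log_path: Path, content: str, source: str = "agent",
                tags: Optional[list[str]] = None) -> dict:
    """Append one note as a JSON line and return the stored record."""
    if not content.strip():
        raise ValueError("content is required")
    record = {
        "content": content,
        "source": source,
        "tags": tags or [],
        "saved_at": datetime.now(timezone.utc).isoformat(),  # hypothetical field
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# Demo: write two notes, then read the log back.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "agent-notes.jsonl"  # hypothetical file name
    append_note(log, "The deployment window is Friday at 18:00 UTC.", tags=["deploy"])
    append_note(log, "Expense reports are due within 30 days.", source="user")
    notes = [json.loads(line) for line in log.read_text(encoding="utf-8").splitlines()]
```

An append-only log like this keeps every note even when no vector index is available, at the cost of having no semantic search over it.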

Query Parameters

Parameter	Default	Description
`question`	Required	Search query
`k`	4	Number of chunks to retrieve (1-10)
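The parameter table translates into a simple validation step. The sketch below assumes out-of-range values are rejected; whether Cogtrix rejects or clamps them is not stated here, and the function name is hypothetical.

```python
def build_query_args(question: str, k: int = 4) -> dict:
    """Validate arguments against the documented ranges and return them."""
    if not question.strip():
        raise ValueError("question is required")
    if not 1 <= k <= 10:
        raise ValueError("k must be between 1 and 10")
    return {"question": question.strip(), "k": k}


query_args = build_query_args("vacation policy")
```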

Embedding Providers

The default embedding provider is ollama (local, no API key required). OpenAI is also supported via --embedding-provider openai.

Ollama Embeddings (default)

Pros: Free, local, no API key
Cons: Requires Ollama running

# Make sure Ollama is running
ollama serve

# Pull embedding model
ollama pull nomic-embed-text

# Run ingestion (default — no flags needed)
python cogtrix.py --ingest

Default model: nomic-embed-text

OpenAI Embeddings

Pros: High quality, fast
Cons: Requires API key, costs money

export OPENAI_API_KEY="sk-..."
python cogtrix.py --ingest --embedding-provider openai

Default model: text-embedding-3-small

Google Embeddings

Note: Google embeddings are supported via the config file (rag.model referencing a Google provider) but are NOT available via the --embedding-provider CLI flag.

Pros: High quality
Cons: Requires API key (GEMINI_API_KEY)

export GEMINI_API_KEY="..."
# Configure Google embeddings via .cogtrix.yml (see "Using Named Providers" below)

Default model: text-embedding-004

Requires langchain-google-genai: uv pip install "cogtrix[google]"

Using Named Providers

You can reference any named provider from your config for embeddings by defining a model entry in the models registry and pointing rag.model at it. The provider connection details (type, base_url, api_key) are resolved automatically.

providers:
  gpu-server:
    type: ollama
    base_url: "http://192.168.1.100:11434"
  cloud-openai:
    type: openai
    api_key: "sk-..."

models:
  embed-local:
    provider: gpu-server
    model: nomic-embed-text
  embed-cloud:
    provider: cloud-openai
    model: text-embedding-3-small

rag:
  model: embed-local

Switch between embedding providers by changing the rag.model value — no need to touch the provider entries themselves.
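The lookup chain from rag.model to a provider's connection details can be sketched with plain dictionaries mirroring the YAML above. The resolve_embedding function is hypothetical and only illustrates the indirection; it is not Cogtrix's resolver.

```python
# Same structure as the YAML example, expressed as a Python dict.
config = {
    "providers": {
        "gpu-server": {"type": "ollama", "base_url": "http://192.168.1.100:11434"},
        "cloud-openai": {"type": "openai", "api_key": "sk-..."},
    },
    "models": {
        "embed-local": {"provider": "gpu-server", "model": "nomic-embed-text"},
        "embed-cloud": {"provider": "cloud-openai", "model": "text-embedding-3-small"},
    },
    "rag": {"model": "embed-local"},
}


def resolve_embedding(cfg: dict) -> dict:
    """Follow rag.model -> models entry -> provider and merge the details."""
    entry = cfg["models"][cfg["rag"]["model"]]
    provider = cfg["providers"][entry["provider"]]
    return {"model": entry["model"], **provider}


local = resolve_embedding(config)
# Switching providers only requires changing rag.model.
cloud = resolve_embedding({**config, "rag": {"model": "embed-cloud"}})
```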

Available Ollama Embedding Models

Model	Size	Quality
`nomic-embed-text`	274M	Good
`mxbai-embed-large`	670M	Better
`all-minilm`	46M	Fast, smaller
`nomic-embed-text-v2-moe`	MoE	Advanced

Configuration

Via Config File

rag:
  docs_dir: docs
  vectordb_dir: vectordb
  chunk_size: 2000
  chunk_overlap: 200
  model: embed-local

Configuration Options

Option	Default	Description
`docs_dir`	`"docs"`	Source documents directory
`vectordb_dir`	`"vectordb"`	Vector database output
`chunk_size`	`2000`	Characters per chunk
`chunk_overlap`	`200`	Overlap between chunks
`model`	`null`	Model name from the `models` registry for embeddings. Falls back to the active provider when not set.

Chunk Size Guidelines

Document Type	Recommended Size	Overlap
Technical docs	1000-1500	150-200
Legal documents	800-1200	200-300
General text	1200-1500	150-200
Short FAQs	500-800	100-150

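Smaller chunks = more precise retrieval, but less surrounding context per chunk, so more chunks may be needed to cover an answer
Larger chunks = more context per chunk and fewer chunks needed, but each retrieved chunk uses more of the LLM's context window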
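To see how chunk_size and chunk_overlap interact, consider a plain character-based splitter. This is a simplified sketch: Cogtrix's actual splitter is not shown here and may prefer paragraph or sentence boundaries, and the function name is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# With the defaults, a 5000-character document yields three chunks.
sizes = [len(c) for c in split_with_overlap("a" * 5000)]

# A small example makes the shared characters visible.
small = split_with_overlap("01234567890123456789", chunk_size=8, chunk_overlap=2)
```

In the small example each chunk begins with the last two characters of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.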

Troubleshooting

"No vector store found"

Cause: Vector database hasn't been built
Solution: Run python cogtrix.py --ingest

"Documents directory not found"

Cause: docs/ directory doesn't exist
Solution: mkdir -p docs && cp your-files.pdf docs/

"No documents loaded"

Cause: No supported files in docs/
Solution: Add PDF, MD, TXT, or CSV files to docs/

"Failed to create embeddings"

Cause: Missing or invalid API key (OpenAI/Google), or Ollama not running
Solutions:
  # For OpenAI
  export OPENAI_API_KEY="sk-..."
  python cogtrix.py --ingest --embedding-provider openai

  # For Google (config file only — not available via --embedding-provider)
  # See "Google Embeddings" section above for config-based setup

  # Use Ollama (default, no API key needed)
  python cogtrix.py --ingest

"Failed to connect to Ollama"

Cause: Ollama not running
Solution:
  1. Start Ollama: ollama serve
  2. Pull model: ollama pull nomic-embed-text
  3. Retry ingestion

"Out of memory during ingestion"

Cause: Too many documents or large files
Solutions:
  1. Process fewer documents at a time
  2. Use smaller embedding model
  3. Reduce chunk_size in config

Poor retrieval quality

Causes & Solutions:
  1. Chunk size too large → Reduce chunk_size
  2. Wrong embedding model → Try different model
  3. Documents poorly structured → Improve formatting
  4. Query too vague → Be more specific

Embedding mismatch error

Cause: Query uses different embedding model than index
Solution: Rebuild index with same model you'll use for queries
  python cogtrix.py --ingest --embedding-provider <same-provider>
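One common symptom of this mismatch is that query vectors and stored vectors have different lengths. The sketch below shows that check in isolation; the function is hypothetical and the sizes 768 and 1536 are illustrative. Two different models with the same dimension would pass this check and still retrieve poorly, so rebuilding with the same model remains the real fix.

```python
from typing import Optional


def dimension_mismatch(index_dim: int, query_vector: list[float]) -> Optional[str]:
    """Return an error message if the query vector cannot be searched in the index."""
    if len(query_vector) != index_dim:
        return (
            f"query embedding has {len(query_vector)} dimensions, "
            f"index expects {index_dim}; rebuild the index or switch models"
        )
    return None


matching = dimension_mismatch(768, [0.0] * 768)    # same model family: no message
mismatch = dimension_mismatch(768, [0.0] * 1536)   # different model: message returned
print(mismatch)
```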

Advanced Usage

Multiple Knowledge Bases

Create separate knowledge bases for different topics:

# Legal documents
python cogtrix.py --ingest --docs-dir ./legal --vectordb-dir ./data/legal-vectordb

# Technical docs
python cogtrix.py --ingest --docs-dir ./tech --vectordb-dir ./data/tech-vectordb

All available indexes (global CLI index and per-document API indexes) are searched automatically and results are merged.
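When several knowledge bases need rebuilding, the commands above can be generated from one mapping. This sketch only builds and prints the command lines using the documented --ingest, --docs-dir, and --vectordb-dir flags; the mapping and function name are hypothetical.

```python
import shlex

# Hypothetical mapping of knowledge base name to (docs dir, vector DB dir).
KNOWLEDGE_BASES = {
    "legal": ("./legal", "./data/legal-vectordb"),
    "tech": ("./tech", "./data/tech-vectordb"),
}


def ingest_command(docs_dir: str, vectordb_dir: str) -> list[str]:
    """Build the ingestion command line for one knowledge base."""
    return ["python", "cogtrix.py", "--ingest",
            "--docs-dir", docs_dir, "--vectordb-dir", vectordb_dir]


commands = {name: ingest_command(*dirs) for name, dirs in KNOWLEDGE_BASES.items()}
for name, cmd in commands.items():
    print(f"{name}: {shlex.join(cmd)}")
```

Each list can be passed to subprocess.run(cmd, check=True) to actually run the ingestion.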

Programmatic Access

from src.rag import ingest_documents, IngestConfig
from pathlib import Path

# Using Ollama (default)
config = IngestConfig(
    docs_dir=Path("./my-docs"),
    vectordb_dir=Path("./my-vectordb"),
    embedding_provider="ollama",
    embedding_model="nomic-embed-text",
)

# Using OpenAI or Google — pass the api_key explicitly
# config = IngestConfig(
#     docs_dir=Path("./my-docs"),
#     vectordb_dir=Path("./my-vectordb"),
#     embedding_provider="openai",
#     embedding_model="text-embedding-3-small",
#     api_key="sk-...",
# )

result = ingest_documents(config)

if result.success:
    print(f"Created {result.chunks_created} chunks")
else:
    print(f"Errors: {result.errors}")
