Cogtrix RAG Guide
Turn your own documents into a searchable knowledge base that the agent can query during conversation. This feature uses Retrieval-Augmented Generation (RAG): your documents are split into chunks, converted into vector embeddings, and stored in a local FAISS index. When you ask a question, the most relevant chunks are retrieved and sent to the LLM alongside your query.
Table of Contents
- Overview
- Quick Start
- Document Preparation
- Ingestion
- Querying
- Embedding Providers
- Configuration
- Troubleshooting
- Advanced Usage
- See Also
Overview
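RAG (Retrieval-Augmented Generation) lets the agent answer questions based on your documents. The process:
1. Ingestion splits your documents into chunks, converts each chunk into a vector embedding, and stores the vectors in a local FAISS index.
2. At query time, the chunks most relevant to your question are retrieved from the index.
3. The retrieved chunks are sent to the LLM alongside your query.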
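The retrieval step can be illustrated with a toy sketch. This is not Cogtrix's implementation: the `embed`, `cosine`, and `retrieve` helpers are hypothetical, a bag-of-words counter stands in for a real embedding model, a sorted list stands in for the FAISS index, and the sample chunks are made up for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up sample chunks, standing in for text split out of your documents.
chunks = [
    "Remote work is allowed up to three days per week.",
    "Expense reports must be submitted within 30 days.",
    "The office kitchen is cleaned every Friday.",
]
question = "How many days of remote work are allowed?"
top = retrieve(question, chunks, k=1)

# The retrieved chunk is placed next to the question in the LLM prompt.
prompt = f"Context:\n{top[0]}\n\nQuestion: {question}"
```

A real embedding model captures meaning rather than exact word overlap, but the shape of the flow is the same: embed the question, rank the stored chunks by similarity, and pass the top `k` to the LLM.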
Quick Start
1. Add Documents
mkdir -p docs
cp your-documents.pdf docs/
cp your-notes.md docs/
2. Build Vector Database
# Using Ollama embeddings (default — local, free)
python cogtrix.py --ingest
# Using OpenAI embeddings instead (requires API key)
python cogtrix.py --ingest --embedding-provider openai
3. Query
python cogtrix.py
You: What does the policy say about remote work?
Document Preparation
Supported Formats
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text-based PDFs (not scanned images) |
| Markdown | .md, .markdown | Plain text with formatting |
| Text | .txt | Plain text files |
| CSV | .csv | Tabular data |
Best Practices
- Use text-based PDFs — Scanned documents won’t work without OCR
- Structure documents — Use headings, sections, lists
- Include context — Document titles and sources help retrieval
- Keep files focused — One topic per document improves relevance
Directory Structure
Place files in the docs directory — subdirectories are traversed recursively, so any folder layout is supported.
docs/
├── remote-work-policy.pdf
├── expense-policy.pdf
├── onboarding-guide.md
├── tech-stack.md
└── employees.csv
To ingest only part of your document tree, point --docs-dir at a specific subdirectory.
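To preview which files a recursive traversal would pick up, a short script along these lines may help. It is only a sketch: `find_documents` is a hypothetical helper, not part of Cogtrix, and it assumes files are selected purely by the extensions listed under Supported Formats.

```python
from pathlib import Path

# Extensions taken from the Supported Formats table.
SUPPORTED = {".pdf", ".md", ".markdown", ".txt", ".csv"}


def find_documents(docs_dir: Path) -> list[Path]:
    """Recursively collect files whose extension is in SUPPORTED."""
    return sorted(
        p for p in docs_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


if __name__ == "__main__":
    for path in find_documents(Path("docs")):
        print(path)
```

If the script prints nothing, ingestion is likely to report that no documents were loaded.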
Ingestion
Basic Ingestion
python cogtrix.py --ingest
With Options
# Custom documents directory
python cogtrix.py --ingest --docs-dir ./company-docs
# Custom output location
python cogtrix.py --ingest --vectordb-dir ./my-vectordb
# Use Ollama embeddings
python cogtrix.py --ingest --embedding-provider ollama
# Specific embedding model
python cogtrix.py --ingest --embedding-provider ollama --embedding-model mxbai-embed-large
# Full customization
python cogtrix.py --ingest \
  --docs-dir ./legal-docs \
  --vectordb-dir ./legal-vectordb \
  --embedding-provider ollama \
  --embedding-model nomic-embed-text
Ingestion Output
📚 RAG Document Ingestion
Documents directory: docs
Vector DB output: vectordb
Embedding provider: ollama
✓ Loaded 15 document(s)
✓ Created 234 chunk(s)
✓ Saved to vectordb/faiss_index
Re-ingestion
To update the knowledge base after adding new documents:
# Re-run ingestion (overwrites existing index)
python cogtrix.py --ingest
Querying
Auto-Activation
When a knowledge base exists (either a global CLI index or per-document API indexes), the query_knowledge_base tool is automatically pinned as active at startup. The agent can use it immediately without loading it via request_tools. The tool description dynamically shows the number of indexes and their total size.
The tool searches all available FAISS indexes:
- Global CLI index: built via --ingest, stored at data/vectordb/faiss_index/
- Per-document API indexes: created when documents are uploaded via the API, stored at data/api/uploads/{doc_id}/vectordb/faiss_index/
Results from all indexes are merged, and duplicates are removed by comparing the first 200 characters of each chunk's content.
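The merge step might look roughly like the sketch below. `merge_results` is a hypothetical helper, and the first-seen ordering is an assumption: the guide does not say how results from different indexes are ranked against each other.

```python
def merge_results(per_index_results: list[list[str]], prefix_len: int = 200) -> list[str]:
    """Merge chunk lists from several indexes, dropping any chunk whose
    first `prefix_len` characters have already been seen."""
    seen: set[str] = set()
    merged: list[str] = []
    for results in per_index_results:
        for chunk in results:
            key = chunk[:prefix_len]
            if key not in seen:
                seen.add(key)
                merged.append(chunk)
    return merged
```

One consequence of keying on a prefix is that two chunks that start identically but diverge after 200 characters are treated as the same result, so only the first one is kept.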
In Conversation
The agent automatically uses the knowledge base when relevant:
You: What are the requirements for expense reports?
Agent: Based on the expense policy document, the requirements are:
1. Submit within 30 days of expense
2. Include receipts for amounts over $25
3. Use the standard expense form
[Source: expense-policy.pdf, page 3]
Direct Tool Usage
The query_knowledge_base tool can be used explicitly:
You: Search the knowledge base for "vacation policy"
Saving Notes
Use save_to_knowledge_base to persist a note, fact, or decision for later retrieval by the agent.
You: Save this: the deployment window is Friday at 18:00 UTC.
The tool accepts:
- content - required note text to store
- source - optional origin label, default agent
- tags - optional list of topic tags
Saved notes go to the dedicated agent-notes sub-index when FAISS is available. If FAISS is unavailable, Cogtrix falls back to a JSONL log so the information is still preserved.
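A JSONL fallback of this kind can be sketched as follows. The `append_note` helper, the `saved_at` field, and the log location are all assumptions made for illustration; only the `content`, `source`, and `tags` fields come from the tool description.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_note(log_path: Path, content: str, source: str = "agent",
                tags: Optional[list] = None) -> dict:
    """Append one note as a single JSON line and return the stored record."""
    record = {
        "content": content,
        "source": source,
        "tags": tags or [],
        # Hypothetical extra field: when the note was saved (UTC).
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Because each note is one self-contained line, the log can be appended to without rewriting earlier notes and can be replayed into an index later.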
Query Parameters
| Parameter | Default | Description |
|---|---|---|
| question | Required | Search query |
| k | 4 | Number of chunks to retrieve (1-10) |
Embedding Providers
The default embedding provider is ollama (local, no API key required). OpenAI is also supported via --embedding-provider openai.
Ollama Embeddings (default)
Pros: Free, local, no API key
Cons: Requires Ollama running
# Make sure Ollama is running
ollama serve
# Pull embedding model
ollama pull nomic-embed-text
# Run ingestion (default — no flags needed)
python cogtrix.py --ingest
Default model: nomic-embed-text
OpenAI Embeddings
Pros: High quality, fast
Cons: Requires API key, costs money
export OPENAI_API_KEY="sk-..."
python cogtrix.py --ingest --embedding-provider openai
Default model: text-embedding-3-small
Google Embeddings
Note: Google embeddings are supported via the config file (rag.model referencing a Google provider) but are NOT available via the --embedding-provider CLI flag.
Pros: High quality
Cons: Requires API key (GEMINI_API_KEY)
export GEMINI_API_KEY="..."
# Configure Google embeddings via .cogtrix.yml (see "Using Named Providers" below)
Default model: text-embedding-004
Requires langchain-google-genai: uv pip install "cogtrix[google]"
Using Named Providers
You can reference any named provider from your config for embeddings by defining a model entry in the models registry and pointing rag.model at it. The provider connection details (type, base_url, api_key) are resolved automatically.
providers:
  gpu-server:
    type: ollama
    base_url: "http://192.168.1.100:11434"
  cloud-openai:
    type: openai
    api_key: "sk-..."
models:
  embed-local:
    provider: gpu-server
    model: nomic-embed-text
  embed-cloud:
    provider: cloud-openai
    model: text-embedding-3-small
rag:
  model: embed-local
Switch between embedding providers by changing the rag.model value — no need to touch the provider entries themselves.
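The lookup chain from rag.model to a provider's connection details can be pictured with plain dictionaries. This is a sketch of the idea only: `resolve_embedding` is a hypothetical function, and the real resolver may merge or validate fields differently.

```python
# The same structure as the YAML config, written as a Python dict.
config = {
    "providers": {
        "gpu-server": {"type": "ollama", "base_url": "http://192.168.1.100:11434"},
        "cloud-openai": {"type": "openai", "api_key": "sk-..."},
    },
    "models": {
        "embed-local": {"provider": "gpu-server", "model": "nomic-embed-text"},
        "embed-cloud": {"provider": "cloud-openai", "model": "text-embedding-3-small"},
    },
    "rag": {"model": "embed-local"},
}


def resolve_embedding(cfg: dict) -> dict:
    """Follow rag.model -> models[...] -> providers[...] and merge the result."""
    model_name = cfg["rag"]["model"]
    entry = cfg["models"][model_name]
    provider = cfg["providers"][entry["provider"]]
    return {"model": entry["model"], **provider}


resolved = resolve_embedding(config)
```

Setting `config["rag"]["model"]` to `"embed-cloud"` and resolving again yields the OpenAI connection details, with no change to either provider entry.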
Available Ollama Embedding Models
| Model | Size | Quality |
|---|---|---|
| nomic-embed-text | 274M | Good |
| mxbai-embed-large | 670M | Better |
| all-minilm | 46M | Fast, smaller |
| nomic-embed-text-v2-moe | MoE | Advanced |
Configuration
Via Config File
rag:
  docs_dir: docs
  vectordb_dir: vectordb
  chunk_size: 2000
  chunk_overlap: 200
  model: embed-local
Configuration Options
| Option | Default | Description |
|---|---|---|
| docs_dir | "docs" | Source documents directory |
| vectordb_dir | "vectordb" | Vector database output |
| chunk_size | 2000 | Characters per chunk |
| chunk_overlap | 200 | Overlap between chunks |
| model | null | Model name from the models registry for embeddings. Falls back to the active provider when not set. |
Chunk Size Guidelines
| Document Type | Recommended Size | Overlap |
|---|---|---|
| Technical docs | 1000-1500 | 150-200 |
| Legal documents | 800-1200 | 200-300 |
| General text | 1200-1500 | 150-200 |
| Short FAQs | 500-800 | 100-150 |
Smaller chunks = more precise retrieval, but less context per chunk, so more chunks (a higher k) may be needed
Larger chunks = more context per chunk, fewer chunks needed, but each result uses more of the context window
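To get a feel for how chunk_size and chunk_overlap interact, here is a minimal character-window splitter. It is an assumption-laden sketch: `split_into_chunks` is hypothetical, and the splitter Cogtrix actually uses may prefer paragraph or sentence boundaries over fixed offsets.

```python
def split_into_chunks(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(10_000))  # 10,000 characters
default_chunks = split_into_chunks(text)             # chunk_size=2000, overlap=200
small_chunks = split_into_chunks(text, 800, 150)     # "Short FAQs" range
```

With this splitter, the 10,000-character sample produces 6 chunks at the defaults and 16 chunks at 800/150, which shows how quickly the number of embeddings grows as chunks shrink.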
Troubleshooting
"No vector store found"
Cause: Vector database hasn't been built
Solution: Run python cogtrix.py --ingest
"Documents directory not found"
Cause: docs/ directory doesn't exist
Solution: mkdir -p docs && cp your-files.pdf docs/
"No documents loaded"
Cause: No supported files in docs/
Solution: Add PDF, MD, TXT, or CSV files to docs/
"Failed to create embeddings"
Cause: Missing or invalid API key (OpenAI/Google), or Ollama not running
Solutions:
# For OpenAI
export OPENAI_API_KEY="sk-..."
python cogtrix.py --ingest --embedding-provider openai
# For Google (config file only — not available via --embedding-provider)
# See "Google Embeddings" section above for config-based setup
# Use Ollama (default, no API key needed)
python cogtrix.py --ingest
"Failed to connect to Ollama"
Cause: Ollama not running
Solution:
1. Start Ollama: ollama serve
2. Pull model: ollama pull nomic-embed-text
3. Retry ingestion
"Out of memory during ingestion"
Cause: Too many documents or large files
Solutions:
1. Process fewer documents at a time
2. Use smaller embedding model
3. Reduce chunk_size in config
Poor retrieval quality
Causes & Solutions:
1. Chunk size too large → Reduce chunk_size
2. Wrong embedding model → Try different model
3. Documents poorly structured → Improve formatting
4. Query too vague → Be more specific
Embedding mismatch error
Cause: Query uses different embedding model than index
Solution: Rebuild index with same model you'll use for queries
python cogtrix.py --ingest --embedding-provider <same-provider>
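The mismatch arises because vectors from different models are not comparable and often differ in length (for example, nomic-embed-text produces 768-dimensional vectors and text-embedding-3-small produces 1536-dimensional ones). If you script around ingestion, a guard like the one below can catch the problem early. It is a hypothetical sketch: `check_compatible` and the metadata dictionaries are not part of Cogtrix, and you would need to record the ingestion settings yourself.

```python
def check_compatible(index_meta: dict, query_meta: dict) -> None:
    """Raise if the query embedding setup differs from the one that built the index."""
    for key in ("provider", "model", "dimension"):
        if index_meta[key] != query_meta[key]:
            raise ValueError(
                f"Embedding mismatch on {key}: "
                f"index={index_meta[key]!r}, query={query_meta[key]!r}"
            )


# Settings recorded at ingestion time versus settings used for the query.
index_meta = {"provider": "ollama", "model": "nomic-embed-text", "dimension": 768}
query_meta = {"provider": "openai", "model": "text-embedding-3-small", "dimension": 1536}

try:
    check_compatible(index_meta, query_meta)
except ValueError as exc:
    print(exc)  # reports the first field that differs
```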
Advanced Usage
Multiple Knowledge Bases
Create separate knowledge bases for different topics:
# Legal documents
python cogtrix.py --ingest --docs-dir ./legal --vectordb-dir ./data/legal-vectordb
# Technical docs
python cogtrix.py --ingest --docs-dir ./tech --vectordb-dir ./data/tech-vectordb
All available indexes (global CLI index and per-document API indexes) are searched automatically and results are merged.
Programmatic Access
from pathlib import Path

from src.rag import ingest_documents, IngestConfig

# Using Ollama (default)
config = IngestConfig(
    docs_dir=Path("./my-docs"),
    vectordb_dir=Path("./my-vectordb"),
    embedding_provider="ollama",
    embedding_model="nomic-embed-text",
)

# Using OpenAI or Google: pass the api_key explicitly
# config = IngestConfig(
#     docs_dir=Path("./my-docs"),
#     vectordb_dir=Path("./my-vectordb"),
#     embedding_provider="openai",
#     embedding_model="text-embedding-3-small",
#     api_key="sk-...",
# )

# Run ingestion and report the outcome
result = ingest_documents(config)
if result.success:
    print(f"Created {result.chunks_created} chunks")
else:
    print(f"Errors: {result.errors}")
See Also
- CONFIGURATION.md — Full configuration reference
- TOOLS_REFERENCE.md — query_knowledge_base tool
- PROVIDERS.md — Embedding provider setup