Imagine a FastAPI service that ingests your documentation, stores it in a vector database, and streams Agno agent responses with citations that CometChat can consume in real time.

What You’ll Build

  • An Agno agent that joins conversations as a documentation expert.
  • An ingestion pipeline that writes markdown artifacts into knowledge_agent/data/knowledge/<namespace>.
  • Retrieval and answering logic that always cites the sources it used.
  • A /stream endpoint that outputs newline-delimited JSON events so CometChat can subscribe without changes.

Prerequisites

  • Python 3.10 or newer (3.11 recommended).
  • OPENAI_API_KEY with access to GPT-4o or any compatible model.
  • Optional: alternate OpenAI base URL or model IDs if you self-host OpenAI-compatible APIs.
  • curl or an API client (Hoppscotch, Postman) to call the FastAPI endpoints.


How it works

This example recreates the Vercel knowledge base workflow using Agno:
  • Ingest — collect_documents accepts URLs, markdown, plain text, uploads, or multipart forms. Sources are deduplicated by a SHA-256 hash and normalized into markdown.
  • Store — KnowledgeManager keeps one ChromaDb collection per namespace, with metadata persisted under knowledge_agent/data/knowledge/<namespace>.
  • Retrieve — Searches hit the vector DB via Agno’s Knowledge class, returning ranked snippets and the metadata used for citations.
  • Answer — create_agent enables search_knowledge and add_knowledge_to_context, forcing every response to cite sources via the system prompt.
  • Stream — /stream emits newline-delimited JSON events (text_delta, tool_*, text_done, done, error) that match CometChat’s Bring Your Own Agent expectations. Every event echoes the caller’s thread_id and run_id.
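
To make the Ingest step concrete, here is a hedged sketch of the dedup-by-hash logic; save_if_new and the filename scheme are illustrative helpers, not the repo's actual code:

import hashlib
from pathlib import Path

def save_if_new(namespace: str, markdown: str, title: str) -> str:
    """Persist normalized markdown unless its content hash was seen before."""
    digest = hashlib.sha256(markdown.encode("utf-8")).hexdigest()
    short = digest[:12]
    folder = Path("knowledge_agent/data/knowledge") / namespace
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{short}-{title}.md"
    if target.exists():
        return "already-ingested"   # exact file already on disk
    if any(p.name.startswith(short) for p in folder.glob("*.md")):
        return "duplicate-content"  # same hash saved under another title
    target.write_text(markdown, encoding="utf-8")
    return "saved"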

Setup

1. Clone & install

Run git clone https://github.com/cometchat/ai-agent-agno-examples.git, then inside the repo run:
python3 -m venv .venv && source .venv/bin/activate && pip install -e .
2. Configure environment

Create .env (or export env vars) with at least OPENAI_API_KEY. Optional overrides: OPENAI_BASE_URL, KNOWLEDGE_OPENAI_MODEL, KNOWLEDGE_STORAGE_PATH, KNOWLEDGE_CHROMA_PATH.
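
A minimal .env might look like this; the values are placeholders, and the commented defaults are assumptions rather than documented ones:

OPENAI_API_KEY=sk-...
# Optional overrides (uncomment as needed):
# OPENAI_BASE_URL=https://my-openai-gateway.example.com/v1
# KNOWLEDGE_OPENAI_MODEL=gpt-4o
# KNOWLEDGE_STORAGE_PATH=knowledge_agent/data/knowledge
# KNOWLEDGE_CHROMA_PATH=knowledge_agent/data/chroma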
3. Start the server

Launch FastAPI with uvicorn knowledge_agent.main:app --host 0.0.0.0 --port 8000 --reload. The app exposes health, ingestion, search, generate, and /stream endpoints (newline-delimited JSON).
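
As a quick smoke test, hit the health endpoint (the /health path is an assumption; check the repo for the exact route):

# Path assumed; adjust to match the repo.
curl http://localhost:8000/health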

Project Structure


Step 1 - Configure the Knowledge Agent

KnowledgeManager.create_agent builds an Agno agent bound to the current namespace:
  • Uses OpenAIChat with OPENAI_API_KEY, optional custom base URL, and temperature from settings.
  • Enables search_knowledge=True and add_knowledge_to_context=True so retrieved snippets feed the model.
  • Injects a system prompt that demands a knowledge search before every reply and enforces the "Sources: <file>.md" footer.
  • Reuses the namespace-specific ChromaDb collection initialized in _get_namespace.
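
Putting those pieces together, a hedged sketch of create_agent follows; import paths and constructor arguments beyond the names quoted above are assumptions about the Agno API, not the repo's exact code:

# Hedged sketch; exact Agno import paths and signatures may differ by version.
import os

from agno.agent import Agent
from agno.knowledge.knowledge import Knowledge
from agno.models.openai import OpenAIChat
from agno.vectordb.chroma import ChromaDb

def create_agent(namespace: str = "default") -> Agent:
    # One persistent Chroma collection per namespace (mirrors _get_namespace).
    vector_db = ChromaDb(
        collection=f"knowledge_{namespace}",
        path=os.getenv("KNOWLEDGE_CHROMA_PATH", "knowledge_agent/data/chroma"),
        persistent_client=True,
    )
    return Agent(
        model=OpenAIChat(
            id=os.getenv("KNOWLEDGE_OPENAI_MODEL", "gpt-4o"),
            base_url=os.getenv("OPENAI_BASE_URL"),  # optional self-hosted gateway
        ),
        knowledge=Knowledge(vector_db=vector_db),
        search_knowledge=True,          # expose the knowledge-search tool
        add_knowledge_to_context=True,  # feed retrieved snippets to the model
        instructions=(
            "Search the knowledge base before every reply and end each answer "
            "with a 'Sources: <file>.md' footer."
        ),
    )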

Step 2 - Ingest Knowledge

POST /api/tools/ingest accepts JSON or multipart payloads. Highlights:
  • Up to 30 sources per call, 6 MB per file, 200 kB per inline text/markdown.
  • URLs, PDFs, HTML pages, plain text, and uploads are normalized to markdown with metadata and timestamps.
  • Duplicate hashes are skipped with a "duplicate-content" reason; existing files return "already-ingested".
  • Responses provide saved, skipped, errors, and the resolved namespace.
Example JSON payload:
curl -X POST http://localhost:8000/api/tools/ingest \
  -H "Content-Type: application/json" \
  -d '{
        "namespace": "default",
        "sources": [
          { "type": "url", "value": "https://docs.agno.com/concepts/agents/overview" },
          { "type": "markdown", "title": "Playbook", "value": "# Notes\n\nAgno rocks!" }
        ]
      }'
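
A successful response reports what happened to each source; the field shapes below are illustrative, but the keys match the list above:

{
  "namespace": "default",
  "saved": ["a1b2c3d4e5f6-playbook.md"],
  "skipped": [{ "title": "Playbook", "reason": "duplicate-content" }],
  "errors": []
}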

Step 3 - Search & Validate

POST /api/tools/searchDocs lets you confirm retrieval before opening the agent to users:
  • Required body: {"query": "How do I add tools?"} with optional namespace and max_results.
  • Returns ranked snippets with metadata (hashes, distances converted to scores).
  • Empty queries immediately return an error so the UI can prompt the operator.
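
Example request (namespace and max_results are optional; the values shown are illustrative):

curl -X POST http://localhost:8000/api/tools/searchDocs \
  -H "Content-Type: application/json" \
  -d '{
        "query": "How do I add tools?",
        "namespace": "default",
        "max_results": 5
      }'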

Step 4 - Chat & Stream

  • POST /api/agents/knowledge/generate handles non-streaming responses.
  • POST /stream streams newline-delimited JSON events that include tool calls, intermediate reasoning, text deltas, and completion markers.
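Non-streaming example (assuming generate accepts the same messages shape as /stream):
curl -X POST http://localhost:8000/api/agents/knowledge/generate \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          { "role": "user", "content": "What is an Agno agent?" }
        ]
      }'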
Streaming example (newline-delimited JSON):
curl -N http://localhost:8000/stream \
  -H "Content-Type: application/json" \
  -d '{
        "thread_id": "thread_1",
        "run_id": "run_001",
        "messages": [
          { "role": "user", "content": "Summarize the agent lifecycle." }
        ]
      }'
Each line is a JSON object with a type field such as text_delta, tool_call_start, tool_result, text_done, or done. thread_id and run_id are echoed back so CometChat can correlate partial events.
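Trimmed, a stream might look like this; field names other than type, thread_id, and run_id are illustrative:
{"type":"tool_call_start","tool":"search_knowledge","thread_id":"thread_1","run_id":"run_001"}
{"type":"tool_result","tool":"search_knowledge","thread_id":"thread_1","run_id":"run_001"}
{"type":"text_delta","delta":"Agents move through ","thread_id":"thread_1","run_id":"run_001"}
{"type":"text_done","thread_id":"thread_1","run_id":"run_001"}
{"type":"done","thread_id":"thread_1","run_id":"run_001"}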

Step 5 - Connect to CometChat

  • Deploy the FastAPI service behind HTTPS (e.g., Fly.io, Render, Railway, or your own Kubernetes cluster).
  • Add auth headers or gateway middleware if you need to validate incoming requests from CometChat.
  • In the CometChat dashboard, point the Agno agent’s Deployment URL at the /stream endpoint; use Headers for bearer tokens or basic auth if required.
  • Provide namespace (or toolParams.namespace from CometChat) when you need to target non-default knowledge stores; the service normalizes values before lookup.
With that, you have a fully grounded Agno agent that streams CometChat-compatible events into your UI.