Give your chat experience document intelligence: ingest PDFs, run hybrid semantic + lexical retrieval, and stream grounded answers into CometChat.
  • Live demo: static upload + ask UI (served from docs/ when deployed to GitHub Pages)
  • Source code: GitHub repository
  • Agent files: src/mastra/agents/*
  • Tools: src/mastra/tools/*
  • Workflow: src/mastra/workflows/pdf-workflow.ts
  • Server: src/server.ts

What you’ll build

  • Dual Mastra agents:
    • pdfAgent (multi‑PDF retrieval across all uploaded docs)
    • singlePdfAgent (restricted to latest uploaded PDF)
  • Deterministic retrieval + answer workflow (pdfWorkflow)
  • Express API with upload, ask (JSON), and streaming (SSE) endpoints
  • Persisted hybrid vector + BM25 store (.data/ namespace pdf)
  • Integration with CometChat AI Agents (endpoint wiring)


How it works

  • Users upload PDF(s) → server parses pages → chunks + embeddings generated → persisted in vector + manifest store.
  • Hybrid retrieval ranks candidate chunks: score = α * cosine + (1-α) * sigmoid(BM25) where α = hybridAlpha.
  • Multi‑query expansion (optional) generates paraphrased variants to boost recall.
  • Tools (retrieve-pdf-context, retrieve-single-pdf-context) return stitched context + source metadata.
  • Workflow (pdfWorkflow) orchestrates retrieval + answer; streaming endpoints emit meta then incremental token events.
  • Mastra agent(s) are exposed via REST endpoints you wire into CometChat AI Agents.
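The hybrid ranking formula above can be sketched as a small scoring function. This is a simplified sketch for illustration; the repo's actual scoring lives in src/lib/ and may differ in detail:

```typescript
// Hybrid relevance score: blend semantic (cosine) and lexical (BM25) signals.
// alpha corresponds to the hybridAlpha knob; sigmoid squashes unbounded BM25
// scores into (0, 1) so both signals are on a comparable scale before mixing.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function hybridScore(cosine: number, bm25: number, alpha = 0.7): number {
  return alpha * cosine + (1 - alpha) * sigmoid(bm25);
}

// With alpha = 1 the score is purely semantic; with alpha = 0 purely lexical.
console.log(hybridScore(0.9, 4.2, 0.7).toFixed(3)); // → 0.926
```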

Repo layout (key files)

  • src/mastra/agents/pdf-agent.ts – multi‑PDF agent
  • src/mastra/agents/single-pdf-agent.ts – single latest PDF agent
  • src/mastra/tools/retrieve-pdf-context.ts / retrieve-single-pdf-context.ts – hybrid retrieval tools
  • src/mastra/workflows/pdf-workflow.ts – deterministic orchestration
  • src/lib/* – vector store, embeddings, manifest, PDF parsing
  • src/server.ts – Express API (upload, ask, streaming, manifest ops)
  • docs/index.html – optional static UI
  • .data/ – persisted vectors + manifest JSON

Prerequisites

  • Node.js 20+
  • OPENAI_API_KEY (embeddings + chat model)
  • A CometChat app (to register the agent)
  • (Optional) CORS_ORIGIN if restricting browser origins

Step 1 — Clone & install

Clone the example and install dependencies:
git clone https://github.com/cometchat/ai-agent-mastra-examples.git
cd ai-agent-mastra-examples/mastra-knowledge-agent-pdf
npm install
Create a .env with at least:
OPENAI_API_KEY=sk-...
PORT=3000

Step 2 — Define retrieval tools

File: src/mastra/tools/retrieve-pdf-context.ts (multi) & retrieve-single-pdf-context.ts (single)
import { createTool } from '@mastra/core/tools';
import { z } from 'zod';

export const retrieverTool = createTool({ /* simplified example for tutorial brevity */ });
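The core job of both retrieval tools — stitching ranked chunks into a bounded context string plus source metadata — can be sketched independently of Mastra. The `Chunk` shape and `stitchContext` name here are hypothetical; the repo's real types live in src/lib/:

```typescript
// A hypothetical chunk shape produced by the vector store's hybrid search.
interface Chunk {
  docId: string;
  page: number;
  text: string;
  score: number;
}

// Stitch the highest-scoring chunks into one context string, stopping once
// the character budget would be exceeded (mirrors the maxContextChars input).
function stitchContext(
  chunks: Chunk[],
  maxContextChars = 4000
): { context: string; sources: { docId: string; page: number }[] } {
  const parts: string[] = [];
  const sources: { docId: string; page: number }[] = [];
  let used = 0;
  for (const c of [...chunks].sort((a, b) => b.score - a.score)) {
    const piece = `[${c.docId} p.${c.page}] ${c.text}`;
    if (used + piece.length > maxContextChars) break;
    parts.push(piece);
    sources.push({ docId: c.docId, page: c.page });
    used += piece.length;
  }
  return { context: parts.join("\n\n"), sources };
}
```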

Step 3 — Create agents

Files: src/mastra/agents/pdf-agent.ts, src/mastra/agents/single-pdf-agent.ts
import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { retrieverTool } from '../tools/retrieve-pdf-context';

export const pdfAgent = new Agent({
  name: 'pdfAgent',
  instructions: 'Use the retrieve-pdf-context tool to ground answers and cite sources.',
  model: openai('gpt-4o-mini'), // model choice is illustrative
  tools: { retrieverTool },
});
export const singlePdfAgent = new Agent({ /* same shape, restricted to the latest uploaded doc */ });

Step 4 — Wire Mastra & workflow

File: src/mastra/index.ts registers agents + pdfWorkflow with storage (LibSQL or file‑backed).
import { Mastra } from '@mastra/core/mastra';
import { LibSQLStore } from '@mastra/libsql';
import { pdfAgent } from './agents/pdf-agent';
import { singlePdfAgent } from './agents/single-pdf-agent';
import { pdfWorkflow } from './workflows/pdf-workflow';

export const mastra = new Mastra({
  agents: { pdfAgent, singlePdfAgent },
  workflows: { pdfWorkflow },
  storage: new LibSQLStore({ url: 'file:../mastra.db' }), // or a file-backed store
});
Start the dev server:
npm run dev

Step 5 — Run locally

 ┌──────────────┐          ┌──────────────┐
 │  Express API │ upload → │  PDF Parser  │
 └──────┬───────┘          └──────┬───────┘
        │ chunks + embeddings     │
        ▼                         ▼
 ┌──────────────┐ upsert/search ┌──────────────┐
 │ Vector Store │◀─────────────▶│  Embeddings  │
 └──────┬───────┘               └──────────────┘
        │ hybrid retrieve
        ▼
 ┌──────────────┐  tool calls  ┌──────────────────┐
 │ Mastra Agent │─────────────▶│ retrieve-* tools │
 └──────┬───────┘              └────────┬─────────┘
        │ stitched context             │ fallback
        ▼                              │ broaden
 ┌──────────────┐ answer tokens (SSE) ┌──────────────┐
 │  Workflow    │────────────────────▶│   Client     │
 └──────────────┘                     └──────────────┘

Step 6 — Upload and ask (API)

| Agent | File | Purpose | Tool |
|---|---|---|---|
| pdfAgent | src/mastra/agents/pdf-agent.ts | Multi‑PDF retrieval QA | retrieve-pdf-context |
| singlePdfAgent | src/mastra/agents/single-pdf-agent.ts | Latest single PDF QA | retrieve-single-pdf-context |
Tool input examples:
// retrieve-pdf-context
{ query, docIds?, topK=5, hybridAlpha=0.7, multiQuery=true, qVariants=3, maxContextChars=4000 }

// retrieve-single-pdf-context
{ query, docId?, topK=5, hybridAlpha=0.7, multiQuery=true, qVariants=3, maxContextChars=4000 }
Fallback widens search (higher topK, more qVariants) if initial context is sparse.
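The widen-on-sparse-context fallback can be sketched as a simple retry wrapper. The function and parameter names here are hypothetical; the real fallback lives inside the retrieval tools:

```typescript
// Hypothetical sketch of the fallback: if the first retrieval returns too
// little text, retry once with a broader net (more chunks, more variants).
interface RetrievalParams {
  topK: number;
  qVariants: number;
}

async function retrieveWithFallback(
  retrieve: (p: RetrievalParams) => Promise<string>,
  params: RetrievalParams,
  minChars = 500 // threshold below which context counts as "sparse"
): Promise<string> {
  const first = await retrieve(params);
  if (first.length >= minChars) return first;
  return retrieve({ topK: params.topK * 2, qVariants: params.qVariants + 2 });
}
```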

JSON ask (Mastra dev agent route)

The Mastra dev server automatically exposes a generate route for each registered agent:
curl -X POST http://localhost:4111/api/agents/pdfAgent/generate \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"What is covered in our docs?"}]}'
Expected response:
{
  "reply": "The docs cover..."
}

Step 7 — Deploy & connect to CometChat

  1. Deploy the project (e.g., Vercel, Railway, or AWS).
  2. Copy the deployed endpoint URL.
  3. In CometChat Dashboard → AI Agents, add a new agent:
    • Agent ID: pdfAgent
    • Endpoint: https://your-deployed-url/api/agents/pdfAgent/generate

Step 8 — Optimize & extend

  • Ingest more documents via POST /api/upload (or the static UI in docs/).
  • Swap the file-backed store for a managed vector DB (Pinecone, Weaviate) for large datasets.
  • Extend the agents with memory or multi-tool workflows.


Environment variables

| Name | Description |
|---|---|
| OPENAI_API_KEY | Required for embeddings + chat model |
| CORS_ORIGIN | Optional allowed browser origin |
| PORT | Server port (default 3000) |
.env example:
OPENAI_API_KEY=sk-...
CORS_ORIGIN=http://localhost:3000
PORT=3000

Endpoint summary

| Method | Path | Description |
|---|---|---|
| POST | /api/upload | Upload a PDF (multipart); returns { docId, pages, chunks } |
| GET | /api/documents | List ingested documents |
| DELETE | /api/documents/:id | Delete a document + its vectors |
| GET | /api/documents/:id/file | Stream the stored original PDF |
| POST | /api/ask | Multi‑PDF retrieval + answer (JSON) |
| POST | /api/ask/full | Same as /api/ask (deterministic workflow path) |
| POST | /api/ask/stream | Multi‑PDF streaming (SSE) |
| POST | /api/ask/single | Single latest PDF answer (JSON) |
| POST | /api/ask/single/stream | Single latest PDF streaming (SSE) |

Curl Examples

Upload:
curl -X POST http://localhost:3000/api/upload \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/file.pdf"
Ask (multi):
curl -X POST http://localhost:3000/api/ask \
  -H 'Content-Type: application/json' \
  -d '{"question":"Summarize the abstract","topK":6}'
Stream (multi):
curl -N -X POST http://localhost:3000/api/ask/stream \
  -H 'Content-Type: application/json' \
  -d '{"question":"List key methods","multiQuery":true}'
Ask (single):
curl -X POST http://localhost:3000/api/ask/single \
  -H 'Content-Type: application/json' \
  -d '{"question":"What are the main conclusions?"}'
Stream (single):
curl -N -X POST http://localhost:3000/api/ask/single/stream \
  -H 'Content-Type: application/json' \
  -d '{"question":"Give me an outline"}'

SSE Events

| Event | Payload | Notes |
|---|---|---|
| meta | { sources, docId? } | First packet with retrieval metadata |
| token | { token } | Incremental answer token chunk |
| done | {} | Completion marker |
| error | { error } | Error occurred |
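A minimal client-side parser for these events can be written against the standard SSE wire format (`event:` / `data:` lines, blank-line separated). This sketch assumes the payloads are JSON, matching the table above:

```typescript
// Parse raw SSE frames ("event: ...\ndata: {...}" blocks separated by blank
// lines) into typed events matching the meta/token/done/error table.
interface SseEvent {
  event: string;
  data: unknown;
}

function parseSse(raw: string): SseEvent[] {
  return raw
    .split("\n\n")
    .filter((block) => block.trim().length > 0)
    .map((block) => {
      let event = "message"; // SSE default when no event: line is present
      let data = "";
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data += line.slice(5).trim();
      }
      return { event, data: data ? JSON.parse(data) : null };
    });
}

const frames =
  'event: meta\ndata: {"sources":[]}\n\nevent: token\ndata: {"token":"Hello"}\n\nevent: done\ndata: {}';
console.log(parseSse(frames).map((e) => e.event).join(",")); // → meta,token,done
```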

Tuning & retrieval knobs

| Parameter | Effect | Trade‑off |
|---|---|---|
| hybridAlpha | Higher = more semantic weight | Too high reduces keyword recall |
| topK | More chunks = broader context | Larger responses, slower |
| multiQuery | Recall across paraphrases | Extra model + embedding cost |
| qVariants | Alternative queries for expansion | Diminishing returns above 5 |
| maxContextChars | Caps stitched context size | Too small omits evidence |
Tip: For exploratory QA try topK=8, qVariants=5.

Troubleshooting & debugging

  • Enable internal logging (if available) to inspect scoring.
  • Inspect vectors: open .data/pdf-vectors.json.
  • Manifest corrupted? Delete .data/manifest.json and re‑upload.
  • Low lexical relevance? Lower hybridAlpha (e.g. 0.55).
  • Noise / irrelevant chunks? Reduce topK or lower qVariants.

Hardening & roadmap

  • SSE/WebSocket answer token streaming to clients (UI consumption)
  • Source highlighting + per‑chunk confidence
  • Semantic / layout‑aware advanced chunking
  • Vector deduplication & compression
  • Auth layer (API keys / JWT) & per‑user isolation
  • Background ingestion queue for large docs
  • Retrieval quality regression tests

Repository Links
  • Source: GitHub Repository
  • Multi agent: pdf-agent.ts
  • Single agent: single-pdf-agent.ts
  • Tools: retrieve-pdf-context.ts, retrieve-single-pdf-context.ts
  • Workflow: pdf-workflow.ts
  • Server: server.ts