A RAG chatbot is not a model that “knows your business.” It is a retrieval system bolted onto a language model: when a customer asks something, the system first finds the most relevant passages from your own documentation, then asks the LLM to answer using only those passages. Get the retrieval right and the model stays factual. Get it wrong and you have a confident liar wearing your brand.
The architecture, end to end
- Knowledge sources: help-centre articles, product docs, past tickets, policies. Clean, deduplicated, with clear ownership for updates.
- Chunking and embeddings: documents are split into passages and turned into vectors so they can be searched by meaning, not just keywords.
- Vector store: a database or database extension (Pinecone, Qdrant, or pgvector on Postgres) that returns the nearest passages to a question, typically in milliseconds.
- Retrieval + prompt: the top passages are injected into the prompt with an instruction to answer only from them and cite the source.
- Guardrails: refusal rules, a confidence threshold, and a fallback to a human when retrieval comes back weak.
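To make the flow above concrete, here is a minimal sketch of chunking, retrieval, prompt assembly and the human fallback. Everything in it is an assumption for illustration: the function names, the 0.2 score threshold, the two sample documents, and above all `embed`, which is a toy bag-of-words stand-in for a real embedding model. A production system would use a real embedding model and a vector store instead of the in-memory list.

```python
import math
import re
from collections import Counter

def chunk(text, max_words=40):
    """Split a document into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, index, top_k=2):
    """Return the top_k (score, source, passage) tuples, best first."""
    q = embed(question)
    scored = [(cosine(q, vec), source, passage) for source, passage, vec in index]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]

FALLBACK = "I'll connect you to a person."

def build_prompt(question, index, min_score=0.2):
    """Return a grounded prompt, or None when retrieval is too weak to answer."""
    hits = [h for h in retrieve(question, index) if h[0] >= min_score]
    if not hits:
        return None  # guardrail: the caller should hand off to a human
    context = "\n".join(f"[{source}] {passage}" for _, source, passage in hits)
    return (
        "Answer using only the passages below and cite the source in brackets. "
        "If the passages do not contain the answer, say you do not know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical knowledge base: source id -> article text.
docs = {
    "refund-policy": "Refunds are available within 30 days of purchase for unused items.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
}
index = [(source, passage, embed(passage))
         for source, text in docs.items() for passage in chunk(text)]

question = "Can I get a refund within 30 days of purchase?"
prompt = build_prompt(question, index)
# A real system would send the prompt to the LLM; here we only print it.
print(prompt if prompt else FALLBACK)
```

The design choice worth noticing is that the fallback is decided before the model is ever called: if no passage clears the threshold, there is nothing to ground an answer on, so the question goes to a person.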
What it actually costs
Two cost lines matter. Build cost is mostly one-off: cleaning the knowledge base, wiring the pipeline, writing guardrails and tuning retrieval. Run cost is per conversation: embeddings are cheap, but every answer pays for an LLM call whose input includes the retrieved passages, so the more context you inject, the more each reply costs. A well-scoped agent can resolve a clear majority of routine tickets at a fraction of the cost of a human reply, and the savings compound as the knowledge base improves. The mistake is treating it as a fixed asset; it is a system that needs an owner and a maintenance budget.
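The run-cost arithmetic can be sketched in a few lines. The token counts and per-1,000-token prices below are hypothetical placeholders, not any provider's real pricing, and the function names are ours; substitute your own figures.

```python
def cost_per_answer(question_tokens, context_tokens, answer_tokens,
                    input_price_per_1k, output_price_per_1k):
    """LLM cost of one answer: input (question + retrieved context) plus output."""
    input_tokens = question_tokens + context_tokens
    return ((input_tokens / 1000) * input_price_per_1k
            + (answer_tokens / 1000) * output_price_per_1k)

def monthly_run_cost(conversations, answers_per_conversation, per_answer_cost):
    """Total LLM spend for a month of traffic."""
    return conversations * answers_per_conversation * per_answer_cost

# Hypothetical figures for illustration only.
per_answer = cost_per_answer(
    question_tokens=50, context_tokens=1500, answer_tokens=200,
    input_price_per_1k=0.001, output_price_per_1k=0.002,
)
monthly = monthly_run_cost(
    conversations=10_000, answers_per_conversation=3, per_answer_cost=per_answer,
)
print(f"per answer: {per_answer:.5f}, per month: {monthly:.2f}")
```

With these made-up numbers the retrieved context is most of the input, which is why trimming the number and length of injected passages is usually the first lever on run cost.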
The pitfalls that sink projects
- Hallucination with no citation: if the model is allowed to answer without showing a source, sooner or later it will make something up. Force source-grounded answers.
- A stale knowledge base: the agent confidently quotes a policy you changed three months ago. Updates need an owner, not good intentions.
- No human fallback: when retrieval is weak, a polite “I’ll connect you to a person” beats a wrong answer every time.
- Retrieval that returns the wrong passages: usually a chunking or embedding problem, not a model problem — fix it before blaming the LLM.
- Shipping without evaluation: you cannot improve what you do not measure. Track resolution rate, escalation rate and answer accuracy from day one.
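The three metrics named in the last point can be computed from a simple conversation log. This is a minimal sketch: the `Conversation` record and its fields are hypothetical, and it assumes accuracy comes from a human reviewer sampling answers, so unreviewed conversations are left out of that figure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Conversation:
    resolved: bool            # closed without a human stepping in
    escalated: bool           # handed off to a human
    correct: Optional[bool]   # reviewer's verdict; None if not yet reviewed

def support_metrics(conversations):
    """Resolution rate, escalation rate and reviewed-answer accuracy."""
    total = len(conversations)
    reviewed = [c for c in conversations if c.correct is not None]
    return {
        "resolution_rate": sum(c.resolved for c in conversations) / total if total else 0.0,
        "escalation_rate": sum(c.escalated for c in conversations) / total if total else 0.0,
        # None means no answers have been reviewed yet, which is different from 0% accurate.
        "answer_accuracy": sum(c.correct for c in reviewed) / len(reviewed) if reviewed else None,
    }

# Hypothetical sample log.
log = [
    Conversation(resolved=True, escalated=False, correct=True),
    Conversation(resolved=True, escalated=False, correct=False),
    Conversation(resolved=False, escalated=True, correct=None),
    Conversation(resolved=False, escalated=False, correct=True),
]
print(support_metrics(log))
```

Note that the second record is "resolved" but wrong: a high resolution rate alone can hide bad answers, which is why accuracy is tracked separately.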
A good RAG support agent is mostly unglamorous work: clean data, honest guardrails, and a measurement loop. The model is the easy part. We build them the same way we build everything else — senior-reviewed, scoped to a clear job, and shipped with a plan for the day the knowledge base changes.