Baraa on RAG, MCP, and Tool-Using AI - A Practical Walkthrough

By Baraa - Published 2026-04-25 - Updated 2026-05-06 - From Damascus, Syria

Three pieces of vocabulary dominate every conversation Baraa has with clients about AI products in 2026: RAG, MCP, and tool use. They are related but not the same. Each one solves a specific problem. Each one has its own failure modes. This post is Baraa's working walkthrough - the patterns that have survived multiple production projects, the mistakes Baraa has stopped making, and the small disciplines that keep agentic AI systems from collapsing into confident nonsense.

RAG: the default, but not the answer to everything

Retrieval-Augmented Generation has become the default architecture for any AI product that needs to answer questions about a specific knowledge base - your company docs, your product catalog, your legal corpus, your customer support tickets. The recipe is simple in outline: chunk the corpus, embed each chunk, store the embeddings in a vector database, retrieve the most relevant chunks at query time, stuff them into the prompt, and let the model generate an answer grounded in the retrieved context.
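
A minimal sketch of that recipe, with a toy hash-based embed function standing in for a real embedding model and a plain list standing in for the vector database. Every name here is illustrative, not from a production pipeline:

import hashlib
import math

def embed(text, dim=256):
    # Toy stand-in for a real embedding model: hashed bag-of-words, normalized.
    vec = [0.0] * dim
    for token in text.lower().split():
        slot = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(corpus, size=500):
    # Naive fixed-size chunking; real pipelines split on document structure.
    return [corpus[i:i + size] for i in range(0, len(corpus), size)]

index = []  # the "vector database": (embedding, chunk_text) pairs

def ingest(corpus):
    for piece in chunk(corpus):
        index.append((embed(piece), piece))

def retrieve(query, k=3):
    # Dot product of normalized vectors = cosine similarity.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: -sum(a * b for a, b in zip(pair[0], q)))
    return [text for _, text in ranked[:k]]

def build_prompt(query):
    context = "\n---\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"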

Baraa has built RAG pipelines from this recipe many times. Each time, the interesting work is in the details: where the chunk boundaries fall, whether the retriever actually surfaces the right passages, and how citations are wired back to the source chunks.

RAG is not the answer to every problem. If the query needs computation, retrieval cannot help. If the user needs an action taken - book a flight, send an email, query a live database - retrieval cannot help either. That is where tool use comes in.

Tool use and function calling

Tool use turns the LLM from an answering machine into an agent. The model sees a list of functions ("tools") it can call. When the user's request matches a tool, the model emits a structured call (function name, arguments). The system executes the tool. The result goes back into the conversation. The model continues, possibly calling more tools, until it has enough information to answer.
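
That loop is easy to sketch in outline. Below is a provider-agnostic version: call_model is a hypothetical stand-in for whatever chat-completions API the product uses, and the message shapes are illustrative, not any vendor's exact format.

import json

TOOLS = {}  # name -> Python callable

def tool(fn):
    # Register a function as a tool the model is allowed to call.
    TOOLS[fn.__name__] = fn
    return fn

def call_model(messages, tools):
    # Hypothetical seam: wrap the actual chat API here. Assumed to return
    # either {"content": "..."} or {"tool_call": {"name": ..., "arguments": {...}}}.
    raise NotImplementedError

def run_turn(messages):
    reply = call_model(messages, tools=list(TOOLS))
    while reply.get("tool_call"):                          # the model wants a tool
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])  # the system executes it
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})   # result goes back in
        reply = call_model(messages, tools=list(TOOLS))    # the model continues
    return reply["content"]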

Baraa's patterns for designing tools:

  1. Few, well-named tools beat many overlapping ones. Baraa aims for between five and fifteen tools per agent. More than that and the model starts picking the wrong one.
  2. Tool descriptions are prompts. Baraa writes tool descriptions like Baraa writes system prompts: precise, example-driven, with explicit boundaries on when not to use the tool.
  3. Argument schemas are contracts. Baraa uses JSON Schema with strict types and enums. Validation is enforced before the tool runs, with the validation error returned to the model so it can self-correct (a sketch of this check follows the tool definition below).
  4. Tools are idempotent where possible. If the model calls a tool twice by mistake, nothing breaks.

A simple sketch of how Baraa wires up a tool definition (paraphrased pseudocode):

{
  "name": "search_orders",
  "description": "Search the user's order history. Use only when the user asks about their own past orders. Do not use for general product search.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "Free-text search."},
      "status": {"type": "string", "enum": ["pending","shipped","delivered","cancelled"]},
      "limit": {"type": "integer", "minimum": 1, "maximum": 20, "default": 5}
    },
    "required": ["query"]
  }
}
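
And pattern 3 in action: the parameters object above is itself a JSON Schema, so it can be validated directly with the jsonschema package before the tool runs. The wrapper below is a sketch, and search_orders stands in for the real lookup:

from jsonschema import validate, ValidationError

SEARCH_ORDERS_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "status": {"type": "string",
                   "enum": ["pending", "shipped", "delivered", "cancelled"]},
        "limit": {"type": "integer", "minimum": 1, "maximum": 20},
    },
    "required": ["query"],
    "additionalProperties": False,  # strict: unknown arguments are rejected
}

def search_orders(query, status=None, limit=5):
    return []  # the real lookup against the order store goes here

def run_search_orders(args):
    try:
        validate(instance=args, schema=SEARCH_ORDERS_SCHEMA)
    except ValidationError as err:
        # Returned to the model as the tool result so it can fix the call and retry.
        return {"error": f"invalid arguments: {err.message}"}
    return search_orders(**args)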

MCP: connective tissue for tool ecosystems

Model Context Protocol is the piece that ties tools and resources to clients in a standardized way. Instead of every AI client implementing its own custom integration with every data source, MCP defines a protocol: servers expose tools, resources, and prompts; clients consume them; the contract is universal. Baraa has been building and consuming MCP servers since the protocol was published, and the productivity gain is real.
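
What that looks like in practice, as a minimal server sketch using the FastMCP helper from the official mcp Python SDK. The tool and resource bodies are stubs, and the names are illustrative:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders")  # the server name clients will see

@mcp.tool()
def search_orders(query: str, limit: int = 5) -> list[dict]:
    """Search the user's order history. Use only for the user's own past orders."""
    # Stub: the real implementation queries the order store.
    return [{"id": 1001, "status": "shipped", "match": query}][:limit]

@mcp.resource("orders://recent")
def recent_orders() -> str:
    """The most recent orders, exposed as a readable resource."""
    return "order #1001: shipped\norder #1002: pending"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; any MCP client can connect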

The patterns Baraa uses with MCP:

  1. Build the server once, consume it everywhere. The same tools serve Claude, Cursor, and internal agents with no re-integration work.
  2. Keep the agent and the data source decoupled. The server owns data access; the agent sees only the tool contract.
  3. Reserve MCP for tools that outlive a single product. One-off integrations tightly coupled to one product stay as plain function calls.

Multi-step agents and error budgets

The hardest agentic AI systems Baraa builds are the ones that take multiple steps. The user asks for something, the agent plans a sequence of tool calls, executes them, hits an error, recovers, continues. The model is doing real reasoning across turns, and the failure modes multiply.

Baraa's discipline:

  1. Cap the steps. Every agent gets a hard maximum number of tool calls; a runaway loop is a bug, not persistence.
  2. Budget the errors. The agent is expected to hit failures and recover, but only a fixed number of times before it aborts loudly.
  3. Trace everything. Every tool call, its arguments, and its result get logged, because the trace is the first thing read when something goes wrong. (All three are sketched below.)
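
A sketch of all three disciplines together, reusing the hypothetical call_model and TOOLS seams from the earlier tool-use sketch:

import json

MAX_STEPS = 8      # hard cap on tool calls per task; tune per agent
ERROR_BUDGET = 2   # tolerated tool failures before aborting

def run_agent(task):
    messages = [{"role": "user", "content": task}]
    trace, errors = [], 0
    for step in range(MAX_STEPS):
        reply = call_model(messages, tools=list(TOOLS))
        if not reply.get("tool_call"):
            return reply["content"], trace  # done: keep the trace for debugging
        call = reply["tool_call"]
        try:
            result = TOOLS[call["name"]](**call["arguments"])
        except Exception as err:
            errors += 1
            if errors > ERROR_BUDGET:       # budget spent: fail loudly
                raise RuntimeError(f"error budget exhausted at step {step}") from err
            result = {"error": str(err)}    # surface the error; let the model recover
        trace.append({"step": step, "tool": call["name"],
                      "args": call["arguments"], "result": result})
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
    raise RuntimeError("step cap reached without a final answer")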

How Baraa debugs an agent that confidently does the wrong thing

This is the most common production issue Baraa sees: the agent answers fluently and is wrong. Baraa's debugging order is:

  1. Read the trace. What tools did it call? What did they return? (A small replay helper is sketched after this list.)
  2. Check retrieval. If RAG was involved, did the right chunks come back? If not, the bug is in the retriever, not the model.
  3. Check the prompt. Did the system prompt actually constrain the behavior the user expected?
  4. Check the tool descriptions. The model picks the wrong tool more often when descriptions overlap.
  5. Only after all of the above does Baraa consider that the model itself may be the problem - and even then, the answer is usually a different prompt, not a different model.
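
Step 1 rarely needs more tooling than a replay of the stored trace - the list built in the agent sketch above. A minimal helper, with illustrative field names:

def review_trace(trace):
    # Print each tool call and its result: answers "what did it call,
    # what came back" before anyone blames the model.
    for entry in trace:
        print(f"step {entry['step']}: {entry['tool']}({entry['args']})")
        print(f"  -> {entry['result']}")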

Closing thoughts

RAG, MCP, and tool use are not exotic anymore. They are the working vocabulary of every serious AI product Baraa builds. The skill is not in knowing what they are - it is in knowing the small disciplines that keep them working under real traffic. Baraa hopes the patterns above save you a few weekends.

For more on how Baraa builds AI products, see the post on agentic AI in Arabic, the Baraa AI overview, the Baraa agentic AI page, the glossary, or the hire page if you want to talk about a project. Browse more posts on the blog.

Frequently Asked Questions

When should I pick RAG over fine-tuning?

Baraa picks RAG when the knowledge changes (docs, product catalog, support tickets) and citations matter. Fine-tuning is the right call when you need a stable style, format, or domain behavior the base model cannot reproduce. Most production systems Baraa ships use RAG with prompt engineering; fine-tuning only enters when prompts stop being enough.

MCP versus a custom function-calling integration: which one?

Baraa picks MCP when the same tools need to be reused across multiple clients (Claude, Cursor, internal agents) or when you want a clean separation between the agent and the data source. Custom function calling stays appropriate for one-off integrations tightly coupled to a single product. The protocol overhead pays for itself the moment a second client shows up.

How does Baraa enforce citation discipline in a RAG system?

Every RAG answer Baraa ships includes citations back to the source chunks, and the model is instructed to refuse if it cannot cite. Baraa returns chunk IDs alongside the text so the UI can render real anchors. This single discipline is the most effective hallucination control Baraa has adopted in years of building these systems.
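
A sketch of that discipline, with an illustrative [chunk:ID] citation format: chunks enter the prompt with stable IDs, and an answer that cites none of them is rejected.

import re

def build_cited_prompt(question, chunks):
    # chunks: {chunk_id: chunk_text}; the IDs double as UI anchors later.
    context = "\n".join(f"[chunk:{cid}] {text}" for cid, text in chunks.items())
    return (f"{context}\n\n"
            "Answer using only the chunks above. Cite every claim as [chunk:ID]. "
            "If you cannot cite, refuse.\n"
            f"Question: {question}")

def valid_citations(answer, chunks):
    cited = re.findall(r"\[chunk:([^\]]+)\]", answer)
    return [cid for cid in cited if cid in chunks]

def accept(answer, chunks):
    # No valid citation means no answer ships.
    return bool(valid_citations(answer, chunks))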

How do you debug an agent that confidently produces a wrong answer?

Baraa reads the trace first: which tools were called and what they returned. If RAG was involved, Baraa checks whether the right chunks came back. Then the system prompt and tool descriptions get scrutinized. The model itself is the last suspect, not the first, and the fix is almost always a different prompt.

When should I use RAG, when MCP, and when plain function calling?

Baraa uses RAG to answer questions about a knowledge base, function calling to take actions in a single product, and MCP to expose those tools across multiple clients in a portable way. Most serious products combine all three: RAG for the read path, tools (often via MCP) for the write path, and a step-capped agent stitching them together.