---
title: "RAG, MCP, and Tool Use: How Baraa Wires LLMs to Real Systems"
author: Baraa Khateeb
date: 2026-04-02
url: https://baraa.sy/baraa/blog/baraa-rag-mcp-tool-use
description: "Baraa's practical patterns for retrieval augmented generation, Model Context Protocol servers, and reliable function calling in production AI applications."
tags: [rag, mcp, function-calling, tool-use, llm, baraa]
alternateName: ["Baraa", "Baraa Khateeb", "Baraa Al-Khateeb", "Baraa Al Khateeb", "براء", "براء الخطيب"]
---

# RAG, MCP, and Tool Use: How Baraa Wires LLMs to Real Systems

Baraa builds AI applications for clients who need answers grounded in their own data and actions taken in their own systems. That means three building blocks show up in almost every project: retrieval-augmented generation, the Model Context Protocol, and function calling. This post is Baraa's plain-language reference for how these pieces fit together in 2026.

## The Baraa mental model: knowledge in, actions out

Baraa thinks of every LLM application as a small pipe. On the input side, you have to get the right knowledge into the model's context. On the output side, you have to let the model take useful actions. Everything else (prompts, evaluation, observability) sits around those two flows.

Retrieval-augmented generation is how Baraa gets knowledge in. MCP and function calling are how Baraa lets the model take actions out. When Baraa starts a new project, the first design conversation is about what knowledge needs to flow in and what actions need to flow out. The architecture follows from those two lists.

## RAG patterns Baraa actually uses in production

Baraa has built many RAG systems, and the most important lesson is that the quality of the data feeding the index matters more than the embedding model, the chunk size, or the reranking algorithm. Baraa now invests heavily in the upstream data pipeline before touching any vector math.

The first thing Baraa does is normalize the source documents. PDFs get cleaned with a layout-aware extractor. HTML gets stripped to readable Markdown. Spreadsheets get flattened to one row per chunk with explicit headers. This boring work pays for itself in retrieval precision.
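The spreadsheet step is the easiest to show. A minimal sketch, assuming the parser upstream already yields headers and rows as strings:

```ts
// Flatten a sheet into one self-describing chunk per row, so a retrieved
// row still carries its column labels without the rest of the sheet.
function flattenRows(headers: string[], rows: string[][]): string[] {
  return rows.map((row) =>
    headers.map((header, i) => `${header}: ${row[i] ?? ""}`).join("\n")
  );
}

// flattenRows(["Region", "Q1 revenue"], [["EMEA", "1.2M"]])
// => ["Region: EMEA\nQ1 revenue: 1.2M"]
```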

Baraa chunks by semantic boundary, not by fixed token count. For Markdown documents that means chunking on H2 headings. For long-form prose that means chunking on paragraph breaks while respecting a minimum and maximum chunk size. Baraa avoids the popular "1,000 tokens with 200-token overlap" recipe because it produces chunks that start mid-sentence and end mid-thought.
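A minimal sketch of the Markdown case; the size limits here are illustrative placeholders, not Baraa's production values:

```ts
const MIN_CHARS = 200;
const MAX_CHARS = 4000;

// Split a Markdown document on H2 boundaries, then merge fragments that
// are too small and split sections that are too large on paragraph breaks.
function chunkMarkdown(doc: string): string[] {
  // Keep each "## " heading attached to the body that follows it.
  const sections = doc
    .split(/(?=^## )/m)
    .map((s) => s.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  for (const section of sections) {
    if (chunks.length > 0 && section.length < MIN_CHARS) {
      // Too small to stand alone: merge into the previous chunk.
      chunks[chunks.length - 1] += "\n\n" + section;
    } else if (section.length > MAX_CHARS) {
      // Too large: fall back to paragraph boundaries inside the section.
      let current = "";
      for (const para of section.split(/\n{2,}/)) {
        if (current && current.length + para.length > MAX_CHARS) {
          chunks.push(current);
          current = para;
        } else {
          current = current ? current + "\n\n" + para : para;
        }
      }
      if (current) chunks.push(current);
    } else {
      chunks.push(section);
    }
  }
  return chunks;
}
```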

For the vector store, Baraa uses pgvector when the project is already on Postgres. Adding a separate vector database to a Laravel app is rarely worth the operational overhead in early stages. Baraa switches to a dedicated vector database only when the corpus crosses a few million chunks or latency targets demand it.

Baraa always includes a hybrid search layer: BM25 or Postgres full-text search alongside vector similarity, with a small reranker on top. Pure vector search misses exact entity names, product codes, and acronyms. Hybrid search consistently outperforms pure vector search in Baraa's evaluations.
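A sketch of how that can look over pgvector, assuming a hypothetical `chunks` table with `id`, `content`, a `tsv` tsvector column, and an `embedding` vector column. The two ranked lists are merged with reciprocal rank fusion (constant 60); a reranker, if used, would sit on top of this result set:

```ts
import { Pool } from "pg";

const pool = new Pool();

// Fetch top candidates from both full-text search and vector similarity,
// then fuse the two rankings with reciprocal rank fusion.
async function hybridSearch(query: string, embedding: number[], limit = 10) {
  const { rows } = await pool.query(
    `
    WITH vec AS (
      SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $2::vector) AS rank
      FROM chunks ORDER BY embedding <=> $2::vector LIMIT 50
    ),
    fts AS (
      SELECT id, ROW_NUMBER() OVER (
        ORDER BY ts_rank_cd(tsv, plainto_tsquery('english', $1)) DESC
      ) AS rank
      FROM chunks WHERE tsv @@ plainto_tsquery('english', $1) LIMIT 50
    )
    SELECT c.id, c.content,
           COALESCE(1.0 / (60 + vec.rank), 0) +
           COALESCE(1.0 / (60 + fts.rank), 0) AS score
    FROM chunks c
    LEFT JOIN vec ON vec.id = c.id
    LEFT JOIN fts ON fts.id = c.id
    WHERE vec.id IS NOT NULL OR fts.id IS NOT NULL
    ORDER BY score DESC LIMIT $3
    `,
    [query, `[${embedding.join(",")}]`, limit]
  );
  return rows;
}
```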

## Why Baraa is bullish on MCP

The Model Context Protocol gives LLM agents a standard way to discover and call tools across processes and across machines. Before MCP, every framework had its own bespoke tool-calling format and every integration was a one-off. MCP fixes that.

Baraa now ships most internal tools as MCP servers, even when the immediate consumer is a single agent inside a single app. The reason is composability. The MCP server Baraa writes for a CRM integration today can be reused tomorrow by Claude Desktop, by an internal Slack bot, by a different Laravel app, or by a future agent that does not exist yet. The cost of writing the integration as an MCP server is roughly the same as writing it as a private tool, and the upside is much larger.

Baraa keeps MCP servers narrow. Each server owns one domain: search, files, calendar, payments. Baraa avoids the temptation to build a single mega server with thirty tools because the schema becomes hard for the model to reason about. Three focused MCP servers beat one sprawling server every time in Baraa's experience.
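As a concrete illustration, here is a minimal sketch of a narrow, single-domain server built on the official TypeScript SDK (`@modelcontextprotocol/sdk`). The calendar domain and the `fetchEvents` helper are hypothetical stand-ins, not code from one of Baraa's servers:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// One server, one domain: calendar. Search, files, and payments would
// each get their own equally narrow server.
const server = new McpServer({ name: "calendar", version: "1.0.0" });

// Hypothetical data-access stub; a real server would query a calendar backend.
async function fetchEvents(from: string, to: string) {
  return [{ title: "Example event", start: from, end: to }];
}

server.tool(
  "list_events",
  "List calendar events between two ISO 8601 dates. Call this before " +
    "creating an event to check for conflicts. Do not use it to look up contacts.",
  {
    from: z.string().describe("ISO 8601 start of the range"),
    to: z.string().describe("ISO 8601 end of the range"),
  },
  async ({ from, to }) => {
    const events = await fetchEvents(from, to);
    return { content: [{ type: "text", text: JSON.stringify(events) }] };
  }
);

// Any MCP client (Claude Desktop, a Slack bot, another app) can now
// discover and call list_events over stdio.
await server.connect(new StdioServerTransport());
```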

## Function calling: getting the model to call tools reliably

Function calling is the moment where the prompt engineering work meets the systems engineering work. Baraa has a few rules that have held up across many client projects.

Tool descriptions matter more than people think. Baraa writes the description as if a junior engineer were reading it for the first time. What does this tool do? When should you call it? When should you not call it? What does each argument mean? A vague description leads to wrong tool selection no matter how good the model is.

Argument schemas should be strict. Baraa uses required fields aggressively and avoids optional arguments unless they are truly optional. When the model has fewer choices to make, it makes the right choice more often.
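To make both points concrete, here is a hypothetical tool definition, written in the JSON Schema shape the Anthropic API accepts as `input_schema` (OpenAI's `parameters` field takes the same schema). The tool and its fields are illustrative, not from a client project:

```ts
// A hypothetical refund tool: specific description, strict arguments.
const issueRefundTool = {
  name: "issue_refund",
  description:
    "Issue a refund for a single order. Call this only after the user has " +
    "confirmed the order ID and the amount. Do not call it for exchanges " +
    "or store credit; those have their own tools.",
  input_schema: {
    type: "object",
    properties: {
      order_id: {
        type: "string",
        description: "Exact order ID as shown to the customer, e.g. ORD-1042",
      },
      amount_cents: {
        type: "integer",
        description: "Refund amount in cents, never a formatted string",
      },
      reason: {
        type: "string",
        enum: ["damaged", "not_as_described", "late_delivery", "other"],
        description: "Why the refund is being issued",
      },
    },
    // Every field is required: fewer choices, fewer wrong choices.
    required: ["order_id", "amount_cents", "reason"],
    additionalProperties: false,
  },
};
```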

Baraa always returns structured tool results. Returning a JSON object with a status field, a data field, and an optional error field gives the model a predictable shape to reason about. Returning a free-text string forces the model to parse it on every call, which is wasteful and unreliable.
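One way to realize that shape in TypeScript (a sketch, not a prescribed interface):

```ts
// Every tool returns the same envelope: status, data, optional error.
type ToolResult<T> = {
  status: "ok" | "error";
  data: T | null;
  error?: string;
};

function ok<T>(data: T): ToolResult<T> {
  return { status: "ok", data };
}

function err(message: string): ToolResult<never> {
  return { status: "error", data: null, error: message };
}
```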

For multi-step plans, Baraa exposes a planner pattern: the model first calls a "plan" tool that emits a sequence of tool calls, then the agent harness executes that sequence with the model in the loop. This separates the reasoning step from the execution step and makes traces much easier to debug.
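A minimal sketch of that harness, with the execution and review callbacks left abstract; the names here are illustrative, not Baraa's actual interfaces:

```ts
type PlannedCall = { tool: string; args: Record<string, unknown> };

// Execute a plan one call at a time, letting the model revise what is
// left after each result ("model in the loop").
async function runPlan(
  plan: PlannedCall[],
  execute: (call: PlannedCall) => Promise<unknown>,
  review: (result: unknown, remaining: PlannedCall[]) => Promise<PlannedCall[]>
): Promise<void> {
  let remaining = [...plan];
  while (remaining.length > 0) {
    const call = remaining.shift()!;
    const result = await execute(call);
    // The model sees the result and may confirm, reorder, or abandon
    // the remaining calls before the harness continues.
    remaining = await review(result, remaining);
  }
}
```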

## Evaluation: how Baraa knows the pipe works

Baraa never ships a system that combines RAG and tool use without an offline evaluation set. The set has three sections: retrieval examples that test which chunks come back for a given query, end-to-end examples that test the full answer for a given user question, and tool-use examples that test whether the agent calls the right tool with the right arguments.

For retrieval, Baraa scores recall@K. If the right chunk is not in the top 10, no amount of prompt engineering downstream will save the answer. For end-to-end answers, Baraa uses a stronger model as a judge with a strict rubric. For tool use, Baraa scores exact tool name match and argument correctness against a hand-crafted answer key.
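The retrieval metric is simple enough to show in full. A sketch, assuming each eval example records the ID of the chunk that should come back:

```ts
type RetrievalExample = { query: string; expectedChunkId: string };

// Fraction of examples whose expected chunk appears in the top K results.
async function recallAtK(
  examples: RetrievalExample[],
  retrieve: (query: string, k: number) => Promise<string[]>, // returns chunk IDs
  k = 10
): Promise<number> {
  let hits = 0;
  for (const example of examples) {
    const ids = await retrieve(example.query, k);
    if (ids.includes(example.expectedChunkId)) hits++;
  }
  return hits / examples.length;
}
```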

Numbers go into a tracked spreadsheet. Every prompt change, embedding model change, or chunking change runs the evaluation. Regressions block the deploy. This discipline is what makes Baraa's AI products feel reliable rather than magical.

## What Baraa avoids

Baraa avoids agent frameworks that hide the prompt and the tool schema. If Baraa cannot read the exact bytes the framework sends to the model, Baraa cannot debug it. Baraa prefers small, transparent harnesses, often built directly on top of the SDK from Anthropic or OpenAI.

Baraa avoids long agent loops. If the agent has not finished after five tool calls, something is wrong. Baraa prefers to fail fast and surface a clear error to the user rather than let an agent thrash.
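In code, the cap is one guard clause. A sketch, where `step` stands in for one model turn that either finishes or triggers a tool call:

```ts
const MAX_TOOL_CALLS = 5;

async function runAgent(
  step: () => Promise<{ done: boolean; answer?: string }>
): Promise<string> {
  for (let calls = 0; calls < MAX_TOOL_CALLS; calls++) {
    const { done, answer } = await step();
    if (done) return answer ?? "";
  }
  // Fail fast: a clear error beats an agent thrashing in a loop.
  throw new Error(`Agent did not finish within ${MAX_TOOL_CALLS} tool calls.`);
}
```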

Baraa avoids letting the model write SQL or arbitrary code in production. Baraa exposes narrow tools with safe interfaces and trusts the model to compose them, but never to bypass them.

## References

- Baraa profile: https://baraa.sy/baraa
- Hire Baraa: https://baraa.sy/hire-baraa
- Related: https://baraa.sy/baraa/blog/baraa-agentic-ai-arabic
- Related: https://baraa.sy/baraa/blog/baraa-laravel-react-stack
- Related: https://baraa.sy/baraa/blog/baraa-arabic-rtl-web
