RAG Patterns: A Practical Guide to Retrieval Architectures
This guide gives an expanded view of modern RAG, with cleaner distinctions between the common patterns and notes on how each is used in production systems.
Classic RAG
Flow: Query → Embed → Vector Search → Retrieve Top-K → LLM generates answer
Works well for simple Q&A over documents. Fast, cheap, and easy to set up, but weak for multi-hop reasoning or answers spread across many sources.
Classic RAG (Retrieval-Augmented Generation), often called Naive RAG, is an architectural pattern that connects a Large Language Model (LLM) to an external database to provide grounded, fact-based answers. It works like an "open-book exam" where the AI searches a text collection before writing a response.
Image URL: /assets/blog/images/rag-patterns/classic-rag.png
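As a rough illustration of the flow above, here is a minimal sketch in Python. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and the documents, function names, and prompt wording are all assumptions made for the example; the final LLM call is left out.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Embed the query, score every document, keep the k most similar."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (call not shown)."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "refunds are processed within five business days",
    "the office is closed on public holidays",
    "refunds require the original receipt",
]
top = retrieve_top_k("how are refunds processed", docs, k=2)
prompt = build_prompt("how are refunds processed", top)
```

A production system would replace `embed` with a real model and the linear scan with a vector index, but the shape of the pipeline stays the same.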
Hybrid RAG
Flow: Query → Keyword Search + Vector Search → Merge Results → Re-rank → LLM generates answer
Combines semantic and BM25-style retrieval. Better when exact terms, policy IDs, codes, dates, or legal citations matter.
Image URL: /assets/blog/images/rag-patterns/hybrid-rag.png
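One common way to implement the "Merge Results" step is reciprocal rank fusion (RRF), which combines rankings without needing comparable scores. The sketch below assumes RRF as the merge technique (the flow above does not prescribe one) and uses made-up document IDs; `k = 60` is a commonly used default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each doc earns 1 / (k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

keyword_hits = ["policy-42", "faq-7", "memo-3"]   # e.g. from BM25
vector_hits = ["faq-7", "guide-1", "policy-42"]   # e.g. from embedding search
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that rank well in both lists rise to the top, which is why `faq-7` ends up ahead of `policy-42` here.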
Re-ranking RAG
Flow: Query → Retrieve Broad Candidate Set → Re-ranker Scores Results → Select Best Context → LLM generates answer
Improves precision by filtering noisy candidates with a stronger ranking model, at the cost of higher latency and compute.
Image URL: /blog/images/rag-patterns/reranking-rag.png
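The two-stage idea can be sketched as follows. Both scorers are toy assumptions: the first stage counts shared words, and `rerank_score` uses shared word pairs as a placeholder for a real re-ranking model such as a cross-encoder.

```python
def first_stage(query: str, docs: list[str], n: int) -> list[str]:
    """Cheap, recall-oriented pass: rank by number of shared words."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:n]

def bigrams(text: str) -> set[tuple[str, str]]:
    words = text.lower().split()
    return set(zip(words, words[1:]))

def rerank_score(query: str, doc: str) -> float:
    """Placeholder for a stronger model: fraction of query word pairs found in the doc."""
    q = bigrams(query)
    return len(q & bigrams(doc)) / len(q) if q else 0.0

def retrieve_and_rerank(query: str, docs: list[str], n: int = 3, k: int = 1) -> list[str]:
    candidates = first_stage(query, docs, n)   # broad candidate set
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:k]

docs = [
    "your password policy says never reset tokens",
    "to reset your password open settings",
    "the cafeteria menu changes weekly",
]
query = "reset your password"
best = retrieve_and_rerank(query, docs)
```

The first two documents tie in the cheap pass because they share the same three words with the query; the second pass separates them because only one contains the words in the right order.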
Multi-Query RAG
Flow: Query → Generate Multiple Search Variants → Retrieve Results → Merge/Dedupe → LLM generates answer
Improves recall by searching from multiple angles when user wording differs from corpus language.
Image URL: /blog/images/rag-patterns/multi-query-rag.png
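A minimal sketch of the merge and dedupe step, assuming a hypothetical `generate_variants` function in place of the LLM that rewrites the query (here it only swaps a few hard-coded synonyms) and a toy word-overlap `search`:

```python
def generate_variants(query: str) -> list[str]:
    """Hypothetical stand-in for an LLM rewrite step: swap in known synonyms."""
    synonyms = {"laptop": "notebook", "broken": "damaged"}
    original = query.lower()
    variant = " ".join(synonyms.get(w, w) for w in original.split())
    return [original] if variant == original else [original, variant]

def search(query: str, corpus: dict[str, str]) -> list[str]:
    """Toy retriever: doc IDs sharing at least one word with the query, best first."""
    q = set(query.split())
    scored = [(len(q & set(text.split())), doc_id) for doc_id, text in corpus.items()]
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: -s[0]) if score > 0]

def multi_query_retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    """Run every variant, then merge results while dropping duplicates in order."""
    merged: dict[str, None] = {}
    for variant in generate_variants(query):
        for doc_id in search(variant, corpus):
            merged.setdefault(doc_id, None)
    return list(merged)

corpus = {
    "d1": "notebook damaged in transit claim form",
    "d2": "broken laptop replacement policy",
    "d3": "parking permit renewal",
    "d4": "laptop or notebook loan rules",
}
results = multi_query_retrieve("broken laptop", corpus)
```

The original wording alone would miss `d1`, which only uses the corpus vocabulary ("notebook", "damaged"); the variant recovers it, and `d4` appears once even though both searches return it.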
Self-Querying RAG
Flow: Query → LLM Extracts Filters → Metadata Search + Vector Search → Retrieve Context → LLM generates answer
Uses structured filters (date, author, department, region, access level) for better precision in metadata-rich corpora.
Image URL: /blog/images/rag-patterns/self-querying-rag.png
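The filter-extraction step might look like the sketch below. The `extract_filters` function is a hypothetical regex-based stand-in for the LLM, and the `Doc` fields and department list are assumptions for the example; in a real pipeline, vector search would then run over the filtered subset.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    department: str
    year: int

def extract_filters(query: str) -> dict:
    """Hypothetical stand-in for the LLM: pull a year and a known department from the query."""
    filters = {}
    year = re.search(r"\b(19|20)\d{2}\b", query)
    if year:
        filters["year"] = int(year.group())
    for dept in ("finance", "legal", "hr"):
        if re.search(rf"\b{dept}\b", query.lower()):
            filters["department"] = dept
    return filters

def self_query_search(query: str, docs: list[Doc]) -> list[Doc]:
    """Keep only documents whose metadata matches every extracted filter."""
    filters = extract_filters(query)
    return [d for d in docs if all(getattr(d, k) == v for k, v in filters.items())]

docs = [
    Doc("travel policy v2", "finance", 2023),
    Doc("travel policy v1", "finance", 2021),
    Doc("contract templates", "legal", 2023),
]
matches = self_query_search("finance travel policy from 2023", docs)
```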
Parent-Child RAG
Flow: Query → Search Small Chunks → Retrieve Larger Parent Sections → LLM generates answer
Balances precise retrieval with richer context around each match.
Image URL: /blog/images/rag-patterns/parent-child-rag.png
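A small sketch of the idea, assuming each child chunk stores the ID of its parent section. The data and the word-overlap scoring are illustrative only.

```python
parents = {
    "returns": "Returns section: refund window is 30 days. Refund needs original receipt.",
    "shipping": "Shipping section: orders ship within two days.",
}
# Each child chunk is (parent_id, small chunk text).
children = [
    ("returns", "refund window is 30 days"),
    ("returns", "refund needs original receipt"),
    ("shipping", "orders ship within two days"),
]

def retrieve_parents(query: str, k: int = 2) -> list[str]:
    """Match against small chunks, then return each matched parent section once."""
    q = set(query.lower().split())
    ranked = sorted(children, key=lambda c: len(q & set(c[1].lower().split())), reverse=True)
    parent_ids = dict.fromkeys(pid for pid, _ in ranked[:k])  # dedupe, keep order
    return [parents[pid] for pid in parent_ids]

context = retrieve_parents("refund window")
```

Both top-scoring chunks belong to the same section, so the LLM receives that section once rather than two overlapping fragments.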
Hierarchical RAG
Flow: Query → Search Summary Layer → Identify Relevant Documents/Sections → Drill Down Into Chunks → LLM generates answer
Best for very large corpora where routing from summaries to detailed evidence improves efficiency.
Image URL: /blog/images/rag-patterns/hierarchical-rag.png
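The drill-down can be sketched in two passes: score summaries first, then score chunks only inside the selected documents. The `library` structure, its contents, and the word-overlap scorer are assumptions for illustration.

```python
def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

library = {
    "hr-handbook": {
        "summary": "leave vacation and benefits rules for employees",
        "chunks": ["employees accrue 20 vacation days per year",
                   "parental leave lasts 16 weeks"],
    },
    "it-runbook": {
        "summary": "server deployment and incident response procedures",
        "chunks": ["restart the server after deployment",
                   "page the on-call engineer for incidents"],
    },
}

def hierarchical_retrieve(query: str, top_docs: int = 1, top_chunks: int = 1) -> list[str]:
    # Pass 1: route using the summary layer only.
    doc_ids = sorted(library, key=lambda d: overlap(query, library[d]["summary"]),
                     reverse=True)[:top_docs]
    # Pass 2: search chunks, but only inside the routed documents.
    chunks = [c for d in doc_ids for c in library[d]["chunks"]]
    return sorted(chunks, key=lambda c: overlap(query, c), reverse=True)[:top_chunks]

evidence = hierarchical_retrieve("how many vacation days do employees get")
```

The efficiency gain comes from pass 2 never touching chunks in documents that pass 1 ruled out.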
GraphRAG
Flow: Query → Entity Extraction → Graph Traversal → Retrieve Connected Nodes → LLM generates answer
Captures relationships and supports multi-hop reasoning across entities and events.
Image URL: /blog/images/rag-patterns/graph-rag.png
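The traversal step can be sketched as a bounded breadth-first search over an adjacency list. The graph below is invented for the example, and entity extraction is assumed to have already produced the starting node.

```python
from collections import deque

# Adjacency list: node -> list of (relation, neighbor). Made-up example data.
graph = {
    "Acme Corp": [("acquired", "Beta Labs")],
    "Beta Labs": [("founded_by", "Dana Lee")],
    "Dana Lee": [("advises", "Gamma Fund")],
}

def traverse(start: str, max_hops: int = 2) -> list[tuple[str, str, str]]:
    """Collect (subject, relation, object) facts reachable within max_hops of start."""
    facts = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in graph.get(node, []):
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return facts

facts = traverse("Acme Corp", max_hops=2)
```

With two hops, a question like "who founded the company Acme acquired?" has both facts it needs in context, even though no single document chunk may state them together.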
Temporal RAG
Flow: Query → Detect Time Requirement → Retrieve Time-Filtered Evidence → Resolve Timeline → LLM generates answer
Prevents mixing facts across different points in time; critical for policy history, audits, and legal timelines.
Image URL: /blog/images/rag-patterns/temporal-rag.png
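One way to implement the time filter is to store a validity interval with each fact and query "as of" a date. The field names and sample facts below are assumptions; the interval is treated as half-open, so a fact stops applying on its `valid_to` date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Fact:
    text: str
    valid_from: date
    valid_to: Optional[date]  # None means still in force

def as_of(facts: list[Fact], when: date) -> list[Fact]:
    """Keep only facts whose validity interval [valid_from, valid_to) contains `when`."""
    return [
        f for f in facts
        if f.valid_from <= when and (f.valid_to is None or when < f.valid_to)
    ]

facts = [
    Fact("travel cap is 100 per day", date(2021, 1, 1), date(2023, 1, 1)),
    Fact("travel cap is 150 per day", date(2023, 1, 1), None),
]
past = as_of(facts, date(2022, 6, 1))
current = as_of(facts, date(2024, 6, 1))
```

Without the filter, both versions of the policy would reach the LLM and it would have to guess which one applies.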
ACL / Security-Aware RAG
Flow: Query → Identify User Permissions → Filter Corpus by ACL → Retrieve Allowed Context → LLM generates answer
Enforces access controls before retrieval to prevent leakage of restricted data.
Image URL: /blog/images/rag-patterns/security-aware-rag.png
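The key point is ordering: the permission filter runs before any scoring, so restricted text never enters the candidate set. A minimal sketch, assuming each document carries a set of allowed groups (the field names and data are illustrative):

```python
def allowed_docs(user_groups: set[str], docs: list[dict]) -> list[dict]:
    """Keep documents that share at least one group with the user."""
    return [d for d in docs if d["allowed_groups"] & user_groups]

def secure_retrieve(query: str, user_groups: set[str], docs: list[dict], k: int = 3) -> list[str]:
    visible = allowed_docs(user_groups, docs)  # filter BEFORE ranking
    q = set(query.lower().split())
    ranked = sorted(visible, key=lambda d: len(q & set(d["text"].lower().split())), reverse=True)
    return [d["text"] for d in ranked[:k]]

docs = [
    {"text": "salary bands for engineering", "allowed_groups": {"hr"}},
    {"text": "engineering onboarding checklist", "allowed_groups": {"hr", "engineering"}},
]
results = secure_retrieve("engineering salary bands", {"engineering"}, docs)
```

The salary document is the better textual match, but the engineering user never sees it because it was removed before ranking.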
Multimodal RAG
Flow: Query → Retrieve Text + Tables + Images + PDFs + Charts → Model Interprets Mixed Context → LLM generates answer
Supports evidence beyond plain text; ingestion complexity is significantly higher.
Image URL: /blog/images/rag-patterns/multimodal-rag.png
Tool-Augmented RAG
Flow: Query → Retrieve Context → Call Tools/APIs/Databases → Combine Evidence → LLM generates answer
Combines documents with live enterprise systems for operational answers.
Image URL: /blog/images/rag-patterns/tool-augmented-rag.png
Corrective RAG
Flow: Query → Retrieve Context → Evaluate Quality → If Weak, Rewrite Query or Retrieve Again → Final Answer
Improves reliability by validating retrieval quality before answering.
Image URL: /blog/images/rag-patterns/corrective-rag.png
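The evaluate-and-retry loop can be sketched as below. The quality score (share of query words found in the best document) and the `rewrite` function (a hard-coded abbreviation expander standing in for an LLM rewrite) are both simplifying assumptions.

```python
def retrieve(query: str, docs: list[str]) -> tuple[str, float]:
    """Return the best-matching doc and a crude quality score in [0, 1]."""
    q = set(query.lower().split())
    best = max(docs, key=lambda d: len(q & set(d.lower().split())))
    score = len(q & set(best.lower().split())) / len(q) if q else 0.0
    return best, score

def rewrite(query: str) -> str:
    """Hypothetical stand-in for an LLM query rewrite: expand known abbreviations."""
    expansions = {"pto": "paid time off"}
    return " ".join(expansions.get(w, w) for w in query.lower().split())

def corrective_retrieve(query: str, docs: list[str],
                        threshold: float = 0.5, max_retries: int = 1) -> tuple[str, float]:
    context, score = retrieve(query, docs)
    for _ in range(max_retries):
        if score >= threshold:
            break                      # retrieval looks good enough
        query = rewrite(query)         # weak result: rewrite and try again
        context, score = retrieve(query, docs)
    return context, score

docs = ["paid time off accrues monthly", "expense reports are due friday"]
context, score = corrective_retrieve("pto accrual", docs)
```

The first attempt scores zero because the corpus never uses the abbreviation; after the rewrite, three of four query words match.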
Agentic RAG
Flow: Query → Agent Plans Retrieval → Multi-Step Search → Tool Use → Self-Evaluation → Iteration → Final Answer
Best for complex investigations, with trade-offs in latency, cost, and debugging complexity.
Image URL: /blog/images/rag-patterns/agentic-rag.png
Federated RAG
Flow: Query → Route to Multiple Data Sources → Retrieve from Each → Normalize Results → Re-rank → LLM generates answer
Searches across distributed systems without fully centralizing all data.
Image URL: /blog/images/rag-patterns/federated-rag.png
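The "Normalize Results" step matters because each source scores on its own scale. The sketch below assumes min-max scaling per source, which is one simple option among several, and uses invented source names, document IDs, and scores.

```python
def normalize(hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Min-max scale one source's scores to [0, 1] so sources become comparable."""
    scores = [s for _, s in hits]
    lo, hi = min(scores), max(scores)
    return [(doc, 1.0 if hi == lo else (s - lo) / (hi - lo)) for doc, s in hits]

def federated_merge(results_by_source: dict[str, list[tuple[str, float]]],
                    k: int = 3) -> list[tuple[str, str]]:
    """Normalize each source separately, then merge and keep the global top k."""
    merged = []
    for source, hits in results_by_source.items():
        if hits:
            merged += [(score, source, doc) for doc, score in normalize(hits)]
    merged.sort(key=lambda item: -item[0])  # stable: ties keep source order
    return [(source, doc) for _, source, doc in merged[:k]]

results = {
    "wiki": [("wiki/vpn-setup", 12.0), ("wiki/proxy", 8.0), ("wiki/wifi", 4.0)],
    "tickets": [("ticket-881", 0.9), ("ticket-207", 0.8), ("ticket-102", 0.5)],
}
top = federated_merge(results, k=3)
```

Merging on raw scores would let the wiki dominate simply because its scale is larger; after normalization the ticket system's strong hits compete fairly.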
Structured Data RAG
Flow: Query → Convert to SQL/Graph/Filter Query → Retrieve Structured Results → LLM Explains Answer
Queries structured systems directly instead of only relying on vector search.
Image URL: /blog/images/rag-patterns/structured-data-rag.png
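A minimal sketch using an in-memory SQLite database. The `question_to_sql` function is a hypothetical stand-in for LLM text-to-SQL that handles only one question shape, and the table, regions, and amounts are invented; note the parameterized query, which keeps user text out of the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)],
)

def question_to_sql(question: str) -> tuple[str, tuple]:
    """Hypothetical stand-in for LLM text-to-SQL: supports 'total ... in <region>' only."""
    for region in ("EMEA", "APAC"):
        if region.lower() in question.lower():
            return "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    raise ValueError("unsupported question")

def answer(question: str) -> str:
    sql, params = question_to_sql(question)
    (total,) = conn.execute(sql, params).fetchone()
    # In a full pipeline the LLM would phrase this explanation.
    return f"Total order amount for {params[0]}: {total:.2f}"

reply = answer("What is the total order amount in EMEA?")
```

An exact aggregate like this is something vector search over text chunks cannot reliably produce, which is the case for querying the structured system directly.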
Memory-Augmented RAG
Flow: Query → Retrieve User/Session Memory → Retrieve External Knowledge → Personalize Answer → LLM generates response
Blends document retrieval with user/session context while requiring strong privacy controls.
Image URL: /blog/images/rag-patterns/memory-augmented-rag.png
Summary-First RAG
Flow: Query → Retrieve Document Summaries → Select Relevant Docs → Retrieve Detailed Chunks → LLM generates answer
Uses summaries as a routing layer before deep retrieval in long documents.
Image URL: /blog/images/rag-patterns/summary-first-rag.png
Evaluated RAG
Flow: Query → Retrieve → Generate Answer → Judge Against Sources → Score Faithfulness/Relevance → Return or Retry
Adds measurable quality controls (faithfulness, relevance, citation quality, refusal behavior).
Image URL: /blog/images/rag-patterns/evaluated-rag.png
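The judge-then-return-or-retry step can be sketched as below. The faithfulness metric here (share of answer words that appear in the sources) is a deliberately crude proxy for a judge model, and `candidates` stands in for successive generation attempts; both are assumptions for the example.

```python
def faithfulness(answer: str, sources: list[str]) -> float:
    """Fraction of answer words found in the sources (crude proxy for a judge model)."""
    source_words = set(" ".join(sources).lower().split())
    words = answer.lower().split()
    return sum(w in source_words for w in words) / len(words) if words else 0.0

def answer_with_check(candidates: list[str], sources: list[str],
                      threshold: float = 0.8) -> tuple[str, float]:
    """Return the first candidate that clears the threshold, else a refusal."""
    for candidate in candidates:          # each candidate is one generation attempt
        score = faithfulness(candidate, sources)
        if score >= threshold:
            return candidate, score
    return "I could not find a supported answer.", 0.0

sources = ["the warranty covers parts for two years"]
candidates = [
    "the warranty covers labor for ten years",   # contains unsupported claims
    "the warranty covers parts for two years",   # fully supported
]
final, score = answer_with_check(candidates, sources)
```

Logging the score alongside each answer is what turns this from a guardrail into a measurable quality signal.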
Best Way to Think About It
Classic RAG is the baseline. Hybrid improves retrieval. Re-ranking improves precision. Parent-child improves context quality. GraphRAG improves relationship reasoning. Temporal improves timeline accuracy. Security-aware protects sensitive data. Tool-augmented connects documents to systems. Agentic orchestrates multi-step workflows.
In production, strong systems are usually layered: hybrid search, metadata filters, re-ranking, access control, evaluation, and (when needed) agents on top.