Exploring LLMs and RAG: A Comparison of Approaches
I gave talks on vector databases at the DOAG 2024 and KI-Navigator 2024 conferences. The architecture described below is taken from my slide decks.
Using an LLM Without RAG
Large Language Models (LLMs) are powerful tools capable of generating coherent and contextually relevant responses based on extensive training data. The traditional, non-RAG approach is simple:
- Input Question: A user submits a question or query.
- LLM Processing: The model generates a response purely based on its pre-trained knowledge, which encompasses vast amounts of information up until the training cutoff date.
- Output Response: The user receives the generated answer.
In this pipeline, the LLM is self-contained, relying solely on the knowledge embedded during training.
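As a concrete illustration, here is a minimal sketch of this non-RAG flow, assuming the OpenAI Python client; the model name is only an example, and any chat-capable model would work the same way.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(question: str) -> str:
    """Send the question straight to the model; no external context is added."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(ask_llm("What changed in the EU AI Act this year?"))
# The answer can only draw on knowledge frozen at the model's training cutoff.
```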
Downsides of Using an LLM Without RAG
While this approach is straightforward, it suffers from significant limitations:
- Not Up to Date: LLMs have a fixed knowledge cutoff. If the model’s training ended in, for example, 2021, it lacks awareness of developments thereafter.
- No Access to Internal Documents: These models can’t incorporate company-specific or proprietary knowledge unless such data was included in training.
- Hallucinations: LLMs may fabricate information confidently when they lack the relevant data. This phenomenon, called hallucination, can mislead users.
- Context Window Limitations: LLMs have a restricted context window, meaning they can process only a limited amount of text at a time. This affects their ability to deal with large datasets or documents.
Introducing RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) addresses the above limitations by integrating LLMs with a retrieval system to enhance the generation process.
How Does RAG Work?
RAG introduces two key pipelines; a minimal end-to-end code sketch follows the list:
- Index Pipeline:
- Document Ingestion: Documents (e.g., internal files, articles) are processed and converted into embeddings using an embedding model.
- Vector Store Creation: These embeddings are stored in a vector database for semantic search; they represent the essence of the documents in a dense numerical format. The index pipeline makes external, up-to-date, and domain-specific knowledge available to the system by storing it in, for example, a vector store.
- RAG Pipeline:
- User Query: The user submits a question.
- Semantic Search: The query is transformed into an embedding and matched against the vector database to retrieve the most relevant documents.
- Augmentation: Retrieved documents are used to provide context to the LLM, enhancing its ability to generate accurate and relevant responses.
- LLM Generation: With the augmented context, the LLM generates a response, which is sent back to the user.
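The sketch below walks through both pipelines end to end, assuming the sentence-transformers library for embeddings and a plain NumPy array as a stand-in for a real vector database; the model name, documents, and helper names are illustrative only.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model

# --- Index pipeline: ingest documents and store their embeddings ---
documents = [
    "Our support hotline is reachable Mon-Fri, 9:00-17:00 CET.",
    "Release 2.4 introduced single sign-on via OpenID Connect.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

# --- RAG pipeline: embed the query and retrieve the best-matching documents ---
def retrieve(query: str, top_k: int = 1) -> list[str]:
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector  # cosine similarity (vectors are normalized)
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

question = "When can I call support?"
context = "\n".join(retrieve(question))

# Augmentation: ground the LLM prompt in the retrieved context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# prompt would now be sent to the LLM, e.g. via the ask_llm() sketch shown earlier.
```

In production, the NumPy array would be replaced by a dedicated vector store, but the flow of embed, search, augment, and generate stays the same.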
How RAG Resolves Non-RAG Downsides
- Up-to-Date Information: By incorporating a vector store with recent documents, RAG ensures that responses reflect current and accurate information.
- Internal Document Access: The index pipeline allows companies to add proprietary documents into the vector store, ensuring the LLM can utilize internal knowledge securely.
- Reduced Hallucination: RAG grounds the LLM’s responses in retrieved, relevant documents. This retrieval step significantly reduces the risk of hallucination, as the model leans on factual data.
- Overcoming Context Window Limits: The vector store retrieval process breaks down vast information into relevant chunks, enabling the LLM to focus only on essential content within its context window.
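To illustrate the chunking that makes this possible, here is a simple character-based chunker with overlap; the sizes are arbitrary example values, and real systems often split along semantic boundaries (sentences, sections) instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (character-based).

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from at least one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# Each chunk is embedded and indexed individually, so retrieval returns
# only the passages relevant to a query instead of whole documents.
```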
My take
RAG setups can show quick success early on. Reaching a fully mature and reliable production system (or better: a reliable product), however, requires significant investment in software engineering, data engineering, automation, and cross-functional disciplines such as FinOps. Data quality is key: if the data-quality homework is not done, the product will fail; garbage in, garbage out. Data security, data protection, and Data Mesh principles are vital to the overall success and reliability of a RAG system. The slide also lists further challenges for the vector store and for the index and RAG pipelines.
Additional Vector Tech 10/2024 resources
Summary of selected articles that caught my attention.
RAG Primer
INNOQ published a primer on RAG in German. The primer is worthwhile reading for an overview of the concepts, covering both theoretical and practical basics.
Source: RAG: Retrieval-Augmented Generation, INNOQ.com
What is agentic RAG?
Agents in the context of artificial intelligence (AI) are autonomous entities that perceive their environment, make decisions, and perform actions to achieve specific goals. They can be software programs or robots that interact with their surroundings, learn from experiences, and adapt their behaviors accordingly. Agents are designed to operate without continuous human guidance, making them essential for tasks that require adaptability and real-time decision-making.
Agentic RAG merges these two concepts by creating AI agents that not only generate responses augmented by retrieved information but also autonomously decide what information to seek and how to use it to accomplish specific tasks. For example, an agentic RAG system could (see the sketch after this list):
- Identify a Goal: Determine what needs to be achieved based on user input or environmental cues.
- Plan Actions: Decide which resources or tools to use to achieve the goal.
- Retrieve Information: Access external databases or documents to gather necessary information.
- Generate Output: Produce responses or actions based on the retrieved data and the overarching goal.
- Learn and Adapt: Improve its strategies over time through feedback and new data.
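The sketch below shows such a loop conceptually; plan(), retrieve(), generate(), and is_good_enough() are hypothetical stand-ins for a planner LLM, a retriever, a generator, and a self-critique step, not the API of any particular framework.

```python
from typing import List

# Hypothetical stand-ins: a real system would wire these to an LLM planner,
# a vector-store retriever, and a generator model.
def plan(goal: str, notes: List[str]) -> str:
    return f"search documents about: {goal}"

def retrieve(action: str) -> List[str]:
    return [f"(retrieved snippet for '{action}')"]

def generate(goal: str, notes: List[str]) -> str:
    return f"Answer to '{goal}' based on {len(notes)} snippet(s)."

def is_good_enough(answer: str) -> bool:
    return True  # a real agent would self-critique or score the draft

def agentic_rag(goal: str, max_steps: int = 5) -> str:
    """Conceptual agent loop: identify goal, plan, retrieve, generate, assess."""
    notes: List[str] = []
    answer = ""
    for _ in range(max_steps):
        action = plan(goal, notes)       # decide which resource/tool to use
        notes += retrieve(action)        # gather the needed information
        answer = generate(goal, notes)   # produce output from retrieved data
        if is_good_enough(answer):       # adapt or stop based on feedback
            return answer
    return answer

print(agentic_rag("Summarize our Q3 support tickets"))
```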
There are many blog posts currently dealing with agents and/or agentic RAG. Two examples are listed below.
Source: What is agentic RAG?, Kacper Łukawski (qdrant), 22-NOV-2024 and What is agentic RAG?, Erika Cardenas & Leonie Monigatti (weaviate), 05-NOV-2024
Looking Ahead: Vector tech conferences or events
A selection of conferences and events featuring vector tech sessions:
- Data Festival, 26-MAR-2025 until 27-MAR-2025, Munich
- Devoxx France, 16-APR-2025 until 18-APR-2025, Paris
- Devoxx UK, 07-MAY-2025 until 09-MAY-2025, London
- TDWI, 24-JUN-2025 until 26-JUN-2025, Munich
For more articles on vector tech, see my blog.