Featured Vector Tech Topic: Advanced RAG techniques
In his blog post, Guillaume Laforge summarises a session and a workshop that he and Cédrick Lunven gave at Devoxx Belgium 2024. In the workshop, Guillaume and Cédrick explored how to overcome the pitfalls of naive RAG implementations by adopting advanced techniques, drawing on recent advances in RAG methodologies and LangChain4j. Their examples are written in Java.
Starting with foundational principles, they dived into ingestion best practices, including efficient chunking, embedding creation, and vector storage. Advanced RAG methods emphasize fine-tuning search accuracy, optimizing vector similarity calculations, and refining document chunking for context relevance. Techniques like hypothetical questions, hierarchical chunking, and contextual retrieval add depth to RAG, ensuring responses align closely with user queries. Additional strategies for reranking, metadata filtering, and similarity scoring help deliver precise, relevant answers.
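To make the chunking step more concrete, here is a minimal sketch of fixed-size chunking with overlap, written in Python rather than the workshop's Java. It is not taken from the workshop material; the function name `chunk_words` and the word-based splitting are assumptions chosen for illustration, and real pipelines often split on sentences, paragraphs, or tokens instead.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a chunk boundary still appears with some surrounding context.
    (Hypothetical helper for illustration, not a LangChain4j API.)
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g h i j"
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and stored in a vector database. The overlap is a trade-off: more overlap preserves context across boundaries but increases the number of stored vectors.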
By implementing these advanced methods, RAG systems can provide more accurate and reliable information, enhancing user experience and elevating the utility of generative AI models.
The following screenshot is taken from their presentation deck and summarises RAG steps and potential issues.
- retrieval phase, e.g. precision (signal-to-noise ratio) or recall (is all the relevant information retrieved?)
- generation phase, e.g. faithfulness (is the answer factually accurate?) or relevance (how relevant is the answer?)
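As a small worked example of the two retrieval metrics, the sketch below computes classic set-based precision and recall from document IDs. The IDs and the helper name `precision_recall` are made up for illustration; LLM-based metrics judge relevance with a model instead of relying on a hand-labelled set like this.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved_ids = set(retrieved)
    hits = len(retrieved_ids & relevant)
    precision = hits / len(retrieved_ids) if retrieved_ids else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = ["d1", "d2", "d3", "d4"]  # what the retriever returned (hypothetical)
    relevant = {"d1", "d3", "d5"}         # hand-labelled ground truth (hypothetical)
    precision, recall = precision_recall(retrieved, relevant)
    print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).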
Ragas is a Python package that supports this kind of evaluation with just a few lines of code, e.g.
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness
from ragas import evaluate
# eval_dataset (the evaluation samples) and evaluator_llm (the judge model) must be prepared beforehand
metrics = [LLMContextRecall(), FactualCorrectness(), Faithfulness()]
results = evaluate(dataset=eval_dataset, metrics=metrics, llm=evaluator_llm)
My take
The material is extremely good and very clearly structured, with a lot of information on how to build a RAG pipeline. Naive implementations often fall short, resulting in issues like incomplete responses, poorly matched context, and inefficient searches. I highly recommend watching the workshop.
Additional Vector Tech 10/2024 resources
Summary of selected articles that caught my attention.
PostgreSQL pgvector 0.8 available
PostgreSQL pgvector version 0.8 has been released. The latest version brings performance improvements for HNSW indexes (index scans and inserts) and adds iterative index scans. With approximate indexes like HNSW or IVFFlat, queries that filter in a WHERE clause can return fewer results than requested, because the filter is applied after the index has been read. Starting with 0.8.0, a scan can be configured to keep reading more of the index until enough results are found.
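The difference between the two behaviours can be illustrated with a toy simulation in plain Python. This is not how pgvector is implemented; it only mimics the effect described above, and the function names, the `category` column, and the batch sizes are assumptions for illustration. In pgvector itself the behaviour is switched on through settings such as `hnsw.iterative_scan`; the pgvector README documents the exact options.

```python
from typing import Callable

Row = dict  # one table row, e.g. {"id": 3, "category": "b"}


def post_filter_search(ranked: list[Row], predicate: Callable[[Row], bool],
                       limit: int, scan_size: int) -> list[Row]:
    """Read the `scan_size` nearest rows once, then apply the WHERE filter."""
    candidates = ranked[:scan_size]
    return [row for row in candidates if predicate(row)][:limit]


def iterative_search(ranked: list[Row], predicate: Callable[[Row], bool],
                     limit: int, batch_size: int, max_scan: int) -> list[Row]:
    """Keep reading further batches until `limit` rows pass the filter
    or `max_scan` rows have been read."""
    if limit <= 0:
        return []
    results = []
    upper = min(max_scan, len(ranked))
    for start in range(0, upper, batch_size):
        for row in ranked[start:min(start + batch_size, upper)]:
            if predicate(row):
                results.append(row)
                if len(results) == limit:
                    return results
    return results


if __name__ == "__main__":
    # Rows are assumed to be already ordered by distance to the query vector.
    ranked = [{"id": i, "category": "a" if i % 5 == 0 else "b"} for i in range(100)]
    wanted = lambda row: row["category"] == "a"

    once = post_filter_search(ranked, wanted, limit=5, scan_size=10)
    more = iterative_search(ranked, wanted, limit=5, batch_size=10, max_scan=100)
    print([r["id"] for r in once])  # [0, 5]
    print([r["id"] for r in more])  # [0, 5, 10, 15, 20]
```

With a selective filter, the single pass returns only two of the five requested rows, while the iterative variant continues scanning until the limit is reached.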
Source: pgvector 0.8 available, Jonathan Katz on LinkedIn, 31-OCT-2024
What’s New in the Vector Similarity Search Extension?
DuckDB is becoming more and more popular for building a Lakehouse. It also offers the VSS (Vector Similarity Search) extension, which brings powerful vector search capabilities directly into the DuckDB environment. The blog post introduces the new features of the latest product update.
DuckDB’s latest update to its Vector Similarity Search (VSS) extension focuses on enhancing performance and expanding functionality for machine learning and data science applications. Key features include faster indexing, thanks to more efficient work distribution, and the introduction of new distance functions like array_cosine_distance and array_negative_inner_product, which cater to various vector similarity search use cases.
Additionally, the update improves the query performance for “top-k” searches, which are now index-accelerated, allowing quicker retrieval of nearest neighbors in vector datasets. Another important optimization is in LATERAL joins, which now operate more efficiently when pairing vectors for comparison. These optimizations are crucial for large-scale data analysis and search tasks, where minimizing latency and maximizing throughput are critical.
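To show what the two distance functions measure, here is a brute-force sketch in plain Python. It assumes that `array_cosine_distance` corresponds to 1 minus the cosine similarity and `array_negative_inner_product` to the negated dot product, as the names suggest; the Python function names, the sample vectors, and the `top_k` helper are made up for illustration and do not use DuckDB or its HNSW index.

```python
import heapq
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: ignores vector length, compares direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a: Vector, b: Vector) -> float:
    """Negated dot product: a larger inner product gives a smaller 'distance'."""
    return -sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, vectors: dict[str, Vector], k: int,
          distance: Callable[[Vector, Vector], float]) -> list[str]:
    """Brute-force top-k: return the ids of the k vectors closest to the query."""
    return heapq.nsmallest(k, vectors, key=lambda vid: distance(query, vectors[vid]))


if __name__ == "__main__":
    vectors = {
        "a": [1.0, 0.0],
        "b": [0.0, 1.0],
        "c": [0.5, 1.0],
        "d": [3.0, 0.5],
    }
    query = [1.0, 0.0]
    print(top_k(query, vectors, 2, cosine_distance))         # ['a', 'd']
    print(top_k(query, vectors, 2, negative_inner_product))  # ['d', 'a']
```

The two metrics rank the same data differently: cosine distance prefers "a" because it points in exactly the query's direction, while the negative inner product prefers "d" because its larger magnitude yields a larger dot product. An index-accelerated top-k search avoids comparing the query against every vector as this sketch does.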
Source: What’s New in the Vector Similarity Search Extension? by DuckDB.org / Max Gabrielsson, 23-OCT-2024
Thoughtworks Technology Radar 31 from October 2024
The latest Thoughtworks Technology Radar was published recently and contains several vector technologies, e.g.
- 4) Retrieval-augmented generation (RAG)
- 26) GCP Vertex AI Agent Builder
- 27) Langfuse
- 28) Qdrant
- 30) Azure AI Search
- 53) pgvector
- 101) Ragas
The picture below, taken from the Thoughtworks Technology Radar page, shows the mentioned vector technologies together with their recommendation (adopt/trial/assess/hold).
Source: Thoughtworks Technology Radar, Thoughtworks OCT-2024
Looking Ahead: Vector tech conferences or events
A selection of conferences or events containing vector tech sessions:
- Big Data Conference Europe: AI, Cloud and Data Conference, 19-NOV-2024 until 22-NOV-2024, Vilnius and online
- DOAG K&A, 19-NOV-2024 until 22-NOV-2024, Nuremberg
- KI Navigator, 20-NOV-2024 until 21-NOV-2024, Nuremberg
For more articles on vector tech, see my blog.