RAG Is Not Dead

(hamel.dev)

33 points by GN⁺ 2025-07-17 | 1 comment

The future of RAG lies in "better retrieval", not in a "bigger context window"

The "RAG Is Dead" claim applies only to naive 2023-style RAG implementations; the real problem is single-vector retrieval, which loses a great deal of information
Existing IR evaluation metrics are a poor fit for RAG, and new evaluation criteria centered on factual coverage, diversity, and relevance are needed
RAG retrievers are evolving beyond simple matching toward understanding instructions and selecting relevant documents through reasoning
ColBERT-style late interaction models keep token-level representations instead of compressing information, which lets small models outperform large ones
Rather than searching for one perfect embedding, multiple indexes for diverse representations plus a smart routing structure are becoming the new standard

Why the future of RAG lies in better retrieval, not bigger context windows

Rebutting the “RAG Is Dead” claim

Part 1. I don’t use RAG, I just retrieve documents - what is dead is simple vector search, not RAG itself

Hamel and Ben Clavié argue that RAG is not dead; rather, it is time for retrieval architecture to evolve
The approach of putting documents into a vector DB and searching them by cosine similarity is outdated and loses a lot of information
An LLM's knowledge is frozen after training, so retrieval-based information injection (RAG) remains important
Trying to fit all information in simply by enlarging the context window is inefficient
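
For reference, the "simple vector search" baseline criticized above can be sketched in a few lines. This is only an illustration under stated assumptions: `embed` is a hypothetical toy stand-in (a bag-of-words count vector) rather than a real embedding model, and `DOCS` is made-up sample data. The point is the shape of the pipeline: one vector per document, ranked by cosine similarity against one vector per query.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical toy embedding: a bag-of-words count vector.
    A real system would call an embedding model here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank documents by cosine similarity to the query; return the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Made-up sample corpus for illustration only.
DOCS = [
    "late interaction keeps one vector per token",
    "context windows keep growing every year",
    "cosine similarity compares two vectors",
]

top = search("cosine similarity of vectors", DOCS, k=1)
print(top[0][1])
```

Whatever a document says beyond what its single vector captures is invisible to this ranking, which is the information loss the summary refers to.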

The wrong evaluation metrics

Part 2. Modern IR Evals For RAG - explains why traditional IR evaluation metrics are a poor fit for RAG, and proposes FreshStack

Nandan Thakur explains that traditional information retrieval (IR) evaluation metrics are a poor fit for RAG
- Benchmarks such as BEIR optimize only for finding the top-ranked document
- RAG has to account for factual coverage, diverse perspectives, and contextual relevance together
- FreshStack is proposed as a new evaluation system for this purpose
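
To make the difference between "find the top document" and "cover the facts" concrete, here is a minimal sketch of a coverage-style score. It is not FreshStack's actual scoring method; the function name `coverage_at_k`, the fact labels, and the document IDs are all hypothetical and chosen only for illustration.

```python
def coverage_at_k(ranked_doc_ids, doc_facts, required_facts, k):
    """Fraction of the required facts supported by at least one of the top-k documents.

    ranked_doc_ids: document IDs in ranked order.
    doc_facts: mapping from document ID to the set of facts it supports.
    required_facts: set of facts a complete answer needs.
    """
    if not required_facts:
        return 0.0
    covered = set()
    for doc_id in ranked_doc_ids[:k]:
        covered |= doc_facts.get(doc_id, set()) & required_facts
    return len(covered) / len(required_facts)


# Made-up example: d1 and d2 are redundant, d3 adds the missing facts.
required = {"f1", "f2", "f3"}
doc_facts = {"d1": {"f1"}, "d2": {"f1"}, "d3": {"f2", "f3"}}

ranking_a = ["d1", "d2", "d3"]  # relevant but redundant top 2
ranking_b = ["d1", "d3", "d2"]  # relevant and complementary top 2

score_a = coverage_at_k(ranking_a, doc_facts, required, k=2)
score_b = coverage_at_k(ranking_b, doc_facts, required, k=2)
print(score_a, score_b)
```

Both rankings put a relevant document first, so a metric that only looks at the top result cannot tell them apart, while the coverage score rewards the ranking whose top 2 documents complement each other.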

Retrievers that reason

Part 3. Optimizing Retrieval with Reasoning Models - designing retrievers that can understand instructions and reason

Orion Weller's Rank1 system understands complex instructions such as "documents that use a metaphor about data privacy"
Instead of only computing similarity, it generates an explicit reasoning trace that justifies each relevance judgment
Search based on understanding and reasoning can surface documents that traditional retrieval systems fail to find

The potential of late interaction models

Part 4. Late Interaction Models For RAG - keeping representations without information loss using architectures such as ColBERT

Antoine Chaffin explains that with late interaction models such as ColBERT:
- Documents are not compressed into a single vector; token-level information is preserved
- As a result, there are cases where a 150M-parameter model shows better reasoning performance than a 7B model
The key is a representation structure that preserves information instead of discarding it
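
The token-level scoring used by ColBERT-style models is usually called MaxSim: each query token takes its best match among the document's tokens, and those maxima are summed. The sketch below uses hand-written 2-D toy vectors (an assumption for illustration, not real model output) to show a case where two documents collapse to the same mean-pooled single vector yet get different late interaction scores.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs, doc_vecs):
    """Late interaction score: for each query token vector, take the best
    match among the document token vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def mean_pool(vecs):
    """Collapse token vectors into one vector by averaging each dimension."""
    n = len(vecs)
    return [sum(column) / n for column in zip(*vecs)]


# Toy token vectors, made up for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one token matches each query token exactly
doc_b = [[0.5, 0.5], [0.5, 0.5]]  # two blurry tokens, no exact match

print(mean_pool(doc_a), mean_pool(doc_b))  # identical pooled vectors
print(maxsim(query, doc_a), maxsim(query, doc_b))
```

After pooling, `doc_a` and `doc_b` are indistinguishable to any single-vector comparison, while MaxSim still ranks `doc_a` higher because the per-token vectors were never thrown away.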

Not one map, but multiple maps

Part 5. RAG with Multiple Representations - improving retrieval performance with multiple indexes for different purposes

Bryan Bischof and Ayush Chaurasia argue that a single embedding cannot serve different retrieval goals
- Example: when searching for a painting
  - textual description
  - poetic interpretation
  - similar images
    Each of these can be searched through a separate index
Conclusion: rather than searching for a perfect embedding, what is needed is multiple indexes suited to different representation styles plus an intelligent routing system
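
A minimal sketch of the "multiple indexes plus routing" idea, under heavy assumptions: the `Index` class, the keyword-overlap search, the rule-based `route` function, and the painting entries are all hypothetical and exist only for illustration. A real system would likely use proper retrievers per index and a learned or LLM-based router.

```python
from dataclasses import dataclass, field


@dataclass
class Index:
    """One representation of the collection: item ID -> text in that style."""
    name: str
    entries: dict[str, str] = field(default_factory=dict)

    def search(self, query: str) -> list[str]:
        """Rank items by word overlap with the query; drop items with no overlap."""
        q = set(query.lower().split())
        scored = [
            (len(q & set(text.lower().split())), item_id)
            for item_id, text in self.entries.items()
        ]
        return [item_id for score, item_id in sorted(scored, reverse=True) if score > 0]


def route(query: str) -> str:
    """Hypothetical rule-based router: mood-style queries go to the poetic index."""
    mood_words = {"feels", "mood", "evokes"}
    return "poetic" if mood_words & set(query.lower().split()) else "literal"


# Made-up entries: the same two paintings described in two different styles.
INDEXES = {
    "literal": Index("literal", {
        "painting_a": "night sky over a village with a cypress tree",
        "painting_b": "pond with floating lilies and a bridge",
    }),
    "poetic": Index("poetic", {
        "painting_a": "restless swirling longing under a sleepless sky",
        "painting_b": "quiet stillness and calm reflection",
    }),
}


def retrieve(query: str) -> tuple[str, list[str]]:
    """Pick an index with the router, then search only that index."""
    index = INDEXES[route(query)]
    return index.name, index.search(query)


print(retrieve("painting of a pond with a bridge"))
print(retrieve("something that evokes calm stillness"))
```

The second query shares no words with the literal descriptions, so only the poetic index can answer it. A "similar images" index would follow the same pattern with an image retriever behind it.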

A future strategy for RAG

Four things are proposed for the future of RAG:

New evaluation criteria built around the actual use case
Retrievers that understand instructions and reason
Representation structures that keep information as it is, without compressing it
Smart routing that combines indexes built for different purposes

Annotated Notes From the Series

The series has 5 parts, with summaries that annotate the key slides with timestamps. See the links below for each part

Part	Title	Description
Part 1	I don’t use RAG, I just retrieve documents	What is dead is simple vector search, not RAG itself
Part 2	Modern IR Evals For RAG	Traditional IR evaluation metrics are a poor fit for RAG; FreshStack is proposed
Part 3	Optimizing Retrieval with Reasoning Models	Designing retrievers that can understand instructions and reason
Part 4	Late Interaction Models For RAG	Keeping representations without information loss using architectures such as ColBERT
Part 5	RAG with Multiple Representations	Improving retrieval performance with multiple indexes for different purposes

1 comment

ide127 2025-07-18

"Rather than searching for a perfect embedding, a multi-index + intelligent routing system suited to different representation styles"

Because that isn't easy...