A collection of ML papers

(discuss.pytorch.kr)

7 points by ninebow 2025-10-01

[2025/09/22 ~ 28] This week's notable AI/ML papers

PyTorchKR🔥🇰🇷 🤔💭

1️⃣ The rise of AI agents: Recent papers explore how to turn research papers into AI agents. Paper2Agent, for example, converts research output into active systems that users can apply more easily. This approach promotes the dissemination and reuse of research results and introduces a new paradigm in which AI can act as a research assistant.

2️⃣ Combining parallel thinking with reinforcement learning: Parallel-R1 and ParaThinker propose ways to use parallel thinking to improve the reasoning ability of large language models (LLMs). Both explore several distinct reasoning paths at once when solving complex problems, and Parallel-R1 trains this behavior through reinforcement learning (RL). Both report higher accuracy than conventional sequential reasoning models.

3️⃣ Combining information retrieval with structuring: Work such as Retrieval And Structuring (RAS) Augmented Generation looks at how to combine dynamic information retrieval with structured knowledge representations to overcome the limitations of LLMs. It surveys techniques for turning unstructured text into organized form and mechanisms for accessing external knowledge, both aimed at improving LLM performance.

Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents

Paper introduction

Paper2Agent is an automated framework that converts research papers into interactive AI agents, with the goal of speeding up the use and dissemination of research results. Conventional papers require readers to invest considerable effort to understand and apply their content, which hinders reuse and dissemination. To address this, Paper2Agent systematically analyzes a paper and its associated codebase and uses multiple agents to build a Model Context Protocol (MCP) server. Iterative testing then refines the MCP and makes it more robust, and the end result is an AI agent that can carry out complex scientific queries through natural language.
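The build-then-test loop described above can be pictured with a small sketch. This is not Paper2Agent's code and does not use a real MCP SDK: `ToolServer`, `run_tests`, `refine`, and the two example tools are hypothetical names, and dropping a failing tool is a simplification of whatever refinement the framework actually performs.

```python
from typing import Any, Callable, Dict, List, Tuple

TestCase = Tuple[str, Dict[str, Any], Any]  # (tool name, keyword arguments, expected result)


class ToolServer:
    """Hypothetical stand-in for an MCP server: maps tool names to callables."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        return self.tools[name](**kwargs)


def run_tests(server: ToolServer, tests: List[TestCase]) -> List[str]:
    """Run every test case and return the sorted names of tools that failed."""
    failing = set()
    for name, kwargs, expected in tests:
        try:
            passed = server.call(name, **kwargs) == expected
        except Exception:
            passed = False  # a crash counts as a failed test
        if not passed:
            failing.add(name)
    return sorted(failing)


def refine(server: ToolServer, tests: List[TestCase]) -> List[str]:
    """Assumed refinement step: remove failing tools so only validated ones stay exposed."""
    failing = run_tests(server, tests)
    for name in failing:
        server.tools.pop(name, None)
    return failing


server = ToolServer()
server.register("mean_expression", lambda values: sum(values) / len(values))
server.register("broken_tool", lambda values: values[0] / 0)  # always raises

tests: List[TestCase] = [
    ("mean_expression", {"values": [1.0, 3.0]}, 2.0),
    ("broken_tool", {"values": [1.0]}, 1.0),
]
dropped = refine(server, tests)
print(dropped, sorted(server.tools))
```

A chat agent would then see only the tools that survived testing, which is one way to read the "reliable" part of the paper's title.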

The framework has been used to build agents for genomic variant interpretation and single-cell analysis on top of tools such as AlphaGenome, ScanPy, and TISSUE. These agents can reproduce the results of the original papers and also answer new user queries correctly. By turning static papers into dynamic, interactive AI agents, Paper2Agent introduces a new paradigm for knowledge dissemination and lays the groundwork for an ecosystem of AI co-scientists.

The work aims to change how science is communicated, turning research results from plain documents into active, knowledge-based systems. With Paper2Agent, researchers can apply a paper's methods by talking to an agent in natural language, without first mastering a complex software ecosystem. This can make research results more accessible, help democratize advanced methods, and shorten the path from publication to practical use.

Ultimately, Paper2Agent could become a platform that maximizes the usefulness of research papers and enables interactive, collaborative dissemination of scientific knowledge.

Paper abstract

We introduce Paper2Agent, an automated framework that converts research papers into AI agents. Paper2Agent transforms research output from passive artifacts into active systems that can accelerate downstream use, adoption, and discovery. Conventional research papers require readers to invest substantial effort to understand and adapt a paper's code, data, and methods to their own work, creating barriers to dissemination and reuse. Paper2Agent addresses this challenge by automatically converting a paper into an AI agent that acts as a knowledgeable research assistant. It systematically analyzes the paper and the associated codebase using multiple agents to construct a Model Context Protocol (MCP) server, then iteratively generates and runs tests to refine and robustify the resulting MCP. These paper MCPs can then be flexibly connected to a chat agent (e.g. Claude Code) to carry out complex scientific queries through natural language while invoking tools and workflows from the original paper. We demonstrate Paper2Agent's effectiveness in creating reliable and capable paper agents through in-depth case studies. Paper2Agent created an agent that leverages AlphaGenome to interpret genomic variants and agents based on ScanPy and TISSUE to carry out single-cell and spatial transcriptomics analyses. We validate that these paper agents can reproduce the original paper's results and can correctly carry out novel user queries. By turning static papers into dynamic, interactive AI agents, Paper2Agent introduces a new paradigm for knowledge dissemination and a foundation for the collaborative ecosystem of AI co-scientists.

Paper link

https://arxiv.org/abs/2509.06917

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper introduction

Parallel thinking is a novel approach to improving the reasoning ability of large language models (LLMs) by exploring multiple reasoning paths at the same time. Existing supervised fine-tuning (SFT) methods, however, depend on synthetic data, which limits the model to imitation learning and holds back exploration and generalization. Parallel-R1 is proposed to address this: it is the first reinforcement learning (RL) framework that enables parallel thinking behavior for complex real-world reasoning tasks.

Parallel-R1 uses a two-stage training process built on a progressive curriculum: SFT on easier problems first instills parallel thinking, and RL then explores and generalizes that ability on harder problems. Early in training the model uses parallel thinking as an exploration strategy, and later the behavior shifts toward multi-perspective verification. In experiments on math benchmarks, Parallel-R1 improved accuracy by 8.4% over a sequential thinking model trained directly with RL on challenging tasks, and it recorded a 42.9% improvement over the baseline on AIME25.

The main contribution is an RL framework that lets a model learn parallel thinking on its own. In addition, parallel thinking acts as an exploration mechanism that steers the model toward more productive regions of the policy space, which contributes structurally to the final policy learned. Parallel-R1 can serve as a useful foundation for improving LLM reasoning, and it is expected to help future work develop the idea of parallel thinking further.

Paper abstract

> Parallel thinking has emerged as a novel approach for enhancing the reasoning capabilities of large language models (LLMs) by exploring multiple reasoning paths concurrently. However, activating such capabilities through training remains challenging, as existing methods predominantly rely on supervised fine-tuning (SFT) over synthetic data, which encourages teacher-forced imitation rather than exploration and generalization. Different from them, we propose Parallel-R1, the first reinforcement learning (RL) framework that enables parallel thinking behaviors for complex real-world reasoning tasks. Our framework employs a progressive curriculum that explicitly addresses the cold-start problem in training parallel thinking with RL. We first use SFT on prompt-generated trajectories from easier tasks to instill the parallel thinking ability, then transition to RL to explore and generalize this skill on harder problems. Experiments on various math benchmarks, including MATH, AMC23, and AIME, show that Parallel-R1 successfully instills parallel thinking, leading to 8.4% accuracy improvements over the sequential thinking model trained directly on challenging tasks with RL. Further analysis reveals a clear shift in the model's thinking behavior: at an early stage, it uses parallel thinking as an exploration strategy, while in a later stage, it uses the same capability for multi-perspective verification. Most significantly, we validate parallel thinking as a mid-training exploration scaffold, where this temporary exploratory phase unlocks a higher performance ceiling after RL, yielding a 42.9% improvement over the baseline on AIME25. Our model, data, and code will be open-source at https://github.com/zhengkid/Parallel-R1.

Paper link

https://arxiv.org/abs/2509.07980

A Survey on Retrieval And Structuring Augmented Generation with Large Language Models

Paper introduction

Large language models (LLMs) have driven major advances in natural language processing, but in real applications they suffer from hallucination, outdated knowledge, and limited domain expertise. The Retrieval And Structuring (RAS) approach addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. The survey reviews retrieval mechanisms for accessing external knowledge, including sparse, dense, and hybrid approaches, which help LLMs generate more accurate and reliable output.
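To make the sparse, dense, and hybrid distinction concrete, here is a minimal sketch of hybrid retrieval. It is an illustration under stated assumptions, not a method from the survey: the sparse score is plain token overlap (a crude stand-in for something like BM25), the dense vectors are hand-written toy embeddings, and the function names and the mixing weight `alpha` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def sparse_score(query: str, doc: str) -> float:
    """Sparse signal: number of query tokens that also appear in the document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))


def cosine(u: List[float], v: List[float]) -> float:
    """Dense signal: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_rank(query: str, query_vec: List[float], docs: List[str],
                doc_vecs: List[List[float]], alpha: float = 0.5) -> List[int]:
    """Rank document indices by alpha * normalized sparse + (1 - alpha) * dense."""
    sparse = [sparse_score(query, d) for d in docs]
    top = max(sparse) or 1.0  # avoid dividing by zero when nothing overlaps
    scores = [alpha * s / top + (1 - alpha) * cosine(query_vec, v)
              for s, v in zip(sparse, doc_vecs)]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    "sparse retrieval with inverted index",
    "dense retrieval with embeddings",
    "cooking pasta at home",
]
doc_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]  # toy embeddings, assumed for the example
ranking = hybrid_rank("dense retrieval", [1.0, 0.2], docs, doc_vecs)
print(ranking)  # [1, 0, 2]
```

In this toy case the dense score alone would put document 0 first, while the sparse term overlap pushes document 1 to the top. Combining signals that disagree like this is the usual motivation for a hybrid scorer.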

Text structuring techniques, which turn unstructured text into organized representations, play an equally important role. Methods such as taxonomy construction, hierarchical classification, and information extraction strengthen an LLM's expertise in a given domain and make multi-step reasoning over complex queries possible. RAS integrates these structured representations with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques to improve response generation.

The survey identifies technical challenges in RAS, stressing retrieval efficiency, structure quality, and knowledge integration. It also outlines future research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems that could broaden the applicability of LLMs. RAS is presented as a promising methodology for getting more out of LLMs and is expected to contribute to progress in natural language processing.

Paper abstract

Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination generation, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. This survey (1) examines retrieval mechanisms including sparse, dense, and hybrid approaches for accessing external knowledge; (2) explores text structuring techniques such as taxonomy construction, hierarchical classification, and information extraction that transform unstructured text into organized representations; and (3) investigates how these structured representations integrate with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques. It also identifies technical challenges in retrieval efficiency, structure quality, and knowledge integration, while highlighting research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems. This comprehensive overview provides researchers and practitioners with insights into RAS methods, applications, and future directions.

Paper link

https://arxiv.org/abs/2509.10697

ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute

Paper introduction

Recent progress in large language models (LLMs) has relied heavily on test-time compute scaling strategies, which have improved reasoning ability. As computation increases, however, the performance gains become marginal and the approach hits a bottleneck. The authors call this problem "Tunnel Vision": imperfect reasoning in the early steps locks the model into a suboptimal path. Their proposed paradigm, native thought parallelism, generates several distinct reasoning paths at once and synthesizes them into a final answer.

ParaThinker is an end-to-end framework that implements this thought parallelism. The model is trained to generate diverse thoughts independently so that it avoids Tunnel Vision and makes fuller use of its latent reasoning ability. ParaThinker relies on three main innovations. First, trainable control tokens ensure that each path is distinct. Second, thought-specific positional embeddings mark which path each token comes from. Third, a supervised fine-tuning strategy enables the model to generate more parallel paths.
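The width-versus-depth idea can be illustrated with a toy sketch. It does not reproduce ParaThinker: the paper trains the model to synthesize its paths natively, whereas this example swaps in plain majority voting as the merge step, and `toy_solver`, `parallel_think`, and `aggregate` are hypothetical names.

```python
from collections import Counter
from typing import Callable, List


def aggregate(answers: List[str]) -> str:
    """Majority vote over candidate answers (a swapped-in stand-in for learned synthesis)."""
    return Counter(answers).most_common(1)[0][0]


def parallel_think(solve_path: Callable[[str, int], str], problem: str, width: int) -> str:
    """Run `width` independent reasoning paths and merge their final answers."""
    answers = [solve_path(problem, path_id) for path_id in range(width)]
    return aggregate(answers)


def toy_solver(problem: str, path_id: int) -> str:
    """Hypothetical solver: path 0 takes a bad first step and ends on a wrong answer."""
    return "5" if path_id == 0 else "4"


narrow = parallel_think(toy_solver, "2 + 2", width=1)  # the only path is the flawed one
wide = parallel_think(toy_solver, "2 + 2", width=4)    # three of four paths agree
print(narrow, wide)  # 5 4
```

With a single path, the early mistake cannot be undone however long that path runs, which is the Tunnel Vision failure in miniature. Adding width gives the merge step other candidates to choose from.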

On challenging reasoning benchmarks, the approach improved accuracy over existing autoregressive reasoning models by 12.3% on average for the 1.5B model and 7.5% on average for the 7B model (with 8 parallel paths), while adding only 7.1% latency. ParaThinker therefore suggests that smaller models can surpass much larger ones, and it points to a new direction for scaling LLMs. The findings clarify where the bottleneck in LLM reasoning arises and indicate that native thought parallelism is the more effective scaling method.

Paper abstract

Recent advances in Large Language Models (LLMs) have been driven by test-time compute scaling - a strategy that improves reasoning by generating longer, sequential thought processes. While effective, this approach encounters a significant bottleneck as computation increases, where further computation offers only marginal performance gains. We argue this ceiling is not an inherent limit of the model's capability but a flaw in the scaling strategy itself, a phenomenon we term "Tunnel Vision", where a model's imperfect initial steps lock it into a suboptimal reasoning path. To overcome this, we introduce a new scaling paradigm: native thought parallelism. We present ParaThinker, an end-to-end framework that trains an LLM to generate multiple, diverse reasoning paths in parallel and synthesize them into a superior final answer. By exploring different lines of thoughts simultaneously, ParaThinker effectively sidesteps the Tunnel Vision issue and unlocks the model's latent reasoning potential. Our approach demonstrates that scaling compute in parallel (width) is a more effective and efficient way to superior reasoning than simply scaling sequentially (depth). On challenging reasoning benchmarks, ParaThinker achieves substantial accuracy improvements over sequential LLMs (12.3% for 1.5B and 7.5% for 7B models on average with 8 parallel paths), while adding only negligible latency overhead (7.1%). This enables smaller models to surpass much larger counterparts and establishes parallel thinking as a critical, efficient dimension for scaling future LLMs.

Paper link

https://arxiv.org/abs/2509.04475

In-Context Fine-Tuning for Time-Series Foundation Models

Paper introduction

Forecasting time-series data is an important task in many fields, and recent progress in time-series foundation models has opened new possibilities for it. This work proposes in-context fine-tuning, a methodology for getting more out of these models. It is designed so that a pretrained foundation model can use multiple time-series examples to forecast the future of a specific target time series.

The model is trained to use not only the history of the target time series but also related time-series examples in its context window, so that at inference time it can adapt to the specific distribution of the target domain. This lets the model pick up a range of time-series patterns and forecast more accurately from them. In experiments, the model performed much better than supervised deep learning methods, statistical models, and other existing time-series foundation models.

Notably, the in-context fine-tuning approach was competitive with models explicitly fine-tuned on the target domain. The architecture is based on TimesFM and is built to process time-series data effectively. Input examples are split into patches of length $p$, and a padding mask marks padded positions so that they do not distort the forecast.

During tokenization, each patch is combined with its mask to form a token, and the tokens are fed into stacked transformer layers that produce the forecast. This pipeline lets the model process the input and forecast the next $h$ steps of the time series. The work presents a new approach to time-series forecasting and shows empirically that in-context fine-tuning can outperform existing methods.
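The patching step described above can be sketched in a few lines. This is an illustration only: the function name `to_patches`, the choice to pad on the left with zeros, and the convention that a mask value of 1 means "padded" are all assumptions, not details confirmed by the paper.

```python
from typing import List, Sequence, Tuple


def to_patches(series: Sequence[float], p: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Split a series into patches of length p, with a mask that flags padded positions.

    Assumed conventions: zeros are added on the left so the most recent values
    stay aligned to the end of the last patch, and mask == 1 marks padding.
    """
    pad = (-len(series)) % p  # how many values are missing to fill the first patch
    padded = [0.0] * pad + [float(x) for x in series]
    mask = [1] * pad + [0] * len(series)
    patches = [padded[i:i + p] for i in range(0, len(padded), p)]
    masks = [mask[i:i + p] for i in range(0, len(mask), p)]
    return patches, masks


patches, masks = to_patches([1, 2, 3, 4, 5], p=4)
print(patches)  # [[0.0, 0.0, 0.0, 1.0], [2.0, 3.0, 4.0, 5.0]]
print(masks)    # [[1, 1, 1, 0], [0, 0, 0, 0]]
```

Each (patch, mask) pair would then be turned into one input token. Carrying the mask alongside the values is what lets a model tell a real zero from filler.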

Paper abstract

> Motivated by the recent success of time-series foundation models for zero-shot forecasting, we present a methodology for in-context fine-tuning of a time-series foundation model. In particular, we design a pretrained foundation model that can be prompted (at inference time) with multiple time-series examples, in order to forecast a target time-series into the future. Our foundation model is specifically trained to utilize examples from multiple related time-series in its context window (in addition to the history of the target time-series) to help it adapt to the specific distribution of the target domain at inference time. We show that such a foundation model that uses in-context examples at inference time can obtain much better performance on popular forecasting benchmarks compared to supervised deep learning methods, statistical models, as well as other time-series foundation models. Interestingly, our in-context fine-tuning approach even rivals the performance of a foundation model that is explicitly fine-tuned on the target domain.

Paper link

https://arxiv.org/abs/2410.24087

Further reading

https://research.google/blog/…

https://icml.cc/virtual/2025/poster/43707

1 bit is all we need: binary normalized neural networks

Paper introduction

Large neural network models perform very well across many application areas, but their growing size creates challenges for memory requirements and computational efficiency. To address this, the study proposes a new type of neural network layer, the binary normalized layer, in which every parameter in every layer is restricted to a single bit. All parameters, including kernel weights and biases, take the value 0 or 1, which cuts memory use dramatically while keeping performance close to that of models with conventional 32-bit floating-point parameters.

The binary normalized layer can be applied to fully connected, convolutional, attention, and other architectures. To keep training stable, each parameter is held in two forms during training: a full-precision 32-bit value and its binarized value. The study builds models with binary normalized layers for two problems, multiclass image classification and language decoding. Experiments show that these models perform almost the same as models with conventional 32-bit parameters while using 32 times less memory.
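A rough sketch of what a forward pass through such a layer might look like follows. It is not the paper's implementation: the binarization rule (1 where a value exceeds the mean of its tensor), the zero-mean unit-variance normalization of the outputs, and every function name are assumptions made for illustration. Only inference is shown, so the full-precision copies used during training do not appear.

```python
import math
from typing import List


def binarize(values: List[float]) -> List[int]:
    """Assumed rule: a parameter becomes 1 if it exceeds the mean of its tensor, else 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def normalize(values: List[float], eps: float = 1e-6) -> List[float]:
    """Rescale the layer outputs to zero mean and unit variance (assumed normalization)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def binary_dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected forward pass using only 1-bit weights and biases."""
    flat_bits = binarize([w for row in weights for w in row])  # one mean for the whole kernel
    n = len(weights[0])
    w_bits = [flat_bits[i * n:(i + 1) * n] for i in range(len(weights))]
    b_bits = binarize(bias)
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w_bits, b_bits)]
    return normalize(z)


weights = [[0.2, -0.5, 0.9], [-0.1, 0.4, -0.3]]  # full-precision values kept only for training
bias = [0.3, -0.3]
out = binary_dense([1.0, 2.0, 3.0], weights, bias)
print([round(v, 3) for v in out])  # [1.0, -1.0]
print(32 // 1)  # bits per float32 parameter vs. bits per binary parameter: 32x smaller
```

The last line is the arithmetic behind the 32x memory figure: a float32 parameter takes 32 bits and a binary one takes 1.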

This approach could substantially improve the efficiency of large neural network models, and it has the added advantage of being easy to run on low-cost hardware. By reducing memory requirements, binary normalized layers open up new possibilities for practical use in a range of applications. Future work is expected to improve the performance of binary normalized layers further and to explore how to apply them in other domains.

Paper abstract

> The growing size of large neural network models, particularly language models and foundational image models, creates deployment challenges and has prompted efforts to reduce memory requirements and improve computational efficiency. These efforts are important for ensuring practical deployment and effective use of these models across applications. This work develops a new type of neural network layers and models that use only single-bit parameters. In these models, all parameters of all layers, including kernel weights and biases, take only the values zero or one. The models use layers called binary normalized layers. These layers can be of any type, such as fully connected, convolutional, or attention, and they are slight variations of the corresponding conventional layers. To show the effectiveness of binary normalized layers, two different models were configured to solve a multiclass image classification problem, along with a language decoder that predicts the next token of a sequence. The image classification model has convolutional and fully connected layers, and the language model is built from transformer blocks with multi-head attention. The results show that models with binary normalized layers give almost the same results as equivalent models with real 32-bit parameters. Binary normalized layers make it possible to develop models that use 32 times less memory than current models while performing equivalently. In addition, binary normalized layers can be implemented easily on current computers using 1-bit arrays and do not require dedicated electronic hardware. This new type of layer opens a new era for large neural network models with reduced memory requirements that can be deployed on simple, inexpensive hardware such as mobile devices or CPUs alone.

Paper link

https://arxiv.org/abs/2509.07025

Internalizing Self-Consistency in Language Models: Multi-Agent Consensus Alignment

Paper introduction

Language models (LMs) tend to generate contradictory responses to the same prompt, which reveals a lack of consistency in their reasoning. Existing inference-time methods can reduce these inconsistencies, but they do not solve the underlying problem: the model struggles to select reasoning paths that lead to consistent outcomes. To address this, the study formalizes self-consistency as an intrinsic property of a well-aligned reasoning model and presents a reinforcement learning framework called Multi-Agent Consensus Alignment (MACA). MACA uses majority/minority outcomes to post-train the model to favor reasoning paths that agree with its internal consensus. These paths emerge from deliberative discussion among agents, and because the reasoning is grounded in peer arguments rather than in a set of independent attempts, they yield richer consensus signals. MACA lets agents teach themselves, without external supervision, to be more decisive and concise, and it brings significant improvements across settings such as self-consistency, single-agent reasoning, sampling-based reasoning, and multi-agent collective decision-making. Together with strong generalization to unseen benchmarks, these results point to robust self-alignment that more reliably brings out the latent reasoning ability of language models.

Paper abstract

> Language Models (LMs) are inconsistent reasoners, often generating contradictory responses to identical prompts. While inference-time methods can mitigate these inconsistencies, they fail to address the core problem: LMs struggle to reliably select reasoning pathways leading to consistent outcomes under exploratory sampling. To address this, we formalize self-consistency as an intrinsic property of well-aligned reasoning models and introduce Multi-Agent Consensus Alignment (MACA), a reinforcement learning framework that post-trains models to favor reasoning trajectories aligned with their internal consensus using majority/minority outcomes from multi-agent debate. These trajectories emerge from deliberative exchanges where agents ground reasoning in peer arguments, not just aggregation of independent attempts, creating richer consensus signals than single-round majority voting. MACA enables agents to teach themselves to be more decisive and concise, and better leverage peer insights in multi-agent settings without external supervision, driving substantial improvements across self-consistency (+27.6% on GSM8K), single-agent reasoning (+23.7% on MATH), sampling-based inference (+22.4% Pass@20 on MATH), and multi-agent ensemble decision-making (+42.7% on MathQA). These findings, coupled with strong generalization to unseen benchmarks (+16.3% on GPQA, +11.6% on CommonsenseQA), demonstrate robust self-alignment that more reliably unlocks latent reasoning potential of language models.
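The majority/minority signal described above can be illustrated with a short sketch. It takes the final answers that several agents reach after a debate, finds the consensus answer, measures the agreement rate, and pairs each majority trajectory with each minority trajectory. How MACA actually turns these outcomes into a training objective is not detailed in this digest, so the pairing step and all function names here are assumptions for illustration only.

```python
from collections import Counter


def consensus_split(final_answers):
    """Split agents by whether their final answer matches the majority.

    Returns (majority_answer, majority_ids, minority_ids, agreement_rate).
    Ties are broken in favor of the answer that appears first.
    """
    counts = Counter(final_answers)
    majority_answer, majority_votes = counts.most_common(1)[0]
    majority_ids = [i for i, a in enumerate(final_answers) if a == majority_answer]
    minority_ids = [i for i, a in enumerate(final_answers) if a != majority_answer]
    agreement_rate = majority_votes / len(final_answers)
    return majority_answer, majority_ids, minority_ids, agreement_rate


def preference_pairs(trajectories, final_answers):
    """Build (preferred, rejected) pairs: majority trajectory vs. minority trajectory."""
    _, majority_ids, minority_ids, _ = consensus_split(final_answers)
    return [(trajectories[i], trajectories[j]) for i in majority_ids for j in minority_ids]


if __name__ == "__main__":
    trajectories = ["reasoning A", "reasoning B", "reasoning C"]
    answers = ["42", "42", "41"]
    print(consensus_split(answers))
    print(preference_pairs(trajectories, answers))
```

No ground-truth label appears anywhere in the sketch, which matches the claim that the signal comes from the agents' own consensus rather than from external supervision.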

Paper link

https://arxiv.org/abs/2509.15172

Universal Deep Research: Bring Your Own Model and Strategy

Paper introduction

Universal Deep Research (UDR) is a generalist agent system developed to overcome a limitation of existing deep research tools: each is hard-coded to execute one particular research strategy in a fixed way. UDR lets users create, edit, and refine their own custom deep research strategies, and notably this requires no additional training or finetuning. The system demonstrates its generality through example strategies ranging from minimal to expansive and intensive.

A core part of UDR is a user interface that makes experimentation easy, so researchers can freely explore their own research strategies. This approach gives researchers room to develop original methodologies without depending on existing tools. In particular, UDR wraps around a variety of language models, so users can pick the model they prefer.

The work contributes to the development of deep research tools and focuses on enabling researchers to build more creative and personalized research strategies. UDR is expected to help improve both the efficiency and the effectiveness of research, and in that sense it could become an innovative system that opens new possibilities for deep research.

Paper abstract

> Deep research tools are among the most impactful and most commonly encountered agentic systems today. We observe, however, that each deep research agent introduced so far is hard-coded to carry out a particular research strategy using a fixed choice of tools. We introduce Universal Deep Research (UDR), a generalist agentic system that wraps around any language model and enables the user to create, edit, and refine their own entirely custom deep research strategies without any need for additional training or finetuning. To showcase the generality of our system, we equip UDR with example minimal, expansive, and intensive research strategies, and provide a user interface to facilitate experimentation with the system.
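The "bring your own model and strategy" idea can be sketched as two pluggable pieces: a model that is any text-in, text-out callable, and a strategy that is an editable list of steps. This is not UDR's actual interface, which the abstract does not describe; every name below (`run_strategy`, `plan`, `report`, `MINIMAL_STRATEGY`, `echo_model`) is hypothetical, and the sketch omits search tools entirely.

```python
from typing import Callable, Dict, List

# Assumption: a "model" is any function from a prompt string to a reply string.
Model = Callable[[str], str]
# Assumption: a "step" reads and updates a shared state dictionary.
Step = Callable[[Model, Dict], None]


def run_strategy(model: Model, steps: List[Step], question: str) -> Dict:
    """Run a user-defined research strategy on top of an arbitrary model."""
    state = {"question": question, "notes": []}
    for step in steps:
        step(model, state)
    return state


def plan(model: Model, state: Dict) -> None:
    """Ask the model for sub-questions and store them as notes."""
    state["notes"].append(model("List sub-questions for: " + state["question"]))


def report(model: Model, state: Dict) -> None:
    """Ask the model to turn the collected notes into a report."""
    state["report"] = model("Write a report from notes: " + " | ".join(state["notes"]))


# A strategy is plain data, so users can edit it without retraining anything.
MINIMAL_STRATEGY: List[Step] = [plan, report]


def echo_model(prompt: str) -> str:
    """Stand-in model for demonstration; swap in any real model wrapper."""
    return "[model output for] " + prompt


if __name__ == "__main__":
    result = run_strategy(echo_model, MINIMAL_STRATEGY, "How do binary layers save memory?")
    print(result["report"])
```

Because the strategy is just a list, a more expansive or intensive variant would be a longer list of steps, with no change to the model or to `run_strategy`.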

Paper link

https://arxiv.org/abs/2509.00244

AlphaAgents: Large Language Model based Multi-Agents for Equity Portfolio Constructions

Paper introduction

Advances in Large Language Models (LLMs) have made AI agents markedly more efficient and adaptable, opening up multi-agent collaboration as a way to tackle complex problems. This study uses such a multi-agent system to explore a role-based approach to stock selection and portfolio management. Its main goal is to assess how multiple AI agents working together improve stock selection performance, and to compare that performance against existing benchmarks.

The multi-agent system is composed of agents with distinct specializations, such as fundamental analysis, sentiment analysis, and valuation, which debate with one another to build an optimal portfolio. The study randomly selects 15 technology stocks, evaluates performance through backtesting, and analyzes portfolio effectiveness in terms of risk-adjusted return and the Sharpe ratio. This methodology suggests that collaborative multi-agent decision-making can yield better investment strategies.
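Since the evaluation relies on the Sharpe ratio, a short worked example may help. The sketch below uses the standard definition (mean excess return divided by the standard deviation of excess returns, scaled by the square root of the number of periods per year). The return series, the per-period risk-free rate, and the 252-trading-day convention are illustrative assumptions, not values from the paper.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns.

    `risk_free_rate` is expressed per period, in the same units as `returns`.
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.fmean(excess)
    stdev_excess = statistics.stdev(excess)  # sample standard deviation
    if stdev_excess == 0:
        raise ValueError("Sharpe ratio is undefined for zero-volatility returns")
    return mean_excess / stdev_excess * periods_per_year ** 0.5


if __name__ == "__main__":
    # Hypothetical daily portfolio returns from a backtest.
    daily_returns = [0.004, -0.002, 0.007, 0.001, -0.003, 0.005]
    print(round(sharpe_ratio(daily_returns), 2))
```

A higher value means more return per unit of volatility, which is why the ratio is a common way to compare a constructed portfolio against a benchmark on a risk-adjusted basis.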

The study analyzes the advantages and limitations of multi-agent systems and proposes ways to improve decision-making by integrating the diverse perspectives that AI agents provide. Implementing such a system still involves challenges, such as verifying logical consistency through human review. The results suggest that a multi-agent system can offer an innovative approach to equity portfolio construction, and future work will explore adjusting stock weights according to the reliability of the LLM.

Research of this kind contributes to AI-based investment strategy development and highlights the potential of multi-agent systems.

Paper abstract

> The field of artificial intelligence (AI) agents is evolving rapidly, driven by the capabilities of Large Language Models (LLMs) to autonomously perform and refine tasks with human-like efficiency and adaptability. In this context, multi-agent collaboration has emerged as a promising approach, enabling multiple AI agents to work together to solve complex challenges. This study investigates the application of role-based multi-agent systems to support stock selection in equity research and portfolio management. We present a comprehensive analysis performed by a team of specialized agents and evaluate their stock-picking performance against established benchmarks under varying levels of risk tolerance. Furthermore, we examine the advantages and limitations of employing multi-agent frameworks in equity analysis, offering critical insights into their practical efficacy and implementation challenges.

Paper link

https://arxiv.org/abs/2508.11152

A Survey of Reinforcement Learning for Large Reasoning Models

Paper introduction

Reinforcement Learning (RL) plays an important role in improving the reasoning ability of Large Language Models (LLMs), and this paper surveys the progress toward Large Reasoning Models (LRMs) driven by RL. RL has shown remarkable performance on complex logical tasks such as mathematical problem solving and coding, and it has become established as a foundational methodology for turning LLMs into LRMs. Scaling RL for LRMs, however, faces challenges in computational resources, algorithm design, training data, and infrastructure.

The study reviews research on using RL to strengthen the reasoning of LLMs and LRMs, and in particular analyzes the core components of RL (reward design, policy optimization, and sampling strategy) in light of recent progress, including the DeepSeek-R1 model. Reward design provides the key signal that determines the direction of learning, and the paper emphasizes the importance of verifiable reward mechanisms. Policy optimization is the process of training the model to choose optimal actions, and it covers both critic-based and critic-free algorithms. Sampling strategy is treated as a way to make RL more efficient, with discussion of dynamic sampling and hyperparameter tuning.
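Two of the components named above, a verifiable reward and a critic-free policy signal, can be sketched in a few lines. The reward below is a simple exact-match check on the final line of a response, and the advantage is computed relative to the other responses sampled for the same prompt, in the spirit of group-relative methods such as GRPO. Both choices are illustrative assumptions; the survey covers many reward designs and algorithms, and the function names here are hypothetical.

```python
import statistics


def verifiable_reward(response, reference):
    """Return 1.0 if the last line of `response` equals `reference`, else 0.0.

    Assumption: the final answer is written on the last line of the response.
    """
    lines = response.strip().splitlines()
    final_answer = lines[-1].strip() if lines else ""
    return 1.0 if final_answer == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Critic-free advantages: normalize each reward against its sampling group.

    No learned value function is used; the group mean acts as the baseline.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "42"
    # Hypothetical responses sampled from the policy for one prompt.
    responses = ["6 * 7 = 42\n42", "6 * 7 = 41\n41", "I think it is 40\n40", "42"]
    rewards = [verifiable_reward(r, reference) for r in responses]
    print(rewards)
    print(group_relative_advantages(rewards))
```

When every sampled response earns the same reward, all advantages are zero and the prompt contributes no gradient, which is one reason sampling strategy matters for training efficiency.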

The paper emphasizes the integrated process of training LLMs with RL and the importance of the quality and structure of training resources, and it illustrates practical utility through applications of RL to software engineering and robotics tasks. In particular, combining RL with the agent paradigm is driving progress in code generation and is also producing successful results on multimodal tasks. This line of research suggests new directions for improving LLM reasoning and is expected to help lay the groundwork for eventually achieving Artificial SuperIntelligence (ASI).

Paper abstract

> In this paper, we survey recent advances in Reinforcement Learning (RL) for reasoning with Large Language Models (LLMs). RL has achieved remarkable success in advancing the frontier of LLM capabilities, particularly in addressing complex logical tasks such as mathematics and coding. As a result, RL has emerged as a foundational methodology for transforming LLMs into LRMs. With the rapid progress of the field, further scaling of RL for LRMs now faces foundational challenges not only in computational resources but also in algorithm design, training data, and infrastructure. To this end, it is timely to revisit the development of this domain, reassess its trajectory, and explore strategies to enhance the scalability of RL toward Artificial SuperIntelligence (ASI). In particular, we examine research applying RL to LLMs and LRMs for reasoning abilities, especially since the release of DeepSeek-R1, including foundational components, core problems, training resources, and downstream applications, to identify future opportunities and directions for this rapidly evolving area. We hope this review will promote future research on RL for broader reasoning models. Github: https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs

Paper link

https://arxiv.org/abs/2509.08827
