12] Top ML Papers of the Week

(discuss.pytorch.kr)

3 points by ninebow 2023-11-13

Overview

This is an automatic translation of the article on the ML papers that DAIR.AI publishes every week.
Many of this week's selected papers deal with Transformer models and large language models (LLMs).
Titles such as 'Simplifying Transformer Blocks', 'Understanding In-Context Learning Abilities in Transformers', and 'S-LoRA' suggest a focus on deepening the understanding of the structure and learning mechanisms of Transformer models.
Papers such as 'Hallucination in LLMs', 'On the Road with GPT-4V(ision)', and 'GPT4All' cover the performance and use cases of large language models like GPT, which shows a particular emphasis on the development and application of large language models.

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Paper introduction

A comprehensive survey (50+ pages) on hallucination in LLMs; provides information about principles, taxonomy, challenges, and open questions related to the issue of hallucination in LLMs. #survey-paper #hallucination

Paper abstract

The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, which attracts increasing attention to detect and mitigate these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.

Paper link

https://arxiv.org/abs/2311.05232

Simplifying Transformer Blocks

Paper introduction

Explores simplifying the transformer block and finds that many block components can be removed with no loss of training speed; using different architectures like autoregressive decoder-only and BERT encoder-only models, the simplified blocks emulate per-update training speed and performance of standard transformers, and even achieve 15% faster training throughput with 15% fewer parameters.

Paper abstract

A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable. In this work, we ask to what extent the standard transformer block can be simplified? Combining signal propagation theory and empirical observations, we motivate modifications that allow many block components to be removed with no loss of training speed, including skip connections, projection or value parameters, sequential sub-blocks and normalisation layers. In experiments on both autoregressive decoder-only and BERT encoder-only models, our simplified transformers emulate the per-update training speed and performance of standard transformers, while enjoying 15% faster training throughput, and using 15% fewer parameters.
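To get a feel for the parameter figure, here is a back-of-the-envelope sketch in Python. It rests on assumptions that are not in the abstract: a block with four d x d attention matrices (query, key, value, output projection), an MLP with hidden width 4d, and biases, normalisation and embeddings ignored. The function name `block_params` and its arguments are hypothetical. It counts only what removing the value and projection matrices would save per block; it is not the paper's own accounting.

```python
def block_params(d_model: int, mlp_ratio: int = 4,
                 keep_value: bool = True, keep_proj: bool = True) -> int:
    """Weight-matrix parameters in one transformer block (biases and norms ignored)."""
    attn = 2 * d_model * d_model           # query and key matrices
    if keep_value:
        attn += d_model * d_model          # value matrix
    if keep_proj:
        attn += d_model * d_model          # output projection matrix
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection of the MLP
    return attn + mlp


standard = block_params(768)
simplified = block_params(768, keep_value=False, keep_proj=False)
reduction = 1 - simplified / standard
print(f"per-block reduction: {reduction:.1%}")  # prints "per-block reduction: 16.7%"
```

Under these assumptions the saving is 2 of 12 d x d matrices per block, which is in the same range as the 15% the abstract reports for the models it evaluates.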

Paper link

https://arxiv.org/abs/2311.01906

Read more

https://x.com/maksym_andr/status/1722235666724192688

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

Paper introduction

Investigates how effectively transformers can bridge between pretraining data mixtures to identify and learn new tasks in-context which are both inside and outside the pretraining distribution; in the regimes studied, there is limited evidence that the models' in-context learning behavior is capable of generalizing beyond their pretraining data.

Paper abstract

Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and learn new tasks in-context which are both inside and outside the pretraining distribution. Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of $(x, f(x))$ pairs rather than natural language. Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks. Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.
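The controlled setting described above can be pictured with a small data generator. This is a minimal sketch, not the paper's setup: the two task families ("linear" and "sine"), their parameter ranges, and the function names are illustrative assumptions. It only shows what "a pretraining data mixture made of sequences of $(x, f(x))$ pairs" might look like.

```python
import math
import random


def sample_task(family: str, rng: random.Random):
    """Draw one function f from a task family (the families here are illustrative)."""
    if family == "linear":
        w = rng.gauss(0.0, 1.0)
        return lambda x: w * x
    if family == "sine":
        freq = rng.uniform(0.5, 2.0)
        return lambda x: math.sin(freq * x)
    raise ValueError(f"unknown family: {family}")


def sample_sequence(mixture: dict, n_points: int, rng: random.Random):
    """Pick a family according to the mixture weights, then emit (x, f(x)) pairs."""
    families = list(mixture)
    family = rng.choices(families, weights=[mixture[f] for f in families])[0]
    f = sample_task(family, rng)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    return family, [(x, f(x)) for x in xs]


rng = random.Random(0)
family, pairs = sample_sequence({"linear": 0.5, "sine": 0.5}, n_points=8, rng=rng)
```

A model trained on many such sequences sees only the pairs, not the family label. An out-of-distribution probe in this picture would be a function drawn from a family that has zero weight in `mixture`.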

Paper link

https://arxiv.org/abs/2311.00871

Read more

https://x.com/abacaj/status/1721223737729581437

Simple and Controllable Music Generation

Paper introduction

A single-stage transformer-based LLM that operates over several streams of compressed discrete music representation; it can generate high-quality samples (mono and stereo) while conditioning on textual description or melodic features.

Paper abstract

We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, both mono and stereo, while being conditioned on textual description or melodic features, allowing better controls over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light over the importance of each of the components comprising MusicGen. Music samples, code, and models are available at https://github.com/facebookresearch/audiocraft
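One way to picture a token interleaving pattern over several codebook streams is a per-stream delay: stream k is shifted right by k steps, so a single model step can emit one token per stream. The sketch below illustrates that idea as I understand it, not the audiocraft implementation; the `PAD` value and both function names are hypothetical.

```python
PAD = -1  # hypothetical padding id for positions that hold no token


def delay_interleave(codes: list[list[int]], pad: int = PAD) -> list[list[int]]:
    """Shift codebook stream k right by k steps and pad all rows to equal length."""
    n_streams = len(codes)
    n_steps = len(codes[0])
    total = n_steps + n_streams - 1
    out = []
    for k, stream in enumerate(codes):
        out.append([pad] * k + list(stream) + [pad] * (total - n_steps - k))
    return out


def undo_delay(delayed: list[list[int]]) -> list[list[int]]:
    """Invert delay_interleave by slicing each row back to its own window."""
    n_streams = len(delayed)
    n_steps = len(delayed[0]) - (n_streams - 1)
    return [row[k:k + n_steps] for k, row in enumerate(delayed)]


codes = [[1, 2, 3], [4, 5, 6]]           # 2 streams, 3 time steps
delayed = delay_interleave(codes)        # [[1, 2, 3, -1], [-1, 4, 5, 6]]
assert undo_delay(delayed) == codes
```

With K streams and T time steps the delayed layout needs T + K - 1 model steps, compared with K * T steps if all streams were flattened into one sequence.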

Paper link

https://arxiv.org/abs/2306.05284

Read more

https://x.com/AIatMeta/status/1723043913638810025

Alternating Updates for Efficient Transformers

Paper introduction

A method that makes it possible to take advantage of increasing scale and capacity in transformer models without increasing the computational cost; achieved by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks; it widens the learned representation while only incurring a negligible increase in latency.

Paper abstract

It has been well established that increasing scale in deep transformer networks leads to improved quality and performance. However, this increase in scale often comes with prohibitive increases in compute cost and inference latency. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation, i.e., the token embedding, while only incurring a negligible increase in latency. AltUp achieves this by working on a subblock of the widened representation at each layer and using a predict-and-correct mechanism to update the inactivated blocks. We present extensions of AltUp, such as its applicability to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity. Our experiments on benchmark transformer models and language tasks demonstrate the consistent effectiveness of AltUp on a diverse set of scenarios. Notably, on SuperGLUE and SQuAD benchmarks, AltUp enables up to $87\%$ speedup relative to the dense baselines at the same accuracy.
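The predict-and-correct idea can be sketched in a few lines. The version below is one reading of the abstract, not the authors' code: the widened representation is K sub-blocks of width d, the expensive layer runs on one "active" block, every block is first predicted as a linear mix of the current blocks, and all predictions are then corrected by the error observed on the active block. The scalars `p` and `g`, and the function name, are assumptions for illustration.

```python
def altup_layer(blocks, layer_fn, active, p, g):
    """One predict-compute-correct step over K sub-blocks of a widened representation.

    blocks:   list of K vectors (lists of floats), each of width d
    layer_fn: the ordinary width-d layer, applied to ONE block only
    active:   index of the block the layer is actually run on
    p:        K x K mixing scalars used in the prediction step
    g:        K correction scalars
    """
    k = len(blocks)
    d = len(blocks[0])
    # Predict: guess every block as a cheap linear mix of the current blocks.
    predicted = [
        [sum(p[i][j] * blocks[j][t] for j in range(k)) for t in range(d)]
        for i in range(k)
    ]
    # Compute: run the expensive layer on the active block only.
    computed = layer_fn(blocks[active])
    # Correct: shift every prediction by the error seen on the active block.
    error = [computed[t] - predicted[active][t] for t in range(d)]
    return [[predicted[i][t] + g[i] * error[t] for t in range(d)] for i in range(k)]


blocks = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = altup_layer(blocks, lambda v: [2 * x for x in v],
                      active=0, p=identity, g=[1.0, 0.0])
# updated == [[2.0, 4.0], [3.0, 4.0]]
```

In this sketch the cost of `layer_fn` does not grow with K, because it only ever sees one width-d block; choosing `active = layer_index % K` would alternate which block gets the full computation from layer to layer.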

Paper link

https://arxiv.org/abs/2301.13310

Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves

Paper introduction

An effective prompting method that uses LLMs to rephrase and expand questions posed by humans to improve overall performance; it can improve the performance of different models across a wide range of tasks; the approach can be combined with chain-of-thought to improve performance further.

Paper abstract

Misunderstandings arise not only in interpersonal communication but also between humans and Large Language Models (LLMs). Such discrepancies can make LLMs interpret seemingly unambiguous questions in unexpected ways, yielding incorrect responses. While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped. In this paper, we present a method named Rephrase and Respond (RaR), which allows LLMs to rephrase and expand questions posed by humans and provide responses in a single prompt. This approach serves as a simple yet effective prompting method for improving performance. We also introduce a two-step variant of RaR, where a rephrasing LLM first rephrases the question and then passes the original and rephrased questions together to a different responding LLM. This facilitates the effective utilization of rephrased questions generated by one LLM with another. Our experiments demonstrate that our methods significantly improve the performance of different models across a wide range of tasks. We further provide a comprehensive comparison between RaR and the popular Chain-of-Thought (CoT) methods, both theoretically and empirically. We show that RaR is complementary to CoT and can be combined with CoT to achieve even better performance. Our work not only contributes to enhancing LLM performance efficiently and effectively but also sheds light on a fair evaluation of LLM capabilities. Data and code are available at https://github.com/uclaml/Rephrase-and-Respond.
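The one-step and two-step variants described above can be sketched as plain prompt construction. Everything here is illustrative: `LLM` is a hypothetical callable that maps a prompt string to the model's text, and the instruction wording is an assumption written for this sketch rather than a verified copy of the prompts in the repository.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's text


def rar_one_step(question: str, llm: LLM) -> str:
    """Ask a single model to rephrase, expand and answer in one prompt."""
    prompt = f"{question}\nRephrase and expand the question, and respond."
    return llm(prompt)


def rar_two_step(question: str, rephraser: LLM, responder: LLM) -> str:
    """Let one model rephrase, then give both versions to a second model."""
    rephrase_prompt = (
        f"{question}\n"
        "Given the above question, rephrase and expand it to make it easier to "
        "answer. Keep all information from the original question."
    )
    rephrased = rephraser(rephrase_prompt)
    respond_prompt = (
        f"(original) {question}\n"
        f"(rephrased) {rephrased}\n"
        "Use your answer to the rephrased question to answer the original question."
    )
    return responder(respond_prompt)
```

The two-step form lets a stronger model do the rephrasing for a weaker responder, which matches the abstract's point about reusing one LLM's rephrased questions with another.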

Paper link

https://arxiv.org/abs/2311.04205

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Paper introduction

Provides a comprehensive evaluation of the latest state-of-the-art visual-language model, GPT-4V(ision), and its application in autonomous driving; the model demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems.

Paper abstract

The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLM) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver. Our comprehensive tests span from basic scene recognition to complex causal reasoning and real-time decision-making under varying conditions. Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. It showcases the potential to handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real driving contexts. However, challenges remain, particularly in direction discernment, traffic light recognition, vision grounding, and spatial reasoning tasks. These limitations underscore the need for further research and development. Project is now available on GitHub for interested parties to access and utilize: https://github.com/PJLab-ADG/GPT4V-AD-Exploration

Paper link

https://arxiv.org/abs/2311.05332

GPT4All: An Ecosystem of Open Source Compressed Language Models

Paper introduction

Outlines technical details of the GPT4All model family along with the open-source repository that aims to democratize access to LLMs.

Paper abstract

Large language models (LLMs) have recently achieved human-level performance on a range of professional and academic benchmarks. The accessibility of these models has lagged behind their performance. State-of-the-art LLMs require costly infrastructure; are only accessible via rate-limited, geo-locked, and censored web interfaces; and lack publicly available code and technical reports. In this paper, we tell the story of GPT4All, a popular open source repository that aims to democratize access to LLMs. We outline the technical details of the original GPT4All model family, as well as the evolution of the GPT4All project from a single model into a fully fledged open source ecosystem. It is our hope that this paper acts as both a technical overview of the original GPT4All models as well as a case study on the subsequent growth of the GPT4All open source ecosystem.

Paper link

https://arxiv.org/abs/2311.04931

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Paper introduction

An approach that enables the scalable serving of many LoRA adapters; it stores all adapters in main memory and fetches adapters of currently running queries to the GPU memory; employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation; improves throughput by up to 4x when compared to other solutions, and increases the number of served adapters by several orders of magnitude.

Paper abstract

The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes Unified Paging. Unified Paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude. As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services. The code is available at https://github.com/S-LoRA/S-LoRA
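The Unified Paging idea, one pool serving both adapter weights and KV cache, can be illustrated with a toy allocator. This is a CPU-side sketch under simplifying assumptions, not S-LoRA's implementation: a page is taken to hold one hidden-size vector, so a KV cache entry for a sequence of length S is charged S pages and an adapter of rank R is charged R pages. The class and method names are hypothetical.

```python
class UnifiedPagePool:
    """Toy fixed-size page pool shared by KV cache entries and adapter weights."""

    def __init__(self, num_pages: int):
        self.free_pages = list(range(num_pages))
        self.owned = {}  # owner key -> list of page ids held by that owner

    def allocate(self, owner: str, n_pages: int) -> list:
        """Hand out n_pages page ids; pages need not be contiguous."""
        if n_pages > len(self.free_pages):
            raise MemoryError(f"need {n_pages} pages, only {len(self.free_pages)} free")
        pages = [self.free_pages.pop() for _ in range(n_pages)]
        self.owned.setdefault(owner, []).extend(pages)
        return pages

    def release(self, owner: str) -> None:
        """Return all pages of an owner (finished request or evicted adapter)."""
        self.free_pages.extend(self.owned.pop(owner, []))


pool = UnifiedPagePool(num_pages=16)
pool.allocate("adapter:A", 8)   # assumed cost: rank-8 adapter -> 8 pages
pool.allocate("kv:req1", 5)     # assumed cost: 5 cached tokens -> 5 pages
pool.release("adapter:A")       # evicting the adapter frees space for KV cache
```

Because both kinds of tensor draw same-sized pages from one pool, space freed by an evicted adapter can be reused directly by a growing KV cache, which is how a shared pool limits fragmentation.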

Paper Link

https://arxiv.org/abs/2311.03285v2

FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation

Paper Introduction

Proposes a dynamic QA benchmark (FreshQA) to test the factuality of LLM-generated text; proposes FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA by incorporating relevant and up-to-date information retrieved from a search engine into the prompt; finds that instructing the LLM to generate concise and direct answers helps reduce hallucination compared to encouraging more verbose answers.

Paper Abstract

Most large language models (LLMs) are trained once and never updated; thus, they lack the ability to dynamically adapt to our ever-changing world. In this work, we perform a detailed study of the factuality of LLM-generated text in the context of answering questions that test current world knowledge. Specifically, we introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types, including questions that require fast-changing world knowledge as well as questions with false premises that need to be debunked. We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination. Through human evaluations involving more than 50K judgments, we shed light on limitations of these models and demonstrate significant room for improvement: for instance, all models (regardless of model size) struggle on questions that involve fast-changing knowledge and false premises. Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA by incorporating relevant and up-to-date information retrieved from a search engine into the prompt. Our experiments show that FreshPrompt outperforms both competing search engine-augmented prompting methods such as Self-Ask (Press et al., 2022) and commercial systems such as Perplexity.AI. Further analysis of FreshPrompt reveals that both the number of retrieved evidences and their order play a key role in influencing the correctness of LLM-generated answers. Additionally, instructing the LLM to generate concise and direct answers helps reduce hallucination compared to encouraging more verbose answers. To facilitate future work, we release FreshQA at github.com/freshllms/freshqa and commit to updating it at regular intervals.
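The prompting approach described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' FreshPrompt template: `Evidence` and `build_prompt` are hypothetical names, the evidence is passed in as plain data rather than fetched from any real search API, and the choice to keep the most recent items and list them oldest-first (so the newest sits closest to the question) is an assumption made here to show how the number and order of evidences become tunable knobs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Evidence:
    source: str
    published: date
    snippet: str


def build_prompt(question, evidences, max_evidences=5, demonstrations=()):
    """Assemble a few-shot prompt that places retrieved evidence before the question."""
    # Assumption: keep the most recent items and list them oldest-first,
    # so the newest evidence sits closest to the question.
    newest = sorted(evidences, key=lambda e: e.published, reverse=True)[:max_evidences]
    ordered = sorted(newest, key=lambda e: e.published)

    lines = list(demonstrations)  # few-shot examples, already formatted as text
    for ev in ordered:
        lines.append(
            f"source: {ev.source}\n"
            f"date: {ev.published.isoformat()}\n"
            f"snippet: {ev.snippet}\n"
        )
    lines.append(f"question: {question}")
    # Assumption: wording of the instruction is illustrative only.
    lines.append(
        "Answer concisely and directly. "
        "If the question rests on a false premise, say so."
    )
    lines.append("answer:")
    return "\n".join(lines)


# Placeholder data; none of these sources or snippets are real.
evidences = [
    Evidence("example.org/a", date(2021, 5, 1), "older placeholder snippet"),
    Evidence("example.org/b", date(2023, 9, 1), "newer placeholder snippet"),
    Evidence("example.org/c", date(2022, 1, 15), "middle placeholder snippet"),
]
prompt = build_prompt("placeholder question?", evidences, max_evidences=2)
print(prompt)
```

With `max_evidences=2` the oldest snippet is dropped and the remaining two appear in chronological order ahead of the question, followed by the instruction to answer concisely.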
