[2024/03/11 ~ 03/17] Top ML Papers of the Week

(discuss.pytorch.kr)

4 points by ninebow 2024-03-19 | 6 comments

[2024/03/11 ~ 03/17] Top ML Papers of the Week

This post is an automatic translation of the article on ML papers that DAIR.AI publishes every week.
This week, papers built around Large Language Models (LLMs) stood out as the main trend. Many of the papers focus on LLMs and try to solve, or better understand, a range of problems. For example, "SIMA", "Retrieval Augmented Thoughts", "LMs Can Teach Themselves to Think Before Speaking", "Knowledge Conflicts for LLMs" and "LLMs Predict Neuroscience Results" either use large language models or address issues with their performance. Meanwhile, "Stealing Part of a Production Language Model" shows that language models are also being studied from a security perspective.
This trend reflects the change and impact that large language models have brought to the AI research community in recent years. Large language models are no longer limited to Natural Language Processing (NLP); they have established themselves as effective foundation models across many domains. LLMs perform well on many language understanding and generation tasks, and they are also being explored widely in application research. In addition, "Multimodal LLM Pre-training" shows that current research is combining LLMs with images, speech and other kinds of data to strengthen multimodal learning.
Based on this analysis, research on LLMs can be expected to keep improving natural language understanding, to spread into new application areas, and to play an important role in the development of AI. Work is likely to continue not only on improving LLM performance but also on a broad set of topics spanning application research, security, and ethical issues.

SIMA

Paper introduction

A generalist AI agent for 3D virtual environments that follows natural-language instructions in a broad range of 3D virtual environments and video games; SIMA is evaluated across 600 basic skills, spanning navigation, object interaction, and menu use. Language seems to be a huge factor in performance.

Abstract

Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.

Paper link

https://storage.googleapis.com/deepmind-media/DeepMind.com/…

Further reading

https://discuss.pytorch.kr/t/gn-google-sima-3d-ai/3764

https://x.com/GoogleDeepMind/status/1767918515585994818

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

Paper introduction

Shows that iteratively revising a chain of thoughts with information retrieval can significantly improve LLM reasoning and generation in long-horizon generation tasks; the key idea is that each thought step is revised with retrieved information relevant to the task query and to the current and past thought steps; Retrieval Augmented Thoughts (RAT) can be applied to different models such as GPT-4 and CodeLLaMA-7b to improve long-horizon generation tasks (e.g., creative writing and embodied task planning); RAT is a zero-shot prompting approach and provides significant improvements over baselines that include zero-shot CoT prompting and vanilla RAG.

Abstract

We explore how iteratively revising a chain of thoughts with the help of information retrieval significantly improves large language models' reasoning and generation ability in long-horizon generation tasks, while hugely mitigating hallucination. In particular, the proposed method, retrieval-augmented thoughts (RAT), revises each thought step one by one with retrieved information relevant to the task query and to the current and past thought steps, after the initial zero-shot CoT is generated. Applying RAT to GPT-3.5, GPT-4, and CodeLLaMA-7b substantially improves their performance on various long-horizon generation tasks; on average, rating scores increase in relative terms by 13.63% on code generation, 16.96% on mathematical reasoning, 19.2% on creative writing, and 42.78% on embodied task planning. The demo page can be found at https://craftjarvis.github.io/RAT
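The revise-one-step-at-a-time loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `rat_revise`, `toy_retrieve`, and `toy_revise` are hypothetical names, and the two toy functions stand in for a real retriever and a real LLM call.

```python
from typing import Callable, List


def rat_revise(
    task: str,
    draft_thoughts: List[str],
    retrieve: Callable[[str], List[str]],
    revise: Callable[[str, str, List[str]], str],
) -> List[str]:
    """Revise each draft thought in order, using retrieved context."""
    revised: List[str] = []
    for thought in draft_thoughts:
        # The query combines the task, the already-revised thoughts,
        # and the thought currently being revised.
        query = "\n".join([task, *revised, thought])
        docs = retrieve(query)
        revised.append(revise(query, thought, docs))
    return revised


def toy_retrieve(query: str) -> List[str]:
    # Hypothetical stand-in for a search engine or vector store.
    facts = {"boil": "Water boils at 100 C at sea level."}
    return [text for key, text in facts.items() if key in query.lower()]


def toy_revise(query: str, thought: str, docs: List[str]) -> str:
    # Hypothetical stand-in for an LLM call that rewrites the thought.
    return thought if not docs else f"{thought} [checked against: {docs[0]}]"


steps = rat_revise(
    "How do I cook pasta?",
    ["Boil water.", "Add pasta."],  # assumed output of an initial zero-shot CoT
    toy_retrieve,
    toy_revise,
)
```

Because earlier revised thoughts are folded into later queries, a correction made at one step can influence what is retrieved for the steps after it.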

Paper link

https://arxiv.org/abs/2403.05313

Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking

Paper introduction

Presents a generalization of STaR, called Quiet-STaR, to enable language models (LMs) to learn to reason in more general and scalable ways; Quiet-STaR enables LMs to generate rationales at each token to explain future text; it proposes a token-wise parallel sampling algorithm that helps improve LM predictions by efficiently generating internal thoughts; the rationale generation is improved using REINFORCE.

Abstract

When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting; ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9% $\rightarrow$ 10.9%) and CommonsenseQA (36.3% $\rightarrow$ 47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
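The summary above says only that rationale generation is improved with REINFORCE. As a rough illustration of what that can look like, the sketch below scores each sampled thought by how well the future text is predicted after it, subtracts the mean over thoughts as a baseline, and forms the usual REINFORCE surrogate loss. The mean baseline, the function names, and the numbers are assumptions made for illustration; they are not taken from the post.

```python
from typing import List


def thought_advantages(future_logprobs: List[float]) -> List[float]:
    """Reward of each thought relative to the mean over sampled thoughts.

    future_logprobs[i] is the log-likelihood of the true future tokens
    when the model conditions on sampled thought i.
    """
    baseline = sum(future_logprobs) / len(future_logprobs)
    return [lp - baseline for lp in future_logprobs]


def reinforce_loss(thought_logprobs: List[float], advantages: List[float]) -> float:
    """REINFORCE surrogate: mean of -(advantage * log-prob of the thought)."""
    terms = [-(adv * lp) for lp, adv in zip(thought_logprobs, advantages)]
    return sum(terms) / len(terms)


# Three sampled thoughts for one token position (made-up numbers).
future_lp = [-2.0, -1.0, -3.0]   # how well each thought explains the future text
thought_lp = [-5.0, -4.0, -6.0]  # log-prob of generating each thought
adv = thought_advantages(future_lp)
loss = reinforce_loss(thought_lp, adv)
```

Minimizing this loss raises the probability of thoughts that made the future text more predictable than average and lowers it for the others.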

Paper link

https://arxiv.org/abs/2403.09629

Further reading

https://x.com/omarsar0/status/1768681638009975088

Knowledge Conflicts for LLMs: A Survey

Paper introduction

An overview of the common issue of knowledge conflict when working with LLMs; the survey paper categorizes these conflicts into context-memory, inter-context, and intra-memory conflict; it also provides insights into the causes of these knowledge conflicts and potential ways to mitigate them.

Abstract

This survey provides an in-depth analysis of knowledge conflicts for large language models (LLMs), highlighting the complex challenges they encounter when blending contextual and parametric knowledge. Our focus is on three categories of knowledge conflicts: context-memory, inter-context, and intra-memory conflict. These conflicts can significantly impact the trustworthiness and performance of LLMs, especially in real-world applications where noise and misinformation are common. By categorizing these conflicts, exploring the causes, examining the behaviors of LLMs under such conflicts, and reviewing available solutions, this survey aims to shed light on strategies for improving the robustness of LLMs, thereby serving as a valuable resource for advancing research in this evolving area.

Paper link

https://arxiv.org/abs/2403.08319

Stealing Part of a Production Language Model

Paper introduction

Presents the first model-stealing attack that extracts information from production language models like ChatGPT or PaLM-2; shows that it's possible to recover the embedding projection layer of a transformer-based model through typical API access; as an example, the entire projection matrix was extracted from the OpenAI Ada and Babbage models for under $20.

Abstract

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix. We conclude with potential defenses and mitigations, and discuss the implications of possible future work that could extend our attack.
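One way to see why the hidden dimension can leak at all: the final logits are a linear projection of a hidden state, so a stack of logit vectors from many queries has rank at most equal to the hidden dimension. The toy sketch below assumes, for simplicity, that full logit vectors are observable, which a production API typically does not offer directly, and it uses Gaussian elimination as a stand-in for a singular-value analysis. The sizes and names are made up for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def numerical_rank(m: Matrix, tol: float = 1e-8) -> int:
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in m]
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


rng = random.Random(0)
vocab, hidden, n_queries = 40, 6, 20  # toy sizes, chosen so n_queries > hidden

# Secret projection matrix (vocab x hidden) and one hidden state per query.
W = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(vocab)]
H = [[rng.gauss(0, 1) for _ in range(n_queries)] for _ in range(hidden)]

logits = matmul(W, H)  # vocab x n_queries: all an observer would see
estimated_hidden = numerical_rank(logits)
```

With more queries than hidden units, the rank of the observed logit matrix stops growing at the hidden dimension, which is the quantity the abstract reports recovering for Ada and Babbage.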

Paper link

https://arxiv.org/abs/2403.06634

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Paper introduction

Proposes mixing expert LLMs into a mixture-of-experts LLM as a more compute-efficient approach for training LLMs; it's shown to be more efficient than training a larger generalist LLM or several separate specialized LLMs; the approach, BTX, first trains (in parallel) multiple copies of a seed LLM specialized in different domains (i.e., expert LLMs) and merges them into a single LLM using MoE feed-forward layers, followed by fine-tuning of the overall unified model.

Abstract

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts are asynchronously trained, BTX brings together their feedforward parameters as experts in Mixture-of-Expert (MoE) layers and averages the remaining parameters, followed by an MoE-finetuning stage to learn token-level routing. BTX generalizes two special cases, the Branch-Train-Merge method, which does not have the MoE finetuning stage to learn routing, and sparse upcycling, which omits the stage of training experts asynchronously. Compared to alternative approaches, BTX achieves the best accuracy-efficiency tradeoff.
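The merge step in the abstract (feed-forward parameters kept side by side as experts, everything else averaged) can be sketched on toy parameter dictionaries. The parameter names, the `ffn.` prefix convention, and `btx_merge` itself are assumptions for illustration; the router and the MoE fine-tuning stage that follows are not shown.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def btx_merge(experts: List[Params], ffn_prefix: str = "ffn.") -> Dict[str, object]:
    """Combine separately trained copies of one seed model.

    Feed-forward weights are kept as a list (one entry per expert);
    all remaining weights are averaged element-wise.
    """
    merged: Dict[str, object] = {}
    for name in experts[0]:
        tensors = [expert[name] for expert in experts]
        if name.startswith(ffn_prefix):
            merged[name] = tensors  # one feed-forward block per expert
        else:
            merged[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return merged


# Two toy "experts" branched from the same seed (made-up values).
math_expert = {"attn.w": [1.0, 3.0], "ffn.w": [0.1, 0.2]}
code_expert = {"attn.w": [3.0, 5.0], "ffn.w": [0.7, 0.8]}

merged = btx_merge([math_expert, code_expert])
```

Here `merged["attn.w"]` holds the averaged attention weights, while `merged["ffn.w"]` keeps both feed-forward blocks so that a learned router can later choose between them per token.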

Paper link

https://arxiv.org/abs/2403.07816

Further reading

https://x.com/jaseweston/status/1767727740952682667

Large language models surpass human experts in predicting neuroscience results

Paper introduction

Proposes a benchmark, BrainBench, for evaluating the ability of LLMs to predict neuroscience results; finds that LLMs surpass experts in predicting experimental outcomes; an LLM tuned on neuroscience literature was shown to perform even better.

Abstract

Scientific discoveries often hinge on synthesizing decades of research, a task that potentially outstrips human information processing capacities. Large language models (LLMs) offer a solution. LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts. To evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results. We find that LLMs surpass experts in predicting experimental outcomes. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet. Like human experts, when LLMs were confident in their predictions, they were more likely to be correct, which presages a future where humans and LLMs team together to make discoveries. Our approach is not neuroscience-specific and is transferable to other knowledge-intensive endeavors.

Paper link

https://arxiv.org/abs/2403.03230

Further reading

https://x.com/ProfData/status/1765689739682754824

C4AI Command-R

Paper introduction

A 35B-parameter model with a context length of 128k, optimized for use cases that include reasoning, summarization, and question answering; Command-R supports multilingual generation evaluated in 10 languages, along with performant tool use and RAG capabilities; it has been released for research purposes.

Paper link

https://huggingface.co/CohereForAI/c4ai-command-r-v01

Further reading

https://x.com/CohereForAI/status/1767275927505977455

Is Cosine-Similarity of Embeddings Really About Similarity?

Paper introduction

Studies embeddings derived from regularized linear models and derives analytically how cosine-similarity can yield arbitrary and meaningless similarities; also finds that for some linear models the similarities are not even unique, while for others they are controlled by regularization; the authors caution against blindly using cosine similarity and present considerations and alternatives.

Abstract

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. This can work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless 'similarities.' For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
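A small numeric illustration of how cosine similarity can become arbitrary: if a model only ever uses dot products between user and item embeddings, then stretching one embedding dimension on the user side and shrinking it on the item side leaves every prediction unchanged, yet the cosine similarity between two items shifts. The vectors and the scaling factors below are made up for illustration, and the setup is a simplified reading of the linear-model argument rather than the paper's exact derivation.

```python
import math
from typing import List


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v."""
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


def rescale(vec: List[float], scales: List[float]) -> List[float]:
    """Multiply each embedding dimension by its own scale factor."""
    return [x * s for x, s in zip(vec, scales)]


user = [1.0, 2.0]
item_a = [1.0, 1.0]
item_b = [1.0, 0.1]

# Stretch dimension 2 for users and shrink it by the same factor for items.
d = [1.0, 10.0]
d_inv = [1.0 / s for s in d]

user_2 = rescale(user, d)
item_a_2 = rescale(item_a, d_inv)
item_b_2 = rescale(item_b, d_inv)

score_before = dot(user, item_a)     # model prediction before rescaling
score_after = dot(user_2, item_a_2)  # same prediction after rescaling
cos_before = cosine(item_a, item_b)
cos_after = cosine(item_a_2, item_b_2)
```

Both parameterizations fit the data equally well, so which item-item cosine similarity one ends up with depends on choices, such as regularization, that the predictions themselves do not pin down.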

Paper link

https://arxiv.org/abs/2403.05440

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Paper introduction

Provides a comprehensive overview of methods, analysis, and insights into multimodal LLM pre-training; studies different architecture components and finds that carefully mixing image-caption, interleaved image-text, and text-only data is key for state-of-the-art performance; it also proposes a family of multimodal models up to 30B parameters that achieve SOTA in pre-training metrics and include properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.

Abstract

In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting.

Paper link

https://arxiv.org/abs/2403.09611

Original article

https://nlp.elvissaravia.com/p/top-ml-papers-of-the-week-6a6

This post was summarized with the help of a GPT model, so it may contain some errors. Please also refer to the original article. If you notice anything awkward or wrong while reading, please let us know in the comments!

⚠️Advertisement⚠️: Did you find this post from the PyTorch Korea User Group useful? Join as a member and we will email you the top posts! (Weekly by default, but you can switch to Daily.)

6 comments

prelude9903 2024-03-19

Could you tell me which automatic translation tool you used?

ninebow 2024-03-19

Yes, I'm using DeepL hehe
A translation glossary feature was recently added for Korean too, so I tried it out, but it caused problems orz...

libner 2024-03-19

In the paper introduction of the RAT section, it looks like rat and rag were translated as the rodent and a cleaning rag, respectively
Maybe the model read the lowercase words literally

ninebow 2024-03-20

It has been revised as follows. Thank you! :D

Shows that iteratively revising a chain-of-thought (CoT) with information retrieval can significantly improve LLM reasoning and generation in long-form generation tasks. The key idea is that each thought step is revised using retrieved information relevant to the task query and to the current and past thought steps. Retrieval-Augmented Thoughts (RAT) can be applied to different models such as GPT-4 and CodeLlama-7b for long-form generation tasks (e.g., creative writing and detailed task planning); RAT is a zero-shot prompting approach and performs significantly better than baselines including zero-shot CoT prompting and basic RAG.

ninebow 2024-03-19

Oh, you're right; I'll fix the original text, haha
Thanks!

ninebow 2024-03-19

Oops, the title... please change it to 'Top ML Papers of the Week';;
