[2023/12/11 ~ 12/17] Top ML Papers of the Week

(discuss.pytorch.kr)

2 points by ninebow 2023-12-18

Overview

This post is an automatic translation of the weekly ML paper roundup published by DAIR.AI.
This week's selection is clearly centered on research into large language models (LLMs). The papers examine how LLMs are applied, and how their performance can be improved, across topics such as mathematical discovery, generalization, medical applications, and learning methods that go beyond human data.
This trend reflects the attention that LLM development has received in AI over the past few years. Since the arrival of large language models such as OpenAI's GPT-3, the range of fields where these models can be applied has grown considerably. LLMs are not limited to simple text processing; they can also play an important role in complex problem-solving, and this week's papers suggest that their influence is reaching established fields such as mathematics and medicine. The growing emphasis on transparency and openness also underlines the importance of trust and collaboration in both research and application.
Papers such as 'Weak-to-strong Generalization' and 'Beyond Human Data for LLMs' point to ongoing work on the generalization ability and training methodology of LLMs. They can be read as attempts to move past today's constrained training setups toward more powerful learning mechanisms. LLM research is no longer only about raw performance gains; it is also moving toward substantially improving how well models generalize and how useful they are.

LLMs for Discoveries in Mathematical Sciences

Paper introduction

Uses LLMs to search for new solutions in mathematics and computer science; proposes FunSearch, which pairs a pre-trained LLM with a systematic evaluator and iterates over them to evolve low-scoring programs into high-scoring ones that discover new knowledge; one of the key findings is that safeguarding against LLM hallucinations is important for producing mathematical discoveries and solving other real-world problems.
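The propose-evaluate-keep loop described above can be sketched in a few lines. This is a toy illustration only, not the FunSearch implementation: the "program" is a list of digits, `propose` is a random mutation standing in for the pre-trained LLM, and `TARGET`, `evaluate`, `propose`, and `evolve` are all hypothetical names invented for this sketch. The evaluator returns `None` for malformed candidates, which is the part that plays the role of guarding against bad proposals.

```python
import random

TARGET = [3, 1, 4, 1, 5]  # hypothetical optimum, known only to the evaluator


def evaluate(program):
    """Systematic evaluator: return a score, or None if the candidate is invalid."""
    if len(program) != len(TARGET) or any(not 0 <= x <= 9 for x in program):
        return None  # reject malformed proposals instead of trusting them
    return -sum((x - t) ** 2 for x, t in zip(program, TARGET))


def propose(parent, rng):
    """Stand-in for the LLM (assumption): return a slightly modified copy of a parent."""
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])  # may leave the valid range, like a bad output
    return child


def evolve(iterations=2000, pool_size=5, seed=0):
    """Iterate proposer and evaluator, keeping only the top-scoring programs."""
    rng = random.Random(seed)
    start = [0] * len(TARGET)
    pool = [(evaluate(start), start)]  # list of (score, program)
    history = [pool[0][0]]  # best score after each iteration
    for _ in range(iterations):
        _, parent = rng.choice(pool)
        child = propose(parent, rng)
        score = evaluate(child)
        if score is not None:  # only validated candidates enter the pool
            pool.append((score, child))
            pool.sort(key=lambda item: item[0], reverse=True)
            del pool[pool_size:]
        history.append(pool[0][0])
    return pool[0], history


if __name__ == "__main__":
    (best_score, best_program), _ = evolve()
    print(best_score, best_program)
```

Because invalid candidates never enter the pool and the pool is truncated to the best entries, the best score can only stay the same or improve from one iteration to the next.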

Paper link

https://www.nature.com/articles/s41586-023-06924-6

Read more

https://x.com/GoogleDeepMind/status/1735332722208284797

Weak-to-strong Generalization

Paper introduction

Studies whether supervision from a weak model can elicit the full capabilities of a stronger model; finds that strong pretrained models naively fine-tuned on labels generated by a weak model can outperform their weak supervisors; reports that fine-tuning GPT-4 with a GPT-2-level supervisor can recover close to GPT-3.5-level performance on NLP tasks.
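One way to make "recover close to" concrete is to ask what fraction of the gap between the weak supervisor and the strong model's own ceiling is closed by training on weak labels. The helper below is a minimal sketch of that ratio; the function name and the accuracy values in the usage lines are hypothetical and are not results from the paper.

```python
def performance_gap_recovered(weak_acc, weak_to_strong_acc, strong_ceiling_acc):
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised strong model.

    0.0 means the strong model is no better than its weak supervisor;
    1.0 means it matches a strong model trained on ground-truth labels.
    """
    gap = strong_ceiling_acc - weak_acc
    if gap <= 0:
        raise ValueError("the strong ceiling must exceed the weak supervisor's accuracy")
    return (weak_to_strong_acc - weak_acc) / gap


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to illustrate the arithmetic:
    # (0.75 - 0.60) / (0.80 - 0.60) is roughly 0.75.
    print(performance_gap_recovered(0.60, 0.75, 0.80))
```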

Paper link

https://cdn.openai.com/papers/weak-to-strong-generalization.pdf

Read more

https://x.com/OpenAI/status/1735349718765715913

Audiobox

Paper introduction

A unified model based on flow-matching that can generate various audio modalities; designs description-based and example-based prompting to enhance controllability and unify the speech and sound generation paradigms; adapts a self-supervised infilling objective to pre-train on large quantities of unlabeled audio; performs well on speech and sound generation and unlocks new methods for generating audio with novel vocal and acoustic styles.

Paper link

https://ai.meta.com/research/publications/…

Read more

https://x.com/AIatMeta/status/1734257634008531453

Mathematical Language Models: A Survey

Paper introduction

A survey of the progress of LLMs on mathematical tasks; covers papers and resources on LLM research around prompting techniques and tasks such as math word problem-solving and theorem proving.

Paper abstract

In recent years, there has been remarkable progress in leveraging Language Models (LMs), encompassing Pre-trained Language Models (PLMs) and Large-scale Language Models (LLMs), within the domain of mathematics. This paper conducts a comprehensive survey of mathematical LMs, systematically categorizing pivotal research endeavors from two distinct perspectives: tasks and methodologies. The landscape reveals a large number of proposed mathematical LLMs, which are further delineated into instruction learning, tool-based methods, fundamental CoT techniques, and advanced CoT methodologies. In addition, our survey entails the compilation of over 60 mathematical datasets, including training datasets, benchmark datasets, and augmented datasets. Addressing the primary challenges and delineating future trajectories within the field of mathematical LMs, this survey is positioned as a valuable resource, poised to facilitate and inspire future innovation among researchers invested in advancing this domain.

Paper link

https://arxiv.org/abs/2312.07622

Read more

https://x.com/omarsar0/status/1735323577392542084

LLM360: Towards Fully Transparent Open-Source LLMs

Paper introduction

Proposes LLM360 to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible; releases two 7B-parameter LLMs pre-trained from scratch, Amber and CrystalCoder, together with their training code, data, intermediate checkpoints, and analyses.

Paper abstract

The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder progress in the field by degrading transparency into the training of LLMs and forcing teams to rediscover many details in the training process. We present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. The goal of LLM360 is to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible by everyone. As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses (at https://www.llm360.ai). We are committed to continually pushing the boundaries of LLMs through this open-source effort. More large-scale and stronger models are underway and will be released in the future.

Paper link

https://arxiv.org/abs/2312.06550

Read more

https://x.com/omarsar0/status/1734591071575744820

A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges

Paper introduction

A comprehensive survey (analyzing 300+ papers) of LLMs in medicine; includes an overview of the principles, applications, and challenges faced by LLMs in medicine.

Paper abstract

Large language models (LLMs), such as ChatGPT, have received substantial attention due to their impressive human language understanding and generation capabilities. Therefore, the application of LLMs in medicine to assist physicians and patient care emerges as a promising research direction in both artificial intelligence and clinical medicine. To reflect this trend, this survey provides a comprehensive overview of the principles, applications, and challenges faced by LLMs in medicine. Specifically, we aim to address the following questions: 1) How can medical LLMs be built? 2) What are the downstream performances of medical LLMs? 3) How can medical LLMs be utilized in real-world clinical practice? 4) What challenges arise from the use of medical LLMs? and 5) How can we better construct and utilize medical LLMs? As a result, this survey aims to provide insights into the opportunities and challenges of LLMs in medicine and serve as a valuable resource for constructing practical and effective medical LLMs. A regularly updated list of practical guides on medical LLMs can be found at https://github.com/AI-in-Health/MedLLMsPracticalGuide.

Paper link

https://arxiv.org/abs/2311.05112

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

Paper introduction

Proposes an approach for self-training with feedback that can substantially reduce dependence on human-generated data; combining model-generated data with a reward function improves the performance of LLMs on problem-solving tasks.

Paper abstract

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can substantially reduce dependence on human-generated data.
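The three steps in the abstract (generate and filter, fine-tune, repeat) can be walked through on a toy problem. The sketch below is not the paper's implementation: the "model" is just a probability table over candidate answers to one hypothetical question, "fine-tuning" is a simple move toward the empirical distribution of the kept samples, and every name (`reward`, `generate`, `fine_tune`, `rest_em`) is invented for illustration.

```python
import random


def reward(answer):
    """Binary feedback: 1 if the answer is verifiably correct, else 0."""
    return 1 if answer == 5 else 0  # hypothetical problem: 2 + 3


def generate(policy, n, rng):
    """Step 1a: sample n answers from the current model."""
    answers = list(policy)
    weights = [policy[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)


def fine_tune(policy, kept, lr=0.5):
    """Step 2 (toy stand-in): move the model toward the filtered samples."""
    if not kept:
        return dict(policy)  # nothing passed the filter; leave the model unchanged
    return {a: (1 - lr) * p + lr * kept.count(a) / len(kept) for a, p in policy.items()}


def rest_em(policy, iterations=3, n_samples=64, seed=0):
    rng = random.Random(seed)
    for _ in range(iterations):  # Step 3: repeat a few times
        samples = generate(policy, n_samples, rng)
        kept = [a for a in samples if reward(a) == 1]  # Step 1b: filter by feedback
        policy = fine_tune(policy, kept)
    return policy


if __name__ == "__main__":
    start = {4: 1 / 3, 5: 1 / 3, 6: 1 / 3}
    print(rest_em(start))
```

In this toy setup no human-written answer is ever used: the only external signal is the binary check in `reward`, which is the point the abstract makes about tasks with verifiable feedback.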

Paper link

https://arxiv.org/abs/2312.06585

Gaussian-SLAM

Paper introduction

A neural RGB-D SLAM method capable of photorealistically reconstructing real-world scenes without compromising speed and efficiency; extends classical 3D Gaussians for scene representation to overcome the limitations of previous methods.

Paper link

https://vladimiryugay.github.io/gaussian_slam/

Pearl: A Production-ready Reinforcement Learning Agent

Paper introduction

Introduces a new production-ready RL agent software package that enables researchers and practitioners to develop RL agents that adapt to environments with limited observability, sparse feedback, and high stochasticity.

Paper abstract

Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage. Pearl is open sourced on Github at github.com/facebookresearch/pearl and its official website is located at pearlagent.github.io.
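To make the exploration and exploitation dilemma mentioned in the abstract concrete, here is a minimal epsilon-greedy bandit. It does not use Pearl and does not reflect Pearl's API; `run_bandit` and its parameters are hypothetical names for this sketch, and the arm reward probabilities are made up.

```python
import random


def run_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit (toy example)."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)    # how often each arm was pulled
    values = [0.0] * len(arm_means)  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_means))  # explore: try a random arm
        else:
            arm = max(range(len(arm_means)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, counts, total / steps


if __name__ == "__main__":
    values, counts, average_reward = run_bandit([0.2, 0.5, 0.8])
    print(counts, average_reward)
```

With `epsilon=0` the agent can lock onto whichever arm happens to pay out first; with a large `epsilon` it keeps wasting pulls on arms it already knows are poor. The small exploration rate is the compromise between the two.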

Paper link

https://arxiv.org/abs/2312.03814

Quip

Paper introduction

Compresses trained model weights into a lower-precision format to reduce memory requirements; the approach combines lattice codebooks with incoherence processing to create 2-bit quantized models, significantly closing the gap between 2-bit quantized LLMs and unquantized 16-bit models.
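For a sense of what "2-bit" means here: each weight is stored as one of four codes, so it takes 2 bits instead of 16, an 8x reduction before any per-tensor metadata. The sketch below uses plain uniform (min/max) quantization, which is a much simpler technique than the lattice codebooks and incoherence processing the paper describes; the function names and sample weights are hypothetical.

```python
def quantize_2bit(weights):
    """Map each weight to one of 4 evenly spaced levels (codes 0..3)."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / 3 or 1.0  # 4 levels -> 3 intervals; guard constant input
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Reconstruct approximate weights from the 2-bit codes."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    weights = [-0.9, -0.2, 0.1, 0.4, 0.9]  # made-up values
    codes, lo, step = quantize_2bit(weights)
    restored = dequantize(codes, lo, step)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, worst)
```

With uniform levels the reconstruction error per weight is at most half a step, which is large when only four levels are available. That loss of accuracy at 2 bits is the gap the paper's approach aims to narrow.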
