Beyond A*: Better Planning with Transformers

(arxiv.org)

GN⁺ 2024-02-25 | 1 comment

Transformers and LLMs are strong at conversation, image understanding, and code completion, but they still struggle to perform reliably on multi-step planning and higher-level reasoning.
The paper expresses planning tasks and their optimal solutions as token sequences, and also includes the execution trace of A* solving each task in the training data.
Searchformer first learns to imitate A*'s search process and is then fine-tuned to produce even shorter search sequences while still arriving at an optimal plan.
In the Sokoban experiments, the Searchformer family of models solved 93.7% of the test tasks while using on average 26.8% fewer search steps than the A* baseline implementation.
The cost of execution traces is that generated sequences become 10× to 100× longer, yet these models still produce optimal plans on unseen tasks more often than larger solution-only models, and with fewer training sequences.

What Transformers do well, and their weakness on planning tasks

Transformer-based architectures perform strongly across many tasks:
- human-level conversation
- high-quality image understanding
- video generation
- multimodal generation
- code completion
Models such as LLMs trained on internet-scale data can generalize well to real-world use cases.
But limitations remain on planning and reasoning tasks:
- LLMs perform poorly on multi-step planning tasks
- They also run into difficulties with higher-level reasoning

Limits of step-by-step thinking prompts

Recent approaches try to improve performance by having the Transformer first generate intermediate “thoughts” and then answer.
Chain-of-Thought (CoT) prompting and Tree-of-thoughts (ToT) encourage the model to “think” step by step.
These techniques are often effective, but performance can also degrade, for example due to self-enforcing.
An approach that works well on one dataset can fail on another.
- This shows up when the type of reasoning required differs, as between spatial reasoning and mathematical reasoning
How to make Transformers and LLMs reliably capable of planning, multi-step decision-making, and reasoning remains an active research topic.

Including A* search dynamics in the training data

This approach focuses on teaching Transformers to solve complex planning tasks more robustly.
As with an LLM, the model is trained to predict the next word given a sequence of preceding words.
Experiments were run on synthetically generated datasets that use a synthetic language and vocabulary.
Planning tasks and their optimal solution plans were expressed as sequences of words, called tokens.
The computation performed by A* was recorded as an execution-trace token sequence.
- The execution traces form a sequence dataset that captures A*'s search dynamics
- Using these search-augmented sequences, the Transformer is trained to generate token sequences that encode both A*'s search dynamics and the optimal plan
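To make the idea of an execution trace concrete, here is a minimal sketch of A* on a small grid that logs each search step as tokens. The token names (`create`, `close`, `plan`, and the `c<number>` cost tokens), the grid encoding, and the function name `astar_with_trace` are assumptions made for illustration; the summary above does not specify the paper's exact vocabulary.

```python
import heapq

def astar_with_trace(grid, start, goal):
    """Run A* on a 4-connected grid and log the search as tokens.

    grid is a list of strings where '#' marks a wall. The token
    format is a hypothetical one chosen for this sketch.
    """
    def h(cell):
        # Manhattan distance: admissible for unit-cost grid moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    def step(kind, cell, g):
        # One search step = 5 tokens: kind, row, col, cost so far, heuristic.
        return [kind, str(cell[0]), str(cell[1]), f"c{g}", f"c{h(cell)}"]

    trace = step("create", start, 0)
    g_cost = {start: 0}
    parent = {start: None}
    frontier = [(h(start), 0, start)]  # heap entries are (f, g, cell)
    closed = set()
    found = False

    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale heap entry for an already expanded cell
        closed.add(cell)
        trace += step("close", cell, g)
        if cell == goal:
            found = True
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < len(grid) and 0 <= nc < len(grid[0])):
                continue
            if grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
                trace += step("create", nxt, ng)

    if not found:
        return trace, []

    # Walk parent pointers back from the goal to recover the plan.
    path = []
    cell = goal
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    path.reverse()
    plan = []
    for r, c in path:
        plan += ["plan", str(r), str(c)]
    return trace, plan

GRID = ["...",
        ".#.",
        "..."]
trace, plan = astar_with_trace(GRID, (0, 0), (2, 2))
print(len(trace), "trace tokens;", len(plan), "plan tokens")
```

Even on this 3x3 grid the trace is several times longer than the plan, which is the length overhead the search-augmented training sequences have to carry.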

Searchformer training procedure

The final model, Searchformer, was built in two stages:
- First, a Transformer was trained to imitate A*'s search process
- It was then fine-tuned to find a plan in fewer search steps while still outputting an optimal plan
This process is called search dynamics bootstrapping.
The goal is a Transformer that solves complex planning tasks with fewer search steps than the A* baseline implementation.
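One plausible way to build the fine-tuning data for the second stage is to sample several sequences per task from the stage-one model, discard any whose plan is not optimal, and keep the shortest remaining trace if it beats the current one. The sketch below shows only that selection step. It is an illustration of the idea rather than the authors' exact procedure, and the `Sample` type and `pick_training_sample` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sample:
    trace: List[str]  # search-trace tokens generated by the model
    plan_cost: int    # cost of the plan at the end of the sequence

def pick_training_sample(samples: List[Sample],
                         optimal_cost: int,
                         current_len: int) -> Optional[Sample]:
    """Return the shortest optimal sample, or None if it is no improvement.

    optimal_cost is assumed to be known for the task (for example from
    running A*), and current_len is the trace length already in the
    training set for this task.
    """
    optimal = [s for s in samples if s.plan_cost == optimal_cost]
    if not optimal:
        return None
    best = min(optimal, key=lambda s: len(s.trace))
    return best if len(best.trace) < current_len else None

# Toy data standing in for model samples on one task.
samples = [
    Sample(trace=["t"] * 40, plan_cost=4),  # optimal, but long
    Sample(trace=["t"] * 25, plan_cost=4),  # optimal and shorter
    Sample(trace=["t"] * 10, plan_cost=6),  # short, but not optimal
]
chosen = pick_training_sample(samples, optimal_cost=4, current_len=30)
print(len(chosen.trace))
```

Filtering on optimality before minimizing length is what keeps the shortcut from trading plan quality for a shorter search.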

Sokoban experiments and generalization performance

On Sokoban puzzles, the Searchformer family of models solved 93.7% of all test tasks.
Average search steps were 26.8% lower than for the A* baseline implementation.
Experiments that controlled task complexity, dataset size, and model size confirmed the effect of including execution traces.
Adding execution traces to the training data increases generated sequence length by 10× to 100×.
Even so, performance on independent test task sets improves.
Search-augmented models generate optimal plans on unseen tasks more often than larger solution-only models, even with 10 times fewer training sequences.
- Search-augmented models are trained on data containing task descriptions, solutions, and execution traces
- Solution-only models are trained on sequences containing only task descriptions and task solutions
This result shows that including A*'s search dynamics in the Transformer training process can improve performance on planning tasks.
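The difference between the two training formats can be shown with toy sequences. In this sketch the `bos` and `eos` markers, the prompt tokens, and the helper names are assumptions for illustration, and the toy trace is far shorter than a real one.

```python
def solution_only(prompt, plan):
    # Task description followed directly by the plan.
    return ["bos"] + prompt + plan + ["eos"]

def search_augmented(prompt, trace, plan):
    # Task description, then the search trace, then the plan.
    return ["bos"] + prompt + trace + plan + ["eos"]

prompt = ["start", "0", "0", "goal", "0", "1"]
plan = ["plan", "0", "0", "plan", "0", "1"]
trace = ["close", "0", "0", "c0", "c1"] * 12  # toy stand-in for a real trace

short_seq = solution_only(prompt, plan)
long_seq = search_augmented(prompt, trace, plan)

# Tokens the model must generate after reading "bos" and the prompt.
generated_short = len(short_seq) - len(prompt) - 1
generated_long = len(long_seq) - len(prompt) - 1
print(generated_short, generated_long)
```

Both formats end in the same plan. The search-augmented one simply asks the model to write out the search before committing to it.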

1 comment

GN⁺ 2024-02-25

Hacker News opinions

There was even more interesting research on using transformers for robot motion planning.
Moving a robotic arm from point A to point B while avoiding collisions is a high-dimensional, continuous problem, so it is very hard; existing planning methods are computationally expensive and still do not perform particularly well.
As a result, robot motions look “unnatural” and robots cannot do many of the tasks we would like them to do well; this approach plans near-optimal paths faster, so it looks quite competitive with other methods.
Before going down this research direction, I'd like to know whether they tried the A* optimization for game graph/path search, Modified J algorithm*.
For the curious, it's in Game AI Pro 2.
- Related: https://github.com/anvaka/ngraph.path
- To be fair, at the end of the paper they say their pathfinder is not yet at a level to compete with state-of-the-art techniques
  The paper tests how well transformers predict execution traces, as in cases like JIT compilers, and whether that helps improve heuristics in settings such as path search
  That said, transformers are slow, so I view this with caution
- I like these books and I'm glad Steve Rabin keeps at it, but an ebook at 120 dollars is unexpected
Planning problems are already handled well by established techniques such as graph search, SAT solvers, operations research, and Prolog.
The core is usually optimization among many possible alternatives, and I'm not sure whether transformers are suited to that.
The role of LLM-family techniques seems to lie more in translating natural-language descriptions into executable programs, but Prolog was designed from the start for classical natural language processing, so it is already fairly close.
- It would be interesting to compare Prolog and LLMs for this kind of purpose
Machine translation used to require complex grammar decoding that relied on search, but transformers are now used with much simpler decoding that is practically search-free.
Perhaps it is now possible to go all the way to fully recursive structure.
The idea is to learn heuristics for neural architecture search (NAS) from current top-tier prediction models and discover new neural network blocks better than transformers or mamba.
- “Every time I fire a linguist, the performance of the speech recognizer goes up.” (Frederick Jelinek)
- Eventually we may enter a world where even the people developing the technology no longer understand how it works
  The singularity is coming…
If you're interested in Sokoban-style games, take a look at https://thinky.gg
It has a fun Sokoban variant called Sokopath, and another NP-hard variant called Pathology, where the goal is to get from point A to point B in the fewest steps.
The community has tried building several solvers, but it gets very hard once the grid is larger than 5x5; the thinky community has also used simulated annealing to find interesting levels with a very large maximum step count.
“26.8% fewer search steps than standard A* search”
So on Sokoban it is only slightly better than A*, which is far from the state of the art (https://festival-solver.site/).
I don't understand what is impressive about this paper or why it made it to Hacker News.
- A* is the optimal search algorithm under the specific constraints it makes explicit, so you cannot do better within them
  But if the search domain has other constraints that can be exploited, you can beat A*
  For example, Jump Point Search exploits properties of grid search where movement is only possible in a few specific ways
  If one could build a general search algorithm that “automatically” and effectively exploits the special properties of the underlying domain, without manual analysis by a human, wouldn't that be useful?
- Because they reached a decent solution with a transformer that beats standard A* search
  A* is close to a “naive” baseline solution, and they did not wrestle with algorithm design directly
  That a simple encoder-decoder transformer can do this much is quite impressive
- It's right in the first line of the abstract
  “While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks ...”
  The paper is interesting as an example of using transformers for decision-making, and I don't much care whether it is at A* level yet
- It's on HN because the community liked it
- This is more evidence that transformers are not just next-token prediction but a fully general-purpose approach applicable to all kinds of learning tasks, that is, the unreasonable effectiveness of transformers
  Of course there are strong and weak versions of that hypothesis, and the strong version is probably not true, but as long as it looks like we are getting closer to the “one true way” nature learns to do things, this seems like significant news
If transformers can plan, that could mean artificial general intelligence just needs better training.
- Approximating exhaustive search is not logic or causality
- There are too many required pieces, and agency is a big part of that
  It also needs online learning, and many more layers besides
- In the near term the direction will probably be to keep feeding in more data to prevent hallucination
For auditory learners, this paper has been made into a summarized audiobook format.
https://player.oration.app/09fefe41-f2a7-4257-a25e-30e479b30d6f
I'm very optimistic about using learned heuristics in discrete algorithms such as A* or Focal search, and in many families of integer linear programming.
In modern discrete optimization libraries such as CPLEX, heuristics and tuning account for much of the performance difference.
Replacing well-understood optimal search routines with an end-to-end learned approach seems less convincing, though that may be a needless worry.
Still, it feels like the authors missed that opportunity.
- This just looks like the bubble/hype effect around transformers and AI
  I'm thinking I should solve tic-tac-toe with a transformer and apply for VC money
  In a few years everyone will probably be writing about how much more efficient actual code is than AI ;)
- Agreed
  Learning an admissible heuristic lets you keep worst-case performance, and that has always been the standard for such algorithms
  It's not at all rare to find solutions that are fast in the average or p99 case but give no worst-case guarantees
I wonder who is maintaining a list of classical algorithms or NP-complete problems where deep learning has started to perform better.
- For convenience, here is the list of NP-complete problems where “AI” beats state-of-the-art techniques in the worst case:
- As I understand it, this is still very much at the active research stage, and there is no clear win deployed in a production environment yet
