Meta's HyperAgents: When Agents Design Their Own Harness

(cobusgreyling.medium.com)


HyperAgents, introduced jointly by Meta and UBC, is a self-referential AI agent framework that can modify not only its task-execution code but also its own improvement mechanism
After repeated self-improvement across several domains (coding, paper review, robotics, and math grading), the agent independently invented persistent memory, performance tracking, and a multi-stage verification pipeline
The components the agent built for itself line up with the core elements of the production harnesses that developers build by hand
A harness is therefore not just a development convenience but a convergent architecture for agentic systems, and agents are moving from being consumers of infrastructure to being producers of it
The developer's role is shifting from building the harness directly to designing the initial conditions from which agents can evolve an effective harness

Overview of HyperAgents

HyperAgents, introduced in a new paper from Meta and UBC, is a self-referential agent that can modify not only its task-solving behavior but also the mechanism that generates future improvements
What the agent converges on when left to improve itself is notable: it reinvented the same components that developers build by hand today
A Hyperagent is defined as a producer of infrastructure

HyperAgents vs Universal Agents

A Universal Agent is a highly adaptive executor that can solve almost any problem on the spot by writing code, but it still operates inside human-designed infrastructure (the harness)
A Hyperagent is a producer of infrastructure: starting from a minimal state, it bootstraps a production-grade harness on its own through self-referential evolution

Definition of a Harness and Its Main Components

A harness is the software system that governs how an AI agent operates; it manages tools, memory, retries, context engineering, and verification so that the model can focus on reasoning
The six core components a production harness needs:
- Tool Integration: registering and executing tools
- Memory & State: persisting results between steps
- Context Engineering: dynamic prompt assembly
- Planning: breaking complex tasks into steps
- Verification: validating output against rules
- Modularity: toggling components independently
Traditionally this has been the domain of human engineering, where developers directly write a ToolRegistry class, a MemoryManager, retry loops, prompt assembly logic, and so on
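To make the hand-built version concrete, here is a minimal sketch of two of those pieces. The class names ToolRegistry and MemoryManager come from the text above, but every method name and signature below is an assumption made for illustration, not an API from the paper or from any particular library.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry: maps tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def execute(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)


class MemoryManager:
    """Hypothetical in-process store that keeps results between steps."""

    def __init__(self) -> None:
        self._steps: List[Dict[str, Any]] = []

    def record(self, step: str, result: Any) -> None:
        self._steps.append({"step": step, "result": result})

    def history(self) -> List[Dict[str, Any]]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self._steps)


# Usage: register a tool, run it, and remember the result.
registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)
memory = MemoryManager()
memory.record("add", registry.execute("add", a=2, b=3))
```

A real harness would add retries, prompt assembly, and verification around these two classes; they are left out here to keep the sketch short.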
The HyperAgents paper asks: "What happens if the agent builds these components itself?"

How HyperAgents Work

The framework presented in the paper is DGM-Hyperagents (DGM-H), and its core idea is simple
A Hyperagent is a single editable program that contains two things:
- A Task Agent that solves the given task
- A Meta Agent that modifies both the Task Agent and itself
The key word here is "itself": the Meta Agent can rewrite its own code, so the mechanism that produces improvements becomes a target of improvement too
The paper calls this metacognitive self-modification
It works through an evolution loop:
- Start from a base agent
- The Meta Agent reads the agent code, analyzes past performance, and produces a modified version
- The modified version is evaluated; if performance improves, it is added to the archive
- An agent is selected from the archive and the process repeats
After hundreds of iterations, the agent does not just get better at the task; it also gets better at improving itself
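The loop described above can be sketched on a toy problem. This is not the paper's implementation: the real Meta Agent is an LLM editing program code, whereas here an "agent" is just two numbers, and the names evaluate, meta_modify, evolve, skill, and step are all invented for illustration. The one idea it tries to keep is that the modification step changes both the task-solving part (skill) and the part that controls how modifications are made (step).

```python
import random
from typing import Dict, List

Agent = Dict[str, float]


def evaluate(agent: Agent) -> float:
    """Toy stand-in for a benchmark: the score peaks at 0 when skill == 10."""
    return -abs(agent["skill"] - 10.0)


def meta_modify(parent: Agent, rng: random.Random) -> Agent:
    """Toy stand-in for the Meta Agent.

    It edits the task-solving part (skill) and also the parameter that
    governs how large future edits are (step).
    """
    step = max(0.01, parent["step"] * rng.choice([0.5, 1.0, 2.0]))
    skill = parent["skill"] + rng.uniform(-step, step)
    return {"skill": skill, "step": step}


def evolve(iterations: int, seed: int = 0) -> List[Agent]:
    rng = random.Random(seed)
    archive: List[Agent] = [{"skill": 0.0, "step": 1.0}]  # base agent
    for _ in range(iterations):
        parent = rng.choice(archive)            # select from the archive
        child = meta_modify(parent, rng)        # produce a modified version
        if evaluate(child) > evaluate(parent):  # keep it only if it improved
            archive.append(child)
    return archive


archive = evolve(200)
best = max(archive, key=evaluate)
```

Keeping every improved variant in an archive, rather than only the current best, lets later iterations branch from older agents as well as the newest one.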

Components That Emerged from Self-Improvement

When self-improvement was run across several domains (coding, paper review, robotics reward design, and olympiad math grading), the agent independently invented the following:

Persistent Memory

The agent evolved a memory system on its own, without developer instructions
The reason: agents that could remember past results, track performance trends, and store synthesized insights outperformed agents that could not
Examples of memory entries given in the paper: storing causal hypotheses, identifying the best-performing generations, diagnosing over-modification, and suggesting ways to combine successful strategies
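One way such a memory could look is a small JSON-backed store. The sketch below is an assumption-heavy illustration: the MemoryEntry fields, the kind labels, and the file layout are all hypothetical and are not taken from the paper.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class MemoryEntry:
    generation: int
    kind: str  # hypothetical labels, e.g. "hypothesis" or "diagnosis"
    note: str


class PersistentMemory:
    """Hypothetical store that survives across runs by writing to a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: List[MemoryEntry] = []
        if path.exists():
            raw = json.loads(path.read_text(encoding="utf-8"))
            self.entries = [MemoryEntry(**item) for item in raw]

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)
        payload = [asdict(e) for e in self.entries]
        self.path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    def by_kind(self, kind: str) -> List[MemoryEntry]:
        return [e for e in self.entries if e.kind == kind]
```

Creating a second PersistentMemory on the same path reloads everything written earlier, which is the property that lets a later generation read what an earlier one recorded.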

Performance Tracking

The agent built its own observability
This included moving averages of improvement trends, per-generation aggregate statistics, and per-domain score histories
It is the same functionality that developers hand-build into a harness as token tracking and audit logging
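A minimal sketch of that kind of tracking, assuming scores arrive one per generation for each domain. The class name PerformanceTracker and its methods are hypothetical; the paper's agent wrote its own code, which is not reproduced here.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class PerformanceTracker:
    """Hypothetical tracker: per-domain score history plus a moving average."""

    def __init__(self) -> None:
        self._scores: Dict[str, List[float]] = defaultdict(list)

    def record(self, domain: str, score: float) -> None:
        self._scores[domain].append(score)

    def history(self, domain: str) -> List[float]:
        return list(self._scores.get(domain, []))

    def moving_average(self, domain: str, window: int = 3) -> List[float]:
        """Trailing average over at most `window` scores at each position."""
        scores = self.history(domain)
        return [
            mean(scores[max(0, i - window + 1): i + 1])
            for i in range(len(scores))
        ]


tracker = PerformanceTracker()
for score in [1.0, 2.0, 3.0, 4.0]:
    tracker.record("coding", score)
trend = tracker.moving_average("coding", window=2)
```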

Multi-stage Evaluation Pipelines

In the paper review domain, the agent evolved from superficial behavioral instructions to an explicit multi-stage evaluation pipeline
The pipeline included checklists, decision rules, and clearly defined criteria
In effect, the agent built its own verifier
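As a rough illustration of what "multi-stage" means here, the sketch below runs named checks in order and stops at the first failure. The stage names and the checks themselves are placeholders chosen for the example; they are not the criteria the agent evolved.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(text: str, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run the stages in order; stop at the first failing check."""
    log: List[str] = []
    for name, check in stages:
        ok = check(text)
        log.append(f"{name}: {'pass' if ok else 'fail'}")
        if not ok:
            return False, log
    return True, log


# Placeholder checklist, purely illustrative.
STAGES: List[Stage] = [
    ("has_abstract", lambda t: "abstract" in t.lower()),
    ("has_experiments", lambda t: "experiment" in t.lower()),
    ("long_enough", lambda t: len(t.split()) >= 5),
]

passed, log = run_pipeline("Abstract: we run an experiment on agents", STAGES)
```

Returning the log alongside the verdict keeps each decision auditable, which is the practical difference between a pipeline and a single opaque judgment.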

Threshold-based Decision Protocols

The agent developed explicit decision boundaries: accept/reject ratios, score thresholds, confidence levels, and so on
This resembles the rule-based checks that harness verifiers implement
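A threshold-based protocol of this kind can be as small as the following sketch. The threshold values, the field names, and the "needs_second_review" outcome are assumptions for illustration; the source does not give the actual numbers the agent settled on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPolicy:
    """Hypothetical policy; both thresholds are made-up example values."""

    accept_score: float = 6.0
    min_confidence: float = 0.7

    def decide(self, score: float, confidence: float) -> str:
        # Low-confidence reviews are escalated instead of being decided.
        if confidence < self.min_confidence:
            return "needs_second_review"
        return "accept" if score >= self.accept_score else "reject"


policy = DecisionPolicy()
decision = policy.decide(score=7.5, confidence=0.9)
```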

Domain Knowledge Bases

In robotics reward design, the agent gradually built and refined an internal knowledge base covering environment constraints, valid state variables, and reward scaling heuristics
This is context engineering in practice: the agent learned to assemble the right context for itself
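The link between a knowledge base and context assembly can be sketched as follows. The three section names mirror the categories mentioned above, but the entries are deliberate placeholders and the function assemble_context is hypothetical.

```python
from typing import Dict, List

# Hypothetical knowledge base; the entries are placeholders, not from the paper.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "environment_constraints": ["placeholder constraint A"],
    "valid_state_variables": ["placeholder_var_1", "placeholder_var_2"],
    "reward_scaling_heuristics": ["placeholder heuristic B"],
}


def assemble_context(task: str, kb: Dict[str, List[str]], sections: List[str]) -> str:
    """Build a prompt from the task plus only the requested sections."""
    parts = [f"Task: {task}"]
    for name in sections:
        entries = kb.get(name, [])
        if entries:
            parts.append(f"{name}:")
            parts.extend(f"- {entry}" for entry in entries)
    return "\n".join(parts)


prompt = assemble_context("design reward", KNOWLEDGE_BASE, ["valid_state_variables"])
```

Selecting sections per task, instead of always sending the whole knowledge base, is what makes the prompt assembly dynamic.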

Retry and Self-Correction

When a modification made the agent's performance worse, later generations diagnosed the regression and fixed it
This is the same pattern as the retry loop with feedback injection that harnesses implement
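For reference, a retry loop with feedback injection in a hand-built harness might look like the sketch below. The function names and the convention that the verifier returns None on success are assumptions made for this example, and toy_generate stands in for a model call.

```python
from typing import Callable, Optional, Tuple


def retry_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    task: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Call generate, verify the output, and feed any error back into the next try."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        output = generate(task, feedback)
        feedback = verify(output)  # None means the output was accepted
        if feedback is None:
            return output, attempt
    return None, max_attempts


def toy_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in for a model: wrong on the first try, right once it sees feedback.
    return "4" if feedback else "5"


def toy_verify(output: str) -> Optional[str]:
    return None if output == "4" else f"expected 4, got {output}"


result, attempts = retry_with_feedback(toy_generate, toy_verify, "2+2")
```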

The Big Picture: Trends Converging in One Direction

A pattern seen across several lines of research comes together into a single progression:
- Harness Engineering: defining the six components that developers build around agents
- From Copilot to Codex: the shift from human-written code to agent-delegated code
- Universal Agents: the claim that coding ability makes agents universal
- HyperAgents: agents build their own harness through self-modification
Agents are moving from consumers of infrastructure to producers of it, from executing inside a harness to engineering the harness
The DGM-H paper's concrete demonstration: starting from a bare agent that makes a single LLM call, hundreds of self-modification iterations lead to a system with persistent memory, performance tracking, a multi-stage evaluation pipeline, a domain knowledge base, and a modular code structure
The developer's role is not disappearing but transforming; the paper stresses that human oversight remains essential
The role is shifting from building the harness directly to designing the initial conditions from which agents can evolve an effective harness
