OpenAI's Practical Guide to Building Agents

With improvements in LLM reasoning, multimodality, and tool use, a new category of systems called agents has emerged: systems that complete workflows independently on behalf of users.
Agents consist of three core components: a model (LLM), tools (APIs or external functions), and instructions (guidelines). They can be orchestrated as a single-agent or a multi-agent system.
Agents are a good fit for workflows that involve complex decision-making, rule systems that are hard to maintain, or unstructured data processing.
Guardrails are a multi-layer defense mechanism that protects data privacy, content safety, and brand consistency, and they are an essential part of any agent deployment.
The key to a successful deployment is an iterative approach: start with a single agent, validate it with real users, and expand gradually.

What Is an Agent

An agent is a system that acts independently on the user's behalf and can handle workflows such as resolving a customer service issue, booking a restaurant, committing a code change, or generating a report.
Applications that integrate an LLM but do not use it to control workflow execution, such as simple chatbots, single-turn LLMs, or sentiment classifiers, are not agents.
Core characteristics of an agent:
- Uses an LLM to manage workflow execution and make decisions, recognizes when the workflow is complete, and proactively corrects its behavior when needed
- On failure, halts execution and hands control back to the user
- Has access to various tools for interacting with external systems, dynamically selects the appropriate tool for the workflow's current state, and always operates within clearly defined guardrails

When to Build an Agent

Compared with conventional automation, agents suit workflows where deterministic, rule-based approaches reach their limits.
Payment fraud analysis example: a traditional rules engine works like a checklist, flagging transactions against predefined criteria, whereas an LLM agent works like an experienced investigator, evaluating context, weighing subtle patterns, and identifying suspicious activity even when no explicit rule is violated.
Three kinds of situations where agents add value:
- Complex decision-making: workflows that require nuanced judgment, exceptions, and context-sensitive decisions (for example, refund approval in customer service)
- Difficult-to-maintain rules: systems with large, intricate rule sets where updates are costly or error-prone (for example, vendor security reviews)
- Heavy reliance on unstructured data: interpreting natural language, extracting meaning from documents, and interacting with users conversationally (for example, processing a home insurance claim)
If a use case does not clearly meet these criteria, a deterministic solution may be enough.

Agent Design Foundations

Three core components
- Model: the LLM that powers the agent's reasoning and decision-making
- Tools: external functions or APIs the agent uses to take action
- Instructions: explicit guidelines and guardrails that define how the agent behaves
Selecting a model
- Not every task needs the most capable model: simple retrieval or intent classification can be handled by a smaller, faster model, while harder tasks such as deciding whether to approve a refund benefit from a more capable one
- An effective approach is to build the prototype with the most capable model to set a performance baseline, then swap in smaller models and check whether the results remain acceptable
- Principles for model selection:
  - Set up evals to establish a performance baseline
  - Focus on meeting the accuracy target with the best available model
  - Where possible, switch to smaller models to optimize cost and latency
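The selection principles above can be sketched as a small comparison over eval results. This is a minimal illustration, not a method from the guide: the model labels, accuracy numbers, relative costs, and the `tolerance` parameter are all hypothetical placeholders for values you would measure with your own evals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical label, not a real model id
    accuracy: float       # accuracy measured on your own eval set
    cost_per_call: float  # illustrative relative cost


def pick_model(candidates: List[Candidate], baseline_name: str,
               tolerance: float = 0.02) -> Candidate:
    """Return the cheapest candidate whose accuracy is within
    `tolerance` of the baseline (most capable) model."""
    baseline = next(c for c in candidates if c.name == baseline_name)
    acceptable = [c for c in candidates
                  if c.accuracy >= baseline.accuracy - tolerance]
    return min(acceptable, key=lambda c: c.cost_per_call)


# Made-up eval results for three hypothetical model sizes.
candidates = [
    Candidate("large", accuracy=0.95, cost_per_call=10.0),
    Candidate("medium", accuracy=0.94, cost_per_call=3.0),
    Candidate("small", accuracy=0.88, cost_per_call=1.0),
]

chosen = pick_model(candidates, baseline_name="large")
print(chosen.name)
```

With these made-up numbers the medium model is chosen: it stays within the tolerance of the baseline while costing less, whereas the small model falls too far below it.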
Defining tools
- Tools extend an agent's capabilities by using the APIs of underlying applications or systems
- For legacy systems without an API, a computer-use model can interact directly through web and application UIs
- Each tool should have a standardized definition, so that tools and agents can be combined in flexible many-to-many relationships
- Well-documented, thoroughly tested, reusable tools improve discoverability, simplify version management, and prevent duplicate definitions
- Three types of tools for agents:
  - Data: retrieve the context and information needed to execute the workflow (for example, querying a transaction database or CRM system, reading PDFs, searching the web)
  - Action: interact with systems to add information to databases, update records, or send messages (for example, sending email or SMS, updating a CRM record, escalating a customer service ticket to a human)
  - Orchestration: agents themselves serve as tools for other agents (for example, a refund agent, research agent, or writing agent)
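One way to picture a standardized tool definition shared across agents is a small registry. The sketch below is an assumption-laden illustration: `Tool`, `ToolKind`, `ToolRegistry`, and the two example functions are hypothetical names invented here, not part of any SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class ToolKind(Enum):
    DATA = "data"
    ACTION = "action"
    ORCHESTRATION = "orchestration"


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    kind: ToolKind
    func: Callable[..., Any]


class ToolRegistry:
    """Single place where tools are defined, so agents share one definition."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool definition: {tool.name}")
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        return self._tools[name].func(**kwargs)

    def names(self, kind: Optional[ToolKind] = None) -> List[str]:
        return sorted(t.name for t in self._tools.values()
                      if kind is None or t.kind == kind)


# Hypothetical stand-ins for real integrations.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def send_email(to: str, body: str) -> str:
    return f"queued email to {to}"


registry = ToolRegistry()
registry.register(Tool("lookup_order", "Fetch an order record",
                       ToolKind.DATA, lookup_order))
registry.register(Tool("send_email", "Send an email to a customer",
                       ToolKind.ACTION, send_email))

# Many-to-many: each agent lists the shared tools it may use.
agent_tools = {
    "support_agent": ["lookup_order", "send_email"],
    "billing_agent": ["lookup_order"],
}
```

Rejecting duplicate names at registration time is one simple way to enforce the "no duplicate definitions" goal; a real system would likely also version the definitions.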
Configuring instructions
- High-quality instructions are essential for any LLM-based app, but they matter especially for agents
- Clear instructions reduce ambiguity and improve the agent's decision-making, so workflows run more smoothly with fewer errors
- Best practices for agent instructions:
  - Use existing documents: turn existing operating procedures, support scripts, and policy documents into LLM-friendly routines (in customer service, routines map roughly to individual knowledge base articles)
  - Prompt for task decomposition: reduce ambiguity by breaking dense resources into smaller, clearer steps
  - Define clear actions: tie every step of a routine to a specific action or output (for example, asking for the order number, or retrieving account details with an API call)
  - Capture edge cases: anticipate common variations, such as users who provide incomplete information or ask unexpected questions, and handle them with conditional steps or branches
- Advanced models such as o1 or o3-mini can also be used to generate instructions automatically from existing documents
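The best practices above (small steps, one action per step, explicit edge-case branches) can be illustrated by keeping a routine as structured data and rendering it into instruction text. This is only a sketch: the `Step` type, the `render_routine` helper, and the refund routine content are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    instruction: str  # what the agent should do
    action: str       # the specific action or output tied to this step
    edge_cases: List[str] = field(default_factory=list)  # conditional branches


def render_routine(title: str, steps: List[Step]) -> str:
    """Render a routine as numbered, LLM-friendly instruction text."""
    lines = [f"Routine: {title}"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {step.instruction} -> {step.action}")
        for case in step.edge_cases:
            lines.append(f"   - If {case}")
    return "\n".join(lines)


# Hypothetical refund routine, as it might be derived from a support article.
refund_routine = [
    Step("Ask the user for their order number",
         "output: question to user",
         ["the user does not know it, ask for the email used at checkout"]),
    Step("Retrieve the order details",
         "call: lookup_order(order_id)",
         ["no order is found, apologize and ask the user to re-check the number"]),
    Step("Explain the refund decision",
         "output: final message to user"),
]

print(render_routine("Refund request", refund_routine))
```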

Orchestration

Single-agent systems
- A single agent can handle many tasks by adding tools incrementally, which keeps complexity manageable and simplifies evaluation and maintenance
- Every orchestration approach needs the concept of a 'run', typically implemented as a loop in which the agent operates until an exit condition is reached
- Common exit conditions: a tool call, a specific structured output, an error, or reaching the maximum number of turns
- In the Agents SDK, an agent is started with the Runner.run() method, and the loop ends when a final-output tool is called or when the model returns a response without any tool calls
- Prompt template strategy: instead of maintaining many separate prompts, use one flexible base prompt that accepts policy variables; it adapts to different contexts and makes maintenance and evaluation much simpler
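The 'run' loop described above can be sketched in plain Python. This is not the Agents SDK implementation; `ModelTurn`, `run`, and `scripted_model` are hypothetical names, and a scripted function stands in for the LLM so the example is self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[str, str]]


@dataclass
class ModelTurn:
    tool_name: Optional[str] = None  # set when the model wants to call a tool
    tool_args: Optional[dict] = None
    text: Optional[str] = None       # set when the model answers directly


class MaxTurnsExceeded(Exception):
    pass


def run(model: Callable[[History], ModelTurn],
        tools: Dict[str, Callable[..., str]],
        user_input: str, max_turns: int = 5) -> str:
    """Loop until the model responds without a tool call or turns run out."""
    history: History = [("user", user_input)]
    for _ in range(max_turns):
        turn = model(history)
        if turn.tool_name is None:
            return turn.text or ""  # exit condition: response with no tool call
        result = tools[turn.tool_name](**(turn.tool_args or {}))
        history.append(("tool", f"{turn.tool_name} -> {result}"))
    raise MaxTurnsExceeded(f"no final answer after {max_turns} turns")


# Scripted stand-in for an LLM: call one tool, then answer.
def scripted_model(history: History) -> ModelTurn:
    if not any(role == "tool" for role, _ in history):
        return ModelTurn(tool_name="get_weather", tool_args={"city": "Tokyo"})
    return ModelTurn(text=f"Done: {history[-1][1]}")


tools = {"get_weather": lambda city: f"sunny in {city}"}
answer = run(scripted_model, tools, "What's the weather in Tokyo?")
print(answer)
```

The maximum-turn limit is what keeps a model that never stops calling tools from looping forever; here it surfaces as an exception the caller can handle.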
When to move to a multi-agent system
- The general recommendation is to maximize a single agent's capabilities first
- More agents provide an intuitive separation of concepts, but they also add complexity and overhead, so a single agent with tools is often enough
- Practical guidelines for splitting agents:
  - Complex logic: when a prompt contains many conditional statements (if-then-else branches) and the prompt template becomes hard to scale, split each logical segment into its own agent
  - Tool overload: the problem is not the number of tools but their similarity or overlap. Some implementations successfully manage more than 15 clearly differentiated tools, while others struggle with fewer than 10 overlapping ones
Manager pattern (agents as tools)
- A central LLM, the "manager", orchestrates a network of specialized agents through tool calls
- Without losing context or control, the manager delegates each task to the right agent at the right time and synthesizes the results into a unified interaction
- It suits workflows where only one agent should control workflow execution and have access to the user
- Example: a translation manager agent calls Spanish, French, and Italian agents as tools
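The translation example can be sketched with plain functions standing in for the specialized agents. Everything here is hypothetical: the stub agents only tag the text rather than translate it, and the list of languages is passed in explicitly, whereas in a real system the manager LLM would decide which agent tools to call.

```python
from typing import Callable, Dict, List


# Stub "agents": each would wrap its own model and instructions in practice.
def spanish_agent(text: str) -> str:
    return f"[es] {text}"


def french_agent(text: str) -> str:
    return f"[fr] {text}"


def italian_agent(text: str) -> str:
    return f"[it] {text}"


# The manager sees the specialized agents as ordinary tools.
AGENT_TOOLS: Dict[str, Callable[[str], str]] = {
    "spanish": spanish_agent,
    "french": french_agent,
    "italian": italian_agent,
}


def manager(text: str, languages: List[str]) -> str:
    """Delegate to each requested agent, then merge results into one reply.

    The manager keeps control throughout: sub-agents return their output
    to it and never talk to the user directly.
    """
    results = {}
    for language in languages:
        if language not in AGENT_TOOLS:
            raise ValueError(f"no agent available for: {language}")
        results[language] = AGENT_TOOLS[language](text)
    return "\n".join(f"{lang}: {out}" for lang, out in results.items())


print(manager("hello", ["spanish", "french"]))
```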
Decentralized pattern (handoffs between agents)
- An agent hands off workflow execution to another agent; a handoff is a one-way transfer
- In the Agents SDK, a handoff is a type of tool or function; when a handoff function is called, the latest conversation state is passed along and the new agent starts executing immediately
- It fits best where no single agent needs to maintain central control or synthesis, and each agent takes over execution and interacts with the user directly
- Example: a triage agent evaluates the user's query and routes it to a technical support, sales, or order management agent
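The triage example can be sketched as a one-way transfer of control over shared conversation state. This is an illustration rather than the SDK's handoff API: `Handoff`, `Conversation`, and `respond` are hypothetical names, and keyword matching stands in for the LLM's routing decision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Handoff:
    target: str  # name of the agent that takes over


@dataclass
class Conversation:
    messages: List[str] = field(default_factory=list)
    active_agent: str = "triage"


def triage_agent(conv: Conversation) -> Union[str, Handoff]:
    text = conv.messages[-1].lower()
    if "order" in text or "refund" in text:
        return Handoff("order_management")
    if "price" in text or "buy" in text:
        return Handoff("sales")
    if "error" in text or "crash" in text:
        return Handoff("technical_support")
    return "Could you tell me a bit more about what you need?"


def make_specialist(name: str) -> Callable[[Conversation], str]:
    def specialist(conv: Conversation) -> str:
        return f"[{name}] handling: {conv.messages[-1]}"
    return specialist


AGENTS: Dict[str, Callable[[Conversation], Union[str, Handoff]]] = {
    "triage": triage_agent,
    "technical_support": make_specialist("technical_support"),
    "sales": make_specialist("sales"),
    "order_management": make_specialist("order_management"),
}


def respond(conv: Conversation, user_message: str, max_hops: int = 3) -> str:
    conv.messages.append(user_message)
    for _ in range(max_hops):
        result = AGENTS[conv.active_agent](conv)
        if isinstance(result, Handoff):
            # One-way transfer: the new agent owns the conversation from here.
            conv.active_agent = result.target
            continue
        return result
    raise RuntimeError("too many handoffs in one turn")


conv = Conversation()
print(respond(conv, "My order has not arrived"))
print(respond(conv, "Thanks, any update?"))
```

After the first message the order management agent stays active, so the follow-up goes straight to it instead of passing through triage again.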
Declarative vs. non-declarative graphs
- Some frameworks are declarative: every branch, loop, and condition must be defined up front as a graph of nodes (agents) and edges (handoffs). This gives visual clarity, but it can become cumbersome as workflows grow more dynamic and complex, and it often requires learning a domain-specific language
- The Agents SDK takes a code-first approach, so workflow logic can be expressed directly with familiar programming constructs; agents can be orchestrated more dynamically and adaptably without defining the whole graph in advance

Guardrails

The role of guardrails
- They help manage data privacy risks (for example, preventing system prompt leaks) and reputational risks (for example, keeping model behavior aligned with the brand)
- A single guardrail rarely provides enough protection; combining several specialized guardrails produces a more resilient agent
- Guardrails are a critical component, but they should be paired with strong authentication and authorization protocols, strict access controls, and standard software security measures
Types of guardrails
- Relevance classifier: checks that agent responses stay within the intended scope by flagging off-topic queries (for example, flagging "How tall is the Empire State Building?" as off-topic)
- Safety classifier: detects unsafe inputs, such as jailbreaks or prompt injections that try to exploit system vulnerabilities
- PII filter: prevents unnecessary exposure of personally identifiable information (PII) in model output
- Moderation: flags harmful or inappropriate inputs such as hate speech, harassment, or violence
- Tool safeguards: assign each tool a low, medium, or high risk rating based on read-only versus write access, reversibility, required account permissions, and financial impact; use these ratings to trigger automated actions, such as pausing for guardrail checks before executing high-risk functions or escalating to a human
- Rules-based protections: simple deterministic measures such as blocklists, input length limits, and regex filters that block known threats like prohibited terms or SQL injection
- Output validation: use prompt engineering and content checks to ensure responses align with brand values
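The rules-based protections and the PII filter are the easiest guardrails to show in code because they are deterministic. The sketch below is illustrative only: the blocklist term, the length limit, and both regular expressions are hypothetical and far too simple for production use.

```python
import re
from typing import List

BLOCKLIST = {"internal-codename"}  # hypothetical prohibited term
MAX_INPUT_CHARS = 500              # hypothetical input length limit

# Deliberately naive patterns, for illustration only.
SQLI_PATTERN = re.compile(
    r"('|\")\s*(or|and)\s+\d+\s*=\s*\d+|;\s*drop\s+table", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def check_input(text: str) -> List[str]:
    """Return the names of all rules the input violates (empty if clean)."""
    violations = []
    if len(text) > MAX_INPUT_CHARS:
        violations.append("too_long")
    lowered = text.lower()
    if any(term in lowered for term in BLOCKLIST):
        violations.append("blocked_term")
    if SQLI_PATTERN.search(text):
        violations.append("sql_injection_pattern")
    return violations


def redact_emails(text: str) -> str:
    """Minimal PII filter: mask email addresses in model output."""
    return EMAIL_PATTERN.sub("[redacted email]", text)


print(check_input("What is my order status?"))
print(check_input("x' OR 1=1"))
print(redact_emails("Contact a.b@example.com for details"))
```

Checks like these are cheap enough to run on every input, which is why they are typically layered underneath the LLM-based classifiers rather than replaced by them.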
Building guardrails
- First set up guardrails for the risks you have already identified, then add layers as new vulnerabilities are discovered
- Effective heuristics:
  - Focus on data privacy and content safety
  - Add new guardrails based on real-world edge cases and failures
  - Optimize for both security and user experience, and adjust guardrails as the agent evolves
- The Agents SDK treats guardrails as a first-class concept and follows an optimistic execution model by default: the primary agent proactively generates output while guardrails run concurrently, and a guardrail raises an exception if a constraint is violated
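The optimistic execution idea can be sketched with asyncio: the agent and a guardrail start at the same time, and a guardrail violation cancels the agent's work. This is a conceptual sketch, not the Agents SDK API; `GuardrailTripwire`, `agent_response`, and `relevance_guardrail` are hypothetical names, and a keyword check stands in for a real classifier.

```python
import asyncio


class GuardrailTripwire(Exception):
    """Raised when a guardrail detects a violation."""


async def agent_response(user_input: str) -> str:
    await asyncio.sleep(0.01)  # stands in for model latency
    return f"Answer to: {user_input}"


async def relevance_guardrail(user_input: str) -> None:
    await asyncio.sleep(0)  # stands in for a fast classifier call
    if "empire state building" in user_input.lower():
        raise GuardrailTripwire("off-topic input")


async def run_with_guardrails(user_input: str) -> str:
    # Start both immediately instead of waiting for the guardrail first.
    agent_task = asyncio.create_task(agent_response(user_input))
    guard_task = asyncio.create_task(relevance_guardrail(user_input))
    try:
        await guard_task
    except GuardrailTripwire:
        agent_task.cancel()  # discard the optimistic work
        await asyncio.gather(agent_task, return_exceptions=True)
        raise
    return await agent_task


print(asyncio.run(run_with_guardrails("Where is my order?")))
```

The design trade-off is latency versus wasted work: the user does not wait for the guardrail before generation starts, at the cost of occasionally throwing away a partially generated answer.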
Planning for human intervention
- Human intervention is a core safety mechanism that can improve an agent's real-world performance without degrading the user experience
- It matters most early in deployment, because it helps identify failures, uncover edge cases, and establish a robust evaluation cycle
- Two main triggers for human intervention:
  - Exceeding failure thresholds: set limits on agent retries or actions, and escalate to a human when a limit is exceeded (for example, when the agent still cannot understand the customer's intent after several attempts)
  - High-risk actions: sensitive, irreversible, or high-stakes actions (for example, canceling a user's order, approving a large refund, or processing a payment) require human oversight until confidence in the agent is sufficient
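Both triggers can be combined into one escalation check that runs before each tool call. The sketch below is a minimal illustration: the tool names, risk ratings, and the failure limit are hypothetical, and treating unknown tools as high risk is a design assumption made here, not something the guide prescribes.

```python
from enum import Enum
from typing import Optional


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical risk ratings; real ones would reflect reversibility,
# permissions, and financial impact of each tool.
TOOL_RISK = {
    "lookup_order": Risk.LOW,
    "update_address": Risk.MEDIUM,
    "issue_refund": Risk.HIGH,
}
MAX_FAILED_ATTEMPTS = 3  # hypothetical failure threshold


def escalation_reason(tool_name: str, failed_attempts: int) -> Optional[str]:
    """Return why a human should take over, or None if the agent may proceed."""
    if failed_attempts >= MAX_FAILED_ATTEMPTS:
        return "failure_threshold"
    # Assumption: unrated tools are treated as high risk (fail closed).
    if TOOL_RISK.get(tool_name, Risk.HIGH) is Risk.HIGH:
        return "high_risk_action"
    return None


print(escalation_reason("lookup_order", failed_attempts=0))
print(escalation_reason("issue_refund", failed_attempts=0))
print(escalation_reason("lookup_order", failed_attempts=3))
```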

Conclusion

Agents mark a new era of workflow automation: they can reason through ambiguity, take action through tools, and handle multi-step tasks with a high degree of autonomy.
Unlike simpler LLM applications, agents execute workflows end to end, which makes them well suited to complex decision-making, unstructured data, and brittle rule-based systems.
To build reliable agents: combine a capable model with well-defined tools and clear, structured instructions; use an orchestration pattern that matches the level of complexity; and start with a single agent, expanding to multiple agents only when needed.
Guardrails matter at every stage, from input filtering and tool use to human intervention, and they help ensure that agents operate safely and predictably in production.
Successful deployment is not all-or-nothing; it is an iterative approach that starts small, validates with real users, and grows capabilities over time.

Also available as a PDF
