GPT-5.3-Codex जारी

(openai.com)

7 पॉइंट द्वारा GN⁺ 2026-02-06 | अभी कोई टिप्पणी नहीं है. | WhatsApp पर शेयर करें

GPT-5.2-Codex की coding performance और GPT-5.2 की reasoning·domain knowledge को एक ही मॉडल में जोड़कर, 25% अधिक तेज़ गति प्रदान करता है
Codex के agentic work scope को long-running tasks तक बढ़ाता है, और काम के बीच भी दिशा बदलने व feedback को शामिल करने वाली real-time interactive collaboration को Codex app में एकीकृत करता है
अपने ही training process की debugging, deployment management और evaluation diagnosis में शुरुआती संस्करण का सीधे उपयोग किया गया पहला self-development-participating model
SWE-Bench Pro, Terminal-Bench 2.0, OSWorld जैसे प्रमुख benchmarks में उद्योग का सर्वोच्च प्रदर्शन दर्ज किया, और पिछले मॉडल की तुलना में कम tokens में काम करता है
code writing से आगे बढ़कर presentation, spreadsheet, data analysis जैसे पूरे software lifecycle के knowledge work को support करता है

अवलोकन

GPT-5.3-Codex को “सबसे सक्षम agentic coding model” बताया गया है
GPT-5.2-Codex की frontier coding performance और GPT-5.2 की reasoning·domain knowledge क्षमताओं को एक मॉडल में जोड़कर, गति में 25% सुधार किया गया
research, tool use और complex execution वाले long-running tasks के लिए डिज़ाइन किया गया, और काम के दौरान भी context खोए बिना समन्वय कर सकता है
शुरुआती संस्करण का उपयोग करके self-training·deployment·evaluation diagnosis में भाग लेते हुए “अपने ही development को accelerate” करने वाला पहला मॉडल
Codex की भूमिका को “code writing·review” से बढ़ाकर “कंप्यूटर पर developer·expert द्वारा किए जाने वाले लगभग सभी काम” तक विस्तारित किया गया

Frontier agentic capabilities

Coding performance
- SWE-Bench Pro (वास्तविक software engineering evaluation) में सर्वोच्च प्रदर्शन हासिल किया; यह benchmark Python-only SWE-Bench Verified के विपरीत 4 भाषाओं को कवर करता है, contamination resistance अधिक है और industry relevance भी अधिक है
- Terminal-Bench 2.0 में भी पिछले सर्वोच्च प्रदर्शन को काफ़ी पीछे छोड़ा; यह coding agents के लिए आवश्यक terminal skills को मापता है
- पिछले मॉडल की तुलना में कम tokens के साथ वही काम कर सकता है
Web development
- frontier coding capability, बेहतर aesthetic sense, और compression techniques के संयोजन से complex games और apps को कई दिनों में scratch से बनाया जा सकता है
- web development और long-term agentic capability testing के लिए दो गेम बनाए गए: racing game और diving game
  - racing game: कई racers, 8 maps, और spacebar से इस्तेमाल होने वाले items शामिल
  - diving game: विभिन्न coral reefs की खोज, fish catalog collection, oxygen·water pressure·hazards management
- "develop web game" skill और "fix the bug", "improve the game" जैसे preconfigured generic follow-up prompts का उपयोग करके लाखों tokens तक autonomously iterative improvement किया
- रोज़मर्रा की websites बनाते समय GPT-5.2-Codex की तुलना में user intent को बेहतर समझता है, और simple या insufficient prompts पर भी ज़्यादा features और sensible defaults अपने आप लागू करता है
- landing page comparison उदाहरण: GPT-5.3-Codex discounted monthly pricing के साथ annual plan को अपने आप दिखाता है, और 3 user quotes वाले auto-converting recommendation carousel बनाकर अधिक polished output देता है
Coding के बाहर की capabilities
- software engineers, designers, product managers, data scientists द्वारा किए जाने वाले debugging, deployment, monitoring, PRD writing, copy editing, user research, testing, metric analysis जैसे पूरे software lifecycle को support करता है
- slide deck creation, sheet data analysis जैसे software के बाहर के क्षेत्रों तक भी विस्तार
- GDPval (44 job categories के well-defined knowledge work tasks को मापने वाला evaluation) में GPT-5.2 के बराबर 70.9% हासिल किया
  - इसमें presentation, spreadsheet जैसे वास्तविक work outputs शामिल हैं
- financial advisory slides, retail training documents, NPV analysis spreadsheets, fashion presentation PDFs जैसे विभिन्न outputs के उदाहरण दिए गए
- OSWorld-Verified (visual desktop environment में productivity tasks करने वाले agentic computer-use benchmark) में 64.7% हासिल किया, जो पिछले GPT model (38.2%) की तुलना में बड़ी बढ़त है
  - मानव स्कोर लगभग 72% है

Interactive collaborator

मॉडल capability बढ़ने के साथ मुख्य चुनौती यह हो गई है कि agent क्या कर सकता है से अधिक, मानव कितनी आसानी से parallel में काम कर रहे कई agents को निर्देश और supervise कर सकता है
Codex app agent management और instruction को आसान बनाता है, और GPT-5.3-Codex में ज़्यादा interactivity प्रदान करता है
काम के दौरान अहम निर्णयों और progress पर बार-बार updates देता है, जिससे उपयोगकर्ता final result का इंतज़ार किए बिना real time में सवाल पूछ सके, approach पर चर्चा कर सके और दिशा बदल सके
यह बताता है कि वह क्या कर रहा है, feedback का जवाब देता है, और शुरुआत से अंत तक उपयोगकर्ता को loop में बनाए रखता है
setting path: Settings > General > Follow-up behavior में मॉडल के काम के दौरान steering सक्रिय करें

Codex का उपयोग करके GPT-5.3-Codex की training और deployment

OpenAI भर में महीनों से लेकर वर्षों तक चले research projects के आधार पर हाल के तेज़ Codex improvements बनाए गए हैं
OpenAI के कई researchers और engineers का कहना है कि उनका मौजूदा काम करने का तरीका 2 महीने पहले की तुलना में मूल रूप से अलग है
GPT-5.3-Codex के शुरुआती संस्करण ने भी बेहतरीन capability दिखाई, इसलिए टीम ने इसी शुरुआती version का उपयोग बाद के versions की training improvements और deployment support के लिए किया
Research team use cases
- इस release के training run monitoring और debugging में Codex का उपयोग किया गया
- infrastructure issues की debugging से आगे बढ़कर training process के पैटर्न tracking, interaction quality के deep analysis, fix suggestions, और पिछले model के साथ behavior differences को बारीकी से समझने के लिए rich applications बनाए गए
Engineering team use cases
- GPT-5.3-Codex के लिए harness optimization और adaptation में Codex का उपयोग किया गया
- user-impacting अजीब edge cases आने पर Codex से context rendering bugs की पहचान और low cache hit rate के root cause analysis किए गए
- launch period के दौरान traffic spikes से निपटने के लिए GPU cluster dynamic scaling और latency stabilization में लगातार उपयोग किया गया
Alpha test use cases
- एक researcher यह समझना चाहता था कि GPT-5.3-Codex प्रति turn कितना अतिरिक्त काम करता है और productivity difference क्या है
- GPT-5.3-Codex ने clarification questions की आवृत्ति, positive·negative responses, और task progress का अनुमान लगाने वाले सरल regex classifiers कई बनाए, उन्हें पूरे session logs पर बड़े पैमाने पर चलाया, और फिर निष्कर्ष रिपोर्ट तैयार की
- Codex के साथ बनाने वाले लोगों की संतुष्टि अधिक थी; agent user intent को बेहतर समझता था, प्रति turn अधिक progress दिखाता था, और clarification questions कम थे
Data pipeline निर्माण
- alpha test data पिछले models से बहुत अलग था, इसलिए असामान्य और counterintuitive results कई बार सामने आए
- data scientists ने GPT-5.3-Codex के साथ नई data pipeline बनाई, और standard dashboard tools की तुलना में काफ़ी अधिक समृद्ध visualizations किए
- Codex के साथ results का संयुक्त analysis करके, हज़ारों data points से निकले key insights को 3 मिनट के भीतर summarize किया गया

Cybersecurity frontier को सुरक्षित करना

पिछले कुछ महीनों में cybersecurity tasks पर मॉडल का प्रदर्शन अर्थपूर्ण रूप से बेहतर हुआ है, जिससे developers और security experts दोनों को लाभ मिलता है
इसके साथ ही defensive use और broader ecosystem resilience को support करने के लिए मज़बूत cybersecurity safeguards तैयार किए गए
Preparedness Framework के तहत cybersecurity-related tasks के लिए High rating पाने वाला यह पहला मॉडल है, और software vulnerability identification पर सीधे trained होने वाला भी पहला मॉडल है
end-to-end cyberattack automation संभव होने का निर्णायक प्रमाण नहीं है, फिर भी preventive approach अपनाते हुए अब तक का सबसे व्यापक cybersecurity safety stack deploy किया गया है
- safety training, automated monitoring, advanced features के लिए trust-based access, और threat intelligence सहित enforcement pipeline
cybersecurity की मूल dual-use प्रकृति को देखते हुए, defenders की vulnerability discovery·fixing क्षमता को तेज़ करते हुए misuse को धीमा करने वाला evidence-based iterative approach अपनाया गया
Defensive research और ecosystem protection programs
- cybersecurity defense research को accelerate करने के लिए Trusted Access for Cyber pilot program लॉन्च किया गया
- security research agent Aardvark की private beta का विस्तार किया गया; यह Codex Security product family की पहली offering है
- open source maintainers के साथ मिलकर व्यापक रूप से उपयोग किए जाने वाले projects (जैसे Next.js) के लिए free codebase scanning दी जा रही है
  - security researchers ने Codex का उपयोग करके पिछले हफ्ते सार्वजनिक हुई vulnerabilities (CVE-2025-59471, CVE-2025-59472) खोजीं
- 2023 में शुरू हुए 1 million dollar cybersecurity grant program के आधार पर, सबसे शक्तिशाली models के उपयोग से cyber defense को accelerate करने के लिए 10 million dollar API credits का अतिरिक्त निवेश किया गया
  - विशेष रूप से open source software और critical infrastructure systems के लिए
  - good-faith security research में शामिल organizations Cybersecurity Grant Program के माध्यम से API credits और support के लिए आवेदन कर सकती हैं

उपलब्धता और विवरण

GPT-5.3-Codex paid ChatGPT plans में उपलब्ध है, और जहाँ-जहाँ Codex supported है (app, CLI, IDE extension, web) वहाँ इस्तेमाल किया जा सकता है
API access को सुरक्षित रूप से enable करने की तैयारी चल रही है
infrastructure और inference stack improvements की बदौलत Codex users के लिए 25% तेज़ गति से चलाया जा रहा है, जिससे तेज़ interaction और results मिलते हैं
NVIDIA GB200 NVL72 systems पर co-design, training और serving किया गया

आगे की दिशा

Codex code writing से आगे बढ़कर code को tool की तरह इस्तेमाल करते हुए कंप्यूटर को संचालित करने और tasks को शुरू से अंत तक पूरा करने की दिशा में जा रहा है
coding agents की frontier का विस्तार करके software build·deployment के साथ-साथ research, analysis, complex task execution जैसे और व्यापक knowledge work क्षेत्रों को unlock किया जा रहा है
सर्वश्रेष्ठ coding agent से शुरू होकर यह कंप्यूटर पर एक general-purpose collaborator के रूप में विकसित हो रहा है, जिससे क्या बनाया जा सकता है और कौन बना सकता है—दोनों का दायरा बढ़ता है

Appendix: benchmark आँकड़े

सभी evaluations को xhigh reasoning effort के साथ चलाया गया
SWE-Bench Pro(Public): GPT-5.3-Codex 56.8% / GPT-5.2-Codex 56.4% / GPT-5.2 55.6%
Terminal-Bench 2.0: GPT-5.3-Codex 77.3% / GPT-5.2-Codex 64.0% / GPT-5.2 62.2%
OSWorld-Verified: GPT-5.3-Codex 64.7% / GPT-5.2-Codex 38.2% / GPT-5.2 37.9%
GDPval (win या tie): GPT-5.3-Codex 70.9% / GPT-5.2 70.9%(high)
Cybersecurity Capture The Flag Challenges: GPT-5.3-Codex 77.6% / GPT-5.2-Codex 67.4% / GPT-5.2 67.7%
SWE-Lancer IC Diamond: GPT-5.3-Codex 81.4% / GPT-5.2-Codex 76.0% / GPT-5.2 74.6%

GPT-5.3-Codex जारी

अवलोकन

Frontier agentic capabilities

Coding performance

Web development

Coding के बाहर की capabilities

Interactive collaborator

Codex का उपयोग करके GPT-5.3-Codex की training और deployment

Research team use cases

Engineering team use cases

Alpha test use cases

Data pipeline निर्माण

Cybersecurity frontier को सुरक्षित करना

Defensive research और ecosystem protection programs

उपलब्धता और विवरण

आगे की दिशा

Appendix: benchmark आँकड़े

संबंधित पढ़ाई

अभी कोई टिप्पणी नहीं है.