Compiling Python to Run Anywhere

(blog.codingconfessions.com)

13 points by GN⁺ 2025-09-30 | 1 comment

This article walks through, with examples, the attempt and design behind ahead-of-time (AOT) compiling pure Python code into cross-platform executables
The core idea: instead of building a new JIT or rewriting in C++, produce optimized kernels through a pipeline of symbolic tracing → IR → C++ code generation → multi-target compilation
PEP 484 type annotations seed type propagation, and AI code generation automatically implements hundreds of C++ operators to cover a broad range of library calls from NumPy, OpenCV, PyTorch, and others
For the same Python function, many implementation paths are generated and deployed at scale, and measured telemetry picks the fastest variant; this is an empirical performance optimization strategy
The goal is to deliver small, fast, portable binaries that do not depend on containers, making them a deployment unit that runs anywhere: server, desktop, mobile, and web

Foreword

Python's simplicity and productivity are its strengths, but it hits performance and portability limits on heavy workloads
This article by guest author Yusuf Olokoba introduces a compiler design that keeps the original Python while producing fast, portable executables
The approach achieves kernel optimization by building a pipeline, without adding a JIT or doing a full C++ rewrite

Introduction

The goal is to fully AOT-compile unmodified Python so that it runs without an interpreter, approaches C/C++ speed, and runs on every platform
Unlike existing efforts such as Jython, RustPython, Numba, PyTorch, and Mojo, it opts for code transformation and kernel generation rather than replacing the language or runtime
These compiled Python functions are already in use on thousands of devices every month

Containers Are the Wrong Way to Distribute AI

In real deployments, containers carry an excessive payload (interpreter, packages, OS snapshot), which causes startup delays and portability constraints
The alternative is a self-contained executable holding only the model, which offers a small footprint, fast startup, and the ability to run broadly across server, desktop, mobile, and web
The essential point is to change the deployment unit from an OS snapshot to a self-executing binary

Arm64, Apple, and Unity: How It All Began

During Apple's arm64 transition, Unity used IL2CPP to convert CIL to C++ and compile it for every target; this example served as the reference model
The vision was to apply the same thinking to Python and obtain code paths that can run anywhere

Sketching Out a Python Compiler

The high-level design consists of these stages: Python input → symbolic trace (IR) → C++ generation → multi-target compilation
C++ was chosen as the intermediate output, rather than going from IR straight to object code, in order to make maximum use of acceleration paths such as CUDA, MLX, TensorRT, and AMX
The aim is an extensible design in which hardware-specific optimal paths are easy to add

Building a Symbolic Tracer for Python

Early tracing based on PyTorch FX had obstacles: it required executing the code and was limited to PyTorch operations
Instead, a symbolic tracer based on AST parsing was built, which converts control flow and call resolution into IR
The current tracer offers static analysis, partial evaluation, and sandbox-based live value inspection, among other features
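
To make the AST-based idea concrete, here is a minimal sketch of a tracer that reads a function's source with Python's standard `ast` module and emits a flat list of operations without ever running the code. The IR shape (`(target, op, inputs)` tuples), the `%t` temporary names, and the `trace_source` helper are assumptions made for illustration; the article's tracer also handles control flow and call resolution, which this sketch does not.

```python
import ast
import textwrap


def trace_source(source: str) -> list[tuple[str, str, tuple[str, ...]]]:
    """Trace one straight-line function into (target, op, inputs) tuples.

    The source is only parsed, never executed.
    """
    tree = ast.parse(textwrap.dedent(source))
    func = tree.body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")

    # Function parameters become the first IR entries.
    ir = [(arg.arg, "input", ()) for arg in func.args.args]
    counter = 0

    def fresh() -> str:
        nonlocal counter
        counter += 1
        return f"%t{counter}"

    def visit(node: ast.expr) -> str:
        """Emit IR for an expression and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            name = fresh()
            ir.append((name, f"const:{node.value!r}", ()))
            return name
        if isinstance(node, ast.BinOp):
            left, right = visit(node.left), visit(node.right)
            name = fresh()
            ir.append((name, type(node.op).__name__.lower(), (left, right)))
            return name
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = tuple(visit(a) for a in node.args)
            name = fresh()
            ir.append((name, f"call:{node.func.id}", args))
            return name
        raise NotImplementedError(ast.dump(node))

    for stmt in func.body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            ir.append((stmt.targets[0].id, "alias", (visit(stmt.value),)))
        elif isinstance(stmt, ast.Return) and stmt.value is not None:
            ir.append(("return", "return", (visit(stmt.value),)))
        else:
            raise NotImplementedError(type(stmt).__name__)
    return ir


SOURCE = """
def area(width, height):
    scaled = width * 2
    return scaled * height
"""

IR = trace_source(SOURCE)
for entry in IR:
    print(entry)
```

Because the function is parsed rather than called, the tracer needs no sample inputs and is not tied to any one library's operations, which are the two limitations attributed to the FX-based approach above.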

Lowering to C++ via Type Propagation

Type propagation bridges Python's dynamic typing and C++'s static typing
Given the input argument types, the types of intermediate variables can be inferred deterministically from the operator definitions
Each Python operation is mapped to its corresponding C++ implementation, and types are propagated through the whole function
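
The inference step can be sketched in a few lines. The snippet below assumes a tiny, made-up signature table (`SIGNATURES`) and a hand-written IR in `(target, op, inputs)` form; neither comes from the article. It shows how every intermediate type follows mechanically once the input types are known.

```python
# Hypothetical operator signatures: (op, input types) -> output type.
SIGNATURES = {
    ("mul", ("int", "int")): "int",
    ("mul", ("float", "float")): "float",
    ("mul", ("int", "float")): "float",
    ("mul", ("float", "int")): "float",
    ("sqrt", ("int",)): "float",
    ("sqrt", ("float",)): "float",
}


def propagate_types(ir, input_types):
    """Walk the IR in order and assign a type to every intermediate value."""
    types = dict(input_types)
    for target, op, inputs in ir:
        key = (op, tuple(types[name] for name in inputs))
        if key not in SIGNATURES:
            # No matching operator overload: the function cannot be lowered.
            raise TypeError(f"no operator for {key}")
        types[target] = SIGNATURES[key]
    return types


# Illustrative IR for: out = sqrt((x * x) * scale)
IR = [
    ("t1", "mul", ("x", "x")),
    ("t2", "mul", ("t1", "scale")),
    ("out", "sqrt", ("t2",)),
]

TYPES = propagate_types(IR, {"x": "int", "scale": "float"})
print(TYPES)
```

A missing table entry surfaces as an error at compile time rather than at run time, which is the practical benefit of resolving types before emitting C++.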

Seeding the Type Propagation Process

PEP 484 type annotations are used to start type propagation
This conflicts somewhat with the principle of not changing the source code, but it is considered an acceptable compromise for a concise interface and compatibility
In addition, constraints such as restricting the number of types allowed in function signatures are imposed to ensure a simple consumer interface
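
As a rough illustration of seeding, the sketch below reads a function's annotations with the standard `typing.get_type_hints` and `inspect.signature` and rejects parameters that have none. The `seed_types` helper and the example function are hypothetical; the article does not describe how its compiler reads annotations internally.

```python
import inspect
from typing import get_type_hints


def seed_types(func):
    """Return {parameter name: annotated type}, failing on unannotated parameters."""
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    missing = [name for name in params if name not in hints]
    if missing:
        raise TypeError(f"unannotated parameters: {missing}")
    return {name: hints[name] for name in params}


def hypotenuse(a: float, b: float) -> float:
    return (a * a + b * b) ** 0.5


SEED = seed_types(hypotenuse)
print(SEED)
```

Only the parameter types are needed as the seed; the return type can be checked against whatever propagation infers.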

Building a Library of C++ Operators

Not every function is implemented directly in C++; only untraceable leaf operations need to be implemented, manually or automatically
Most Python code is built from combinations of a few basic operations, so the set of operators to cover is relatively small
LLM-based code generation, backed by constraints, tests, and conditional compilation infrastructure, automates the implementation of hundreds of functions from NumPy, OpenCV, PyTorch, and others
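
One way to picture an operator library is as a lookup table from a leaf operation and its argument types to a C++ snippet. The sketch below is a toy under that assumption: the `OPERATORS` table, the `emit_cpp` helper, and the choice of C++ `float` for Python `float` are all invented for illustration and are not the article's actual operators or generated code.

```python
# Hypothetical leaf-operator table:
# (op, argument types) -> (C++ result type, C++ expression template).
OPERATORS = {
    ("mul", ("float", "float")): ("float", "{0} * {1}"),
    ("add", ("float", "float")): ("float", "{0} + {1}"),
    ("sqrt", ("float",)): ("float", "std::sqrt({0})"),
}


def emit_cpp(name, params, ir):
    """Emit a C++ function from typed parameters and (target, op, inputs) IR."""
    types = dict(params)
    lines = []
    for target, op, inputs in ir:
        key = (op, tuple(types[i] for i in inputs))
        result_type, template = OPERATORS[key]  # KeyError means: operator not covered yet
        types[target] = result_type
        lines.append(f"    {result_type} {target} = {template.format(*inputs)};")
    signature = ", ".join(f"{ctype} {pname}" for pname, ctype in params)
    result = ir[-1][0]
    body = "\n".join(lines)
    return (
        "#include <cmath>\n\n"
        f"{types[result]} {name}({signature}) {{\n"
        f"{body}\n"
        f"    return {result};\n"
        "}\n"
    )


# Illustrative IR for: sqrt(a * a + b * b)
IR = [
    ("aa", "mul", ("a", "a")),
    ("bb", "mul", ("b", "b")),
    ("s", "add", ("aa", "bb")),
    ("out", "sqrt", ("s",)),
]

CPP = emit_cpp("hypotenuse", [("a", "float"), ("b", "float")], IR)
print(CPP)
```

The whole function is composed from three table entries, which mirrors the point above that a small set of leaf operators can cover a large amount of Python code.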

Performance Optimization via Exhaustive Search

Based on the view that performance optimization is always empirical, many implementation variants are produced and the best one is chosen from real measurements
Example: on Apple Silicon, even for resize alone, several paths such as Accelerate, vImage, Core Image, and Metal are generated, and multiple binaries with the same functionality are deployed
Fine-grained telemetry collects the latency of each path, and a statistical model predicts and selects the fastest variant
From the user's perspective, the effect is that execution gets faster on its own over time
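
The selection step can be sketched as picking the variant with the lowest typical latency once enough samples have arrived. The snippet below uses a plain median as a stand-in for the statistical model mentioned above, which the article does not specify; the `pick_fastest` helper, the `min_samples` threshold, and every latency number are made up for illustration and are not measurements.

```python
from statistics import median


def pick_fastest(telemetry, min_samples=3):
    """Return the variant with the lowest median latency.

    Variants with fewer than `min_samples` samples are ignored as unreliable.
    """
    candidates = {
        variant: median(samples)
        for variant, samples in telemetry.items()
        if len(samples) >= min_samples
    }
    if not candidates:
        raise ValueError("not enough telemetry to choose a variant")
    return min(candidates, key=candidates.get)


# Invented latencies in milliseconds, purely illustrative.
TELEMETRY = {
    "accelerate": [4.1, 3.9, 4.0, 9.5],  # one outlier; the median damps it
    "vimage": [3.2, 3.4, 3.1],
    "metal": [2.0, 2.2],  # too few samples to trust yet
}

BEST = pick_fastest(TELEMETRY)
print(BEST)
```

As more samples arrive for a variant such as `metal` in this toy data, it becomes eligible and may displace the current choice, which is how the experience can get faster over time without any change on the user's side.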

Designing a User Interface for the Compiler

To keep the developer experience close to a zero learning curve, the PEP 318 decorator @compile was adopted as the interface
The CLI uses this decorator as the entry point to discover and compile the dependency code graph
The decorator takes tag, description, sandbox, and metadata as arguments, supporting environment reproduction and backend selection (ONNXRuntime, TensorRT, CoreML, IREE, QNN, and others)
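
To show why a decorator keeps the learning curve low, here is a stand-in that only records metadata and returns the function unchanged. It is deliberately named `compile_sketch` because it is not the article's `@compile`: the `REGISTRY` dictionary, the `CompileSpec` class, the example tag, and the omission of the sandbox argument are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical registry that a CLI could scan to find compilation entry points.
REGISTRY: dict[str, "CompileSpec"] = {}


@dataclass
class CompileSpec:
    tag: str
    description: str
    func: Callable
    metadata: list = field(default_factory=list)


def compile_sketch(tag: str, description: str, metadata=None):
    """Stand-in decorator: record the function as an entry point, leave it untouched."""
    def decorator(func: Callable) -> Callable:
        REGISTRY[tag] = CompileSpec(tag, description, func, list(metadata or []))
        return func
    return decorator


@compile_sketch(tag="@demo/greet", description="Return a greeting.")
def greet(name: str) -> str:
    return f"Hello, {name}!"


print(greet("Ada"))
print(sorted(REGISTRY))
```

The decorated function still runs as ordinary Python, so existing code and tests keep working while a separate tool reads the registered entry points.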

Closing Thoughts

Features such as exceptions, lambdas, recursion, and classes are currently partially supported or unsupported, and type propagation in particular needs to be extended for complex types and higher-order types
Debugging is also a challenge, because optimized compilation reduces symbol information and makes tracing harder
C++20's std::span, concepts, and coroutines are its main foundation, while C++23's std::generator, <stdfloat>, and <stacktrace> will later contribute to streaming, half/bfloat16, and exception tracing respectively
The ultimate goal is to establish a deployment unit that runs anywhere for AI workloads such as embedding and detection, as a small, fast, secure executable without containers

1 comment

secret3056 2025-09-30

I thought this was something like APE, but it isn't.
