Analyzing the Structure of a SQL Engine

(dolthub.com)

by GN⁺ | 2025-04-29

The SQL engine is the logical layer of a database, sitting between the client and storage
A SQL engine's main stages are parsing, binding, plan simplification, join exploration and cost evaluation, execution, and result spooling; each is described in detail below
- Parsing turns a SQL query into a structured abstract syntax tree (AST)
- Binding matches the AST's fields to symbols in the current database catalog
- Plan simplification normalizes SQL's many syntactic forms into a simpler canonical form, which speeds up execution
- Plan exploration searches variants of joins, aggregates, and window functions to find the optimal execution plan
- Execution and result spooling turn the final plan into an executable form and return the results to the client

SQL Engine Overview

The SQL engine is a logical intermediary layer between client requests and data storage
Main stages
- Parsing: converting the query into an AST (abstract syntax tree)
- Binding: linking the AST's identifiers to symbols in the database catalog
- Plan Simplification: reducing the many forms of SQL syntax to a standardized plan form
- Join Exploration & Costing: exploring different join orders and evaluating their cost
- Execution: running the query using the optimal execution plan
- Spooling Results: returning the results to the client

Parsing

Parsing is the process of tokenizing the input query and turning it into an AST
A right-recursive parser is easier to understand and debug, but uses more stack memory
A left-recursive parser (Yacc-based) is memory-efficient, but its logic is more complex
Dolt uses a left-recursive parser for faster parsing
When parsing succeeds, the resulting AST's structure mirrors the Yacc grammar rules
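
The difference between the two recursion styles shows up even in a tiny grammar. The Python sketch below is illustrative only and is not Dolt's actual Yacc-based parser: it tokenizes an arithmetic string and builds an AST for chained subtraction twice. The hypothetical `parse_right` follows a right-recursive rule, so it adds one stack frame per operator and produces a right-leaning tree; the hypothetical `parse_left` reduces a left-recursive rule in a loop, loosely mimicking how a shift-reduce (Yacc-style) parser keeps its stack flat and produces a left-associative tree.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]

# A token is a run of digits or any single non-space character (an operator).
TOKEN_RE = re.compile(r"\s*(\d+|\S)")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def parse_right(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    """Right-recursive rule: expr := NUM '-' expr | NUM.
    Recurses once per operator, so stack depth grows with input length."""
    left = Num(int(tokens[pos]))
    pos += 1
    if pos < len(tokens) and tokens[pos] == "-":
        right, pos = parse_right(tokens, pos + 1)
        return BinOp("-", left, right), pos
    return left, pos


def parse_left(tokens: List[str]) -> Node:
    """Left-recursive rule: expr := expr '-' NUM | NUM.
    Each reduction folds the tree built so far into a new left child,
    so the loop runs in constant stack space."""
    node: Node = Num(int(tokens[0]))
    pos = 1
    while pos < len(tokens) and tokens[pos] == "-":
        node = BinOp("-", node, Num(int(tokens[pos + 1])))
        pos += 2
    return node


def evaluate(node: Node) -> int:
    # Only '-' is supported in this sketch.
    if isinstance(node, Num):
        return node.value
    return evaluate(node.left) - evaluate(node.right)


tokens = tokenize("10 - 4 - 3")
right_tree, _ = parse_right(tokens)
left_tree = parse_left(tokens)
print(evaluate(right_tree))  # 10 - (4 - 3) = 9
print(evaluate(left_tree))   # (10 - 4) - 3 = 3
```

Only the left-leaning tree gives the conventional left-to-right meaning of `10 - 4 - 3`.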

Binding

Binding is the process of linking the AST's fields to the database's actual table and column symbols
Key concepts
- Table definition: acts as a data source
- Column definition: refers to a specific column in a table source
- Alias: a scalar value that acts as both a source and a sink (it is defined once and can then be referenced by name)
- Scalar subquery: binds names while sharing its parent's scope
Binding produces execution plan nodes in the sql.Node format
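
A minimal sketch of scope-based name resolution, assuming a hypothetical `Scope` class that maps table names to column lists and links to its parent scope. A scalar subquery gets its own scope whose parent is the outer query, so an unqualified name is looked up innermost-first and a correlated reference falls through to the parent. This illustrates the idea only; it is not Dolt's binder, which produces `sql.Node` trees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Scope:
    tables: Dict[str, List[str]]  # table name -> column names visible in this scope
    parent: Optional["Scope"] = None

    def resolve(self, column: str) -> Tuple[str, str, int]:
        """Bind an unqualified column name.

        Returns (table, column, depth), where depth counts how many scopes
        outward the match was found (0 = current scope, >0 = correlated).
        """
        scope: Optional[Scope] = self
        depth = 0
        while scope is not None:
            matches = [t for t, cols in scope.tables.items() if column in cols]
            if len(matches) > 1:
                raise ValueError(f"ambiguous column: {column}")
            if matches:
                return matches[0], column, depth
            scope = scope.parent
            depth += 1
        raise ValueError(f"unknown column: {column}")


# SELECT id FROM orders
# WHERE total > (SELECT credit FROM customers WHERE id = customer_id)
outer = Scope({"orders": ["id", "customer_id", "total"]})
subquery = Scope({"customers": ["id", "name", "credit"]}, parent=outer)

print(subquery.resolve("id"))           # ('customers', 'id', 0): inner scope wins
print(subquery.resolve("customer_id"))  # ('orders', 'customer_id', 1): correlated reference
```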

Plan Simplifications

The process of rewriting different SQL expressions into a normalized form that is easier to optimize for execution
Key optimizations
- Filter pushdown: discarding unneeded rows as early as possible
- Column pruning: dropping columns that are not needed
Transformations such as subquery decorrelation, which rewrite subqueries as joins, also make those parts of the query available to join-plan optimization
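
Filter pushdown can be sketched as a tree rewrite. The plan node classes below (`Table`, `Filter`, `Join`) are hypothetical stand-ins, and each predicate is reduced to the set of qualified column names it references. If every referenced column comes from one side of a join, the filter moves below the join onto that side, so fewer rows reach the join; otherwise it stays where it is.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Table:
    name: str
    columns: FrozenSet[str]  # qualified names, e.g. "a.x"


@dataclass(frozen=True)
class Filter:
    cols: FrozenSet[str]  # columns the predicate references
    text: str             # predicate text, for display only
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Table, Filter, Join]


def output_cols(plan: Plan) -> FrozenSet[str]:
    if isinstance(plan, Table):
        return plan.columns
    if isinstance(plan, Filter):
        return output_cols(plan.child)
    return output_cols(plan.left) | output_cols(plan.right)


def push_filters(plan: Plan) -> Plan:
    """Move each filter as close as possible to the tables it references."""
    if isinstance(plan, Join):
        return Join(push_filters(plan.left), push_filters(plan.right))
    if isinstance(plan, Filter):
        child = push_filters(plan.child)
        if isinstance(child, Join):
            if plan.cols <= output_cols(child.left):
                pushed = push_filters(Filter(plan.cols, plan.text, child.left))
                return Join(pushed, child.right)
            if plan.cols <= output_cols(child.right):
                pushed = push_filters(Filter(plan.cols, plan.text, child.right))
                return Join(child.left, pushed)
        return Filter(plan.cols, plan.text, child)
    return plan


a = Table("a", frozenset({"a.id", "a.x"}))
b = Table("b", frozenset({"b.id", "b.y"}))
plan = Filter(frozenset({"a.x"}), "a.x > 5", Join(a, b))
print(push_filters(plan))  # the filter now sits directly above table a
```

A predicate such as `a.id = b.id` references both sides, so this sketch leaves it above the join.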

Type Coercion

Type coercion is the automatic conversion of an expression's type to fit its context
The resulting type can differ by context, for example in a WHERE clause versus an INSERT
Dolt is gradually moving type-conversion handling into the binding stage
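
To make context-dependent coercion concrete, here is a deliberately simplified sketch. The `coerce` function and its rules are assumptions chosen for illustration, not Dolt's or MySQL's exact semantics. In an INSERT the string must convert cleanly to the column's integer type or the conversion fails. In a WHERE comparison the string is converted to a number leniently, with non-numeric text treated as 0, loosely inspired by how MySQL compares strings with numbers.

```python
from enum import Enum
from typing import Union

Value = Union[int, float, str]


class Context(Enum):
    WHERE = "where"    # value compared against a column in a filter
    INSERT = "insert"  # value stored into a typed column


def coerce(value: Value, column_type: type, context: Context) -> Value:
    """Convert value to fit a column of column_type (int or str) in the given context.

    Simplified, assumed rules for illustration only.
    """
    if type(value) is column_type:
        return value
    if column_type is str:
        return str(value)
    if column_type is int and isinstance(value, str):
        if context is Context.INSERT:
            return int(value)  # strict: '12abc' or '1.5' raises ValueError
        try:
            return float(value)  # lenient comparison: '1.5' stays fractional
        except ValueError:
            return 0.0  # non-numeric text compares as 0
    raise TypeError(f"cannot coerce {value!r} to {column_type.__name__}")


print(coerce("42", int, Context.INSERT))  # 42
print(coerce("1.5", int, Context.WHERE))  # 1.5
```

With these assumed rules, `'1.5'` is accepted in a WHERE comparison but rejected by an INSERT into an integer column.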

Plan Exploration

Join exploration is the process of generating and evaluating different join orders
Two exploration strategies
- Top-down backtracking: explores only valid join orders
- Bottom-up dynamic programming (DP): tries all combinations to find the optimal join order
A Memo structure is used to manage intermediate states efficiently
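
The bottom-up DP strategy can be sketched in a few dozen lines. This is a textbook subset-enumeration DP, not Dolt's implementation: the cost model (sum of estimated intermediate row counts), the table row counts, and the pairwise selectivities are all assumptions. The `memo` dictionary, keyed by a set of tables, keeps only the cheapest plan per set; it loosely echoes the role of the Memo, which in a real optimizer groups equivalent expressions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple, Union

Plan = Union[str, tuple]  # a table name, or a nested (left, right) pair


def cross_selectivity(left: FrozenSet[str], right: FrozenSet[str],
                      sel: Dict[FrozenSet[str], float]) -> float:
    """Product of join-predicate selectivities between the two sides
    (1.0 for a pair with no predicate, i.e. a cross product)."""
    s = 1.0
    for a in left:
        for b in right:
            s *= sel.get(frozenset((a, b)), 1.0)
    return s


def best_join_order(card: Dict[str, float],
                    sel: Dict[FrozenSet[str], float]) -> Tuple[float, float, Plan]:
    """Return (cost, estimated rows, plan) for the cheapest join of all tables."""
    tables = sorted(card)
    memo: Dict[FrozenSet[str], Tuple[float, float, Plan]] = {
        frozenset([t]): (0.0, card[t], t) for t in tables
    }
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            full = frozenset(subset)
            best = None
            # Try every split into two non-empty halves already solved in memo.
            for k in range(1, size):
                for left_tables in combinations(subset, k):
                    left = frozenset(left_tables)
                    right = full - left
                    lcost, lrows, lplan = memo[left]
                    rcost, rrows, rplan = memo[right]
                    rows = lrows * rrows * cross_selectivity(left, right, sel)
                    cost = lcost + rcost + rows
                    if best is None or cost < best[0]:
                        best = (cost, rows, (lplan, rplan))
            memo[full] = best
    return memo[frozenset(tables)]


card = {"a": 1000, "b": 10, "c": 50}
sel = {frozenset(("a", "b")): 0.001, frozenset(("b", "c")): 0.1}
print(best_join_order(card, sel))  # joins a with b first, avoiding the a x c cross product
```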

Functional Dependencies

For joins of 5 or more tables, the cost of join exploration can grow rapidly
Exploiting 1:1 relationships, such as joins on a primary key (PK), can reduce exploration cost
LOOKUP_JOIN is given priority when applying this optimization
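
A quick count shows why exploration gets expensive around five tables. Ignoring operator choice and allowing cross products, n tables can be joined in n! left-deep orders and in (2n-2)!/(n-1)! binary join trees of any shape. This is a standard combinatorial count used here for illustration, not a figure from the Dolt post.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Join orders in which each join adds one base table: n!."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Binary join trees with n labelled leaves: (2n-2)! / (n-1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in range(2, 8):
    print(n, left_deep_orders(n), bushy_trees(n))
```

Going from four to five tables multiplies the unconstrained bushy search space by 14 (from 120 to 1680 trees), which is why shortcuts based on 1:1 relationships such as primary-key joins pay off.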

IR Intermission

A recap of the three IR stages so far
- AST: organizes the tokens
- Scope binding: verifies column references
- Memo: the representation used for join exploration and cost evaluation

Join Costing

Join costing is the process of estimating the execution cost of every candidate plan
Cost factors
- Input table sizes
- Result table size
- Join operator type (LOOKUP_JOIN, HASH_JOIN, etc.)
Dolt evaluates costs using precise table statistics (histograms)
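
The sketch below combines the two ideas: a histogram-based row estimate feeding a choice between join operators. The bucket layout, the uniform-within-bucket assumption, and both cost formulas (about log2(n) steps of index probing per outer row for a lookup join, one pass over each input for a hash join) are illustrative assumptions, not Dolt's cost model.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float, int]  # (lower bound, upper bound, row count)


def estimate_less_than(buckets: List[Bucket], value: float) -> float:
    """Estimate rows with col < value, assuming values are uniform within a bucket."""
    total = 0.0
    for lower, upper, rows in buckets:
        if value >= upper:
            total += rows
        elif value > lower:
            total += rows * (value - lower) / (upper - lower)
    return total


def lookup_join_cost(outer_rows: float, inner_rows: float) -> float:
    # One index probe (about log2(n) steps) into the inner table per outer row.
    return outer_rows * math.log2(max(inner_rows, 2))


def hash_join_cost(outer_rows: float, inner_rows: float) -> float:
    # Build a hash table on one input, then stream the other input through it.
    return outer_rows + inner_rows


def choose_join(outer_rows: float, inner_rows: float) -> str:
    costs = {
        "LOOKUP_JOIN": lookup_join_cost(outer_rows, inner_rows),
        "HASH_JOIN": hash_join_cost(outer_rows, inner_rows),
    }
    return min(costs, key=lambda name: costs[name])


# Hypothetical histogram: 200,000 rows split across two buckets.
hist = [(0.0, 100.0, 100_000), (100.0, 200.0, 100_000)]
selective = estimate_less_than(hist, 1.0)  # 1,000 rows pass the filter
broad = estimate_less_than(hist, 150.0)    # 150,000 rows pass the filter
print(choose_join(selective, 1_000_000))   # LOOKUP_JOIN
print(choose_join(broad, 1_000_000))       # HASH_JOIN
```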

Join Hints

The engine tries to give priority to the join strategy requested in user-supplied hints
Hints that are contradictory or not applicable are ignored
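
One simple way to honor hints while ignoring conflicting ones is to treat each hint as a filter over the candidate plans and skip any hint that would leave no candidates. The `Plan` record, the predicate-style hints, and the rule that earlier hints take precedence are all assumptions for this sketch; Dolt's actual hint syntax and handling may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Plan:
    order: Tuple[str, ...]  # join order, outermost table first
    operator: str           # e.g. "HASH_JOIN" or "LOOKUP_JOIN"
    cost: float


Hint = Callable[[Plan], bool]


def choose_plan(plans: Sequence[Plan], hints: Sequence[Hint] = ()) -> Plan:
    """Apply hints in order as filters, skipping any that would eliminate every candidate."""
    candidates: List[Plan] = list(plans)
    for hint in hints:
        narrowed = [p for p in candidates if hint(p)]
        if narrowed:
            candidates = narrowed  # compatible with earlier hints: apply it
        # otherwise the hint conflicts or cannot be satisfied, so it is ignored
    return min(candidates, key=lambda p: p.cost)


def wants_lookup(p: Plan) -> bool:
    return p.operator == "LOOKUP_JOIN"


def wants_b_first(p: Plan) -> bool:
    return p.order[0] == "b"


plans = [
    Plan(("a", "b"), "HASH_JOIN", 100.0),
    Plan(("a", "b"), "LOOKUP_JOIN", 150.0),
    Plan(("b", "a"), "HASH_JOIN", 120.0),
]
print(choose_plan(plans))                                 # cheapest: HASH_JOIN on a, b
print(choose_plan(plans, [wants_lookup]))                 # LOOKUP_JOIN despite higher cost
print(choose_plan(plans, [wants_lookup, wants_b_first]))  # second hint conflicts, ignored
```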

Execution

The optimal plan is converted into an actual executable iterator structure (Volcano iterators)
Characteristics
- Non-materializing iterators: return each row as soon as it is produced
- Materializing iterators: collect all rows before returning any
Before execution, column references are converted into index-offset-based lookups
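
Python generators make a compact stand-in for Volcano-style iterators, since each operator pulls rows from its child one at a time. In the sketch below, the hypothetical `filter_rows` and `project` are non-materializing (they yield each row as soon as it is ready), while `sort_rows` is materializing (it must consume its whole input first). Column names are resolved to tuple offsets once, before execution, so the per-row code only indexes by position. The operator names and schema are illustrative.

```python
from itertools import count
from typing import Callable, Iterable, Iterator, Sequence, Tuple

Row = Tuple


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield row


def filter_rows(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    # Non-materializing: each qualifying row is passed up immediately.
    for row in child:
        if pred(row):
            yield row


def sort_rows(child: Iterator[Row], key: Callable[[Row], object]) -> Iterator[Row]:
    # Materializing: nothing is returned until every input row is buffered.
    yield from sorted(child, key=key)


def project(child: Iterator[Row], offsets: Sequence[int]) -> Iterator[Row]:
    for row in child:
        yield tuple(row[i] for i in offsets)


# Bind column names to offsets once, before execution.
schema = ["id", "name", "age"]
offset = {name: i for i, name in enumerate(schema)}
AGE, NAME = offset["age"], offset["name"]

people = [(1, "ann", 34), (2, "bob", 19), (3, "cy", 27)]
plan = project(
    sort_rows(filter_rows(scan(people), lambda r: r[AGE] > 20), key=lambda r: r[AGE]),
    [NAME],
)
print(list(plan))  # [('cy',), ('ann',)]

# A non-materializing pipeline can even run over an unbounded input.
evens = filter_rows(((i,) for i in count()), lambda r: r[0] % 2 == 0)
print(next(evens), next(evens))  # (0,) (2,)
```

Putting `sort_rows` over an unbounded input would never return, which is the practical difference between the two iterator kinds.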

I/O and Result Spooling

Execution results are converted into the MySQL protocol format and sent to the client
In some cases, results are read directly from the key-value (KV) storage layer as an optimization
Batch processing and buffer reuse improve throughput and memory efficiency
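
For a feel of what the spooling step writes, here is a sketch of framing rows in the MySQL text result-set format. In that protocol each packet starts with a 3-byte little-endian payload length and a 1-byte sequence id, and each column value in a text row is a length-encoded string, with NULL sent as the single byte 0xFB. The `spool` function, its `batch_size` parameter, and the `send` callback are hypothetical; they illustrate reusing buffers and flushing several packets per write. The sketch skips column-definition packets, the end-of-result packet, and splitting payloads of 2^24 - 1 bytes or more.

```python
import struct
from typing import Callable, Iterable, Optional, Sequence


def lenenc_int(n: int) -> bytes:
    """MySQL length-encoded integer."""
    if n < 251:
        return bytes([n])
    if n < 1 << 16:
        return b"\xfc" + struct.pack("<H", n)
    if n < 1 << 24:
        return b"\xfd" + struct.pack("<I", n)[:3]
    return b"\xfe" + struct.pack("<Q", n)


def encode_text_row(row: Sequence[Optional[object]], buf: bytearray) -> None:
    """Append one text-protocol row payload to buf."""
    for value in row:
        if value is None:
            buf.append(0xFB)  # NULL marker
        else:
            data = str(value).encode("utf-8")  # simplification: real type formatting varies
            buf += lenenc_int(len(data))
            buf += data


def spool(rows: Iterable[Sequence[Optional[object]]], batch_size: int,
          send: Callable[[bytes], None], seq: int = 0) -> int:
    """Frame each row as a packet and flush every batch_size packets with one send().

    The payload and output buffers are cleared and reused rather than reallocated.
    Returns the next sequence id.
    """
    out = bytearray()
    payload = bytearray()
    pending = 0
    for row in rows:
        payload.clear()
        encode_text_row(row, payload)
        out += struct.pack("<I", len(payload))[:3]  # 3-byte payload length
        out.append(seq & 0xFF)                       # 1-byte sequence id
        out += payload
        seq += 1
        pending += 1
        if pending == batch_size:
            send(bytes(out))
            out.clear()
            pending = 0
    if out:
        send(bytes(out))
    return seq


sent = []
spool([(1, "ann"), (2, None), (3, "cy")], batch_size=2, send=sent.append)
print(len(sent))  # 2 writes for 3 rows
```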

Future

Dolt fundamentally uses row-based execution on a local server
A three-stage intermediate representation (IR), namely the AST, scope-based binding, and the Memo used for join exploration, is used to build the optimal execution plan
Join search and join costing determine the optimal join strategy
Future plans include performance improvements through IR integration and memory-reuse optimizations
