Implementing a ChatGPT-like LLM from scratch, step by step

(github.com/rasbt)

8 points by GN⁺ 2024-01-28 | 1 comment

rasbt/LLMs-from-scratch is a repository containing code to develop, pretrain, and fine-tune a GPT-like LLM; it is the official code repository for the Manning book Build a Large Language Model (From Scratch).
The learning approach is to build a small but working model from scratch for educational purposes, following the same flow used to create the large foundation models behind ChatGPT.
It provides chapter-by-chapter code and notebooks covering text data processing, attention mechanisms, the GPT implementation, pretraining on unlabeled data, fine-tuning for text classification, and instruction-following fine-tuning.
The main chapter code is designed to run on an ordinary laptop in a reasonable amount of time, uses a GPU automatically when one is available, and is implemented in PyTorch without external LLM libraries.
The appendices and bonus materials extend to LoRA, KV cache, MoE, implementations of the Llama/Qwen/Gemma families, evaluation, DPO, and UI examples, broadening the learning path in a practice-focused way.

Purpose of the repository and its relationship to the book

rasbt/LLMs-from-scratch is a code repository for implementing a GPT-like LLM from scratch.
It is provided as the official code repository for the Manning book Build a Large Language Model (From Scratch).
The book is structured so that readers learn how LLMs work internally through step-by-step coding.
- Explanations include text, diagrams, and examples
- A small but working model is developed and trained by the reader for educational purposes
The repository also includes code to load the weights of larger pretrained models and fine-tune them.
Book information:
- Manning book page
- Amazon.com book page
- ISBN: 9781633437166

Installation and code usage

The repository can be obtained as a ZIP download or via git clone:

git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git

If you obtained the code bundle from the Manning website, it is recommended to check the official GitHub repository for the latest updates.
Python and package installation and code environment setup are covered in setup/README.md.
Problem-solving help is available in the Troubleshooting Guide.

Chapter-by-chapter learning flow

The book and repository break LLM implementation into a step-by-step curriculum.
Structure of the main chapters:
- Ch 1: Understanding LLMs (no code)
- Ch 2: Working with text data
  - ch02.ipynb
  - dataloader.ipynb
- Ch 3: Implementing attention mechanisms
  - ch03.ipynb
  - multihead-attention.ipynb
- Ch 4: Implementing a GPT model from scratch
  - ch04.ipynb
  - gpt.py
- Ch 5: Pretraining on unlabeled data
- Ch 6: Fine-tuning for text classification
  - ch06.ipynb
  - gpt_class_finetune.py
- Ch 7: Instruction-following fine-tuning
The appendices cover an introduction to PyTorch, references, exercise solutions, training loop improvements, and LoRA-based parameter-efficient fine-tuning.
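
As a rough illustration of the Ch 2 text data handling step, the sketch below builds input/target pairs for next-token prediction with a sliding window. It is a plain-Python sketch under assumptions: the function name `sliding_windows` and the parameters `context_length` and `stride` are hypothetical, plain integers stand in for token IDs, and the repository's dataloader.ipynb may be organized differently.

```python
def sliding_windows(token_ids, context_length, stride):
    """Return (input, target) pairs; each target is its input shifted by one token."""
    pairs = []
    # Stop early enough that the shifted target window still fits in the sequence.
    for start in range(0, len(token_ids) - context_length, stride):
        inputs = token_ids[start:start + context_length]
        targets = token_ids[start + 1:start + context_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Plain integers stand in for token IDs produced by a tokenizer (an assumption).
toy_ids = list(range(10))
for inputs, targets in sliding_windows(toy_ids, context_length=4, stride=4):
    print(inputs, "->", targets)
```

With `stride` equal to `context_length` the windows do not overlap; a smaller stride yields more, overlapping training examples from the same text.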

Prerequisite knowledge and execution environment

The most important prerequisite is a grounding in the basics of Python programming.
Some experience with deep neural networks may make certain concepts more familiar.
The code is implemented from scratch in PyTorch, without external LLM libraries.
- Proficiency in PyTorch is not required
- Basic knowledge of PyTorch is helpful
- Appendix A gives a short introduction to PyTorch
The main chapter code is designed to run on an ordinary laptop in a reasonable amount of time.
No special hardware is required, and the code uses a GPU automatically if one is available.
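
Automatic GPU use of this kind is commonly done in PyTorch by checking `torch.cuda.is_available()`. The sketch below is one way to do it, not necessarily how the repository does it; the helper name `pick_device` is hypothetical, and the import is wrapped in `try`/`except` only so the snippet still runs where PyTorch is not installed.

```python
try:
    import torch  # optional here so the sketch also runs where PyTorch is absent
except ImportError:
    torch = None


def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    if torch is not None and torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

A model and its input tensors would then be moved to the chosen device with `.to(device)` before training or inference.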

Video course and follow-up book

Manning offers a 17-hour, 15-minute companion video course that follows the structure of the book.
- It mirrors the book's chapter and section structure
- It can be used as a standalone alternative or as supplementary material for coding along
A follow-up book, Build A Reasoning Model (From Scratch), is also introduced.
- It is a standalone book, but can be considered a sequel to Build A Large Language Model (From Scratch)
- It starts from a pretrained model and implements approaches for improving reasoning ability
- Approaches covered: inference-time scaling, reinforcement learning, distillation
- Related repository: rasbt/reasoning-from-scratch

Exercises and bonus materials

Each chapter includes several exercises.
Solutions are summarized in Appendix C, and the corresponding code notebooks are in each chapter's folder.
A free 170-page PDF, Test Yourself On Build a Large Language Model (From Scratch), is available from the Manning website.
- It contains roughly 30 quiz questions and solutions per chapter
Main bonus topics:
- Setup:
  - Python setup tips
  - Installing packages and libraries
  - Docker environment setup
- Ch 2:
  - Implementing a BPE tokenizer from scratch
  - Comparison of several BPE implementations
  - Difference between embedding layers and linear layers
  - Dataloader intuition with simple numbers
- Ch 3:
  - Comparison of efficient multi-head attention implementations
  - Understanding PyTorch buffers
- Ch 4:
  - FLOPs analysis
  - KV Cache
  - Grouped-Query Attention, Multi-Head Latent Attention, Sliding Window Attention
  - Gated DeltaNet, DeepSeek Sparse Attention, Cross-Layer KV Sharing
  - Mixture-of-Experts
- Ch 5:
  - Alternative weight loading methods
  - Pretraining on the Project Gutenberg dataset
  - Training loop improvements
  - Hyperparameter optimization
  - UI for interacting with a pretrained LLM
  - Converting GPT to Llama
  - Memory-efficient model weight loading
  - Extending the Tiktoken BPE tokenizer
  - PyTorch performance tips for faster LLM training
  - Implementations of Llama 3.2, Qwen3, Gemma 3, Olmo 3, Tiny Aya, Qwen3.5, Gemma 4
- Ch 6:
  - Additional experiments fine-tuning other layers and larger models
  - Classification fine-tuning on the 50k IMDb movie review dataset
  - UI for a GPT-based spam classifier
- Ch 7:
  - Dataset utilities for near-duplicate detection and generating passive-voice entries
  - Evaluating instruction responses with the OpenAI API and Ollama
  - Generating and improving an instruction fine-tuning dataset
  - Generating a preference dataset with Llama 3.1 70B and Ollama
  - Implementing LLM alignment with DPO
  - UI for the instruction-fine-tuned GPT model
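
To give a feel for the Ch 2 bonus topic of implementing a BPE tokenizer from scratch, here is a minimal byte-level sketch of the core idea: repeatedly merge the most frequent adjacent pair of token IDs into a new ID. The function names `merge_pair`, `train_bpe`, and `decode` are hypothetical, and the sketch leaves out details a real tokenizer needs (pre-tokenization, special tokens, the GPT-2 vocabulary); it is not the repository's implementation.

```python
from collections import Counter


def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}  # (left_id, right_id) -> new token ID
    for step in range(num_merges):
        pair_counts = Counter(zip(ids, ids[1:]))
        if not pair_counts:
            break  # fewer than two tokens left, nothing to merge
        best_pair = pair_counts.most_common(1)[0][0]
        new_id = 256 + step  # IDs 0..255 are reserved for raw bytes
        ids = merge_pair(ids, best_pair, new_id)
        merges[best_pair] = new_id
    return ids, merges


def decode(ids, merges):
    """Expand merged tokens back into bytes and decode them as UTF-8."""
    expansions = {new_id: pair for pair, new_id in merges.items()}
    stack, out = list(reversed(ids)), []
    while stack:
        token = stack.pop()
        if token in expansions:
            left, right = expansions[token]
            stack.extend([right, left])  # left is popped (and expanded) first
        else:
            out.append(token)
    return bytes(out).decode("utf-8")


ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(ids, merges)
print(decode(ids, merges))
```

Each merge shortens the sequence while growing the vocabulary by one entry, which is the trade-off a BPE vocabulary size controls.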

Contributing and citation

Feedback and questions are welcome on the Manning Forum or GitHub Discussions.
Because this repository corresponds to a printed book, contributions that extend the contents of the main chapter code cannot currently be accepted.
- This restriction prevents the code from diverging from the physical book
If the book or code is useful in your research, citing it is encouraged.
- Chicago-style citation and BibTeX entries are available

1 comment

GN⁺ 2024-01-28

Hacker News comments

I'm writing a guidebook as supplementary material, though it's still being completed in stages.
So far the finetuning guide seems to be the best resource:
https://ravinkumar.com/GenAiGuidebook/language_models/finetu...
This looks really great. I'm wondering whether the main goal is to build understanding and demystify the topic, or to enable people to build small models tailored to their own needs.
- The main motivation is closer to education: letting people understand how an LLM works by building one themselves
  LLMs are an important topic, but there are a lot of superficial videos and articles about them. I think many concepts become clear when you code an LLM from the ground up.
  A secondary goal is that people who need to can build their own LLM. The book codes the entire pipeline, including pretraining and finetuning, but since pretraining an LLM doesn't seem economically practical, it will also show how to load pretrained weights.
  Everything will be implemented from scratch using a GPT-2-like LLM, and it will be possible to load weights ranging from the 124M model, which runs on a laptop, to the 1558M model, which runs on a small GPU. In practice people will use frameworks like HF transformers or axolotl, but hopefully this implement-it-yourself approach makes the process look less like a black box.
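
The "124M" and "1558M" sizes mentioned above can be reproduced with simple arithmetic. The sketch below counts the parameters of a GPT-2-style model; the configuration values (vocabulary of 50,257 tokens, context length 1,024, embedding sizes 768 and 1,600, 12 and 48 layers, bias terms everywhere, output layer sharing weights with the token embedding) are the published GPT-2 settings as assumed here, not figures taken from the comment, and `gpt2_param_count` is a hypothetical helper name.

```python
def gpt2_param_count(vocab_size, context_length, emb_dim, n_layers):
    """Count parameters of a GPT-2-style model with a weight-tied output layer."""
    d = emb_dim
    token_emb = vocab_size * d
    pos_emb = context_length * d
    # Fused query/key/value projection plus the attention output projection.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Feed-forward network: expand to 4*d, then project back to d.
    feed_forward = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d)
    block = attention + feed_forward + layer_norms
    final_norm = 2 * d
    return token_emb + pos_emb + n_layers * block + final_norm


small = gpt2_param_count(vocab_size=50257, context_length=1024, emb_dim=768, n_layers=12)
xl = gpt2_param_count(vocab_size=50257, context_length=1024, emb_dim=1600, n_layers=48)
print(f"{small:,}")  # 124,439,808
print(f"{xl:,}")     # 1,557,611,200
```

Without weight tying, the output layer would add another `vocab_size * emb_dim` parameters on top of these totals.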
Writing a technical book in public must be more anxiety-inducing than I can imagine, so hats off to the author.
- That's true to some extent, but it's also quite motivating :)
- It could actually be lower risk, because you can get the benefits of writing a book without ever really finishing it. Ideally you might not even have to write more than one chapter.
If the first code example is import torch, it doesn't feel like a fully from-scratch implementation :-)
- True, but otherwise it would get very long and hard to read. Even so, the book shows how to implement LayerNorm, Softmax, Linear layers, GeLU, and so on without using the pre-packaged torch versions.
- Automatic differentiation is what makes it possible to build complex models like the Transformer. Alongside massive data and heavy compute, it can be counted among the main things that made today's AI revolution possible.
  Nobody working in this field computes the derivatives of such models by hand. Thinking in terms of differentiable programming is the basic premise, and in that sense this counts as sufficiently "from scratch".
  Whenever I see comments like this, I suspect the commenter doesn't have a good grasp of what happens under the hood or how modern machine learning works.
- For learning how a Transformer works, implementing autograd seems less relevant and out of scope. I can't even imagine writing a Transformer's gradients by hand.
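
For readers curious what building blocks such as Softmax, GeLU, and LayerNorm look like without the pre-packaged versions, here is a minimal sketch of their forward computations in plain Python. It is an illustration under assumptions, not the book's code: it works on plain lists rather than tensors, `gelu` uses the common tanh approximation, and `layer_norm` omits the learnable scale and shift parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x):
    """GELU activation using the tanh approximation."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def layer_norm(xs, eps=1e-5):
    """Normalize a vector to zero mean and (nearly) unit variance."""
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)  # biased estimate
    return [(x - mean) / math.sqrt(variance + eps) for x in xs]


print(softmax([1.0, 2.0, 3.0]))
print([gelu(x) for x in (-1.0, 0.0, 1.0)])
print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```

Only the forward pass is written by hand here; as the replies point out, gradients are left to automatic differentiation in practice.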
I thought this would be a free resource, so I went straight to GitHub. I respect the author's work, but I'd like to know which free resources with a from-scratch implementation flow you would recommend.
- Andrej Karpathy's Neural Networks: Zero to Hero[1]
  [1] https://karpathy.ai/zero-to-hero.html
- There is a GPT-2 inference engine built in NumPy at https://jaykmody.com/blog/gpt-from-scratch/; after that, see https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-k... for adding a KV cache implementation
- I would recommend https://course.fast.ai/
  It's far more accessible for ordinary developers and doesn't assume a math background. It's a good starting point, after which other similar resources become easier to understand.
- Honestly, it's hard to understand why anyone working in AI would consider even $50 too much for deep insight into this topic
  Creating educational material takes an incredible amount of work, and however successful this book is, if rasbt calculates income against the time invested, the hourly rate won't add up.
  Plenty of people understand this topic, but what did they do with that knowledge? They kept it to themselves, went to OpenAI, and earned far more money while keeping the knowledge private.
  If we want to live in a world where this knowledge is open, we should at least refrain from publicly complaining about a book that costs about as much as a decent dinner.
- I've added explanatory notes to the Jupyter notebooks, so hopefully the repository can also be read independently on its own
I'm wondering whether I could learn reinforcement learning from this book's content.
The goal is for something to learn to land like a lunar lander. In the simplest form: start at a height of 100 feet, apply thrust in one direction, and keep trying until the crater gets smaller.
Then I'd like to expand it by adding variables like horizontal movement, adding horizontal thrusters, and later removing the horizontal thrusters and letting the lander rotate.
I have no idea where to start, and this book looks like "mainstream" machine learning, so I'm wondering whether it would help.
- I liked "Grokking Deep Reinforcement Learning"[0]. It has no Transformer content
  Python's gymnasium[1] library has a lunar lander environment that is worth a look. It was the environment I focused on most while learning, and I solved it in a few different ways.
  You can also look at my notebook[2], which I used a while ago when implementing Soft Actor Critic in PyTorch. It's not a great resource for teaching, but you might get something out of it.
  [0]: https://www.manning.com/books/grokking-deep-reinforcement-le...
  [1]: https://gymnasium.farama.org/environments/box2d/

Reinforcement learning is a completely different research field from LLMs. It does often appear as part of machine learning, and Tom Mitchell's classic Machine Learning has an excellent section on Q-learning, but it has little connection to modern machine learning work.
Even things like AlphaGo can ultimately be seen as work that uses a deep neural network as an input to classic reinforcement learning techniques.
Sutton and Barto's Reinforcement Learning: An Introduction is considered the definitive introductory book on the subject.
In that case I would recommend a dedicated reinforcement learning book. The reinforcement learning part in LLMs is very specific to LLMs, and as for background, only the parts that are truly relevant will be covered.
Some other general machine learning/deep learning books do have fairly long introductory chapters on reinforcement learning (https://github.com/rasbt/machine-learning-book/tree/main/ch1...). Even so, in this case, as others have said, a dedicated reinforcement learning book would be a better fit.
OpenAI's Spinning Up is worth trying: https://spinningup.openai.com/en/latest/
The Q-learning lab in this course covers exactly that:
https://www.ida.liu.se/~TDDC17/info/labs/rl.en.shtml
I'd like to know how this compares with Karpathy's video[0]. I want to get started with LLMs and am looking for the best resource for reaching that level of understanding.
[0] https://www.youtube.com/watch?v=kCc8FmEb1nY
- I haven't watched the whole video, but based on skimming it, the book differs in a few ways
  It implements an actual word-level LLM rather than a character-level one, shows how to load pretrained weights after pretraining, and instruction fine-tunes that LLM.
  It also codes the alignment process for the instruction-fine-tuned LLM and shows fine-tuning for a classification task. The book has many illustrations throughout; chapter 3 alone has 26 figures :)
  The video looks good too. It's 2 hours long, so it could work well as solid introductory supplementary material. Reading the book will probably take about 10 times as long.
- It's hard to follow unless you already know most of the content
  I also watched it several times to understand most of it properly.
  Obviously you also need to know PyTorch very well, as well as matrix multiplication, backpropagation, and so on. He also speaks very fast.
I'm not interested in language models per se, but some of the techniques used in them are things I'd like to use elsewhere.
For example, I know attention is used in many kinds of models, and Transformers are used outside language models as well.
I'd like to know whether this book would give me a good enough understanding of attention and Transformers to use them outside language models too.
- The attention mechanism implemented in this book is LLM-specific in terms of its text input, but it is fundamentally the same as the attention mechanism used in Vision Transformers
  The difference is that in an LLM, text is turned into tokens, and those tokens are converted into the vector embeddings that go into the LLM. In a Vision Transformer, instead of treating the whole image as a token, image patches are used as tokens and converted into vector embeddings.
  Whether text or vision, the attention mechanism is the same, and in both cases it takes vector embeddings as input.
  (*I submitted chapter 3 just last week and it will be on MEAP soon. Until then, the code with notes can be viewed here: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01...)
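
The point that attention only sees vector embeddings can be sketched in a few lines. The example below is a simplified scaled dot-product self-attention in plain Python with no trainable weights (queries, keys, and values are all the input embeddings themselves); the function names and the toy embedding values are hypothetical, and the book's chapter 3 code adds trainable projections, masking, and multiple heads.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Simplified self-attention: each output is a weighted average of all inputs."""
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Scaled dot product between this query and every input vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        weights = softmax(scores)  # attention weights sum to 1
        context = [
            sum(w * value[j] for w, value in zip(weights, embeddings))
            for j in range(dim)
        ]
        outputs.append(context)
    return outputs


# Made-up embeddings: the function cannot tell where they came from.
text_token_embeddings = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.5]]
image_patch_embeddings = [[0.4, 0.4, 0.1], [0.9, 0.1, 0.8], [0.2, 0.6, 0.6]]

print(self_attention(text_token_embeddings))
print(self_attention(image_patch_embeddings))
```

The same function handles two text tokens or three image patches; only the step that produces the embeddings differs between the two settings.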
The model architecture itself isn't that complicated, especially when using torch. The whole process is also fairly straightforward, so this looks like a feasible project.
