DeepSeek's Fire-Flyer File System

(github.com/deepseek-ai)

1 point by GN⁺ | 2025-03-01

Fire-Flyer File System (3FS) is a high-performance distributed file system for AI training and inference workloads; it provides a shared storage layer built on modern SSDs and RDMA networks
Its disaggregated architecture combines the throughput of thousands of SSDs with the network bandwidth of hundreds of storage nodes, so applications can reach storage resources without being aware of their location
The consistency model provides strong consistency based on CRAQ (Chain Replication with Apportioned Queries), and the metadata service is stateless, backed by a transactional key-value store such as FoundationDB
The main workloads are data preparation, data loaders, checkpointing, and KVCache for inference; a read stress test on a large cluster recorded an aggregate read throughput of about 6.6 TiB/s
Because earlier versions used std::shuffle, binaries built with different compiler versions can be incompatible, so the build must pin the g++10 or g++11 method via -DSHUFFLE_METHOD and keep that setting once a cluster is deployed

The problem 3FS aims to solve

Fire-Flyer File System (3FS) is a high-performance distributed file system designed to meet the demands of AI training and inference workloads
It uses modern SSDs and RDMA networks to provide a shared storage layer that simplifies the development of distributed applications
It exposes a file interface, so there is no new storage API to learn

Architecture and consistency

The disaggregated architecture combines the throughput of thousands of SSDs with the network bandwidth of hundreds of storage nodes
- Applications can access storage resources regardless of where they are located
Strong consistency is implemented with Chain Replication with Apportioned Queries (CRAQ)
- The goal is to keep application code simple and easy to reason about
The metadata service is designed to be stateless and uses a transactional key-value store such as FoundationDB as its backend
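To make the CRAQ idea concrete, here is a minimal in-memory sketch, not 3FS code. It assumes one outstanding write per key and models the chain as a Python list; the names `Replica`, `Chain`, `start_write`, `finish_write`, and `read` are hypothetical. Writes travel from the head toward the tail, a replica holding an unacknowledged ("dirty") write defers to the tail on reads, and a replica with only committed data answers locally. That is how reads can be spread across replicas while every reader still sees the committed value.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    committed: dict = field(default_factory=dict)  # key -> (version, value)
    dirty: dict = field(default_factory=dict)      # key -> (version, value), not yet acked


class Chain:
    """Toy CRAQ chain: replicas[0] is the head, replicas[-1] is the tail."""

    def __init__(self, length=3):
        self.replicas = [Replica() for _ in range(length)]
        self.next_version = 1
        self.in_flight = {}  # version -> (key, value) for writes not yet at the tail

    def start_write(self, key, value):
        """Propagate a write from the head to every replica except the tail."""
        version = self.next_version
        self.next_version += 1
        self.in_flight[version] = (key, value)
        for replica in self.replicas[:-1]:
            replica.dirty[key] = (version, value)
        return version

    def finish_write(self, version):
        """Deliver the write to the tail (commit), then ack back toward the head."""
        key, value = self.in_flight.pop(version)
        self.replicas[-1].committed[key] = (version, value)
        for replica in reversed(self.replicas[:-1]):
            if replica.dirty.get(key, (None, None))[0] == version:
                del replica.dirty[key]
                replica.committed[key] = (version, value)

    def read(self, key, index):
        """Read from any replica; a dirty replica defers to the tail's committed state."""
        replica = self.replicas[index]
        if key in replica.dirty:
            # Simplification: real CRAQ asks the tail for the committed version
            # number and serves that version locally; here we return the tail's value.
            replica = self.replicas[-1]
        return replica.committed.get(key, (0, None))[1]
```

While a write is in flight, every replica keeps returning the previous value; once the tail has committed and the acks have flowed back, every replica returns the new one.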

Supported workloads

Data preparation
- Organizes the output of data analysis pipelines into hierarchical directory structures
- Manages large volumes of intermediate artifacts efficiently
Data loader
- Enables random access to training samples across compute nodes, removing the need to prefetch or shuffle datasets
Checkpointing
- Supports high-throughput parallel checkpointing for large-scale training
KVCache for inference
- Offers a cost-effective alternative to DRAM-based caching, with high throughput and considerably larger capacity
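The data loader point can be illustrated with ordinary file I/O, since 3FS is described as exposing a file interface. The sketch below assumes a hypothetical dataset layout of fixed-size binary records in a single file; `read_record` and `iter_random_samples` are illustrative names, not part of 3FS. Instead of physically shuffling the dataset, the loader draws a random order of indices and seeks directly to each record.

```python
import os
import random


def read_record(handle, index, record_size):
    """Read one fixed-size record by seeking to its byte offset."""
    handle.seek(index * record_size)
    return handle.read(record_size)


def iter_random_samples(path, record_size, seed):
    """Yield (index, record) pairs in a seeded random order.

    The file on disk stays in its original order; only the access
    pattern is randomized, which relies on cheap random reads.
    """
    count = os.path.getsize(path) // record_size
    order = list(range(count))
    random.Random(seed).shuffle(order)
    with open(path, "rb") as handle:
        for index in order:
            yield index, read_record(handle, index, record_size)
```

On a storage layer where random reads are inexpensive, this access pattern replaces a separate shuffle pass over the data.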

Performance results

Peak throughput
- In a read stress test on a large 3FS cluster, aggregate read throughput reached about 6.6 TiB/s
- The test cluster consisted of 180 storage nodes
  - Each storage node had 2×200 Gbps InfiniBand NICs and sixteen 14 TiB NVMe SSDs
  - Approximately 500+ client nodes were used
  - Each client node had 1×200 Gbps InfiniBand NIC
- The results were measured with background traffic from training jobs present
- The USRBIO engine for fio can be used to benchmark 3FS
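A quick back-of-the-envelope check puts the 6.6 TiB/s figure in context. The calculation below assumes that "Gbps" means 10^9 bits per second and ignores protocol overhead; the function name is illustrative. Under those assumptions the storage-side NICs could carry roughly 8.2 TiB/s in total, so the measured figure is about 80% of that ceiling.

```python
GBIT = 10**9   # bits in one gigabit (decimal, an assumption about the NIC rating)
TIB = 2**40    # bytes in one tebibyte


def nic_ceiling_tib_per_s(nodes, nics_per_node, gbps_per_nic):
    """Aggregate NIC line rate across all nodes, expressed in TiB/s."""
    bits_per_second = nodes * nics_per_node * gbps_per_nic * GBIT
    return bits_per_second / 8 / TIB


storage_ceiling = nic_ceiling_tib_per_s(180, 2, 200)  # storage nodes from the test cluster
client_ceiling = nic_ceiling_tib_per_s(500, 1, 200)   # lower bound: "approximately 500+" clients
measured = 6.6                                        # TiB/s, as reported

utilisation = measured / storage_ceiling
per_storage_node_gib_per_s = measured * 1024 / 180

print(f"storage NIC ceiling: {storage_ceiling:.2f} TiB/s")
print(f"client NIC ceiling:  {client_ceiling:.2f} TiB/s")
print(f"utilisation:         {utilisation:.0%}")
print(f"per storage node:    {per_storage_node_gib_per_s:.1f} GiB/s")
```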
GraySort
- smallpond was evaluated with the GraySort benchmark
- The implementation has two phases
  - Partitioning the data via a shuffle that uses the prefix bits of each key
  - Sorting within each partition
- Both phases read data from 3FS and write data to 3FS
- Test cluster configuration:
  - 25 storage nodes, each with 2 NUMA domains, 1 storage service per NUMA domain, and 2×400 Gbps NICs
  - 50 compute nodes, each with 2 NUMA domains, 192 physical cores, 2.2 TiB RAM, and 1×200 Gbps NIC
- Sorting 110.5 TiB of data across 8,192 partitions finished in 30 minutes 14 seconds
- Average throughput was 3.66 TiB/min
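The two-phase scheme can be sketched in memory as follows. This is a toy illustration, not smallpond's code: it assumes fixed-width unsigned integer keys, and it assumes that 8,192 partitions correspond to the top 13 bits of the key (2^13 = 8,192), which the summary does not state. Because partitions are ordered by key prefix, concatenating the individually sorted partitions yields a globally sorted result. The last function reproduces the reported average throughput.

```python
def partition_by_prefix(keys, prefix_bits, key_bits):
    """Phase 1: route each key to the bucket named by its top `prefix_bits` bits."""
    buckets = [[] for _ in range(1 << prefix_bits)]
    shift = key_bits - prefix_bits
    for key in keys:
        buckets[key >> shift].append(key)
    return buckets


def two_phase_sort(keys, prefix_bits=13, key_bits=32):
    """Phase 2: sort each bucket on its own, then concatenate in bucket order."""
    result = []
    for bucket in partition_by_prefix(keys, prefix_bits, key_bits):
        result.extend(sorted(bucket))
    return result


def average_tib_per_min(total_tib, minutes, seconds):
    """Average throughput over a run of the given duration."""
    return total_tib / (minutes + seconds / 60)
```

Each bucket can be sorted independently, which is what lets the second phase run in parallel across compute nodes.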
KVCache
- KVCache is a technique that avoids duplicate computation during LLM inference by caching the key/value vectors of previous tokens in the decoder layers
- KVCache clients use 1×400 Gbps NIC per node
- Read throughput peaked at up to 40 GiB/s
- The IOPS of garbage collection (GC) removal operations was also measured over the same period
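The saving that KV caching provides can be shown with a toy decoder loop. The sketch below is illustrative only: `CountingProjector.kv` is a stand-in for the real key/value projection, and nothing here reflects how 3FS stores the cache. Without a cache, every decoding step recomputes the key/value pairs for all earlier tokens; with a cache, each token is projected once.

```python
class CountingProjector:
    """Stand-in for a decoder layer's key/value projection that counts its calls."""

    def __init__(self):
        self.calls = 0

    def kv(self, token):
        self.calls += 1
        return (token * 2.0, token * 3.0)  # placeholder "key" and "value"


def decode_without_cache(tokens, projector):
    """At each step, recompute key/value pairs for the whole prefix."""
    steps = []
    for end in range(1, len(tokens) + 1):
        steps.append([projector.kv(token) for token in tokens[:end]])
    return steps


def decode_with_cache(tokens, projector):
    """At each step, project only the new token and reuse cached pairs."""
    cache = []
    steps = []
    for token in tokens:
        cache.append(projector.kv(token))
        steps.append(list(cache))
    return steps
```

For n tokens the uncached loop performs n(n+1)/2 projections and the cached loop performs n, while both produce the same per-step key/value lists.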

Documentation and build

Available documents:
- Design Notes
- Setup Guide
- USRBIO API Reference
- P Specifications
After cloning the source code from GitHub, initialize the submodules and apply the patches
- git submodule update --init --recursive
- ./patches/apply.sh
Dependency installation examples are provided for the following environments
- Ubuntu 20.04
- Ubuntu 22.04
- openEuler 2403sp1
- OpenCloudOS 9
- TencentOS 4
Additional build prerequisites:
- libfuse 3.16.1 or newer
- FoundationDB 7.1 or newer
- Rust toolchain: minimum 1.75.0; 1.85.0 or newer, or the latest stable version, is recommended
3FS is built with CMake in the build folder
- The example C/C++ compilers are clang-14 and clang++-14
- The example build type is RelWithDebInfo
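The build settings above map onto standard CMake options. The helper below is a sketch that only assembles the command lines; the function names and the `extra_defines` parameter are hypothetical, and the exact invocation in the 3FS repository may differ. `CMAKE_C_COMPILER`, `CMAKE_CXX_COMPILER`, and `CMAKE_BUILD_TYPE` are standard CMake variables.

```python
import shlex


def cmake_configure_command(source_dir=".", build_dir="build",
                            c_compiler="clang-14", cxx_compiler="clang++-14",
                            build_type="RelWithDebInfo", extra_defines=None):
    """Return the argv list for configuring an out-of-source CMake build."""
    command = [
        "cmake", "-S", source_dir, "-B", build_dir,
        f"-DCMAKE_C_COMPILER={c_compiler}",
        f"-DCMAKE_CXX_COMPILER={cxx_compiler}",
        f"-DCMAKE_BUILD_TYPE={build_type}",
    ]
    # Any further cache entries are appended as -DNAME=VALUE.
    for name, value in (extra_defines or {}).items():
        command.append(f"-D{name}={value}")
    return command


def cmake_build_command(build_dir="build", jobs=8):
    """Return the argv list for building an already configured tree."""
    return ["cmake", "--build", build_dir, "-j", str(jobs)]


if __name__ == "__main__":
    print(shlex.join(cmake_configure_command()))
    print(shlex.join(cmake_build_command()))
```

The returned lists can be passed to `subprocess.run(command, check=True)`. A project-specific option such as SHUFFLE_METHOD could be supplied through `extra_defines`, with the value spelled as the repository's documentation specifies.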
Shuffle algorithm compatibility
- Because earlier versions used std::shuffle, binaries built with different compiler versions, such as g++10 and g++11+, may not be compatible with each other
- Set -DSHUFFLE_METHOD explicitly at build time to pin one consistent shuffle algorithm
- Existing clusters should use the method that matches the compiler version used in their previous deployment
- New clusters can choose either g++10 or g++11, but must keep the same setting in all future builds after deployment
- Docker build images are available for TencentOS-4 and OpenCloudOS-9
- To run a test cluster, follow the Setup Guide
- Issues are reported on GitHub Issues
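On the shuffle compatibility note: the underlying hazard is that two correct shuffle algorithms, fed the same seed, generally produce different permutations. The sketch below demonstrates this with two Fisher-Yates variants written for illustration; they are not the libstdc++ implementations, and how 3FS consumes the shuffled order is not described in this summary. Any components that derive a layout from a seeded shuffle would disagree if they were built with different variants.

```python
import random


def shuffle_backward(items, seed):
    """Fisher-Yates walking from the last index down to 1."""
    rng = random.Random(seed)
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        j = rng.randrange(i + 1)
        result[i], result[j] = result[j], result[i]
    return result


def shuffle_forward(items, seed):
    """Fisher-Yates walking from index 1 up to the last index."""
    rng = random.Random(seed)
    result = list(items)
    for i in range(1, len(result)):
        j = rng.randrange(i + 1)
        result[i], result[j] = result[j], result[i]
    return result
```

Each variant is deterministic for a given seed, which is why pinning one method across every build keeps all binaries in agreement.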
