Pure C inference for Flux 2 Klein

(github.com/antirez)

By GN⁺ | 2026-01-19

A pure C implementation that generates images from text or image input using the FLUX.2-klein-4B model
Runs without any external dependencies; optional BLAS or Metal acceleration gives up to a 30x speed-up
The Qwen3-4B text encoder is built in, so no separate embedding computation step is needed
Supports both text-to-image and image-to-image transformation, and provides a command-line interface as well as a C library API
Runs without a Python runtime or PyTorch, which makes it significant for lightweight inference environments and for improving the accessibility of open source AI

Project overview

FLUX.2-klein-4B is an image generation model from Black Forest Labs that takes a text prompt or an existing image as input and produces a new image
The entire codebase is written using only the C standard library, with optional support for MPS (Apple Metal) and BLAS (OpenBLAS) acceleration
The model can be downloaded from HuggingFace at a size of about 16GB; its components are the VAE (300MB), the Transformer (4GB), the Qwen3-4B encoder (8GB), and the Tokenizer

Zero dependencies: standalone execution without external libraries
BLAS acceleration: about a 30x speed-up when BLAS is used; Apple Accelerate is available on macOS and OpenBLAS on Linux
Metal GPU acceleration: enabled automatically on Apple Silicon
Text-to-image: image generation from a text prompt
Image-to-image: transforms an existing image according to the prompt
Integrated text encoder: the Qwen3-4B encoder is built in, so no external embedding is needed
Memory efficient: encoder memory is released automatically after encoding, saving about 8GB

Text-to-image generation

```
./flux -d flux-klein-model -p "A fluffy orange cat sitting on a windowsill" -o cat.png
```

Image transformation
```
./flux -d flux-klein-model -i photo.png -o painting.png -p "oil painting style" -t 0.7
```
- The -t value controls transformation strength: at 0.0 the original image is preserved, and at 1.0 it is regenerated completely

Transformer: 5 double blocks and 20 single blocks, 3072 hidden dimensions, 24 attention heads
VAE: AutoencoderKL, 128 latent channels, 8x spatial compression
Text Encoder: Qwen3-4B, 36 layers, 2560 hidden dimensions
Inference steps: high-quality results with 4-step sampling
Memory requirements
- Text encoding: about 8GB
- Diffusion: about 8GB
- Peak: 16GB (before the encoder is released)
Performance benchmark (Apple M3 Max, 128GB RAM)
- 512×512: MPS 49.6 seconds, BLAS 51.9 seconds, PyTorch MPS 5.4 seconds
- 256×256: MPS 32.4 seconds, BLAS 29.7 seconds, PyTorch MPS 3.0 seconds
- 64×64: MPS 25.0 seconds, BLAS 23.5 seconds, PyTorch MPS 2.2 seconds
- The pure C backend is very slow and is suitable only for testing

Backend selection
- make mps: macOS on Apple Silicon (fastest)
- make blas: Intel Mac or Linux (requires OpenBLAS)
- make generic: pure C, no dependencies (slow)

Model download

```
pip install huggingface_hub
python download_model.py
```

Output resolution: maximum 1024×1024, minimum 64×64; multiples of 16 are recommended

Loading and freeing the model
- flux_load_dir(path) / flux_free(ctx)
Image generation and transformation
- flux_generate(ctx, prompt, params)
- flux_img2img(ctx, prompt, input, params)
Image input/output
- flux_image_load(path) / flux_image_save(img, path)
Utilities
- flux_set_seed(seed) ensures reproducibility
- flux_get_error() returns the error message for inspection
- flux_release_text_encoder(ctx) releases the encoder memory manually

Released under the MIT license
Repository language composition: C 93.9%, Objective-C 3.5%, Makefile 1.7%, Python 0.9%
446 stars and 20 forks, indicating active community interest