Moonshine: open source speech recognition (ASR) with high speed and accuracy for edge devices

(github.com/moonshine-ai)

By GN⁺, 2026-03-01

A real-time speech recognition framework that runs all processing on-device
Its streaming model architecture generates text in real time while the user is still speaking, and achieves a lower error rate (WER 6.65%) than Whisper Large v3
Works with the same API across Python, iOS, Android, macOS, Linux, Windows, and Raspberry Pi, with an optimized implementation built on a C++ core and ONNX Runtime
Includes language-specific models (English, Korean, Japanese, Spanish, and others) and command recognition (Intent Recognition), so developers can build voice interfaces easily
It improves on Whisper's fixed 30-second input, lack of caching, and per-language accuracy limits, and is drawing attention as a good fit for low-latency voice interfaces in edge environments

Moonshine Voice overview

Moonshine Voice is an open source AI toolkit for developing real-time voice applications
- All computation happens on the local device, which gives fast responses and protects privacy
- Streaming processing lets the text update while the user is still speaking
The models use an architecture trained from scratch on in-house research, and the largest one is more accurate than Whisper Large v3
Sizes range from a 26MB ultra-compact model to a 245M-parameter mid-size model
Multilingual support includes English, Korean, Japanese, Chinese, Spanish, Vietnamese, Arabic, and Ukrainian

Whisper's fixed 30-second input window is removed in favor of variable-length input
Caching reduces duplicate computation during streaming and lowers latency significantly
Training a separate model per language yields higher accuracy at the same model size
A cross-platform C++ core library exposes the same API in Python, Swift, Java, and other languages
The 245M-parameter model achieves a lower error rate than the much larger Whisper Large v3 (1.5B parameters)
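
To illustrate why caching matters for streaming, here is a minimal sketch in Python. It is not Moonshine's implementation: `FrameEncoder`, `NoCacheStreamer`, and `CachedStreamer` are hypothetical names, and the "encoder" is a stand-in that only counts how often it is called. The sketch contrasts re-encoding the whole audio buffer on every update with encoding only the newly arrived frames.

```python
from typing import List, Tuple, Type


class FrameEncoder:
    """Stand-in for an expensive per-frame encoder (hypothetical, not a real model)."""

    def __init__(self) -> None:
        self.calls = 0

    def encode_frame(self, frame: float) -> float:
        self.calls += 1
        return frame * 2.0  # placeholder "feature"


class NoCacheStreamer:
    """Re-encodes the entire buffer every time a new chunk arrives."""

    def __init__(self, encoder: FrameEncoder) -> None:
        self.encoder = encoder
        self.frames: List[float] = []

    def push(self, chunk: List[float]) -> List[float]:
        self.frames.extend(chunk)
        return [self.encoder.encode_frame(f) for f in self.frames]


class CachedStreamer:
    """Keeps features of earlier frames and encodes only the new ones."""

    def __init__(self, encoder: FrameEncoder) -> None:
        self.encoder = encoder
        self.features: List[float] = []

    def push(self, chunk: List[float]) -> List[float]:
        self.features.extend(self.encoder.encode_frame(f) for f in chunk)
        return list(self.features)


def run(streamer_cls: Type, chunks: List[List[float]]) -> Tuple[List[float], int]:
    """Feed all chunks through a streamer and report the final output and encoder calls."""
    encoder = FrameEncoder()
    streamer = streamer_cls(encoder)
    out: List[float] = []
    for chunk in chunks:
        out = streamer.push(chunk)
    return out, encoder.calls


chunks = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
full_out, full_calls = run(NoCacheStreamer, chunks)
cached_out, cached_calls = run(CachedStreamer, chunks)
print(full_calls, cached_calls)  # 12 6
```

Both streamers produce the same features, but the uncached one does work proportional to the full buffer on every update. A real model would cache intermediate state rather than independent per-frame values, so this only shows the bookkeeping idea.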

A single library integrates the speech recognition pipeline, handling microphone input, voice activity detection (VAD), transcription, speaker identification, and command recognition together
Core classes:
- Transcriber: converts audio input to text
- MicTranscriber: processes microphone input automatically
- IntentRecognizer: natural-language command recognition
The event-based architecture reports state changes such as LineStarted / LineUpdated / LineCompleted in real time
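
The event flow described above can be sketched as a plain listener pattern. Only the event names LineStarted, LineUpdated, and LineCompleted come from the text. `ToyTranscriber`, `CaptionView`, `LineEvent`, and their methods are hypothetical and do not reflect the actual moonshine-voice API, and the partial results are fed in by hand instead of coming from audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LineEvent:
    kind: str      # "LineStarted", "LineUpdated" or "LineCompleted"
    line_id: int
    text: str


class ToyTranscriber:
    """Hypothetical stand-in that emits line events; not the library's Transcriber."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[LineEvent], None]] = []

    def add_listener(self, listener: Callable[[LineEvent], None]) -> None:
        self._listeners.append(listener)

    def _emit(self, kind: str, line_id: int, text: str) -> None:
        event = LineEvent(kind, line_id, text)
        for listener in self._listeners:
            listener(event)

    def feed_partials(self, line_id: int, partials: List[str]) -> None:
        """Simulate one spoken line whose text grows as more audio arrives."""
        self._emit("LineStarted", line_id, "")
        for text in partials:
            self._emit("LineUpdated", line_id, text)
        self._emit("LineCompleted", line_id, partials[-1] if partials else "")


class CaptionView:
    """Keeps the latest text per line and records which lines are final."""

    def __init__(self) -> None:
        self.lines: Dict[int, str] = {}
        self.final: List[int] = []

    def on_event(self, event: LineEvent) -> None:
        if event.kind == "LineStarted":
            self.lines[event.line_id] = ""
        elif event.kind == "LineUpdated":
            self.lines[event.line_id] = event.text
        elif event.kind == "LineCompleted":
            self.lines[event.line_id] = event.text
            self.final.append(event.line_id)


transcriber = ToyTranscriber()
view = CaptionView()
transcriber.add_listener(view.on_event)
transcriber.feed_partials(0, ["turn", "turn on", "turn on the lights"])
print(view.lines[0])  # turn on the lights
```

Separating "updated" from "completed" lets a UI show provisional text immediately and commit it only once the line is final.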

Moonshine Medium Streaming (245M): WER 6.65%, better than Whisper Large v3 (7.44%)
Moonshine Small Streaming (123M): WER 7.84%
Moonshine Tiny Streaming (34M): WER 12.00%
The Korean Tiny model is evaluated at WER 6.46%
All models are provided in the ONNX Runtime .ort format and are slimmed down with 8-bit quantization
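
For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The text normalization here (lowercasing and whitespace splitting) is an assumption for illustration; the normalization behind the figures above is not stated in this summary.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Two substituted words out of five reference words
print(wer("turn on the kitchen lights", "turn on the chicken light"))  # 0.4
```

On the figures quoted above, 6.65% against 7.44% is a relative error reduction of roughly 10.6%.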

Installable in the major environments: Python (pip install moonshine-voice), Swift (SPM), Android (Maven), Windows (C++ headers), and others
A Raspberry Pi optimized package is available, enabling real-time recognition with a USB microphone
Released under the MIT license (English models) and the Moonshine Community License (models for other languages)
Roadmap: smaller mobile binaries, additional languages, better speaker identification, and domain customization

Processing more than 5x faster than Whisper makes it suitable for real-time voice interfaces
It is designed around a response latency target of under 200ms, so it can be used in conversational applications
The command recognition example shows that natural-language variations of a command such as "Turn on the lights" can also be recognized
Public performance verification on the HuggingFace OpenASR Leaderboard is complete
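
As a rough illustration of matching command variations, the sketch below maps an utterance to the registered intent whose example phrase shares the most words with it (Jaccard similarity over word sets). This is a deliberately simple substitute technique, not how Moonshine's IntentRecognizer works. `ToyIntentMatcher`, its methods, and the 0.4 threshold are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class ToyIntentMatcher:
    """Hypothetical word-overlap matcher; not the library's IntentRecognizer."""

    def __init__(self, threshold: float = 0.4) -> None:
        self.threshold = threshold  # assumed cut-off, chosen for this example
        self.examples: Dict[str, List[Set[str]]] = {}

    def register(self, intent: str, phrases: List[str]) -> None:
        self.examples.setdefault(intent, []).extend(tokens(p) for p in phrases)

    def match(self, utterance: str) -> Optional[str]:
        words = tokens(utterance)
        best_intent, best_score = None, 0.0
        for intent, phrase_sets in self.examples.items():
            for phrase in phrase_sets:
                score = jaccard(words, phrase)
                if score > best_score:
                    best_intent, best_score = intent, score
        return best_intent if best_score >= self.threshold else None


matcher = ToyIntentMatcher()
matcher.register("lights_on", ["turn on the lights", "switch the lights on"])
matcher.register("lights_off", ["turn off the lights", "switch the lights off"])

print(matcher.match("please turn the lights on"))  # lights_on
print(matcher.match("what time is it"))            # None
```

Word overlap handles reordering and filler words, but it cannot match a paraphrase that shares no words with any example phrase, which is where a learned model would be needed.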