Microsoft Kosmos-1: Multimodal LLM (MLLM)

xguru · 2023-03-02T09:56:41+09:00


(arxiv.org)


Kosmos-1 is a Multimodal Large Language Model (MLLM) that perceives general modalities, learns in context (few-shot), and follows instructions (zero-shot).
The model is trained on text, images, image-caption pairs, and similar data, and performs impressively on the following tasks:
1. Language understanding, generation, and OCR-free NLP (reading text directly from document images)
2. Multimodal dialogue, image captioning, and visual question answering
3. Vision tasks such as image recognition with descriptions (specifying the classification through text instructions)
MLLMs can also benefit from cross-modal transfer, that is, transferring knowledge from language to multimodal and from multimodal to language.
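
To make the few-shot versus zero-shot distinction concrete, here is a minimal sketch of how an interleaved image-and-text prompt could be assembled in each mode. It does not call any model. The `Image` class, the helper names, and the `<image>...</image>` placeholder convention are all assumptions made for illustration, not a documented Kosmos-1 interface.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for an image input; only the path is recorded."""
    path: str


Segment = Union[str, Image]


def render(segments: List[Segment]) -> str:
    """Flatten interleaved text/image segments into one prompt string.

    The <image>...</image> markers are an assumed placeholder convention.
    """
    parts = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(f"<image>{seg.path}</image>")
        else:
            parts.append(seg)
    return " ".join(parts)


def zero_shot(query: Image, instruction: str) -> List[Segment]:
    """Zero-shot: the text instruction alone specifies the task."""
    return [query, instruction]


def few_shot(examples: List[Tuple[Image, str]], query: Image,
             instruction: str) -> List[Segment]:
    """Few-shot: solved (image, answer) pairs precede the query as context."""
    segments: List[Segment] = []
    for image, answer in examples:
        segments += [image, instruction, answer]
    segments += [query, instruction]
    return segments


if __name__ == "__main__":
    question = "Question: what animal is this? Answer:"
    print(render(zero_shot(Image("query.jpg"), question)))
    demos = [(Image("cat.jpg"), "a cat"), (Image("dog.jpg"), "a dog")]
    print(render(few_shot(demos, Image("query.jpg"), question)))
```

In both modes the query ends with the same open instruction. The only difference is whether worked examples come before it in the context.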

1 comment

xguru 2023-03-02

Repo: https://github.com/microsoft/unilm
