Wow, at first glance this looks pretty amazing…
(and multimodal models are apparently now called “omni”-modal)
… Qwen3-Omni
“capable of understanding text, audio, images, and video, as well as generating speech in real time.”
Will be adding this to the self-hosted stack I run at home to put it through its paces!

GitHub: [QwenLM/Qwen3-Omni](https://github.com/QwenLM/Qwen3-Omni) — “Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud.”
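
For later, a minimal sketch of what a first smoke test might look like once it's in the stack, assuming the model is served behind an OpenAI-compatible endpoint (e.g. via vLLM). The port, checkpoint name, and image URL below are placeholders of mine, not anything from the Qwen docs:

```python
# Smoke-testing a locally served Qwen3-Omni through an OpenAI-compatible
# API. Assumes a server (e.g. vLLM) is already running on localhost:8000;
# the model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",  # whichever checkpoint you serve
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's happening in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "http://example.com/cat.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

In principle, audio and video inputs go in the same way as extra content parts, depending on what the serving layer supports.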